RAID-1 Sneakernet Part II
So. The disks got shipped off to AWS after being formatted and having the SIGNATURE files put on them. Each disk was sent as a separate job (hence the RAID-1).
I received an email from AWS for each job stating “In the processing of XXXXX, we discovered that your device does not contain a valid SIGNATURE file. A valid SIGNATURE file is required to authenticate your device.”
I’m pretty sure I double and triple checked before packing them up, but anyway I responded to each notification with the requested info, which was the output of the Import/Export jar. I would really like to see Import/Export brought into the main AWS console.
A few days later AWS told me the jobs were complete and that the log files of the export job were available in S3.
While waiting for the disks to arrive I thought it might be worth looking through these logs. Lo and behold, about 600,000 out of 700,000 total files had been renamed... Yay.
It turns out that AWS will not put more than 100,000 files in a single directory. Any other files above this limit are treated as if they have an invalid filename and are put in the recovery path as something like this: /EXPORT-RECOVERY/NNNN/NNNN/NNNN
Luckily each of these occurrences is logged, and there was enough information in the logs to reconstruct the original layout.
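For anyone facing the same clean-up, here is a minimal sketch of the reconstruction step. It assumes the export log has already been boiled down to a two-column CSV of recovery path and original key. That layout is my own assumption for illustration and not the actual AWS log format, and the function name and arguments are hypothetical too.

```python
import csv
import shutil
from pathlib import Path


def restore_from_log(log_csv, disk_root, dest_root):
    """Move recovered files back to their original relative paths.

    Assumes each CSV row is "recovery_path,original_key" (a hypothetical
    layout, not the real AWS log format). Returns the number of files moved.
    """
    disk_root = Path(disk_root)
    dest_root = Path(dest_root)
    moved = 0
    with open(log_csv, newline="") as fh:
        for row in csv.reader(fh):
            if len(row) != 2:
                continue  # skip blank or malformed rows
            recovery_path, original_key = row
            src = disk_root / recovery_path.lstrip("/")
            dst = dest_root / original_key.lstrip("/")
            if not src.is_file():
                continue  # already moved, or not on this disk
            dst.parent.mkdir(parents=True, exist_ok=True)
            shutil.move(str(src), str(dst))
            moved += 1
    return moved
```

Because rows whose source file is missing are skipped, the same log can be run against each disk in turn, and re-running it after an interruption does no harm.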
This ‘feature’ is unfortunately not listed in the Import/Export FAQ page at http://aws.amazon.com/importexport/faqs/ - but I did find a reference to this issue in the AWS forums https://forums.aws.amazon.com/message.jspa?messageID=238051
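Since the limit is easy to miss, a quick check before submitting a job might save some pain. The sketch below takes a plain list of S3 keys (however you obtain the listing) and reports which directories hold more than 100,000 files. The function name is hypothetical, and it counts only files sitting directly in each directory, which is my reading of the limit rather than anything AWS documents.

```python
from collections import Counter
from posixpath import dirname

MAX_FILES_PER_DIR = 100_000  # the limit as I hit it, per this post


def overfull_directories(keys, limit=MAX_FILES_PER_DIR):
    """Return {directory: file_count} for directories over the limit.

    `keys` is any iterable of S3 object keys such as "photos/2011/img.jpg".
    Keys with no "/" are counted under the empty string (the bucket root).
    """
    counts = Counter(dirname(key) for key in keys)
    return {d: n for d, n in counts.items() if n > limit}
```

Anything this returns would need splitting into subfolders before the export if you want the files to come back with their original names.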
One other odd thing that happened is that the entire S3 bucket was exported, and not just the sub folder I’d specified. I was lucky to have sent disks big enough for it.
This process is OK on the whole, but if you are trying it I would recommend that you allow a lot of margin for error, especially on time.