s3cmd vs s3fs
s3cmd is one of the most heavily used AWS tools after Amazon’s own command-line toolsets. It is a good utility that allows you to interact with S3 from the command line and synchronise local files to S3.
The biggest drawback of s3cmd is its speed when synchronising many small files. The average speed is about 250kB/s for files averaging 80kB.
We have been experimenting with s3fs, a FUSE (Filesystem in Userspace) module that lets Linux servers access an S3 bucket as if it were a locally mounted drive, in conjunction with rsync, a well-known and versatile folder synchronisation tool.
So far the results have been promising – 2MB/s for the same dataset.
One minor gotcha is that AWS charges $0.01 per 1,000 PUT requests and $0.10 per 10,000 requests for all other types (GET, COPY etc.), so it could cost quite a bit for a heavily used disk. It all comes down to your use case, requirements and business case.
For example, the most common situation we are looking at using this for is database backups, where the S3 bucket is only mounted for a short period while files are copied.
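To get a feel for what a backup run like this might cost in requests, here is a rough sketch in Python. It uses the two rates quoted above, which may not match current AWS pricing. The function name and the example request counts are made up for illustration, and the real number of non-PUT requests that s3fs and rsync generate will depend on your setup.

```python
# Request prices in USD, taken from the figures quoted in this post.
# Check the current AWS price list before relying on them.
PUT_PRICE_PER_1000 = 0.01
OTHER_PRICE_PER_10000 = 0.10


def estimate_request_cost(put_requests, other_requests=0):
    """Estimate the S3 request charge in USD for one sync run."""
    put_cost = put_requests / 1000 * PUT_PRICE_PER_1000
    other_cost = other_requests / 10000 * OTHER_PRICE_PER_10000
    return round(put_cost + other_cost, 2)


if __name__ == "__main__":
    # Hypothetical run: 100,000 small files, one PUT each, plus an
    # assumed two other requests per file for metadata checks.
    print(estimate_request_cost(100_000, 200_000))
```

Storage and data transfer charges are separate and are not included in this estimate.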
The s3cmd-based process goes something like this:
- Execute database dumps from the database replica server
- Compress database dumps to a local folder
- Synchronise local folder to S3 with s3cmd
Using s3fs it becomes something like this:
- Execute database dumps from the database replica server
- Compress database dumps to a local folder
- Mount S3 bucket using s3fs
- Use rsync to mirror the files across
- Unmount the S3 bucket
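The last three steps of the s3fs process (mount, rsync, unmount) could be scripted along these lines. This is only a sketch: the bucket name, mount point and dump folder are placeholders, the rsync flags are one reasonable choice and not necessarily what we run, and it assumes s3fs can find its credentials in its default password file. It defaults to a dry run that just prints the commands.

```python
import subprocess


def s3fs_backup_commands(bucket, mountpoint, dump_dir):
    """Return the mount, rsync and unmount commands, in that order."""
    src = dump_dir.rstrip("/") + "/"    # trailing slash: copy the contents
    dest = mountpoint.rstrip("/") + "/"
    return [
        ["s3fs", bucket, mountpoint],        # mount the bucket via FUSE
        ["rsync", "-av", src, dest],         # copy the dumps across
        ["fusermount", "-u", mountpoint],    # unmount the bucket
    ]


def run_backup(bucket, mountpoint, dump_dir, dry_run=True):
    mount, sync, unmount = s3fs_backup_commands(bucket, mountpoint, dump_dir)
    if dry_run:
        for cmd in (mount, sync, unmount):
            print(" ".join(cmd))
        return
    subprocess.run(mount, check=True)
    try:
        subprocess.run(sync, check=True)
    finally:
        # Always unmount, even if rsync fails part-way through.
        subprocess.run(unmount, check=True)


if __name__ == "__main__":
    # Placeholder names; replace with your own bucket and paths.
    run_backup("my-backup-bucket", "/mnt/s3-backup", "/var/backups/db")
```

Wrapping the rsync call in try/finally keeps the bucket from staying mounted after a failed copy, which matters if the aim is to mount it only for a short period.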