The use cases for Superhug are heavy on uploading and downloading large(ish) files. Rails itself isn’t so well suited to this sort of task, and it’s best to keep state away from application servers wherever possible. We chose to use Amazon S3 and CloudFront to bypass Rails for all of the uploading, downloading and image processing grunt work. This is a rundown of the approach we took.
The use cases
Here’s a summary of the use cases involving uploads and downloads.
- Designers need to upload images of their themes to sell through Superhug
- Designers need to upload portfolio images of their previous work
- Designers need to upload zip files containing the actual source for each design
- All users need to upload avatars (this could have been solved with Gravatar, but we felt that our customers wouldn’t be bothered, and it seemed like added complexity)
- Images need to be resized and cropped in various ways
- All of the above types of file need to be served to customers from a CDN
After reading the (somewhat outdated) O’Reilly book on AWS, it was clear to me that uploads and downloads didn’t need to involve the application much. You can upload directly to S3 from the browser using signed forms, and download private content directly from S3 using signed links. This means our Rails app can render a form that can’t be tampered with by the user, but that submits directly to Amazon, without troubling our application servers until the upload is complete. The same goes for download links: don’t trouble the application server with serving the download itself; provide a time-limited link to the file instead.
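To make that concrete, here’s a rough sketch of building a signed upload policy along the lines Amazon documents for browser-based POST uploads. The bucket name, key prefix and expiry are invented for the example:

```ruby
require 'base64'
require 'openssl'
require 'json'

secret_key = ENV['AWS_SECRET_ACCESS_KEY']

# The policy document describes what the browser is allowed to upload.
policy = Base64.encode64({
  'expiration' => (Time.now.utc + 3600).strftime('%Y-%m-%dT%H:%M:%SZ'),
  'conditions' => [
    { 'bucket' => 'superhug-uploads' },
    ['starts-with', '$key', 'avatars/'],
    { 'acl' => 'private' }
  ]
}.to_json).gsub("\n", '')

# Sign the encoded policy with the AWS secret key using HMAC-SHA1.
signature = Base64.encode64(
  OpenSSL::HMAC.digest(OpenSSL::Digest::SHA1.new, secret_key, policy)
).gsub("\n", '')

# The policy and signature become hidden form fields, alongside
# AWSAccessKeyId, key and acl; the form posts directly to the bucket.
```

Because the policy is signed server-side, the user can’t change the bucket, key prefix or ACL without invalidating the signature.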
So: signed forms for uploads, signed links for downloads, and every publicly available download (i.e. those not requiring access control) could be cached with CloudFront.
So I set about looking for what was already out there in terms of Ruby libraries for uploading and downloading. I found d2s3 on GitHub. It was okay — it gave me a start in signing a form for upload direct to S3 — but the HTML it produced was invalid (hence our initial fork). It also had no tests, and included too much logic in the view helper for my liking. At this point I knew we’d have a side project to work on, and this became Ungulate.
Downloading with signed links was handled pretty well by the AWS::S3 gem. But AWS::S3 didn’t support EU buckets, and all of our AWS stuff is in Ireland. Aha! But newbamboo’s fork did. Awesome. It’s a shame that the right_aws gem didn’t do signing of S3 links. Oh well, I thought, we’re using two AWS gems, no biggie.
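With the forked gem in place, a time-limited download link is a one-liner; the bucket and key here are made up:

```ruby
require 'aws/s3'

AWS::S3::Base.establish_connection!(
  :access_key_id     => ENV['AWS_ACCESS_KEY_ID'],
  :secret_access_key => ENV['AWS_SECRET_ACCESS_KEY']
)

# Generate a signed URL that expires five minutes from now.
url = AWS::S3::S3Object.url_for(
  'designs/some-theme.zip',
  'superhug-downloads',
  :expires_in => 5 * 60
)
```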
Extracting application logic to the gem
Our first development version of Superhug to handle uploads used d2s3, and our Cucumber tests were written against it. We started work on Ungulate as soon as we needed something to process images. I believe we were still using d2s3 while our first Ungulate server was running.
After a while it became necessary to rework the code that generated forms, so we rewrote the signature code for signing forms (see FileUpload and ViewHelpers). Instead of implementing the HMAC signing ourselves (I’m not sure why Matthew Williams et al. did this), we used library functions and based the algorithm on Amazon’s own description.
Using the gem
Here’s an example usage, with controller, view and model as separate bits. Note that we generate a key in advance of uploading: S3 requires you to specify the object’s key, and therefore its URL, before the upload happens. This simplifies Amazon’s architecture, but complicates ours a little. We use a timestamp to (almost) guarantee unique keys per upload (see the ‘key’ variable assignment in the sketch below):
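The controller piece looks roughly like the sketch below; the view and model follow in the next sections. The Ungulate::FileUpload options shown are illustrative rather than the gem’s exact API:

```ruby
# app/controllers/avatars_controller.rb (a sketch; option names invented)
class AvatarsController < ApplicationController
  def new
    # S3 requires the key to be fixed before the upload happens, so we
    # choose one now; the timestamp keeps keys (almost) unique.
    key = "avatars/#{current_user.id}/#{Time.now.to_i}"

    @file_upload = Ungulate::FileUpload.new(
      :bucket_url => 'http://superhug-uploads.s3.amazonaws.com/',
      :key        => key,
      :acl        => 'private'
    )
  end

  def create
    # S3 redirects back here after a successful upload; storing the key
    # kicks off processing via the model callback described below.
    current_user.update_attribute(:avatar_key, params[:key])
    redirect_to account_path
  end
end
```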
The view helper
What you don’t see above is what the view helper is doing. It includes the required hidden fields for the parameters set in the controller, so all you need do in your view is create a file field (which must be called ‘file’) and a submit button.
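In the ERB template that might look like this, with `ungulate_upload_form` standing in as a placeholder for whatever ViewHelpers actually provides:

```erb
<%# A sketch: the helper name is a placeholder, not Ungulate's real one. %>
<% ungulate_upload_form(@file_upload) do %>
  <%= file_field_tag 'file' %>
  <%= submit_tag 'Upload' %>
<% end %>
```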
Our controller’s create action has the effect of enqueuing a job for an Ungulate server to process. It does this just by changing the avatar_key. Our model (User) has a callback that calls its ungulate_enqueue method when the avatar_key is changed. In this case we want two versions of the avatar: one at 64×64 and one at 24×24.
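The model side might be wired like this; `ungulate_enqueue` and `avatar_key` come from our code, while the shape of the job hash is illustrative:

```ruby
# app/models/user.rb (a sketch; the hash keys are invented)
class User < ActiveRecord::Base
  after_save :ungulate_enqueue, :if => :avatar_key_changed?

  def ungulate_enqueue
    Ungulate::FileUpload.enqueue(
      :bucket   => 'superhug-uploads',
      :key      => avatar_key,
      :versions => {
        :profile => [64, 64],  # profile page
        :tiny    => [24, 24]   # comment listings
      }
    )
  end
end
```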
FileUpload.enqueue sends the hash provided to SQS as a YAML object. When the Ungulate server pops the queue, it processes the images according to the job description and PUTs new, publicly-readable images to S3. These can then be served to visitors of the site, while the original file hangs around in case we ever want different image sizes.
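Under the hood that amounts to a simple queue round trip. Here’s the shape of it, assuming the right_aws SQS interface; the job hash is the same illustrative one as above:

```ruby
require 'yaml'
require 'right_aws'

sqs   = RightAws::SqsGen2.new(ENV['AWS_ACCESS_KEY_ID'],
                              ENV['AWS_SECRET_ACCESS_KEY'])
queue = sqs.queue('ungulate')

# Producer (the Rails app): serialise the job description to YAML.
queue.send_message({
  :bucket   => 'superhug-uploads',
  :key      => 'avatars/42/1273498765',
  :versions => { :profile => [64, 64], :tiny => [24, 24] }
}.to_yaml)

# Consumer (the Ungulate server): pop a job and act on it.
if message = queue.pop
  job = YAML.load(message.body)
  # ...resize with RMagick and PUT public-read versions to S3...
end
```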
How we could do it differently, and future improvements
This section will no doubt grow. We’d appreciate your feedback on our approach to this task, and forks of Ungulate are more than welcome. There’s plenty more refactoring that could be done, and new features would be nice. I’d like to see this competing in the Paperclip and CarrierWave space.
A common approach to thumbnailing is an on-the-fly thumbnailing service. I see the benefits of such a service, but, as explained on the Ungulate wiki, I was inspired by Amazon’s philosophy of expecting components to fail, and felt at the time that such a service would be a single point of failure that would be difficult to scale. In retrospect, a service that performed 301 redirects to CloudFront URLs might be feasible, and even preferable to the approach described here, especially for solving the problem of showing a new thumbnail for the first time. Incidentally, Superhug handles that by polling CloudFront URLs for non-404 responses, then setting a flag indicating that the thumbnail is complete. This isn’t pretty. A separate service to call when requesting images might be on the cards.
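For the curious, that polling check is nothing more sophisticated than something like this (the URL and flag are invented for the example):

```ruby
require 'net/http'
require 'uri'

# HEAD the CloudFront URL for the processed thumbnail; once it stops
# returning 404, flag the avatar as ready to display.
uri = URI.parse("http://cdn.example.com/#{user.avatar_key}_64x64.jpg")
response = Net::HTTP.start(uri.host, uri.port) { |http| http.head(uri.path) }
user.update_attribute(:avatar_ready, true) if response.code != '404'
```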