Uploading directly to Amazon S3 from a Rails application

The use cases for Superhug are heavy on uploading and downloading large(ish) files. Rails itself isn’t so well suited to this sort of task, and it’s best to keep state away from application servers wherever possible. We chose to use Amazon S3 and CloudFront to bypass Rails for all of the uploading, downloading and image processing grunt work. This is a rundown of the approach we took.

The use cases

Here’s a summary of the use cases involving uploads and downloads.

  • Designers need to upload images of their themes to sell through Superhug
  • Designers need to upload portfolio images of their previous work
  • Designers need to upload zip files containing the actual source for each design
  • All users need to upload avatars (this could have been solved with Gravatar, but we felt that our customers wouldn’t be bothered, and it seemed like added complexity)
  • Images need to be resized and cropped in various ways
  • All of the above types of file need to be served to customers from a CDN

After reading the (somewhat outdated) O’Reilly book on AWS, it was clear that uploads and downloads didn’t need to involve an application too much. One can upload directly to S3 from the browser using signed forms, and download private content directly from S3 using signed links. This means that our Rails app can render a form that can’t be tampered with by the user, but that submits directly to Amazon, without troubling our application servers until the upload is complete. The same goes for download links: don’t trouble the application server with actually serving the download, but provide a time-limited link to a file instead.
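The signed-link half of this can be sketched in a few lines of plain Ruby. This is an illustrative, self-contained version (bucket, key and credentials are made up); the string-to-sign format is the one Amazon documents for S3 query-string authentication:

```ruby
require "openssl"
require "cgi"

# Build an expiring, signed GET URL for a private S3 object.
# The string to sign is "GET\n\n\n<expires>\n/<bucket>/<key>", per
# Amazon's query-string authentication scheme; pack("m0") encodes the
# HMAC as base64 without newlines.
def signed_s3_url(bucket, key, access_key_id, secret_key, expires_at)
  string_to_sign = "GET\n\n\n#{expires_at}\n/#{bucket}/#{key}"
  hmac = OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"),
                              secret_key, string_to_sign)
  signature = CGI.escape([hmac].pack("m0"))
  "https://#{bucket}.s3.amazonaws.com/#{key}" \
    "?AWSAccessKeyId=#{access_key_id}&Expires=#{expires_at}&Signature=#{signature}"
end

url = signed_s3_url("superhug-downloads", "themes/123.zip",
                    "AKIDEXAMPLE", "secretkeyexample", 1_893_456_000)
```

Anyone holding the link can download the object until the expiry time passes, after which S3 returns 403; no application server is involved in serving the file.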

So: signed forms for uploads, signed links for downloads, and every publicly available download (i.e. those not requiring access control) could be cached with CloudFront.

Prior art

So I set about looking at what was already out there in the way of Ruby libraries for uploading and downloading. I found d2s3 on GitHub. It was okay: it gave me a start in signing a form for direct upload to S3, but the HTML it produced was invalid (hence our initial fork). It also had no tests, and included too much logic in the view helper for my liking. At this point I knew we’d have a side project to work on, and this became Ungulate.

Downloading with signed links was handled pretty well by the AWS::S3 gem. But AWS::S3 didn’t support EU buckets, and all of our AWS stuff is in Ireland. Aha! But newbamboo’s fork did. Awesome. It’s a shame that the right_aws gem didn’t do signing of S3 links. Oh well, I thought: we’re using two AWS gems, no biggie.

Extracting application logic to the gem

Our first development version of Superhug that handled uploads used d2s3, and our cucumber tests were written against this. We started work on Ungulate as soon as we needed something to process images. I believe we were still using d2s3 while our first Ungulate server was running.

After a while it became necessary to rework the code that generated forms, so we rewrote the signature code for signing forms (see FileUpload and ViewHelpers). Instead of implementing the encryption ourselves (I’m not sure why Matthew Williams et al. did), we used libraries, and based the algorithm on the description in Amazon’s documentation.

Using the gem

Here’s an example usage, with controller, view and model as separate bits. Note that we generate a key in advance of uploading. It’s a restriction of S3 that you must specify the URL that you want when uploading. This simplifies Amazon’s architecture, but complicates ours a little. We use a timestamp to (almost) guarantee unique keys per upload (see the ‘key’ variable assignment, below):
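The original code sample hasn’t survived in this copy of the post, but the key generation and form signing can be sketched in plain Ruby. Everything here is illustrative (the bucket name, policy fields and method names are assumptions, not Ungulate’s actual API); the policy format follows Amazon’s browser-based POST upload scheme:

```ruby
require "json"
require "openssl"

# Generate an (almost) unique key per upload: a timestamp plus the
# user's id makes collisions very unlikely.
def avatar_key_for(user_id, time = Time.now)
  "avatars/#{user_id}/#{time.to_i}/avatar"
end

# Build and sign the POST policy document embedded in the upload form.
# The policy is JSON, base64-encoded (pack("m0") omits newlines), then
# signed with HMAC-SHA1 under the AWS secret key.
def signed_policy(bucket, key, secret_key, expires_at)
  policy_doc = {
    "expiration" => expires_at,
    "conditions" => [
      { "bucket" => bucket },
      ["starts-with", "$key", key],
      { "acl" => "private" }
    ]
  }
  policy = [JSON.generate(policy_doc)].pack("m0")
  signature = [OpenSSL::HMAC.digest(OpenSSL::Digest.new("SHA1"),
                                    secret_key, policy)].pack("m0")
  [policy, signature]
end

key = avatar_key_for(42, Time.at(1_300_000_000))
policy, signature = signed_policy("superhug-uploads", key,
                                  "secretkeyexample", "2025-01-01T00:00:00Z")
```

Because the policy is signed server-side, the user can’t tamper with the key, ACL or bucket without invalidating the signature, even though the form posts directly to Amazon.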

The view helper

What you don’t see above is what the view helper is doing. It includes the required hidden fields for the parameters set in the controller; all you need to do in your view is add a file field (which must be named ‘file’) and a submit button.
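As a rough illustration (the helper name and field values here are hypothetical, not Ungulate’s actual output), the hidden fields a helper like this emits follow S3’s POST upload scheme:

```ruby
# Sketch of the hidden fields a form helper emits for a signed S3
# POST. The only visible elements you add yourself are the file input
# (named "file") and a submit button.
def s3_hidden_fields(fields)
  fields.map do |name, value|
    %(<input type="hidden" name="#{name}" value="#{value}" />)
  end.join("\n")
end

html = s3_hidden_fields(
  "key" => "avatars/42/1300000000/avatar",
  "AWSAccessKeyId" => "AKIDEXAMPLE",
  "acl" => "private",
  "success_action_redirect" => "https://example.com/users/42/avatar/create",
  "policy" => "BASE64POLICY",
  "signature" => "BASE64SIGNATURE"
)
```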

The server

It’s all very well uploading stuff to Amazon with a given key, but what happens next? Amazon redirects the user that uploaded the file to the redirection URL specified in the signed form. You may notice above that this is set to create_user_avatar_url(current_user). We expose a creation method that responds to a GET request for this purpose. Not normally what you’d do in a RESTful system, but in this case it’s all we can do (unless we relied completely on JavaScript to handle uploads).
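In modern Rails routing syntax (the original post predates it, and Superhug’s actual routes aren’t shown), one hypothetical way to wire up a GET-able create action is a nested singular resource:

```ruby
# config/routes.rb: expose a create action that responds to GET, so
# that Amazon's success_action_redirect can reach it once the upload
# to S3 has completed.
resources :users do
  resource :avatar, only: [] do
    get :create
  end
end
```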

Our controller’s create action has the effect of enqueuing a job for an Ungulate server to process. It does this just by changing the avatar_key. Our model (User) has a callback that calls its ungulate_enqueue method when the avatar_key is changed. In this case we want two versions of the avatar: one at 64×64 and one at 24×24.
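The shape of that callback can be sketched without Rails at all. This plain-Ruby stand-in is not the Superhug model (the bucket and geometry strings are assumptions; avatar_key and ungulate_enqueue are the names mentioned above), but it shows the pattern:

```ruby
# When avatar_key changes, enqueue a processing job describing the
# versions we want. In Rails this would hang off a dirty-tracking
# callback rather than a hand-rolled setter.
class User
  attr_reader :avatar_key, :enqueued_jobs

  def initialize
    @enqueued_jobs = []
  end

  def avatar_key=(new_key)
    changed = new_key != @avatar_key
    @avatar_key = new_key
    ungulate_enqueue if changed
  end

  private

  def ungulate_enqueue
    @enqueued_jobs << {
      "bucket" => "superhug-uploads",
      "key" => avatar_key,
      "versions" => { "large" => "64x64", "small" => "24x24" }
    }
  end
end

user = User.new
user.avatar_key = "avatars/42/1300000000/avatar"
```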

FileUpload.enqueue sends the hash provided to SQS as a YAML object. When the Ungulate server pops the queue, it processes the images according to the job description and PUTs new, publicly-readable images to S3. These can then be served to visitors of the site, while the original file hangs around in case we ever want different image sizes.
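The wire format is plain YAML, so the round trip is easy to picture. The field names below are illustrative, and a real sender would hand the string to an SQS client rather than a local variable:

```ruby
require "yaml"

# The job hash is serialised to YAML and sent as an SQS message body;
# the Ungulate server pops the message off and processes the versions
# it describes.
job = {
  "bucket" => "superhug-uploads",
  "key" => "avatars/42/1300000000/avatar",
  "versions" => { "large" => "64x64", "small" => "24x24" }
}

message_body = YAML.dump(job)      # what goes onto the queue
received = YAML.load(message_body) # what the server pops off
```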

How we could do it differently, and future improvements

This section will no doubt grow. We’d appreciate your feedback on our approach to this task, and forks of Ungulate are more than welcome. There’s plenty more refactoring that could be done, and new features would be nice. I’d like to see this competing in the Paperclip and CarrierWave space.

On-the-fly thumbnailing

A common approach to thumbnailing is an on-the-fly thumbnailing service. I see the benefits of such a service, but (as explained on the Ungulate wiki) I was inspired by the Amazon philosophy of expecting failure in components, and felt at the time that such a service would be a single point of failure that would be difficult to scale. In retrospect, a service that performed 301 redirects to CloudFront URLs might be feasible, and even preferable to the approach described here, especially in solving the problem of showing a new thumbnail for the first time. Incidentally, Superhug handles that by polling CloudFront URLs for non-404s, then setting a flag indicating that the thumbnail is complete. This isn’t pretty. A separate service to call when requesting images might be on the cards.
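The polling amounts to issuing HEAD requests against the CloudFront URL until it stops returning 404. A sketch, with the status lookup injectable so the logic runs without the network (the URL is made up):

```ruby
require "net/http"
require "uri"

# Returns true once the thumbnail exists at its CloudFront URL. The
# status lookup is injectable so the logic can be exercised offline;
# the default performs a real HEAD request.
def thumbnail_ready?(url, fetch_status: method(:head_status))
  fetch_status.call(url) != 404
end

def head_status(url)
  uri = URI(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == "https") do |http|
    http.head(uri.request_uri).code.to_i
  end
end

# Offline checks with stubbed statuses:
ready = thumbnail_ready?("https://cdn.example.com/avatar_64.jpg",
                         fetch_status: ->(_url) { 200 })
pending = thumbnail_ready?("https://cdn.example.com/avatar_64.jpg",
                           fetch_status: ->(_url) { 404 })
```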

This entry was posted in AWS and Ruby. Bookmark the permalink.

16 Responses to Uploading directly to Amazon S3 from a Rails application

  1. robodo says:

    Thanks for sharing your solution!

  2. prashanth says:

    In the example the create method seems to be exposed, i.e. anyone can update a particular user with some random key.

    • Andrew says:

      Yes, you’d want before_filters that check the user’s authenticity and authorisation to perform the action. For simplicity, the example doesn’t include these steps.

  3. Awesome, though: why do you want to force the use of SQS? Are there any plans to make the queueing system agnostic?

  4. Raphael Caldas says:

    This gem looks great! Just what I needed…

    One question: how would I integrate the ungulate_upload_form_for inside another form? I wanted to use a single form to upload a single file and collect more data associated to the user.

    Again, thanks a lot!

    • Andrew says:

      Hi Raphael, glad you like it. The short answer is, you don’t: it’s easier to keep direct-to-s3 uploads on their own form and use an iframe to send it if you want it to ‘look’ like it’s part of another form (this is what we do on Superhug).

      The long answer is, I think amazon lets you include extra data that will be passed on to the success redirect. You’d have to check the S3 API for that, and potentially make a change to ungulate to support it. If you do this, please send a pull request!

      • Raphael Caldas says:

        Thanks, Andrew!

        I’ll try the iframe method for some immediate relief and then dig a little deeper into your other proposed solution.

        Would be really exciting to contribute to Ungulate! I must warn you though that I’m an inexperienced programmer and you may come in contact with some ugly code ;)

        Cheers!

        • Raphael Caldas says:

          Andrew,

          I’m sorry to bother you again, but could you point me out to a snippet showing how to implement the iframe technique you mentioned?

          Given my inexperience I really couldn’t figure out what I need to do.

          I’ve posted this gist showing how my form would ideally look, in case it helps.

          Thanks and, once again, sorry to bother!

          • Andrew says:

            A snippet isn’t going to help too much. I assume you’ll be using a specific JavaScript library such as jQuery. I use YUI3 for Superhug, which is the site this gem was created for, and the JavaScript is quite specific to Superhug’s needs.

            As a step in the right direction, though: you can’t nest forms within forms. You’ll need to keep the ungulate form separate from your data form, then use a library such as jQuery or YUI3 to upload the file using an iframe. In YUI3 this is a case of choosing the iframe transport with the io module; in jQuery you may need a plugin such as http://plugins.jquery.com/project/iframe-post-form, but I can’t vouch for its quality.

            If you’re not up to using an iframe, you could perhaps redesign your workflow to redirect the user somewhere once the upload has succeeded (using success_action_redirect).

  5. Raphael Caldas says:

    Thanks, Andrew!

  6. Marco Antonio says:

    In the controller example, this line has a security issue:
    {'success_action_redirect' => create_user_avatar_url(current_user)}
    One can change the user id in the HTML and perform operations as another user.

    I still am trying to figure out how to track the uploaded file from the S3 callback, without exposing sensitive data. Maybe use the encoded policy?

    • Andrew says:

      Yes, it does. But, I’ll re-iterate: the examples aren’t intended to show how to write secure code, but are simplified to show how to use the Ungulate gem.

    • Andrew says:

      When Amazon redirects the user to your success_action_redirect URL, it includes a few parameters: bucket, key and etag. You can store the key in your database for future retrieval from S3.
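      Concretely, the redirect arrives as a GET with those values in the query string, so extracting the key is one parse away (the URL below is made up):

```ruby
require "cgi"
require "uri"

# Amazon appends bucket, key and etag to the success_action_redirect
# URL; pull them out of the query string and keep the key.
redirect = "https://example.com/users/42/avatar/create" \
           "?bucket=superhug-uploads&key=avatars%2F42%2F1300000000%2Favatar&etag=%22abc123%22"

params = CGI.parse(URI(redirect).query).transform_values(&:first)
key = params["key"]   # store this for future retrieval from S3
```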

  7. Marco Antonio says:

    Yeah, you’re right; I hadn’t noticed those parameters when I asked. Thanks.
