Securing AWS S3 Websites Behind CloudFront

Alan Monnox
Development
28 Jan 2021

If you are hosting a site out of an AWS S3 bucket and serving the content through Amazon's CloudFront, you will almost certainly want your visitors to access the site through CloudFront rather than going directly to the website endpoint on the S3 bucket. In this post, we'll look at how we can easily restrict this backdoor access using an S3 bucket policy.

Keeping it Simple with Static Sites

There's a lot to like about web content management systems (CMS) such as WordPress, but these comprehensive, fully featured products can seem over the top when all you need is a collection of static HTML pages.

The advantage of a static site from a security perspective is that it has fewer moving parts than your typical CMS. There's no management console for an attacker to try to hack their way into, and there's no need to vigilantly upgrade your versions of PHP and WordPress. Granted, the latter is handled for you, at a price, if you opt for a WordPress-as-a-service offering.

If you don't need dynamic content then you can easily manage your site using an HTML generator. The popular Jekyll is an open-source example of this type of product, and it will happily convert Markdown documents into HTML pages using a theme of your choosing.

Once you've generated your site you'll need a way to host it, and this is where CloudFront and S3 enter the picture. Amazon's S3 buckets combined with CloudFront provide a convenient and cost-effective set of services for hosting static HTML sites. With this approach, our HTML pages are hosted directly by a public S3 bucket while CloudFront handles the TLS end of the connection. We need a service like CloudFront in the mix because the S3 website endpoint doesn't support HTTPS. In return, we get our TLS support plus the advantages of a CDN.
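To make the S3 side concrete, this is a minimal sketch of the website configuration a bucket needs for static hosting. The bucket name and error document are hypothetical, and the boto3 call is shown only in a comment since it requires AWS credentials:

```python
import json

# Hypothetical bucket name for illustration.
BUCKET = "your-site-name.com"

# The website configuration S3 expects: serve index.html as the
# default object, and 404.html (an assumption) for missing keys.
website_config = {
    "IndexDocument": {"Suffix": "index.html"},
    "ErrorDocument": {"Key": "404.html"},
}

# With boto3 installed and credentials configured, this would be
# applied with:
#   boto3.client("s3").put_bucket_website(
#       Bucket=BUCKET, WebsiteConfiguration=website_config)
print(json.dumps(website_config, indent=4))
```

The same configuration can also be set through the "Static website hosting" panel in the S3 console.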

There are plenty of resources online that cover both Jekyll and S3, but you might want to check out this post from Brendon Matheson, who covers the setup in some detail.

So far so good, but to operate as a static site, the website endpoint on the S3 bucket must be open to all, not just to CloudFront. To get the benefits of the CDN, and to enforce HTTPS, we really want our visitors to reach the site through CloudFront.

Challenges to Keeping the S3 Bucket Private

The official line from Amazon is to restrict access to the contents of the bucket to CloudFront only by using an origin access identity, or OAI. Unfortunately, this technique won't work in all cases due to how CloudFront serves up the default object for subdirectories within the site. Specifically, CloudFront doesn't serve up these objects, and you may find you have a site full of broken links.

To make this problem clear: given the URL https://yoursite/yourarticle/, a web server will serve up index.html as the default object for the page if it exists in the 'yourarticle' subdirectory. CloudFront is a CDN, not a web server, so we don't get the default object returned. Jekyll follows this convention for links, so if you are using Jekyll to generate your site then you'll likely encounter this very issue.
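The website endpoint's default-object behaviour can be sketched as a simple key-mapping rule: a request ending in a slash is resolved to the index document before the object lookup, while the REST endpoint looks up the literal key and misses. The function below is my own illustration, not an AWS API:

```python
def website_key(path, index_doc="index.html"):
    """Mimic the S3 website endpoint's default-object behaviour:
    a request ending in '/' resolves to the index document."""
    key = path.lstrip("/")
    if key == "" or key.endswith("/"):
        key += index_doc
    return key

# The website endpoint serves yourarticle/index.html for this request;
# the REST endpoint would look for the literal key and return an error.
print(website_key("/yourarticle/"))  # yourarticle/index.html
print(website_key("/"))              # index.html
```

A request for a concrete file such as /about.html maps to its own key unchanged, which is why only the subdirectory links break.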

This problem only occurs when connecting CloudFront to the S3 bucket using the bucket's REST endpoint. The problem is resolved if we instead connect to S3 using the bucket's static website HTTP endpoint.

That's the good news: by taking this approach you'll now have a working site. The bad news is that using the HTTP endpoint means we can't use an OAI to establish a private connection.
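For reference, the two origins use different hostname formats. The exact region naming varies slightly between older and newer AWS regions (some website endpoints use a dot rather than a dash before the region), so treat these as illustrative:

```python
def rest_endpoint(bucket, region="us-east-1"):
    # The REST endpoint: supports HTTPS and OAI, but no
    # default objects for subdirectories.
    return f"{bucket}.s3.{region}.amazonaws.com"

def website_endpoint(bucket, region="us-east-1"):
    # The static website endpoint: HTTP only, no OAI, but it
    # serves index.html as the default object for subdirectories.
    return f"{bucket}.s3-website-{region}.amazonaws.com"

print(rest_endpoint("your-site-name.com"))
print(website_endpoint("your-site-name.com"))
```

When you type the origin domain into the CloudFront console it is the website endpoint, not the REST endpoint, that you want to enter here.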

S3 Bucket Policy

Thankfully, there is a workaround that, while not completely securing access to the bucket, provides a sufficient level of security for most scenarios. Here, we are going to rely on header information in the requests that the S3 bucket receives from CloudFront. For our purposes, we are interested in the 'user-agent' header that comes through with each request.

CloudFront sets this header to a value of 'Amazon CloudFront'. We can set an S3 policy condition to allow only requests that carry the 'user-agent' header with this value.

This is the JSON for the condition:

"Condition": {
    "StringEquals": {
        "aws:UserAgent": "Amazon CloudFront"
    }
}

Putting it all together, this is what your full S3 bucket policy may look like:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": "arn:aws:s3:::your-site-name.com/*",
            "Condition": {
                "StringEquals": {
                    "aws:UserAgent": "Amazon CloudFront"
                }
            }
        }
    ]
}

Here, we allow s3:GetObject access to all of the bucket's contents where the 'user-agent' header is set to 'Amazon CloudFront'.
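If you prefer to manage the bucket programmatically rather than through the console, the same policy can be built and applied with boto3. A minimal sketch, with a hypothetical bucket name and the actual put_bucket_policy call left in a comment since it needs AWS credentials:

```python
import json

BUCKET = "your-site-name.com"  # hypothetical bucket name

policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Sid": "PublicReadGetObject",
            "Effect": "Allow",
            "Principal": "*",
            "Action": "s3:GetObject",
            "Resource": f"arn:aws:s3:::{BUCKET}/*",
            # Only honour requests carrying CloudFront's user-agent value.
            "Condition": {
                "StringEquals": {"aws:UserAgent": "Amazon CloudFront"}
            },
        }
    ],
}

# With credentials configured, apply it with:
#   boto3.client("s3").put_bucket_policy(
#       Bucket=BUCKET, Policy=json.dumps(policy))
print(json.dumps(policy, indent=4))
```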

You may also consider using the 'referer' header and setting the condition to check for a string that contains your domain name. While this approach may seem practical, it won't work behind CloudFront, as the header is only set when you first arrive at the site. If you then navigate an internal link, the 'referer' header is not present in any subsequent requests to S3.

Conclusion

The advantage of this approach is that we restrict most direct access to the S3 bucket by changing only the S3 policy. You don't need to make any changes to the CloudFront configuration.

The use of the 'user-agent' header is obviously not going to provide a bulletproof security solution for your S3 bucket, as the header is easily spoofed and the value CloudFront assigns to it is publicly documented. In its defense though, the technique offers a practical level of security, as the aim of making the bucket private is simply to steer all visitors to the CloudFront front door.

If you wanted to add an extra level of rigour, you could assign a secret value to the 'user-agent' header in the CloudFront configuration and then check for that secret in your S3 policy's condition. This requires changes to both S3 and CloudFront, but it would also safeguard against Amazon unexpectedly changing the value it assigns to the header, as well as making requests much harder to spoof.
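As a sketch, the hardened policy condition would swap the well-known value for a long random secret. The SECRET value below is a placeholder you would generate yourself and then configure identically on the CloudFront side (likely via an origin custom header, though check the current CloudFront documentation for how header overrides behave):

```python
import json

# Placeholder only: generate your own long random value, e.g. with
#   python -c "import secrets; print(secrets.token_hex(32))"
SECRET = "replace-with-a-long-random-value"

# Drop-in replacement for the Condition block in the bucket policy.
condition = {
    "StringEquals": {
        "aws:UserAgent": SECRET
    }
}

print(json.dumps({"Condition": condition}, indent=4))
```

If you do go this route, treat the secret like any other credential and rotate it periodically.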