Using Amazon’s EC2 cloud service to host our entire web infrastructure

Following our successful migration to Amazon’s S3 service for media storage and delivery, we decided to move our entire server infrastructure from its traditional data centre colocation to Amazon’s Elastic Compute Cloud (or ‘EC2’). Using this cloud-based infrastructure instead of data centre colocation provides two main benefits for us.

Firstly, EC2 is much more flexible than traditional colocation. Our infrastructure can now be expanded or altered, quite radically if necessary, in hours or even minutes, just by typing a few commands or clicking a few buttons. Previously, making similar changes would have taken weeks of planning, ordering, delivery, physical installation, software installation, etc.

Secondly, EC2 works out cheaper. Server instances tend to cost less than their physical colocation equivalents, especially when you’re able to plan ahead and use Amazon’s ‘Reserved’ pricing instead of the standard ‘On-Demand’ pricing. Also, because of EC2’s flexibility, you only ever need to run precisely the infrastructure required at the present moment, so you’re never paying for more than you need. With a traditional colocation setup, the lack of flexibility means that you always have more hardware than you currently need in order to cope with possible traffic spikes, hardware failures, etc., and that extra hardware pushes the cost up even further.

Getting up and running with EC2 is pretty straightforward: you just provide payment details, set up security credentials, and then you’re free to fire up server instances and try it out. There are various instance types to choose from depending on the hardware spec you need. Much of this can now be done via the Amazon Web Services (‘AWS’) Management Console, although you can also install the EC2 command line tools and operate EC2 via the CLI. I prefer the CLI, partly because I’ve been doing it that way for longer, but also because there are still some things you can do via the CLI which you can’t do via the Management Console.
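For example, assuming the EC2 command line tools are installed and your credentials are configured, firing up an instance looks something like this (the AMI ID, key pair and security group names here are just placeholders):

```bash
# Launch one m1.small instance from a chosen AMI,
# using an existing key pair and security group.
ec2-run-instances ami-12345678 -n 1 -t m1.small -k my-keypair -g web-servers

# Check its state and public hostname once it has started.
ec2-describe-instances
```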

Any decent system administrator will want to set up and configure instances according to their own tastes and requirements, and the best way to do this is to start an instance from an image that’s closest to what you want, modify it accordingly, then create your own image from the running instance. I originally started an instance from a CentOS image provided by RightScale, modified it heavily according to my own preferences, then created an image of my own from it. This is what I call the ‘base’ image.
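For an S3-backed (instance store) instance, creating your own image means bundling the modified instance, uploading the bundle to S3, and registering it as an AMI. Roughly, and with placeholder credentials and bucket names, the process looks like this:

```bash
# On the running, customised instance: bundle the root volume.
# The X.509 key/cert and the 12-digit AWS account ID are placeholders.
ec2-bundle-vol -d /mnt -k /mnt/pk.pem -c /mnt/cert.pem -u 123456789012

# Upload the bundle to an S3 bucket.
ec2-upload-bundle -b my-ami-bucket -m /mnt/image.manifest.xml \
  -a "$AWS_ACCESS_KEY" -s "$AWS_SECRET_KEY"

# Register the bundle as an AMI (this can be run from anywhere).
ec2-register my-ami-bucket/image.manifest.xml
```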

I then created an image for each of the different types of server in our infrastructure. For each one I started an instance from the ‘base’ image, installed the applications required for that server type, then created a new image from it, so that I ended up with several images: one for web server instances, another for mail server instances, and so on. Each server type has a certain set of tasks which need to be performed when an instance is first run, e.g. working out its own external hostname, setting up certain directories, etc., and for this I have a startup script in /etc/rc.d/init.d on each instance which works out whether it’s a brand new instance and, if so, performs all these tasks.
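My actual script is specific to our setup, but a minimal sketch of the idea looks something like this (the marker file, paths and directory names are all hypothetical):

```bash
#!/bin/bash
# /etc/rc.d/init.d/firstboot -- one-time setup for a freshly launched instance.
# chkconfig: 345 99 01
# description: First-boot configuration for EC2 instances

MARKER=/etc/.instance-configured

case "$1" in
  start)
    # Only run on a brand new instance, i.e. if the marker file is absent.
    if [ ! -f "$MARKER" ]; then
      # Ask the instance metadata service for our external hostname.
      curl -s http://169.254.169.254/latest/meta-data/public-hostname \
        > /etc/external-hostname

      # Set up any directories this server type needs.
      mkdir -p /var/www/uploads /var/log/myapp

      touch "$MARKER"
    fi
    ;;
esac
```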

Once all the required instance types have been created and made available as images which can be started as required, it’s almost possible to start up a full web application infrastructure with all the components required for it. However, there are a few extra things to think about first.

Firstly, it’s necessary to consider disaster recovery. Conveniently, most of our instances contain no unique data which would be missed if the instance died; if an instance dies, it’s normally just a case of firing up a new one to take its place. Requirements will obviously vary from one company to another, but in our case only the database instance contains unique data, i.e. the live database. I solved the disaster recovery requirement for the database instance by using an AWS storage service called Elastic Block Store (or ‘EBS’). An EBS volume is a storage volume which can be attached to a running instance and which can also be snapshotted. So, all the data for our live database is on an EBS volume attached to the database instance, and this gets backed up and snapshotted every night. I have a Bash script which does this automatically from cron, and which also removes any snapshots older than 28 days.
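The script itself is straightforward; a simplified sketch, with a placeholder volume ID and assuming the EC2 command line tools are on the path, might look like this:

```bash
#!/bin/bash
# Nightly cron job: snapshot the database EBS volume, then prune
# snapshots older than 28 days. The volume ID is a placeholder.
VOLUME=vol-12345678

# Take tonight's snapshot.
ec2-create-snapshot "$VOLUME" -d "db backup $(date +%Y-%m-%d)"

# Delete this volume's snapshots older than the cutoff date.
# Column positions assume the usual ec2-describe-snapshots output:
# SNAPSHOT snap-id vol-id status start-time ...
CUTOFF=$(date -d '28 days ago' +%Y-%m-%d)
ec2-describe-snapshots | \
  awk -v vol="$VOLUME" '$1 == "SNAPSHOT" && $3 == vol { print $2, $5 }' | \
  while read SNAP STARTED; do
    # ISO 8601 dates compare correctly as plain strings.
    if [ "${STARTED%%T*}" \< "$CUTOFF" ]; then
      ec2-delete-snapshot "$SNAP"
    fi
  done
```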

Therefore, if the database instance dies, I can just fire up a new instance and attach the EBS volume to it. In the unlikely event that the EBS volume itself dies, I can create a new one from the latest snapshot and attach it to the database instance. This makes for a very handy DR solution for the live database, and EBS volumes have the added bonus of being faster than the standard instance-based storage, which is obviously ideal for a database.
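In command form, with placeholder IDs, the recovery steps amount to something like:

```bash
# Recreate the data volume from the latest snapshot. It must be
# created in the same availability zone as the instance.
ec2-create-volume --snapshot snap-12345678 -z us-east-1a

# Attach the new volume to the replacement database instance,
# then mount it from within the instance as usual.
ec2-attach-volume vol-87654321 -i i-12345678 -d /dev/sdf
```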

All instance images and EBS snapshots are stored on S3. S3 is an exceptionally safe form of storage, so the chances of anything getting lost on it are incredibly low, and that very low level of risk might be reassurance enough for many companies. In our case, however, I’m exceptionally paranoid, so I also take backups of everything we have on S3 (including all our media, which is also stored there). Every month I use the rsync-style functionality within the very handy s3cmd tool to incrementally back up all our S3 buckets to a disk on a server in our office.
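The backup job itself is simple; a sketch of the idea, with a hypothetical local backup path, looks like this:

```bash
# Mirror every S3 bucket to local disk. s3cmd sync only transfers
# new or changed files, so the monthly runs stay incremental.
for BUCKET in $(s3cmd ls | awk '{print $3}'); do
  s3cmd sync "$BUCKET/" "/backup/s3/${BUCKET#s3://}/"
done
```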

Another consideration is how to load balance the front-end servers. Amazon provide a great solution for this called Elastic Load Balancing (‘ELB’), which can be used to quickly and easily create a load balancer and configure it to distribute whatever traffic you like across whichever instances you need. It even has support for sticky sessions. The only problem I’ve found with ELB is that it requires you to point a CNAME at the load balancer rather than an A record, which means you can’t use root domains (e.g. apple.com instead of www.apple.com), because the DNS spec does not allow a root domain to be a CNAME. There are various hacks to get around this, but none of them are perfect, and it’s a problem I hope Amazon will solve soon. In the meantime, this thread in the AWS developer forums gives more information.
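Using the ELB command line tools (mentioned below), setting up a load balancer with sticky sessions looks roughly like this; the names, availability zone and instance IDs are placeholders:

```bash
# Create a load balancer listening on port 80.
elb-create-lb my-lb --availability-zones us-east-1a \
  --listener "protocol=http,lb-port=80,instance-port=80"

# Register the front-end web server instances with it.
elb-register-instances-with-lb my-lb --instances i-12345678,i-87654321

# Enable sticky sessions using an ELB-generated cookie (1 hour expiry),
# and apply the policy to the port 80 listener.
elb-create-lb-cookie-stickiness-policy my-lb \
  --policy-name sticky-sessions --expiration-period 3600
elb-set-lb-policies-of-listener my-lb --lb-port 80 \
  --policy-names sticky-sessions
```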

It’s also necessary to consider the delivery of email. Unfortunately, due to the ease with which server infrastructures can be created on EC2, a lot of spammers use it to send email, which means that EC2 IP addresses are heavily blacklisted. For each EC2 instance you wish to send email from, therefore, you need to follow a process which helps Amazon ensure that you’re not sending spam. Firstly, you need to assign an Elastic IP to the instance. An Elastic IP is an IP address you can request and then keep for your own use, assigning it to whichever instance you like at any given time instead of relying on whichever address the instance happened to get by default. Once the Elastic IP is assigned to the mail server instance, you just need to ask Amazon to set up reverse DNS for it and remove its email sending limitations. This process works well, and we haven’t had any problems sending mail from our mail server instance since completing it.
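Allocating and assigning an Elastic IP is a two-command job (the instance ID and address shown here are placeholders):

```bash
# Request a new Elastic IP address for the account.
ec2-allocate-address

# Associate the allocated address with the mail server instance.
ec2-associate-address 203.0.113.25 -i i-12345678
```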

As with the general EC2 functions, EBS, ELB and Elastic IPs can be controlled via the AWS Management Console, but the CLI tools tend to be more powerful than the console. CLI commands for EBS and Elastic IPs are included within the EC2 command line tools, but ELB has its own set of CLI tools.

The actual migration of our infrastructure to EC2 was similar to other infrastructure migrations I’ve done in the past, but with the added bonus of EC2’s flexibility. This means that you can set up the infrastructure roughly how you think it will need to be, and then tweak it as you go along by starting and stopping instances, changing instance types, etc., instead of having to predict your requirements up front and hope for the best. Since the migration, our entire infrastructure has been running successfully in the cloud for several months, and we continue to be very happy with Amazon Web Services. We closed our account with our colo provider some time ago and sold all our physical servers.

For those of you who haven’t yet started using EC2, or who haven’t got very far with it yet, lots of useful details on the practicalities of hosting on EC2 can be found in this excellent series of blog posts written by my friend Paul Norman.

There’s a lot more you can do with EC2 which I haven’t covered, but hopefully this post gives an idea of how easy it is to set up a web infrastructure there, and by reading the documentation and following Amazon’s updates you can easily work out any additional details you need.