How to use Ansible for automated AWS provisioning

I’ve recently produced a series of articles aimed at startups, entrepreneurial solo developers, etc. wanting to take their first steps into Amazon Web Services (AWS) setups for app deployment.

I then wanted to move on from discussing manual setup via the GUI of the AWS web console, to DevOps-style command-line programmatic setup for automated provisioning of an AWS infrastructure for app deployment, i.e. infrastructure as code (IaC). I have therefore created a suite of Ansible playbooks to provision an entire AWS infrastructure with a Staging instance and an auto-scaled load-balanced Production environment, and to deploy a webapp thereon. The resulting set of Ansible AWS provisioning playbooks and associated files can be found in a repository on my GitHub, so go ahead and grab it from there if you want to try them out. Keep reading for information on how to set up and use the playbooks (and you can also refer to the README in the repo folder, which contains much of the same information).

With these playbooks, firstly the EC2 SSH key and Security Groups are created, then a Staging instance is provisioned, then the webapp is deployed on Staging from GitHub, then an image is taken from which to provision the Production environment. The Production environment is set up with auto-scaled EC2 instances running behind a load balancer. Finally, DNS entries are added for the Production and Staging environments.

Integration of a database has not been included in these playbooks, but that’s something that could easily be added. I also haven’t included SSL/TLS/HTTPS, but again that can be added without too much trouble. The playbooks are configured for very modest requirements, with a maximum of three t2.micro instances for auto-scaling, but it’s trivial to change these settings by editing the playbook files. There’s no reason why this set of playbooks shouldn’t handle scaling out to a much larger infrastructure, with more powerful and specialised instance types as needed.

I created a very basic Python webapp to use as an example for the deployment here, but anyone using these playbooks can replace that with their own webapp should they so wish.

Installation/setup

You’ll need an AWS account with a VPC set up, and with a DNS domain set up in Route 53.
Install and configure the latest version of the AWS CLI. The settings in the AWS CLI configuration files are needed by the Ansible modules in these playbooks. Also, the Ansible modules don’t yet support target tracking auto-scaling policies, so there is one task which needs to run the AWS CLI as a local external command for that purpose. If you’re using a Mac, I’d recommend using Homebrew as the simplest way of installing and managing the AWS CLI.
If you don’t already have it, you’ll need Python 3. You’ll also need the boto and boto3 Python modules (for Ansible modules and dynamic inventory) which can be installed via pip.
Ansible needs to be installed and configured. Again, if you’re on a Mac, using Homebrew for this is probably best.
Copy etc/variables_template.yml to etc/variables.yml and update the static variables at the top for your own environment setup.
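
Before running any playbook, it can help to confirm that the prerequisites listed above are actually in place. The following is a minimal sketch in Python and is not part of the repository: the function name missing_prerequisites is a hypothetical helper, and it assumes the aws and ansible-playbook commands should be on your PATH. It checks the Python interpreter that runs the script, which may not be the same one Ansible uses (with a Homebrew install, for example).

```python
import importlib.util
import shutil
from pathlib import Path

# Assumed names: the executables and modules mentioned in the setup steps.
REQUIRED_COMMANDS = ("aws", "ansible-playbook")
REQUIRED_MODULES = ("boto", "boto3")


def missing_prerequisites(commands=REQUIRED_COMMANDS,
                          modules=REQUIRED_MODULES,
                          variables_file="etc/variables.yml"):
    """Return a list of problems found; an empty list means every check passed."""
    problems = []
    for command in commands:
        # shutil.which returns None when the executable is not on PATH.
        if shutil.which(command) is None:
            problems.append(f"command not on PATH: {command}")
    for module in modules:
        # find_spec returns None when a top-level module cannot be found.
        if importlib.util.find_spec(module) is None:
            problems.append(f"Python module not importable: {module}")
    if not Path(variables_file).is_file():
        problems.append(f"file not found: {variables_file}")
    return problems
```

Calling missing_prerequisites() from the repo folder and printing each entry gives a quick checklist of anything still to install or copy.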

Usage

These playbooks are run in the standard way, i.e.:

ansible-playbook PLAYBOOK_NAME.yml

Note that Step 3 also requires the addition of -i etc/inventory.aws_ec2.yml to use the dynamic inventory.

To deploy your own webapp instead of my basic Python app, you’ll need to edit deploy_staging.yml so that Step 3 deploys your app with your own specific requirements, files, configuration, etc.

Playbooks for provisioning/deployment

1. provision_key_sg.yml – provisions an EC2 SSH key and Security Groups.
2. provision_staging.yml – provisions a Staging instance based on the official Amazon Linux 2 AMI.
3. deploy_staging.yml – sets up the Staging instance and deploys the app on it.
   - Requires dynamic inventory specification, so run as follows:
   - ansible-playbook -i etc/inventory.aws_ec2.yml deploy_staging.yml
4. image_staging.yml – builds an AMI image from the Staging instance.
5. provision_tg_elb.yml – provisions a Target Group and Elastic Load Balancer ready for the Production environment.
6. provision_production.yml – provisions the auto-scaled Production environment from the Staging AMI, and attaches the Auto Scaling group to the ELB Target Group.
   - This playbook does not wait for instances to be deployed as specified, so it will take some time after the playbook runs before the additions/changes become apparent.
7. provision_dns.yml – provisions the DNS in Route 53 for the Production environment and the Staging instance.
   - Note that it may take a few minutes for the DNS to propagate before it becomes usable.

Each playbook depends on the components and variables created by the ones before it, so running a later playbook without having run the earlier ones will fail.

Running all seven playbooks in succession will set up the entire infrastructure from start to finish.

Once the infrastructure is up and running, any changes to the app can be redeployed to Staging by running Step 3 again. You would then run Step 4 and Step 6 to rebuild the Production environment from the updated Staging environment. Note that in this situation, the old instances in Production are replaced with new ones in a rolling fashion, so it will take a while before the old instances are terminated and the new ones are in place.
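
The full run and the redeploy cycle described above could be scripted along these lines. This is a sketch under stated assumptions rather than something shipped in the repo: playbook_command and run_steps are hypothetical helper names, and the script assumes it’s run from the repo folder with ansible-playbook on the PATH.

```python
import subprocess

# The seven playbooks, in the order they need to run (Step 1 to Step 7).
PLAYBOOKS = [
    "provision_key_sg.yml",
    "provision_staging.yml",
    "deploy_staging.yml",
    "image_staging.yml",
    "provision_tg_elb.yml",
    "provision_production.yml",
    "provision_dns.yml",
]
INVENTORY = "etc/inventory.aws_ec2.yml"
NEEDS_INVENTORY = {"deploy_staging.yml"}  # only Step 3 uses the dynamic inventory


def playbook_command(playbook):
    """Build the ansible-playbook argument list for one playbook."""
    command = ["ansible-playbook"]
    if playbook in NEEDS_INVENTORY:
        command += ["-i", INVENTORY]
    return command + [playbook]


def run_steps(steps, runner=subprocess.run):
    """Run the given 1-based step numbers in order, stopping at the first failure."""
    for step in steps:
        # check=True makes subprocess.run raise CalledProcessError on a non-zero exit.
        runner(playbook_command(PLAYBOOKS[step - 1]), check=True)


# Full build from scratch:  run_steps(range(1, 8))
# Redeploy an app change:   run_steps([3, 4, 6])
```

Stopping at the first failure matters here, because each later step relies on what the earlier ones created.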

Playbooks for deprovisioning

destroy_all.yml – destroys the entire infrastructure.
delete_all.yml – clears all dynamic variables in the etc/variables.yml file.

USE destroy_all.yml WITH EXTREME CAUTION! If your shell is configured for the wrong AWS account, you could potentially cause serious damage with this. Always check before running that your shell is configured for the correct environment and that you are absolutely 100 percent sure you want to do this. Don’t say I didn’t warn you!
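
One way to make that check less error-prone is to ask the AWS CLI which account the current shell is using and compare it with the one you expect. The sketch below is only an illustration and is not part of the playbooks: current_account_id and confirm_account are hypothetical names, and the account ID you pass in is something you’d supply yourself. It relies on the aws sts get-caller-identity command.

```python
import subprocess


def current_account_id(runner=subprocess.run):
    """Ask the AWS CLI which account the current shell credentials belong to."""
    result = runner(
        ["aws", "sts", "get-caller-identity",
         "--query", "Account", "--output", "text"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout.strip()


def confirm_account(expected_account_id, runner=subprocess.run):
    """Return the account ID if it matches the expected one, otherwise raise."""
    actual = current_account_id(runner)
    if actual != expected_account_id:
        raise RuntimeError(
            f"shell is configured for account {actual}, "
            f"expected {expected_account_id}; refusing to continue"
        )
    return actual


# Example (hypothetical account ID): confirm_account("123456789012")
```

Running confirm_account before destroy_all.yml means a wrongly configured shell stops with an error instead of destroying anything.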

Because certain elements can take some time to deprovision, some tasks in destroy_all.yml may fail at first. This is nothing to worry about: wait a little while, then run the playbook again, repeating until all tasks have succeeded.
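
If you’d rather not re-run it by hand, the wait-and-retry routine can be sketched as a small loop. This is an assumption-laden example, not part of the repo: run_until_success is a hypothetical helper, the attempt count and delay are arbitrary values, and it assumes ansible-playbook exits with a non-zero status when any task fails.

```python
import subprocess
import time


def run_until_success(command, attempts=5, delay_seconds=120,
                      runner=subprocess.run, sleep=time.sleep):
    """Re-run command until it exits with 0; return the number of attempts used."""
    for attempt in range(1, attempts + 1):
        if runner(command).returncode == 0:
            return attempt
        if attempt < attempts:
            sleep(delay_seconds)  # give AWS time to finish deprovisioning
    raise RuntimeError(f"{command[0]} still failing after {attempts} attempts")


# Example: run_until_success(["ansible-playbook", "destroy_all.yml"])
```

The cap on attempts is deliberate, so a failure that isn’t just a timing issue surfaces as an error rather than looping forever.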

Once everything has been fully destroyed, it’s safe to run the delete_all.yml playbook to clear out the variables file. Do not run this until you are sure everything has been fully destroyed, because the SSH key file cannot be recovered once it has been deleted.

Checking the Staging and Production sites

To check the app on Staging once deployed in Step 3, you can get the Staging instance’s public DNS via the AWS CLI with this command:

aws ec2 describe-instances --filters "Name=tag:Environment,Values=Staging" --query "Reservations[*].Instances[*].PublicDnsName"

Then check it in your browser on port 8080 at:

http://ec2-xxx-xxx-xxx-xxx.xx-xxxx-x.compute.amazonaws.com:8080/

(replacing “ec2-xxx-xxx-xxx-xxx.xx-xxxx-x.compute.amazonaws.com” with the actual public address of the instance).
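
The query above returns a nested list (one inner list per reservation), so turning its output into a ready-to-open URL takes a little flattening. Here is a minimal sketch, assuming the command is run with --output json; staging_urls is a hypothetical helper name and the hostname in the sample is made up.

```python
import json


def staging_urls(cli_output, port=8080):
    """Turn the CLI's nested JSON list of public DNS names into browser URLs."""
    urls = []
    for instances in json.loads(cli_output):
        for dns_name in instances:
            # Stopped or terminated instances report an empty PublicDnsName.
            if dns_name:
                urls.append(f"http://{dns_name}:{port}/")
    return urls


# Made-up sample of the JSON the describe-instances query prints.
sample = '[["ec2-203-0-113-10.eu-west-1.compute.amazonaws.com"], [""]]'
print(staging_urls(sample))
```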

To check the app on Production once deployed in Step 6, you can get the ELB’s DNS name by grepping the variables.yml file:

grep elb_dns etc/variables.yml | cut -d " " -f 2

Then just check that in your web browser.
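
The same lookup can be done in Python without installing a YAML library. This sketch assumes the variable is stored on a single top-level line of the form "elb_dns: value", which is what the grep and cut command implies; read_variable is a hypothetical helper and the sample contents are invented, so the real etc/variables.yml may be laid out differently.

```python
def read_variable(text, name):
    """Return the value of a top-level 'name: value' line, or None if absent."""
    for line in text.splitlines():
        key, separator, value = line.partition(":")
        if separator and key.strip() == name:
            # Drop surrounding whitespace and optional quotes around the value.
            return value.strip().strip("'\"")
    return None


# Invented sample contents for illustration only.
sample = "vpc_id: vpc-0abc\nelb_dns: my-elb-123456.eu-west-1.elb.amazonaws.com\n"
print("http://" + read_variable(sample, "elb_dns") + "/")
```

Unlike grep, this matches the key exactly, so a line that merely mentions elb_dns somewhere else won’t be picked up by mistake.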

Once Step 7 has been run to create the DNS entries (and you’ve waited a little while for the DNS to propagate) you can visit your Production site at http://www.yourdomain.com/ and your Staging site at http://staging.yourdomain.com:8080/ (noting the use of port 8080 for Staging, and obviously replacing “yourdomain.com” with your actual domain as specified in the etc/variables.yml file).

Load testing to check auto-scaling response

If you don’t have enough traffic coming in to trigger an Auto Scaling event and you’re wondering if the scaling is working as intended, you can use a benchmarking tool such as Apache’s ab to artificially create large amounts of incoming traffic. This should raise the load on the Production instance enough to trigger the automatic launch of an additional Production instance. I’ve found running this command simultaneously from two separate servers is usually sufficient (if you don’t have any suitable servers, you can temporarily fire up a couple of EC2 instances for the purpose):

ab -c 250 -n 1000000 http://www.yourdomain.com/

This will simulate 250 simultaneous requests from each server, and will keep going until you cancel it (or until it hits a million requests, but an auto-scaling event should occur well before that number gets reached).

Connecting to instances via SSH

If you need to SSH into the Staging instance once it’s running after Step 2, get the public DNS name using the command above, then SSH in with:

ssh -i etc/ec2_key.pem ec2-user@ec2-xxx-xxx-xxx-xxx.xx-xxxx-x.compute.amazonaws.com

If you need to SSH into the Production instances once they’re running after Step 6, get the list of public DNS names for the Production instances with this command (there may only be one instance):

aws ec2 describe-instances --filters "Name=tag:Environment,Values=Production" --query "Reservations[*].Instances[*].PublicDnsName"

Then connect via SSH in the same way as with the Staging instance.
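
When several Production instances are running, it can be handy to print one ready-made SSH command per instance. The following is a rough sketch, assuming the describe-instances command above is run with --output json; ssh_commands is a hypothetical helper, the hostname in the usage comment is made up, and the key path and user name are the ones used for Staging.

```python
import json
import shlex


def ssh_commands(cli_output, key_file="etc/ec2_key.pem", user="ec2-user"):
    """Build one ssh command line per instance that has a public DNS name."""
    commands = []
    for instances in json.loads(cli_output):
        for dns_name in instances:
            if dns_name:  # skip instances without a public address
                commands.append(
                    shlex.join(["ssh", "-i", key_file, f"{user}@{dns_name}"])
                )
    return commands


# Example with a made-up hostname:
# ssh_commands('[["ec2-203-0-113-10.eu-west-1.compute.amazonaws.com"]]')
```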

Running ad hoc Ansible commands

To run ad hoc commands (uptime in this example) remotely with Ansible (without playbooks) you can use the ansible command as follows:

ansible -i etc/inventory.aws_ec2.yml -u ec2-user --private-key etc/ec2_key.pem tag_Environment_Staging -m shell -a uptime

That can be used for the Staging instance. To run the command on all the Production instances at once, replace “tag_Environment_Staging” with “tag_Environment_Production”.

Final thoughts

I hope this suite of Ansible playbooks proves useful, and that people will find it beneficial to use, share and improve them. Any feedback would be very welcome.

If this is the kind of thing you need but you’re struggling to implement this on your own, or if you’re getting on OK with this level of infrastructure automation but need to improve and expand, do get in touch with me to discuss the SysAdmin and DevOps services I provide. I would be very happy to talk about your issues to potentially solve problems and help develop the infrastructure side of your business more effectively.