The Digital Team uses Terraform to manage the AWS configuration.
Terraform is a CLI utility that synchronizes AWS with scripts: in essence, it uses a series of scripts to detect and make changes to AWS. Terraform commands are run from a terminal session on a machine with the Terraform libraries installed. See the Terraform website and documentation.
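To give a flavour of what those scripts look like, here is a minimal, generic example (not taken from our own repository): it declares the AWS state we want, and `terraform plan` / `terraform apply` work out and make whatever changes are needed to bring AWS in line with it.

```hcl
# Minimal, generic illustration only -- the provider region and the resource
# are hypothetical and are not part of the Digital Team's Terraform scripts.
provider "aws" {
  region = "us-east-1"
}

# Terraform compares this declared state against what actually exists in AWS
# and proposes whatever create/update/delete actions are needed to match it.
resource "aws_s3_bucket" "example" {
  bucket = "digital-example-bucket"
}
```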
How to update the AMI on our ECS cluster instances
The Digital webapps cluster uses the Elastic Container Service on AWS. We have a handful of EC2 instances that actually host the containers.
These instances use a stock Amazon Machine Image (AMI) from Amazon designed for Docker that comes with the ECS agent pre-installed. From time to time, Amazon releases a new version of this “ECS-optimized” image, either to upgrade the ECS agent or the underlying OS.
This process is sometimes referred to as “rolling” the cluster, though it’s more accurate to say that we set up a second cluster of machines and migrate to it.
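Both procedures below start from the new AMI ID found in step 1. As an aside, AWS also publishes the currently recommended ECS-optimized AMI ID in SSM Parameter Store, so it can be looked up from Terraform itself; the snippet below is an illustrative sketch of that lookup rather than something our scripts necessarily do.

```hcl
# Illustrative sketch: AWS publishes the recommended ECS-optimized
# Amazon Linux 2 AMI ID at this public SSM parameter path. This lookup is
# not necessarily part of the digital-terraform scripts.
data "aws_ssm_parameter" "ecs_optimized_ami" {
  name = "/aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id"
}

# The looked-up AMI ID is then available as
# data.aws_ssm_parameter.ecs_optimized_ami.value
```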
Update the `instance_image_id` value for the `staging_cluster` module to the new AMI ID from step 1 above (a sketch of roughly what this block looks like follows these steps). Save/commit the file to a new branch, not directly to the `production` branch.
Make a PR which merges the new branch into the `production` branch, and assign a person to review the changes.
After viewing the plan, if you need to update the terraform scripts, be sure to save the changes to the new branch.
If committing your changes does not trigger the Atlantis plan automatically, you can run it manually by creating a new comment with `atlantis plan`.
Keep an eye on the “ECS Instances” tab in the cluster’s UI. You should see the “Running tasks” on the draining instance(s) go down, and go up on the new instances.
Once all the tasks have moved, the old instance(s) will terminate and Terraform will complete. Check a few URLs on staging to make sure that everything’s up-and-running.
Now that Atlantis’s apply has finished, you can merge the staging PR and repeat the process (steps 2-6) for the production cluster.
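For reference, the `staging_cluster` block being edited (it lives in `/apps/clusters.tf`, as described in the local workflow below) looks roughly like the sketch here. Everything except `instance_image_id` is an illustrative placeholder; the real module source and arguments differ.

```hcl
# Rough sketch only -- the module source and the other arguments are
# placeholders, not the actual digital-terraform code. The one value that
# changes during this procedure is instance_image_id.
module "staging_cluster" {
  source = "../modules/ecs-cluster"            # assumption: illustrative path

  cluster_name      = "digital-staging"        # assumption
  instance_type     = "t3.medium"              # assumption
  instance_image_id = "ami-0123456789abcdef0"  # <-- update to the new AMI ID
}
```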
Ensure your cloned copy of the `digital-terraform` repository is on the `production` branch, and that the branch is up to date with the origin on GitHub.
Create a new branch from the `production` branch.
In your preferred IDE, open the `/apps/clusters.tf` file and update the `instance_image_id` value for the `staging_cluster` module to the new AMI ID from step 1 above. Save/commit the file to the new branch (not directly to the `production` branch).
In a terminal/shell, from the `repo/apps/` folder, run the command: `terraform plan`
When the plan is done, inspect the output and expect to see changes to:
- `resource "aws_autoscaling_group" "instances"`
- `resource "aws_cloudwatch_metric_alarm" "low_instance_alarm"`
- `resource "aws_launch_configuration" "instances"` (this last one will have the new AMI ID)

Any other changes the plan identifies should be carefully investigated: Terraform may be proposing changes to the AWS environment that you don't want, or at least are not expecting. (See the sketch at the end of this page for how these three resources are typically wired together.)
Keep an eye on the “ECS Instances” tab in the cluster’s UI. You should see the “Running tasks” on the draining instance(s) go down, and go up on the new instances.
Once all the tasks have moved, the old instance(s) will terminate and Terraform will complete. Check a few URLs on staging to make sure that everything’s up-and-running.
Now that Terraform's apply is finished, you can repeat the process (steps 2-9) for the production cluster.
Finally, you should merge the changes in your new (local) branch into the local `production` branch, and then push your local `production` branch to the origin on GitHub.
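As an aside on why the plan lists those three resources: the sketch below (illustrative names and arguments only, not our actual module code) shows a common wiring. The launch configuration consumes the AMI ID and launch configurations are immutable, so a new image forces a replacement; in the pattern sketched here the autoscaling group's name is tied to the launch configuration, so the group is replaced too (the “second cluster of machines” described above), and the low-instance alarm that points at the group is updated along with it.

```hcl
# Illustrative sketch (not the actual digital-terraform module) of how a new
# instance_image_id ripples through the three resources listed in the plan.
variable "instance_image_id" {}

# Launch configurations are immutable, so a new image_id forces Terraform to
# create a replacement launch configuration.
resource "aws_launch_configuration" "instances" {
  name_prefix   = "ecs-instances-"        # assumption
  image_id      = var.instance_image_id   # the value updated in clusters.tf
  instance_type = "t3.medium"             # assumption

  lifecycle {
    create_before_destroy = true
  }
}

resource "aws_autoscaling_group" "instances" {
  # Tying the group's name to the launch configuration name forces Terraform
  # to replace the whole group when the AMI changes -- one common way to get
  # the "second cluster of machines" behaviour; our module may differ.
  name                 = aws_launch_configuration.instances.name
  launch_configuration = aws_launch_configuration.instances.name
  min_size             = 2                             # assumption
  max_size             = 4                             # assumption
  availability_zones   = ["us-east-1a", "us-east-1b"]  # assumption

  lifecycle {
    create_before_destroy = true
  }
}

# The low-instance alarm refers to the autoscaling group by name, which is
# why it also appears in the plan output.
resource "aws_cloudwatch_metric_alarm" "low_instance_alarm" {
  alarm_name          = "ecs-low-instance-count"       # assumption
  namespace           = "AWS/AutoScaling"
  metric_name         = "GroupInServiceInstances"
  comparison_operator = "LessThanThreshold"
  threshold           = 2                              # assumption
  evaluation_periods  = 1
  period              = 60
  statistic           = "Minimum"

  dimensions = {
    AutoScalingGroupName = aws_autoscaling_group.instances.name
  }
}
```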