How to encrypt and use a .env file for a service whose configuration is hosted on S3.
Apps are configured by putting files in a secure configuration bucket on S3. The ENTRYPOINT script for our apps pulls all files in from the app’s path in the bucket before starting up. This allows an app to be securely configured with a .env file and, for example, server.crt and server.key files for TLS connections.
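The exact ENTRYPOINT script isn’t reproduced here, but a minimal sketch of the idea might look like the following; the $CONFIG_S3_BUCKET and $APP_NAME variables are placeholders, not the names the images actually use.

```
#!/bin/sh
# Hypothetical sketch of the config-pulling ENTRYPOINT.
# Pull every file from the app's path in the configuration bucket into the
# working directory, then hand off to the app's normal start command.
aws s3 sync "s3://${CONFIG_S3_BUCKET}/${APP_NAME}/" .

exec "$@"
```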
You will need:
- An AWS CLI login (this is different from the AWS account you can log into via the web)
- kms:Encrypt permissions
- The AWS Command Line Interface package installed and configured
Though .env files are stored encrypted in S3 and only transferred securely, we still encrypt environment variables like passwords so that they are not visible in plaintext when editing the .env files.
Each service is configured to have its own private key in the Amazon KMS keystore. Only the task role may decrypt with that key.
Adding the _KMS_ENCRYPTED suffix to an environment variable’s name in the .env file will cause the task to decrypt the variable at runtime, storing it in process.env after stripping the suffix.
To create an encrypted environment variable value:
Visit the “Customer managed keys” section of the KMS part of the AWS web console.
Look for the Key ID for your service. Save it in the $CONFIG_KEY_ID environment variable.
Log in to AWS CLI with an account that has kms:Encrypt permissions for the key.
Run the following command with a leading space so that it doesn’t appear in your command history: aws kms encrypt --output text --key-id $CONFIG_KEY_ID --plaintext STR-TO-BE-ENCRYPTED
Copy the encrypted value (the output up to the whitespace) as the value in .env: PASSWORD_KMS_ENCRYPTED=…
Note that using --plaintext on the command line will cause aws kms to encrypt the ASCII as-is. When using the fileb:// form to reference a file on disk, aws kms will first Base64-encode the value, which causes a failure on the app side, since the app does not expect Base64-encoded values.
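As a worked example, the whole flow looks roughly like this; the key ID and the string being encrypted are placeholders.

```
# Placeholder key ID copied from the KMS console's "Customer managed keys" page.
export CONFIG_KEY_ID=1234abcd-12ab-34cd-56ef-1234567890ab

# The leading space keeps the secret out of shell history (assuming the shell
# is configured to ignore space-prefixed commands).
 aws kms encrypt --output text --key-id $CONFIG_KEY_ID --plaintext 'str-to-be-encrypted'

# Copy the first whitespace-separated field of the output into .env:
# PASSWORD_KMS_ENCRYPTED=AQICAHj…
```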
Decrypting a variable that was encrypted using the method above can be done from a terminal session with commands along the lines of the sketch below. If you have multiple profiles on your computer, you may add this option to the aws kms decrypt command: --profile=myprofile
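This is a sketch rather than the exact commands from the original note; it assumes the ciphertext is the base64 string stored as the variable’s value in the .env file.

```
# Placeholder: the value stored after PASSWORD_KMS_ENCRYPTED= in .env.
CIPHERTEXT='AQICAHj…'

# KMS wants the raw binary ciphertext, so decode the base64 first; the CLI
# then returns the plaintext base64-encoded, so decode that as well.
echo "$CIPHERTEXT" | base64 --decode > /tmp/ciphertext.bin
aws kms decrypt --ciphertext-blob fileb:///tmp/ciphertext.bin \
    --output text --query Plaintext | base64 --decode
rm /tmp/ciphertext.bin
```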
See the README.md in CityOfBoston/digital-terraform.
Introduction to how the Digital webapps infrastructure is set up in AWS
The Digital team uses Docker via Amazon’s Elastic Container Service to deploy its webapps. We migrated to AWS from Heroku primarily so we could establish a VPN connection to internal City databases (such as those used for boards and commissions applications and the Registry certificate ordering).
Our production cluster runs two copies of each app, one in each of two availability zones (AZs). This is more for resilience against AZ-specific failures than for sharing load.
Almost all of our AWS infrastructure is described by and modified using our Terraform configuration.
The webapps that the City has developed so far are extremely small and low-traffic. Docker containers let us pack a few machines with as many webapps as we can; right now we’re limited only by memory. Docker keeps these apps isolated from each other. It also makes it easy to do rolling, zero-downtime deployments of new versions.
The typical limitations of Docker (persistent storage is a pain, as is running related processes together, and there’s some loss of efficiency under high load) are not concerns for the types of apps we’re building.
Amazon’s ECS, along with its Application Load Balancers, handles restarting crashed jobs and routing traffic to the containers.
Our app containers are run on EC2 instances that live in four private subnets (2 AZs × 2 environments). These instances do not have public IPs and therefore cannot communicate directly with the public internet, which gives us some level of safety through isolation.
These ECS cluster instances receive traffic from Amazon’s ALB load balancers, which live in corresponding public subnets. They can contact public web services through NAT gateways, which also live in the public subnets. The ECS cluster instances also have access to internal City datacenters through the VPN gateway.
The instances are further isolated by having security groups that only allow traffic from the security groups of their corresponding ALBs (and SSH traffic from the bastion instance).
The VPN gateway connects from our VPC to the City datacenter. It has two connections running simultaneously for redundancy. AWS VPNs need to have regular traffic to keep them active, and if they do disconnect they need traffic from outside AWS to cause them to come back online.
We have a SiteScope rule set up with the CoB network team that pings an EC2 instance inside of our VPC. (Currently this EC2 instance does not seem to be created via Terraform.) This rule does a ping every few minutes, which keeps traffic running on the connection and also will bring it back up if it does go down.
Additionally, we have a CloudWatch alarm that fires if one or both of the VPN connections goes down. If one has gone down, traffic should still be flowing over the other, and usually it will come back up of its own accord. Contact NOC if there are issues.
In general, you should not need to SSH on to the cluster instances. Definitely not for routine maintenance (do that through an ECS task if you need that kind of thing). It may be necessary to troubleshoot and debug issues, however.
Instructions for how to SSH on to our bastion machine using an SSH key loaded into your IAM account, and from there how to SSH on to a cluster instance, are in the digital-terraform’s README.md file.
To access the AWS resources (e.g. EC2 instances) you first need to SSH into the AWS environment.
You can access the SSH bastion from the City Hall network (140.241.0.0/16) if you have an SSH key on your AWS account and are in the SshAccess IAM group.
Request an AWS Admin to add you to the SshAccess IAM group.
From the IAM console, upload a public key for your account
Edit your /etc/hosts to add the following line: 35.169.164.239 apps-bastion
Initialize your account on the bastion by SSHing without a public key: ssh -o PubkeyAuthentication=no <username>@apps-bastion
Note: your bastion username is the bit before @boston.gov in your account name.
Control-C out when it asks for a password.
SSH in with your public key: ssh -A <username>@apps-bastion (the -A forwards the SSH agent, which is important for SSH'ing on to the instances).
From the Bastion, you can get to the EC2 instances which host the ECS services.
Request that the AWS Admin share the ec2-user private keys and passwords with you via Dashlane. There are 2 keys, one for production and one for staging. Save whichever you need, or both, into your ~/.ssh folder.
Ensure the permissions on the private key file(s) are set to 600 (chmod 600 xxxx).
Note the Private IPv4 address of the EC2 instance from the EC2 instances page in the AWS console - this will be 10.40.15.x for staging and 10.40.115.x for production.
There are 2 production instances; you can use either.
These IP addresses change after each deployment, so check regularly.
Once you have successfully SSH'd onto the bastion (#6 in Step 1 above), you will be able to SSH onto the instance: ssh ec2-user@<ipaddress>
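If you’d rather look the addresses up from the CLI than the console, something along these lines works; the Name tag filter is an assumption about how the cluster instances are tagged, so adjust it to match reality.

```
# List the private IPs of running cluster instances (the tag value is a guess).
aws ec2 describe-instances \
    --filters "Name=instance-state-name,Values=running" \
              "Name=tag:Name,Values=AppsStaging*" \
    --query "Reservations[].Instances[].PrivateIpAddress" \
    --output text

# Then, from the bastion:
ssh ec2-user@10.40.15.x
```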
Once you’re on a container instance (#4 step 2 above), you can use docker commands to inspect containers. For example, some useful commands are:
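The specific commands weren’t captured in this note; a few standard ones that tend to be useful (container names are placeholders):

```
# List running containers and their names/IDs.
docker ps

# Tail the logs of a particular container.
docker logs --tail 100 -f <container-name>

# Open a shell inside a running container.
docker exec -it <container-name> /bin/sh

# Show live CPU/memory usage per container.
docker stats
```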
Outside of the containers, the ec2-user account can use sudo -s to open up a shell with root access.
The Digital team uses Terraform to manage the AWS configuration.
Terraform is a CLI utility that synchronizes AWS with scripts: in essence, it uses a series of scripts to detect and make changes to AWS. Terraform commands are run from a terminal session on a machine with Terraform installed. See the Terraform website and documentation.
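For orientation, the typical loop looks like this when run from the folder holding the configuration; the detailed procedures below spell out when to run each step.

```
# Download the providers and modules the configuration references.
terraform init

# Show what would change in AWS without touching anything.
terraform plan

# Make the changes.
terraform apply
```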
How to restart an ECS service when you change its configuration.
When you update a service’s configuration in S3 you’ll need to manually restart it to pick up the file changes. Because we do rolling ECS updates, you can do this without dropping traffic.
You will need to have an AWS Console account.
First, visit the ECS page on AWS and choose your cluster (AppsStaging or AppsProd).
Then, click the checkbox next to the service you want to restart and press the Update button.
Don’t touch any other settings, but make sure to click the Force new deployment checkbox. That will start up new containers, even though the code hasn’t changed from what’s currently running.
Click Next step through all of the screens, and then click Update service.
Navigate to the service’s “Events” tab and keep an eye on things. You should see it start new tasks and eventually deregister and stop the old tasks. Once it says “…has reached a steady state” again then you know things were successful.
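If you prefer the command line over the console, the same force deployment can be triggered with the AWS CLI; the service name here is a placeholder.

```
# Start a new rolling deployment without changing the task definition.
aws ecs update-service \
    --cluster AppsStaging \
    --service <service-name> \
    --force-new-deployment

# Check the rollout status of the service's deployments.
aws ecs describe-services \
    --cluster AppsStaging \
    --services <service-name> \
    --query "services[0].deployments"
```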
Guide to mounting an S3 bucket via SFTP as a drive on your computer
PREP
Check/Create SSH/RSA Keys
The RSA keys should not have passphrases; create new keys (without a passphrase) if the user’s current keys were set up with one.
Setup SFTP Account on AWS
If you’re not an Admin, ask one (David, Phill) to create your account
Add the user’s SSH/RSA key to their FTP account
Make sure the user you use in your computer is an admin on that computer
We’ll need to run commands under `sudo`
SETUP
Download FUSE & SSHFS from https://osxfuse.github.io/
Install FUSE
Install SSHFS
Restart the computer
Open the Terminal app, which can be found in the Applications folder under Utilities
Check sshfs is installed with this command: ```sshfs --help```
Create two directories, `~/mnt` and `~/mnt/patterns`:
mkdir ~/mnt
mkdir ~/mnt/patterns
Locate the SSH/RSA keys. These are probably at `~/.ssh/`.
Save/copy the user’s FTP account username (ask an AWS admin if you don’t have it)
Try connecting/mounting the drive with a command along the lines of the sketch below, replacing the RSA key and username with the values from the previous two steps.
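A sketch of what that command might look like, assuming the private key lives at $HOME/.ssh/id_rsa and using the same host as the sftp fallback below:

```
# Hypothetical mount command; adjust the key path and username.
sshfs -o IdentityFile=$HOME/.ssh/id_rsa,reconnect \
    username@assets_sftp.boston.gov: ~/mnt/patterns
```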
If this doesn’t work try ```sftp -o IdentityFile=RSAPublicKeyLocation username@assets_sftp.boston.gov```
This should work; if not, troubleshoot by looking at the logs from the previous command (#1)
Now that you are able to mount, it’s time to create a Bash script that will run when the user logs in.
Using a text or code editor, create a bash file at ```/Library/Startup.sh```
Copy the code below into the file
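The original script isn’t reproduced here; the following is a minimal sketch that reuses the mount command from the test above, with a placeholder key path and username.

```
#!/bin/bash
# Hypothetical login-time mount script. Adjust the key path, username, and
# mount point to match your setup.
MOUNT_POINT="$HOME/mnt/patterns"

# Make sure the mount point exists, then mount the SFTP drive over it.
mkdir -p "$MOUNT_POINT"
sshfs -o IdentityFile=$HOME/.ssh/id_rsa,reconnect \
    username@assets_sftp.boston.gov: "$MOUNT_POINT"
```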
From the Terminal app, make the file executable: ```chmod +x /Library/Startup.sh```
Get to this file using the `Finder`, then right-click on the file and select the 'Get Info' option. Use 'Open with:' to set it to open with 'Terminal' (found under 'Applications > Utilities'; set the 'Enable' drop-down to 'All Applications' to see it).
Open up “System Preferences” and go to “Users & Groups”
Switch to the “Login Items” tab, unlock the ability to edit these settings by clicking the Padlock in the bottom left.
Use the “+” button to add a new action in “Login Items”, this will open up a file browser window.
Use the File Browser to locate the “Startup.sh” file we created in the “Library” and select it.
Use the Apple icon on the top left of the screen to “Log Out”
When you sign in again, open up a “Finder” window and check whether the drive mounted at ~/mnt/patterns
Debug Tips
How to update the AMI on our ECS cluster instances
The Digital webapps cluster uses the Elastic Container Service on AWS. We have a handful of EC2 instances that actually host the containers.
These instances use a stock Amazon Machine Image (AMI) from Amazon designed for Docker that comes with the ECS agent pre-installed. From time to time, Amazon releases a new version of this “ECS-optimized” image, either to upgrade the ECS agent or the underlying OS.
Thanks to our instance-drain Lambda function, updating the cluster EC2 images is a zero-downtime process. Nevertheless, it’s best to run this during the weekly digital maintenance window, and make sure that staging looks good before doing it on production.
This process is sometimes referred to as “rolling” the cluster, though it’s more accurate to say that we set up a second cluster of machines and migrate to it.
Find the latest ID for the ECS-optimized AMI. You can do this on the Amazon ECS-optimized AMIs page.
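Alternatively, the latest recommended image ID is published as a public SSM parameter; a command along these lines should return it (the path below assumes the Amazon Linux 2 variant of the ECS-optimized AMI).

```
# Fetch the AMI ID of the latest recommended ECS-optimized Amazon Linux 2 image.
aws ssm get-parameters \
    --names /aws/service/ecs/optimized-ami/amazon-linux-2/recommended/image_id \
    --query "Parameters[0].Value" \
    --output text
```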
In a browser, navigate to the CityOfBoston/digital-terraform repository and edit the apps/clusters.tf file.
Update the instance_image_id value for the staging_cluster module to the new AMI ID from step 1 above. Save/commit the file as a new branch, not directly to the production branch.
Make a PR which merges the new branch into the production branch, and assign a person to review the changes.
When you make the PR, GitHub will automatically execute an atlantis plan process (see what Atlantis is).
When the plan is done, inspect the output and expect to see changes to:
- resource "aws_autoscaling_group" "instances"
- resource "aws_cloudwatch_metric_alarm" "low_instance_alarm"
- resource "aws_launch_configuration" "instances" (This last one will have the new AMI guid)
Any other changes the plan identifies should be carefully investigated.
Terraform may be proposing to make changes to the AWS environment you don't want, or at least are not expecting.
After viewing the plan, if you need to update the terraform scripts, be sure to save the changes to the new branch.
If committing your changes does not trigger the atlantis plan automatically, you can run it manually by creating a new comment with atlantis plan.
Once the atlantis plan is finished and the PR has been approved, create a new comment: atlantis apply.
This will cause Atlantis to apply changes to AWS. (Atlantis runs a terraform apply command in a background process.) See what happens.
Keep an eye on the “ECS Instances” tab in the cluster’s UI. You should see the “Running tasks” on the draining instance(s) go down, and go up on the new instances.
Once all the tasks have moved, the old instance(s) will terminate and Terraform will complete. Check a few URLs on staging to make sure that everything’s up-and-running.
Now that Atlantis’s apply has finished, you can merge the staging PR and repeat the process (steps 2-6) for the production cluster.
If you have terraform installed on your local computer, you can do the update directly from your computer.
Find the latest ID for the ECS-optimized AMI. You can do this on the Amazon ECS-optimized AMIs page.
Ensure your cloned copy of the digital-terraform repository is on the production branch, and that the branch is up to date with the origin on GitHub.
Create a new branch from the production branch.
In your preferred IDE, open the /apps/clusters.tf file and update the instance_image_id value for the staging_cluster module to the new AMI ID from step 1 above. Save/commit the file to the new branch (not directly to the production branch).
In a terminal/shell, from the apps/ folder of the repo, run the command:
terraform plan
When the plan is done, inspect the output and expect to see changes to:
- resource "aws_autoscaling_group" "instances"
- resource "aws_cloudwatch_metric_alarm" "low_instance_alarm"
- resource "aws_launch_configuration" "instances" (This last one will have the new AMI guid)
Any other changes the plan identifies should be carefully investigated. Terraform may be proposing to make changes to the AWS environment you don't want, or at least are not expecting.
Once you are happy with the changes that terraform will apply to the AWS environment, you can run the command:
terraform apply
See what terraform apply does.
Keep an eye on the “ECS Instances” tab in the cluster’s UI. You should see the “Running tasks” on the draining instance(s) go down, and go up on the new instances.
Once all the tasks have moved, the old instance(s) will terminate and Terraform will complete. Check a few URLs on staging to make sure that everything’s up-and-running.
Now that terraform's apply is finished, you can repeat the process (steps 2-9) for the production cluster.
Finally, you should merge the changes in your new (local) branch into the local production branch, and then push your local production branch to the origin on GitHub.
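That last step is the usual git sequence; the branch name here is a placeholder.

```
# Merge the update branch into production and push it up.
git checkout production
git merge my-ami-update-branch
git push origin production
```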
After the production instances are fully up, check that they have roughly equal “Running tasks” numbers. ECS should schedule duplicate tasks on separate machines so that they are split across AZs. If you see a service has both of its tasks on the same instance you can run a force deployment to restart it. (See Restarting an ECS service)