The Cloud Engineering team provides a number of shared services for teams to integrate into their accounts. Most are provided as Terraform modules in the Infrastructure repo, and in most cases they are enabled through tagging. Defaults can be varied to suit a team's needs, subject to any conditions noted for the individual service.
The Backup module uses AWS Backup. Enable backups by using the BackupPolicy tags in the Tagging module. This stores a backup of the following resources in the same account:
- EC2 instances
- EBS volumes
- RDS databases
- DynamoDB tables
The default backup policies are:
- Recovery point objective (RPO): 24 hours
- Recovery time objective (RTO): 4 hours
- 30 days immutable retention
These defaults must only be varied with the agreement of the Information Asset Owner.
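As a rough illustration of what the default RPO and retention figures mean in practice, the sketch below checks whether the most recent backup falls within the 24-hour RPO and works out when a backup leaves its 30-day retention period. The function and constant names are hypothetical and are not part of the Backup module, which applies the actual policy through AWS Backup.

```python
from datetime import datetime, timedelta

# Default values taken from the policy list above.
RPO = timedelta(hours=24)
RETENTION = timedelta(days=30)


def meets_rpo(last_backup: datetime, now: datetime) -> bool:
    """Return True if the newest backup is no older than the RPO."""
    return now - last_backup <= RPO


def retention_ends(created: datetime) -> datetime:
    """Return the moment a backup's 30-day retention period ends."""
    return created + RETENTION
```

For example, a backup taken at midnight on 1 January still meets the RPO at 23:00 that day, and its retention period ends on 31 January.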
The Patching module automatically scans all Windows and Ubuntu EC2 instances to make sure they have the most up-to-date patches. It covers operating system patches only, not application patches. Enable it by using the TOSCAN and TOPATCH tag groups (note: these are case-sensitive).
- Scans EC2s for Critical or Important patches
- Alerts the Cloud Engineering team if patches are required
- Triggers a cron job to patch the instance out of hours
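Because the patching tags are case-sensitive, a tag with the wrong casing would silently fail to opt an instance in. The sketch below shows one way to catch that before applying. It assumes TOSCAN and TOPATCH appear as tag keys, which the text above does not confirm, and the function name is hypothetical.

```python
# Assumption: TOSCAN and TOPATCH are used as tag keys.
EXPECTED_KEYS = ("TOSCAN", "TOPATCH")


def miscased_patching_tags(tags):
    """Return tag keys that match a patching tag except for letter case."""
    return sorted(
        key
        for key in tags
        if key.upper() in EXPECTED_KEYS and key not in EXPECTED_KEYS
    )
```

Calling `miscased_patching_tags({"ToScan": "true", "TOPATCH": "true"})` flags `ToScan` and leaves the correctly cased `TOPATCH` alone.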
The Cloud Engineering team has built a tagging module in Terraform to make it easier to apply tags when building infrastructure as code. Tags are used for many purposes in AWS; we principally use them to:
- Manage costs;
- Identify which resources to back up;
- Identify which resources to power down during "out of office" hours;
- Identify resources by team, application, environment and project;
- Identify which EC2 resources to scan and/or patch.
Our certificates are managed in AWS Certificate Manager. It is integrated into the Hub and automatically renews any certificates that it generates.
Route53 manages our DNS. DNS entries can be added, amended, or removed through Terraform in the platform/public-dns project in the Infrastructure repo. All changes must be made in a branch, and the PR must be approved by the Cloud Engineering team. See this How to HackIT for more detail.
The Overnight Shutdown module automatically shuts down EC2 instances overnight. By default, in the development and staging environments, instances are shut down from 19:00 to 07:00 on Monday to Thursday nights, and from 19:00 on Friday to 07:00 on Monday.
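The default schedule amounts to "running only between 07:00 and 19:00 on weekdays". The sketch below expresses that rule as a small check. It is illustrative only and not the module's implementation: the function name and the environment strings are assumptions, times are taken to be local, and an instance is treated as running from 07:00 exactly and shut down from 19:00 exactly.

```python
from datetime import datetime, time

WORKDAY_START = time(7, 0)
WORKDAY_END = time(19, 0)
# Assumption: environments are identified by these strings.
SHUTDOWN_ENVIRONMENTS = {"development", "staging"}


def is_shut_down(now: datetime, environment: str) -> bool:
    """Return True if the default schedule has the instance shut down."""
    if environment not in SHUTDOWN_ENVIRONMENTS:
        return False
    if now.weekday() >= 5:  # Saturday or Sunday: off all day
        return True
    # Weekdays: off outside 07:00 to 19:00
    return not (WORKDAY_START <= now.time() < WORKDAY_END)
```

Under these assumptions, a staging instance is shut down at 19:00 on a Friday and stays down until 07:00 the following Monday.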
The Monitoring module provides monitoring on EC2 (Windows only) and RDS instances via CloudWatch. This is part of the EC2/RDS Terraform and the coverage includes:
- EC2 – status, disk space, disk I/O, and CPU & memory usage
- RDS – status, CPU, disk space, memory, login failures, custom metrics
Some manual configuration is required for this service. Alerts are currently via SNS.