In Hackney, we use AWS CloudWatch to implement monitoring and alerting.
Any logs created in our APIs are recorded and accessible in AWS CloudWatch. Creation of log groups is automated via the current serverless setup.
Metric filters are a useful feature that allows you to find patterns and terms in your logs. Following the logging standards identified earlier in this document, metric filters can be created to easily identify logs containing a certain phrase or term.
Using the filters, developers can easily narrow down the logs they see to only the ones related to any error that has occurred, hiding all other logs such as ones for successful requests.
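As a rough illustration, a metric filter that counts logs containing a given term could be defined with boto3 along these lines. This is a sketch only: the log group name, filter name, metric namespace and the `ERROR` term are hypothetical placeholders, not values taken from our logging standards.

```python
def build_error_metric_filter(log_group_name, term="ERROR", namespace="ExampleApi/Logs"):
    """Build keyword arguments for the CloudWatch Logs put_metric_filter call.

    The default term and namespace are hypothetical placeholders.
    """
    return {
        "logGroupName": log_group_name,
        "filterName": f"{term.lower()}-count",
        # A quoted term matches log events that contain that exact text.
        "filterPattern": f'"{term}"',
        "metricTransformations": [
            {
                "metricName": f"{term.title()}Count",
                "metricNamespace": namespace,
                "metricValue": "1",   # each matching log event adds 1
                "defaultValue": 0.0,  # report 0 when nothing matches
            }
        ],
    }


def create_metric_filter(params):
    """Send the filter to CloudWatch Logs (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("logs").put_metric_filter(**params)
```

Keeping the parameter builder separate from the AWS call makes it easier to review which filters would be created for each API before anything is applied.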
CloudWatch also provides a way to create alarms based on metric filters, so we can be notified when a log matching a certain pattern or term occurs.
Need to define the filters we want commonly available for each API
Need to decide which logs should have an alarm associated with them
We use AWS CloudWatch Canaries to monitor the availability of our APIs and front-end applications.
Set to run every 5 mins
A canary invokes an API endpoint to check its availability
Needs to be set up per API endpoint to ensure all endpoints provided by an API are functioning as expected
The current creation process for a canary is manual
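The kind of check an API canary performs can be sketched in plain Python as below. This is not the canary script itself and does not use the CloudWatch Synthetics runtime; it only illustrates the idea of one availability check per endpoint. The function names and the injectable `opener` parameter are assumptions made for this example.

```python
import urllib.error
import urllib.request


def check_endpoint(url, timeout=10, opener=urllib.request.urlopen):
    """Return True if the endpoint answers with a 2xx status, else False.

    `opener` is injectable so the check can be exercised without network access.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return 200 <= response.status < 300
    except (urllib.error.URLError, OSError):
        # Covers HTTP error statuses, DNS failures, refused connections, timeouts.
        return False


def check_api(endpoints, **kwargs):
    """Check each endpoint of an API and return a {url: is_available} mapping."""
    return {url: check_endpoint(url, **kwargs) for url in endpoints}
```

For example, `check_api(["https://example.invalid/health"])` would return a mapping with one entry per endpoint, which mirrors the need to cover every endpoint an API provides.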
Can monitor the availability of a web page
Alarms can be set to alert if the availability of a given web page drops
Logs recorded can be used to identify performance issues associated with loading a specific item
Can check for broken links
A maximum number of links to follow is configured
The canary crawls through the links and returns the first broken link identified
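The broken link behaviour described above (follow links up to a maximum, stop at the first broken one) can be sketched as a small breadth-first crawl. This is an illustration of the idea, not how CloudWatch implements it; `get_links` and `is_ok` are hypothetical helpers supplied by the caller, and counting the start page towards the limit is an assumption of this sketch.

```python
from collections import deque


def first_broken_link(start_url, get_links, is_ok, max_links=10):
    """Breadth-first crawl that returns the first broken link, or None.

    get_links(url) -> iterable of URLs found on that page (hypothetical helper)
    is_ok(url)     -> True if the URL loads successfully (hypothetical helper)
    max_links      -> upper bound on pages visited, including the start page
    """
    queue = deque([start_url])
    seen = {start_url}
    followed = 0
    while queue and followed < max_links:
        url = queue.popleft()
        followed += 1
        if not is_ok(url):
            return url  # stop at the first broken link
        for link in get_links(url):
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return None
```

Passing the page fetching in as functions keeps the crawl logic easy to test against a small in-memory site map.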
We also use CloudWatch alarms to monitor for specific events in the log streams.
Specific metrics can be established as triggers on application logs, which can fire off alerts in the form of emails or other messaging channels. We can create up to 5000 alarms per region per account, which should give us sufficient capacity.
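A minimal sketch of such an alarm, assuming a log-derived metric already exists and that notifications go to an SNS topic (for example one with email subscribers). The alarm name, metric name, namespace, topic ARN and the threshold of one occurrence per five minutes are all hypothetical values chosen for illustration.

```python
def build_log_alarm(alarm_name, metric_name, namespace, sns_topic_arn):
    """Build keyword arguments for the CloudWatch put_metric_alarm call."""
    return {
        "AlarmName": alarm_name,
        "MetricName": metric_name,
        "Namespace": namespace,
        "Statistic": "Sum",
        "Period": 300,  # evaluate in 5-minute windows (value is in seconds)
        "EvaluationPeriods": 1,
        "Threshold": 1,
        "ComparisonOperator": "GreaterThanOrEqualToThreshold",
        # Treat periods with no data points as healthy rather than alarming.
        "TreatMissingData": "notBreaching",
        # Actions to run when the alarm fires, here a single SNS topic.
        "AlarmActions": [sns_topic_arn],
    }


def create_alarm(params):
    """Create or update the alarm (needs boto3 and AWS credentials)."""
    import boto3  # imported lazily so the builder above works without boto3

    boto3.client("cloudwatch").put_metric_alarm(**params)
```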
It may also be possible to consolidate these alarms if we have a standard format for logs. This may also be achievable by creating composite alarms, but these use up available alarms as well.
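If we did explore composite alarms, the rule combining several existing alarms could be built roughly as follows. This is a sketch under assumptions: the alarm names and topic ARN are hypothetical, and the rule fires when any child alarm is in the ALARM state, which may not be the combination we end up wanting.

```python
def build_composite_alarm(name, child_alarm_names, sns_topic_arn):
    """Build keyword arguments for the CloudWatch put_composite_alarm call.

    The composite alarm goes into ALARM when any child alarm is in ALARM.
    """
    if not child_alarm_names:
        raise ValueError("at least one child alarm is required")
    # Join one ALARM("...") term per child alarm with OR.
    rule = " OR ".join(f'ALARM("{child}")' for child in child_alarm_names)
    return {
        "AlarmName": name,
        "AlarmRule": rule,
        "AlarmActions": [sns_topic_arn],
    }
```

The resulting dictionary could be passed to `boto3.client("cloudwatch").put_composite_alarm(**params)`, so a single notification covers a group of related alarms.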