Incident Report 02-03-2022
# Summary
At 14:20 GMT a deployment to production was completed and users began reporting error messages. The root cause was that a new banner added to the page did not consistently contain a URL that it required in order to work. As a result, some users were unable to access residents' pages. The issue was fully resolved at 15:40 GMT and the application was fully functional again.
# Timeline
14:20 GMT - A deployment to production was completed
14:21 GMT - First report of the incident, reports of `{"error":"Error logging in"}`
14:26 GMT - Reports of a different error message: "Cannot read properties of undefined (reading 'auth')"
14:30 GMT - Developers start mobbing to identify and resolve the issue
15:18 GMT - Communication was sent to the Children and Family Service (CFS) to explain an issue has occurred and a fix is being worked on
15:18 GMT - Communication was sent to the Adults Social Care (ASC) service to explain an issue has occurred and a fix is being worked on. (Alex noted that the initial comms bounced back so he had to contact them by an alternative method)
15:40 GMT - A fix was released to production that resolves the issue
15:44 GMT - Communication was sent to CFS to confirm the issue has been resolved
15:44 GMT - Communication was sent to ASC to confirm the issue has been resolved
# Root Cause
During a planned deployment to production, a preview banner was added to the resident view, offering the option to use a new resident view and provide feedback. The link to provide feedback was not consistently populated. Because of this, the resident view did not render consistently and some users saw the error "Cannot read properties of undefined (reading 'auth')".
# Technical Detail
The preview banner that was added used a `Link` component along with an `href` attribute. `Link` components attempt to prefetch the `href` that has been passed, and use a `formatUrl` function to format the URL. The first step of the `formatUrl` function is to destructure the passed-in URL. If no URL is passed to `formatUrl`, the function cannot destructure it and an error is thrown. This was the error seen by users during this incident.
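The failure mode can be reproduced in isolation. The following is a minimal sketch, not the Next.js source: `formatUrlSketch`, `UrlObject`, `tryFormat` and the example URL are hypothetical stand-ins, and only the first step (reading fields off the URL object) is modelled.

```typescript
// Hypothetical, simplified stand-in for the URL formatter used by <Link>.
// The real function handles many more fields; only the failure mode is shown.
interface UrlObject {
  auth?: string;
  hostname?: string;
  pathname?: string;
}

function formatUrlSketch(urlObj: UrlObject): string {
  // If urlObj is undefined at runtime, reading `.auth` throws a TypeError here.
  const auth = urlObj.auth ? `${urlObj.auth}@` : "";
  const hostname = urlObj.hostname ?? "";
  const pathname = urlObj.pathname ?? "";
  return hostname ? `https://${auth}${hostname}${pathname}` : pathname;
}

// Simulates an href that was never populated (assumption: the value is
// undefined at runtime even though the type says otherwise).
const missingUrl = undefined as unknown as UrlObject;

function tryFormat(urlObj: UrlObject): string {
  try {
    return formatUrlSketch(urlObj);
  } catch (err) {
    return err instanceof TypeError ? "TypeError" : "other error";
  }
}

const okResult = tryFormat({ hostname: "example.org", pathname: "/feedback" });
const failedResult = tryFormat(missingUrl);
```

With a populated URL object the function returns a string; with an undefined one it throws before any formatting happens, which matches the behaviour users saw.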
# Resolution and Recovery
- As an immediate resolution, the new preview banner was removed from the resident view. This resolved the issue for users.
- The issue with the preview banner meant that certain pages were not loading. However, no data was lost and no recovery was required.
# Corrective and preventive measures
- The preview banner was reviewed further following the resolution of this incident.
- The code was changed to display the link to the feedback form with a plain `<a>` tag rather than a `Link` component, which is a more robust way of showing an external link.
- To prevent a scenario where the feedback form URL is unavailable, the URL was hardcoded into the new element.

This was tested and deployed to production on 03/03/2022.

---------------------------------------------------------------------------
# Incident Commentary
Setting: remote group mobbing
Screen presenter: Rachel Hutchinson - Made Tech
Scribe: Tony Griffin - Made Tech

---------------------------------------------------------------------------
2:30 pm onwards
Dev team made aware of incident
Mobbed from this time onwards
CloudWatch production account was checked for error warnings
Tuomo confirmed there was a normal number of errors appearing in CloudWatch
Lawrence asks if we've tested the URL that we were sent, i.e. "/people/[id]"
Error in console: "Cannot read properties of undefined (reading 'auth') at Object.t.formatURL ( )"
-- Question: What's changed recently?
- Identified newly deployed code has rolled to prod and coincides with the start of the incident (roughly 7 mins before devs were notified of the incident)
-- Question: Is it tested?
- Lawrence asks if it was tested/has tests written?
Sentry:
Check the Main Application
- The difficulty here is to spot errors without a good source map being available in Sentry
AWS:
Errors in lambda logs
Tuomo says he's seeing 403 errors in the front end Lambda logs that he hasn't seen before: "Lambda runtime failed to post handler success response 403"
-- Question: Has the API key been changed?
- Lawrence asks if the API key has changed for any reason? - Apparently not
-- Question: Quickest fix a redeployment?
Rachel asked if the quickest fix is to redeploy the last working deployment? - Possibly yes
Lawrence confirms the error is definitely in the FE as there is no accompanying HTTP error.
The error appears to be caused by some kind of object destructuring, as per the console error.
Because the "System Error" banner appears on pages, the error never hits Sentry properly, so we don't get a stack trace of the error.
Jack notes that if you change "/people/[id]" to the newer URL of "/residents/[id]" then that works correctly.
Jaye says people and residents use the same API endpoint, so have the same data source.
He lets us know that 404 errors can be thrown by the app trying to extract workers' names from their email (so they are expected errors in some cases).
Remove Banner component code
(Rachel removed the Banner code changes from the repo at this stage)
This is the code that we believe is generating the error:
https://github.com/LBHackney-IT/lbh-social-care-frontend/pull/870
3:08 pm onwards
Same error reported previously
- Lawrence notes that the same error has been seen before. The error is definitely due to the `<Link>` tag and how it does destructuring on the URL object.
(Possibly happening in the pre-build phase, in the static content build, when deploying to Vercel's setup?)
https://github.com/vercel/next.js/issues/30028
-- Question: Need for FEEDBACK_URL?
Rachel asks if we need the `NEXT_PUBLIC_FEEDBACK_URL`? (To be decided)
At this point we know it's definitely one of the links on the pages.
-- Question: Why does it work on Staging but not Production?
Jaye notes that this works on staging but not on prod.
Rachel notes that we don't see this preview banner for an Adult on staging.
This leads to investigating the feature flag setup later.
Error thrown from inside node modules
- Lawrence identifies the line throwing an error: node_modules/next/dist/shared/lib/router/utils/format-url.js ( line:32 ), where "urlObj" is undefined
- Next.js's router uses this in its `Link` component on its "href" attribute
PreviewBanner component
PhaseBanner component uses a straight `<a>` tag.
PreviewBanner component uses a `<Link>` tag and it doesn't have a "href" set to true, so it tries to do a pre-fetch on the URL, and if the URL is undefined it will fail and throw an error.
`<Link>` tags are meant to be used to link to resources internal to the application only, not external links.
This component was removed from the application as we believed it was the source of the issue.
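One way to make a banner link tolerate a missing URL is to resolve the target before rendering and fall back to plain text. The sketch below is illustrative only: `renderFeedbackLink` is a hypothetical helper that returns an HTML string, not the project's PreviewBanner component, and it assumes the URL comes from trusted configuration.

```typescript
// Hypothetical helper, not the project's PreviewBanner component.
// It emits a plain <a> tag and skips the link when no URL is available.
function renderFeedbackLink(feedbackUrl: string | undefined): string {
  if (!feedbackUrl) {
    // No URL: show text without a link instead of throwing at render time.
    return "<span>Feedback form unavailable</span>";
  }
  // Assumption: feedbackUrl is trusted configuration, so it is not escaped here.
  return `<a href="${feedbackUrl}" target="_blank" rel="noreferrer">Give feedback</a>`;
}

const withUrl = renderFeedbackLink("https://example.org/feedback");
const withoutUrl = renderFeedbackLink(undefined);
```

A plain anchor does no prefetching, so an external feedback form never passes through the router's URL formatting.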
Fix: Deployment to Staging
Rachel lined up some people on Staging to see if the deployment had fixed the problem
The following is an adult's record. Rachel checked:
"/residents/7", "/residents/7/details", "/residents/7/workflows"
- Must wait until the deployment changes hit production to confirm the fix
Fix: Deployment to Production
3:35 pm
Person we checked on prod: "/residents/14"
Banner component was totally removed
Contacted Alex at this time to let him know.
Notes on types of people records checked
Adult: "/people/34" Foo Bar
Child: "/people/19" Cristobal Cawdell (believed to be a Child record, redirects to "/residents/19"; not confirmed)
Location of error in the PreviewBanner component
/components/NewPersonView/PreviewBanner.tsx (Line: 32)
Feature flag setup
We noted that we don't see this preview banner for an Adult on staging.
This picks up on Rachel's earlier comment and led to investigating the feature flag. We found that the Banner component is not visible on Staging because the feature flag was set up to be active only on production.
Issues Identified after the Deployment
- PreviewBanner component needs to be rewritten so it uses a straight `<a>` tag and removes the `<Link>` tag.
- Work out why the env var `NEXT_PUBLIC_FEEDBACK_URL` isn't set when the app is built. Lawrence suggests making it a non-`NEXT_PUBLIC` env var in CircleCI.
- Sort error boundaries within the Next app, to allow Sentry to retrieve errors.
- Tuomo suggests hardcoding the `FEEDBACK_URL` as the form is restricted to Hackney staff. We confirmed that this was the case by having Jack try to access the `FEEDBACK_URL` form using his Made Tech account; he was unable to.
- We hardcoded the `FEEDBACK_URL` in the `PhaseBanner` and `PreviewBanner` components and removed the reference to it in the serverless.yml file in the application.
- Deleted `NEXT_PUBLIC_FEEDBACK_URL` from CircleCI and the AWS Staging and Prod environments.
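The team resolved the missing variable by hardcoding the URL. Where a build-time variable is kept instead, an alternative is to fail the build when it is absent. The sketch below assumes a hypothetical `findMissingEnvVars` helper and takes the environment as a parameter, so it is not tied to any particular CI setup.

```typescript
// Hypothetical build-time check; this is an alternative to hardcoding,
// not what was done in this incident.
type Env = Record<string, string | undefined>;

function findMissingEnvVars(env: Env, requiredVars: string[]): string[] {
  // A variable counts as missing when it is unset or only whitespace.
  return requiredVars.filter((varName) => {
    const value = env[varName];
    return value === undefined || value.trim() === "";
  });
}

const requiredVars = ["NEXT_PUBLIC_FEEDBACK_URL"];

// Simulated environments: one with the variable unset, one with it set.
const missingAtBuild = findMissingEnvVars({}, requiredVars);
const missingWhenSet = findMissingEnvVars(
  { NEXT_PUBLIC_FEEDBACK_URL: "https://example.org/feedback" },
  requiredVars
);
```

A build script could call the helper with the real environment and exit with a non-zero status when the returned list is not empty, so that an unset variable is caught before deployment rather than by users.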
Follow up Commits
- Re-added: Preview Banner - to "pages/people/[id]" & rewrote component to use an `<a>` tag.
https://github.com/LBHackney-IT/lbh-social-care-frontend/pull/871/files
- Update feature flag - to show everywhere except on the production environment
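The effect of the feature flag change can be illustrated with a small sketch. The environment names and predicate functions below are hypothetical; the project's real feature-flag module may look different.

```typescript
// Hypothetical flag evaluation, for illustration only.
type Environment = "development" | "staging" | "production";

// Setup at the time of the incident: banner active only on production,
// so it could not be exercised on staging before release.
const previewBannerOriginal = (env: Environment): boolean => env === "production";

// Updated setup: banner active everywhere except production.
const previewBannerUpdated = (env: Environment): boolean => env !== "production";

const environments: Environment[] = ["development", "staging", "production"];
const originalVisibility = environments.map(previewBannerOriginal);
const updatedVisibility = environments.map(previewBannerUpdated);
```

Under the original predicate the banner's first real exposure was in production, which is why the fault did not appear on staging.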
# Action points
- Ensure all code is supported by having tests written for it.
- Breaking changes should show up when a piece of code has tests written around it.
- Create a pull request and have another engineer look over the proposed changes
- Before deploying - Always send a message in the relevant channels that you would like to do a deployment and check how this may affect the work of others.
- The current deployment process will push all code that is ready to be deployed, not just yours.
- Clear channels for communication with both the CFS and ASC services should be set up and verified, so that we can reach the services quickly should the need arise in the future.