Import files from Google Drive to S3
Preparing to import the file from Google Drive
- Open the file you would like to import
- Ensure that all columns have headers. Columns without headers will be lost
- Click 'Share' in the top right corner of the sheet
- If the document is unnamed, name it
- Paste the service account email address you have been provided into the email box
- Ensure the suggested email matches the service account email and select it
- In the new window, open the dropdown on the right-hand side and select 'Viewer'
- Uncheck the 'Notify people' checkbox
- Click 'Share'
- You will be asked to confirm sharing outside the organisation. Click 'Share anyway'
- Your file is now available for import
Getting file detail
You will need to obtain the document ID from the URL.
The document ID is the portion of the URL between https://docs.google.com/file/d/ and /edit#gid=0. See the example below.
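If you would rather not pick the ID out of the URL by hand, the rule above can be sketched in a few lines of Python. This is only an illustration: the function name extract_document_id and the example URL are made up for this sketch, and it assumes the ID is the path segment immediately after /d/, made up of letters, digits, hyphens and underscores.

```python
import re

# Assumption: the document ID is the path segment directly after "/d/"
# and contains only letters, digits, hyphens and underscores.
_DOC_ID_PATTERN = re.compile(r"/d/([A-Za-z0-9_-]+)")


def extract_document_id(url: str) -> str:
    """Return the document ID found in a Google document URL."""
    match = _DOC_ID_PATTERN.search(url)
    if match is None:
        raise ValueError(f"No document ID found in URL: {url}")
    return match.group(1)


# Hypothetical URL, for illustration only.
example_url = "https://docs.google.com/file/d/EXAMPLE_document-id_123/edit#gid=0"
print(extract_document_id(example_url))  # EXAMPLE_document-id_123
```

The value printed is what you would paste into the file_id field later on.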
Setting up the copier lambda
Before setting up an AWS Glue job, ensure that the relevant department configuration for that account is set up in AWS
- see the 'Adding a department' section in managing-departments.md
- Open the Data Platform Project. You'll need to have a GitHub account (which you can create yourself using your Hackney email) and have been added to the 'LBHackney-IT' team to view this project (you'll need to request this from Rashmi Shetty). If you don't have the correct permissions, you'll get a '404' error.
- Navigate to the main terraform directory (data-platform/terraform)
- Open the 65-g-drive-to-s3 terraform file
- Switch to 'edit mode' (using the edit button on the top right)
- Copy one of the modules above, paste it at the bottom of the file and update the following fields:
  - module = "your-unique-module-name" (it is helpful to keep the same naming convention as your dataset/folder)
  - lambda_name = "Your lambda name" (this is what you'll see in the Glue console, can be the same as your module name)
  - file_id = "Your document id - see the Getting file detail section above"
  - file_name = "The name of the file you are importing including the file extension and using underscores instead of spaces"
  - service_area = "The name of the service area folder you would like to store in e.g. housing, social-care" (if this folder doesn't already exist in S3 you can name it here and this script will create it)
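Before pasting your values into the terraform file, it can help to check them. The Python sketch below is not part of the Data Platform code. The helpers to_file_name and missing_fields and all the example values are hypothetical. It only illustrates the file_name rule (underscores instead of spaces, extension included) and checks that none of the five fields listed above has been left blank.

```python
def to_file_name(title: str, extension: str) -> str:
    """Join the words of a title with underscores and append the extension."""
    stem = "_".join(title.split())
    return f"{stem}.{extension.lstrip('.')}"


def missing_fields(fields: dict) -> list:
    """Return the names of the five module fields that are absent or blank."""
    required = ["module", "lambda_name", "file_id", "file_name", "service_area"]
    return [name for name in required if not str(fields.get(name, "")).strip()]


# Hypothetical values, for illustration only.
fields = {
    "module": "housing_repairs_example",
    "lambda_name": "housing_repairs_example",
    "file_id": "EXAMPLE_document-id_123",
    "file_name": to_file_name("Housing repairs example", "xlsx"),
    "service_area": "housing",
}

print(fields["file_name"])     # Housing_repairs_example.xlsx
print(missing_fields(fields))  # []
```

An empty list from missing_fields means every field has a value. It does not confirm that the values themselves are correct.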
Committing your changes: The Data Platform team needs to approve any changes to the code, so your change won't happen automatically. To submit your change:
- Provide a description to explain what you've changed
- Select the option to create a new branch for this commit (i.e. the code you've changed). You can just use the suggested name for your branch.
- Once you click 'Propose changes' you'll have the opportunity to add even more detail if needed before your change is submitted for review.
- You'll receive an email to confirm that your changes have been approved.