Ensure a performant connection between the data platform and Hackney Business Insights tools. This spike will consider the following:
- Apache Drill
- Amazon Redshift Spectrum
- Amazon Athena
- Qlik Custom Connectors
While investigating these options the following factors should be taken into account:
- Time to implement the solution
- Cost of the solution
- Performance of reading large quantities of data
Apache Drill is a Schema-free SQL Query Engine for Hadoop, NoSQL and Cloud Storage providers such as Amazon S3. The software is provided for free, however there would be associated costs with deploying the product with the Data Platform.
Apache Drill is not currently provided as a managed service by AWS, which means the product would need to be installed and maintained by Hackney. We are unaware of anyone within Hackney with knowledge of Apache Drill knowledge and this could prove to be an issue in the longer term.
We could install and configure the service in conjunction with Hackney staff to ensure knowledge share through this would not resolve longer term support issues.
Amazon Redshift is a Data Warehousing that provides an SQL style interface for querying data. In addition to being able to warehouse data Amazon Redshift Spectrum is able to query data directly from external storage such as Amazon S3 without the additional cost of storing the data within the warehouse itself. Essentially separating the compute-layer from the storage-layer. Amazon Redshift is provided as an AWS managed and supported service, with the ability to upscale and downscale as Hackney requirements change in addition to being paused and unpaused on a scheduled basis.
The biggest limitation with Amazon Redshift is that it can get expensive to run large clusters. Running a single node continuously is $120 per month and with our lack of visibility of the quantities of data that is being queried.
We can reduce the overall cost of running the infrastructure by scheduling period where the nodes are paused such as evening and weekend.
Qlik provides the option of building a custom connector which can allow the system to interact with a dataset. This in theory could be used to allow Qlik to interact with AWS S3 directly.
Building a Qlik Custom Connector would require unspecified amount of development time and would likely require further investigation into the specifics of how this would be implemented.