We often need to query data stored in Amazon S3 buckets directly, in formats such as CSV, JSON, Avro, ORC, and Parquet, either for ad-hoc analysis or as part of a larger data solution.
Below are two AWS services that let you query your S3 data in place.
1. Amazon Athena
Suitable for ad-hoc data discovery and SQL querying. You are charged based on the amount of data each query scans.
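A minimal Python sketch of running an Athena query and estimating what it costs, assuming boto3 is installed and AWS credentials are configured. It submits SQL with `start_query_execution`, polls `get_query_execution`, and converts `DataScannedInBytes` into a rough cost. The $5 per TB price, the 10 MB per-query minimum, and 1 TB = 1024^4 bytes are assumptions based on the published us-east-1 pricing model; check the Athena pricing page for your Region.

```python
import math
import time

# Assumption: us-east-1 list price at the time of writing; verify for your Region.
ASSUMED_PRICE_PER_TB_USD = 5.0
MB = 1024 ** 2
TB = 1024 ** 4
MIN_BILLED_BYTES = 10 * MB  # assumed 10 MB minimum billed per query


def estimate_athena_cost(bytes_scanned: int, price_per_tb: float = ASSUMED_PRICE_PER_TB_USD) -> float:
    """Rough estimate: round bytes up to the next MB, then apply the 10 MB minimum."""
    billed_mb = math.ceil(bytes_scanned / MB)
    billed_bytes = max(billed_mb * MB, MIN_BILLED_BYTES)
    return billed_bytes / TB * price_per_tb


def run_athena_query(sql: str, database: str, output_location: str, poll_seconds: float = 1.0) -> dict:
    """Submit a query, wait for it to finish, and report the bytes it scanned."""
    import boto3  # imported lazily so the cost helper works without boto3

    athena = boto3.client("athena")
    query_id = athena.start_query_execution(
        QueryString=sql,
        QueryExecutionContext={"Database": database},
        ResultConfiguration={"OutputLocation": output_location},
    )["QueryExecutionId"]

    # Poll until the query reaches a terminal state (no timeout in this sketch).
    while True:
        execution = athena.get_query_execution(QueryExecutionId=query_id)["QueryExecution"]
        state = execution["Status"]["State"]
        if state in ("SUCCEEDED", "FAILED", "CANCELLED"):
            break
        time.sleep(poll_seconds)

    scanned = execution.get("Statistics", {}).get("DataScannedInBytes", 0)
    cost = estimate_athena_cost(scanned) if state == "SUCCEEDED" else None
    return {"query_id": query_id, "state": state, "bytes_scanned": scanned, "estimated_cost_usd": cost}
```

Usage, with hypothetical database and bucket names: `run_athena_query("SELECT COUNT(*) FROM my_table", "my_database", "s3://my-athena-results/")`. Because billing follows bytes scanned, storing the data as compressed, partitioned Parquet or ORC usually lowers the cost of the same query compared with raw CSV or JSON.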
2. Amazon Redshift Spectrum
Suitable if you need to run more complex queries or support a large number of concurrent users.
Redshift Spectrum is recommended in these cases for the following reasons:
Uses Amazon Redshift SQL syntax, so a single query can span Redshift tables and your S3 data lake.
Provides sophisticated query optimization.
Distributes queries across multiple nodes for parallel processing.
Works with your existing BI tools.
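To make the "one query spans Redshift tables and S3" point concrete, here is a minimal Python sketch that builds the Redshift Spectrum DDL for an external schema and an external table, plus a query joining that S3-backed table with a local Redshift table, and optionally submits SQL through the Redshift Data API with boto3. It assumes the AWS Glue Data Catalog and an IAM role that lets the cluster read the bucket; every schema, table, role ARN, bucket, and cluster name here is a hypothetical placeholder.

```python
def external_schema_ddl(schema: str, glue_database: str, iam_role_arn: str) -> str:
    """CREATE EXTERNAL SCHEMA backed by a (hypothetical) Glue Data Catalog database."""
    return (
        f"CREATE EXTERNAL SCHEMA IF NOT EXISTS {schema}\n"
        f"FROM DATA CATALOG\n"
        f"DATABASE '{glue_database}'\n"
        f"IAM_ROLE '{iam_role_arn}'\n"
        f"CREATE EXTERNAL DATABASE IF NOT EXISTS;"
    )


def external_table_ddl(schema: str, table: str, columns: dict[str, str],
                       s3_location: str, file_format: str = "PARQUET") -> str:
    """External table over files in S3; intended for columnar formats such as
    PARQUET or ORC (CSV would also need a ROW FORMAT clause, not covered here)."""
    cols = ",\n  ".join(f"{name} {sql_type}" for name, sql_type in columns.items())
    return (
        f"CREATE EXTERNAL TABLE {schema}.{table} (\n  {cols}\n)\n"
        f"STORED AS {file_format}\n"
        f"LOCATION '{s3_location}';"
    )


def join_query(external_table: str, local_table: str, key: str) -> str:
    """One SQL statement joining S3 data (external) with a local Redshift table.
    Names are interpolated unescaped, so only pass trusted identifiers."""
    return (
        f"SELECT loc.{key}, COUNT(*) AS row_count\n"
        f"FROM {external_table} AS ext\n"
        f"JOIN {local_table} AS loc ON ext.{key} = loc.{key}\n"
        f"GROUP BY loc.{key};"
    )


def run_on_redshift(sql: str, cluster_id: str, database: str, db_user: str) -> str:
    """Submit one statement via the Redshift Data API; returns the statement Id."""
    import boto3  # imported lazily so the SQL builders work without boto3

    client = boto3.client("redshift-data")
    response = client.execute_statement(
        ClusterIdentifier=cluster_id, Database=database, DbUser=db_user, Sql=sql
    )
    return response["Id"]
```

For example, `join_query("spectrum.orders", "public.customers", "customer_id")` yields a single statement in which `spectrum.orders` is read from S3 and `public.customers` from the cluster; Redshift plans the S3 scan on the Spectrum layer and the join on the cluster nodes.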
Thank you for your time...