Data Lake Houses

#aws

I am writing about the emerging connections between cloud data warehouses and cloud data lakes. This combination is called the 'data lake house' pattern (concept using AWS services shown below).

Over the years I've built many AWS Redshift data warehouses and a couple of AWS S3 data lakes. What about you? What types of AWS solutions are you building for data analytics lately?

I am particularly interested in hearing from those of you who are using AWS Lake Formation. Its services 'sit on top of' AWS S3, adding federated security and other key services to an AWS data lake. When I built data lakes, AWS Lake Formation wasn't yet available, so my teams had to build these control services manually.

I've also been looking at [AWS Glue DataBrew](https://aws.amazon.com/glue/features/databrew/). The data profiling looks to be very useful (shown below).

I am very curious about DataBrew's scalability and cost in production vs. more traditional ETL and/or Hadoop/Spark batch transform methods.
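To make "data profiling" a bit more concrete, here is a minimal sketch in plain Python of the kind of per-column summary a profiling step typically produces (row count, missing values, distinct values, range, most common value). This is not the DataBrew API and makes no claim about how DataBrew computes its profiles; the function names and the `orders` sample data are hypothetical and exist only for illustration.

```python
from collections import Counter
from typing import Any


def profile_column(values: list[Any]) -> dict[str, Any]:
    """Summarise one column: size, missing values, distinct values, range, mode."""
    # Treat None and empty strings as missing (an assumption for this sketch).
    present = [v for v in values if v is not None and v != ""]
    counts = Counter(present)
    return {
        "rows": len(values),
        "missing": len(values) - len(present),
        "distinct": len(counts),
        "min": min(present) if present else None,
        "max": max(present) if present else None,
        "most_common": counts.most_common(1)[0][0] if counts else None,
    }


def profile_table(rows: list[dict[str, Any]]) -> dict[str, dict[str, Any]]:
    """Profile every column found in a list of row dictionaries."""
    columns = sorted({key for row in rows for key in row})
    return {col: profile_column([row.get(col) for row in rows]) for col in columns}


# Hypothetical sample data, for illustration only.
orders = [
    {"order_id": 1, "region": "east", "amount": 120.0},
    {"order_id": 2, "region": "west", "amount": None},
    {"order_id": 3, "region": "east", "amount": 75.5},
    {"order_id": 4, "region": "", "amount": 300.0},
]

for column, stats in profile_table(orders).items():
    print(column, stats)
```

A sketch like this runs in memory on a single machine, which is exactly why the scalability and cost question matters: a managed service or a Spark job has to produce the same kind of summary over data far too large for one process.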

I'll be posting on this topic over the next few months. If this is of interest to you, follow me on Twitter as well. Let's connect!

Top comments (3)

Dendi Handian • Sep 6 '22

Gonna read this later

Top2World • Aug 31 '21

Nice post :)
You can also check out Top 10 Most Expensive Houses in the World if you are interested.

Andrew Brown 🇨🇦 • Apr 26 '21

Looking forward to hearing more. My knowledge of data warehousing is limited to certification knowledge until I have more real-world use-cases I need to put into practice.
