DEV Community

Data Engineering Podcast

Reflecting On The Past 6 Years Of Data Engineering

Summary

This podcast started almost exactly six years ago, and the technology landscape was much different than it is now. In that time there have been a number of generational shifts in how data engineering is done. In this episode I reflect on some of the major themes and take a brief look forward at some of the upcoming changes.

Announcements

  • Hello and welcome to the Data Engineering Podcast, the show about modern data management
  • Your host is Tobias Macey and today I'm reflecting on the major trends in data engineering over the past 6 years

Interview

  • Introduction
  • 6 years of running the Data Engineering Podcast
  • Started around the time data engineering was first discussed as a distinct role
    • Followed on from hype about "data science"
  • Hadoop era
  • Streaming
  • Lambda and Kappa architectures
    • Rarely referenced anymore
  • The "Big Data" era of capturing everything has shifted to a focus on data that provides value
    • The regulatory environment increases the risk of retaining data, while better tools make it easier to understand which data is useful
  • Data catalogs
    • Amundsen and Alation
  • Orchestration engines
    • Oozie, etc. -> Airflow and Luigi -> Dagster, Prefect, Flyte, etc.
    • Orchestration is now built into most vertical tools
  • Cloud data warehouses
  • Data lakes
  • DataOps and MLOps
  • Data quality to data observability
  • Metadata for everything
    • Data catalog -> data discovery -> active metadata
  • Business intelligence
    • Read-only reports to metric/semantic layers
    • Embedded analytics and data APIs
  • Rise of ELT
    • dbt
    • Corresponding introduction of reverse ETL
  • What are the most interesting, unexpected, or challenging lessons that you have learned while running the podcast?
  • What do you have planned for the future of the podcast?

Parting Question

  • From your perspective, what is the biggest gap in the tooling or technology for data management today?

Closing Announcements

  • Thank you for listening! Don't forget to check out our other shows. Podcast.__init__ covers the Python language, its community, and the innovative ways it is being used. The Machine Learning Podcast helps you go from idea to production with machine learning.
  • Visit the site to subscribe to the show, sign up for the mailing list, and read the show notes.
  • If you've learned something or tried out a project from the show then tell us about it! Email hosts@dataengineeringpodcast.com with your story.
  • To help other people find the show please leave a review on Apple Podcasts and tell your friends and co-workers.

The intro and outro music is from The Hug by The Freak Fandango Orchestra / CC BY-SA


