You might have started hearing the term Data Contract recently and wondered what it is.
TL;DR: Data contracts are just integration testing, CI/CD, and APIs for data.
But the difficult part of data contracts comes from the organizational complexity of implementing them.
Technically, data contracts require investing in data lineage and data profiling. Nothing much new here.
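To make the "integration testing for data" idea concrete, here is a minimal sketch of a contract written down as explicit rules, plus a check that could run as a CI step. Everything in it is hypothetical and only for illustration: the `FieldRule` class, the `ORDERS_CONTRACT` definition, the `check_contract` function, and the field names are assumptions, not part of any particular data contract tool.

```python
from dataclasses import dataclass
from typing import Any, Optional


@dataclass(frozen=True)
class FieldRule:
    """Expectations for one field in a (hypothetical) data contract."""
    name: str
    dtype: type
    nullable: bool = False
    min_value: Optional[float] = None


# Hypothetical contract for an "orders" dataset, agreed between
# the team producing the data and the teams consuming it.
ORDERS_CONTRACT = [
    FieldRule("order_id", str),
    FieldRule("amount", float, min_value=0.0),
    FieldRule("coupon_code", str, nullable=True),
]


def check_contract(rows: list[dict[str, Any]], contract: list[FieldRule]) -> list[str]:
    """Return a human-readable message for every contract violation found."""
    violations = []
    for i, row in enumerate(rows):
        for rule in contract:
            if rule.name not in row:
                violations.append(f"row {i}: missing field '{rule.name}'")
                continue
            value = row[rule.name]
            if value is None:
                if not rule.nullable:
                    violations.append(f"row {i}: '{rule.name}' is null")
                continue
            if not isinstance(value, rule.dtype):
                violations.append(
                    f"row {i}: '{rule.name}' should be {rule.dtype.__name__}"
                )
                continue
            if rule.min_value is not None and value < rule.min_value:
                violations.append(
                    f"row {i}: '{rule.name}' is below {rule.min_value}"
                )
    return violations


if __name__ == "__main__":
    sample = [
        {"order_id": "a1", "amount": 12.5, "coupon_code": None},
        {"order_id": "a2", "amount": -3.0, "coupon_code": "SPRING"},
    ]
    for message in check_contract(sample, ORDERS_CONTRACT):
        print(message)
```

A check like this could fail a build when a producer changes a field that consumers depend on, which is the same role an integration test plays for an API.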
Data quality is a team sport, though: you really need everyone in the org to buy into the concept of data contracts to implement them effectively.
Data quality is also hard, so you need to clearly articulate the value it brings if you want people to keep caring about it!
So, for data contracts to succeed, together with the tech we also need to:
Create the right environment for people to collaborate cross-functionally.
Communicate effectively across the whole organization, starting from leadership, about the value of data quality and the cost of poor data quality.
Focus on incremental data quality improvements instead of trying to build a complete end-to-end solution. Data quality is a continuous process anyway, and you want to start delivering value as soon as possible.
Learn from other disciplines. Alert fatigue is a thing, and SREs and DevOps engineers have known about it for a long time. There is no need to reinvent the wheel or to learn by repeating mistakes other engineers have already made.
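On the alert fatigue point, one common way to keep data quality checks from flooding people is to group violations before notifying anyone. The sketch below is an illustration under assumptions, not a recommendation of a specific tool: the `summarize_violations` function and the dataset and rule names are hypothetical.

```python
from collections import Counter


def summarize_violations(violations: list[tuple[str, str]]) -> list[str]:
    """Collapse (dataset, rule) violations into one message per pair.

    Sending one summary line per failing rule, instead of one alert
    per bad row, keeps the number of notifications manageable.
    """
    counts = Counter(violations)
    # most_common() orders by count, highest first.
    return [
        f"{dataset} / {rule}: {count} violation(s)"
        for (dataset, rule), count in counts.most_common()
    ]


if __name__ == "__main__":
    # Hypothetical raw violations, one tuple per failing row.
    raw = [
        ("orders", "amount_non_negative"),
        ("orders", "amount_non_negative"),
        ("orders", "amount_non_negative"),
        ("users", "email_present"),
    ]
    for line in summarize_violations(raw):
        print(line)
```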
Finally, remember that Data Contracts already exist in any organization. They are just implicit. The whole point of talking about Data Contracts is to encourage people to make them explicit and to understand the value you get by doing that.
For more on this topic, Chad Sanderson, who coined the term Data Contract, did an amazing job explaining everything in this Data Stack Show episode.