DEV Community

Jess Lee

Delivering a data platform, not a set of tools

Scenario

You want to be a data-informed organization, so you've decided to dedicate resources to ETL (extract, transform, load) all your data sources into a data warehouse. You spent time investigating the pros and cons of Amazon Redshift vs. BigQuery and tapped the shoulders of every data engineer you know to figure out which pipeline tools best fit your needs. Once you got all the data into that warehouse (which, you later realize, could have just been a managed RDS instance), you threw on the latest and greatest data visualization tool you could find. You had your data analysts build beautiful dashboards, and you invested in data exploration tools so no one would ever need to write a query again.

Great. SQL appears to be abstracted away. The platform is ready. Now what?

The hard part: human adoption.

Tools don't make decisions; people do. To become a data-informed organization, you'll need to empower your team to ask the right questions and train them to use the tools at hand thoughtfully and, ideally, independently.

Solution

This creates a unique opportunity to turn data engineers and analysts into teachers and full-time creative thinkers. They become teachers because communicating data concepts, best practices, and how to use a new set of tools usually can't be done in a single presentation. It takes tutorials, training sessions, and graded assignments or exercises. Sometimes it means withholding access to certain tools until staff members have passed a quiz demonstrating their new ways of thinking. This is a big commitment of time and resources.

But it's worth it. Once a data curriculum is in motion, you'll notice the Trello board of data questions disappearing. Your data analysts won't spend hours triaging CSV requests and daily queries from various departments. Everyone in marketing will know exactly how to pull daily active users, filtered by age, location, whether or not they like kombucha, and whether they clicked the link in that fitness ad. Your data scientists will spend their time on the hard stuff: the stuff you might need a master's in statistics or a bootcamp certification for.

Common Pain Points

Unless there's a data cheerleader on each product, team, or service, it's easy to add a new field to a table because of a biz dev request and forget to tell the data team. Biz dev probably assumes that when a field is added, the tech team has 'handled it' and that the data will make it into the one-stop-shop, SQL-free exploration tool they just trained on. It doesn't, and biz dev is up against a reporting deadline, so the data team ends up scrambling to make something work. Or major work is done to spin up a new app, but no one consults the data team and building an API to get its data out isn't prioritized. The data never makes it to the warehouse, and the new app launches with completely siloed information.

It's time to treat data as a first-class citizen.

But how?! By including data in product conversations. Data touches everything, so every tech team needs to be thinking about it. Make it a habit to run ideas by the data team and ask for their feedback before the roadmap is 'finalized' and OKRs are set; they'll know best whether a new feature or fix will require work on their end and affect reporting for business users. Decide who is accountable for keeping the right data representative in the loop. And don't call your org data-informed until they are :)

Top comments (1)

Adrien Auclair

Thanks, Jess, for this very interesting article. I couldn’t agree more with what you said.

I would add that one way to give data people more time for teaching (to be “data evangelists”) is to give them tools that take less time to set up and manage.
It is quite simple to set up Redshift (or RDS), but it still takes some time, and you also need to configure the ETL, plug in the dataviz tool, set up some cron jobs… A simpler path is to use a unified data platform. You get a simple ETL/data-warehouse/dataviz/scheduler in a single cloud app. Less time spent on tech, compatibility, and data transfer issues means more time for teaching.

Another point is about the required skills. It’s not easy to find people who can both tune Redshift distribution keys and prepare a training session for non-technical users. Of course, some people are good at both, and people can be trained in skills they lack. But if a unified platform can be used, it greatly reduces the skills required on the tech side. That opens the door to people who are brilliant teachers and good data analysts but poor server administrators.

Adrien (disclaimer: I’m a co-founder of Serenytics, a unified data platform with features for coders)