DEV Community

MLOps Community

How to Avoid Suffering in an MLOps/Data Engineering Role // Igor Lushchyk // MLOps Meetup #55

MLOps community meetup #55! Last Wednesday we talked to Igor Lushchyk, Data Engineer, Adyen.  

// Abstract:
Building Data Science and Machine Learning platforms at a scale-up. The main difficulty is finding the correct processes, which is basically like being a toddler learning to walk on a steep staircase. The talk covers the transition from a homegrown platform to open-source solutions, and supporting and maturing the old solutions while keeping data scientists happy.

// Bio:
Igor is a software engineer with more than 10 years of experience. With a background in bioinformatics, he even started a PhD but didn't finish it.
Igor has been working as a data engineer for the last 6 or 7 years, or maybe more, because he was doing almost the same data engineering work under a different job title.
Igor has been doing a lot of MLOps for the last 4-5 years. He doesn't know which he has been doing more, Data Engineering or MLOps. And that’s how this topic came about.

// Final thoughts
Please feel free to drop any questions you may have beforehand into our Slack channel
(https://go.mlops.community/slack)
Watch some old meetups on our YouTube channel:
https://www.youtube.com/channel/UCG6qpjVnBTTT8wLGBygANOQ

----------- Connect With Us ✌️-------------   
Join our Slack community:  https://go.mlops.community/slack
Follow us on Twitter:  @mlopscommunity
Sign up for the next meetup:  https://go.mlops.community/register

Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Igor on LinkedIn: https://www.linkedin.com/in/igor-lushchyk/

Timestamps:
[00:00] Introduction to Igor Lushchyk
[02:05] Igor's background in tech
[07:42] Tips you can pass on
[11:05] How these tools work and how they play together and what is underneath?
[13:18] Dedicated MLOps team
[13:55] Central Data Infrastructure Section
[16:57] Transfer over to open-source
[20:24] If you don't plan for production from the beginning, then it's going to be painful trying to go from POC to production.
[22:08] How do you handle data lineage?
[25:09] You chose that back in the day but you're regretting it.
[26:34] "Try to use tools which solve 80% of your use cases and maybe 20% you'll have the suffering but at least it's not 100% suffering."
[27:27] Friction points
[28:53] Interaction with Data Scientists
[29:21] "We have alignment sessions. We have different levels of representations. We share our progress."
[32:42] Build versus buy decisions
[34:04] When to build or grab an open-source tool
[35:51] Build your own or buy open-source?
[37:11] Certain maturity and a certain number of engineers
[38:11] Startup to go with open-source
[40:14] Correct transition process
[40:56] "There are no other ways but to communicate with data scientists. Your team needs to have a close loop for future priorities, what to take with you and what to leave behind."
[44:51] What to use for the monitoring piece
[45:36] Prometheus and Grafana
[48:07] Do you have automatic retriggering and monitoring of models set up?
[51:55] Hardware for on-prem model training
[52:38] "Machine Learning model prediction is a spear bomb."
[53:55] War or horror stories
[54:15] "Guys, don't do context switching!"
[55:54] "I won't say that Adyen is a company that allows you to make mistakes but you can make mistakes."
