MLOps Community

The Art and Science of Training LLMs // Bandish Shah and Davis Blalock // #219

Huge thank you to Databricks AI for sponsoring this episode. Databricks - http://databricks.com/

Bandish Shah is an Engineering Manager at MosaicML/Databricks, where he focuses on making generative AI training and inference efficient, fast, and accessible by bridging the gap between deep learning, large-scale distributed systems, and performance computing.

Davis Blalock is a Research Scientist and the first employee of MosaicML, a GenAI startup acquired by Databricks for $1.3 billion.

MLOps podcast #219 with Databricks' Engineering Manager Bandish Shah and Research Scientist Davis Blalock: The Art and Science of Training Large Language Models.

// Abstract
What's hard about language models at scale? Turns out... everything. MosaicML's Davis and Bandish share war stories and lessons learned from pushing the limits of LLM training and helping dozens of customers get LLMs into production. They cover what can go wrong at every level of the stack, how to make sure you're building the right solution, and some contrarian takes on the future of efficient models.

// Bio
Bandish Shah
Bandish has over a decade of experience building systems for machine learning and enterprise applications. Prior to MosaicML, Bandish held engineering and development roles at SambaNova Systems, where he helped develop and ship the first RDU systems from the ground up, and at Oracle, where he worked as an ASIC engineer for SPARC-based enterprise servers.

Davis Blalock
Davis Blalock is a research scientist at MosaicML. He completed his PhD at MIT, advised by Professor John Guttag. His primary work is designing high-performance machine learning algorithms. He received his M.S. from MIT and his B.S. from the University of Virginia. He is a Qualcomm Innovation Fellow, NSF Graduate Research Fellow, and Barry M. Goldwater Scholar.
// MLOps Jobs board
https://mlops.pallet.xyz/jobs

// MLOps Swag/Merch
https://mlops-community.myshopify.com/

// Related Links
Website: http://databricks.com/
Davis' Newsletters:
Learning to recognize spoken words from five unlabeled examples in under two seconds: https://arxiv.org/abs/1609.09196
Training on data at 5GB/s in a single thread: https://arxiv.org/abs/1808.02515
Nearest-neighbor searching through billions of images per second in one thread with no indexing: https://arxiv.org/abs/1706.10283
Multiplying matrices 10-100x faster than a matrix multiply (with some approximation error): https://arxiv.org/abs/2106.10860
Hidden Technical Debt in Machine Learning Systems: https://proceedings.neurips.cc/paper_files/paper/2015/file/86df7dcfd896fcaf2674f757a2463eba-Paper.pdf

--------------- ✌️Connect With Us ✌️ -------------
Join our Slack community: https://go.mlops.community/slack
Follow us on Twitter: @mlopscommunity
Sign up for the next meetup: https://go.mlops.community/register
Catch all episodes, blogs, newsletters, and more: https://mlops.community/
Connect with Demetrios on LinkedIn: https://www.linkedin.com/in/dpbrinkm/
Connect with Davis on LinkedIn: https://www.linkedin.com/in/dblalock/
Connect with Bandish on LinkedIn: https://www.linkedin.com/in/bandish-shah/