One of the biggest pains when working on a machine learning project is data management.
Using a VCS (like git) helps a lot during the development phase, but git doesn't work well for large datasets (in the range of hundreds of GBs). DVC solves this problem: it is like git for large datasets. That isn't even the best part. The best part is that it works just like git, both conceptually and syntactically.
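To make the "git for large datasets" idea a bit more concrete, here is a minimal Python sketch of the general technique. It is not DVC's actual implementation. The large file is hashed and copied into a content-addressed cache, and only a tiny pointer file recording the hash is what you would commit to git. DVC itself writes a small `.dvc` file when you run `dvc add`. The `.ptr` extension, the pointer format, and the `track`/`checkout` helpers below are hypothetical and exist only for illustration.

```python
import hashlib
import shutil
from pathlib import Path


def md5_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def track(data_file: Path, cache_dir: Path) -> Path:
    """Copy data_file into a content-addressed cache and write a tiny pointer file.

    Hypothetical helper: the pointer file (not the data) is what you would
    commit to git. The '.ptr' format is made up for this sketch.
    """
    checksum = md5_of(data_file)
    # Split the hash into a 2-char directory plus the rest, git-object style.
    cached = cache_dir / checksum[:2] / checksum[2:]
    if not cached.exists():
        cached.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(data_file, cached)
    pointer = data_file.with_name(data_file.name + ".ptr")
    pointer.write_text(f"md5: {checksum}\npath: {data_file.name}\n")
    return pointer


def checkout(pointer: Path, cache_dir: Path) -> Path:
    """Restore the data file described by a (hypothetical) pointer file."""
    fields = dict(line.split(": ", 1) for line in pointer.read_text().splitlines())
    checksum = fields["md5"]
    target = pointer.with_name(fields["path"])
    shutil.copy2(cache_dir / checksum[:2] / checksum[2:], target)
    return target
```

Because the cache key is the content hash, tracking an unchanged file again costs no extra storage, and moving between git commits only means restoring whichever version the committed pointer names.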
DVC has excellent, intuitive documentation. I would say it is one of the best out there.
Follow this official tutorial to get started.
DVC works with local file storage as well as cloud storage such as AWS S3, Google Cloud Storage, etc.
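Remote storage extends the same idea. DVC's `dvc push` command uploads the contents of the local cache to a configured remote. The sketch below assumes the simplest case, a local directory standing in for the remote. `push` is a hypothetical helper, not part of DVC, and a cloud remote such as S3 would replace the file copy with an upload call.

```python
import shutil
from pathlib import Path


def push(cache_dir: Path, remote_dir: Path) -> int:
    """Copy cache objects missing from remote_dir; return how many were copied.

    Hypothetical helper: objects are content-addressed, so an object that
    already exists on the remote never needs to be copied again.
    """
    uploaded = 0
    for obj in cache_dir.rglob("*"):
        if not obj.is_file():
            continue
        dest = remote_dir / obj.relative_to(cache_dir)
        if dest.exists():
            continue  # same path means same content hash, so skip it
        dest.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(obj, dest)
        uploaded += 1
    return uploaded
```

Running it twice in a row copies nothing the second time, which is why pushing only transfers data that changed.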