Version Your ML Training Data for Easy Reproducibility
Most data science and machine learning workflows are not linear. ML experimentation is an iterative process, and you constantly move back and forth between the different components of the pipeline. Along the way, most of us try different data labeling methods, data cleaning and preprocessing techniques, and feature selection methods before arriving at an accurate model.
Being able to reproduce a specific iteration of an ML experiment is therefore essential for building scalable, high-quality ML models. That means capturing the exact version of the training data, the ML code, and the model artifacts at every iteration. To version these experiments efficiently, without duplicating your code, data, and models, you should use a data versioning tool like lakeFS. lakeFS lets you version every component of an ML experiment without keeping multiple copies of it, and as a side benefit it also reduces your storage costs.
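The idea behind copy-free versioning can be sketched with a toy content-addressed store. This is a hypothetical, in-memory illustration, not lakeFS's implementation or API. The class `ToyVersionStore` and its `put`, `commit`, and `read` methods are invented names. Each unique file content is stored once under its SHA-256 hash, and a commit records only a mapping from paths to hashes. An iteration that changes one file therefore adds one new object, yet every earlier iteration can still be read back exactly.

```python
import hashlib
import json


class ToyVersionStore:
    """Hypothetical in-memory sketch of copy-free versioning.

    Not lakeFS's implementation or API: it only illustrates that a commit
    can record pointers to content instead of duplicating files.
    """

    def __init__(self):
        self.blobs = {}    # content hash -> bytes; each unique content stored once
        self.commits = {}  # commit id -> {path: content hash}
        self.staging = {}  # current working tree: path -> content hash

    def put(self, path, data):
        digest = hashlib.sha256(data).hexdigest()
        self.blobs.setdefault(digest, data)  # identical content is never stored twice
        self.staging[path] = digest

    def commit(self, message):
        # A commit is just a snapshot of path -> hash pointers, not file copies.
        snapshot = dict(self.staging)
        payload = json.dumps({"message": message, "tree": snapshot}, sort_keys=True)
        commit_id = hashlib.sha256(payload.encode()).hexdigest()[:12]
        self.commits[commit_id] = snapshot
        return commit_id

    def read(self, commit_id, path):
        # Reproduce a file exactly as it was at the given commit.
        return self.blobs[self.commits[commit_id][path]]


store = ToyVersionStore()
store.put("data/train.csv", b"id,label\n1,cat\n2,dog\n")
store.put("src/train.py", b"model = fit(data)\n")
v1 = store.commit("iteration 1: raw labels")

# Iteration 2 relabels the data but keeps the training code unchanged.
store.put("data/train.csv", b"id,label\n1,cat\n2,cat\n")
v2 = store.commit("iteration 2: cleaned labels")

print(len(store.blobs))                  # 3 stored blobs for 4 file versions
print(store.read(v1, "data/train.csv"))  # the iteration-1 data, still intact
```

Unchanged files (here, the training script) are shared by reference between commits. Storage therefore grows with the amount of changed data rather than with the number of iterations.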
In this webinar, we will show you how to use lakeFS to version your ML experiments easily and reproduce any specific iteration of an experiment on demand.
We will cover:
- Creating a basic ML experimentation framework with lakeFS (in a Jupyter notebook)
- Reproducing ML components from a specific iteration of an experiment
- Building an intuitive, zero-maintenance experiments infrastructure with lakeFS