Building reproducible ML processes with open source
Join us September 7, 2023 9:00 AM Pacific Time
Machine learning experiments consist of Data + Code + Environment. While MLFlow Projects are a great way to ensure the reproducibility of data science code, they cannot ensure the reproducibility of the input data used by that code.
In this talk, we'll go over the trifecta required for truly reproducible experiments:
- Code (MLFlow and Git)
- Data (lakeFS)
- Environment (Infrastructure-as-code)
This talk will include a hands-on code demonstration of reproducing an experiment, while ensuring we use the exact same input data, code and processing environment as used by a previous run.
We will demonstrate programmatic ways to tie all the moving parts together: from creating commits that snapshot the input data, to tagging and traversing the history of both code and data in tandem.
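As a rough illustration of what tying the parts together could look like, here is a minimal sketch in plain Python. It is not the code from the talk: the `RunSnapshot` class, the tag names, and all example values are hypothetical. It only assumes that code is pinned by a Git commit, input data by a lakeFS commit ID (addressed with a `lakefs://<repository>/<ref>/<path>` URI), and the environment by a hash of its definition file.

```python
from dataclasses import dataclass
import hashlib


@dataclass(frozen=True)
class RunSnapshot:
    """Hypothetical record of the three things an experiment run depends on."""
    git_commit: str     # code version, e.g. the output of `git rev-parse HEAD`
    lakefs_repo: str    # lakeFS repository that holds the input data
    lakefs_commit: str  # lakeFS commit ID that snapshots the input data
    env_spec: str       # text of the environment definition (e.g. a lock file)

    def env_digest(self) -> str:
        # Hash the environment definition so any change to it is detectable.
        return hashlib.sha256(self.env_spec.encode("utf-8")).hexdigest()

    def data_uri(self, path: str) -> str:
        # Address data by commit ID rather than by branch name, so the URI
        # always points at the same immutable snapshot.
        return f"lakefs://{self.lakefs_repo}/{self.lakefs_commit}/{path.lstrip('/')}"

    def as_tags(self) -> dict:
        # Flat string-to-string mapping (tag names are illustrative) that could
        # be attached to an experiment run, for example via mlflow.set_tags.
        return {
            "code.git_commit": self.git_commit,
            "data.lakefs_repo": self.lakefs_repo,
            "data.lakefs_commit": self.lakefs_commit,
            "env.sha256": self.env_digest(),
        }


def same_inputs(a: RunSnapshot, b: RunSnapshot) -> bool:
    """True when two runs used identical code, data, and environment."""
    return a.as_tags() == b.as_tags()


# Placeholder values for illustration only; none come from a real system.
previous = RunSnapshot(
    git_commit="<git-sha>",
    lakefs_repo="example-repo",
    lakefs_commit="<lakefs-commit-id>",
    env_spec="python=3.10\nscikit-learn=1.3.0\n",
)
rerun = RunSnapshot(
    git_commit="<git-sha>",
    lakefs_repo="example-repo",
    lakefs_commit="<lakefs-commit-id>",
    env_spec="python=3.10\nscikit-learn=1.3.0\n",
)

print(previous.data_uri("datasets/train.parquet"))
print(same_inputs(previous, rerun))
```

Recording the three identifiers together on each run is what allows code and data history to be traversed in tandem: given a past run's tags, the matching Git commit can be checked out and the same data snapshot read through the commit-pinned URI.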
Speakers
Co-founder & CTO