Create a Dev/Test Environment for Data Pipelines Using Spark and Python
Delivering high-quality data requires strict testing of pipelines before deploying them into production.
Today, testing ETLs means either running against a small sample of the data or creating full copies of the entire production data set. Sampling is not good enough, since it can miss issues that only surface at full scale; full copies, however, are costly and time-consuming.
In this webinar, we will demonstrate how to develop and test against the entire production data set with zero copying.
- How to set up your environment in under 5 minutes
- How to create multiple isolated testing environments without copying data
- How to easily run multiple tests on your environment using git-like operations (commit, branch, revert, etc.)
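To make the idea behind these git-like operations concrete, here is a minimal sketch (not any product's real API; the `Repo`, `commit`, `branch`, and `revert` names are illustrative) of how branching over a data catalog can be zero-copy: a branch is just a pointer to an immutable commit, so creating an isolated test environment duplicates metadata, never the data itself.

```python
class Repo:
    """Toy git-like catalog: commits are immutable path->object mappings,
    branches are pointers to commits. Illustrative only."""

    def __init__(self):
        self.objects = {}             # stored data blocks, written once
        self.commits = {"root": {}}   # commit id -> {path: object key}
        self.branches = {"main": "root"}
        self._next = 0

    def _snapshot(self, branch):
        # A branch resolves to the mapping of its current commit.
        return dict(self.commits[self.branches[branch]])

    def commit(self, branch, path, data):
        # Write the new object, then record a new immutable commit.
        key = f"obj{self._next}"; self._next += 1
        self.objects[key] = data
        snap = self._snapshot(branch)
        snap[path] = key
        cid = f"c{self._next}"; self._next += 1
        self.commits[cid] = snap
        self.branches[branch] = cid
        return cid

    def branch(self, name, source="main"):
        # Zero-copy: only the commit pointer is duplicated.
        self.branches[name] = self.branches[source]

    def revert(self, branch, commit_id):
        # Move the branch pointer back to an earlier commit.
        self.branches[branch] = commit_id

    def read(self, branch, path):
        return self.objects[self._snapshot(branch)[path]]


repo = Repo()
base = repo.commit("main", "sales.parquet", [1, 2, 3])
repo.branch("test-env")                       # isolated env, no data copied
repo.commit("test-env", "sales.parquet", [9, 9])
print(repo.read("main", "sales.parquet"))     # production data unchanged
repo.revert("test-env", base)                 # discard the experiment
print(repo.read("test-env", "sales.parquet"))
```

In a real setup the `objects` store would be production storage (e.g. S3) and a Spark job would simply read and write through branch-scoped paths, so each test run sees its own isolated view of the full data set.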