Create a Dev/Test Environment for Data Pipelines Using Spark and Python
Delivering high-quality data requires strict testing of pipelines before deploying them into production.
Today, testing ETLs means either running against a small sample of the data or creating full copies of the entire production data set. Sampling is not good enough, since it can miss issues that only surface at full scale; full copies, however, are costly and time-consuming.
In this webinar, we will demonstrate how to develop and test against the entire production data set with zero copying.
- How to set up your environment in under 5 minutes
- How to create multiple isolated testing environments without copying data
- How to easily run multiple tests on your environment using git-like operations (commit, branch, revert, etc.)
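To make the idea behind these git-like operations concrete, here is a minimal sketch (not any product's real API; the `Repo`, `commit`, `branch`, and `revert` names are illustrative) of how branching over a data catalog can be zero-copy: a branch is just a pointer to an immutable commit, so creating an isolated test environment duplicates metadata, never the data itself.

```python
class Repo:
    """Toy git-like catalog: commits are immutable path->object mappings,
    branches are pointers to commits. Illustrative only."""

    def __init__(self):
        self.objects = {}             # stored data blocks, written once
        self.commits = {"root": {}}   # commit id -> {path: object key}
        self.branches = {"main": "root"}
        self._next = 0

    def _snapshot(self, branch):
        # A branch resolves to the mapping of its current commit.
        return dict(self.commits[self.branches[branch]])

    def commit(self, branch, path, data):
        # Write the new object, then record a new immutable commit.
        key = f"obj{self._next}"; self._next += 1
        self.objects[key] = data
        snap = self._snapshot(branch)
        snap[path] = key
        cid = f"c{self._next}"; self._next += 1
        self.commits[cid] = snap
        self.branches[branch] = cid
        return cid

    def branch(self, name, source="main"):
        # Zero-copy: only the commit pointer is duplicated.
        self.branches[name] = self.branches[source]

    def revert(self, branch, commit_id):
        # Move the branch pointer back to an earlier commit.
        self.branches[branch] = commit_id

    def read(self, branch, path):
        return self.objects[self._snapshot(branch)[path]]


repo = Repo()
base = repo.commit("main", "sales.parquet", [1, 2, 3])
repo.branch("test-env")                       # isolated env, no data copied
repo.commit("test-env", "sales.parquet", [9, 9])
print(repo.read("main", "sales.parquet"))     # production data unchanged
repo.revert("test-env", base)                 # discard the experiment
print(repo.read("test-env", "sales.parquet"))
```

In a real setup the `objects` store would be production storage (e.g. S3) and a Spark job would simply read and write through branch-scoped paths, so each test run sees its own isolated view of the full data set.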