Reproduce & Troubleshoot Data in an Apache Airflow DAG
Apache Airflow enables you to build multistep workflows across multiple technologies. Its programmatic approach to authoring, scheduling, and monitoring workflows helps users build complicated ETLs on their data that would otherwise be difficult to automate.
This has enabled ETLs to evolve from simple, single-step jobs into complicated, parallelized, multistep transformations.
The challenge is that complicated ETLs mean complicated troubleshooting. When a particular step in the execution fails, it is very difficult to understand after the fact what caused the failure.
What should you do when a failure happens? An effective approach is to first revert the production data to a consistent state (the one before the issue occurred) so that the data stays available to its consumers, and only then investigate the problem.
lakeFS lets engineers revert production data to its last known good state with a simple, one-line command. It supports Git-like branching, committing, and reverting operations on the data lake, which makes it possible to troubleshoot production issues safely, without touching the data that consumers read.
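To make the "revert first, then investigate" idea concrete, here is a minimal sketch in Python. It is a toy, in-memory model of Git-like branching, committing, and reverting, and it is not the lakeFS API: the `ToyBranch` class, its methods, and the object paths are all hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class ToyBranch:
    """Toy stand-in for a versioned data branch. NOT the lakeFS API."""
    name: str
    # Each commit is a full snapshot: object path -> content.
    # Commit 0 is the empty initial snapshot.
    commits: List[Dict[str, str]] = field(default_factory=lambda: [{}])
    staged: Dict[str, str] = field(default_factory=dict)

    def head(self) -> Dict[str, str]:
        """Return a copy of the latest committed snapshot."""
        return dict(self.commits[-1])

    def write(self, path: str, content: str) -> None:
        """Stage an object; it is not visible in head() until commit()."""
        self.staged[path] = content

    def commit(self) -> int:
        """Record staged writes as a new snapshot and return its commit id."""
        snapshot = self.head()
        snapshot.update(self.staged)
        self.commits.append(snapshot)
        self.staged = {}
        return len(self.commits) - 1

    def branch_from(self, new_name: str, commit_id: int) -> "ToyBranch":
        """Create an independent branch whose head is the given commit."""
        history = [dict(c) for c in self.commits[: commit_id + 1]]
        return ToyBranch(new_name, commits=history)

    def revert_to(self, commit_id: int) -> int:
        """Restore an earlier snapshot as a new commit, keeping history."""
        self.commits.append(dict(self.commits[commit_id]))
        self.staged = {}
        return len(self.commits) - 1


# A good run followed by a run that writes bad data (hypothetical paths).
main = ToyBranch("main")
main.write("sales/day1.csv", "ok")
good = main.commit()
main.write("sales/day2.csv", "corrupt")
bad = main.commit()

# Keep the failed state on a separate branch for later investigation...
debug = main.branch_from("debug-failed-run", bad)
# ...and immediately restore production to the last good snapshot.
main.revert_to(good)

assert "sales/day2.csv" not in main.head()
assert debug.head()["sales/day2.csv"] == "corrupt"
```

The design point the sketch tries to capture is that the revert and the investigation are decoupled: consumers of `main` see consistent data again right away, while the exact data that triggered the failure stays available on the debug branch.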