Advancing Data Governance: Data Lineage and Auditing

Join us in this two-part series

In this series, we will first demonstrate how to quickly achieve lineage using lakeFS capabilities: commits, metadata attached to each commit, blame functionality, and the log-commits API. Second, we will trace who is making each transformation, and who is accessing the data and when, using lakeFS auditing capabilities.
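The idea behind commit-based lineage can be sketched in a few lines: each commit records an author, a message, arbitrary key/value metadata (such as source tables or the job that ran), and the paths it wrote, and a blame-style lookup walks the log to find the most recent commit that touched a given path. The following is a minimal, self-contained Python sketch of that concept; it is illustrative only and does not use the lakeFS API, and all names are hypothetical:

```python
from dataclasses import dataclass

@dataclass
class Commit:
    id: str
    author: str
    message: str
    metadata: dict  # arbitrary key/value pairs, e.g. source tables, job name
    paths: set      # object paths written by this commit

def blame(log, path):
    """Return the most recent commit that wrote `path` (log is newest-first)."""
    for commit in log:
        if path in commit.paths:
            return commit
    return None

# Newest-first commit log for a branch (hypothetical data)
log = [
    Commit("c3", "etl-bot", "aggregate daily sales",
           {"source": "raw/sales", "job": "aggregate_v2"},
           {"marts/sales_daily.parquet"}),
    Commit("c2", "alice", "load raw sales",
           {"source": "s3://landing/sales"},
           {"raw/sales/2024-01-01.parquet"}),
]

c = blame(log, "marts/sales_daily.parquet")
print(c.id, c.metadata["source"])  # → c3 raw/sales
```

Because every transformation lands as a commit carrying its own metadata, the commit log itself doubles as a lineage record: following each commit's `source` metadata backwards traces an output all the way to its origin.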

More about this session:

Data lineage is crucial when working with data lakes because it allows organizations to track the journey of data from its origin to its final destination. In data lake environments, data can come from various sources, in different formats, and with varying levels of quality. As such, it is essential to clearly understand the data's lineage to ensure data quality, regulatory compliance, and effective data governance.

Achieving data lineage for data processed with ETLs on data lakes can be challenging for several reasons. First, ETL processes can be complex, involving multiple steps and transformations that may obscure the data's original lineage. Second, data lakes can store vast amounts of data from diverse sources, making it difficult to track the data's origin and movements. Finally, ETL processes can run at different times, making it hard to keep lineage accurate and up to date.

Register for the first part in this series, where we will cover RBAC for Data Lakes.


VP Customer Success

Director Solution Engineering


Watch on demand