
Databricks Announces New Unity Catalog Offering

by Helen J. Wolf

Databricks has significantly expanded the data management capabilities of its lakehouse by announcing data lineage for Unity Catalog.

Data lineage describes how data flows through an organization. The new feature from the data and AI company helps customers understand where the data in their lakehouse came from, who created it and when, how it has changed over time, and how it is used.

Databricks notes that businesses work with large amounts of data from many different sources. Tracking where all of that data comes from and how it is used can be extremely difficult, yet that insight is key to ensuring trust in the data and assessing risk.

Data lineage for Unity Catalog lets data teams see every downstream consumer affected by a data change, such as applications, dashboards, machine learning models, and datasets, so they can gauge how severe the impact is and quickly notify the relevant stakeholders.
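To make the impact-analysis idea concrete, here is a minimal sketch, assuming lineage is available as a simple graph recording which asset feeds which. The asset names, the `LINEAGE` dictionary and the `downstream_consumers` helper are hypothetical and are not part of the Unity Catalog API; the sketch only shows the graph traversal that this kind of downstream impact analysis relies on.

```python
from collections import deque

# Hypothetical lineage graph: each key is an asset and each value lists the
# assets that read from it directly. All names are illustrative only.
LINEAGE: dict[str, list[str]] = {
    "raw.orders": ["silver.orders_clean"],
    "silver.orders_clean": ["gold.daily_revenue", "ml.churn_features"],
    "gold.daily_revenue": ["dashboard.revenue_overview"],
    "ml.churn_features": ["model.churn_predictor"],
}


def downstream_consumers(lineage: dict[str, list[str]], changed: str) -> list[str]:
    """Return every asset reachable downstream of `changed`, in breadth-first order."""
    seen = {changed}  # guards against revisiting assets and against cycles
    affected: list[str] = []
    queue = deque([changed])
    while queue:
        asset = queue.popleft()
        for consumer in lineage.get(asset, []):
            if consumer not in seen:
                seen.add(consumer)
                affected.append(consumer)
                queue.append(consumer)
    return affected


if __name__ == "__main__":
    # A change to the cleaned orders table reaches two tables, a dashboard and a model.
    print(downstream_consumers(LINEAGE, "silver.orders_clean"))
```

Breadth-first order lists direct consumers before indirect ones, which is a natural order for deciding whom to notify first.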


In addition, the offering lets data consumers such as data scientists, data engineers, and data analysts work with full context about the data they use, which leads to better outcomes.

Data stewards can also see which datasets are no longer in use or have become obsolete, so they can delete unnecessary data, reduce risk, and ensure that end users work only with high-quality data.
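As an illustration of how lineage can surface unused data, the following minimal sketch flags tables that have no downstream consumers and have not been read recently. It assumes lineage and last-read dates are available as plain dictionaries; the table names, dates, the 180-day threshold and the `obsolete_candidates` helper are all hypothetical and are not taken from Databricks.

```python
from datetime import date, timedelta

# Hypothetical lineage: each table maps to the assets that read from it directly.
LINEAGE: dict[str, list[str]] = {
    "raw.orders": ["silver.orders_clean"],
    "silver.orders_clean": ["gold.daily_revenue"],
    "gold.daily_revenue": ["dashboard.revenue_overview"],
    "silver.orders_2019_backup": [],
}

# Hypothetical date each table was last read; values are made up for illustration.
LAST_READ: dict[str, date] = {
    "raw.orders": date(2022, 6, 1),
    "silver.orders_clean": date(2022, 6, 1),
    "gold.daily_revenue": date(2022, 6, 1),
    "silver.orders_2019_backup": date(2021, 1, 15),
}


def obsolete_candidates(
    lineage: dict[str, list[str]],
    last_read: dict[str, date],
    today: date,
    max_idle_days: int = 180,
) -> list[str]:
    """Return tables with no downstream consumers that have not been read recently.

    A table with no recorded read date is treated as never read.
    """
    cutoff = today - timedelta(days=max_idle_days)
    return sorted(
        table
        for table, consumers in lineage.items()
        if not consumers and last_read.get(table, date.min) < cutoff
    )


if __name__ == "__main__":
    print(obsolete_candidates(LINEAGE, LAST_READ, today=date(2022, 6, 15)))
```

The output is only a list of candidates for review; a data steward would still confirm each one before deleting anything.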

These new capabilities in Unity Catalog give organizations a complete view of the entire data lifecycle, so data leaders can understand how data is collected, whether it has been updated, and which processes are applied to it.

“Governance capabilities like data lineage are critical to building the industry’s most robust lakehouse platform,” said Matei Zaharia, Databricks co-founder and chief technology officer.

“Without good data lineage, it is challenging to follow the business and verification processes that data-driven organizations need to succeed.

“Our goal is to ensure our customers can focus on insights and move toward proactive data management practices through a unified, transparent view of their entire data ecosystem.”

One of the key features of Unity Catalog is automated runtime lineage, which captures all lineage generated in Databricks and is more accurate and efficient than manual tagging.

This information is captured for tables, views, and columns, providing a detailed view of upstream and downstream data flows.
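Column-level lineage makes it possible to trace a single output column back to the raw columns it was computed from. The following minimal sketch assumes column lineage is available as a mapping from each column to its direct source columns; the column names, the `COLUMN_LINEAGE` dictionary and the `upstream_sources` helper are hypothetical and do not reflect how Unity Catalog stores or exposes lineage.

```python
# Hypothetical column-level lineage: each target column maps to the source
# columns it is computed from. All names are illustrative only.
COLUMN_LINEAGE: dict[str, list[str]] = {
    "gold.daily_revenue.revenue": [
        "silver.orders_clean.amount",
        "silver.orders_clean.fx_rate",
    ],
    "silver.orders_clean.amount": ["raw.orders.amount_cents"],
    "silver.orders_clean.fx_rate": ["raw.fx.rate"],
}


def upstream_sources(lineage: dict[str, list[str]], column: str) -> set[str]:
    """Return the root columns that `column` ultimately derives from.

    A column with no recorded parents is treated as a root, so a raw
    column returns a set containing only itself.
    """
    roots: set[str] = set()
    seen: set[str] = set()
    stack = [column]  # depth-first walk toward the sources
    while stack:
        current = stack.pop()
        if current in seen:
            continue  # already visited; also prevents infinite loops on cycles
        seen.add(current)
        parents = lineage.get(current, [])
        if parents:
            stack.extend(parents)
        else:
            roots.add(current)
    return roots


if __name__ == "__main__":
    print(sorted(upstream_sources(COLUMN_LINEAGE, "gold.daily_revenue.revenue")))
```

Walking upstream answers "where did this number come from?", while the same graph walked in the other direction answers "what breaks if this column changes?".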

Lineage also works for workloads in all languages supported by Databricks, including SQL, Python, R, and Scala. It is also captured for other assets such as notebooks, workflows, and dashboards, so every data persona can gain richer insights within the tools they already use.

Data lineage also helps companies meet compliance standards by making it easier to track data flows that are subject to regulations such as the General Data Protection Regulation (GDPR), the California Consumer Privacy Act (CCPA), and the Health Insurance Portability and Accountability Act (HIPAA).

Databricks says this kind of data traceability is an important part of a modern data architecture and helps customers meet their regulatory requirements.

Data lineage for Unity Catalog is now available in preview on AWS and Microsoft Azure.
