New Version of Waterline Data Inventory Enables Enterprises to Deploy a Governed Data Lake

Waterline Data catalogs the Hadoop data lake with expanded support for governance and semantic discovery

Waterline Data today announced the launch of Waterline Data Inventory Version 2, which the company describes as the only data catalog that can automatically inventory every field of data in Hadoop.

Waterline Data Inventory has been used by large enterprise customers in the healthcare, insurance, consumer marketing, and automotive industries, as well as in government. Version 2 builds on that success and expands capabilities in three areas: discovery of business metadata, data governance and security, and performance and scalability.

Data governance and security

Version 2 introduces role-based access control with four user roles that govern access to metadata operations such as creating tags and tag domains and associating tags. A new catalog view lets end users browse metadata without read access to the underlying data, so authorization policies on files and folders remain in force.

Enhanced automated semantic discovery and business metadata

Waterline Data Inventory provides both self-service and automated tagging of files and fields in Hadoop. The new release introduces tag domains, which are separately managed tag glossaries for different datasets. Support for Apache Tika expands the product's ability to recognize, inventory, search, tag, and analyze lineage for files even when their formats are not supported for profiling. Profiling now supports additional file types, including Parquet and Adobe Audience Manager, along with new algorithms for bank and credit card tag association.

Native Hadoop scale and performance

Waterline Data Inventory runs natively on all popular Hadoop distributions and uses Hadoop's scalability to profile and tag millions of files. Version 2 now performs tag discovery in MapReduce and includes an automated process for upgrading metadata repositories created by previous releases.

Waterline Data was recognized in Gartner, Inc.'s "Cool Vendors in Information Governance and MDM, 2015" report and was named a TiE50 startup at the 2015 international TiECon conference.

About Waterline Data

Waterline Data was founded in 2013 and is backed by Menlo Ventures and Jackson Square Ventures (formerly Sigma West). The name "Waterline" comes from the data lake metaphor, in which data lies hidden below the waterline. The company's mission is to help data engineers and data scientists find the most suitable and most trusted data without coding or manual exploration; in other words, they should be able to "Hadoop above the waterline." Waterline Data was built to harness the power and scalability of Hadoop to automate the inventory of data assets in the data lake and to enable governed self-service, so that business users can find and understand data in a secure and compliant way.

Contact Author

Oliver Claude
Waterline Data
+1 (650) 946-2104

Denise Sawicki