New York, New York (PRWEB) October 16, 2014
Waterline Data Science today announced at Strata + Hadoop World New York a relationship with Pivotal, the software company at the intersection of big data, PaaS, and agile development. Waterline Data Science will integrate Pivotal HD with Waterline Data Inventory to enable data self-service on Hadoop, allowing users to find, understand, and help govern Hadoop data.
Companies are deploying Hadoop “data lakes” to provide unprecedented access to data for data science and analytics to uncover new business insight. But Hadoop’s advantages of frictionless ingest and flexible schema on read, combined with its lack of data governance, present problems for users trying to find and understand the data. Waterline Data Inventory addresses these problems by building a complete inventory of data assets in Hadoop and by opening access to Hadoop data through data self-service. As a result, data scientists can be more productive, business analysts can easily augment reporting and BI with Hadoop data without coding, and data governance teams can start controlling Hadoop data.
“There is no point building a predictive model of the wrong column, and without a data inventory, you don’t know if you have the wrong column,” said John Mount, co-author of the book, Practical Data Science with R. A data inventory is also valuable for Hadoop data governance, according to Sunil Soares, author of Big Data Governance.
Alex Gorelik, Founder and CEO, states, “A major complaint with Hadoop is once you’ve loaded the data, extracting value is like finding a needle in a stack of needles. Waterline Data Inventory lets business users find the best needles in the stack of needles, without having to write code, and without having to wrangle the entire stack. That’s our secret sauce, and key to deliver faster time to value and broad Hadoop adoption.”
About Waterline Data Science
Waterline Data Science is an early-stage Big Data software company, founded in December 2013 and backed by Menlo Ventures and Sigma West. The inspiration for the name "Waterline" came from the metaphor of the Big Data Lake. Waterline solves the challenges of data self-service for the Hadoop data lake. It's easy to get data into Hadoop, but it's not easy to get it out in a self-service manner and derive business value from it. The idea behind Waterline is that data self-service for Hadoop should mean finding the data you need easily, without having to dive for it: you should be able to use Hadoop "above the waterline."