Greenplum Brings MapReduce to the Enterprise


First product to support internet-scale analytics technology popularized by Google

Greenplum, a leading provider of database software for the next generation of data warehousing and analytics, today announced support for MapReduce within its massively parallel database engine. MapReduce is the parallel computing technique pioneered by Google for analyzing the web, and Greenplum now makes MapReduce available to enterprises to allow them to derive deeper insights from their own data. Early adopters of the technology include LinkedIn and O'Reilly Media.

MapReduce has been proven as a technique for high-scale data analysis by Internet leaders such as Google and Yahoo. Greenplum gives enterprises the best of both worlds, MapReduce for programmers and SQL for DBAs, and will execute both MapReduce and SQL directly within Greenplum's parallel dataflow engine, which is at the heart of Greenplum Database.

"On its own, MapReduce is a powerful tool for data manipulation and analysis. Companies that are integrating MapReduce and SQL are increasing its applicability and giving developers and DBAs the ability to work together on a common parallel data processing infrastructure," said Curt Monash, Ph.D., President of Monash Research and editor of the influential blog DBMS2.

Greenplum customers have been participating in an early-access program that uses Greenplum MapReduce for advanced analytics. For example, LinkedIn is using Greenplum Database for new, innovative social networking features such as "People You May Know" and sees Greenplum MapReduce as a way to develop compelling analytics products faster. A primary benefit of the new capability is that customers can combine SQL queries and MapReduce programs into unified tasks that are executed in parallel across hundreds or thousands of cores.

"Greenplum has seamlessly integrated MapReduce into its database, making it possible for us to access our massive dataset with standard SQL queries in combination with MapReduce programs," said Roger Magoulas, Research Director, O'Reilly Media. "We are finding this to be incredibly efficient because complex SQL queries can be expressed in a few lines of Perl or Python code."

"Greenplum has assembled some of the best and brightest database and distributed systems experts to build the parallel data processing technology that is at the heart of Greenplum Database. The introduction of MapReduce into our product means that customers will immediately have a wide range of new capabilities for their massive-scale data analytics, something we are uniquely qualified to bring to market," said Scott Yara, co-founder and President of Greenplum.

For more information, and to download an informative whitepaper on Greenplum MapReduce, please visit the Greenplum website. MapReduce will be available as part of Greenplum Database in September.

About Greenplum
Greenplum is a data infrastructure company that is reinventing how companies gain insight and competitive advantage from their data. The company's flagship product, Greenplum Database, is built to support the next generation of data warehousing and large-scale analytics processing. Supporting both SQL and MapReduce parallel processing, Greenplum Database offers industry-leading performance at a low cost for companies managing terabytes to petabytes of data. Greenplum Database is used by major global organizations including Nasdaq, NYSE Euronext, Reliance Communications, Skype and LinkedIn. Greenplum partners with Sun Microsystems to power the Sun Data Warehouse Appliance. For more information, visit the Greenplum website.

Contact Author

Leyl Black

Paul Salazar