Santa Clara, CA (PRWEB) March 20, 2013
Today, Berkeley-based startup wise.io launched its first product designed to transform the speed and efficiency of machine learning on Big Data. Despite the pervasiveness of machine learning in industry, the growing volume of data has forced data scientists and companies to revert to approximate approaches that produce imprecise results. With WiseRF™ Oak, wise.io has found a way to make one of the foremost machine-learning algorithms fast, scalable, and extremely memory efficient.
Henrik Brink, CTO, said in his presentation at the PyData conference that with this product his company empowers a new generation of “high-frequency data scientists” armed with powerful toolkits. He showed that WiseRF™ Oak models a 12 GB image dataset in 2 minutes, whereas other popular algorithms take several days on the same 8-core machine. With the speed and scalability of WiseRF™ Oak, wise.io hopes to alleviate some of the daily pain points of data scientists, who are otherwise forced into slow development cycles by existing machine-learning algorithms that simply do not work on big data sets.
Another pain point in the enterprise is the difficulty of moving the sophisticated machine-learning models that R and Python data scientists develop in-house into production. Brink said, “The inability to implement the winning algorithm of the Netflix prize is an important lesson for all of us. What we do in the sandbox must also work in real life. The models generated by Oak are extremely easy to use in production-scale Java environments.”
Joseph Richards, Chief Scientist, said, “Despite some of the great theoretical statistical properties of ensembles of decision trees, this algorithm, beloved by Kaggle contestants and academics, has traditionally been stigmatized because of the memory requirements and the perceived slowness relative to other inferior algorithms that scale better. Today we hope to change that.”
“The intellectual property lies in the approaches and methods we invented, rather than in the algorithm itself,” said Damian Eads, Director of Engineering at wise.io. The algorithm, which allows researchers and data scientists to make classification and regression predictions on very high-dimensional datasets, is called Random Forests®, a particular version of ensembles of decision trees that was invented at UC Berkeley in 2001 and is a registered trademark of Salford Systems.
Beyond exploiting multi-core parallelism, a major engineering focus in WiseRF™ Oak was speeding up prediction with the learned models. Joshua Bloom, CEO, said, “Fast and accurate learning on large-scale data is only one side of the coin from a data-insight perspective: you must be able to actually use, in production, the models you build. For many companies, this means that the high velocity of the data requires rapid prediction.” Indeed, some of wise.io’s first customers rely less on fast learning than on fast prediction on new incoming data.
The team demonstrated today that they were getting 2 million predictions per second on 1000-dimensional datasets at 99% accuracy.
To coincide with the launch and the company’s presentation at PyData, WiseRF™ Oak went on sale today, with a free evaluation license for single-seat developers and custom pricing for multiple-seat and site licenses for enterprises. Academics and social-good researchers will be able to apply for a free version at the end of the evaluation term.
The aim of the semi-annual PyData conference is to “change the way scientists, engineers, and analysts perceive Big Data.” In his presentation showing the capabilities of WiseRF™ Oak and related benchmarks, Brink said, “We’re actually not big fans of the term Big Data since for most data-driven companies, Big Data is just called data. Instead, we’re enabling smart and fast data whatever the size.”
The WiseRF™ Oak release is the mid-scale version of wise.io’s ensemble of decision forest software portfolio. It currently includes a command-line interface and Python-language bindings. WiseRF™ Pine is a size-restricted and feature-limited version that is available through wise.io’s partner, Continuum Analytics. WiseRF™ Sequoia is a multi-computer version that is under development.
About the Company: wise.io, Inc. is a privately held pre-funding machine-learning company based in Berkeley, CA. It was founded in 2012 by computer science, statistics, astrophysics, and physics academics from UC Berkeley, UC Santa Cruz, Copenhagen University, and Carnegie Mellon University. For more information, visit the wise.io website at http://wise.io