Blazegraph 2.1.0 Graph Database Now Enables Geospatial Searching and PubChem Data Processing

Version 2.1.0 Speeds Analysis of Billion-Edge Data Sets with Improved Semantic Search Support and Query Performance

Blazegraph, creator of the industry’s first GPU-accelerated high-performance database for large graphs, today announced version 2.1.0, with significant updates that give users faster, easier access to key data sets, such as new support for processing geospatial coordinates and optimizing queries against the National Center for Biotechnology Information’s (NCBI) PubChem database. In addition, Blazegraph 2.1.0 delivers new tools that enable semantic search on even the largest data published in the Linked Open Data structure, which is heavily used in global publishing, cultural and open government projects. To deliver the speed and performance needed to work with these massive data sets, version 2.1.0 includes significant improvements to its bulk load and query performance capabilities.

Blazegraph 2.1.0 users are already running complex SPARQL queries to quickly uncover new insights. Wikidata, the free knowledge base community, has deployed version 2.1.0 to power its query service. Data experts there are using the geospatial capabilities to create visualizations such as a graph of shared state borders in the United States, a map of all earthquakes, and a map of chemical elements and their discovery locations.
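As a rough illustration of what querying such a service involves, the Python sketch below builds a SPARQL protocol request URL and flattens a response in the W3C SPARQL JSON results format. The endpoint URL, the query text and the Wikidata identifiers in it (P31, P47, Q35657) are assumptions made for this example and should be checked before use; the helper names are hypothetical and are not part of Blazegraph or Wikidata.

```python
import json
from urllib.parse import urlencode

# Assumed endpoint for illustration; substitute the SPARQL endpoint you use.
ENDPOINT = "https://query.wikidata.org/sparql"

# Assumed query: pairs of U.S. states that share a border.
# The identifiers (P31, P47, Q35657) are assumptions to verify on Wikidata.
QUERY = """
SELECT ?state ?neighbor WHERE {
  ?state wdt:P31 wd:Q35657 ;
         wdt:P47 ?neighbor .
  ?neighbor wdt:P31 wd:Q35657 .
}
"""


def build_request_url(endpoint, query):
    """Return a SPARQL protocol GET URL carrying the query text.

    Send it with the header 'Accept: application/sparql-results+json'
    to ask for JSON results.
    """
    return endpoint + "?" + urlencode({"query": query})


def parse_bindings(payload):
    """Flatten a SPARQL JSON results document into a list of dicts."""
    doc = json.loads(payload)
    rows = []
    for binding in doc["results"]["bindings"]:
        # Each cell looks like {"type": "uri", "value": "..."}; keep the value.
        rows.append({var: cell["value"] for var, cell in binding.items()})
    return rows
```

The network call itself is left out on purpose, so the parsing logic can be checked offline against a saved response.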

Another Blazegraph user, Seven Bridges, is a biomedical data analysis company selected by the National Cancer Institute to develop the Cancer Genomics Cloud program. This first complete ecosystem gives cancer researchers immediate access to one of the world’s largest genomic data sets, The Cancer Genome Atlas (TCGA), and the computational resources to analyze it.

“We chose Blazegraph to manage the metadata on the Cancer Genomics Cloud because it helps researchers to easily build complex queries based on how they think, not on how the data is stored,” said Igor Bogicevic, CTO at Seven Bridges. “In addition to helping scientists find the data they need, Blazegraph and its new 2.1.0 version is just plain fast. It helps us deliver the scale and performance needed to meet some of the biggest cancer genomics data analysis challenges.”

Pre-configured for Geospatial, PubChem and Linked Data

Businesses and governments are making their data accessible for complex analysis and deep learning. To use this data, researchers in a wide range of fields -- including materials science, precision medicine, genomics and cyber security -- need new tools to achieve insights and innovative results. Blazegraph 2.1.0 makes it easier for businesses and researchers to apply graph databases to these complex, data-intensive use cases by combining standards support with features for building graph applications at very large scale, up to 50 billion edges on a single server.

Geospatial Searching
Blazegraph 2.1.0 provides a new API for storing latitude and longitude coordinates directly within the database, so users can integrate even the largest geospatial searches into their queries. This feature supports a wide range of capabilities, from proximity searches to more complex routing and topology analysis. It has been shown to deliver sub-second graph queries that geolocate mobile devices across billions of edges on the Amazon EC2 platform.
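To make the idea of a proximity search concrete, here is a minimal Python sketch that filters stored latitude/longitude pairs by great-circle distance from a center point. It is a plain linear scan using the haversine formula, not Blazegraph's geospatial API or index; the function names, the sample data layout and the spherical-Earth radius are assumptions for illustration only.

```python
import math

# Mean Earth radius in kilometers (spherical approximation, an assumption).
EARTH_RADIUS_KM = 6371.0


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance between two points given in decimal degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlambda = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlambda / 2) ** 2)
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def within_radius(points, center, radius_km):
    """Return names of points within radius_km of center, nearest first.

    points maps a name to a (lat, lon) tuple; center is a (lat, lon) tuple.
    """
    hits = []
    for name, (lat, lon) in points.items():
        dist = haversine_km(center[0], center[1], lat, lon)
        if dist <= radius_km:
            hits.append((dist, name))
    return [name for _, name in sorted(hits)]
```

A database with a geospatial index avoids this full scan by pruning candidates before computing distances, which is what makes searches over very large graphs practical.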

The geospatial search capabilities resulted from Blazegraph’s partnership with metaphacts GmbH, a leader in delivering knowledge graph applications using semantic web technologies. Peter Haase, CEO of metaphacts GmbH, said, "The 2.1.0 release of Blazegraph Database provides new features that are critical for today's knowledge-driven applications. We have already deployed it in production using our leading metaphactory platform for enabling Enterprise Knowledge Graphs."

PubChem
PubChem is a public repository for information on millions of chemical substances and their biological activities. Consisting of three interlinked databases (substance, compound and bioassay), PubChem is a critical resource for life science and materials science applications.

Blazegraph 2.1.0 includes a pre-configured integration with the PubChem vocabulary, enabling researchers to download the PubChem core data set into an Amazon EC2 instance with minimal set-up and configuration. They can then search billions of chemical structures and combine that data with other information to research interactions, develop new compounds, and build new applications.

Linked Data
More governments and corporations are using the W3C standards known as Linked Data (LD) to overcome the challenges of sharing data and making it transparent and readily searchable. Widely used in applications such as open government, publishing and global heritage projects, this data is messy, disconnected and subject to unexpected structural updates. Data scientists need tools to connect and query data from many different and shifting sources. However, today’s tools cannot scale to handle the vast quantities of information available in LD projects. Blazegraph 2.1.0, with integrated support for emerging data indexing and interchange standards such as JSON-LD and Linked Data Fragments (LDF), is designed to be the platform of choice for exploring and analyzing this data.

Also included in this release is the Blazegraph-based TPF Server, an LDF server that provides a Triple Pattern Fragment (TPF) interface using Blazegraph Database as the backend. Developed in partnership with Olaf Hartig, Ph.D., a researcher at the Hasso Plattner Institute, the TPF server fills the gap in delivering open data at true web scale, and will be critical as LD becomes more widely used.
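For readers unfamiliar with the TPF idea, the sketch below shows its core in Python: a server answers only single triple patterns, returning one page of matching triples plus a total count and a flag for further pages. This is an in-memory toy under stated assumptions, not the Blazegraph-based TPF Server; the Triple type, the fragment function and the dictionary layout are hypothetical. A real TPF server returns RDF with hypermedia controls, and its count may be an estimate.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement; plain strings stand in for IRIs and literals."""
    s: str
    p: str
    o: str


def fragment(triples, s=None, p=None, o=None, page=1, page_size=100):
    """Return one page of the triples matching a single triple pattern.

    A component left as None acts as a variable and matches anything.
    """
    matches = [
        t for t in triples
        if (s is None or t.s == s)
        and (p is None or t.p == p)
        and (o is None or t.o == o)
    ]
    start = (page - 1) * page_size
    return {
        "total": len(matches),                        # metadata: match count
        "page": page,
        "triples": matches[start:start + page_size],  # data: this page only
        "has_next": start + page_size < len(matches), # control: more pages?
    }
```

Because the server only evaluates simple patterns, clients do the joining themselves by requesting several fragments, which keeps per-request server cost low and makes responses easy to cache.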

Version 2.1.0 Delivers Faster Performance on Bigger Data Sets
Since its inception, Blazegraph has been committed to delivering solutions for data scientists and researchers who need to work with billion-edge data sets but have hit a scalability wall with other graph database solutions. Blazegraph 2.1.0 enables them to load and process data sets at least 20 percent faster than previously possible. Other enhancements include “out of the box” compatibility with popular frameworks, such as the text indexing library Apache Lucene 5.5.0. As part of this release, Blazegraph also announced that future open source releases will migrate to GitHub. Releases will remain available on SourceForge.

About Blazegraph
Blazegraph is a provider of highly scalable software for running complex graph and machine learning algorithms. Founded in 2006, the company is the creator of Blazegraph DB, an ultra-high performance graph database supporting up to 50 billion edges on a single machine. Blazegraph GPU and Blazegraph DASL are its disruptive new technologies using GPUs to enable extreme scaling that is thousands of times faster and 40 times more affordable than CPU-based solutions. Fortune 500 companies and government agencies, including DARPA, EMC, Wikimedia Foundation and Yahoo7, rely on Blazegraph for graphs at scale because, in graphs, size matters. For more information, visit

Contact Author

Laurie Gibson
Kickstart Consulting for Blazegraph
+1 818-704-8481