Can we scale to NGS?
Westborough, MA (PRWEB) July 29, 2009
GenomeQuest, Inc. today announced its Sequence Data Management (SDM) product. GenomeQuest 6.0Beta introduces a new category of functionality, accessibility, and methods to genomics researchers and organizations seeking to fundamentally improve the performance of their discovery process and broadly prepare for next generation sequencing (NGS).
Specifically, with GenomeQuest 6.0Beta:
- Researchers, from a web browser and personalized dashboard, can perform discoveries, manage and share sequence data and results, and access the world's largest collection of sequence reference databases.
- Bioinformatics managers, using the open platform with RDBMS interoperability, can customize discovery workflows and unify their sequence data environment.
- IT and business managers, with the GenomeQuest Engine, can efficiently scale to broad utilization of NGS across their discovery operations.
GenomeQuest CEO, Ron Ranauro, comments, "These are exciting and defining times for genomic research. Beside us are powerful enabling forces, including vast and expanding reference databases, ever-increasing compute power, and next-generation sequencing. In front of us are major contributions for the world, including in health care, personalized medicine, agriculture, the environment, and energy. Immense research and business opportunities abound for research organizations that create and implement a genomic sequence vision."
Ranauro continues, "One core question for every successful research vision is: 'how will we manage our sequence data?' GenomeQuest sees three major requirements: first, researchers need fast and easy access to their sequence information and discovery tools; second, bioinformatics managers want to upgrade their data management from a point-to-point to an open platform approach; and third, managers are demanding NGS scalability. In each case, GenomeQuest 6.0Beta uniquely meets or exceeds these requirements -- in fact, we feel it creates a new category of product that we call SDM."
The full GenomeQuest functionality for researchers includes:
- A browser-based, data management dashboard
- Sequence assembly and annotation
- Data mining on sequence alignments and text annotations
- Data set joining (mining across multiple experiments)
- Result saving/re-mining (interactive discovery)
- Data/result sharing with colleagues through dashboards
- Data export to other discovery tools, including desktop visualization
- Single-point-of-access to reference databases
- Support for all popular sequencing machines, including 454, Illumina, and SOLiD
- Easy configuration, and
- Built-in workflows for common discovery challenges.
GenomeQuest workflows package all bioinformatics details -- including the reference data, the queries, the algorithms, the interfacing to third-party tools, the compute environment, and the results -- into an easy-to-use upload/compute/discover loop for the researcher. Workflows are available for RNA-Seq, antibody optimization, variant detection, rapid annotation for metagenomics, BLAST search, and patent search. Aggregated reference databases include Transcriptomes, Genomes, Reference Genomes, GenBank, and GenomeQuest Pat, which span genes, genomes, proteins, drugs, and patents.
Examples of workflow use cases include:
- Starting with billions of Illumina reads of a single human, a researcher can map them to the reference human genome, immediately mine for the entire set of genes that have high-quality novel variations, and visualize a select few to determine how the variation changes the protein product of the gene.
- From millions of Illumina reads from a drug-resistant strain of a bacterium, a researcher can map them to the reference bacterial genome and catalog the differences.
- With 80 million SOLiD reads from an RNA-Seq time series from Arabidopsis thaliana, a researcher can view the digital gene expression profiles over the time series.
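As a rough illustration of the last use case, a digital gene expression profile can be thought of as a per-gene read count at each time point, normalized by the number of mapped reads. The sketch below is not GenomeQuest's implementation; the input format, the function name `expression_profiles`, the toy reads, and the reads-per-million normalization are all assumptions made for illustration.

```python
from collections import Counter

# Hypothetical toy input: one (time_point, gene_id) pair per mapped read.
# Gene IDs follow the Arabidopsis locus naming style but are illustrative only.
mapped_reads = [
    ("0h", "AT1G01010"), ("0h", "AT1G01010"), ("0h", "AT1G01020"),
    ("6h", "AT1G01010"), ("6h", "AT1G01020"), ("6h", "AT1G01020"),
    ("6h", "AT1G01020"),
]

def expression_profiles(reads):
    """Return {gene: {time_point: reads per million mapped reads}}."""
    counts = Counter(reads)                # reads per (time_point, gene)
    totals = Counter(t for t, _ in reads)  # mapped reads per time point
    profiles = {}
    for (time_point, gene), n in counts.items():
        rpm = n / totals[time_point] * 1_000_000
        profiles.setdefault(gene, {})[time_point] = rpm
    return profiles

profiles = expression_profiles(mapped_reads)
for gene, series in sorted(profiles.items()):
    print(gene, series)
```

Normalizing by the total per time point keeps samples with different sequencing depth comparable; a real pipeline would also account for gene length and ambiguous mappings.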
Ranauro explains, "Perhaps the biggest question we get from researchers is: 'what is the best discovery strategy for using these tools?' We believe that the most effective strategy is a 'top-down' methodology, where researchers use GenomeQuest for global mining to identify local areas of interest for further investigation by desktop analysis and visualization. This is proving far more effective than 'piecemeal' methods where local areas are arbitrarily chosen for investigation, where the effort too often comes up dry. The idea is to make sure that you're investigating only those haystacks that have needles."
More for Bioinformatics Managers
Bioinformatics managers are often tasked with sharing sequence data between an increasing number of discovery tools in their environment, including mining, visualization, statistics, LIMS, assemblers, and aligners. Increasingly, they are finding it intractable to develop and maintain point-to-point interfaces between these tools and, therefore, seek a platform to broadly share their sequence data.
GenomeQuest provides an open platform for bioinformatics managers to:
- Share and manage sequence data throughout their environment
- Configure and extend GenomeQuest
- Share compute resources across all discoveries
- Establish common management of data and workflows across all tools, and
- Generally unify their sequence data environment.
Mark Boguski, MD, PhD, associate professor at Harvard Medical School, Department of Pathology and the Center for Biomedical Informatics, comments, "I agree with Ron that the life sciences are going through a generational transformation, powered by genomics. And I have no doubt that the researchers and organizations who master the management and mining of sequence data will shape this future and stand as the leaders of tomorrow. I've had the privilege of being at or near the center of genomics research for almost 20 years, and can say with full confidence that those aiming for leadership positions will certainly require the type of SDM capabilities that GenomeQuest 6.0 uniquely provides."
The open platform is based on a comprehensive application programming interface (API), which provides access to all GenomeQuest data, commands, and compute resources. The data types include administrative data, sequence databases, and sequence comparisons. The commands include administration, database upload/download, sequence search/compare, database joins, database reads/writes, and workflows. The API is available through URL calls, web services, and the command line.
The open platform requires and interoperates with popular RDBMSs, including Oracle and MySQL. Using a federated, best-of-breed approach, GenomeQuest 6.0 uses the RDBMS to manage all relational data -- including admin data, metadata, and user annotations -- while GenomeQuest itself manages all sequence data and reference annotations.
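The federated split described above can be sketched in a few lines: relational data lives in an RDBMS, sequence data lives in a separate store, and the two are joined by a shared identifier. This is only a minimal sketch under stated assumptions: `sqlite3` stands in for Oracle or MySQL, a plain dictionary stands in for the sequence store, and the table, column, and function names are hypothetical rather than part of any GenomeQuest schema or API.

```python
import sqlite3

# Relational side: sqlite3 is a stand-in for Oracle or MySQL (assumption).
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE annotations (seq_id TEXT, author TEXT, note TEXT)")
db.execute(
    "INSERT INTO annotations VALUES (?, ?, ?)",
    ("seq2", "alice", "candidate resistance gene"),
)
db.commit()

# Sequence side: a dict is a stand-in for a dedicated sequence store.
sequence_store = {
    "seq1": "ATGGCGTACGTTAGC",
    "seq2": "ATGCCGTTTGACCTA",
}

def annotated_sequences(db, store):
    """Join user annotations (relational) with sequences (sequence store)."""
    rows = db.execute("SELECT seq_id, author, note FROM annotations")
    return [
        {"seq_id": seq_id, "author": author, "note": note,
         "sequence": store[seq_id]}
        for seq_id, author, note in rows
    ]

result = annotated_sequences(db, sequence_store)
print(result)
```

The point of the split is that each side handles the data it is suited to: the RDBMS handles small, structured records, while the bulk sequence data stays out of relational tables.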
More for IT Managers
NGS represents a 10,000-fold increase in sequence data volume, prompting IT and business managers to ask: "can we scale to NGS?" The challenge is more than just volume; it is complicated by the nature of the data. Sequence data is largely unstructured, so mining it efficiently does not lend itself well to traditional database management solutions, and scaling is not a simple matter of buying more machines.
The GQ-Engine of GenomeQuest is purpose-built for large-scale sequence data management. As organizations grow into NGS, its distributed architecture maintains performance and accuracy across all critical dimensions: users, data, and tools. Because the compute environment, the data storage, and the management of data and workflows are all central, shared resources, the entire discovery environment of the research organization can also scale comfortably to NGS.
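The general idea behind a distributed search can be sketched as follows: split the sequence data into shards, search each shard independently, and combine the partial results. This is an illustration of the general pattern only, not a description of how the GQ-Engine works internally; the function names, the motif-counting task, and the use of threads in place of cluster nodes are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def count_hits(shard, motif):
    """Count the sequences in one shard that contain the motif."""
    return sum(1 for seq in shard if motif in seq)

def sharded_search(sequences, motif, n_shards=4):
    """Split the data into shards, search each one, and sum the results."""
    shards = [sequences[i::n_shards] for i in range(n_shards)]
    # Threads stand in for separate compute nodes (assumption); in CPython
    # they illustrate the structure rather than deliver a CPU speed-up.
    with ThreadPoolExecutor(max_workers=n_shards) as pool:
        partial_counts = pool.map(count_hits, shards, [motif] * n_shards)
        return sum(partial_counts)

reads = ["ACGTACGT", "TTGACCTA", "GGGACGTT", "CCCCCCCC", "ACGACGAC"]
total = sharded_search(reads, "ACGT")
print(total)
```

Because each shard is searched independently, adding workers increases throughput only as long as the data can be partitioned and the partial results are cheap to merge, which is one reason scaling is more than a matter of adding machines.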
Genomics is surely not the first industry to face a challenge of massive scalability of unstructured data. The most notable example is the Google File System, which is purpose-built for mining the entire Internet.
GenomeQuest 6.0Beta is commercially available for use by researchers and organizations either online, for fast startup and low cost of ownership, or fully deployed inside the customer firewall, for maximum control and integration.
Researchers can immediately experience GenomeQuest 6.0Beta online at http://www.genomequest.com. There is no obligation, no credit card required, and no software to install. The site includes a self-guided tour, online help/chat, and sample sequence databases. With a few clicks of their browser, researchers can be experiencing SDM, comparing its discovery performance versus using traditional tools and methods, and even uploading and mining their own projects.
Learn more about GenomeQuest, GenomeQuest 6.0Beta, and SDM at http://www.genomequest.com.
GenomeQuest, the leader in sequence data management (SDM), helps genomic researchers and their organizations make great discoveries far faster. Over 160 leading life science companies use GenomeQuest for mission-critical work, including 18 of the top 20 pharmaceutical companies.
Using GenomeQuest, organizations improve the performance of their discovery process and broadly prepare for next generation sequencing (NGS). Researchers perform discoveries, manage and share sequence data, and access the world's largest collection of reference databases from a web browser and personalized dashboard. Bioinformatics managers customize discovery workflows and unify their sequence data environment using the open platform. IT and business managers efficiently scale to broad utilization of next generation sequencing in their discovery operations using the GQ-Engine.