FOSTER CITY, Calif. (PRWEB) March 12, 2008
Applied Biosystems was able to analyze the human genome sequence for a cost of less than $60,000, which is the commercial price for all reagents needed to complete the project. This is a fraction of the cost of any previously released human genome data, including the approximately $300 million(1) spent on the Human Genome Project. The cost of the Applied Biosystems sequencing project is less than the $100,000 milestone set forth by the industry for the new generation of DNA sequencing technologies, which are beginning to gain wider adoption by the scientific community.
The availability of this sequence data in the public domain is expected to help scientists gain a greater understanding of human genetic variation and potentially help them explain differences in individual susceptibility to disease and response to treatment, which is the goal of personalized medicine. Although most human genetic information is the same in all people, researchers are generally more interested in studying the small percentage of genetic material that varies among individuals. They seek to characterize that variation either as single-base changes or as larger stretches of sequence variation known as structural variants. Structural variants include insertions, deletions, inversions, and translocations of DNA sequences ranging from a few to millions of base pairs. These variants have a higher potential of impacting genes and thus contributing to human disease.
Under the direction of Kevin McKernan, Applied Biosystems' senior director of scientific operations, the scientists resequenced a human DNA sample that was included in the International HapMap Project. The team used the company's SOLiD(TM) System to generate 36 gigabases of sequence data in 7 runs of the system, achieving throughput up to 9 gigabases per run, which is the highest throughput reported by any of the providers of DNA sequencing technology.
The 36 gigabases includes DNA sequence data generated from covering the contents of the human genome more than 12 times, which helped the scientists to determine the precise order of DNA bases and to confidently identify the millions of single-base variations (SNPs) present in a human genome. The team also analyzed the areas of the human genome that contain the structural variation between individuals. These regions of structural variation were revealed by greater than 100-fold physical coverage, which shows positions of larger segments of the genome that may vary relative to the human reference genome.
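The coverage figures above follow from simple arithmetic, which the short Python sketch below illustrates. The genome size used here (about 2.9 billion bases) is an assumption for illustration and is not stated in this release; the 1,500-base insert size in the example is likewise hypothetical. Sequence coverage counts how many times each base is read on average, while physical coverage counts how many paired-end fragments span each position.

```python
# Assumed approximate size of the human reference genome in bases.
# This value is an illustrative assumption, not a figure from the release.
ASSUMED_GENOME_SIZE = 2.9e9


def sequence_coverage(total_bases, genome_size=ASSUMED_GENOME_SIZE):
    """Average number of times each genome position is read."""
    return total_bases / genome_size


def physical_coverage(num_pairs, insert_size, genome_size=ASSUMED_GENOME_SIZE):
    """Average number of paired-end fragments spanning each genome position."""
    return num_pairs * insert_size / genome_size


def pairs_needed(target_coverage, insert_size, genome_size=ASSUMED_GENOME_SIZE):
    """Number of read pairs required to reach a target physical coverage."""
    return target_coverage * genome_size / insert_size


if __name__ == "__main__":
    total_bases = 36e9  # 36 gigabases, as reported in the release
    print(f"sequence coverage: {sequence_coverage(total_bases):.1f}x")
    print(f"average yield over 7 runs: {total_bases / 7 / 1e9:.1f} gigabases per run")
    # Hypothetical insert size, chosen only to show the calculation.
    print(f"pairs for 100x physical coverage at 1,500-base inserts: "
          f"{pairs_needed(100, 1500):,.0f}")
```

Because each fragment spans far more bases than are actually read from its two ends, physical coverage can be much higher than sequence coverage for the same amount of sequencing.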
"We believe this project validates the promise of next-generation sequencing technologies, which is to lower the cost and increase the speed and accuracy of analyzing human genomic information," said McKernan. "With each technological milestone, we are moving closer to realizing the promise of personalized medicine."
McKernan's team used the SOLiD System's ultra-high-throughput capabilities to obtain deep sequence coverage of the genome of an anonymous African male of the Yoruba people of Ibadan, Nigeria, who participated in the International HapMap Project. The scientists were able to perform an in-depth analysis of structural variants by creating multiple paired-end libraries of genomic sequence that included a wide range of insert sizes. Most inserts exceeded 1,000 bases. The SOLiD System has the ability to analyze paired-end libraries with large insert sizes. For the millions of SNPs identified in the project, the SOLiD System's 2-base encoding chemistry discriminated random or systematic errors from true SNPs to reveal these SNPs with greater than 99.94 percent sequencing accuracy.
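The release does not describe how 2-base encoding separates errors from true SNPs, so the following Python sketch is offered only as an illustration of the commonly documented color-space idea: each color reports on two adjacent bases, so a real single-base change alters two neighboring colors, while an isolated color change is more likely a measurement error. The function names and the simplified classification rule are hypothetical and are not the SOLiD System software's algorithm.

```python
# 2-bit labels for the bases; a color is the XOR of two adjacent base labels.
BASE_TO_BITS = {"A": 0, "C": 1, "G": 2, "T": 3}
BITS_TO_BASE = {bits: base for base, bits in BASE_TO_BITS.items()}


def encode(seq):
    """Return the first base plus one color (0-3) per adjacent base pair."""
    colors = [BASE_TO_BITS[a] ^ BASE_TO_BITS[b] for a, b in zip(seq, seq[1:])]
    return seq[0], colors


def decode(first_base, colors):
    """Rebuild the base sequence from the first base and the color calls."""
    bits = BASE_TO_BITS[first_base]
    bases = [first_base]
    for color in colors:
        bits ^= color
        bases.append(BITS_TO_BASE[bits])
    return "".join(bases)


def classify_mismatches(ref_colors, read_colors):
    """Simplified rule: one isolated color change looks like an error;
    two adjacent changes that leave downstream bases intact look like a SNP."""
    diffs = [i for i, (r, q) in enumerate(zip(ref_colors, read_colors)) if r != q]
    if not diffs:
        return "match"
    if len(diffs) == 1:
        return "likely measurement error"
    if len(diffs) == 2 and diffs[1] == diffs[0] + 1:
        i, j = diffs
        # A true SNP preserves the combined effect of the two colors.
        if ref_colors[i] ^ ref_colors[j] == read_colors[i] ^ read_colors[j]:
            return "candidate SNP"
    return "ambiguous"


if __name__ == "__main__":
    _, ref_colors = encode("ATGGCA")
    _, snp_colors = encode("ATGACA")  # single-base change G -> A
    print(ref_colors, snp_colors, classify_mismatches(ref_colors, snp_colors))
    error_colors = list(ref_colors)
    error_colors[3] = 2               # one miscalled color
    print(classify_mismatches(ref_colors, error_colors))
```

This sketch ignores edge cases such as a variant at the last base of a read, where only one color is affected.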
Another important attribute of the SOLiD System is that, unlike other available DNA sequencing platforms, the system is inherently scalable to support higher levels of throughput without requiring changes to the system's hardware. The high throughput, accuracy, and paired-end analysis capability of the SOLiD System are expected to continue to reduce the cost of conducting studies of complex genomes and how variation in these genomes contributes to conditions such as cancer, diabetes and heart disease, among others.
Associating Genetic Variation with Cancer and Other Diseases
As in-depth resequencing efforts continue to reveal previously uncharacterized genetic variation in human genomes, researchers such as John McPherson, Ph.D., at the Ontario Institute for Cancer Research expect to be able to associate these genetic variants with diseases such as cancer. McPherson is cataloging genetic alterations that occur in different types of cancers to better classify tumors and identify the important early events driving the disease. These provide critical targets for refining and developing new targeted treatments and diagnostic tools.
"Paired-end sequencing is an essential component of whole genome analysis," said Dr. McPherson. "The tight fragment size range provided by the SOLiD protocols allows the identification of a wide range of insertion and deletion sizes. Structural rearrangements are readily identified and deep genome coverage easily attained due to the high throughput of this platform."
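As a rough illustration of why a tight fragment size range matters, the Python sketch below classifies mapped read pairs by comparing their separation on the reference genome with the expected fragment span. A pair that maps much farther apart than expected suggests a deletion in the sample, a pair that maps much closer suggests an insertion, and an unexpected orientation suggests an inversion. The data structure, function name, and tolerance values are hypothetical and are not taken from the SOLiD protocols or software.

```python
from dataclasses import dataclass


@dataclass
class MappedPair:
    chrom: str
    left_pos: int                      # leftmost mapped coordinate on the reference
    right_pos: int                     # rightmost mapped coordinate on the reference
    expected_orientation: bool = True  # False if the reads map in an unexpected orientation


def classify_pair(pair, expected_span, tolerance):
    """Compare the mapped span of a pair with the library's expected span."""
    if not pair.expected_orientation:
        return "possible inversion"
    span = pair.right_pos - pair.left_pos
    if span > expected_span + tolerance:
        return "possible deletion"     # sample lacks sequence present in the reference
    if span < expected_span - tolerance:
        return "possible insertion"    # sample carries sequence absent from the reference
    return "concordant"


if __name__ == "__main__":
    pair = MappedPair("chr1", 1000, 2800)  # maps 1,800 bases apart
    # Hypothetical library with 1,500-base fragments.
    print(classify_pair(pair, expected_span=1500, tolerance=150))  # tight size range
    print(classify_pair(pair, expected_span=1500, tolerance=600))  # broad size range
```

With the tighter tolerance, the 300-base discrepancy stands out from normal library variation; with the broader tolerance, the same pair is indistinguishable from a concordant one. In practice, calls would rest on many independent pairs supporting the same event rather than on a single pair.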
Evan Eichler, Ph.D., an associate professor of genome sciences at the University of Washington's School of Medicine and a Howard Hughes Medical Institute Investigator, focuses his research on the role of duplicate regions and structural variation in the human genome. Using computational and experimental approaches, he investigates the architecture of these regions and their role in evolution and disease.
"To understand the extent and prevalence of structural variation in the human genome, which is still largely unknown, my lab has been applying traditional sequencing methods with good results, but much more needs to be discovered at a faster pace," said Dr. Eichler. "The human paired-end data being released is of such depth that discovering smaller structural events at higher resolution becomes possible. The availability of this dataset in the public domain will accelerate our understanding of structural variation in normal and disease states, and open the door to a faster exploration of this type of genetic diversity across human populations."
Developing Software Analysis Tools for Next-Generation Sequencing
Next-generation sequencing platforms have enabled researchers to generate more genetic data than ever before. Applied Biosystems' human resequencing effort represents one of the most comprehensive datasets of genomic data, which is expected to provide researchers with libraries of sequence data that will serve as a model for how to prepare and analyze samples of other complex genomes for future genome analysis projects.
Applied Biosystems expects that the public availability of the human sequence data will help drive innovation and speed the development of new bioinformatics tools. These new tools are expected to enable researchers to interpret the meaning of the data that provide clues to better understand various aspects of health and disease. In addition to the full human dataset, subsets of sequence data are available at NCBI. These datasets can be accessed by independent academic and commercial software developers to further enable the development of analytical tools. Applied Biosystems is making an analysis tool available through the SOLiD System Software Development Community, which is expected to help independent software providers to interpret the subsets of data.
Through its Software Development Community, Applied Biosystems has established relationships with scientists and bioinformatics companies to help scientists address next-generation sequencing bioinformatics challenges and develop tools that are expected to advance data analysis and management. To access the human sequence data released by Applied Biosystems, please visit the SOLiD Software Development Community at: http://info.appliedbiosystems.com/solidsoftwarecommunity. The data have also been deposited at the National Center for Biotechnology Information (NCBI, http://www.ncbi.nlm.nih.gov), which is part of the National Library of Medicine, National Institutes of Health (Bethesda, MD, USA). At NCBI, the human sequence data can be located at ftp://ftp.ncbi.nih.gov/pub/TraceDB/ShortRead/SRA000272 or by the project name, SOLiD Human HapMap Sample NA18507 Whole Genome Sequence, under accession number SRA000272.
Applied Biosystems is a global leader in the development and commercialization of instrument-based systems, consumables, software, and services for the life-science market and is the recognized market leader in the commercialization of DNA sequencing platforms. Perhaps best known for its role in developing the technology that enabled the historic sequencing of the human genome, Applied Biosystems continues its leadership in DNA sequencing by commercializing technology that helps scientists to better understand and treat disease based on genomic information. The company's latest platform for genetic analysis, the SOLiD System, is the life-science industry's highest throughput system for DNA sequencing.
About the SOLiD System
The SOLiD System is an end-to-end next-generation genetic analysis solution comprising the sequencing unit, chemistry, a computing cluster and data storage. The platform is based on sequencing by oligonucleotide ligation and detection. Unlike polymerase sequencing approaches, the SOLiD System utilizes a proprietary technology called stepwise ligation, which generates high-quality data for applications including whole genome sequencing, chromatin immunoprecipitation (ChIP), microbial sequencing, digital karyotyping, medical sequencing, genotyping, gene expression, and small RNA discovery, among others.
Unparalleled throughput and scalability distinguish the SOLiD System from other next-generation sequencing platforms. The system can be scaled to support a higher density of sequence per slide through bead enrichment. Beads are an integral part of the SOLiD System's open-slide format architecture, enabling the system to generate up to 9 gigabases of sequence data per run. The combination of the open-slide format, bead enrichment, and software algorithms provides the infrastructure for allowing it to scale to even higher throughput, without significant changes to the platform's current hardware or software.
About the Applied Biosystems Human Genome Dataset
These facts were developed based on 1 gigabase (Gb) of data equaling 1 billion (1,000,000,000) bases of DNA sequence.
-- If all 36 billion bases were spread out at 1 millimeter apart, they would extend 36,000 kilometers, or more than 4,000 times the height of Mt. Everest, which at 8,848 meters above sea level, is the highest mountain on Earth.
-- If all 36 billion bases were spread along the Great Wall of China at 1 millimeter apart, this would equate to spanning the 5,000 kilometer wall more than 7 times.
-- If a person were to proofread the 36 billion bases in this dataset at one letter per second for 24 hours per day, it would take more than 1,100 years to read the entire dataset.
-- If each base represented one individual in the world population, the dataset would account for more than 5 times the entire world population of 6.8 billion people.
-- This dataset, at 36 billion bases of DNA sequence, is equivalent to 360 times all of the 100 million visible stars in the Earth's galaxy.
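The comparisons in this list can be checked with a few lines of arithmetic. The Python sketch below recomputes each one from the constants stated in the list itself (1 millimeter spacing, 8,848 meters, 5,000 kilometers, 6.8 billion people, 100 million stars); the function name and the 365.25-day year are assumptions made for illustration.

```python
def dataset_analogies(bases=36e9):
    """Recompute the list's comparisons for a dataset of the given size."""
    distance_km = bases / 1e6                 # 1 mm per base; 1e6 mm per km
    seconds_per_year = 365.25 * 24 * 3600     # assumed average year length
    return {
        "distance_km": distance_km,
        "everest_heights": distance_km * 1000 / 8848,
        "great_wall_spans": distance_km / 5000,
        "proofreading_years": bases / seconds_per_year,
        "world_populations": bases / 6.8e9,
        "visible_star_multiples": bases / 100e6,
    }


if __name__ == "__main__":
    for name, value in dataset_analogies().items():
        print(f"{name}: {value:,.1f}")
```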
About Applera Corporation and Applied Biosystems
Applera Corporation consists of two operating groups. Applied Biosystems serves the life science industry and research community by developing and marketing instrument-based systems, consumables, software, and services. Customers use these tools to analyze nucleic acids (DNA and RNA), small molecules, and proteins to make scientific discoveries and develop new pharmaceuticals. Applied Biosystems' products also serve the needs of some markets outside of life science research, which we refer to as "applied markets," such as the fields of: human identity testing (forensic and paternity testing); biosecurity, which refers to products needed in response to the threat of biological terrorism and other malicious, accidental, and natural biological dangers; and quality and safety testing, such as testing required for food and pharmaceutical manufacturing. Applied Biosystems is headquartered in Foster City, CA, and reported sales of approximately $2.1 billion during fiscal 2007.
The Celera Group is a diagnostics business delivering personalized disease management through a combination of products and services incorporating proprietary discoveries. Berkeley HeartLab, a subsidiary of Celera, offers services to predict cardiovascular disease risk and optimize patient management. Celera also commercializes a wide range of molecular diagnostic products through its strategic alliance with Abbott and has licensed other relevant diagnostic technologies developed to provide personalized disease management in cancer and liver diseases.
Information about Applera Corporation, including reports and other information filed by the company with the Securities and Exchange Commission, is available at http://www.applera.com, or by telephoning 800.762.6923. Information about Applied Biosystems is available at http://www.appliedbiosystems.com.
All information in this press release is as of the date of the release, and Applera does not undertake any duty to update this information unless required by law.
About NCBI
As a national resource for molecular biology information, NCBI's mission is to develop new information technologies to aid in the understanding of fundamental molecular and genetic processes that control health and disease. More specifically, the NCBI has been charged with creating automated systems for storing and analyzing knowledge about molecular biology, biochemistry, and genetics; facilitating the use of such databases and software by the research and medical community; coordinating efforts to gather biotechnology information both nationally and internationally; and performing research into advanced methods of computer-based information processing for analyzing the structure and function of biologically important molecules.
Applied Biosystems Forward Looking Statements
Certain statements in this press release are forward-looking. These may be identified by the use of forward-looking words or phrases such as "should," "planned," and "expect," among others. These forward-looking statements are based on Applera Corporation's current expectations. The Private Securities Litigation Reform Act of 1995 provides a "safe harbor" for such forward-looking statements. In order to comply with the terms of the safe harbor, Applera Corporation notes that a variety of factors could cause actual results and experience to differ materially from the anticipated results or other expectations expressed in such forward-looking statements. These factors include but are not limited to: (1) rapidly changing technology and dependence on customer acceptance of the SOLiD System; (2) the risk of unanticipated difficulties associated with the further development of the SOLiD(TM) System; and (3) other factors that might be described from time to time in Applera Corporation's filings with the Securities and Exchange Commission. All information in this press release is as of the date of the release, and Applera does not undertake any duty to update this information, including any forward-looking statements, unless required by law.
For Research Use Only. Not for use in diagnostic procedures.
(C) Copyright 2008 Applied Biosystems. All rights reserved. Applera, Applied Biosystems, and AB (Design) are registered trademarks and SOLiD is a trademark of Applera Corporation or its subsidiaries in the U.S. and/or certain other countries.
(1) Source: National Institutes of Health press release from June 26, 2000.