Bioproximity's Proteome Cluster Selected by Amazon Web Services for High Performance Computing Case Study

Bioproximity’s Proteome Cluster platform utilizes a cloud computing architecture to enable rapid and cost-effective analysis of large-scale scientific data sets.

Bioproximity is pleased to announce that its Proteome Cluster platform has been selected by Amazon Web Services (AWS) for a case study in the use of high performance computing. AWS permits Bioproximity to deploy dedicated Proteome Cluster servers for each client. Unlike shared computing grids, each Proteome Cluster server can simultaneously launch multiple dedicated, high-performance Message Passing Interface (MPI) compute clusters on demand. Each cluster is dedicated to the analysis of a single search, after which it shuts down. This provides clients with the resources needed to rapidly and efficiently analyze large-scale proteomic data sets. These data sets typically comprise millions of sequencing events, each of which must be queried against all possible peptide forms with similar mass. The ever-increasing sequencing rate and sensitivity of mass spectrometers indicate that the requisite computational needs will grow in step.
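The mass-matching step described above can be illustrated with a short sketch. It is not Proteome Cluster's implementation: the function name, the parts-per-million tolerance and the peptide masses below are all hypothetical, chosen only to show how a sorted list of candidate masses might be narrowed to those within a tolerance window of one measured mass.

```python
from bisect import bisect_left, bisect_right


def candidates_in_window(sorted_masses, precursor_mass, tolerance_ppm):
    """Return the (lo, hi) slice bounds of masses within +/- tolerance_ppm.

    sorted_masses must be sorted in ascending order.
    """
    delta = precursor_mass * tolerance_ppm / 1e6
    lo = bisect_left(sorted_masses, precursor_mass - delta)
    hi = bisect_right(sorted_masses, precursor_mass + delta)
    return lo, hi


# Hypothetical candidate peptide masses in daltons, sorted ascending.
peptide_masses = [800.40, 1000.50, 1000.505, 1000.52, 1500.75]

# One hypothetical measured mass, searched with an assumed 10 ppm tolerance.
lo, hi = candidates_in_window(peptide_masses, 1000.50, 10.0)
matches = peptide_masses[lo:hi]
```

With millions of sequencing events, a lookup of this kind would be repeated for every event, which is why the workload lends itself to being spread over many nodes.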

Each MPI cluster consists of one master node and a user-specified number of slave nodes using “Cluster Compute” instance types. Each node possesses two quad-core Intel Xeon CPUs. The user launches independent clusters after specifying search parameters and the tandem mass spectrometry data files to be searched. Protein sequence libraries stored on Amazon S3 and user-specific input files stored on Amazon EBS are sent to the cluster. Following search completion, the results are sent to the user’s dedicated Proteome Cluster server and the cluster is shut down. Efficient, on-demand usage of computing resources minimizes costs and allows even very large data sets to be analyzed cost-effectively. Because multiple clusters can be launched simultaneously, it is possible in many cases to search multiple data sets using multiple search algorithms in under an hour. Proteome Cluster currently supports four tandem mass spectrometry search algorithms, with support for additional algorithms planned in the near future.
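How a search is divided among the nodes of a cluster is not described here. As a rough illustration only, the sketch below assumes the simplest scheme: the spectra are split into contiguous chunks of near-equal size, one per non-master node. The function name and the sample numbers are hypothetical; the figure of eight cores per node follows from the two quad-core CPUs mentioned above.

```python
CORES_PER_NODE = 2 * 4  # two quad-core CPUs per node, as stated above


def partition(items, n_nodes):
    """Split items into n_nodes contiguous chunks whose sizes differ by at most one."""
    if n_nodes < 1:
        raise ValueError("n_nodes must be at least 1")
    base, extra = divmod(len(items), n_nodes)
    chunks = []
    start = 0
    for i in range(n_nodes):
        # The first `extra` chunks each take one additional item.
        size = base + (1 if i < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


# Hypothetical example: 10 spectra (represented by their indices) and 4 nodes.
spectra = list(range(10))
chunks = partition(spectra, 4)
total_cores = 4 * CORES_PER_NODE
```

In a real MPI program the master would typically hand such chunks to the other nodes and gather their results, but that part is outside the scope of this sketch.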

Proteome Cluster also helps to enforce good scientific practices. Custom search parameters are saved by the user and may be downloaded and shared with other members of the scientific community. As search parameters are modified, they are versioned so that a search is always linked to the parameter file used. Tandem mass spectrometry (peak list) files are also maintained and associated with all of the searches for which they were used as inputs. These may also be downloaded and shared. The protein sequence libraries are available to the public on Amazon S3. This permits others to faithfully reproduce results obtained using Proteome Cluster.
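One common way to tie a search to the exact parameters it used is to derive a version identifier from the parameter contents. The sketch below shows that general idea and is not a description of how Proteome Cluster versions its files; the parameter names, the file name and the 12-character identifier length are all hypothetical.

```python
import hashlib
import json


def parameter_version(params):
    """Derive a short, stable identifier from a parameter dictionary."""
    # Sorting keys makes the identifier independent of insertion order.
    canonical = json.dumps(params, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()[:12]


# Hypothetical parameter sets.
v1 = parameter_version({"enzyme": "trypsin", "precursor_tolerance_ppm": 10})
v1_reordered = parameter_version({"precursor_tolerance_ppm": 10, "enzyme": "trypsin"})
v2 = parameter_version({"enzyme": "trypsin", "precursor_tolerance_ppm": 20})

# A search record that stores the identifier alongside its input file.
search_record = {"peak_list": "sample01.mgf", "parameter_version": v1}
```

Because any change to a parameter yields a different identifier, a stored search record of this kind always points back to one specific set of parameters.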

Proteome Cluster eliminates the need for mass spectrometrists to buy or maintain computing hardware dedicated to tandem mass spectrometry searches or to install, configure or upgrade software. Because Proteome Cluster is a web-based application, improvements and upgrades are deployed continuously, behind the scenes, without any need for user involvement.

Simple, annual pricing plans are available. Learn more about Proteome Cluster here, or visit us at Experimental Biology 2011 in Washington, DC for a demonstration.

Read the case study here.

About proteomics

Proteomics is the study of proteins. Shotgun proteomics is a method for identifying and quantifying proteins and has been optimized by the scientific research community to permit near-global profiling of almost any sample. Using modern methods and instrumentation, many thousands of proteins may be routinely identified at levels approaching picogram quantities. Shotgun proteomics is currently used in a wide variety of applications, from discovery of biomarkers for disease diagnosis and prognosis to quality control of bioprocessing pipelines in the biotechnology, energy, food and pharmaceutical industries. At its essence, shotgun proteomics consists of using an enzyme to digest proteins into peptides, which are then sequenced by fragmentation in a tandem mass spectrometer. The resulting fragmentation patterns are matched against protein sequence libraries or libraries of experimental spectra. Shotgun proteomics applications typically require a sequenced genome for the organism of interest. As the genomes of more and more organisms are sequenced, the applications for shotgun proteomics are similarly increasing.
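The digestion step can be mimicked in software to predict which peptides a protein should yield. The sketch below assumes trypsin, a widely used enzyme that is not named above, and applies its usual simplified rule: cleave after lysine (K) or arginine (R) unless the next residue is proline (P). The function name and the sequence are hypothetical.

```python
def tryptic_digest(sequence):
    """Split a protein sequence into peptides using the simplified trypsin rule."""
    peptides = []
    start = 0
    for i, residue in enumerate(sequence):
        next_residue = sequence[i + 1] if i + 1 < len(sequence) else ""
        # Cleave after K or R, but not when the following residue is P.
        if residue in "KR" and next_residue != "P":
            peptides.append(sequence[start:i + 1])
            start = i + 1
    # Keep whatever remains after the last cleavage site.
    if start < len(sequence):
        peptides.append(sequence[start:])
    return peptides


# Hypothetical sequence in one-letter amino acid codes.
peptides = tryptic_digest("MKWVTFISLLRPAAKGG")
```

Here the R followed by P is left uncut, so the digest yields three peptides rather than four.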

About Bioproximity

Bioproximity, founded in 2008, is a contract research organization specializing in protein analysis and informatics. Bioproximity offers its clients a broad range of shotgun proteomics services customizable to suit an ever-expanding array of applications. Bioproximity’s clients include academic researchers and companies in the biotechnology, energy, healthcare and pharmaceutical industries.

# # #

Contact Author

Brian Balgley