Illinois Bio-Grid Grid Computing The Illinois Bio-Grid Alexander B. Schilling, Ph.D. University of Chicago Proteomics Core Lab

Presentation transcript:


Outline
Bio-Medical Informatics
  – Show how computational demand is growing exponentially
Illinois Bio-Grid
  – Describe this Grid founded at DePaul
IBG Workbench
  – Describe these grid-enabled BioInformatics tools
Mass Spec Toolkit in Cactus
  – Describe plans to implement tools for spectral interpretation in Cactus

BioInformatics and Computability
Growth of data in GenBank is exponential and shows no sign of slowing down yet.
  – Source: GenBank/NCBI
Compute time to process the data is growing equivalently
  – Twice Moore's law
Biologists don't have access to supercomputers for everyday work
Grid computing gives biologists more computing power affordably

Illinois Bio-Grid
A consortium of
  – Educational institutions
  – National labs
  – Private industry
  – City and state entities
  – Museums

Goals
1. Provide an infrastructure of computational (and other) resources to Biological and Medical researchers
2. Provide an infrastructure of computational (and other) resources to Computer Scientists working on BioMedical problems
3. Provide a tool suite of BioMedical software for BioMedical researchers to use on the IBG computational resources
  – Also for open source distribution worldwide
4. Provide an environment for CS researchers to work with BioMedical researchers
5. Try to solve some computationally intense BioMedical Informatics problems
6. Create a workbench of BioMedical software modules in open source distribution to facilitate more rapid BioMedical Informatics research by researchers worldwide

Illinois Bio-Grid Infrastructure

Bio-Grid Workbench
Consists of many applications important to Biological and Medical researchers
All Grid enabled to provide enhanced computational power
  – Genomics
  – Proteomics
  – Phylogenetics
  – Computational Fluid Dynamics / Medical Imaging
  – Cell membrane modeling
  – Data Modeling LSG-RG in GGF Reference Implementation

Genomics and Proteomics 1
Homology Searching
  – Searching for proteins with the same evolutionary "ancestor"
  – Smith-Waterman / BLAST / FASTA
  – Database-against-database searches (instead of single-sequence-against-database searches)
  – Allow groups of input sequences to search for sequences homologous to all in the set
Mass Spec Data Interpretation
  – Ionize peptides and fragment them inside the mass spectrometer
  – Measure the mass-to-charge ratio (m/z) of peptide ions and fragments
  – Interpret the resulting spectra
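The Smith-Waterman step named above can be illustrated with a minimal sketch. The scoring values (match 2, mismatch -1, linear gap -2) and the function name are assumptions chosen for illustration; real protein searches normally use a substitution matrix and affine gap penalties, and nothing here is taken from the IBG implementation.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Local alignment of strings a and b; returns (score, aligned_a, aligned_b).

    Scoring parameters are illustrative assumptions, not values from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]  # H[i][j]: best local score ending at a[i-1], b[j-1]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s, H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the highest-scoring cell until a zero cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append("-"); i -= 1
        else:
            out_a.append("-"); out_b.append(b[j - 1]); j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


def all_against_all(queries, database):
    """Database-against-database search: score every query against every entry."""
    return {(q, d): smith_waterman(queries[q], database[d])[0]
            for q in queries for d in database}
```

Each pair in `all_against_all` is independent of the others, which is what makes a database-against-database search a natural fit for distributing across grid nodes.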

Genomics and Proteomics 2
Mass Spec Based Protein Identification
  – Conduct "in silico" digestion of the protein database
  – Predict the mass-to-charge ratio (m/z) of all possible peptide ions resulting from the database
  – Search the actual ions in the spectra against the predicted ions
  – Return identifications of proteins based on the scoring of matches
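The identification steps above can be sketched in a few lines. This is a simplified illustration under stated assumptions: trypsin is assumed as the enzyme (cleaving after K or R unless the next residue is P), no missed cleavages or modifications are considered, and matching is a plain m/z tolerance check rather than a real scoring function. The function names and the tolerance are hypothetical.

```python
# Monoisotopic residue masses in daltons (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056   # added once per peptide for the terminal H and OH
PROTON = 1.00728   # added once per positive charge


def tryptic_digest(protein):
    """In silico digestion: cut after K or R, except when followed by P."""
    peptides, start = [], 0
    for i, aa in enumerate(protein):
        nxt = protein[i + 1] if i + 1 < len(protein) else ""
        if aa in "KR" and nxt != "P":
            peptides.append(protein[start:i + 1])
            start = i + 1
    if start < len(protein):
        peptides.append(protein[start:])
    return peptides


def peptide_mz(peptide, charge=1):
    """Predicted m/z of a peptide ion carrying `charge` protons."""
    neutral = sum(RESIDUE_MASS[aa] for aa in peptide) + WATER
    return (neutral + charge * PROTON) / charge


def match_precursors(observed_mz, proteins, charge=1, tol=0.01):
    """Return (protein_id, peptide) pairs whose predicted m/z lies within tol of an observed value."""
    hits = []
    for pid, seq in proteins.items():
        for pep in tryptic_digest(seq):
            mz = peptide_mz(pep, charge)
            if any(abs(mz - obs) <= tol for obs in observed_mz):
                hits.append((pid, pep))
    return hits
```

Sequences containing non-standard residue codes (for example X or B) would raise a `KeyError` here; a production search would need to handle them explicitly.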

Genomics and Proteomics 3
Predict 3D protein folding given a sequence of amino acids
Solution to the Schrödinger equation is intractable
Search space of possible folds is immense
Current methods of searching
  – ab-initio
  – AI
  – Lego
  – Monte Carlo
  – Lattice
On Grids can run multiple searches
  – In parallel
  – In series
On Grids can run at higher resolutions

Phylogenetics
Sequence various taxa (individuals or species)
  – Frequently sequence mitochondrial DNA
  – Mitochondrial DNA is much like prokaryote DNA
Compare sequences
  – Form a hypothetical evolutionary tree
  – Each branch is a mutation
  – Shows mutations from a hypothetical ancestor
Search space is immense
  – Runs for 6 months on a single processor
  – Then crashes!
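To give a sense of why the search space is immense, the short sketch below counts candidate tree shapes. It assumes unrooted, strictly bifurcating trees with labeled leaves, for which the number of distinct topologies on n taxa is the double factorial (2n - 5)!!. The slides do not state which tree model the IBG tools use, so this is only an illustration of the growth rate.

```python
def unrooted_tree_count(n_taxa):
    """Number of distinct unrooted bifurcating tree topologies for n labeled taxa.

    Equals (2n - 5)!! = 1 * 3 * 5 * ... * (2n - 5) for n >= 3.
    """
    if n_taxa < 3:
        return 1  # with fewer than three taxa there is only one possible tree
    count = 1
    for k in range(3, 2 * n_taxa - 4, 2):  # odd factors 3, 5, ..., 2n - 5
        count *= k
    return count


if __name__ == "__main__":
    for n in (5, 10, 20, 50):
        print(n, unrooted_tree_count(n))
```

Ten taxa already allow about two million topologies, so exhaustive evaluation is impractical for realistic data sets and heuristic searches run for a long time.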

Computational Fluid Dynamics / Medical Imaging
Monitor and collect real-time CAT scan data
  – Arterial blood flow
Use Grid to interpret data
  – Use Computational Fluid Dynamics to model blood flow
  – Produce real-time imaging
  – Locate aneurysms and other anomalies
  – Aid in diagnosis and decision making for surgical procedures
  – Non-invasive

Cell membrane modeling
Run simulations using both
  – Configurational Bias Monte Carlo Method (CBMC)
  – Molecular Dynamics (MD)
Current simulations involve the properties of cholesterol in lipid membranes
  – Cholesterol is known to be an essential component of mammalian cell membranes
  – Its exact role is not well understood
Previous simulations have been run with
  – Up to 1600 lipid or cholesterol molecules
  – And 52,000 water molecules
We're increasing these simulations by
  – An order of magnitude in the physical dimensions
  – And 2 to 3 orders of magnitude in time

Data Modeling
Data Modeling LSG-RG in GGF Reference Implementation
  – Automatic data synchronization
  – Flagging "dirty" data
  – Flagging data sources (including versioning)

IBG Workbench
[Figure: architecture diagram. Recoverable labels: Grid Fabric (Resources); Grid Services (Middleware); DB Access; Homology Searching; Phylogenetic Trees; Mass Spec; CFD; Proteomics; Membrane Modeling]

The Purpose of Mass Spectrometry in Proteomics
Identify and sequence all proteins involved in an organism's biology.
Use this knowledge to identify proteins (or peptides) that can be used to study and understand different biological states.
Correlate protein expression levels to biological function.
Use protein or peptide biomarkers to identify disease states in patients.
Use the structure of the relevant proteins as targets for developing new therapeutic techniques (drugs, etc.).

Mass Spectrometers in Proteomics
Mass spectrometers measure the masses of proteins and peptides by moving their ions through the instrument in a controlled way.
Proteins can be degraded using enzymes, and the peptides produced can be analyzed by the mass spectrometer.
An MS/MS instrument can cause the peptide ions to fragment into smaller pieces, which can be used to deduce the peptide's sequence.
Once the sequences of the peptides have been determined, the protein's complete sequence can be reassembled from the peptide sequences.
The intensity of peaks can be used to determine the expression level of a protein in a sample.
Samples from healthy and diseased tissue can be compared to locate biomarkers for disease.

The MS/MS Experiment Produces Multidimensional Data
Chromatograms (Time vs Intensity)
Precursor Ion Spectra of Peptides (Mass vs Intensity)
Product Ion Spectra of Peptides (Precursor Mass, Mass vs Intensity)
[Figure: example plots. Recoverable panel labels: MS, TIC]

What the tandem mass spectrum of a peptide looks like
[Figure: backbone of a four-residue peptide (side chains R1 to R4, running from the N terminus to the C-terminal OH) with B-ion and Y-ion cleavage positions marked]
Y-ions are numbered from the C terminus to the N terminus
B-ions are numbered from the N terminus to the C terminus
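The B-ion and Y-ion series described above can be computed directly from a peptide sequence. The sketch below assumes singly charged fragments, monoisotopic masses, and no modifications; the function name is hypothetical and the code is an illustration rather than part of any IBG tool.

```python
# Monoisotopic residue masses in daltons (standard values for the 20 amino acids).
RESIDUE_MASS = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276, "V": 99.06841,
    "T": 101.04768, "C": 103.00919, "L": 113.08406, "I": 113.08406, "N": 114.04293,
    "D": 115.02694, "Q": 128.05858, "K": 128.09496, "E": 129.04259, "M": 131.04049,
    "H": 137.05891, "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.01056
PROTON = 1.00728


def fragment_ions(peptide):
    """Return (b_ions, y_ions) as lists of singly charged m/z values.

    b_ions[i-1] is b_i (first i residues from the N terminus);
    y_ions[i-1] is y_i (last i residues from the C terminus).
    """
    masses = [RESIDUE_MASS[aa] for aa in peptide]
    b_ions, y_ions = [], []

    running = 0.0
    for m in masses[:-1]:            # b-ions grow from the N terminus
        running += m
        b_ions.append(running + PROTON)

    running = 0.0
    for m in reversed(masses[1:]):   # y-ions grow from the C terminus
        running += m
        y_ions.append(running + WATER + PROTON)

    return b_ions, y_ions


if __name__ == "__main__":
    b, y = fragment_ions("GAS")
    print("b:", [round(v, 4) for v in b])
    print("y:", [round(v, 4) for v in y])
```

Differences between consecutive peaks in either series equal a residue mass, which is what lets the sequence be read off the spectrum. Leucine and isoleucine have identical masses, so this approach alone cannot distinguish them.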

Important Issues in Computation for Proteomics
De novo sequencing
  – Many computationally efficient algorithms exist
  – Algorithms often produce incorrect results very quickly!
  – Posttranslational modifications introduce complexity into interpretations
  – Much data must be discarded to fit workstation-based computational capacity
  – A strong desire exists to use intensity data as well as mass data in interpretations
Database search (protein ID)
  – Most packages are commercial; few are open source (BLAST based only)
  – The more posttranslational modifications you allow for, the longer the searches take. The area is ripe for parallelism.
  – Serious problems with false positive identifications
    – Many researchers are actively working on this problem
    – Could be reduced by more front-end interpretation before the search
    – Could combine spectra from multiple MS types before the search instead of correlating ID results after searches
Datamining
  – What do you do with all the identifications? Systems biology!
  – Create models for signal pathways using protein ID and expression data

GridProt: A Cactus-Based Proteomics Tool Kit
Thorns:
  – GridMass: handles basic data extraction, chromatographic peak integration, mass detection
  – GridTAG: partial sequence mass tag extraction
  – GridID: grid-based database search using mass spec data
  – GridDeNS: grid-based de novo sequencing
Visualization: OpenDX
Data storage: mzXML and HDF5

Conclusions
Illinois Bio-Grid
  – Excellent resource for Biological and Medical researchers
IBG Workbench
  – Excellent software architecture for compute-intensive applications
  – Will be a source of BioMedical Informatics software sharing for a plethora of different research areas
  – Will be a source of workbench tools for researchers in other related Informatics software creation
Cactus is an ideal platform for HPC of Mass Spec data
  – Modular thorns allow generalization for MS, specialization for Proteomics
  – Ideal base for open source, extendable software ready for HPC as Proteomics data sets grow

Acknowledgements
University of Chicago
Howard Hughes Medical Institute
Ben May Cancer Center
Pfizer Inc.
Illinois Bio-Grid:
  – Dave Angulo, DePaul University
  – Gregor von Laszewski, ANL
  – Kevin Drew, Tim Freeman