Proteomics: A Challenge for Technology and Information Science CBCB Seminar, November 21, 2005 Tim Griffin Dept. Biochemistry, Molecular Biology and Biophysics.



What is proteomics?
“Proteomics includes not only the identification and quantification of proteins, but also the determination of their localization, modifications, interactions, activities, and, ultimately, their function.” (Stan Fields, Science, 2001)

Genomics vs. Proteomics
Similarities:
- Large datasets; tools needed for annotation and interpretation of results.
Differences:
- Genomics: generally mature technologies and data processing methods; the questions asked usually involve quantitative changes in RNA transcripts (microarrays).
- Proteomics: still evolving. Protein biochemical properties are complex (expression changes, modifications, interactions, activities), so there are many questions to ask and much data to interpret. Methods are changing, and there are different approaches (mass spec, arrays, etc.).

Genomics, Proteomics, and Systems Biology
[Figure: diagram with technologies marked as mature, prototype, or emerging. Labels in extraction order: genomic DNA, mRNA, sequencing, arrays, genomics; protein cataloguing, protein products, functional protein, quantitative profiling, protein phosphorylation, protein dynamics, protein modifications, subcellular location, catalytic activity, descriptive protein interaction maps, 3D structure, proteomics; measure and define properties, system, identify system components, interactions between components, computational biology.]

“Shotgun” identification of proteins in mixtures by LC-MS/MS
Liquid chromatography coupled to tandem mass spectrometry (MS/MS).
[Figure: workflow diagram. Labels: peptides; µLC separation ( um); ionization: MALDI or electrospray; isolation; fragmentation; mass analysis; peptide fragments; m/z; tandem mass spectrum (thousands in a matter of hours).]

Peptide sequence determination from MS/MS spectra
Peptide: H2N-N-S-G-D-I-V-N-L-G-S-I-A-G-R-COOH
b-series (labeled from the N-terminus): b1, b2, b3, b4, b5, b6, b7, b8, b9, b10, b11, b12, b13, b14
y-series (labeled from the C-terminus): y1, y2, y3, y4, y5, y6, y7, y8, y9, y10, y11, y12, y13, y14
Collision-induced dissociation (CID) creates two prominent ion series: the y-series and the b-series.
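The b- and y-series for the peptide on this slide can be sketched numerically. This is a minimal illustration, not part of the original talk: it assumes singly charged ions, unmodified residues, and standard monoisotopic residue masses, and the function name `fragment_ions` is hypothetical. It lists the n-1 fragments produced by cleaving each peptide bond, so the full-length b14 and y14 labels from the slide are not included.

```python
# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565   # H2O
PROTON = 1.007276   # charge carrier for a singly charged ion


def fragment_ions(peptide):
    """Return (b, y) lists of singly charged fragment m/z values.

    b[i-1] is b_i (first i residues); y[i-1] is y_i (last i residues).
    Only the n-1 backbone cleavages are listed (hypothetical helper).
    """
    masses = [MONO[aa] for aa in peptide]
    n = len(masses)
    b = [sum(masses[:i]) + PROTON for i in range(1, n)]
    y = [sum(masses[n - i:]) + WATER + PROTON for i in range(1, n)]
    return b, y


if __name__ == "__main__":
    b_ions, y_ions = fragment_ions("NSGDIVNLGSIAGR")
    for i, (b, y) in enumerate(zip(b_ions, y_ions), start=1):
        print(f"b{i:<2} {b:10.4f}   y{i:<2} {y:10.4f}")
```

Consecutive peaks within one series differ by the mass of a single residue, which is what allows the sequence to be read from the spectrum. Leucine and isoleucine have identical masses, so this calculation cannot distinguish them.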

[Figure: annotated MS/MS spectrum of H2N-NSGDIVNLGSIAGR-COOH, relative abundance vs. m/z, with peaks labeled by C-terminal fragment: R, GR, AGR, IAGR, SIAGR, GSIAGR, LGSIAGR, NLGSIAGR, VNLGSIAGR, IVNLGSIAGR, DIVNLGSIAGR, GDIVNLGSIAGR.]
The peptide sequence identifies the protein YMR134W, a yeast protein involved in iron metabolism.

High-throughput protein identification by LC-MS/MS and automated sequence database searching
Direct identification of proteins from complex mixtures.
[Figure: workflow diagram. Labels: raw MS/MS spectrum; protein sequence and/or DNA sequence database search; peptide sequence match; protein identification.]
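As a rough illustration of the database-searching step, the sketch below digests protein sequences in silico with trypsin-like rules and reports peptides whose mass matches an observed precursor mass. It is a deliberate simplification, not how any particular search engine works: real engines also score the fragment spectrum against each candidate. The two database entries are invented toy sequences, and the function names and the 0.01 Da tolerance are assumptions.

```python
import re

# Monoisotopic residue masses (Da) for the 20 standard amino acids.
MONO = {
    "G": 57.02146, "A": 71.03711, "S": 87.03203, "P": 97.05276,
    "V": 99.06841, "T": 101.04768, "C": 103.00919, "L": 113.08406,
    "I": 113.08406, "N": 114.04293, "D": 115.02694, "Q": 128.05858,
    "K": 128.09496, "E": 129.04259, "M": 131.04049, "H": 137.05891,
    "F": 147.06841, "R": 156.10111, "Y": 163.06333, "W": 186.07931,
}
WATER = 18.010565


def tryptic_peptides(protein):
    """Cleave after K or R unless the next residue is P (no missed cleavages)."""
    return [p for p in re.split(r"(?<=[KR])(?!P)", protein) if p]


def peptide_mass(peptide):
    """Neutral monoisotopic mass of an unmodified peptide."""
    return sum(MONO[aa] for aa in peptide) + WATER


def search(precursor_mass, database, tol=0.01):
    """Return (protein_name, peptide) pairs within tol Da of the precursor."""
    hits = []
    for name, sequence in database.items():
        for pep in tryptic_peptides(sequence):
            if abs(peptide_mass(pep) - precursor_mass) <= tol:
                hits.append((name, pep))
    return hits


# Hypothetical toy database; these are not real protein sequences.
TOY_DB = {
    "toy_protein_A": "MKNSGDIVNLGSIAGRLLK",
    "toy_protein_B": "MSTAYKPEPTIDER",
}

if __name__ == "__main__":
    observed = peptide_mass("NSGDIVNLGSIAGR")  # stand-in for a measured mass
    print(search(observed, TOY_DB))
```

Matching on precursor mass alone usually leaves many candidates in a realistic database, which is why the fragment ions in the MS/MS spectrum are needed to pick the correct peptide.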

Dealing with the data
1. Data acquisition: experimental information, metadata capture
2. Peak analysis: sequence database searching, quantitative analysis
3. Knowledge annotation and interpretation: database mining; assignment of function, pathway, localization, etc.; output for database archiving and publication
Integrated workflow?

1. Data acquisition: capturing experimental information
Proteomics Experimental Data Repository (PEDRo): proposed schema.
Similar to genomic needs, but the experimental information is somewhat different.

2. Peak Analysis: protein identification
Computational algorithms for searching MS/MS spectra against protein sequence databases, mRNA sequences, and DNA sequences:
- ProFound
- Mascot
- PepSea
- MS-Fit
- MOWSE
- Peptident
- Multident
- Sequest
- PepFrag
- MS-Tag
These searches need CPU horsepower (parallel computing).

2. Peak Analysis: data formats
[Figure: diagram. Labels: Format 1, Format 2, Format 3; Output 1, Output 2, Output 3; ??]
- Lack of flexibility
- Slow to evolve
- Lack of incorporation of competing products and methods

2. Peak Analysis: need general, flexible, in-house solutions
[Figure: diagram. Labels: Format 1, Format 2, Format 3.]
- General tools for analysis of multiple data formats
- Reverse engineering of data formats

2. Peak Analysis: reverse engineering data formats

2. Peak Analysis: quality control of protein matches
Unfiltered: matches (lots of noise and junk)
Filtered: thousands of “true” matches
Filtering is done by statistical analysis of database results (tools are available).
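One way to picture the filtering step is a score cutoff chosen from a target-decoy search. The slide does not say which statistical method was used, so this sketch swaps in target-decoy false discovery rate estimation as an assumption. The `Match` record, the `filter_by_fdr` name, and the 1% default are all hypothetical.

```python
from typing import NamedTuple


class Match(NamedTuple):
    peptide: str
    score: float      # higher is better (hypothetical search score)
    is_decoy: bool    # True if the hit came from a reversed/shuffled sequence


def filter_by_fdr(matches, max_fdr=0.01):
    """Keep target matches above the lowest score cutoff that satisfies max_fdr.

    The FDR at each rank is estimated as (decoy hits) / (target hits)
    among all matches scoring at least as well.
    """
    ranked = sorted(matches, key=lambda m: m.score, reverse=True)
    decoys = targets = 0
    keep = 0  # number of top-ranked matches to retain
    for rank, match in enumerate(ranked, start=1):
        if match.is_decoy:
            decoys += 1
        else:
            targets += 1
        if targets and decoys / targets <= max_fdr:
            keep = rank
    return [m for m in ranked[:keep] if not m.is_decoy]


if __name__ == "__main__":
    demo = [
        Match("PEPTIDEA", 9.0, False),
        Match("PEPTIDEB", 8.0, False),
        Match("DECOYONE", 7.0, True),
        Match("PEPTIDEC", 6.0, False),
        Match("DECOYTWO", 5.0, True),
    ]
    for m in filter_by_fdr(demo, max_fdr=0.4):
        print(m.peptide, m.score)
```

The decoy hits stand in for the "noise and junk" among the unfiltered matches: the rate at which they appear above a cutoff estimates how many of the remaining target matches are likely to be wrong.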

2. Peak Analysis: Quantitative analysis
Flexibility is key: tools are needed to handle different quantitative methods.
- External chemical labeling
- Metabolic labeling (SILAC)
- Enzymatic incorporation (16O/18O)

2. Peak Analysis: Quantitative analysis
[Figure: paired peaks from Sample 1 and Sample 2.]
Relative intensity = relative protein abundance
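The relation "relative intensity = relative protein abundance" can be sketched as a ratio calculation over the labeled peptide pairs of one protein. Summarizing the peptide ratios with a median is an assumption made here for robustness to outliers, not something stated on the slide, and `protein_ratio` is a hypothetical name.

```python
from statistics import median


def protein_ratio(peptide_pairs):
    """Estimate the Sample 2 / Sample 1 abundance ratio for one protein.

    peptide_pairs: iterable of (intensity_sample1, intensity_sample2),
    one pair per peptide assigned to the protein. Pairs with a
    non-positive intensity are skipped. Returns None if nothing is usable.
    """
    ratios = [s2 / s1 for s1, s2 in peptide_pairs if s1 > 0 and s2 > 0]
    if not ratios:
        return None
    return median(ratios)


if __name__ == "__main__":
    # Made-up intensities for three peptides of one protein.
    pairs = [(100.0, 200.0), (50.0, 100.0), (10.0, 40.0)]
    print(protein_ratio(pairs))
```

The same function applies whichever labeling method produced the pairs, which is the kind of flexibility the quantitative tools need.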

Evolving methodologies: iTRAQ
[Figure: iTRAQ workflow. Labels: iTRAQ label; Sample; digest to peptides; multidimensional separation; MS/MS spectrum (intensity vs. m/z).]
- Diagnostic ions are used for quantitative analysis.
- Peptide fragments are used for sequence identification.
- way multiplexing: simultaneous comparison of multiple states, replicates.
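A minimal sketch of how diagnostic ions could be read out of an MS/MS peak list follows. The reporter m/z values (approximately 114.1, 115.1, 116.1, and 117.1, as used by the four-channel iTRAQ reagents) and the 0.05 tolerance are assumptions brought in from general knowledge of the reagent, not taken from the slide. The function names are hypothetical.

```python
# Approximate reporter ion m/z values for four-channel iTRAQ (assumed here).
REPORTERS = {"114": 114.1, "115": 115.1, "116": 116.1, "117": 117.1}


def reporter_intensities(peaks, tol=0.05):
    """Sum peak intensity within tol of each reporter m/z.

    peaks: iterable of (mz, intensity) tuples from one MS/MS spectrum.
    """
    peaks = list(peaks)
    return {
        channel: sum(inten for mz, inten in peaks if abs(mz - target) <= tol)
        for channel, target in REPORTERS.items()
    }


def relative_abundance(intensities, reference="114"):
    """Express every channel relative to the chosen reference channel."""
    ref = intensities[reference]
    if ref <= 0:
        raise ValueError("reference channel has no signal")
    return {channel: value / ref for channel, value in intensities.items()}


if __name__ == "__main__":
    # Made-up peak list: four reporter peaks plus one sequence fragment.
    spectrum = [(114.11, 1000.0), (115.11, 2000.0), (116.11, 500.0),
                (117.12, 1000.0), (175.12, 300.0)]
    print(relative_abundance(reporter_intensities(spectrum)))
```

The remaining peaks in the same spectrum are the sequence fragments, so a single MS/MS scan supports both identification and quantification.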

Need for “changeable” tools
[Figure: “old” and “new” intensity plots.]
Automated analysis tools?

3. Knowledge annotation: making sense of lists of data

3. Knowledge annotation: mining proteomic/genomic databases

3. Knowledge annotation: needs
- Annotation: accession numbers and protein names
- Functional assignments (functional degeneracy?)
- Pathway assignments
- Subcellular localization
- Disease implications
- Comparison of different proteomic datasets (e.g. expression profiles compared to modification state profiles and other protein properties)
- Automated and streamlined??
- Publication and deposit in databases
- Visualization of complex phenomena, interpretation of biological relevance
- Modeling and integration with genomics data: computational and systems biology