SHOTGUN STRUCTURAL PROTEOMICS RAM SAMUDRALA ASSOCIATE PROFESSOR UNIVERSITY OF WASHINGTON Given a heterogeneous mixture of proteins, how can we determine.

Presentation transcript:

SHOTGUN STRUCTURAL PROTEOMICS RAM SAMUDRALA ASSOCIATE PROFESSOR UNIVERSITY OF WASHINGTON Given a heterogeneous mixture of proteins, how can we determine all their structures in a high-throughput and high-resolution manner?

METHODS FOR OBTAINING STRUCTURE [Figure: accuracy scale in Cα RMSD (axis ticks 0, 2, 4, 6) comparing Experiment (X-ray, NMR), Computation (de novo), Computation (template-based), and Hybrid (iterative Bayesian interpretation of noisy NMR data with structure simulations). The hybrid approach is shown for one distance constraint for every six residues and for one distance constraint for every ten residues.]

WHY ARE CURRENT METHODS NOT ADEQUATE? The major bottleneck for both X-ray diffraction and NMR studies is producing sufficient quantities of the protein in a pure form to perform the experiments. Deviations from ideal behaviour in a protein sample make structure determination slow and labour-intensive, if it is possible at all. These major structure determination techniques were developed at a time when our worldview of proteins was simple and did not account for environment-dependent structure formation, protein dynamics and conformational changes, and post-translational modifications. The vast majority of proteins will therefore be inaccessible to X-ray diffraction and NMR studies. Computational approaches do not have the resolution of experimental approaches and lack consistency. Proposal: develop new methods based on crosslinking, mass spectroscopy, and isotope labelling for high-throughput structure determination.

DISTANCE INFORMATION USING KNOWN STRUCTURES Residue-specific all-atom probability discriminatory function (RAPDF): a score s(d_ab) for contacts between atom types a and b in distance bin d. [Figure: atom-atom contacts from known structures are tallied into a 167 x 167 table of atom-type pairs (AO, AN, AC, ..., YOH) for each distance bin; the N x N atom-atom contacts of a candidate structure are scored against this table.]
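The slide does not spell out how s(d_ab) is computed. A minimal sketch, assuming the usual conditional-probability (log-odds) form in which the frequency of a distance bin for one atom-type pair is compared with its frequency over all pairs, could look like the following. The function names, the add-one pseudocounts, and the toy contact list are illustrative assumptions, not the group's implementation.

```python
import math
from collections import defaultdict


def build_score_table(contacts, n_bins):
    """Build s(d_ab) from (atom_type_a, atom_type_b, bin_index) contacts
    observed in known structures.

    Assumed form: s(d_ab) = -ln( P(d | ab) / P(d) ), with add-one
    pseudocounts (an assumption) so that empty bins stay finite.
    """
    pair_counts = defaultdict(lambda: [0] * n_bins)  # N(d_ab)
    bin_totals = [0] * n_bins                        # sum of N(d_ab) over all pairs
    for a, b, d in contacts:
        key = tuple(sorted((a, b)))                  # a-b and b-a are the same pair
        pair_counts[key][d] += 1
        bin_totals[d] += 1

    grand_total = sum(bin_totals)
    table = {}
    for key, counts in pair_counts.items():
        pair_total = sum(counts)
        scores = []
        for d in range(n_bins):
            p_given_pair = (counts[d] + 1) / (pair_total + n_bins)
            p_reference = (bin_totals[d] + 1) / (grand_total + n_bins)
            scores.append(-math.log(p_given_pair / p_reference))
        table[key] = scores
    return table


def score_structure(table, contacts):
    """Sum s(d_ab) over the contacts of a candidate structure.
    Pairs never seen in the known structures are skipped (an assumption)."""
    total = 0.0
    for a, b, d in contacts:
        key = tuple(sorted((a, b)))
        if key in table:
            total += table[key][d]
    return total


# Toy data: AO-AN contacts favour bin 0, AC-YOH contacts sit in bin 1.
known = [("AO", "AN", 0)] * 8 + [("AO", "AN", 1)] * 2 + [("AC", "YOH", 1)] * 10
table = build_score_table(known, n_bins=2)
```

With this sign convention a lower total score means the candidate's contacts look more like those seen in known structures.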

DISTANCE INFORMATION USING NMR Nuclei of proteins emit RF radiation, measured in the form of chemical shifts. The primary source of distance information between protons is the NOE. Steps: experiment (laborious), chemical shift assignment (automated), peak assignment (nontrivial), and structure determination (partially automated). [Figure: a peak with coordinates on the H, HN, and N axes is matched to the protons with consistent chemical shifts (43 VAL HG1, 8 ILE HN, 59 LEU HB3). Bayesian estimation of contact probabilities then gives a prior, a posterior, and a distance in Å for each candidate pair, 43 VAL HG1 - 8 ILE HN and 59 LEU HB3 - 8 ILE HN; the numerical values are not preserved in this transcript.]

STRUCTURES USING COMPUTATION AND EXPERIMENT The Bayesian approach calculates the probability distribution of each NOE peak contributing to proton-proton distances in a protein. The approach is assignment free, fast, fully automated, tolerant of noise, incompleteness and ambiguity, and enables iterative reinterpretation of source experimental data based on simulated structures (90% complete). PROTINFO NMR structure for 1aye: 1.8 Å Cα RMSD for 70 residues. PROTINFO NMR structure for mjnop: 3.5 Å Cα RMSD for 50 residues (required manual interpretation for several months).
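The reinterpretation step described above can be illustrated with a plain Bayes update over the proton pairs that could explain one NOE peak. This is only a sketch under stated assumptions: the function name is hypothetical, the priors and likelihoods are invented numbers, and the likelihood is assumed to come from how often the pair is close in the simulated structures.

```python
def posterior_assignment(priors, likelihoods):
    """Bayes update over candidate proton pairs for a single NOE peak.

    priors:      {pair: prior probability that the pair explains the peak}
    likelihoods: {pair: support for the pair from simulated structures,
                  e.g. the fraction of models in which the two protons are close}
    Returns the normalised posterior; if no candidate has any support,
    the priors are returned unchanged (an assumption).
    """
    unnormalised = {pair: priors[pair] * likelihoods[pair] for pair in priors}
    evidence = sum(unnormalised.values())
    if evidence == 0:
        return dict(priors)
    return {pair: weight / evidence for pair, weight in unnormalised.items()}


# Candidate pairs taken from the NMR slide; all numbers are hypothetical.
priors = {
    "43 VAL HG1 - 8 ILE HN": 0.5,
    "59 LEU HB3 - 8 ILE HN": 0.5,
}
likelihoods = {
    "43 VAL HG1 - 8 ILE HN": 0.8,
    "59 LEU HB3 - 8 ILE HN": 0.2,
}
posterior = posterior_assignment(priors, likelihoods)
```

Feeding the posterior back in as the prior for the next round of simulations gives the iterative loop the slide describes, without repeating the experiment.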

DISTANCE INFORMATION USING MASS SPECTROSCOPY [Workflow diagram: add crosslinkers; MS; identify proteins with single crosslinks and fragment; MS; identify crosslinked fragments (example fragments MKRS, LVKQ, VSKNT, KEVN); MS; confirm sequence. Repeat using different crosslinkers and isotope labelling.]

WHAT HAS BEEN DONE Proof-of-concept studies have been done by several groups. A very good example: Young MM, Tang N, Hempel JC, Oshiro CM, Taylor EW, Kuntz ID, Gibson BW, and Dollinger G. High throughput protein fold identification by using experimental constraints derived from intramolecular cross-links and mass spectrometry. PNAS 97, 2000. Eighteen intramolecular lysine-lysine crosslinks were identified for FGF-2 using crosslinking, proteolytic digestion, and MS, and were used for fold identification. The authors claim the method can be automated to produce structures in two days.

WHAT HAS BEEN DONE Young MM, Tang N, Hempel JC, Oshiro CM, Taylor EW, Kuntz ID, Gibson BW, and Dollinger G. PNAS 97, 2000.

CROSSLINKING POSSIBILITIES Seven chemical groups can be crosslinked, from the following residues: cysteine, lysine, arginine, aspartate, glutamate, and the two termini. Numerous distances are possible for the 49 (7 x 7) pairs of groups. For every 100 residues, there may be up to ten members of each group, but typically only one crosslink is possible at a particular distance out of the ~100 possible pairs. A database of nonredundant protein structures reveals an average of 265 nonlocal crosslinks per protein and 1.5 per residue (an estimate assuming a line of sight of up to 20 Å between the groups to be crosslinked).
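The per-protein estimate above comes from scanning known structures for pairs of crosslinkable groups within reach of each other. A rough sketch of that count is shown below. It is an approximation under stated assumptions: plain Euclidean distance stands in for the line-of-sight test (occlusion is ignored), the sequence-separation cutoff used to define "nonlocal" is a guess, and the `NTERM`/`CTERM` labels and input format are hypothetical.

```python
import math
from itertools import combinations

# Residue types named on the slide, plus hypothetical labels for the two termini.
CROSSLINKABLE = {"CYS", "LYS", "ARG", "ASP", "GLU", "NTERM", "CTERM"}


def candidate_crosslinks(groups, max_dist=20.0, min_seq_sep=4):
    """List nonlocal pairs of crosslinkable groups within max_dist Å.

    groups: list of (residue_index, group_name, (x, y, z)) tuples.
    min_seq_sep is an assumed definition of "nonlocal"; the slide gives none.
    Euclidean distance is used, so occluded pairs are NOT filtered out.
    """
    pairs = []
    for (i, name_i, xyz_i), (j, name_j, xyz_j) in combinations(groups, 2):
        if name_i not in CROSSLINKABLE or name_j not in CROSSLINKABLE:
            continue
        if abs(i - j) < min_seq_sep:
            continue
        if math.dist(xyz_i, xyz_j) <= max_dist:
            pairs.append((i, j))
    return pairs


# Toy coordinates along a line, in Å (hypothetical).
groups = [
    (1, "LYS", (0.0, 0.0, 0.0)),
    (2, "LYS", (1.0, 0.0, 0.0)),
    (10, "GLU", (15.0, 0.0, 0.0)),
    (30, "CYS", (40.0, 0.0, 0.0)),
    (12, "ALA", (5.0, 0.0, 0.0)),
]
pairs = candidate_crosslinks(groups)
```

Dividing `len(pairs)` by the number of residues gives a per-residue figure comparable in spirit to the 1.5 quoted above, although the toy input here is far too small to reproduce it.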

HOW AND WHY WILL THIS WORK? Perform experiments to obtain a number of distance constraints for several proteins simultaneously. Perform simulations based on high-confidence constraints and use distance distributions from the resulting structures to iteratively reinterpret the spectra (without repeating the experiment) until we obtain a high-resolution structure. Computational aspects are largely complete. Components of the approach have been implemented by others in a limited way but are assembled here in a robust and unique manner. The method can handle: Impure protein preparations (e.g., structural genomics failures). Environment-dependent structures (e.g., chaperones + effectors). Partially disordered proteins. Several proteins simultaneously (large scale). There is no need for proteolytic digestion, which complicates things. The focus is on structures from noisy data, unlike X-ray diffraction and NMR.

OUR PROOF OF CONCEPT (IN PROGRESS) We have identified a novel herpesvirus protease inhibitor using docking with dynamics. We have experimentally verified that this inhibitor works comparably to or better than existing antiherpes drugs against all representative members in cell culture. We have not verified whether the inhibitor binds to the active site of the protease as predicted. We are synthesising, cloning, expressing, and purifying the protein (for Ki measurements). We will confirm the presence or absence of bound inhibitor by crosslinking.

WHAT NEEDS TO BE DONE Crosslinkers need to be constructed for several distances for all possible crosslinkable groups to get the maximum number of constraints possible. Carry out computational studies using simulated data (with noise) and develop software to prioritise experiments (e.g., crosslinker choices). Initial studies start with fairly pure mixtures >> not-so-pure mixtures >> 2-3 proteins >> a handful of proteins >> difficult proteins >> heterogeneous mixtures >> whole proteomes. A Bayesian framework is used to estimate accuracy/error: Avoid repeating the past oversight with NMR. Obtain an R-factor-like estimate, as in X-ray diffraction. Compare spectra generated from models to actual spectra. Iterative reinterpretation of experimental data.
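One way to picture the R-factor-like estimate is to borrow the crystallographic form, the summed absolute difference between observed and calculated values divided by the summed observed values, and apply it to binned spectra. The sketch below assumes spectra are stored as dictionaries from a mass/charge bin to an intensity. That representation and the function name are illustrative choices, not something the slides specify.

```python
def r_like_factor(observed, simulated):
    """R-factor-like disagreement between an observed and a simulated spectrum.

    observed, simulated: {mass_charge_bin: intensity}
    Assumed form, by analogy with the crystallographic R-factor:
        R = sum |I_obs - I_sim| / sum |I_obs|
    Bins missing from one spectrum count as zero intensity there.
    """
    total_observed = sum(abs(v) for v in observed.values())
    if total_observed == 0:
        raise ValueError("observed spectrum has no intensity")
    bins = set(observed) | set(simulated)
    difference = sum(
        abs(observed.get(b, 0.0) - simulated.get(b, 0.0)) for b in bins
    )
    return difference / total_observed


# Hypothetical binned spectra.
observed = {100: 10.0, 200: 10.0}
perfect_model = {100: 10.0, 200: 10.0}
partial_model = {100: 10.0}
```

A value of 0 means the model reproduces the observed spectrum exactly; larger values mean worse agreement, so the number can be tracked across rounds of iterative reinterpretation.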

OUTCOME AND EXPECTATIONS Structural genomics projects aim to obtain a representative structure of every protein family using X-ray diffraction and NMR methods and employ computational methods to fill in the gaps (enabling coverage of the entire proteome). However, several families of proteins will not be accessible by these structure determination methodologies due to the need for large amounts of pure protein. Computational methods alone are far from capable of consistently producing high-resolution structures. Even in successful cases, the dynamic effect of the environment on protein structure is not accounted for by current experimental and computational approaches. Our hybrid approach, which complements existing structural genomics efforts, will be used to rapidly obtain structures for entire proteomes in biologically relevant environments.

ACKNOWLEDGEMENTS Current group members: Baishali Chanda, Brady Bernard, Chuck Mader, David Nickle, Ersin Emre Oren, Ekachai Jenwitheesuk, Gong Cheng, Imran Rashid, Jeremy Horst, Ling-Hong Hung, Michal Guerquin, Rob Brasier, Rosalia Tungaraza, Shing-Chung Ngan, Siriphan Manocheewa, Somsak Phattarasukol, Stewart Moughon, Tianyun Liu, Vania Wang, Weerayuth Kittichotirat, Zach Frazier; Kristina Montgomery, Program Manager. Past group members: Aaron Chang, Duncan Milburn, Jason McDermott, Kai Wang, Marissa LaMadrid. Collaborators: James Staley, Mehmet Sarikaya/Candan Tamerler, Michael Lagunoff, Roger Bumgarner, Wesley Van Voorhis. Funding agencies: National Institutes of Health, National Science Foundation, Searle Scholars Program, Puget Sound Partners in Global Health, UW Advanced Technology Initiative, Washington Research Foundation, UW TGIF.

DISTANCE INFORMATION USING MASS SPECTROSCOPY Add labelled and unlabelled crosslinkers to a heterogeneous mixture of proteins. MS [spectrum: relative abundance vs. mass/charge]. For each peak representing a protein with a single crosslinker: enrich (LC, biotin), fragment, MS [spectrum: relative abundance vs. mass/charge]. Identify peaks consistent with crosslinked fragments and obtain distance constraints. Repeat with different fragmentation resolution, crosslinker types, and isotope labelling.

INTERPRETING MASS SPECTRA Example sequence: …AKRS…LKYVT…SKL…ARKT… (4 x 3 = 12 possibilities, one true contact). Ambiguous peaks in spectra are disambiguated (either eliminated or prioritised) using different fragmentation resolution, database preferences, and iterative reinterpretation after structure simulations. Spurious peaks in spectra are eliminated using isotope labelling (look for precise shifts). [Spectra: relative abundance vs. mass/charge, with peaks labelled AKR-LK, ARK-KL, AKRS-LKY, and AKR-SK?.]
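The isotope-labelling filter mentioned above amounts to keeping only peaks that appear as a light/heavy pair separated by the expected mass shift. A minimal sketch of that doublet search follows. The function name, the tolerance, and the 4.025 Da shift used in the toy data are hypothetical values chosen for illustration.

```python
def find_isotope_doublets(peaks, mass_shift, charge=1, tol=0.01):
    """Return (light, heavy) mass/charge pairs separated by the expected shift.

    peaks:      iterable of mass/charge values from one spectrum
    mass_shift: mass difference between labelled and unlabelled crosslinker (Da)
    charge:     assumed charge state; the m/z spacing is mass_shift / charge
    tol:        allowed deviation in m/z (an illustrative default)
    Peaks with no partner at the expected spacing are treated as spurious.
    """
    expected = mass_shift / charge
    ordered = sorted(peaks)
    doublets = []
    for index, light in enumerate(ordered):
        for heavy in ordered[index + 1:]:
            delta = heavy - light
            if delta > expected + tol:
                break  # later peaks are even further away
            if abs(delta - expected) <= tol:
                doublets.append((light, heavy))
    return doublets


# Hypothetical peak list: two doublets and one unpaired (spurious) peak.
peaks = [500.00, 504.02, 612.30, 700.10, 704.125]
doublets = find_isotope_doublets(peaks, mass_shift=4.025)
```

In this toy list the peak at 612.30 has no partner and would be discarded, while the two doublets would go forward to the crosslink assignment step.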