SHOTGUN STRUCTURAL PROTEOMICS RAM SAMUDRALA ASSOCIATE PROFESSOR UNIVERSITY OF WASHINGTON Given a heterogeneous mixture of proteins, how can we determine.

Presentation transcript:

SHOTGUN STRUCTURAL PROTEOMICS RAM SAMUDRALA ASSOCIATE PROFESSOR UNIVERSITY OF WASHINGTON Given a heterogeneous mixture of proteins, how can we determine all their structures in a high-throughput and high-resolution manner?

MOTIVATION FOR DETERMINING PROTEIN STRUCTURE The functions necessary for life are undertaken by proteins, and protein function is mediated by protein three-dimensional structure. Knowing protein structure at high resolution will enable us to determine and understand molecular function; understand substrate and ligand binding; devise intelligent mutagenesis and biochemical experiments to understand biological function; design therapeutics rationally; and design novel proteins. Knowing the structures of all proteins encoded by an organism’s genome will enable us to understand complex pathways and systems, and ultimately organismal behaviour and evolution. Applications lie in the areas of medicine, nanotechnology, and biological computing.

HOW CAN WE DETERMINE STRUCTURE? [Figure: accuracy of structure determination methods on a Cα RMSD axis (ticks 0, 2, 4, 6). Methods shown: Experiment (X-ray, NMR); Computation (de novo); Computation (template-based); Hybrid (iterative Bayesian interpretation of noisy NMR data with structure simulations). Annotations: one distance constraint for every six residues; one distance constraint for every ten residues.]

DISTANCE INFORMATION USING MASS SPECTROMETRY [Workflow diagram; panel labels: Add crosslinkers; Repeat using different crosslinkers and isotope labelling; Identify crosslinked fragments; MS; example fragments MKRS, LVKQ, VSKNT, KEVN; Confirm sequence; MS; Identify proteins with single crosslinks and fragment; MS.]

HOW AND WHY WILL THIS WORK? Perform experiments to obtain a number of distance constraints (one for every six residues for medium- to high-resolution structures). Perform simulations based on high-confidence constraints, and use the distance distributions from the resulting structures to iteratively reinterpret the spectra (without repeating the experiment) until we obtain a high-resolution structure. The computational aspects are largely complete. Components of the approach have been implemented by others in a limited way, but are assembled here in a robust and unique manner. The method can handle impure protein preparations (e.g. structural genomics failures); environment-dependent structures (e.g. chaperones + effectors); partially disordered proteins; and several proteins simultaneously (large scale). There is no need for proteolytic digestion, which complicates the analysis. The focus is on obtaining structures from noisy data, unlike X-ray diffraction and NMR.

PLAN OF ACTION Begin computational studies using simulated data (with noise) and develop software to prioritise experiments (e.g. crosslinker choices). Initial studies will use the UW Mass Spectrometry Center, progressing from fairly pure mixtures >> not-so-pure mixtures >> 2-3 proteins >> a handful of proteins >> difficult proteins >> heterogeneous mixtures >> whole proteomes, with advice from Aebersold and Kelleher. A team of personnel will work on crosslinking technology, protein enrichment, mass spectrometry, structure calculation, and parameterisation. Dedicated instrumentation will come through the Pioneer Award, startup, and MRI. A Bayesian framework will be utilised to estimate accuracy/error: avoid repeating the past oversight with NMR; obtain an R-factor-like estimate as in X-ray diffraction; compare spectra generated from models to actual spectra; and iteratively reinterpret the experimental data.
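
The R-factor-like comparison of model-generated spectra with actual spectra could be sketched as follows. This is a minimal illustration only: it assumes both spectra have been binned onto the same mass/charge grid and borrows the crystallographic form R = sum|obs - k*calc| / sum|obs|. The function name `r_factor_like` and the choice of scaling by total intensity are assumptions made for this example, not the presentation's method.

```python
def r_factor_like(observed, simulated):
    """R-factor-like disagreement between two binned spectra.

    Both arguments are sequences of intensities on the same mass/charge
    grid (an assumption of this sketch). Returns 0.0 when the simulated
    spectrum matches the observed one up to an overall scale.
    """
    if len(observed) != len(simulated):
        raise ValueError("spectra must share the same binning")
    obs_total = sum(abs(o) for o in observed)
    sim_total = sum(abs(s) for s in simulated)
    if obs_total == 0:
        raise ValueError("observed spectrum is empty")
    # Scale the simulated spectrum to the observed total intensity.
    scale = obs_total / sim_total if sim_total else 0.0
    residual = sum(abs(o - scale * s) for o, s in zip(observed, simulated))
    return residual / obs_total


# Hypothetical binned intensities, for illustration only.
observed = [10.0, 0.0, 5.0, 1.0]
model_a = [20.0, 0.0, 10.0, 2.0]   # same shape, different scale
model_b = [0.0, 10.0, 1.0, 5.0]    # peaks in the wrong bins
print(r_factor_like(observed, model_a), r_factor_like(observed, model_b))
```

A lower value means the model reproduces the observed spectrum more closely, so candidate structures could be ranked by it.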

RECENT SUCCESSES AND SUITABILITY [Figure panels: PROTEIN STRUCTURE DETERMINATION (PROTINFO structure for 1aye, 1.8 Å Cα RMSD for 70 residues); PROTEIN DESIGN/NANOTECHNOLOGY; PROTEIN INHIBITOR DISCOVERY.] Track record of notable successes (5 years). Excellent environment at UW/Seattle. Ability to unify components cohesively. Young and highly energetic. Right combination of computational skills and experimental design strategy to carry out the work.

OUTCOME AND EXPECTATIONS Structural genomics projects aim to obtain a representative structure of every protein family using X-ray diffraction and NMR methods and employ computational methods to fill in the gaps. However, several families of proteins will not be accessible by these structure determination methodologies, and computational methods alone are far from capable of consistently producing high resolution structures. Even in successful cases, the effect of the biological environment on protein structure is not accounted for. Our hybrid approach, which complements existing structural genomics efforts, will be used to rapidly obtain structures for entire proteomes in biologically relevant environments.

WHY ARE CURRENT METHODS NOT ADEQUATE? The major bottleneck for both X-ray diffraction and NMR studies is producing sufficient quantities of the protein in a pure form to perform the experiments. Deviations from ideal behaviour in a protein sample result in slow and labour-intensive structure determination, if it is possible at all. These major structure determination techniques were developed at a time when our worldview of proteins was simple and did not account for environment-dependent structure formation, protein dynamics and conformational changes, and post-translational modifications. The vast majority of proteins will therefore be inaccessible to X-ray diffraction and NMR studies. Computational approaches do not have the resolution of experimental approaches and lack consistency.

CROSSLINKING POSSIBILITIES Seven chemical groups that can be crosslinked: amines (2), carboxyls (3), and thiols (2). Numerous distances for the ~42 (7 x 6) possible pairs of groups. For every 100 residues, there may be up to ten members of each group, but typically only one crosslink is possible at a particular distance out of the ~100 possible pairs. For every 100 residues, the total number of groups is ~20-40, resulting in a potential yield of distance constraints if all crosslink possibilities can occur.
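
The counting on this slide can be checked with a short sketch. The group labels below are hypothetical placeholders (the slide gives only the numbers of amine, carboxyl, and thiol groups, not their names), and the code reproduces only the two figures the slide states: 7 x 6 = 42 ordered pairs of group types, and roughly 100 candidate site pairs when each of two groups has ten members per 100 residues.

```python
from itertools import permutations

# Hypothetical labels: the slide states 2 amines, 3 carboxyls, 2 thiols.
GROUPS = [
    "amine_1", "amine_2",
    "carboxyl_1", "carboxyl_2", "carboxyl_3",
    "thiol_1", "thiol_2",
]


def ordered_group_pairs(groups):
    """All ordered pairs of distinct group types (7 x 6 for seven groups)."""
    return list(permutations(groups, 2))


def candidate_site_pairs(n_first, n_second):
    """Number of site pairs between two groups with the given member counts."""
    return n_first * n_second


print(len(ordered_group_pairs(GROUPS)))   # 42
print(candidate_site_pairs(10, 10))       # 100
```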

DISTANCE INFORMATION USING KNOWN STRUCTURES Residue-specific all-atom probability discriminatory function (RAPDF). [Figure: atom-atom contacts from known structures are tabulated over 167 x 167 atom-type pairs (AO, AN, AC, ..., YOH) and distance bins to give a score s(d_ab) for each contact; the N x N atom-atom contacts of a candidate structure are then scored against this table.]
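
A knowledge-based score of this kind could be sketched as below. The slide's formula did not survive in the transcript, so this sketch assumes the common negative log-odds form, s(d_ab) = -ln[P(d_ab | known structures) / P(d_ab)], where the reference probability pools all atom-type pairs. The function names, the pseudocount, and the toy counts are assumptions for illustration; a real table would cover 167 x 167 atom types.

```python
import math
from collections import Counter


def build_score(observations, pseudocount=1.0):
    """Build s(a, b, d) from (type_a, type_b, distance_bin) contacts
    observed in known structures. Lower scores are more native-like."""
    pair_bin = Counter()    # N(d_ab): count per atom-type pair and bin
    pair_total = Counter()  # count per atom-type pair over all bins
    bin_total = Counter()   # count per bin over all atom-type pairs
    for a, b, d in observations:
        pair = tuple(sorted((a, b)))
        pair_bin[(pair, d)] += 1
        pair_total[pair] += 1
        bin_total[d] += 1
    grand_total = sum(bin_total.values())
    n_bins = len(bin_total)

    def score(a, b, d):
        if bin_total[d] == 0:
            return 0.0  # bin never observed: no information either way
        pair = tuple(sorted((a, b)))
        p_cond = (pair_bin[(pair, d)] + pseudocount) / (
            pair_total[pair] + pseudocount * n_bins
        )
        p_ref = bin_total[d] / grand_total
        return -math.log(p_cond / p_ref)

    return score


def total_score(score, contacts):
    """Sum the per-contact scores over a candidate structure's contacts."""
    return sum(score(a, b, d) for a, b, d in contacts)


# Toy counts: AO-AN contacts favour bin 0, AC-AC contacts favour bin 1.
known = ([("AO", "AN", 0)] * 8 + [("AO", "AN", 1)] * 2
         + [("AC", "AC", 0)] * 2 + [("AC", "AC", 1)] * 8)
s = build_score(known)
print(total_score(s, [("AO", "AN", 0), ("AC", "AC", 1)]))
```

A candidate whose contacts fall in the bins favoured by known structures receives a negative (favourable) total.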

STRUCTURES FROM SIMULATIONS USING RAPDF Good correlation between RAPDF score and accuracy of structure. RAPDF is one of the first all-atom knowledge-based functions and is a standard against which other scoring functions are compared. RAPDF has contributed to our success at CASP when combined with our simulation protocols to sample protein conformational space efficiently. [Figure captions: PROTINFO AB CASP6 prediction for T Å Cα RMSD for all 70 residues (continuous RAPDF produces 2.1 Å RMSD structure); PROTINFO CM CASP6 prediction for T Å Cα RMSD for all 142 residues (46% ID).]

DISTANCE INFORMATION USING NMR Nuclei of proteins emit RF radiation measured in the form of chemical shifts. The primary source of distance information between protons is the NOE. Steps: experiment (laborious), chemical shift assignment (automated), peak assignment (nontrivial), and structure determination (partially automated). [Figure: NOE peak with coordinates H, HN, N; protons with consistent chemical shifts: 43 VAL HG, ILE HN, LEU HB.] Bayesian estimation of contact probabilities. [Table: Prior, Post., and Dist. (Å) for 43 VAL HG1 - 8 ILE HN and 59 LEU HB3 - 8 ILE HN.]

STRUCTURES USING COMPUTATION AND EXPERIMENT The Bayesian approach calculates the probability distribution of each NOE peak contributing to proton-proton distances in a protein. The approach is assignment-free, fast, fully automated, and tolerant of noise, incompleteness, and ambiguity, and it enables iterative reinterpretation of the source experimental data based on simulated structures (90% complete). [Figure captions: PROTINFO NMR structure for 1aye, 1.8 Å Cα RMSD for 70 residues; PROTINFO NMR structure for mjnop, 3.5 Å Cα RMSD for 50 residues (required manual interpretation for several months).]
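
One reinterpretation step of this kind could be sketched as a Bayesian update over the candidate proton pairs for a single NOE peak. This is an illustration under stated assumptions, not the PROTINFO implementation: the likelihood is a hypothetical smooth step around a 5 Å cutoff (NOEs are generally observed only between protons a few Å apart), and the distances are invented values standing in for those measured in simulated structures.

```python
import math


def noe_likelihood(distance, cutoff=5.0, softness=0.5):
    """Hypothetical likelihood that two protons `distance` Å apart
    give rise to an NOE peak (smooth step around `cutoff`)."""
    return 1.0 / (1.0 + math.exp((distance - cutoff) / softness))


def posterior(priors, distances):
    """Posterior probability of each candidate pair for one peak:
    prior times likelihood, normalised over the candidates."""
    weights = {
        pair: priors[pair] * noe_likelihood(distances[pair])
        for pair in priors
    }
    total = sum(weights.values())
    if total == 0:
        raise ValueError("no candidate is compatible with the peak")
    return {pair: w / total for pair, w in weights.items()}


# Candidate pairs named on the NMR slide; priors and distances are invented.
priors = {"43 VAL HG1 - 8 ILE HN": 0.5, "59 LEU HB3 - 8 ILE HN": 0.5}
model_distances = {"43 VAL HG1 - 8 ILE HN": 3.0, "59 LEU HB3 - 8 ILE HN": 9.0}
for pair, p in posterior(priors, model_distances).items():
    print(pair, round(p, 4))
```

Feeding the posteriors back in as priors after each round of structure simulation gives the iterative reinterpretation loop without repeating the experiment.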

DISTANCE INFORMATION USING MASS SPECTROMETRY Add labelled and unlabelled crosslinkers to a heterogeneous mixture of proteins and run MS. For each peak representing a protein with a single crosslinker: enrich (LC, biotin), fragment, and run MS again. Identify peaks consistent with crosslinked fragments and obtain distance constraints. Repeat with different fragmentation resolution, crosslinker types, and isotope labelling. [Figures: spectra plotted as relative abundance against mass/charge.]

INTERPRETING MASS SPECTRA …AKRS…LKYVT…SKL…ARKT… (4 x 3 = 12 possibilities, one true contact). Ambiguous peaks in spectra (e.g. AKR-LK versus ARK-KL) are disambiguated (either eliminated or prioritised) using different fragmentation resolution (e.g. AKRS-LKY), database preferences, and iterative reinterpretation after structure simulations. Spurious peaks in spectra (e.g. AKR-SK?) are eliminated using isotope labelling (look for precise shifts). [Figures: spectra plotted as relative abundance against mass/charge, with peaks labelled AKR-LK, ARK-KL, AKRS-LKY, and AKR-SK?.]
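
The isotope-labelling filter ("look for precise shifts") could be sketched as a search for peak pairs separated by the expected label shift. This is a minimal illustration: it assumes singly charged ions, so the shift in mass/charge equals the mass difference between labelled and unlabelled crosslinker, and the shift, tolerance, and peak list are hypothetical values chosen for the example.

```python
def isotope_doublets(peaks, shift, tolerance=0.01):
    """Return (light, heavy) mass/charge pairs separated by `shift`
    within `tolerance`. Peaks without a partner are candidates for
    elimination as spurious."""
    ordered = sorted(peaks)
    pairs = []
    for i, light in enumerate(ordered):
        for heavy in ordered[i + 1:]:
            gap = heavy - light
            if gap > shift + tolerance:
                break  # remaining peaks are even further away
            if abs(gap - shift) <= tolerance:
                pairs.append((light, heavy))
    return pairs


# Hypothetical peak list and label shift, for illustration only.
peaks = [500.30, 504.325, 612.10, 700.00, 703.00]
doublets = isotope_doublets(peaks, shift=4.025)
print(doublets)
```

Only the peaks at 500.30 and 504.325 form a doublet here; the others lack a partner at the expected shift.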

DISTANCE INFORMATION USING FRET Analogous to the MS approach, but instead of peaks representing mass/charge ratios that identify two crosslinked residues (indirect distance information), we can obtain direct distance information. Express the protein in an in vitro system to ensure a single fluorophore donor/acceptor pair for two residues in a protein. Use a confocal microscopy setup to measure energy transfer for many donor/acceptor pairs. The distance, whose scale depends on the donor/acceptor type, can be obtained for any pair of residues whose labelling does not cause loss of structure (determined by consistency across many pairs); a tangential benefit is the identification of structurally important residues. Ideal for measurement of long-range distances and for large proteins.
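
The conversion from measured energy transfer to distance can be sketched with the standard Förster relation, E = 1 / (1 + (r / R0)^6), where R0 is the Förster radius of the donor/acceptor pair. The function names and the R0 value of 50 Å below are assumptions for illustration; the presentation does not specify a fluorophore pair.

```python
def fret_efficiency(distance, r0):
    """Transfer efficiency for a donor/acceptor separation `distance`,
    given the Förster radius `r0` (same length units)."""
    return 1.0 / (1.0 + (distance / r0) ** 6)


def fret_distance(efficiency, r0):
    """Invert the Förster relation: separation from measured efficiency."""
    if not 0.0 < efficiency < 1.0:
        raise ValueError("efficiency must lie strictly between 0 and 1")
    return r0 * (1.0 / efficiency - 1.0) ** (1.0 / 6.0)


R0 = 50.0  # hypothetical Förster radius in Å
for e in (0.9, 0.5, 0.1):
    print(e, round(fret_distance(e, R0), 1))
```

Because efficiency falls off with the sixth power of distance, measurements are most sensitive near R0, which is why the choice of donor/acceptor pair sets the usable distance range.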