Modelling Genome Structure and Function Ram Samudrala University of Washington.

Presentation transcript:

Modelling Genome Structure and Function Ram Samudrala University of Washington

Rationale for understanding protein structure and function

Protein sequence
- large numbers of sequences, including whole genomes

Protein structure
- three dimensional
- complicated
- mediates function

Protein function
- rational drug design and treatment of disease
- protein and genetic engineering
- build networks to model cellular pathways
- study organismal function and evolution

Figure labels linking sequence, structure and function:
- ?
- structure determination, structure prediction
- homology, rational mutagenesis, biochemical analysis, model studies

Protein folding

DNA: …-CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU-…
Protein sequence: …-L-K-E-G-V-S-K-D-… (each triplet encodes one amino acid)

Unfolded protein
- not unique
- mobile
- inactive
- expanded
- irregular

Native state
- unique shape
- precisely ordered
- stable/functional
- globular/compact
- helices and sheets

Unfolded protein to native state: spontaneous self-organisation (~1 second)
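The step from nucleotide triplets to the protein sequence on this slide can be illustrated with a minimal sketch. The table below is only the part of the standard genetic code needed for this fragment, `translate` is a hypothetical helper name, and the final codon is written as GAU, a standard aspartate codon, so that the translation matches the peptide shown.

```python
# Partial standard genetic code: only the codons used in this fragment.
CODON_TABLE = {
    "CUA": "L", "AAA": "K", "GAA": "E", "GGU": "G",
    "GUU": "V", "AGC": "S", "AAG": "K", "GAU": "D",
}


def translate(rna: str) -> str:
    """Translate a hyphen-separated codon string into one-letter amino acids."""
    codons = rna.strip("-").split("-")
    return "-".join(CODON_TABLE[codon] for codon in codons)


fragment = "CUA-AAA-GAA-GGU-GUU-AGC-AAG-GAU"
peptide = translate(fragment)
print(peptide)
```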

Ab initio prediction of protein structure

sample
- sample conformational space such that native-like conformations are found
- astronomically large number of conformations
- 5 states per residue for 100 residues: 5^100 ≈ 10^70

select
- hard to design functions that are not fooled by non-native conformations ("decoys")
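The size of the search space quoted on this slide is simple arithmetic, and a short sketch makes the order of magnitude concrete. The constant names are illustrative; the numbers (5 states, 100 residues) are the ones on the slide.

```python
import math

STATES_PER_RESIDUE = 5   # discrete backbone states per residue (from the slide)
N_RESIDUES = 100         # chain length (from the slide)

# Every residue independently takes one of the states, so the counts multiply.
n_conformations = STATES_PER_RESIDUE ** N_RESIDUES
log10_count = N_RESIDUES * math.log10(STATES_PER_RESIDUE)

print(f"5^100 has {len(str(n_conformations))} digits, about 10^{log10_count:.1f}")
```

The exponent comes out just under 70, which is why exhaustive enumeration is not an option and sampling is needed.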

Semi-exhaustive segment-based folding

Sequence: EFDVILKAAGANKVAVIKAVRGATGLGLKEAKDLVESAPAALKEGVSKDDAEALKKALEEAGAEVEVK

generate
- fragments from database
- 14-state φ, ψ model

minimise
- monte carlo with simulated annealing
- conformational space annealing, GA

filter
- all-atom pairwise interactions, bad contacts
- compactness, secondary structure
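The "minimise" step names Monte Carlo with simulated annealing. The following is a toy sketch of that general technique over a chain of discrete 14-state residues, not the author's implementation: `toy_energy`, the hidden `target` conformation, the cooling schedule and all parameter values are assumptions made only so the example runs.

```python
import math
import random

N_STATES = 14  # discrete (phi, psi) states per residue, as on the slide


def toy_energy(conf, target):
    """Hypothetical stand-in score: number of residues not in the target state."""
    return sum(1 for c, t in zip(conf, target) if c != t)


def anneal(target, steps=20000, t_start=2.0, t_end=0.05, seed=0):
    """Metropolis Monte Carlo with a geometric cooling schedule."""
    rng = random.Random(seed)
    conf = [rng.randrange(N_STATES) for _ in target]
    energy = toy_energy(conf, target)
    best_conf, best_energy = list(conf), energy
    for step in range(steps):
        # Temperature falls geometrically from t_start to t_end.
        temp = t_start * (t_end / t_start) ** (step / (steps - 1))
        pos = rng.randrange(len(conf))
        old_state = conf[pos]
        conf[pos] = rng.randrange(N_STATES)  # propose a single-residue move
        delta = toy_energy(conf, target) - energy
        # Metropolis criterion: always accept downhill, sometimes accept uphill.
        if delta <= 0 or rng.random() < math.exp(-delta / temp):
            energy += delta
            if energy < best_energy:
                best_conf, best_energy = list(conf), energy
        else:
            conf[pos] = old_state  # reject: restore the previous state
    return best_conf, best_energy


target = [random.Random(1).randrange(N_STATES) for _ in range(20)]
best_conf, best_energy = anneal(target)
print(best_energy)
```

A real scoring function would replace `toy_energy`; the acceptance rule and cooling loop are the parts that carry over.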

Ab initio prediction at CASP

Before CASP (BC): "solved" (biased results)
CASP1: worse than random
CASP2: worse than random with one exception
CASP3: consistently predicted correct topology, ~6.0 Å for 60+ residues
CASP4: consistently predicted correct topology, ~4-6.0 Å for … residues
- **T110/rbfa – 4.0 Å (80 residues; 1-80)
- *T114/afp1 – 6.5 Å (45 residues; 36-80)
- **T97/er29 – 6.0 Å (80 residues; 18-97)
- **T106/sfrp3 – 6.2 Å (70 residues; 6-75)
- *T98/sp0a – 6.0 Å (60 residues; 37-105)
- **T102/as48 – 5.3 Å (70 residues; 1-70)

Prediction for CASP4 target T110/rbfa: Cα RMSD of 4.0 Å for 80 residues (1-80)

Prediction for CASP4 target T97/er29: Cα RMSD of 6.2 Å for 80 residues (18-97)

Prediction for CASP4 target T106/sfrp3: Cα RMSD of 6.2 Å for 70 residues (6-75)

Prediction for CASP4 target T98/sp0a: Cα RMSD of 6.0 Å for 60 residues (37-105)

Prediction for CASP4 target T126/omp: Cα RMSD of 6.5 Å for 60 residues (87-146)

Prediction for CASP4 target T114/afp1: Cα RMSD of 6.5 Å for 45 residues (36-80)

Postdiction for CASP4 target T102/as48: Cα RMSD of 5.3 Å for 70 residues (1-70)
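The accuracy figures above are Cα RMSD values. As a minimal sketch of what that number measures, the function below computes the root-mean-square deviation between two equal-length lists of Cα coordinates. It assumes the two structures have already been optimally superimposed (the superposition step itself is not shown), and `ca_rmsd` and the coordinates are hypothetical.

```python
import math


def ca_rmsd(model, native):
    """RMSD in Å between paired C-alpha coordinates, given as (x, y, z) tuples.

    Assumes both coordinate sets are already superimposed.
    """
    if len(model) != len(native) or not model:
        raise ValueError("need two non-empty coordinate lists of equal length")
    squared = sum(
        (a - b) ** 2
        for p, q in zip(model, native)
        for a, b in zip(p, q)
    )
    return math.sqrt(squared / len(model))


# Made-up coordinates: every atom of the model is displaced by 3 Å along z.
native = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
model = [(x, y, z + 3.0) for x, y, z in native]
print(ca_rmsd(model, native))
```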

Comparative modelling of protein structure

KDHPFGFAVPTKNPDGTMNLMNWECAIP
KDPPAGIGAPQDN----QNIMLWNAVIP
** * *   *  *     * * *   **

- scan
- align
- build initial model (minimum perturbation)
- construct non-conserved side chains and main chains (graph theory, semfold)
- refine (physical functions)
- de novo simulation
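The asterisks under the alignment mark identical positions, and later slides quote percentage identity (PID) for each target. A minimal sketch of that calculation on the two aligned sequences from this slide follows; `percent_identity` is a hypothetical helper, and dividing by the number of gap-free columns is one common convention, not necessarily the one used for the PID values in this talk.

```python
def percent_identity(seq_a, seq_b, gap="-"):
    """Return (identical, aligned, percent) over gap-free alignment columns."""
    if len(seq_a) != len(seq_b):
        raise ValueError("aligned sequences must have equal length")
    columns = [(a, b) for a, b in zip(seq_a, seq_b) if a != gap and b != gap]
    identical = sum(1 for a, b in columns if a == b)
    return identical, len(columns), 100.0 * identical / len(columns)


target = "KDHPFGFAVPTKNPDGTMNLMNWECAIP"
template = "KDPPAGIGAPQDN----QNIMLWNAVIP"
identical, aligned, pid = percent_identity(target, template)
print(identical, aligned, round(pid, 1))
```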

A graph theoretic representation of protein structure

- represent residues as nodes
- weigh nodes: -0.6 (V1), -1.0 (F), -0.7 (K), -0.5 (I), -0.9 (V2)
- construct graph (edge weight shown: -0.1)
- find cliques: -0.5 (I), -0.9 (V2), -1.0 (F), -0.7 (K); W = -4.5
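The clique-finding step can be sketched with a basic Bron-Kerbosch enumeration of maximal cliques, which is a standard algorithm and not necessarily the one used by the author. The node weights are those on the slide; the edge set is an assumption (every pair is compatible except V1 and V2, taken here to be alternatives for the same residue), and the score sums node weights only, so it does not reproduce the slide's W = -4.5, which presumably also includes edge weights.

```python
NODE_WEIGHT = {"V1": -0.6, "V2": -0.9, "F": -1.0, "K": -0.7, "I": -0.5}

# Hypothetical compatibility graph: all pairs connected except V1-V2.
ADJ = {
    n: {m for m in NODE_WEIGHT if m != n and {n, m} != {"V1", "V2"}}
    for n in NODE_WEIGHT
}


def maximal_cliques(adj):
    """Yield every maximal clique (Bron-Kerbosch without pivoting)."""
    def expand(r, p, x):
        if not p and not x:
            yield frozenset(r)
            return
        for v in list(p):
            yield from expand(r | {v}, p & adj[v], x & adj[v])
            p = p - {v}   # v has been fully explored
            x = x | {v}   # exclude v from later cliques at this level
    yield from expand(set(), set(adj), set())


def clique_weight(clique):
    return sum(NODE_WEIGHT[n] for n in clique)


cliques = list(maximal_cliques(ADJ))
best = min(cliques, key=clique_weight)  # most negative total is most favourable
print(sorted(best), round(clique_weight(best), 1))
```

Under these assumed edges the lowest-weight clique contains I, V2, F and K, the same four nodes the slide lists under "find cliques".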

Comparative modelling at CASP

         alignment   side chain   short loops   longer loops
BC       excellent   ~80%         1.0 Å         2.0 Å
CASP1    poor        ~50%         ~3.0 Å        >5.0 Å
CASP2    fair        ~75%         ~1.0 Å        ~3.0 Å
CASP3    fair        ~75%         ~1.0 Å        ~2.5 Å
CASP4    fair        ~75%         ~1.0 Å        ~2.0 Å

CASP4: overall model accuracy ranging from 1 Å to 6 Å for 50-10% sequence identity
- **T128/sodm – 1.0 Å (198 residues; 50%)
- **T111/eno – 1.7 Å (430 residues; 51%)
- **T122/trpa – 2.9 Å (241 residues; 33%)
- **T125/sp18 – 4.4 Å (137 residues; 24%)
- **T112/dhso – 4.9 Å (348 residues; 24%)
- **T92/yeco – 5.6 Å (104 residues; 12%)

Prediction for CASP4 target T128/sodm: Cα RMSD of 1.0 Å for 198 residues (PID 50%)

Prediction for CASP4 target T111/eno: Cα RMSD of 1.7 Å for 430 residues (PID 51%)

Prediction for CASP4 target T122/trpa: Cα RMSD of 2.9 Å for 241 residues (PID 33%)

Prediction for CASP4 target T125/sp18: Cα RMSD of 4.4 Å for 137 residues (PID 24%)

Prediction for CASP4 target T112/dhso: Cα RMSD of 4.9 Å for 348 residues (PID 24%)

Prediction for CASP4 target T92/yeco: Cα RMSD of 5.6 Å for 104 residues (PID 12%)

Prediction for Invb

Computational aspects of structural genomics

A. sequence space
B. comparative modelling
C. fold recognition
D. ab initio prediction
E. target selection (targets)
F. analysis

(Figure idea by Steve Brenner.)

Computational aspects of functional genomics

G. assign function: assign function to entire protein space

structure based methods
- microenvironment analysis (zinc binding site?)
- structure comparison (homology, function?)

sequence based methods
- sequence comparison
- motif searches
- phylogenetic profiles
- domain fusion analyses

+ experimental data

Modelling structure and function of entire genomes

- Homo sapiens (human): ~52,413 sequences
- Oryza sativa (rice): ~55,000 sequences
- Pseudomonas aeruginosa: ~6000 sequences
- Salmonella typhimurium: ~2000 sequences
- HIV: 9 genes + ~52,000 variants

Modelling structure and function of the Oryza sativa (rice) genome

6813 coding sequences
- 3149 without a product annotation
- 816 classified as hypothetical protein
- 1187 with a hypothetical function
- ~30% with known homologs in PDB

Most common functions (from PROSITE)
- ATP/GTP-binding site motif A (P loop)
- Serine/Threonine protein kinase active site
- EF-hand (Calcium binding)
- Cytochrome C Heme binding site

Most common functions (from annotations)
- Reverse transcriptase
- Nucleotide Binding Site (NBS)
- Serine/Threonine protein kinase
- Chitinase
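The PROSITE-derived functions on this slide come from sequence motif searches. A minimal sketch of such a search for the first motif listed, the ATP/GTP-binding site motif A (P loop), is given below. The pattern [AG]-x(4)-G-K-[ST] is the PROSITE form of this motif as best recalled here and should be checked against the PROSITE entry before use; the regular expression is a hand translation of it, and the test sequence and function name are illustrative only.

```python
import re

# PROSITE-style pattern [AG]-x(4)-G-K-[ST], hand-translated to a regex (assumption).
P_LOOP = re.compile(r"[AG].{4}GK[ST]")


def find_p_loops(sequence):
    """Return (start, matched_text) for every non-overlapping P-loop match."""
    return [(m.start(), m.group()) for m in P_LOOP.finditer(sequence)]


# Illustrative sequence fragment containing one P-loop-like stretch.
sequence = "MTEYKLVVVGAGGVGKSALTIQ"
hits = find_p_loops(sequence)
print(hits)
```

Scanning every coding sequence of a genome with a set of such patterns and counting hits per pattern would give a ranking of the kind shown on the slide.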

Bioverse database and webserver

Bioverse webserver snapshot

Bioverse dataflow

Take home message

Prediction of protein structure and function can be used to model whole genomes to understand organismal function and evolution.

Acknowledgements
- Jason McDermott
- Group members
- Levitt and Moult groups