HOMOLOGY MODELLING Chris Wilton. Homology Modelling   What is it and why do we need it? principles of modelling, applications available   Using Swiss-Model.

Slides:



Advertisements
Similar presentations
1 Genome information GenBank (Entrez nucleotide) Species-specific databases Protein sequence GenBank (Entrez protein) UniProtKB (SwissProt) Protein structure.
Advertisements

Tutorial Homology Modelling. A Brief Introduction to Homology Modeling.
PDB-Protein Data Bank SCOP –Protein structure classification CATH –Protein structure classification genTHREADER–3D structure prediction Swiss-Model–3D.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
1 Protein Structure, Structure Classification and Prediction Bioinformatics X3 January 2005 P. Johansson, D. Madsen Dept.of Cell & Molecular Biology, Uppsala.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modeling Anne Mølgaard, CBS, BioCentrum, DTU.
Tertiary protein structure viewing and prediction July 1, 2009 Learning objectives- Learn how to manipulate protein structures with Deep View software.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU.
Tertiary protein structure viewing and prediction July 5, 2006 Learning objectives- Learn how to manipulate protein structures with Deep View software.
Protein Fold recognition Morten Nielsen, Thomas Nordahl CBS, BioCentrum, DTU.
Thomas Blicher Center for Biological Sequence Analysis
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Summary Protein design seeks to find amino acid sequences which stably fold into specific 3-D structures. Modeling the inherent flexibility of the protein.
The Protein Data Bank (PDB)
. Protein Structure Prediction [Based on Structural Bioinformatics, section VII]
Tertiary protein structure modelling May 31, 2005 Graded papers will handed back Thursday Quiz#4 today Learning objectives- Continue to learn how to manipulate.
Protein Tertiary Structure. Primary: amino acid linear sequence. Secondary:  -helices, β-sheets and loops. Tertiary: the 3D shape of the fully folded.
Protein structure prediction May 30, 2002 Quiz#4 on June 4 Learning objectives-Understand difference between primary secondary and tertiary structure.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Can protein model accuracy be identified? Morten Nielsen, CBS, BioCentrum, DTU.
CENTER FOR BIOLOGICAL SEQUENCE ANALYSISTECHNICAL UNIVERSITY OF DENMARK DTU Homology Modelling Thomas Blicher Center for Biological Sequence Analysis.
PDB-Protein Data Bank SCOP –Protein structure classification CATH –Protein structure classification genTHREADER–3D structure prediction Swiss-Model–3D.
Protein Structure Prediction II
Introduction to Bioinformatics - Tutorial no. 8 Protein Prediction: - PROSITE - Pfam - SCOP - TOPITS - genThreader.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Bioinformatics Ayesha M. Khan Spring 2013.
Protein Structure Prediction and Analysis
Homology Modeling David Shiuan Department of Life Science and Institute of Biotechnology National Dong Hwa University.
MODELLER hands-on Ben Webb, Sali Lab, UC San Francisco Maya Topf, Birkbeck College, London.
Tertiary Structure Prediction Methods Any given protein sequence Structure selection Compare sequence with proteins have solved structure Homology Modeling.
Protein domains. Protein domains are structural units (average 160 aa) that share: Function Folding Evolution Proteins normally are multidomain (average.
Practical session 2b Introduction to 3D Modelling and threading 9:30am-10:00am 3D modeling and threading 10:00am-10:30am Analysis of mutations in MYH6.
COMPARATIVE or HOMOLOGY MODELING
Lecture 10 – protein structure prediction. A protein sequence.
1 P9 Extra Discussion Slides. Sequence-Structure-Function Relationships Proteins of similar sequences fold into similar structures and perform similar.
© Wiley Publishing All Rights Reserved. Protein 3D Structures.
3D structure -Swiss Pdb Viewer
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Protein Folding Programs By Asım OKUR CSE 549 November 14, 2002.
MolIDE2: Homology Modeling Of Protein Oligomers And Complexes Qiang Wang, Qifang Xu, Guoli Wang, and Roland L. Dunbrack, Jr. Fox Chase Cancer Center Philadelphia,
Protein Classification II CISC889: Bioinformatics Gang Situ 04/11/2002 Parts of this lecture borrowed from lecture given by Dr. Altman.
Protein Structure & Modeling Biology 224 Instructor: Tom Peavy Nov 18 & 23, 2009
Module 3 Protein Structure Database/Structure Analysis Learning objectives Understand how information is stored in PDB Learn how to read a PDB flat file.
Protein Tertiary Structure. Protein Data Bank (PDB) Contains all known 3D structural data of large biological molecules, mostly proteins and nucleic acids:
Bioinformatics how to … use publicly available free tools to predict protein structure by comparative modeling.
Manually Adjusting Multiple Alignments Chris Wilton.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Predicting Protein Structure: Comparative Modeling (homology modeling)
Protein Structure Prediction: Homology Modeling & Threading/Fold Recognition D. Mohanty NII, New Delhi.
Homology modeling with SWISS-MODEL
EBI is an Outstation of the European Molecular Biology Laboratory. Sanchayita Sen, Ph.D. PDB Depositions Validation & Structure Quality.
Protein Folding & Biospectroscopy Lecture 6 F14PFB David Robinson.
Protein Structure Prediction Graham Wood Charlotte Deane.
SAGExplore web server tutorial. The SAGExplore server has three different modules …
Guidelines for sequence reports. Outline Summary Results & Discussion –Sequence identification –Function assignment –Fold assignment –Identification of.
Structural classification of Proteins SCOP Classification: consists of a database Family Evolutionarily related with a significant sequence identity Superfamily.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
V diagonal lines give equivalent residues ILS TRIVHVNSILPSTN V I L S T R I V I L P E F S T Sequence A Sequence B Dot Plots, Path Matrices, Score Matrices.
Protein Tertiary Structure Prediction Structural Bioinformatics.
Lab Lab 10.2: Homology Modeling Lab Boris Steipe Departments of Biochemistry and.
Sequence: PFAM Used example: Database of protein domain families. It is based on manually curated alignments.
Protein Structure Visualisation
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Take a REST from manual searching: PDBe, programmatically
Protein Structure Prediction and Protein Homology modeling
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Homology Modeling.
Protein domains Miguel Andrade Mainz, Germany Faculty of Biology,
Presentation transcript:

HOMOLOGY MODELLING Chris Wilton

Homology Modelling   What is it and why do we need it? principles of modelling, applications available   Using Swiss-Model preparation, workspace tools, automated-mode refining with manually-adjusted alignments   Model Assessment using SwissModel tools to check model quality what to look for, what we expect…

What is Homology Modelling?   Given an unknown protein, make an informed guess on its 3D structure based on its sequence: Search structure databases for homologous sequences Transfer coordinates of known protein onto unknown MQEQLTDFSKVETNLISW-QGSLETVEQMEPWAGSDANSQTEAY MHQQVSDYAKVEHQWLYRVAGTIETLDNMSPANHSDAQTQAA | |..|. |||... |..||.|.| | |||..| | = Identity. | = Homology

Why Modelling is Necessary   Current structure determination methods: NMR – conc. protein solution; limited size + resolution XRay Crystallography – protein crystals, high resolution Expensive, slow, difficult (especially membrane proteins!)   Can’t keep up with growth rate of sequence databases: PDB: structures ( 14/11/2006 ) pairsdb: sequences ( 11/2006 )

 SWISSMODEL (www)   3D-Jigsaw (www)   ESyPred3D (www)   WHATIF (www)   CPH Models 2.0 (www)   MODELLER 8v2 (standalone for windows, mac, linux; also web submission) Current (Free) Servers & Software

  Increase in sequence identity correlates with increase in structural similarity   RMSD of core  -carbon coordinates for two homologous proteins sharing 50% identity expected at ~1Å (GLY 3.5Å, helix 5Å diameter)   Theoretical models are low resolution, and depend on quality of input alignment! Assumptions & Principles

The SwissModel Workspace

Search structure databases for potential homologues, and obtain family information where possible. Template Search

 Automated mode: Select a template structure with high % identity and average resolution, or let SwissModel choose best one.  Alignment mode: Obtain multiple alignment between possible templates and your query sequence. Check alignment – are key motifs correctly aligned? Manually adjust as necessary, using Jalview. Upload alignment, select template, and submit. Preparation

Automated Mode Submit a Model Request

Alignment Mode Submit a Model Request

Hidden Modelling Steps  Automated mode, multiple templates: Superimposition and optimisation of corresponding α -carbon pairs for all template structures Alignment of all residues of template structures: acceptable alpha-carbon atoms located within 3Å radius of mean  All modes: Framework construction of query protein structure, using coordinates of template structure, or mean coordinate positions if multiple templates. Building unconserved loop regions / insertions, closing gaps in backbone - uses penta-peptide fragment library derived from all PDB entries of resolution 2Å or better (note: this can fail badly!)

Output  Within a few minutes of submission, your results are returned to the Workspace.  Output Page: 1.Your model in pdb format (plus simple viewer) 2.Query to template alignment 3.Simple assessment graphs 4.Logging data of the modelling process  Save the model in DeepView Project format, then open in DeepView, and colour by B-factor.

Multiple Sequence Alignment  More information than pairwise alignment  Shows conserved regions in protein family Critical secondary structure elements  Shows strongly conserved residues and motifs Structural importance Functional importance

Alignment Adjustment  Look at alignment used to select template Check alignment against 3D structure (Jalview) Secondary structure elements preserved? Motifs and key residues aligned? Are gaps in acceptable locations (ie: loops)?  Try to improve alignment manually...  Or use different alignment application

Contains theoretical models ( on 05/09/2006 ) Searchable by UniProt / Trembl accession number Fully automated procedure - models may be bad! It can be found here: ModBase also holds threshold-validated theoretical models [ structures currently ]: The SwissModel Repository

Model Assessment Use SwissModel Workspace tools:

Model Evaluation  Look at model in DeepView: is it globular and compact, are there odd unstructured loops?  Colour by B-factor and note bad areas – in loops, or in important secondary elements?  WhatCheck - always lots of errors! Look for packing, inside-outside, torsion angle errors. Solvation Profile  Check Solvation Profile – how well packed is the model? Are there regions which are too loose, and therefore solvent-exposed?

Torsion Angles

Ramachandran Plot Experimentally Resolved Model (NMR, XRay): Rare for residues other than Glycine & Alanine to be in disallowed regions Homology Model: Bad residues common May have to remodel small sections and loops Large residues in bad regions worth further investigation …

Solvation Profile Long stretches above zero probably loops Most-negative regions well buried in core Functional residues above zero! Majority of residues must be below zero Compare your model to pdb template!

Further details   DeepView tutorial – –   Advanced tutorial on SwissModel & DeepView – –

Suggested Reading   Swiss-Model’s advice: – –   A good interactive tutorial on Protein Modelling: – –

References Schwede T, Kopp J, Guex N, and Peitsch MC (2003) SWISS- MODEL: an automated protein homology-modeling server. Nucleic Acids Research 31: Guex, N. and Peitsch, M. C. (1997) SWISS-MODEL and the Swiss- PdbViewer: An environment for comparative protein modelling. Electrophoresis 18: Peitsch, M. C. (1995) Protein modeling by . Bio/Technology 13: R.W.W.Hooft, G.Vriend, C.Sander and E.E.Abola, Errors in protein structures. Nature 381, 272 (1996). Ramachandran GN, Ramakrishnan C, Sasisekharan V (1963). J Mol Biol 7:95-99.

Demonstrations   Uniprot entry Q11CR8_MESSB is described as a "Fructose- bisphosphate aldolase class 1, from Mezorhizobium sp. (strain BNC1)"   It belongs to Pfam PF00274, the Fructose-1,6-Bisphosphate Aldolases (Class 1); this family contains 24 PDB structures.PF00274   When we run BlastP on Q11CR8_MESSB against the PDB, we get hits to all these structures; %ids range from 48 to 54%, which is reassuringly high.   Let’s try SwissModel in Automated mode:  

Demonstrations   Uniprot entry Q07159 is another Class I FBP Aldolase from Staphylococcus carnosus. It is unusual, however – most bacterial FBP Aldolases belong to Class II (PF00596)PF00596   When we run a Blast against the PDB, again we get hits to all twenty- four PDB structures, but identities range from 25 to 32% – not good.   Try SwissModel in Automated mode...   Oh dear, only part of our sequence has been modelled – automated mode has failed this time. Pfam has a family alignment, so let’s use it as input to Alignment mode.family alignment

Demonstrations Now let’s assess our two complete models:   Go to [Tools] >> Structure Assessment, and upload each model in turn: – –Add Whatcheck and Procheck to the selection, and add the resolution (check result file for automated mode, or check PDB website for alignment mode)   Also, go to the SolvX Server and input our models. It should give us (at least) two graphs per input – the first will be our model, the other(s) our template(s).SolvX Server

Demonstrations   What validation checks are the important ones? Whatcheck: Accessibility-related (2.6), 3D-Database (2.8), Geometric ( ) Checks. Make a note of bad residues and regions. Procheck: Residues Report (G-factors) and Ramachandran Plot. Again, note bad residues and regions. SolvX: Compare your model with the template(s). Where are the worst (most solvent accessible) regions? Note them down.   How does your alignment model for ALF1_STACA compare with your neighbour’s model?