Bioinformatics how to … use publicly available free tools to predict protein structure by comparative modeling.

Proteins are 3D objects with complex shapes
Over 60,000 protein structures have been determined, mostly by X-ray crystallography (PDB).
The 3D structure of ~70% of bacterial and ~50% of human proteins can be predicted (comparative modeling).

A predicted model simply illustrates our assumptions
Experimentally determined structure: no assumptions, this is nature telling us how it is.
Predicted model: Sequence -> Assumption (protein A is similar to protein B) -> Result (protein A is similar to protein B)
Sequence:
GNAAAAKKGSEQESVKEFLAKAKEDFLKKWENPA
QNTAHLDQFERIKTLGTGSFGRVMLVKHKETGNH
FAMKILDKQKVVKLKQIEHTLNEKRILQAVNFPF
LVKLEYSFKDNSNLYMVMEYVPGGEMFSHLRRIG
RFSEPHARFYAAQIVLTFEYLHSLDLIYRDLKPE
NLLIDQQGYIQVTDFGFAKRVKGRTWTLCGTPEY
LAPEIILSKGYNKAVDWWALGVLIYEMAAGYPPF
FADQPIQIYEKIVSGKVRFPSHFSSDLKDLLRNL
LQVDLTKRFGNLKDGVNDIKNHKWFATTDWIAIY
QRKVEAPFIPKFKGPGDTSNFDDYEEEEIRVSIN
EKCGKEFSEF

How do we know that these proteins are similar?
Unknown protein:
GLLTTKFVSLLQEAKDGVLDLKL
AADTLAVRQKRRIYDITNVLEGIG
LIEKKSKNSIQW
Well studied protein:
SRRSASHPTYSEMIAAAIRAEKS
RGGSSRQSIQKYIKSHYKVGHN
ADLQIKLSIRRLLAA
similarity -> prediction

How can we make such assumptions? Statistical reliability of the prediction
E-value: the number of hits one can "expect" to see just by chance when searching a database of a particular size (the closer to zero, the better).
Z-score: the score expressed as a distance from the mean, measured in standard deviations (the bigger, the better).
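The two statistics can be sketched in a few lines of Python. This is an illustration, not the method of any particular search tool: the background scores are made-up numbers, the function names are hypothetical, and the E-value formula used here is the standard bit-score relation E = m * n * 2^(-S'), where m is the query length, n the database length and S' the bit score.

```python
import statistics


def z_score(score, background_scores):
    """Distance of `score` from the background mean, in standard deviations."""
    mean = statistics.mean(background_scores)
    stdev = statistics.stdev(background_scores)  # sample standard deviation
    return (score - mean) / stdev


def e_value_from_bit_score(bit_score, query_length, database_length):
    """Expected number of chance hits scoring at least `bit_score`.

    Uses E = m * n * 2**(-S'); a larger database gives a larger E-value
    for the same score.
    """
    return query_length * database_length * 2.0 ** (-bit_score)


# Hypothetical numbers, for illustration only.
background = [20.0, 22.0, 18.0, 21.0, 19.0]
print(z_score(40.0, background))
print(e_value_from_bit_score(50.0, 100, 1_000_000))
```

Doubling the database length doubles the E-value for the same bit score, which is why an E-value is only meaningful for "a database of a particular size".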

Similar, but not homologous
Phosphoribosyltransferase and viral coat protein: identity 42%, different folds, different functions
IRLKSYCNDQSTGDIKVIGGDDLSTLTGKNVLIVEDIIDTGKTMQTLLSLVRQY.NPKMVKVASLLVKRTPRSVGY 173
: ||. ||| || |. || | : | | | | || | || |:| | ||.| |
214 VPLKTDANDQ.IGDSLY....SAMTVDDFGVLAVRVVNDHNPTKVT..SKVRIYMKPKHVRV...WCPRPPRAVPY 279

Different, but homologous
Histone H5 and transcription factor E2F4: identity 7%, similar fold, similar function (DNA binding)
PTYSEMIAAAIRAEKSRGGSSRQSIQKYIKSHYKVGHNADLQIKLSIRRLLAAGVLKQTKGVGASGSFRL
| | | | |
GLLTTKFVSLLQEAKD-GVLDLKLAADTLA------VRQKRRIYDITNVLEGIGLIEKKS----KNSIQW
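The "identity" figures quoted on these two slides are percentages of identical residues in an alignment. A minimal sketch of that calculation follows. The function name is hypothetical, and the choice of denominator (here, only columns where neither sequence has a gap) is an assumption; tools differ on this, so the same alignment can yield different percentages.

```python
def percent_identity(aligned_a, aligned_b, gap_chars="-."):
    """Percent of identical residues over columns where neither row has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    identical = 0
    compared = 0
    for a, b in zip(aligned_a, aligned_b):
        if a in gap_chars or b in gap_chars:
            continue  # skip gap columns
        compared += 1
        if a == b:
            identical += 1
    if compared == 0:
        return 0.0
    return 100.0 * identical / compared


# Toy alignment, for illustration only: 5 identical residues in 6 compared columns.
print(round(percent_identity("ACDE-FG", "ACDQWFG"), 1))
```

As the two slides show, this number alone does not decide homology: a high value over a short stretch can occur between unrelated folds, and a very low value can hide a shared fold.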

Steps in comparative modeling
Recognition: Are there any well characterized proteins similar to my protein?
Alignment: What is the position-by-position target/template equivalence?
Modeling: What is the detailed 3D structure of my protein?
Model analysis: Is my model any good?

Recognition
Tools: BLAST, PSI-BLAST or PFAM, FFAS, metaserver (bioinfo)
Output: name (PDB code) of the template; statistical significance of the match (Z-score, E-value, p-value, points)
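The output of the recognition step (a template name plus a significance value) can be pulled out of a search result programmatically. The sketch below assumes the search was run with BLAST+ and saved in its default 12-column tabular format (`-outfmt 6`); the hit identifiers, the numbers, the function name and the `max_evalue` cutoff are all made up for illustration.

```python
import csv
import io

# Default column order of BLAST+ tabular output (-outfmt 6).
BLAST_COLUMNS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]


def best_template(tabular_text, max_evalue=1e-3):
    """Return the hit with the lowest E-value at or below `max_evalue`, or None."""
    hits = []
    for row in csv.reader(io.StringIO(tabular_text), delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(BLAST_COLUMNS, row))
        hit["evalue"] = float(hit["evalue"])
        hit["pident"] = float(hit["pident"])
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return min(hits, key=lambda h: h["evalue"]) if hits else None


# Hypothetical hits, for illustration only.
example = "\n".join([
    "\t".join(["myprot", "tmplA", "38.5", "120", "70", "2", "1", "118", "5", "124", "2e-30", "110"]),
    "\t".join(["myprot", "tmplB", "27.0", "90", "60", "4", "10", "95", "3", "92", "0.5", "30"]),
])
best = best_template(example)
print(best["sseqid"], best["evalue"])
```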

Alignment
Tools: the same tools as in recognition (perhaps with different parameters), plus editing by hand
Output: position-by-position equivalence table
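A position-by-position equivalence table can be derived directly from a pairwise alignment. The following sketch assumes two aligned strings of equal length with `-` as the gap character; the function name and the toy sequences are hypothetical.

```python
def equivalence_table(aligned_target, aligned_template, gap="-"):
    """List (target_pos, target_res, template_pos, template_res) for aligned columns.

    Positions are 1-based and count residues only, not gap characters.
    Columns where either sequence has a gap produce no entry.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    table = []
    target_pos = 0
    template_pos = 0
    for t, p in zip(aligned_target, aligned_template):
        if t != gap:
            target_pos += 1
        if p != gap:
            template_pos += 1
        if t != gap and p != gap:
            table.append((target_pos, t, template_pos, p))
    return table


# Toy alignment, for illustration only.
for row in equivalence_table("AC-DE", "A-GDE"):
    print(row)
```

Target residues with no entry in the table (here, target position 2) have no template equivalent and would have to be built by other means, such as loop modeling.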

Modeling
Commercial programs: Accelrys (Insight), Tripos (Sybyl), …
Freeware/shareware/servers: Modeller (Andrej Sali), Jackal (Barry Honig), SCWRL (Roland Dunbrack), SwissModel

Model quality
Empirical energy based tools: PSQS, SwissPDB viewer
Geometric quality: Procheck, SFCHECK, etc. ( n/sv3.cgi)

Expectations of comparative modeling
Easy - % sequence id - strong sequence similarity, strong structure similarity, obvious function analogy
Difficult - 40%-25% - twilight zone sequence similarity, increasing structure divergence, function diversification
Fold prediction - below 25% seq id - no apparent sequence similarity, extreme function divergence
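These three regimes can be written as a simple rule of thumb. The 25% and 40% boundaries come from the slide. The slide does not state the identity range for the "easy" case, so treating everything above 40% as easy is an assumption made here, and the function name is hypothetical.

```python
def modeling_regime(percent_identity):
    """Map target/template sequence identity (0-100) to an expected difficulty."""
    if not 0 <= percent_identity <= 100:
        raise ValueError("percent identity must be between 0 and 100")
    if percent_identity < 25:
        return "fold prediction"
    if percent_identity <= 40:
        return "difficult (twilight zone)"
    # Assumption: the slide gives no number for the easy case.
    return "easy"


for identity in (7, 30, 60):
    print(identity, modeling_regime(identity))
```

Identity is only a rough guide. The phosphoribosyltransferase and viral coat protein pair shown earlier has 42% identity yet different folds, so the statistical significance of the match still has to be checked.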

Challenges of comparative modeling
Table columns: Recognition; Alignment; Modeling; Challenges
Table cells, in slide order: Trivial; Simple; Loop modeling; Trivial; Easy; Simple; Loop modeling; Simple; Challenging; Alignment, backbone shifts; Difficult; Very difficult; Significant errors; Alignment, backbone shifts; Often impossible; Significant errors; Often impossible; Recognition

Hands-on Activity
Click below for a hands-on “bioinformatics how to” activity.
Go to … and click the Structure Biology Course - “Protein Modeling Tutorial” link on the homepage.
OR go to ….