Protein motif /domain Structural unit Functional unit Signature of protein family How are they defined?

Slides:



Advertisements
Similar presentations
Secondary structure prediction from amino acid sequence.
Advertisements

Blast outputoutput. How to measure the similarity between two sequences Q: which one is a better match to the query ? Query: M A T W L Seq_A: M A T P.
Statistics in Bioinformatics May 2, 2002 Quiz-15 min Learning objectives-Understand equally likely outcomes, Counting techniques (Example, genetic code,
Ulf Schmitz, Statistical methods for aiding alignment1 Bioinformatics Statistical methods for pattern searching Ulf Schmitz
Bioinformatics Motif Detection Revised 27/10/06. Overview Introduction Multiple Alignments Multiple alignment based on HMM Motif Finding –Motif representation.
Profile Hidden Markov Models Bioinformatics Fall-2004 Dr Webb Miller and Dr Claude Depamphilis Dhiraj Joshi Department of Computer Science and Engineering.
1 Profile Hidden Markov Models For Protein Structure Prediction Colin Cherry
Patterns, Profiles, and Multiple Alignment.
Profiles for Sequences
درس بیوانفورماتیک December 2013 مدل ‌ مخفی مارکوف و تعمیم ‌ های آن به نام خدا.
Psi-BLAST, Prosite, UCSC Genome Browser Lecture 3.
Bioinformatics Finding signals and motifs in DNA and proteins Expectation Maximization Algorithm MEME The Gibbs sampler Lecture 10.
Secondary structure prediction. Amino acid sequence -> Secondary structure Alpha helix Beta strand Disordered/coil 70% accuracy 1991, 81% accuracy in.
Chapter 9 Structure Prediction. Motivation Given a protein, can you predict molecular structure Want to avoid repeated x-ray crystallography, but want.
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT. 2 HMM Architecture Markov Chains What is a Hidden Markov Model(HMM)? Components of HMM Problems of HMMs.
Profile-profile alignment using hidden Markov models Wing Wong.
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Multiple sequence alignment Conserved blocks are recognized Different degrees of similarity are marked.
Structure Prediction. Tertiary protein structure: protein folding Three main approaches: [1] experimental determination (X-ray crystallography, NMR) [2]
HIDDEN MARKOV MODELS IN MULTIPLE ALIGNMENT
Computational Biology, Part 10 Protein Structure Prediction and Display Robert F. Murphy Copyright  1996, 1999, All rights reserved.
Protein Modules An Introduction to Bioinformatics.
Multiple sequence alignment Conserved blocks are recognized Different degrees of similarity are marked.
Making Sense of DNA and protein sequence analysis tools (course #2) Dave Baumler Genome Center of Wisconsin,
Situations where generic scoring matrix is not suitable Short exact match Specific patterns.
Effect of gap penalty on Local Alignment Score:Score: 161 at (seq1)[2..36] : (seq2)[53..90] 2 ASTV----TSCLEPTEVFMDLWPEDHSNWQELSPLEPSD || | | |||||||||||||||||||||||||||
Lecture 11, CS5671 Secondary Structure Prediction Progressive improvement –Chou-Fasman rules –Qian-Sejnowski –Burkhard-Rost PHD –Riis-Krogh Chou-Fasman.
Protein Sequence Alignment and Database Searching.
Prediction to Protein Structure Fall 2005 CSC 487/687 Computing for Bioinformatics.
Good solutions are advantageous Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
BINF6201/8201 Hidden Markov Models for Sequence Analysis
Scoring Matrices Scoring matrices, PSSMs, and HMMs BIO520 BioinformaticsJim Lund Reading: Ch 6.1.
Generic substitution matrix -based sequence similarity evaluation Q: M A T W L I. A: M A - W T V. Scr: 45 -?11 3 Scr: Q: M A T W L I. A: M A W.
Sequence analysis: Macromolecular motif recognition Sylvia Nagl.
Study of Loop Length & Residue Composition of β-Hairpin Motif
Protein Secondary Structure Prediction Based on Position-specific Scoring Matrices Yan Liu Sep 29, 2003.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Neural Networks for Protein Structure Prediction Brown, JMB 1999 CS 466 Saurabh Sinha.
Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.
Protein Secondary Structure Prediction
1 LSM2241 AY0910 Semester 2 MiniProject Briefing Round 5.
Secondary structure prediction
Construction of Substitution Matrices
Protein Secondary Structure, Bioinformatics Tools, and Multiple Sequence Alignments Finding Similar Sequences Predicting Secondary Structures Predicting.
Protein structure prediction May 26, 2011 HW #8 due today Quiz #3 on Tuesday, May 31 Learning objectives-Understand the biochemical basis of secondary.
HMMs for alignments & Sequence pattern discovery I519 Introduction to Bioinformatics.
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
Computational Genomics and Proteomics Lecture 8 Motif Discovery C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E.
Hidden Markov Models A first-order Hidden Markov Model is completely defined by: A set of states. An alphabet of symbols. A transition probability matrix.
Manually Adjusting Multiple Alignments Chris Wilton.
Finding Patterns Gopalan Vivek Lee Teck Kwong Bernett.
Generic substitution matrix based sequence comparison Q: M A T W L I. A: M A - W T V. Scr: 45 -?11 3 Scr: Q: M A T W L I. A: M A W T V A. Total:
Protein Structure Prediction ● Why ? ● Type of protein structure predictions – Sec Str. Pred – Homology Modelling – Fold Recognition – Ab Initio ● Secondary.
How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G.
Generic substitution matrix based sequence comparison Q: M A T W L I. A: M A - W T V. Scr: 45 -?11 3 Scr: Q: M A T W L I. A: M A W T V A. Total:
Sequence Based Analysis Tutorial March 26, 2004 NIH Proteomics Workshop Lai-Su L. Yeh, Ph.D. Protein Science Team Lead Protein Information Resource at.
Prediction of Protein Binding Sites in Protein Structures Using Hidden Markov Support Vector Machine.
Learning Sequence Motifs Using Expectation Maximization (EM) and Gibbs Sampling BMI/CS 776 Mark Craven
Exercises Pairwise alignment Homology search (BLAST) Multiple alignment (CLUSTAL W) Iterative Profile Search: Profile Search –Pfam –Prosite –PSI-BLAST.
Construction of Substitution matrices
(H)MMs in gene prediction and similarity searches.
Copyright OpenHelix. No use or reproduction without express written consent1.
Sparse nonnegative matrix factorization for protein sequence motifs information discovery Presented by Wooyoung Kim Computer Science, Georgia State University.
Secondary structure prediction
Learning Sequence Motif Models Using Expectation Maximization (EM)
Sequence Based Analysis Tutorial
Sequence Based Analysis Tutorial
Protein Structures.
Applying principles of computer science in a biological context
Protein structure prediction
Presentation transcript:

Protein motif /domain Structural unit Functional unit Signature of protein family How are they defined?

How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G D D I BAXB_HUMAN L S E C L K R I G D E L BimS I A Q E L R R I G D E F HRK_HUMAN T A A R L K A L G D E L Egl-1 I G S K L A A M C D D F Binary pattern: L [GSC] [HEQCRK] X [^ILMFV] Basic concept of motif identification 2.

How do we represent the position specific preference ? BID_MOUSE I A R H L A Q I G D E M BAD_MOUSE Y G R E L R R M S D E F BAK_MOUSE V G R Q L A L I G D D I BAXB_HUMAN L S E C L K R I G D E L BimS I A Q E L R R I G D E F HRK_HUMAN T A A R L K A L G D E L Egl-1 I G S K L A A M C D D F Statistical representation G: 5 -> 71% S: 1 -> 14 % C: 1 -> 14 % Basic concept of motif identification 2.

Scoring sequence based on Model Seq: A S L D E L G D E A C D ….... position_1 = s(A/1) + s(S/2) + s(L/3) + s(D / 4) An example of position specific matrixexample

Representation of positional information in specific motif M-C-N-S-S-C-[MV]-G-G-M-N-R-R. Binary patterns: Positional matrix:

Hidden Markov Model A specification on “how events will happen” so that statistical assessment can be readily made Used for Speech recognition and characterizing sequence patterns

Hidden Markov Model Position 1Position 2 Deletion Insertion Position 3 1.) possible states of events or “routes” --Transition Probability

Hidden Markov Model Position 1Position 2 2.) possible AA at a given position -- emission probability ACDEFGHIACDEFGHI

Hidden Markov Model 3.) That’s all. Let’s give it a try and you will know what it is about

Hidden Markov Model How to make a HMM for my petty motif Collect related sequences MEME : bin/meme.cgi *Selection of sequences determines the model*

Identifying motifs using MEME -Multiple EM for Motif Elicitation EM: Expectation maximization (P ). Identifies statistically significant motif(s) in a set of sequences. Outline the occurrence of the motifs at the end of the report

Practice: Identify conserved motifs using MEME 1.) Input your own address. 2.) Load the file of multiple Fasta format sequences. 3.) You can change other options based on your needs.

Two search examples  The outcome of the search is dependent on the inputting set of sequences.  Compose the inputting set based on your research needs. Set1: Mammalian P53 plus mosquito hits Set2: Diverse set of P53 plus mosquito hits

Two search examples Set1: Mammalian P53 plus mosquito hits Set2: Diverse set of P53 plus mosquito hits

Secondary structure prediction Predict the likelihood of amino acid x to be in each of the three (four) types of secondary structure configuration Helix Sheet Turn Coil Coiled-coil is two helices tangled together

Secondary structure prediction - different strategies and algorithms Chou-Fasman / Garnier Method -- based on AA composition Nearest Neighbor / Levin Method -- based on sequence similarity Neural Network / PHD SOPM, DPM, DSC, etc.

Results are given at single amino acid level

Secondary structure prediction --Interpretation of result Seq: - D G S L A D E R K Pre: - B B B H H B B T T What is the likelihood of helix formation here ?

Secondary structure prediction -Accuracy At the amino acid level -- ~ 75% based on testing set IF: Seq: - D G I L A V A S M I V Pre: - B B H H H H H H H H H Length > 9 > 90% chance a helix formation around this region

Secondary structure prediction -Programs There are over a dozen web sites provide 2 nd structure predication service – (tools) AntheWin has a good sample of different approaches and has other associated tools

Practice: using AnathePro to analyze protein secondary structure Open the sequence file. Chose and run a secondary structure predication method from the “Methods” menu LEFT click the left boundary of an alpha helix and then RIGHT the right boundary, perform “helical Wheel”

Helix wheel to discern helix subtype HydrophilicHydrophobicOthers AmphipathicHydrophobicHydrophilic