Multiple sequence alignment: Lesson 4

Presentation transcript:

1 Multiple sequence alignment Lesson 4

2
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGTSSNIGS--ITVNWYQQLPG
LRLSCSSSGFIFSS--YAMYWVRQAPG
LSLTCTVSGTSFDD--YYSTWVRQPPG
PEVTCVVVDVSHEDPQVKFNWYVDG--
ATLVCLISDFYPGA--VTVAWKADS--
AALGCLVKDYFPEP--VTVSWNSG---
VSLTCLVKGFYPSD--IAVEWWSNG--
Like pairwise alignment, BUT compares n sequences instead of 2
Each row represents an individual sequence
Each column represents the ‘same’ position
There may be gaps in some sequences

3 MSA & Evolution
MSA can give you a picture of the forces that shape evolution!
 Important amino acids or nucleotides are not “allowed” to mutate
 Less important positions change more easily

4 Conserved positions
 Columns where all the sequences contain the same amino acid or nucleotide
 Important for the function or structure
VTISCTGSSSNIGAG-NHVKWYQQLPG
VTISCTGSSSNIGS--ITVNWYQQLPG
LRLSCTGSGFIFSS--YAMYWYQQAPG
LSLTCTGSGTSFDD-QYYSTWYQQPPG
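The definition above can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `conserved_columns` is hypothetical, and the choice to skip all-gap columns is an assumption.

```python
def conserved_columns(alignment):
    """Return 0-based indices of columns where every row holds the same residue.

    Assumption: a column made only of gap characters ('-') is not counted
    as conserved.
    """
    length = len(alignment[0])
    if any(len(row) != length for row in alignment):
        raise ValueError("all rows of an alignment must have the same length")
    conserved = []
    # zip(*alignment) walks the alignment column by column.
    for index, column in enumerate(zip(*alignment)):
        if column[0] != "-" and len(set(column)) == 1:
            conserved.append(index)
    return conserved


# The four rows shown on the slide.
alignment = [
    "VTISCTGSSSNIGAG-NHVKWYQQLPG",
    "VTISCTGSSSNIGS--ITVNWYQQLPG",
    "LRLSCTGSGFIFSS--YAMYWYQQAPG",
    "LSLTCTGSGTSFDD-QYYSTWYQQPPG",
]
columns = conserved_columns(alignment)
print(columns)
```

On the slide's alignment this picks out the CTGS block near the start and the WYQQ and PG blocks near the end.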

5 Consensus Sequence
 A consensus sequence holds the most frequent character of the alignment at each column
TGTTCTA
TGTTCAA
TCTTCAA
Consensus: TGTTCAA
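A short Python sketch of how a consensus could be computed from the three aligned sequences on the slide. The function name `consensus` and the alphabetical tie-breaking rule are assumptions made for this example; the slides do not say how ties are resolved.

```python
from collections import Counter


def consensus(alignment):
    """Return the most frequent character of each column as one string.

    Assumption: ties are broken alphabetically so the result is deterministic.
    """
    result = []
    for column in zip(*alignment):
        counts = Counter(column)
        # sorted() puts candidates in alphabetical order; max() keeps the
        # first candidate that reaches the highest count.
        best = max(sorted(counts), key=lambda ch: counts[ch])
        result.append(best)
    return "".join(result)


sequences = ["TGTTCTA", "TGTTCAA", "TCTTCAA"]
print(consensus(sequences))
```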

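6 Profile
TGTTCTA
TGTTCAA
TCTTCAA
[Table: one row per character (A, T, C, G) and one column per alignment position; the values were lost in extraction]
Profile = PSSM – Position Specific Score (probability) Matrix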
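As a rough illustration of the probability form of a profile, the sketch below counts how often each nucleotide occurs in each column of the slide's three sequences. The function name `build_profile` and the plain relative-frequency estimate (no pseudocounts, no log-odds scoring) are assumptions; real PSSM tools typically do more.

```python
def build_profile(alignment, alphabet="ACGT"):
    """Return {character: [frequency in column 0, column 1, ...]}.

    Frequencies are plain relative counts, an assumption made to keep
    the example small.
    """
    n_rows = len(alignment)
    profile = {ch: [] for ch in alphabet}
    for column in zip(*alignment):
        for ch in alphabet:
            profile[ch].append(column.count(ch) / n_rows)
    return profile


profile = build_profile(["TGTTCTA", "TGTTCAA", "TCTTCAA"])
for ch, row in profile.items():
    print(ch, " ".join(f"{value:.2f}" for value in row))
```

Each column of the resulting table sums to 1, and a fully conserved position shows up as a single entry of 1.00.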

7 Alignment methods
Computing an optimal MSA is not feasible for realistic numbers of sequences, so all methods in practical use are heuristics:
 Progressive/hierarchical alignment (Clustal)
 Iterative alignment (MAFFT, MUSCLE)

8 Progressive alignment
First step: compute the pairwise alignments for all against all (10 pairwise alignments for the five sequences A–E); the similarities are stored in a table
[Table: pairwise similarity scores for sequences A, B, C, D, E]
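The all-against-all step can be sketched as follows: score every unordered pair of sequences with a global alignment and store the scores in a table (n sequences give n(n-1)/2 pairs). Everything specific here is an assumption for illustration: the toy DNA sequences, the scoring values (match +1, mismatch -1, gap -2), and the function names `nw_score` and `pairwise_table`. Real tools use substitution matrices and more refined gap penalties.

```python
from itertools import combinations


def nw_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment score, keeping only two rows."""
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


def pairwise_table(seqs):
    """Score every unordered pair of named sequences."""
    return {
        (x, y): nw_score(seqs[x], seqs[y])
        for x, y in combinations(sorted(seqs), 2)
    }


# Hypothetical toy sequences, not taken from the slides.
seqs = {
    "A": "ACGTACGT",
    "B": "ACGTACGA",
    "C": "ACGAACGA",
    "D": "TCGAACGA",
    "E": "TTGAACCA",
}
table = pairwise_table(seqs)
for pair, score in table.items():
    print(pair, score)
```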

9 Second step: cluster the sequences to create a tree (guide tree):
 represents the order in which pairs of sequences are to be aligned
 similar sequences are neighbors in the tree
 distant sequences are distant from each other in the tree
[Figure: guide tree with leaves A, D, C, B, E, built from the pairwise table]
The guide tree is imprecise and is NOT the tree which truly describes the relationship between the sequences!
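A guide tree can be sketched by repeatedly merging the two closest clusters. The slides do not say which clustering method is used, so average-linkage (UPGMA-style) clustering is swapped in here as a simple stand-in. The distance values and the function name `guide_tree` are hypothetical; they were chosen so that A and D pair up, B and C pair up, and E joins last.

```python
from itertools import combinations


def guide_tree(distances):
    """Average-linkage clustering of leaves into a nested-tuple tree.

    distances maps frozenset({x, y}) to the distance between leaves x and y.
    """
    leaves = {leaf for pair in distances for leaf in pair}
    # Each cluster is keyed by its (sub)tree and lists the leaves it contains.
    clusters = {leaf: [leaf] for leaf in sorted(leaves)}

    def average_distance(t1, t2):
        pairs = [(a, b) for a in clusters[t1] for b in clusters[t2]]
        return sum(distances[frozenset(p)] for p in pairs) / len(pairs)

    while len(clusters) > 1:
        # Pick the closest pair of clusters and merge them into one node.
        t1, t2 = min(combinations(clusters, 2),
                     key=lambda pair: average_distance(*pair))
        merged = clusters.pop(t1) + clusters.pop(t2)
        clusters[(t1, t2)] = merged
    return next(iter(clusters))


# Hypothetical distances (smaller means more similar).
raw = {
    ("A", "D"): 1, ("B", "C"): 2,
    ("A", "B"): 5, ("A", "C"): 5, ("B", "D"): 5, ("C", "D"): 5,
    ("A", "E"): 9, ("B", "E"): 9, ("C", "E"): 9, ("D", "E"): 9,
}
distances = {frozenset(pair): value for pair, value in raw.items()}
tree = guide_tree(distances)
print(tree)
```

The nesting of the returned tuples gives the order in which a progressive method would align: innermost pairs first, outermost join last.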

10 Third step:
1. Align the most similar (neighboring) pairs
[Figure: guide tree with leaves A, D, C, B, E; labels: sequence]

11 Third step:
2. Align pairs of pairs
[Figure: guide tree with leaves A, D, C, B, E; labels: sequence, profile]

12 Third step:
3. Align the outgroup
[Figure: guide tree with leaves A, D, C, B, E; labels: sequence, profile]
Main disadvantages:
1. Sub-optimal tree topology
2. Misalignments resulting from globally aligning a pair of sequences are never corrected and cause further deterioration in later steps

13 Iterative alignment
[Figure: cycle of pairwise distance table, guide tree, MSA for sequences A–E]
Iterate until the MSA doesn’t change (convergence)

14 Searching for remote homologs
 Sometimes BLAST isn’t enough.
 Large protein family, and BLAST only gives close members. We want more distant members
 PSI-BLAST
 Profile HMMs
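To give a feel for why a profile can find members that a single query sequence misses, the sketch below turns a small alignment into log-odds scores and slides it along a target sequence. It is a toy under stated assumptions, not PSI-BLAST itself: it uses DNA rather than protein, a uniform background, a pseudocount of 1, and no gaps; the names `log_odds_pssm` and `best_hit` and the target sequence are hypothetical.

```python
import math


def log_odds_pssm(alignment, alphabet="ACGT", pseudocount=1.0):
    """Return one {character: log2(frequency / background)} dict per column."""
    n_rows = len(alignment)
    background = 1.0 / len(alphabet)  # assumption: uniform background
    total = n_rows + pseudocount * len(alphabet)
    pssm = []
    for column in zip(*alignment):
        scores = {}
        for ch in alphabet:
            freq = (column.count(ch) + pseudocount) / total
            scores[ch] = math.log2(freq / background)
        pssm.append(scores)
    return pssm


def best_hit(pssm, sequence):
    """Return (start, score) of the best ungapped window, or None."""
    width = len(pssm)
    best = None
    for start in range(len(sequence) - width + 1):
        window = sequence[start:start + width]
        score = sum(column[ch] for column, ch in zip(pssm, window))
        if best is None or score > best[1]:
            best = (start, score)
    return best


pssm = log_odds_pssm(["TGTTCTA", "TGTTCAA", "TCTTCAA"])
hit = best_hit(pssm, "CCCTGTTCAAGGG")
print(hit)
```

Positions that are conserved in the alignment contribute large positive scores for the expected character and negative scores for the others, while variable positions are penalized less. That position-specific weighting is what a single-sequence search lacks.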

15 Profile HMM
 Similar to PSI-BLAST: also uses a profile
 Takes into account:
 Dependence among sites (if site n is conserved, it is likely that site n+1 is conserved too: part of a domain)
 The probability of a certain column in an alignment

16 PSI-BLAST vs. profile HMM
Profile HMM: more exact, slower
PSI-BLAST: less exact, faster

17 Case study: Using homology searching  The human kinome

18 Kinases and phosphatases

19 Multi-tasking enzymes
 Signal transduction
 Metabolism
 Transcription
 Cell-cycle
 Differentiation
 Function of the nervous and immune systems
 …
 And more

20 How many kinases in the human genome?
 1950’s: discovery that reversible phosphorylation regulates the activity of glycogen phosphorylase
 1970’s: the advent of cloning and sequencing led to speculation that the vertebrate genome encodes as many as 1001 kinases

21 How many kinases in the human genome?
 2001 – human genome sequence …
 As well – databases of GenBank, Swiss-Prot, and dbEST
 How can we find out how many kinases are out there?

22 The human kinome  In 2002, Manning, Whyte, Martinez, Hunter and Sudarsanam set out to: 1. Search and cross-reference all these databases for all kinases 2. Characterize all found kinases

23 ePKs and aPKs
ePKs: eukaryotic protein kinases (the majority). Sequence homology of the catalytic domain; additional regulatory domains are non-homologous
aPKs: atypical protein kinases. No sequence homology to ePKs; some aPK subfamilies have structural similarity to ePKs

24 The search
 Several profiles were built, based on the catalytic domain of:
(a) 70 known ePKs from yeast, worm, fly, and human with >50% identity in the ePK domain
(b) each subfamily of known aPKs
 HMM-profile searches and PSI-BLAST searches were performed

25 The results…
 478 ePKs
 40 aPKs
 Total of 518 kinases in the human genome (half of the prediction in the 1970’s)

26 Classifying the kinases 1. Classification based on the catalytic domain 2. Classification based on the regulatory domains 189 sub-families of kinases

27 Comparison to other species  209 subfamilies of ePKs in human, worm, yeast and fly

28
 The human genome has about twice as many kinases as fly or worm. Many are aPKs.
 Most of them are receptor tyrosine kinases (RTKs)
The human-expanded kinase families function predominantly in processes of the:
 Nervous system
 Immune system
 Angiogenesis
 Hemopoiesis

29 The discovery of new kinases: a new front for battling human diseases

30 Correlating with human diseases  160 kinases mapped to amplicons seen in tumors  80 kinases mapped to amplicons in other major illnesses  Usually kinases are over-expressed in cancer and other diseases

31 Correlating with human diseases
 6 kinase inhibitors have been approved to date for use against cancer
 >70 other inhibitors are in clinical trials