Pairwise Sequence Alignment

Three modifications for local alignment
- The scoring system uses negative scores for mismatches.
- The minimum score at any cell [i,j] is zero.
- The best score is sought anywhere in the matrix, not just in the last row or column.
These changes make the method seek high-scoring subsequences that are not penalized for their global effects, that exclude poorly matching regions, and that can occur anywhere in either sequence.
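The three modifications can be sketched as a small scoring routine in the style of Smith-Waterman. This is a minimal illustration, not the implementation used in the lecture: the function name, the linear gap penalty, and the match/mismatch/gap values are assumptions chosen for the example.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best score, (i, j)) for the best local alignment of a and b.

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # First row and column stay at zero: an alignment may start anywhere.
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            # Modification 1: mismatches score negatively.
            diag = H[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # Modification 2: no cell may drop below zero.
            H[i][j] = max(0, diag, H[i - 1][j] + gap, H[i][j - 1] + gap)
            # Modification 3: track the best cell anywhere in the matrix.
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    return best, best_pos


score, end = local_alignment_score("CCCGATTACACCC", "TTGATTACATT")
print(score, end)
```

With these assumed scores the shared stretch GATTACA is found regardless of the unrelated flanking letters, and two sequences with nothing in common score zero rather than a large negative number.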

Other methods for alignment
- O(N^2) is too slow for searching large databases.
- Heuristic methods are based on the frequency of shared subsequences.
- They usually look for short ungapped matches (see, for example, FASTA, BLAST, BLAZE).
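The idea of counting shared short subsequences can be illustrated with a word-lookup sketch. This is a simplified illustration of the general principle only, not the actual FASTA or BLAST algorithm; the function names and the word length k=3 are assumptions made for the example.

```python
from collections import defaultdict


def shared_words(query, subject, k=3):
    """Return (query_pos, subject_pos) pairs where the same k-letter word occurs."""
    index = defaultdict(list)
    for i in range(len(query) - k + 1):
        index[query[i:i + k]].append(i)
    hits = []
    for j in range(len(subject) - k + 1):
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return hits


def best_diagonal(hits):
    """Return (diagonal, hit count) for the diagonal holding the most word hits.

    A diagonal is subject_pos - query_pos; many hits on one diagonal suggest
    an ungapped region of similarity.
    """
    counts = defaultdict(int)
    for i, j in hits:
        counts[j - i] += 1
    return max(counts.items(), key=lambda kv: kv[1]) if counts else None


hits = shared_words("GATTACA", "TTGATTACATT")
print(len(hits), best_diagonal(hits))
```

Building the word index takes time proportional to the query length, and each subject word is then a dictionary lookup, which is why this kind of filter is much cheaper than filling a full N x N matrix for every database entry.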

Multiple Sequence Alignment

What is multiple sequence alignment?
A simple extension of pairwise sequence alignment.
Given:
- a set of sequences
- a match scoring table
- gap penalties
Find:
- an alignment of the sequences that achieves the optimal score.
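The slide does not say how a multiple alignment is scored. One common choice, assumed here purely for illustration, is the sum-of-pairs score: score every pair of rows column by column and add the results. The function name and the match/mismatch/gap values below are hypothetical.

```python
from itertools import combinations


def sum_of_pairs(alignment, match=1, mismatch=-1, gap=-2):
    """Sum-of-pairs score of an alignment given as equal-length gapped strings.

    Scoring values are illustrative; a real scoring table (e.g. a
    substitution matrix) would replace the match/mismatch constants.
    """
    if len({len(row) for row in alignment}) > 1:
        raise ValueError("all rows of an alignment must have the same length")
    total = 0
    for column in zip(*alignment):
        for x, y in combinations(column, 2):
            if x == "-" and y == "-":
                continue  # two gaps facing each other are not scored
            if x == "-" or y == "-":
                total += gap
            elif x == y:
                total += match
            else:
                total += mismatch
    return total


print(sum_of_pairs(["ACG", "A-G", "ATG"]))
```

Finding the alignment that maximizes such a score is the hard part; this sketch only evaluates a given alignment.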

Two major applications of multiple sequence alignment
Aligning protein families
- Takes advantage of a richer alphabet
- Establishes evolutionary relationships among proteins; a starting point for trees
- Can identify important functional regions
- Can yield structural clues
- Gold standard is clear
Aligning non-coding DNA sequences
- Conserved signals in DNA for control of expression
- Can infer evolutionary relationships
- Can identify important functional regions
- Gold standard is difficult to identify

Why do we care about protein MSA?
- It is a useful way to summarize the sequences of related proteins. What do globin sequences look like?
- It is a useful way to find functionally important amino acids by assessing conservation over many sequences. What is conserved?

Globin sequences
4mbn    VLSEGEWQLVLHVWAKVE--ADVAGH
1myt    ADFDAVLKCWGPVE--ADYTTM
2hhb A  VLSPADKTNVKAAWGKVG--AHAGEY
2mhb A  VLSAADKTNVKAAWSKVG--GHAGEY
1pbx A  SLSDKDKAAVRALWSKIG--KSADAI
2hhb B  VHLTPEEKSAVTALWGKV----NVDEV
2mhb B  VQLSGEEKAAVLALWDKV----NEEEV
2lhb.   -PIVDTGSVAPLSAAEKTKIRSAWAPVY--STYETS
1mba    SLSAAEADLAGKSWAPVFA--NKNAN
1sdh A  --PSVYDAAAQLTADVKKDLRDSWKVIGS--DKKGN
1lh     GALTESQAALVKSSWEEFN--ANIPKH
1hlb.   GGTLAIQAQGDLTLAQKKIVRKTWHQLMRN--KTSF
1ith A  GLTAAQIKAIQDHWFLNI-KGCLQAA
1ecd    LSADQISTVQASFDKVK------GD
2hbg    GLSAAQRQVIAATWKDIAGADNGAGV

Conserved subsequences
DRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKG
PKFAGI-AQADIAGNAAISAHGATVLKKLGELLKAKG
PHF-DLSH-----GSAQVKGHGKKVADALTNAVAHVD
PHF-DLSH-----GSAQVKAHGKKVGDALTLAVGHLD
SHWPDVTP-----GSPHIKAHGKKVMGGIALAVSKID
ESFGDLSTPDAVMGNPKVKAHGKKVLGAFSDGLAHLD
DSFGDLSNPGAVMGNPKVKAHGKKVLHSFGEGVHHLD
PKFKGLTTADELKKSADVRWHAERIINAVDDAVASMD
ADFKGKSVAD-IKASPKLRDVSSRIFTRLNEFVNNAA
KRLGNVS---QGMANDKLRGHSITLMYALQNFIDQLD
SFLKGT--SEVPQNNPELQAHAGKVFKLVYEAAIQLE
PQMAGM-SASQLRSSRQMQAHAIRVSSIMSEYVEELD
HKFS-SVPLYGLRSNPAYKAQTLTVINYLDKVVDALG
TQFAG-KDLESIKGTAPFETHANRIVGFFSKIIGELP
GFSGA SDPGVAALGAKVLAQIGVAVSHLG
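One simple way to "assess conservation" is to ask, for each column, what fraction of the rows share the most common residue. The sketch below applies that measure to the first four rows of the block above. The measure itself and the function name are assumptions for illustration; the slides do not specify how conservation is quantified.

```python
from collections import Counter


def column_conservation(rows):
    """Per-column fraction of rows that carry the most common non-gap residue."""
    scores = []
    for column in zip(*rows):
        residues = [c for c in column if c != "-"]
        top = Counter(residues).most_common(1)[0][1] if residues else 0
        scores.append(top / len(column))
    return scores


# First four rows of the conserved block shown above.
rows = [
    "DRFKHLKTEAEMKASEDLKKHGVTVLTALGAILKKKG",
    "PKFAGI-AQADIAGNAAISAHGATVLKKLGELLKAKG",
    "PHF-DLSH-----GSAQVKGHGKKVADALTNAVAHVD",
    "PHF-DLSH-----GSAQVKAHGKKVGDALTLAVGHLD",
]
scores = column_conservation(rows)
fully_conserved = [i for i, s in enumerate(scores) if s == 1.0]
print(fully_conserved)
```

Columns where every row agrees, such as the H at 0-based position 20 in these four rows, are candidates for functionally important residues, although four sequences are far too few to draw conclusions.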

Why do we care about protein MSA?
- To establish evolutionary relationships between sequences. What sequence of events led to the current species?
- To understand more precisely how to model 3D structures. What other amino acids are acceptable in this structure?

The close relationship between MSA and the evolutionary tree

What is the protein MSA gold standard?
Structural alignment! If sequences can be aligned, the alignment should reflect structural similarities. Thus, the alignment should in general lead to a low RMS deviation, and certainly to a "match" of common structural and functional elements. But remember: an optimal computation is not the same as optimal biology.
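Assuming "RMS" here means the root-mean-square deviation between the positions of aligned residues in two structures, it can be computed as below. This sketch assumes the two structures have already been superposed and that each aligned residue is represented by one (x, y, z) coordinate, for example its C-alpha atom; the superposition step is omitted and the function name is hypothetical.

```python
import math


def rmsd(coords_a, coords_b):
    """Root-mean-square deviation between two equal-length lists of (x, y, z).

    Assumes the structures are already superposed and that coords_a[i] and
    coords_b[i] belong to residues paired by the sequence alignment.
    """
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (x1, y1, z1), (x2, y2, z2) in zip(coords_a, coords_b):
        total += (x1 - x2) ** 2 + (y1 - y2) ** 2 + (z1 - z2) ** 2
    return math.sqrt(total / len(coords_a))


print(rmsd([(0, 0, 0), (1, 0, 0)], [(0, 0, 0), (1, 0, 2)]))
```

A sequence alignment that pairs structurally equivalent residues should give a lower value than one that pairs residues from different parts of the fold.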

What about DNA MSA?
Non-coding sequences:
- May be conserved within a species (to control expression in a concerted fashion)
- May be conserved across species (using similar control mechanisms)
- May diverge within and across species for a special purpose or through evolutionary drift

What about DNA MSA?
A much harder problem (4 letters only). What is being tested with a multiple alignment of non-coding sequences?
- Common evolutionary descent
- Common mode of binding proteins
- Common overall function
There is no structure to use as a gold standard; one would need to assess the ability of the aligned sequences to bind proteins or affect function. In general, statistical/probabilistic methods (EM, Gibbs sampling) are more effective.

What else do you need to know about all MSA methods?
- Almost all programs will align whatever sequences the user gives as input. They will always return an alignment, even if the sequences are completely unrelated. The biological thinking has to be done by you.
- Most programs will insert gaps. Once inserted, however, gaps are there to stay.
- You need to check how the program treats end gaps.
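The effect of end-gap treatment can be seen in a small pairwise sketch: the same global dynamic-programming recurrence is scored once with end gaps penalized and once with end gaps free. The function name, the `free_end_gaps` switch, and the scoring values are assumptions for illustration, not the behavior of any particular MSA program.

```python
def global_score(a, b, match=1, mismatch=-1, gap=-2, free_end_gaps=False):
    """Global alignment score with a linear gap penalty.

    With free_end_gaps=True, gaps before the start or after the end of
    either sequence cost nothing.
    """
    rows, cols = len(a) + 1, len(b) + 1
    F = [[0] * cols for _ in range(rows)]
    if not free_end_gaps:
        # Leading gaps are charged like any other gap.
        for i in range(1, rows):
            F[i][0] = i * gap
        for j in range(1, cols):
            F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            F[i][j] = max(diag, F[i - 1][j] + gap, F[i][j - 1] + gap)
    if free_end_gaps:
        # Trailing gaps are free: take the best cell in the last row or column.
        return max(max(F[-1]), max(row[-1] for row in F))
    return F[-1][-1]


print(global_score("ACGT", "TTACGTTT"))
print(global_score("ACGT", "TTACGTTT", free_end_gaps=True))
```

A short sequence contained in a longer one scores badly when end gaps are charged and well when they are free, so the same pair of sequences can look unrelated or closely related depending on this one setting.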