Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment.

Slides:



Advertisements
Similar presentations
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Advertisements

Fa07CSE 182 CSE182-L4: Database filtering. Fa07CSE 182 Summary (through lecture 3) A2 is online We considered the basics of sequence alignment –Opt score.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
BLAST Sequence alignment, E-value & Extreme value distribution.
Gapped Blast and PSI BLAST Basic Local Alignment Search Tool ~Sean Boyle Basic Local Alignment Search Tool ~Sean Boyle.
OUTLINE Scoring Matrices Probability of matching runs Quality of a database match.
Searching Sequence Databases
Lecture outline Database searches
Heuristic alignment algorithms and cost matrices
Expect value Expect value (E-value) Expected number of hits, of equivalent or better score, found by random chance in a database of the size.
Biological Sequence Analysis Chapter 3. Protein Families Organism 1 Organism 2 Enzym e 1 Enzym e 2 Closely relatedSame Function MSEKKQPVDLGLLEEDDEFEEFPAEDWAGLDEDEDAHVWEDNWDDDNVEDDFSNQLRAELEKHGYKMETS.
Biological Sequence Analysis Chapter 3 Claus Lundegaard.
Database searching. Purposes of similarity search Function prediction by homology (in silico annotation) Function prediction by homology (in silico annotation)
1 1. BLAST (Basic Local Alignment Search Tool) Heuristic Only parts of protein are frequently subject to mutations. For example, active sites (that one.
Midterm Review. Review of previous weeks Pairwise sequence alignment Scoring matrices PAM, BLOSUM, Dynamic programming Needleman-Wunsch (Global) Semi-global.
Heuristic alignment algorithms; Cost matrices 2.5 – 2.9 Thomas van Dijk.
Pairwise Sequence Alignment Part 2. Outline Global alignments-continuation Local versus Global BLAST algorithms Evaluating significance of alignments.
Introduction to bioinformatics
Similar Sequence Similar Function Charles Yan Spring 2006.
Sequence Alignment III CIS 667 February 10, 2004.
Sequences are related Darwin: all organisms are related through descent with modification Related molecules have similar functions in different organisms.
Fa05CSE 182 CSE182-L5: Scoring matrices Dictionary Matching.
Supplementary material Figure S1. Cumulative histogram of the fitness of the pairwise alignments of random generated ESSs. In order to assess the statistical.
Blast heuristics Morten Nielsen Department of Systems Biology, DTU.
Sequence alignment, E-value & Extreme value distribution
Introduction to Bioinformatics From Pairwise to Multiple Alignment.
Test 1 & 2 information Econ 130 Spring Information about Test 1 Number of students taking test: 41 Points possible: 50 Highest score: 47 (2 students)
Sequence comparison: Significance of similarity scores Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas.
Whole genome alignments Genome 559: Introduction to Statistical and Computational Genomics Prof. James H. Thomas
BLAST: Basic Local Alignment Search Tool Urmila Kulkarni-Kale Bioinformatics Centre University of Pune.
Information theoretic interpretation of PAM matrices Sorin Istrail and Derek Aguiar.
Alignment Statistics and Substitution Matrices BMI/CS 576 Colin Dewey Fall 2010.
BLAST What it does and what it means Steven Slater Adapted from pt.
Protein Sequence Alignment and Database Searching.
Pairwise Alignment and Database Searching Henrik Nielsen Protein Post-Translational Modification & Molecular Evolution Groups Center for Biological Sequence.
Pairwise Alignment and Database Searching Slides retirados de... Henrik Nielsen Protein Post-Translational Modification & Molecular Evolution Groups Center.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
Eric C. Rouchka, University of Louisville Sequence Database Searching Eric Rouchka, D.Sc. Bioinformatics Journal Club October.
1 Lecture outline Database searches –BLAST –FASTA Statistical Significance of Sequence Comparison Results –Probability of matching runs –Karin-Altschul.
Database Searches BLAST. Basic Local Alignment Search Tool –Altschul, Gish, Miller, Myers, Lipman, J. Mol. Biol. 215 (1990) –Altschul, Madden, Schaffer,
BLAST Anders Gorm Pedersen & Rasmus Wernersson. Database searching Using pairwise alignments to search databases for similar sequences Database Query.
Comp. Genomics Recitation 3 The statistics of database searching.
Construction of Substitution Matrices
Bioinformatics Ayesha M. Khan 9 th April, What’s in a secondary database?  It should be noted that within multiple alignments can be found conserved.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Significance in protein analysis
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Pairwise Local Alignment and Database Search Csc 487/687 Computing for Bioinformatics.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Pairwise Sequence Alignment Part 2. Outline Summary Local and Global alignments FASTA and BLAST algorithms Evaluating significance of alignments Alignment.
Database Similarity Search. 2 Sequences that are similar probably have the same function Why do we care to align sequences?
Construction of Substitution matrices
Doug Raiford Phage class: introduction to sequence databases.
Step 3: Tools Database Searching
The statistics of pairwise alignment BMI/CS 576 Colin Dewey Fall 2015.
BLAST: Database Search Heuristic Algorithm Some slides courtesy of Dr. Pevsner and Dr. Dirk Husmeier.
Your friend has a hobby of generating random bit strings, and finding patterns in them. One day she come to you, excited and says: I found the strangest.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
9/6/07BCB 444/544 F07 ISU Dobbs - Lab 3 - BLAST1 BCB 444/544 Lab 3 BLAST Scoring Matrices & Alignment Statistics Sept6.
Blast Basic Local Alignment Search Tool
BLAST Anders Gorm Pedersen & Rasmus Wernersson.
Sequence comparison: Significance of similarity scores
Sequence alignment, Part 2
Pairwise Sequence Alignment (cont.)
Sequence comparison: Significance of similarity scores
Basic Local Alignment Search Tool
Projects….
Sequence alignment, E-value & Extreme value distribution
Searching Sequence Databases
Presentation transcript:

Spring 2007 Bioinformatiatics Ch. 2 - Sequence Alignment

PAM 250 Matrix

BLOSUM Matrix 62

SSU Secondary Structure

66

Cytochrome C

~82,000,000 DNA sequences as of April 2008

When is a database hit significant? Problem: –Even unrelated sequences can be aligned (yielding a low score) –How do we know if a database hit is meaningful? –When is an alignment score sufficiently high? Solution: –Determine the range of alignment scores you would expect to get for random reasons (i.e., when aligning unrelated sequences). –Compare actual scores to the distribution of random scores. –Is the real score much higher than you’d expect by chance?

Random alignment scores follow extreme value distributions The exact shape and location of the distribution depends on the exact nature of the database and the query sequence Searching a database of unrelated sequences result in scores following an extreme value distribution No. of Sequences Alignment Score

Significance of a hit: one possible solution (1)Align query sequence to all sequences in database, note scores (2)Fit actual scores to a mixture of two sub-distributions: (a) an extreme value distribution and (b) a normal distribution (3)Use fitted extreme-value distribution to predict how many random hits to expect for any given score (the “E-value”) No. of Sequences Alignment Score

Significance of a hit: example Search against a database of 10,000 sequences. An extreme-value distribution (blue) is fitted to the distribution of all scores. It is found that 99.9% of the blue distribution has a score below 112. This means that when searching a database of 10,000 sequences you’d expect to get 0.1% * 10,000 = 10 hits with a score of 112 or better for random reasons 10 is the E-value of a hit with score 112. You want E-values well below 1! No. of Sequences Alignment Score

Example of Blast E-values agtgaagtacgtgcgttaatgcgatgagtacggtaaaaagaccggcgtgctttatg ggtgagcgggagtttgtgccagcgaagcgtccttggacttagagagtgtcgggttcg ggacgtccggctacagaatagtaaa Semi-random sequence Blast these sequences agcggaccggtacttaagcgcggaccggcgtgtccttggacttagagagtggggac gtccggcttcggagcgggagtgttcgttgtgccagcgactaaaaagagaattaaatat gggtga Non-random sequence