CS273a Lecture 9/10, Aut 10, Batzoglou Multiple Sequence Alignment.

Slides:



Advertisements
Similar presentations
Computational Molecular Biology Biochem 218 – BioMedical Informatics Doug Brutlag Professor.
Advertisements

Large scale genomes comparisons Bioinformatics aspects (Introduction) Fredj Tekaia Institut Pasteur EMBO Bioinformatic and Comparative.
1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
Algorithms for Alignment of Genomic Sequences Michael Brudno Department of Computer Science Stanford University PGA Workshop 07/16/2004.
GENE TREES Abhita Chugh. Phylogenetic tree Evolutionary tree showing the relationship among various entities that are believed to have a common ancestor.
Basics of Comparative Genomics Dr G. P. S. Raghava.
Types of homology BLAST
Comparative genomics Joachim Bargsten February 2012.
Heuristic Local Alignerers 1.The basic indexing & extension technique 2.Indexing: techniques to improve sensitivity Pairs of Words, Patterns 3.Systems.
CS262 Lecture 9, Win07, Batzoglou History of WGA 1982: -virus, 48,502 bp 1995: h-influenzae, 1 Mbp 2000: fly, 100 Mbp 2001 – present  human (3Gbp), mouse.
Sequence Similarity. The Viterbi algorithm for alignment Compute the following matrices (DP)  M(i, j):most likely alignment of x 1 …x i with y 1 …y j.
Genomic Sequence Alignment. Overview Dynamic programming & the Needleman-Wunsch algorithm Local alignment—BLAST Fast global alignment Multiple sequence.
CS273a Lecture 14, Fall 08, Batzoglou CS273a Lecture 14, Fall 2008 Finding Conserved Elements (1) Binomial method  25-bp window in the human genome 
CS273a Lecture 10, Aut 08, Batzoglou CS273a Lecture 10, Fall 2008 Neutral Substitution Rates.
Comparison of Networks Across Species CS374 Presentation October 26, 2006 Chuan Sheng Foo.
Finding Orthologous Groups René van der Heijden. What is this lecture about? What is ‘orthology’? Why do we study gene-ancestry/gene-trees (phylogenies)?
CS262 Lecture 9, Win07, Batzoglou Fragment Assembly Given N reads… Where N ~ 30 million… We need to use a linear-time algorithm.
Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion Translocation Duplication.
CS273a Lecture 8, Win07, Batzoglou Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion.
CS273a Lecture 11, Aut 08, Batzoglou Multiple Sequence Alignment.
Some new sequencing technologies. Molecular Inversion Probes.
[Bejerano Aut08/09] 1 MW 11:00-12:15 in Beckman B302 Profs: Serafim Batzoglou, Gill Bejerano TA: Cory McLean.
Bioinformatics and Phylogenetic Analysis
Linear-Space Alignment. Linear-space alignment Using 2 columns of space, we can compute for k = 1…M, F(M/2, k), F r (M/2, N – k) PLUS the backpointers.
CS262 Lecture 9, Win07, Batzoglou Multiple Sequence Alignments.
CS262 Lecture 9, Win07, Batzoglou Phylogeny Tree Reconstruction
CS273a Lecture 9, Aut08, Batzoglou CS273a Lecture 9, Fall 2008 Quality of assemblies—mouse N50 contig length Terminology: N50 contig length If we sort.
CS273a Lecture 10, Aut 08, Batzoglou Multiple Sequence Alignment.
CS273a Lecture 10, Aut 08, Batzoglou CS273a Lecture 10, Fall 2008 Local Alignments.
Alignments and Comparative Genomics. Welcome to CS374! Today: Serafim: Alignments and Comparative Genomics Omkar: Administrivia.
Multiple Sequence Alignments. Lecture 12, Tuesday May 13, 2003 Reading Durbin’s book: Chapter Gusfield’s book: Chapter 14.1, 14.2, 14.5,
[Bejerano Fall10/11] 1 HW1 Due This Fri 10/15 at noon. TA Q&A: What to ask, How to ask.
Finding Orthologous Groups René van der Heijden. What is this lecture about? What is ‘orthology’? Why do we study gene-ancestry/gene-trees (phylogenies)?
Genomic Rearrangements CS 374 – Algorithms in Biology Fall 2006 Nandhini N S.
$399 Personal Genome Service $2,500 Health Compass service $985 deCODEme (November 2007) (April 2008) $350,000 Whole-genome sequencing (November 2007)
CS273a Lecture 2, Autumn 10, Batzoglou DNA Sequencing (cont.)
Short Primer on Comparative Genomics Today: Special guest lecture 12pm, Alway M108 Comparative genomics of animals and plants Adam Siepel Assistant Professor.
“Multiple indexes and multiple alignments” Presenting:Siddharth Jonathan Scribing:Susan Tang DFLW:Neda Nategh Upcoming: 10/24:“Evolution of Multidomain.
CIS786, Lecture 8 Usman Roshan Some of the slides are based upon material by Dennis Livesay and David.
BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Multiple Sequence Alignment CSC391/691 Bioinformatics Spring 2004 Fetrow/Burg/Miller (Slides by J. Burg)
BNFO 602/691 Biological Sequence Analysis Mark Reimers, VIPBG
Phylogeny Estimation: Traditional and Bayesian Approaches Molecular Evolution, 2003
Multiple Sequence Alignment. Definition Given N sequences x 1, x 2,…, x N :  Insert gaps (-) in each sequence x i, such that All sequences have the.
Eric C. Rouchka, University of Louisville SATCHMO: sequence alignment and tree construction using hidden Markov models Edgar, R.C. and Sjolander, K. Bioinformatics.
Last lecture summary. Window size? Stringency? Color mapping? Frame shifts?
TGCAAACTCAAACTCTTTTGTTGTTCTTACTGTATCATTGCCCAGAATAT TCTGCCTGTCTTTAGAGGCTAATACATTGATTAGTGAATTCCAATGGGCA GAATCGTGATGCATTAAAGAGATGCTAATATTTTCACTGCTCCTCAATTT.
Inferring phylogenetic trees: Maximum likelihood methods Prof. William Stafford Noble Department of Genome Sciences Department of Computer Science and.
Bioinformatic Tools for Comparative Genomics of Vectors Comparative Genomics.
Protein and RNA Families
Genome Analysis II Comparative Genomics Jiangbo Miao Apr. 25, 2002 CISC889-02S: Bioinformatics.
Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. [many slides borrowed from various sources]
Using BLAST for Genomic Sequence Annotation Jeremy Buhler For HHMI / BIO4342 Tutorial Workshop.
MEME homework: probability of finding GAGTCA at a given position in the yeast genome, based on a background model of A = 0.3, T = 0.3, G = 0.2, C = 0.2.
Using blast to study gene evolution – an example.
Orthology & Paralogy Alignment & Assembly Alastair Kerr Ph.D. WTCCB Bioinformatics Core [many slides borrowed from various sources]
Multiple Sequence Alignment
Construction of Substitution matrices
Ayesha M.Khan Spring Phylogenetic Basics 2 One central field in biology is to infer the relation between species. Do they possess a common ancestor?
1 MAVID: Constrained Ancestral Alignment of Multiple Sequence Author: Nicholas Bray and Lior Pachter.
Building Phylogenies Maximum Likelihood. Methods Distance-based Parsimony Maximum likelihood.
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
Phylogeny and the Tree of Life
Sequence similarity, BLAST alignments & multiple sequence alignments
CSCI2950-C Lecture 12 Networks
Evolutionary genomics can now be applied beyond ‘model’ organisms
LSM3241: Bioinformatics and Biocomputing Lecture 4: Sequence analysis methods revisited Prof. Chen Yu Zong Tel:
Inferring phylogenetic trees: Distance and maximum likelihood methods
Evolutionary genetics
Presentation transcript:

CS273a Lecture 9/10, Aut 10, Batzoglou Multiple Sequence Alignment

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Evolution at the DNA level …ACGGTGCAGTTACCA… …AC----CAGTCCACCA… Mutation SEQUENCE EDITS REARRANGEMENTS Deletion Inversion Translocation Duplication

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Orthology and Paralogy HB Human WB Worm HA1 Human HA2 Human Yeast WA Worm Orthologs: Derived by speciation Paralogs: Everything else Orthologs: Derived by speciation Paralogs: Everything else

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Orthology, Paralogy, Inparalogs, Outparalogs

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Genome Evolution – Macro Events Inversions Deletions Duplications

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Synteny maps Comparison of human and mouse

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Synteny maps

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Synteny maps

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Synteny maps

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Building synteny maps Recommended local aligners BLASTZ  Most accurate, especially for genes  Chains local alignments WU-BLAST  Good tradeoff of efficiency/sensitivity  Best command-line options BLAT  Fast, less sensitive  Good for comparing very similar sequences finding rough homology map

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Index-based local alignment Dictionary: All words of length k (~10) Alignment initiated between words of alignment score  T (typically T = k) Alignment: Ungapped extensions until score below statistical threshold Output: All local alignments with score > statistical threshold …… query DB query scan Question: Using an idea from overlap detection, better way to find all local alignments between two genomes?

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Local Alignments

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 After chaining

CS273a Lecture 9/10, Aut 10, Batzoglou Chaining local alignments 1.Find local alignments 2.Chain -O(NlogN) L.I.S. 3.Restricted DP

CS273a Lecture 9/10, Aut 10, Batzoglou Progressive Alignment When evolutionary tree is known:  Align closest first, in the order of the tree  In each step, align two sequences x, y, or profiles p x, p y, to generate a new alignment with associated profile p result Weighted version:  Tree edges have weights, proportional to the divergence in that edge  New profile is a weighted average of two old profiles x w y z Example Profile: (A, C, G, T, -) p x = (0.8, 0.2, 0, 0, 0) p y = (0.6, 0, 0, 0, 0.4) s(p x, p y ) = 0.8*0.6*s(A, A) + 0.2*0.6*s(C, A) + 0.8*0.4*s(A, -) + 0.2*0.4*s(C, -) Result: p xy = (0.7, 0.1, 0, 0, 0.2) s(p x, -) = 0.8*1.0*s(A, -) + 0.2*1.0*s(C, -) Result: p x- = (0.4, 0.1, 0, 0, 0.5)

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Threaded Blockset Aligner Human–Cow HMR – CD Restricted Area Profile Alignment

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Reconstructing the Ancestral Mammalian Genome Human: C Baboon: C Cat: C Dog: G C C or G G

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Neutral Substitution Rates

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Finding Conserved Elements (1) Binomial method  25-bp window in the human genome  Binomial distribution of k matches in N bases given the neutral probability of substitution

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Finding Conserved Elements (2) Parsimony Method  Count minimum # of mutations explaining each column  Assign a probability to this parsimony score given neutral model  Multiply probabilities across 25-bp window of human genome A C A A G

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Finding Conserved Elements

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Finding Conserved Elements (3) GERP

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Phylo HMMs HMM Phylogenetic Tree Model Phylo HMM

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Finding Conserved Elements (3)

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 How do the methods agree/disagree?

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Statistical Power to Detect Constraint L N C: cutoff # mutations D: neutral mutation rate  : constraint mutation rate relative to neutral

CS273a Lecture 9/10, Aut 10, Batzoglou CS273a Lecture 10, Fall 2010 Statistical Power to Detect Constraint L N C: cutoff # mutations D: neutral mutation rate  : constraint mutation rate relative to neutral