Annotation and Alignment of the Drosophila Genomes.

Genes or Regulation?
“10,516 putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”
“Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible”
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

Genes or Regulatory Elements?
“10,516 [slide overlay: 10,867] putative orthologs have been identified as a core gene set conserved over 25–55 million years (Myr) since the pseudoobscura/melanogaster divergence”
“Cis-regulatory sequences are more conserved than random and nearby sequences between the species—but the difference is slight, suggesting that the evolution of cis-regulatory elements is flexible”
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

BP England, U Heberlein, R Tjian. Purified Drosophila transcription factor, Adh distal factor-1 (Adf-1), binds to sites in several Drosophila promoters and activates transcription, J Biol Chem 1990.

S. Chatterji and L. Pachter, GeneMapper: Reference based annotation with GeneMapper, 2005.

Alignment of coding sequence
DroAna_ _ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_ _ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_ GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_ _ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_ _ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
          ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of non-coding sequence
DroAna_ _ CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroMoj_ _ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_ _ CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT-
DroVir_ _ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA
DroYak_1_ CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT
          *** * * *

DroAna_ _ AATC-----ACTTAC
DroMel_4_ ATTCTATGGACTCAC
DroMoj_ _ ----TATTTACTCAC
DroPse_1_ TGTACTTAC
DroSim_ _ ATTCTATGGACTCAC
DroVir_ _ ----TATTTACTCAC
DroYak_1_ ATTTCATAAACTCAC
                   *** **

Alignment of coding sequence
DroAna_ _ GTCGCTCAACCAGCATTTGCAAAAGTCGCAGAACTTGCGCTCATTGGATTTCCAGTACTC
DroMel_4_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroMoj_ _ GTCGCTTAACCAGCATTTACAGAAATCGCAATACTTGCGTTCATTGGATTTCCAGTACTC
DroPse_1_ GTCGCTCAGCCAGCACTTGCAGAAGTCGCAGTACTTGCGCTCGTTTGATTTCCAGAATTC
DroSim_ _ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTGCGCTCGTTTGATTTCCAGTACTC
DroVir_ _ GTCGCTCAACCAGCATTTGCAGAAGTCGCAATACTTGCGTTCATTCGACTTCCAGTACTC
DroYak_1_ GTCGCTCAGCCAGCATTTGCAGAAGTCGCAGAACTTCCGCTCGTTTGACTTCCAGTACTC
          ****** * ****** ** ** ** *****  **** ** ** ** ** ****** * **

Alignment of non-coding sequence
droAna              CTGAAGGAATTCTA--TATTAAAG
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA CACATAAA--CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG TGCGGAAAAGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA TAAACAA----TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG TGAAGAATAGATCCT-TTATTT
                    *** * * * *

droAna              AAGATTTCTCATCATTGGTTGAATC ACTTAC
dm2.chr2L           TATGGACTCAC
droMoj1.contig_     AAATATTT TATTGACTCAC
dp3.chr4_group      TGT--ACTTAC
droSim1.chr2L       TATGGACTCAC
droVir1.scaffold_   AAATATTTGGTCCACTCAC
droYak1.chr2L       CATAAACTCAC
                    *** **
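The asterisk line under each block marks the columns in which every species carries the same base. A minimal sketch of how such a line can be computed, assuming the block is given as equal-width strings with "-" for spaces; the function name conservation_line is hypothetical and not taken from MAVID or MULTIZ:

```python
def conservation_line(rows):
    """Return a string with '*' under every column where all rows agree."""
    width = len(rows[0])
    if any(len(r) != width for r in rows):
        raise ValueError("all rows of an alignment block must have the same width")
    marks = []
    for column in zip(*rows):
        # A column counts as conserved only if it holds a single distinct
        # character and that character is not a space ('-').
        conserved = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if conserved else " ")
    return "".join(marks)


# First ten coding columns of three of the species shown on the slide.
block = [
    "GTCGCTCAAC",  # DroAna
    "GTCGCTCAGC",  # DroMel
    "GTCGCTTAAC",  # DroMoj
]
print(conservation_line(block))  # prints "****** * *"
```

Applied to the full seven-species coding block, the same rule marks most columns, while the non-coding blocks keep only a handful of asterisks.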

Per site analysis
  Group 1 mean per site % identity: 51.3%, 47.8%
  Group 2 mean per site % identity: 47.8%, 42.9%
  Difference of means (group 1 – group 2): 3.6%, 8.4%, 4.9%
  Difference of means resampling p-value: E-5
  Distribution comparison KS p-value: E-6
Per base analysis
  Group 1 mean per base % identity: 47.8%, 46.3%
  Group 2 mean per base % identity: 46.3%, 42.4%
  Difference of means (group 1 – group 2): 1.5%, 5.4%, 3.9%
Richards et al., Comparative genome sequencing of Drosophila pseudoobscura: Chromosomal, gene, and cis-element evolution, Genome Res., Jan 2005.

How is an alignment made from two sequences?

>dm2.chr2L
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGACTCAC
>dp3.chr4_group3
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTACTTAC

Given two sequences of lengths n, m: n=56, m=64

?

dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC

Each alignment can be summarized by counting the number of matches (#M), mismatches (#X), gaps (#G), and spaces (#S).

dm2.chr2L       CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
dp3.chr4_group3 CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
dm2.chr2L       TATGGACTCAC
dp3.chr4_group3 TGT--ACTTAC
#M=31, #X=22, #G=3, #S=12

DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroMel_4_ ATTCTATGGACTCAC
DroPse_1_ TGTACTTAC
#M=27, #X=18, #G=3, #S=28

2(#M+#X)+#S=112, so #X, #G and #S suffice to specify a summary.
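The four counts can be read off a pairwise alignment column by column. The sketch below assumes that a space is a single "-" character, that a gap is a maximal run of consecutive spaces within one row, and that a run in the other row starts a new gap; the function name alignment_summary and the toy rows are hypothetical. Every match or mismatch column uses one character from each sequence and every space column uses one character in total, which is why 2(#M+#X)+#S always equals the combined length of the two sequences.

```python
def alignment_summary(row_a, row_b, gap_char="-"):
    """Count matches, mismatches, gaps and spaces in a pairwise alignment."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    m = x = g = s = 0
    prev = None  # row that held a space in the previous column, if any
    for a, b in zip(row_a, row_b):
        if a == gap_char and b == gap_char:
            raise ValueError("a column cannot hold a space in both rows")
        if a == gap_char or b == gap_char:
            side = "a" if a == gap_char else "b"
            s += 1
            if side != prev:  # a new run of spaces opens a new gap
                g += 1
            prev = side
        else:
            prev = None
            if a == b:
                m += 1
            else:
                x += 1
    return {"M": m, "X": x, "G": g, "S": s}


# Toy rows, not taken from the slides.
summary = alignment_summary("ACGT--AC", "AC-TTTAG")
print(summary)  # prints {'M': 4, 'X': 1, 'G': 2, 'S': 3}

# The two ungapped sequences have lengths 6 and 7, and 2*(4+1)+3 == 13.
assert 2 * (summary["M"] + summary["X"]) + summary["S"] == 6 + 7
```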

The summary of an alignment is a point in 3-dimensional space. For example, the two alignments just shown correspond to the points (22,3,12) and (18,3,28), written as (#X, #G, #S).
In the example of our two sequences there are […] different alignments, but only […] different summaries, so we don’t need to plot that many points. But that is still quite a large number. Fortunately, there are only 69 vertices on the convex hull of the points. These are the interesting ones, and we can even draw them…
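The collapse from many alignments to far fewer summaries can be seen by brute force on very short sequences. The following sketch enumerates every alignment of two strings and tallies how many share each (#X, #G, #S) summary. It assumes the same gap convention as above (a gap is a maximal run of spaces in one row), the name summary_counts is hypothetical, and the approach is only feasible for toy inputs, not for sequences of length 56 and 64.

```python
from collections import Counter
from functools import lru_cache


def summary_counts(a, b):
    """Map each summary (#X, #G, #S) to the number of alignments having it."""

    @lru_cache(maxsize=None)
    def go(i, j, last):
        # last: 'D' = previous column paired two characters (or start),
        #       'A' = previous column put a[i-1] over a space,
        #       'B' = previous column put b[j-1] over a space.
        if i == len(a) and j == len(b):
            return Counter({(0, 0, 0): 1})
        out = Counter()
        if i < len(a) and j < len(b):
            x = 0 if a[i] == b[j] else 1
            for (X, G, S), c in go(i + 1, j + 1, "D").items():
                out[(X + x, G, S)] += c
        if i < len(a):
            g = 0 if last == "A" else 1  # extending a run opens no new gap
            for (X, G, S), c in go(i + 1, j, "A").items():
                out[(X, G + g, S + 1)] += c
        if j < len(b):
            g = 0 if last == "B" else 1
            for (X, G, S), c in go(i, j + 1, "B").items():
                out[(X, G + g, S + 1)] += c
        return out

    return go(0, 0, "D")


counts = summary_counts("AC", "AG")
print(sum(counts.values()), "alignments,", len(counts), "summaries")
# prints "13 alignments, 6 summaries"
```

Even for two sequences of length 2 the 13 alignments fall onto 6 points; the convex hull of those points is the alignment polytope discussed on the slide.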

For the sequences:
>mel
CTGCGGGATTAGGGGTCATTAGAGTGCCGAAAAGCGAGTTTATTCTATGGAC
>pse
CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCGTGTAC
the alignment polytope is:
49 #X=24, #S=10, #G=2
There are eight alignments that have this summary.

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAGT---------GCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGAG---------TGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAGA---------GTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATC-GTGTAC

mel CTGCGGGATTAGGGGTCATTAG---------AGTGCCGAAAAGCGAGTTTATTCTATGGAC
pse CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG-TGTAC

Consensus at a vertex
mel CTGCGGGATTAGGGGTCATTAGAGT===------===GCCGAAAAGCGAGTTTATTCTA=TGGAC
pse CTGGAAGAGTTTTGATTAGTAG===GGGATCCATGGGGGCGAGGAGAGGCCATCATC==GTGTAC

The vertices of the polytope have special significance. Given parameters for a model, e.g. the default parameters for MULTIZ (M = 100, X = -100, S = -30, G = -400), the summary of an optimal alignment is the result of maximizing the linear form -200*(#X) - 400*(#G) - 80*(#S) over the polytope. Thus, the vertices of the polytope correspond to optimal alignments.
49 #X=24, #S=10, #G=2
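The linear form comes from eliminating #M. Because 2(#M+#X)+#S is the fixed combined length of the two sequences, #M can be replaced by a constant minus #X minus #S/2, so the full score M*#M + X*#X + S*#S + G*#G equals (X-M)*#X + G*#G + (S-M/2)*#S plus a constant. A small sketch checking this with the parameters quoted on the slide; the function names are hypothetical, and the two summaries are the ones given earlier in the talk:

```python
# Parameters as quoted on the slide for MULTIZ.
PARAMS = {"M": 100, "X": -100, "S": -30, "G": -400}


def full_score(m, x, g, s, p=PARAMS):
    """Score using all four counts."""
    return p["M"] * m + p["X"] * x + p["G"] * g + p["S"] * s


def reduced_score(x, g, s, p=PARAMS):
    """Score with #M eliminated and the constant term dropped.

    Assumes M is even so the arithmetic stays in integers.
    """
    half_m = p["M"] // 2
    return (p["X"] - p["M"]) * x + p["G"] * g + (p["S"] - half_m) * s


# (#M, #X, #G, #S) for the two alignments summarized on the earlier slide.
summaries = [(31, 22, 3, 12), (27, 18, 3, 28)]
for m, x, g, s in summaries:
    print(full_score(m, x, g, s), reduced_score(x, g, s))
# prints "-660 -6560" then "-1140 -7040": both pairs differ by the same constant

best = max(summaries, key=lambda t: reduced_score(*t[1:]))
print(best)  # prints (31, 22, 3, 12)
```

Since the two scores differ only by a constant, maximizing either one picks the same summary, and a linear form is always maximized at a vertex of the polytope.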

Needleman-Wunsch Alignment
Usually a single set of parameters is specified (M = 100, X = -100, S = -30, G = -400 is a standard default) and the optimal vertex is then identified using dynamic programming. An alignment that is optimal for that vertex is then selected. The running time of the algorithm is O(nm) [Needleman-Wunsch, 1970; Smith-Waterman, 1981] and it requires O(n+m) space [Hirschberg, 1975].
Standard scoring schemes are:
Parameters            Model
M, X, S               Jukes-Cantor with linear gap penalty
M, X, S, G            Jukes-Cantor with affine gap penalty
M, X_TS, X_TV, S, G   Kimura 2-parameter with affine gap penalty
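The difference between the Jukes-Cantor and Kimura 2-parameter rows is only in how a mismatch column is scored: the former uses one mismatch score X, the latter separates transitions (purine to purine, or pyrimidine to pyrimidine) from transversions. A minimal sketch of the two substitution scores; the function names and the values chosen for X_TS and X_TV are hypothetical and are not defaults of any aligner:

```python
PURINES = {"A", "G"}  # C and T are the pyrimidines


def jukes_cantor_score(a, b, M=100, X=-100):
    """One score for a match, one for any mismatch."""
    return M if a == b else X


def kimura_score(a, b, M=100, X_TS=-50, X_TV=-100):
    """Separate mismatch scores for transitions and transversions.

    X_TS and X_TV are illustrative values only.
    """
    if a == b:
        return M
    # A transition keeps the base within its class (purine or pyrimidine).
    is_transition = (a in PURINES) == (b in PURINES)
    return X_TS if is_transition else X_TV


print(kimura_score("A", "G"))  # transition, prints -50
print(kimura_score("A", "C"))  # transversion, prints -100
```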

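Needleman-Wunsch algorithm
W_{i,j} = S*W_{i-1,j} + S*W_{i,j-1} + (X or M)*W_{i-1,j-1}   (max-plus)
In max-plus arithmetic "+" means max and "*" means ordinary addition, so the recursion reads W_{i,j} = max( S + W_{i-1,j}, S + W_{i,j-1}, (X or M) + W_{i-1,j-1} ).
W_{i,j} is the score of the best alignment of positions [1,i] and [1,j] in each sequence.
Sequences on the grid: A A C A T T A G A and AGATTACCACA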
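The recursion above can be sketched directly as a table fill followed by a traceback. This version covers only the linear gap penalty scheme (parameters M, X, S); the affine parameter G is left out, the traceback returns one optimal alignment out of possibly many, and it uses O(nm) space rather than Hirschberg's linear-space refinement. The function name is hypothetical.

```python
def needleman_wunsch(a, b, M=100, X=-100, S=-30):
    """Global alignment with a linear gap penalty.

    Returns (score, aligned_a, aligned_b) with '-' marking spaces.
    """
    n, m = len(a), len(b)
    # W[i][j] = best score for aligning a[:i] with b[:j]
    W = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        W[i][0] = i * S
    for j in range(1, m + 1):
        W[0][j] = j * S
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            diag = W[i - 1][j - 1] + (M if a[i - 1] == b[j - 1] else X)
            W[i][j] = max(diag, W[i - 1][j] + S, W[i][j - 1] + S)

    # Traceback from the bottom-right corner to recover one optimal alignment.
    row_a, row_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = M if i > 0 and j > 0 and a[i - 1] == b[j - 1] else X
        if i > 0 and j > 0 and W[i][j] == W[i - 1][j - 1] + sub:
            row_a.append(a[i - 1])
            row_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and W[i][j] == W[i - 1][j] + S:
            row_a.append(a[i - 1])
            row_b.append("-")
            i -= 1
        else:
            row_a.append("-")
            row_b.append(b[j - 1])
            j -= 1
    return W[n][m], "".join(reversed(row_a)), "".join(reversed(row_b))


score, top, bottom = needleman_wunsch("AC", "AG")
print(score)   # prints 40: one match (100) and two spaces (2 * -30)
print(top)
print(bottom)
```

With these parameters two spaces (-60) cost less than one mismatch (-100), which is why the optimal alignment of AC and AG avoids pairing C with G. Changing the parameters moves the optimum to a different vertex of the alignment polytope.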

Building Drosophila whole genome multiple alignments
MAVID
MULTIZ (currently no D. erecta)

MAVID
DroAna_ _ CTGAAGGAAT TCTATATT AAAGAAGATTTCTCATCATTGGTTG
DroMel_4_ CTGCGGGATTAGGGGTCATTAGAGT GCCGAAAAGCGA GTTT
DroMoj_ _ CTGGAATAGTTAATTTCATTGTAACACATAAACGTTTTAAATTCTATTGAAA
DroPse_1_ CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG----
DroSim_ _ CTGCGGGATTAGGAGTCATTAGAGT GCGGAAAAGCGG GTT-
DroVir_ _ CTGCAGCAGTTAAATA-ATTGTAATAAACAATTCTCT--AATTTGGTCCAAA
DroYak_1_ CTGCGGGATTAGCGGTCATTGGTGT GAAGAATAGATC CTTT
          *** * * *

DroAna_ _ AATC-----ACTTAC
DroMel_4_ ATTCTATGGACTCAC
DroMoj_ _ ----TATTTACTCAC
DroPse_1_ TGTACTTAC
DroSim_ _ ATTCTATGGACTCAC
DroVir_ _ ----TATTTACTCAC
DroYak_1_ ATTTCATAAACTCAC
                   *** **
N. Bray and L. Pachter, MAVID: Constrained ancestral alignment of multiple sequences, Genome Research 14 (2004)

Needleman-Wunsch

MULTIZ
droAna              CTGAAGGAATTCTA--TATTAAAG
dm2.chr2L           CTGCGGGATTAGGGGTCATTAGAG TGCCGAAAAGCGAGT-TTATTC
droMoj1.contig_2959 CTGGAATAGTTAATTTCATTGTAA CACATAAA--CGTTTTAAATTC
dp3.chr4_group3     CTGGAAGAGTTTTGATTAGTAGGGGATCCATGGGGGCGAGGAGAGGCCATCATCG
droSim1.chr2L       CTGCGGGATTAGGAGTCATTAGAG TGCGGAAAAGCGGG--TTATTC
droVir1.scaffold_6  CTGCAGCAGTTAA-ATAATTGTAA TAAACAA----TTCTCTAATTT
droYak1.chr2L       CTGCGGGATTAGCGGTCATTGGTG TGAAGAATAGATCCT-TTATTT
                    *** * * * *

droAna              AAGATTTCTCATCATTGGTTGAATC ACTTAC
dm2.chr2L           TATGGACTCAC
droMoj1.contig_     AAATATTT TATTGACTCAC
dp3.chr4_group      TGT--ACTTAC
droSim1.chr2L       TATGGACTCAC
droVir1.scaffold_   AAATATTTGGTCCACTCAC
droYak1.chr2L       CATAAACTCAC
                    *** **
Blanchette et al., Aligning multiple sequences with the threaded blockset aligner, Genome Research 14 (2004)

Needleman-Wunsch

One (possibly wrong) alignment is not enough: the history of parametric inference
1992: Waterman, M., Eggert, M. & Lander, E. Parametric sequence comparisons, Proc. Natl. Acad. Sci. USA 89.
Gusfield, D., Balasubramanian, K. & Naor, D. Parametric optimization of sequence alignment, Algorithmica 12.
Wang, L., Zhao, J. Parametric alignment of ordered trees, Bioinformatics.
Fernández-Baca, D., Seppäläinen, T. & Slutzki, G. Parametric Multiple Sequence Alignment and Phylogeny Construction, Journal of Discrete Algorithms.
XPARAL by Kristian Stevens and Dan Gusfield

Whole Genome Parametric Alignment
Colin Dewey, Peter Huggins, Lior Pachter, Bernd Sturmfels and Kevin Woods

Mathematics and Computer Science
Parametric alignment in higher dimensions.
Faster new algorithms.
Deeper understanding of alignment polytopes.

Biology
Whole genome parametric alignment.
Biological implications of alignment parameters.
Alignment with biology rather than for biology.