1 Genome sizes (sample). 2 Some genomics history 1995: first bacterial genome, Haemophilus influenza, 1.8 Mbp, sequenced at TIGR first use of whole-genome.

Slides:



Advertisements
Similar presentations
Global Sequence Alignment by Dynamic Programming.
Advertisements

Sequence allignement 1 Chitta Baral. Sequences and Sequence allignment Two main kind of sequences –Sequence of base pairs in DNA molecules (A+T+C+G)*
1 ALIGNMENT OF NUCLEOTIDE & AMINO-ACID SEQUENCES.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Definitions Optimal alignment - one that exhibits the most correspondences. It is the alignment with the highest score. May or may not be biologically.
C E N T R F O R I N T E G R A T I V E B I O I N F O R M A T I C S V U E Alignments 1 Sequence Analysis.
Introduction to Bioinformatics Burkhard Morgenstern Institute of Microbiology and Genetics Department of Bioinformatics Goldschmidtstr. 1 Göttingen, March.
Sequence Similarity Searching Class 4 March 2010.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez.
Sequence Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2005.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Bioinformatics and Phylogenetic Analysis
Alignment methods and database searching April 14, 2005 Quiz#1 today Learning objectives- Finish Dotter Program analysis. Understand how to use the program.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez June 23, 2004.
Sequence similarity.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 20, 2003.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Computational Biology, Part 2 Sequence Comparison with Dot Matrices Robert F. Murphy Copyright  1996, All rights reserved.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Dynamic Programming. Pairwise Alignment Needleman - Wunsch Global Alignment Smith - Waterman Local Alignment.
Developing Pairwise Sequence Alignment Algorithms Dr. Nancy Warter-Perez May 10, 2005.
Pairwise alignment Computational Genomics and Proteomics.
Alignment methods II April 24, 2007 Learning objectives- 1) Understand how Global alignment program works using the longest common subsequence method.
Biological entities (sequences, taxa) share common ancestry
Sequence comparison: Local alignment
1 Introduction to Bioinformatics 2 Introduction to Bioinformatics. LECTURE 3: SEQUENCE ALIGNMENT * Chapter 3: All in the family.
Sequencing a genome and Basic Sequence Alignment
Comparative Genomics of the Eukaryotes
Genome projects and model organisms Level 3 Molecular Evolution and Bioinformatics Jim Provan.
Developing Pairwise Sequence Alignment Algorithms
Sequence Alignment.
Pairwise Sequence Alignment
Pairwise & Multiple sequence alignments
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Tentative definition of bioinformatics Bioinformatics, often also called genomics, computational genomics, or computational biology, is a new interdisciplinary.
Multiple Alignment and Phylogenetic Trees Csc 487/687 Computing for Bioinformatics.
Genome Alignment. Alignment Methods Needleman-Wunsch (global) and Smith- Waterman (local) use dynamic programming Guaranteed to find an optimal alignment.
Sequence Alignment Goal: line up two or more sequences An alignment of two amino acid sequences: …. Seq1: HKIYHLQSKVPTFVRMLAPEGALNIHEKAWNAYPYCRTVITN-EYMKEDFLIKIETWHKP.
BIOINFORMATICS IN BIOCHEMISTRY Bioinformatics– a field at the interface of molecular biology, computer science, and mathematics Bioinformatics focuses.
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment (II) (Lecture for CS498-CXZ Algorithms in Bioinformatics) Sept. 27, 2005 ChengXiang Zhai Department of Computer Science University.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
Sequence Analysis CSC 487/687 Introduction to computing for Bioinformatics.
JM - 1 Introduction to Bioinformatics: Lecture III Genome Assembly and String Matching Jarek Meller Jarek Meller Division of Biomedical.
Sequencing a genome and Basic Sequence Alignment
Construction of Substitution Matrices
Function preserves sequences Christophe Roos - MediCel ltd Similarity is a tool in understanding the information in a sequence.
1 Genome sizes (sample). 2 Some genomics history 1995: first bacterial genome, Haemophilus influenza, 1.8 Mbp, sequenced at TIGR first use of whole-genome.
Bioinformatics Chapter 4: Sequence comparison ____________________________________________________________________________________________________________________.
BLAST: Basic Local Alignment Search Tool Altschul et al. J. Mol Bio CS 466 Saurabh Sinha.
Using BLAST for Genomic Sequence Annotation Jeremy Buhler For HHMI / BIO4342 Tutorial Workshop.
Basic Local Alignment Search Tool BLAST Why Use BLAST?
Applied Bioinformatics Week 3. Theory I Similarity Dot plot.
Genome annotation and search for homologs. Genome of the week Discuss the diversity and features of selected microbial genomes. Link to the paper describing.
Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke.
Alignment methods April 21, 2009 Quiz 1-April 23 (JAM lectures through today) Writing assignment topic due Tues, April 23 Hand in homework #3 Why has HbS.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Burkhard Morgenstern Institut für Mikrobiologie und Genetik Molekulare Evolution und Rekonstruktion von phylogenetischen Bäumen WS 2006/2007.
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
MICROBIOLOGIA GENERALE Prokaryotic genomes. The prokaryotic genome.
More on HMMs and Multiple Sequence Alignment BMI/CS 776 Mark Craven March 2002.
Substitution Matrices and Alignment Statistics BMI/CS 776 Mark Craven February 2002.
MICROBIOLOGIA GENERALE Prokaryotic genomes. The Escherichia coli nucleoid.
Sequence comparison: Local alignment
Pairwise Sequence Alignment
BCB 444/544 Lecture 7 #7_Sept5 Global vs Local Alignment
Presentation transcript:

1 Genome sizes (sample)

2 Some genomics history 1995: first bacterial genome, Haemophilus influenza, 1.8 Mbp, sequenced at TIGR first use of whole-genome shotgun for a bacterium Fleischmann et al became most-cited paper of the year (>3000 citations) : 2nd and 3rd bacteria published by TIGR: Mycoplasma genitalium, Methanococcus jannaschii 1996: first eukaryote, S. cerevisiae (yeast), 13 Mbp, sequenced by a consortium of (mostly European) labs 1997: E. coli finished (7th bacterial genome) : T. pallidum (syphilis), B. burgdorferi (Lyme disease), M. tuberculosis, Vibrio cholerae, Neisseria meningitidis, Streptococcus pneumoniae, Chlamydia pneumoniae [all at TIGR] 2000: fruit fly, Drosophila melanogaster 2000: first plant genome, Arabidopsis thaliana 2001: human genome, first draft 2002: malaria genome, Plasmodium falciparum 2002: anthrax genome, Bacillus anthracis TODAY (Sept 1, 2010): 1214 complete microbial genomes! (two years ago: 700) 3424 microbial genomes in progress! (two years ago: 1199) 838 eukaryotic genomes complete or in progress! (two years ago: 476)

3

New directions: sequencing ancient DNA New directions: sequencing ancient DNA (some assembly required)

5 J. P. Noonan et al., Science 309, (2005)

6 Published by AAAS J. P. Noonan et al., Science 309, (2005) Fig. 1. Schematic illustration of the ancient DNA extraction and library construction process

7 Published by AAAS J. P. Noonan et al., Science 309, (2005) Fig. 2. Characterization of two independent cave bear genomic libraries Fig. 2. Predicted origin of 9035 clones from library CB1 (A) and 4992 clones from library CB2 (B) are shown, as determined by BLAST comparison to GenBank and environmental sequence databases. Other refers to viral or plasmid-derived DNAs. Distribution of sequence annotation features in 6,775 nucleotides of carnivore sequence from library CB1 (C) and 20,086 nucleotides of carnivore sequence from library CB2 (D) are shown as determined by alignment to the July 2004 dog genome assembly.

8

9

10 Published by AAAS H. N. Poinar et al., Science 311, (2006) Fig. 1. Characterization of the mammoth metagenomic library, including percentage of read distributions to various taxa

11

Published by AAAS R. E. Green et al., Science 328, (2010) Fig. 1 Samples and sites from which DNA was retrieved A Draft Sequence of the Neandertal Genome Richard E. Green et al., Science 7 May 2010

Published by AAAS R. E. Green et al., Science 328, (2010) Fig. 2 Nucleotide substitutions inferred to have occurred on the evolutionary lineages leading to the Neandertals, the human, and the chimpanzee genomes

14 Journals The very best: Science Nature PLoS Biology

15 Bioinformatics Journals Bioinformatics bioinformatics.oxfordjournals.org BMC Bioinformatics PLoS Computational Biology compbiol.plosjournals.org Journal of Computational Biology

16 Radically new journals PLoS ONE Biology Direct Reviewers’ comments are public Both journals can be annotated by readers Papers can be negative results, confirmations of other results, or brand new

17 Genomics Journals (which publish computational biology papers) Genome Biology genomebiology.com Genome Research Nucleic Acids Research nar.oxfordjournals.org BMC Genomics

Before assembly… … we need to cover a basic sequence alignment algorithm 18

19 PAIRWISE ALIGNMENT (ALIGNMENT OF TWO NUCLEOTIDE OR TWO AMINO-ACID SEQUENCES) This and the following slides are borrowed from Prof. Dan Graur, Univ. of Houston

20 Any two organisms or two sequences share a common ancestor in their past ancestor descendant 1descendant 2

21 ancestor (5 MYA)

22 ancestor (120 MYA)

23 ancestor (1,500 MYA)

24 By comparing homologous characters, we can reconstruct the evolutionary events that have led to the formation of the extant sequences from the common ancestor. Homology

25 Sequence alignment involves the identification of the correct location of deletions and insertions that have occurred in either of the two lineages since their divergence from a common ancestor.

26 ACTGGGCCCAAATC 1 deletion 1 substitution 1 insertion 1 substitution AACAGGGCCCAAATCCTGGGCCCAGATC -CTGGGCCCAGATC ACTGGGCCCAAATC *********.*** Correct alignment

There are two modes of alignment. Local alignment determines if sub-segments of one sequence (A) are present in another (B). Local alignment methods have their greatest utility in database searching and retrieval (e.g., BLAST). In global alignment, each element of sequence A is compared with each element in sequence B. Global alignment algorithms are used in comparative and evolutionary studies.

28 A pairwise alignment consists of a series of paired bases, one base from each sequence. There are three types of pairs: (1) matches = the same nucleotide appears in both sequences. (2) mismatches = different nucleotides are found in the two sequences. (3) gaps = a base in one sequence and a null base in the other. GCGGCCCATCAGGTAGTTGGTG-G GCGTTCCATC--CTGGTTGGTGTG

29 Motivation for sequence alignment Study function –Sequences that are similar probably have similar functions. Study evolution –Similarity is mostly indicative of common ancestry.

30 An example of pairwise alignment of an unknown protein with a known one (A) Glutaredoxin, Bacteriophage T4 from E. coli, 87 aa (B) Unknown protein - 93 aa Glutar KVYGYDSNIHKCVYCDNAKRLLTVKKQPFEFINIMPEKGV---FDD—EKIAELLTKLGR..::.. :: :.: :: :.:.:.... :: ::. :... Unknow EIYGIPEDVAKCSGCISAIRLCFEKGYDYEIIPVLKKANNQLGFDYILEKFDECKARANM Glutar DTQIGLTMPQVFAPDGSHIGGFDQLREYF.:...:..:. ::..::.. :.... Unknow QTR-PTSFPRIFV-DGQYIGSLKQFKDLY Is the unknown protein a glutaredoxin? Unknown protein, Bacteriophage 65 from Aeromonas sp. 93 aa

31 Alignment algorithms

32 Aim: Given certain criteria, find the alignment associated with the best score from among all possible alignments. The OPTIMAL ALIGNMENT

33 The number of possible alignments may be astronomical. where n and m are the lengths of the two sequences to be aligned.

34 The number of possible alignments may be astronomical. For example, when two sequences 300 residues long each are compared, there are possible alignments. In comparison, the number of elementary particles in the universe is only ~10 80.

35 Needleman-Wunsch (1970) algorithm Dynamic Programming The Needleman-Wunsch (1970) algorithm uses Dynamic Programming

36 Dynamic programming can be applied to problems of alignment because ALIGNMENT SCORES obey the following rules:

Wunsch Algorithm

48 traceback The alignment is produced by starting at the minimum score in either the rightmost column or the bottom row, and following the back pointers. This stage is called traceback.

49 A Multiple Alignment

50 Local vs. Global Alignment A Global Alignment algorithm will find the optimal path between vertices (0,0) and (n,m) in the dynamic programming matrix. A Local Alignment algorithm will find the optimal- scoring alignment between arbitrary vertices (i,j) and (k,l) in the dynamic programming matrix.

51 Local vs. Global Alignment Global Alignment Local Alignment—better alignment to find conserved segment --T—-CC-C-AGT—-TATGT-CAGGGGACACG—A-GCATGCAGA-GAC | || | || | | | ||| || | | | | |||| | AATTGCCGCC-GTCGT-T-TTCAG----CA-GTTATG—T-CAGAT--C tccCAGTTATGTCAGgggacacgagcatgcagagac |||||||||||| aattgccgccgtcgttttcagCAGTTATGTCAGatc