Sequencing a genome and Basic Sequence Alignment

Sequencing a genome and Basic Sequence Alignment Lecture 10 Global Sequence

Introduction
Annotation of DNA sequences
Discovering genomes: the shotgun approach
Sequence alignment and sequence matching

Annotation of sequences
As discussed before, once a gene's sequence (DNA and/or mRNA) has been determined, the data must be annotated (Klug 2010): which parts of the sequence correspond to UTRs, exons/introns, coding sequences (CDS) and the polyA signal. Other sequences of interest include promoter sites and other regulatory regions (enhancers…). Annotation also contains important supplementary material: other organisms that have the same gene, the corresponding protein sequence, and journal articles related to the sequence.
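An annotation can be pictured as a set of labelled coordinate ranges laid over the sequence. The minimal Python sketch below assumes a hypothetical `Feature` record (our own naming, not a standard format such as GenBank or GFF) with 0-based, end-exclusive coordinates, and a made-up toy gene; it shows how the labels let us pull out a region such as the coding sequence.

```python
from dataclasses import dataclass


@dataclass
class Feature:
    kind: str   # hypothetical labels: "5'UTR", "CDS", "intron", "3'UTR"
    start: int  # 0-based start, inclusive
    end: int    # 0-based end, exclusive (Python slice convention)


def extract(sequence: str, feature: Feature) -> str:
    """Return the stretch of sequence covered by one annotated feature."""
    return sequence[feature.start:feature.end]


def coding_sequence(sequence: str, features: list[Feature]) -> str:
    """Join the CDS features in order, skipping introns and UTRs."""
    cds_parts = sorted((f for f in features if f.kind == "CDS"), key=lambda f: f.start)
    return "".join(extract(sequence, f) for f in cds_parts)


# Toy gene built from labelled pieces (made-up data, not a real gene).
gene = "AA" + "ATGCCC" + "GTAAAG" + "GGGTAA" + "AAAA"
features = [
    Feature("5'UTR", 0, 2),
    Feature("CDS", 2, 8),
    Feature("intron", 8, 14),
    Feature("CDS", 14, 20),
    Feature("3'UTR", 20, 24),
]
print(coding_sequence(gene, features))  # ATGCCCGGGTAA
```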

Sequence similarity
In many cases, annotating a gene sequence involves a sequence homology "test" against existing sequences whose function is known. The assumption is that both sequences are homologous (they have a common ancestor, i.e., were once the same sequence) but are now different because of a series of mutations: substitutions, deletions and insertions. The basic concepts behind this process are sequence alignment and determining the strength of the match between the aligned sequences.

Sequence alignment (pairwise): a simple global match
Alignment is the assignment of residue-to-residue correspondences. A global match aligns all of one sequence with all of another. The figure shows two nucleic acid sequences. Where both have the same base at a position, there is a match between the strands at that position; this is shown by a vertical line and a blue highlight. Positions that do not match have no vertical line and no blue highlight. The figure, adapted from Klug, compares the leptin gene of a dog (top) and a human, Homo sapiens (bottom).
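The vertical-line display in the figure can be reproduced with a few lines of Python. This is a sketch assuming two ungapped sequences of equal length; the sequences are hypothetical, not the leptin sequences from the figure.

```python
def match_line(top: str, bottom: str) -> str:
    """Put '|' under every position where two equal-length sequences have the same base."""
    if len(top) != len(bottom):
        raise ValueError("a simple global match needs two sequences of the same length")
    return "".join("|" if x == y else " " for x, y in zip(top, bottom))


top = "ATGCGCTGGTTC"     # hypothetical sequences, not the leptin data in the figure
bottom = "ATGCACTGGTTT"
line = match_line(top, bottom)
print(top)
print(line)
print(bottom)
print(f"{line.count('|')} matches out of {len(top)}")  # 10 matches out of 12
```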

A simple global match
The non-matches are presumed to correspond to mutations, in this case substitution mutations. In DNA, a transition (A <-> G or C <-> T, i.e., purine <-> purine or pyrimidine <-> pyrimidine) is more probable than a transversion (a purine <-> pyrimidine change such as A <-> T). Substitution mutations are more probable than insertions/deletions. The relative probability of such mutations has to be taken into account when determining the strength of the match (we will discuss this in greater detail later).
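A sketch of how the transition/transversion difference could enter a match score. The score values are hypothetical and chosen only to show the idea (match +1, transition -1, transversion -2), and the code assumes uppercase A, C, G, T with no gaps.

```python
PURINES = {"A", "G"}
PYRIMIDINES = {"C", "T"}

# Hypothetical scores: transitions (more likely) are penalised less than transversions.
SCORES = {"match": 1, "transition": -1, "transversion": -2}


def substitution_type(x: str, y: str) -> str:
    """Classify an aligned pair of bases (assumes uppercase A, C, G, T only)."""
    if x == y:
        return "match"
    if {x, y} <= PURINES or {x, y} <= PYRIMIDINES:
        return "transition"
    return "transversion"


def weighted_score(top: str, bottom: str) -> int:
    """Sum the pair scores of two equal-length, ungapped sequences."""
    if len(top) != len(bottom):
        raise ValueError("sequences must have the same length")
    return sum(SCORES[substitution_type(x, y)] for x, y in zip(top, bottom))


print(weighted_score("AAAA", "GAAA"))  # 2 (one transition)
print(weighted_score("AAAA", "TAAA"))  # 1 (one transversion)
```

Both alignments have three identical positions, but the one whose mismatch is a transition scores higher, reflecting that it is the more probable mutation.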

Global sequence alignment: sequences of different sizes
Example 1
I am from Cork
I am not from Cork
****
(4 matches out of 18, based on the length of the bottom string)
A global alignment between sequences of different sizes requires the inclusion of gaps (dashes) in order to optimise the matching. Example 1 (which considers only substitution mutations) produces far fewer matches than Example 2, which considers all three types of point mutation (substitutions, insertions and deletions). These examples calculate a simple matching score; for DNA you would also need to factor in the relative probability of the different substitutions, and for amino acids the calculation is more complicated.
Example 2
I am ---- from Cork
I am not from Cork
**** **********
(14 matches out of 18, based on the length of the bottom string)
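The match count for Example 2 can be checked with a short Python function. Assumptions: spaces count as characters, a match is any identical non-gap position, and the gapped row is written so that both rows are exactly 18 characters long ("I am ----from Cork").

```python
def gapped_matches(top: str, bottom: str) -> int:
    """Count identical, non-gap positions in two already-aligned strings."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have the same length")
    return sum(1 for x, y in zip(top, bottom) if x == y and x != "-")


# Example 2, with the gap laid out so both rows are 18 characters long.
top = "I am ----from Cork"
bottom = "I am not from Cork"
matches = gapped_matches(top, bottom)
print(f"{matches} matches out of {len(bottom)}")  # 14 matches out of 18
```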

Example of DNA sequence alignment (figure adapted from Klug, p. 384)

Sequence alignment: amino acids
Symbols: "*" match; "-" gap; ":" conserved substitution; "." semi-conserved substitution.
In DNA the sequence itself is what matters most: all nucleotides have broadly the same basic properties. Amino acid sequences, however, fold into a 3-D structure that depends on the properties of the amino acids in the sequence. Amino acids with similar side-chain properties have overlapping effects on the 3-D structure of the protein. The figure takes this into account by distinguishing two types of substitution: conserved and semi-conserved.
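The sketch below assigns these symbols to each column of a pairwise protein alignment. The amino acid groups are a simplified, hypothetical grouping by side-chain similarity chosen for illustration; alignment tools such as Clustal use their own, longer lists of groups.

```python
# Hypothetical, simplified groups of amino acids with similar side chains (one-letter codes).
STRONG_GROUPS = [set("ILVM"), set("FYW"), set("KRH"), set("DE"), set("NQ"), set("ST")]
WEAK_GROUPS = [set("AGST"), set("NQDE"), set("FHY")]


def in_same_group(x: str, y: str, groups: list[set[str]]) -> bool:
    return any(x in g and y in g for g in groups)


def column_symbol(x: str, y: str) -> str:
    """'*' identical, ':' conserved, '.' semi-conserved, ' ' otherwise (including gaps)."""
    if x == "-" or y == "-":
        return " "
    if x == y:
        return "*"
    if in_same_group(x, y, STRONG_GROUPS):
        return ":"
    if in_same_group(x, y, WEAK_GROUPS):
        return "."
    return " "


def conservation_line(top: str, bottom: str) -> str:
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    return "".join(column_symbol(x, y) for x, y in zip(top, bottom))


top = "MKLIDAF-"      # hypothetical aligned protein fragments
bottom = "MRLVEGHW"
print(top)
print(bottom)
print(conservation_line(top, bottom))
```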

Sequence alignment: a local match
A local match finds a region in one sequence that matches a region in the other. It is generally used when there is a large difference in size between the sequences. The overhangs at the beginning and end of the query string are not treated as gaps. In the example, a global alignment gives a score of 9 out of 13, whereas a local alignment gives a score of 8 out of 10 (overhangs are not counted). In general, the alignment with the highest score is the one chosen.
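Global and local scores can be compared with a small dynamic-programming sketch: a Needleman-Wunsch style global score and a Smith-Waterman style local score. The scoring values (match +1, mismatch -1, gap -1) and the example sequences are hypothetical. The key difference is that local cells are never allowed to fall below zero, so unmatched overhangs cost nothing.

```python
def global_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Needleman-Wunsch style score: every residue is either aligned or opposite a gap."""
    prev = [j * gap for j in range(len(b) + 1)]  # row 0: prefixes of b against gaps only
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag, prev[j] + gap, cur[j - 1] + gap)
        prev = cur
    return prev[-1]


def local_score(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Smith-Waterman style score: cells are floored at 0 and the best cell anywhere wins."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        cur = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            best = max(best, cur[j])
        prev = cur
    return best


query, target = "GGACGTCC", "ACGT"
print(global_score(query, target), local_score(query, target))  # 0 4
```

The overhangs of the query (GG at the start, CC at the end) cost the global alignment four gaps, cancelling its four matches, while the local alignment scores only the shared ACGT.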

Sequence alignment (pairwise): a motif match
A motif match finds a perfect match between a short sequence and one or more regions of a larger sequence. This plays an important part in looking for repeated sequences (tandem repeats) and other important short sequences. Like the other types of match, a motif match does not have to be contiguous; it can also be a conserved, distributed pattern.
You are not from Cork
You are not normal
They are not happy about…
*** ***
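Motif searching can be sketched in Python: `find_motif` reports every exact (possibly overlapping) occurrence, and `find_pattern` allows a non-contiguous pattern written as a regular expression in which "." stands for any residue. Both function names and the example sequence are hypothetical.

```python
import re


def find_motif(sequence: str, motif: str) -> list[int]:
    """Start positions (0-based) of every exact occurrence of `motif`, including overlapping ones."""
    positions = []
    start = sequence.find(motif)
    while start != -1:
        positions.append(start)
        start = sequence.find(motif, start + 1)
    return positions


def find_pattern(sequence: str, pattern: str) -> list[int]:
    """Start positions of a regular-expression pattern; '.' stands for 'any residue'."""
    # Wrapping the pattern in a zero-width lookahead lets matches overlap.
    return [m.start() for m in re.finditer(f"(?={pattern})", sequence)]


print(find_motif("You are not from Cork", "are not"))  # [4]
print(find_motif("GATATATC", "AT"))                    # [1, 3, 5]
print(find_pattern("GATATATC", "T.T"))                 # [2, 4]
```

In `GATATATC` the motif `AT` occurs at positions 1, 3 and 5, each exactly one motif length apart, which is how a tandem repeat (AT)(AT)(AT) shows up.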

Multiple sequence alignment
This is similar to the previous types, except that you look for regions conserved across all the sequences in the alignment:
My name is denis and I am from cork
My name is kieran and I am not from cork
We name the dog "canis familiaris"
Conserved across all three: name
Multiple alignment is used to check for motifs/sequences conserved across many species, which helps determine protein function, promoter signals, and enhancer and silencer regions. From this, phylogenetic relationships can be determined (evolution: refer to Understanding Bioinformatics, chapter 7).
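For sequences that have already been aligned, finding the conserved regions amounts to checking each column. This sketch marks a column with "*" only when every sequence has the same non-gap symbol there; the three aligned sequences are hypothetical.

```python
def conserved_columns(aligned: list[str]) -> str:
    """Mark with '*' every column where all aligned sequences share the same non-gap symbol."""
    if len({len(s) for s in aligned}) != 1:
        raise ValueError("all sequences in a multiple alignment must have the same length")
    line = []
    for column in zip(*aligned):
        same = len(set(column)) == 1 and column[0] != "-"
        line.append("*" if same else " ")
    return "".join(line)


alignment = ["ATG-CA", "ATGGCA", "TTG-CA"]  # hypothetical, already-aligned sequences
for row in alignment:
    print(row)
print(conserved_columns(alignment))
```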

GENOMES: sequencing and assembling
The supplementary lecture covers how to determine the sequence of DNA strands. However, the strands that can be sequenced are limited to a few thousand base pairs. To sequence an organism's entire genome we must use the "shotgun" approach: cut the genome into small fragments whose sequences can be determined, then use computational techniques (sequence alignment) to join them back together in the correct order.
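A toy model of the fragmentation step, assuming cut positions are chosen uniformly at random (closer to random shearing than to cutting at restriction-enzyme sites) and using a made-up genome string. Cutting several copies at different places is what produces overlapping fragments.

```python
import random


def shotgun(genome: str, copies: int = 2, cuts_per_copy: int = 3, seed: int = 1) -> list[list[str]]:
    """Cut several copies of `genome` at different random positions (toy model, not a real protocol)."""
    rng = random.Random(seed)  # fixed seed so the example is reproducible
    result = []
    for _ in range(copies):
        # Distinct interior cut points, so every fragment is non-empty.
        cut_points = sorted(rng.sample(range(1, len(genome)), cuts_per_copy))
        bounds = [0] + cut_points + [len(genome)]
        result.append([genome[start:end] for start, end in zip(bounds, bounds[1:])])
    return result


genome = "ATGCGTACGTTAGCCATGACGTAGGCTA"  # hypothetical toy "genome"
pieces = shotgun(genome)
for copy_number, fragments in enumerate(pieces, start=1):
    print(f"copy {copy_number}: {fragments}")
```

In a real experiment the fragments come back mixed together and without their positions; recovering their order from the overlaps is the assembly step.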

Shotgun sequencing
The shotgun approach requires two genetic technologies (refer to the supplementary material for more detail) and one computational technique (overlapping contigs):
Restriction enzymes: cut the DNA into fragments
Fast DNA sequencing of the fragments
Combining overlapping contiguous DNA sequences (contigs)

Overlapping contiguous fragments (figure adapted from [1], p. 377)

Overlapping fragments: example
Original sentence: This is the school of computing bioinformatics course.
Cut two copies of the sentence into fragments:
Copy 1: This is | the school of | computing bioinformatics course
Copy 2: This is the | school of computing | bioinformatics course

Overlapping fragments: example (continued)
Check for overlaps (prefix and suffix): This is | This is the | the school of | school of computing | computing bioinformatics course | bioinformatics course
The result of aligning the fragments is: This is the school of computing bioinformatics course
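The prefix/suffix check can be sketched at the level of whole words. `word_overlap` (a hypothetical helper) returns how many words at the end of one fragment equal the first words of another; the fragments are the two cut copies from the example.

```python
def word_overlap(left: str, right: str) -> int:
    """Number of words at the end of `left` that equal the first words of `right`."""
    a, b = left.split(), right.split()
    for k in range(min(len(a), len(b)), 0, -1):
        if a[-k:] == b[:k]:
            return k
    return 0


fragments = [
    "This is", "the school of", "computing bioinformatics course",  # copy 1
    "This is the", "school of computing", "bioinformatics course",  # copy 2
]

# Record every ordered pair of fragments that overlaps by at least one word.
overlaps = {}
for left in fragments:
    for right in fragments:
        if left != right:
            k = word_overlap(left, right)
            if k > 0:
                overlaps[(left, right)] = k

for (left, right), k in overlaps.items():
    print(f"{left!r} -> {right!r}: {k} word(s)")
```

Two of the overlaps cover a whole fragment ("This is" is the start of "This is the", and "bioinformatics course" is the end of "computing bioinformatics course"), so those fragments add nothing new. Following the remaining chain, This is the -> the school of -> school of computing -> computing bioinformatics course, gives back the original sentence.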

Example of contig alignment
The diagram shows a DNA example of how overlapping contiguous sequences are aligned. However, it is an oversimplification: actual segments are many times longer than shown, and overlaps do not always occur at the ends of segments. (Adapted from Klug, 7th ed., p. 378.)

Example 2: reconstruct the original text from the following fragments ("\n" marks a line break):
the men and women merely players;\none man in his time All the world's their entrances,\nand one man a stage,\nAnd all the men and women They have their exits and their entrances,\n world's a stage,\nAnd all in his time plays many parts. merely players;\nThey have

Example 2 solution
All the world's a stage, And all the men and women merely players; They have their exits and their entrances, And one man in his time plays many parts.
The fragments are joined in the order: 3, 7, 5, 1, 10, 6, 4, 8, 2, 9.

Example 2 solution in detail
The text grows as each fragment is joined on (after the first steps, only the newly extended end is shown):
3: All the world's
7: All the world's a stage, And all
5: All the world's a stage, And all the men and women
1: All the world's a stage, And all the men and women merely players;
10: All the world's a stage, … They have
6: All the world's a stage, … They have their exits and their entrances,
4: All the world's a stage, … their entrances, And one man
8: All the world's a stage, …
2: All the world's a stage, … And one man in his time
9: All the world's a stage, … And one man in his time plays many parts.

Algorithm to join contigs
We need two relationships between fragments:
(1) which fragment has no prefix that matches the suffix of another fragment (this tells us which fragment comes first);
(2) for each fragment, which other fragment has a prefix matching the longest suffix of it (this tells us which fragment follows it).
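The two relationships can be sketched in Python as follows. Assumptions: overlaps are exact, case-sensitive character matches; a minimum overlap of 3 characters (a hypothetical threshold) filters out chance one- or two-letter matches; fragments contained entirely inside another are not handled; and the fragment list is our own reading of the Example 2 fragments, with "\n" for the line breaks.

```python
def overlap(left: str, right: str, min_len: int = 3) -> int:
    """Length of the longest suffix of `left` equal to a prefix of `right`; 0 if shorter than min_len."""
    for k in range(min(len(left), len(right)), min_len - 1, -1):
        if left.endswith(right[:k]):
            return k
    return 0


def assemble(fragments: list[str], min_len: int = 3) -> str:
    n = len(fragments)
    # Relationship (1): the first fragment is the one whose prefix matches no other fragment's suffix.
    has_predecessor = {
        j for i in range(n) for j in range(n)
        if i != j and overlap(fragments[i], fragments[j], min_len) > 0
    }
    starts = [i for i in range(n) if i not in has_predecessor]
    if len(starts) != 1:
        raise ValueError(f"expected exactly one first fragment, found {len(starts)}")
    current = starts[0]
    used = {current}
    text = fragments[current]
    # Relationship (2): repeatedly append the unused fragment with the longest overlap.
    while len(used) < n:
        best_k, best_j = 0, None
        for j in range(n):
            if j not in used:
                k = overlap(fragments[current], fragments[j], min_len)
                if k > best_k:
                    best_k, best_j = k, j
        if best_j is None:
            raise ValueError("no overlapping fragment found; the contig cannot be extended")
        text += fragments[best_j][best_k:]  # add only the part not already covered by the overlap
        used.add(best_j)
        current = best_j
    return text


# Our reading of the Example 2 fragments ("\n" marks a line break in the poem).
fragments = [
    "the men and women merely players;",
    "one man in his time",
    "All the world's",
    "their entrances,\nand one man",
    "a stage,\nAnd all the men and women",
    "They have their exits and their entrances,",
    "world's a stage,\nAnd all",
    "in his time plays many parts.",
    "merely players;\nThey have",
]
print(assemble(fragments))
```

Case sensitivity matters in this example: if "And all" were allowed to overlap "All the world's", the true first fragment would appear to have a predecessor and no starting point would be found.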

Potential exam question
Briefly describe the three main types of sequence alignment (6 marks). Explain how you would determine the DNA sequence of a genome, given that current technology can only determine the sequences of relatively short DNA strands (14 marks). Explain two important elements of an algorithm that can solve this problem (10 marks).