Pairwise Alignments Part 1 Biology 224 Instructor: Tom Peavy Sept 8 <PowerPoint slides based on Bioinformatics and Functional Genomics by Jonathan Pevsner>

Slides:



Advertisements
Similar presentations
Pairwise Sequence Alignment Sushmita Roy BMI/CS 576 Sushmita Roy Sep 10 th, 2013 BMI/CS 576.
Advertisements

1 Orthologs: Two genes, each from a different species, that descended from a single common ancestral gene Paralogs: Two or more genes, often thought of.
Pairwise sequence alignment.
Bioinformatics Tutorial I BLAST and Sequence Alignment.
Basics of Comparative Genomics Dr G. P. S. Raghava.
Last lecture summary.
Sequence Alignment.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 2: “Homology” Searches and Sequence Alignments.
Lecture 8 Alignment of pairs of sequence Local and global alignment
Structural bioinformatics
Sequence Similarity Searching Class 4 March 2010.
Sequence Alignment.
Introduction to Bioinformatics Algorithms Sequence Alignment.
Bioinformatics and Phylogenetic Analysis
Sequence similarity.
Bioinformatics Unit 1: Data Bases and Alignments Lecture 3: “Homology” Searches and Sequence Alignments (cont.) The Mechanics of Alignments.
Pairwise Sequence Alignment (PSA)
Basics of Sequence Alignment and Weight Matrices and DOT Plot
Pairwise Alignment How do we tell whether two sequences are similar? BIO520 BioinformaticsJim Lund Assigned reading: Ch , Ch 5.1, get what you can.
8/27/20151 Pairwise sequence Alignment. 8/27/20152 Many of the images in this power point presentation are from Bioinformatics and Functional Genomics.
Sequence Analysis Alignments dot-plots scoring scheme Substitution matrices Search algorithms (BLAST)
Database searching with BLAST
Pairwise & Multiple sequence alignments
An Introduction to Bioinformatics
Protein Evolution and Sequence Analysis Protein Evolution and Sequence Analysis.
Bioinformatics Jack Min Office 3012 Office hours: TR 12:15 – 4.
CISC667, S07, Lec5, Liao CISC 667 Intro to Bioinformatics (Spring 2007) Pairwise sequence alignment Needleman-Wunsch (global alignment)
Computational Biology, Part 3 Sequence Alignment Robert F. Murphy Copyright  1996, All rights reserved.
Chapter 11 Assessing Pairwise Sequence Similarity: BLAST and FASTA (Lecture follows chapter pretty closely) This lecture is designed to introduce you to.
Introduction to Bioinformatics CPSC 265. Interface of biology and computer science Analysis of proteins, genes and genomes using computer algorithms and.
BCB 444/544 F07 ISU Dobbs #4 - Sequence Alignment
Sequence Alignment Techniques. In this presentation…… Part 1 – Searching for Sequence Similarity Part 2 – Multiple Sequence Alignment.
Content of the previous class Introduction The evolutionary basis of sequence alignment The Modular Nature of proteins.
Biology 224 Tom Peavy Sept 20 & 22, 2010
Pairwise Sequence Alignment. The most important class of bioinformatics tools – pairwise alignment of DNA and protein seqs. alignment 1alignment 2 Seq.
Pairwise Sequence Alignment BMI/CS 776 Mark Craven January 2002.
Pairwise alignment of DNA/protein sequences I519 Introduction to Bioinformatics, Fall 2012.
DNA alphabet DNA is the principal constituent of the genome. It may be regarded as a complex set of instructions for creating an organism. Four different.
Biology 4900 Biocomputing.
Construction of Substitution Matrices
Accessing information on molecular sequences Bio 224 Dr. Tom Peavy February 5, 2008.
COT 6930 HPC and Bioinformatics Sequence Alignment Xingquan Zhu Dept. of Computer Science and Engineering.
In-Class Assignment #1: Research CD2
Pairwise sequence Alignment. Learning objectives Define homologs, paralogs, orthologs Perform pairwise alignments (NCBI BLAST) Understand how scores are.
341- INTRODUCTION TO BIOINFORMATICS Overview of the Course Material 1.
Biology 4900 Biocomputing.
Point Specific Alignment Methods PSI – BLAST & PHI – BLAST.
Pairwise sequence alignment Lecture 02. Overview  Sequence comparison lies at the heart of bioinformatics analysis.  It is the first step towards structural.
Sequence Alignment.
Construction of Substitution matrices
Sequence Alignment Abhishek Niroula Department of Experimental Medical Science Lund University
Step 3: Tools Database Searching
Techniques for Protein Sequence Alignment and Database Searching G P S Raghava Scientist & Head Bioinformatics Centre, Institute of Microbial Technology,
©CMBI 2009 Transfer of information The main topic of this course is transfer of information. In the protein world that leads to the questions: 1)From which.
Last lecture summary. Sequence alignment What is sequence alignment Three flavors of sequence alignment Point mutations, indels.
Introduction to Bioinformatics
Sequence similarity, BLAST alignments & multiple sequence alignments
Courtesy of Jonathan Pevsner
Introduction to sequence alignment Mike Hallett (David Walsh)
Bioinformatics for Research
Basics of Comparative Genomics
Protein Sequence Alignments
Identifying templates for protein modeling:
Pairwise Sequence Alignment
What do you with a whole genome sequence?
Pairwise Sequence Alignment
Bioinformatics Lecture 2 By: Dr. Mehdi Mansouri
Basics of Comparative Genomics
Basic Local Alignment Search Tool
Introduction to bioinformatics Lecture 5 Pair-wise sequence alignment
Presentation transcript:

Pairwise Alignments Part 1 Biology 224 Instructor: Tom Peavy Sept 8 <PowerPoint slides based on Bioinformatics and Functional Genomics by Jonathan Pevsner>

Pairwise alignments in the 1950s  -corticotropin (sheep) Corticotropin A (pig) ala gly glu asp asp glu asp gly ala glu asp glu Oxytocin Vasopressin CYIQNCPLG CYFQNCPRG Early alignments revealed --differences in amino acid sequences between species --differences in amino acids responsible for distinct functions

It is used to decide if two proteins (or genes) are related structurally or functionally It is used to identify domains or motifs that are shared between proteins It is the basis of BLAST searching (next week) It is used in the analysis of genomes Pairwise sequence alignment is the most fundamental operation of bioinformatics

Pairwise alignment: protein sequences can be more informative than DNA protein is more informative (20 vs 4 characters); many amino acids share related biophysical properties codons are degenerate: changes in the third position often do not alter the amino acid that is specified protein sequences offer a longer “look-back” time (relatedness over millions or billions of years) (note: issue of convergent evolution) DNA sequences can be translated into protein, and then used in pairwise alignments

DNA can be translated into six potential proteins 5’ CAT CAA 5’ ATC AAC 5’ TCA ACT 5’ GTG GGT 5’ TGG GTA 5’ GGG TAG Pairwise alignment: protein sequences can be more informative than DNA 5’ CATCAACTACAACTCCAAAGACACCCTTACACATCAACAAACCTACCCAC 3’ 3’ GTAGTTGATGTTGAGGTTTCTGTGGGAATGTGTAGTTGTTTGGATGGGTG 5’

Query: 181 catcaactacaactccaaagacacccttacacccactaggatatcaacaaacctacccac 240 |||||||| |||| |||||| ||||| | ||||||||||||||||||||||||||||||| Sbjct: 189 catcaactgcaaccccaaagccacccct-cacccactaggatatcaacaaacctacccac 247 Pairwise alignment: protein sequences can be more informative than DNA Many times, DNA alignments are appropriate --to confirm the identity of a cDNA --to study noncoding regions of DNA --to study DNA polymorphisms --to study molecular evolution (syn. vs nonsyn) --example: Neanderthal vs modern human DNA

Pairwise alignment The process of lining up two or more sequences to achieve maximal levels of identity (and conservation, in the case of amino acid sequences) for the purpose of assessing the degree of similarity and the possibility of homology. Definitions

Homology Similarity attributed to descent from a common ancestor. Definitions Identity The extent to which two (nucleotide or amino acid) sequences are invariant. RBP 26 RVKENFDKARFSGTWYAMAKKDPEGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWD- 84 +K GTW++MA + L + A V T + +L+ W+ glycodelin 23 QTKQDLELPKLAGTWHSMAMA-TNNISLMATLKAPLRVHITSLLPTPEDNLEIVLHRWEN 81

Conservation Changes at a specific position of an amino acid or (less commonly, DNA) sequence that preserve the physico- chemical properties of the original residue. Similarity The extent to which nucleotide or protein sequences are related. It is based upon identity plus conservation. Definitions

Orthologs Homologous sequences in different species that arose from a common ancestral gene during speciation; may or may not be responsible for a similar function. Paralogs Homologous sequences within a single species that arose by gene duplication. Definitions: two types of homology

1.MKWVWALLLLA.AWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDP 48 :: || || ||.||.||..| :|||:.|:.| |||.||||| 1 MLRICVALCALATCWA...QDCQVSNIQVMQNFDRSRYTGRWYAVAKKDP EGLFLQDNIVAEFSVDETGQMSATAKGRVRLLNNWDVCADMVGTFTDTED 98 |||| ||:||:|||||.|.|.||| ||| :||||:.||.| ||| || | 48 VGLFLLDNVVAQFSVDESGKMTATAHGRVIILNNWEMCANMFGTFEDTPD PAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAVQYSCRLLNLDGTCADS 148 ||||||:||| ||:|| ||||||::||||| ||: ||||..||||| | 98 PAKFKMRYWGAASYLQTGNDDHWVIDTDYDNYAIHYSCREVDLDGTCLDG YSFVFSRDPNGLPPEAQKIVRQRQEELCLARQYRLIVHNGYCDGRSERNLL 199 |||:||| | || || |||| :..|:|.|| : | |:|: 148 YSFIFSRHPTGLRPEDQKIVTDKKKEICFLGKYRRVGHTGFCESS Pairwise GLOBAL alignment of retinol-binding protein from human (top) and rainbow trout (O. mykiss)

1 MKWVWALLLLAAWAAAERDCRVSSFRVKENFDKARFSGTWYAMAKKDPEG 50 RBP. ||| |. |... | :.||||.:| : 1...MKCLLLALALTCGAQALIVT..QTMKGLDIQKVAGTWYSLAMAASD. 44 lactoglobulin 51 LFLQDNIVAEFSVDETGQMSATAKGRVR.LLNNWD..VCADMVGTFTDTE 97 RBP : | | | | :: |.|. || |: || |. 45 ISLLDAQSAPLRV.YVEELKPTPEGDLEILLQKWENGECAQKKIIAEKTK 93 lactoglobulin 98 DPAKFKMKYWGVASFLQKGNDDHWIVDTDYDTYAV QYSC 136 RBP || ||. | :.|||| |..| 94 IPAVFKIDALNENKVL VLDTDYKKYLLFCMENSAEPEQSLAC 135 lactoglobulin 137 RLLNLDGTCADSYSFVFSRDPNGLPPEAQKIVRQRQ.EELCLARQYRLIV 185 RBP. | | | : ||. | || | 136 QCLVRTPEVDDEALEKFDKALKALPMHIRLSFNPTQLEEQCHI lactoglobulin Pairwise GLOBAL alignment of retinol-binding protein and  -lactoglobulin 25% identity; 32% similarity

retinol-binding protein (NP_006735)  -lactoglobulin (P02754) RBP and  -lactoglobulin are homologous proteins that share related three-dimensional structures

Positions at which a letter is paired with a null are called gaps. Gap scores are typically negative. Since a single mutational event may cause the insertion or deletion of more than one residue, the presence of a gap is ascribed more significance than the length of the gap. In BLAST, it is rarely necessary to change gap values from the default. Gaps

Should distantly related species have more gaps than closely related species (or genes)? What about their relationship in regards to sequence identity?

There are 3 Principal Methods of Pair-wise Sequence Alignment 1)Dot Matrix Analysis (e.g. Dotlet, Dotter, Dottup) 2)Dynamic Programming (DP) algorithm 3)Word or k-tuple methods (e.g. FASTA & BLAST)

Exon and Introns