Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center clin@winona.edu.

Presentation transcript:

Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center clin@winona.edu

Sequence Alignments
- Cornerstone of bioinformatics
- What is a sequence?
  - Nucleotide sequence
  - Amino acid sequence
- Pairwise and multiple sequence alignments
- What alignments can help with:
  - Determining the function of a newly discovered gene sequence
  - Determining evolutionary relationships among genes, proteins, and species
  - Predicting the structure and function of proteins
Intro to Bioinformatics – Sequence Alignment
Acknowledgement: These notes are adapted from lecture notes of Wright State University’s Bioinformatics Program.

DNA Replication
- Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set.

Over time, genes accumulate mutations
- Environmental factors
  - Radiation
  - Oxidation
- Mistakes in replication or repair
- Deletions, duplications
- Insertions, inversions
- Translocations
- Point mutations

Deletions
- Codon deletion: ACG ATA GCG TAT GTA TAG CCG…
  - Effect depends on the protein, position, etc.
  - Almost always deleterious
  - Sometimes lethal
- Frame shift mutation:
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
  - Almost always lethal
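
The frame shift shown on this slide can be reproduced in a few lines of Python. This is only a sketch: the helper names `codons` and `delete_base` are made up for illustration, and the deleted position (the first T of the codon TAT) is inferred from the slide's before and after sequences rather than stated in the slides.

```python
def codons(seq):
    """Split a sequence into consecutive groups of three bases (the last may be short)."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete_base(seq, index):
    """Return seq with the base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


original = "ACGATAGCGTATGTATAGCCG"
# Assumption: the slide deletes the first T of the fourth codon (0-based index 9).
shifted = delete_base(original, 9)

print(" ".join(codons(original)))  # ACG ATA GCG TAT GTA TAG CCG
print(" ".join(codons(shifted)))   # ACG ATA GCG ATG TAT AGC CG
```

Every codon after the deleted base changes, which is why a single-base deletion is usually far more damaging than the loss of a whole codon.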

Indels
- When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT

The Genetic Code
- Substitutions are mutations accepted by natural selection.
- Synonymous: CGC → CGA
- Non-synonymous: GAU → GAA
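
Whether a codon change is synonymous can be checked against the standard genetic code. The sketch below builds the 64-codon table from the conventional U, C, A, G ordering; the names `CODON_TABLE` and `classify_substitution` are illustrative and do not come from the slides.

```python
BASES = "UCAG"
# Standard genetic code as one-letter amino acids ('*' = stop), listed for
# codons in the order UUU, UUC, UUA, UUG, UCU, ... GGG.
AMINO_ACIDS = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def classify_substitution(codon_before, codon_after):
    """Label an mRNA codon change as synonymous or non-synonymous."""
    same = CODON_TABLE[codon_before] == CODON_TABLE[codon_after]
    return "synonymous" if same else "non-synonymous"


print(classify_substitution("CGC", "CGA"))  # synonymous (Arg -> Arg)
print(classify_substitution("GAU", "GAA"))  # non-synonymous (Asp -> Glu)
```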

Point Mutation Example: Sickle-cell Disease
    Wild-type hemoglobin DNA   3’----CTT----5’
    mRNA                       5’----GAA----3’
    Normal hemoglobin          ------[Glu]------
    Mutant hemoglobin DNA      3’----CAT----5’
    mRNA                       5’----GUA----3’
                               ------[Val]------
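
The DNA-to-mRNA step in this example follows simple base pairing, which can be sketched as below. The function name `transcribe` is illustrative; it assumes the template strand is written 3'→5' as on the slide, so the mRNA comes out 5'→3' in the same left-to-right order.

```python
# Base pairing used when RNA is copied from a DNA template strand.
PAIRING = {"A": "U", "T": "A", "C": "G", "G": "C"}


def transcribe(template_3_to_5):
    """Return the mRNA (5'->3') for a DNA template strand written 3'->5'."""
    return "".join(PAIRING[base] for base in template_3_to_5)


print(transcribe("CTT"))  # GAA, the wild-type codon (Glu)
print(transcribe("CAT"))  # GUA, the mutant codon (Val)
```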

Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis.

Comparing Two Sequences
- Point mutations are easy to spot:
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT
- Indels are difficult; the sequences must be aligned first:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
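
For two sequences of equal length, point mutations can be found with a position-by-position comparison. A minimal sketch, with the hypothetical helper name `point_differences`:

```python
def point_differences(seq_a, seq_b):
    """List (1-based position, base in seq_a, base in seq_b) wherever two
    equal-length sequences differ."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return [(i, a, b)
            for i, (a, b) in enumerate(zip(seq_a, seq_b), start=1)
            if a != b]


first = "ACGTCTGATACGCCGTATAGTCTATCT"
second = "ACGTCTGATTCGCCCTATCGTCTATCT"
print(point_differences(first, second))
# [(10, 'A', 'T'), (15, 'G', 'C'), (19, 'A', 'C')]
```

The same comparison fails as soon as one sequence has an insertion or deletion, because every later position is shifted; that is the case alignment is meant to handle.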

Why Align Sequences?
- The draft human genome is available
- Automated gene finding is possible
- Gene: AGTACGTATCGTATAGCGTAA
- What does it do?
- One approach: is there a similar gene in another species?
  - Align sequences with known genes
  - Find the gene with the “best” match

Scoring a Sequence Alignment
- Match score: +1
- Mismatch score: 0
- Gap penalty: –1

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

- Matches: 18 × (+1)
- Mismatches: 2 × 0
- Gaps: 7 × (–1)
- Score = +11
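
The score on this slide can be recomputed by walking the two aligned rows column by column. A minimal sketch using the slide's scores as defaults; the function name `score_alignment` is illustrative:

```python
def score_alignment(row_a, row_b, match=1, mismatch=0, gap=-1):
    """Count matches, mismatches, and gap columns in two aligned rows
    ('-' marks a gap) and return them with the total score."""
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have the same length")
    matches = mismatches = gaps = 0
    for a, b in zip(row_a, row_b):
        if a == "-" or b == "-":
            gaps += 1
        elif a == b:
            matches += 1
        else:
            mismatches += 1
    total = matches * match + mismatches * mismatch + gaps * gap
    return matches, mismatches, gaps, total


print(score_alignment("ACGTCTGATACGCCGTATAGTCTATCT",
                      "----CTGATTCGC---ATCGTCTATCT"))
# (18, 2, 7, 11)
```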

How can we find an optimal alignment?
- Finding the best alignment by trying every possibility is computationally expensive:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT
- There are ~888,000 possible ways to align the two sequences given above.
- Algorithms based on a technique called “dynamic programming” are used; they are beyond the scope of this workshop.
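
The slides do not say how the figure of about 888,000 was derived, but it is consistent with counting the ways to place 7 gap characters among the 27 columns of the shorter sequence, C(27, 7) = 888,030. The sketch below checks that count and also shows, as an illustration only, how a standard Needleman-Wunsch-style dynamic programming recurrence finds the best global score without enumerating alignments. The function name `global_alignment_score` is made up, and the defaults reuse the match, mismatch, and gap scores from the scoring slide.

```python
from math import comb


def global_alignment_score(a, b, match=1, mismatch=0, gap=-1):
    """Best global alignment score via dynamic programming
    (Needleman-Wunsch style, linear gap penalty)."""
    # prev[j] is the best score for aligning the prefix of a seen so far with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr.append(max(diag, prev[j] + gap, curr[j - 1] + gap))
        prev = curr
    return prev[-1]


long_seq = "ACGTCTGATACGCCGTATAGTCTATCT"
short_seq = "CTGATTCGCATCGTCTATCT"

print(comb(27, 7))  # 888030
print(global_alignment_score(long_seq, short_seq))
```

The table has only 28 × 21 cells, so the best score is found after a few hundred cell updates instead of hundreds of thousands of candidate alignments.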

Global and Local Alignments
- Global alignment: score the entire alignment
- Local alignment: find the best-matching subsequence
- Why local sequence alignment?
  - Subsequence comparison between a DNA sequence and a genome
  - Protein functional domains
  - Exon matching

Example
Compare the two sequences:
    TTGACACCCTCCCAATT
    ACCCCAGGCTTTACACAG

Global alignment (does it look good?)
    TTGACACCCTCC-CAATT
        ||  ||   ||
    ACCCCAGGCTTTACACAG

Local alignment (does it look good?)
    ---------TTGACACCCTCCCAATT
             || ||||
    ACCCCAGGCTTTACACAG--------
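
A local alignment score for these two sequences can be computed with a Smith-Waterman-style recurrence, sketched below. This is not taken from the slides: the function name `local_alignment_score` and the scores (match +1, mismatch -1, gap -1) are assumptions chosen for illustration, and they differ from the scoring used on the earlier slide.

```python
def local_alignment_score(a, b, match=1, mismatch=-1, gap=-1):
    """Best local alignment score (Smith-Waterman style, linear gap penalty).
    Cell values are clamped at 0 so an alignment can start anywhere."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG"))
```

The only difference from a global recurrence is the clamp at zero and taking the maximum over all cells, which lets the best-matching region be scored without penalizing the unaligned ends.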

Dot Plots
- One of the simplest and oldest methods for sequence alignment
- Visualization of regions of similarity
  - Assign one sequence to the horizontal axis
  - Assign the other to the vertical axis
  - Place a dot wherever the two sequences match
  - Diagonal lines indicate adjacent regions of identity

A Simple Example
Construct a simple dot plot for
    TAGTCGATG
    TGGTCATC
The alignment is
    TAGTCGATG
    TGGTC-ATC
[Figure: dot plot grid for the two sequences, with * marking matching positions]
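
A text version of the dot plot for this example can be generated as sketched below. The function name `dot_plot` is illustrative, and which sequence goes on which axis is an assumption, since the slide text does not say.

```python
def dot_plot(horizontal, vertical):
    """Return the rows of a text dot plot: '*' where the bases match, '.' elsewhere."""
    rows = ["  " + " ".join(horizontal)]  # header row with the horizontal sequence
    for v in vertical:
        cells = ["*" if v == h else "." for h in horizontal]
        rows.append(v + " " + " ".join(cells))
    return rows


# Assumption: the longer sequence is placed on the horizontal axis.
for row in dot_plot("TAGTCGATG", "TGGTCATC"):
    print(row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences agree, and a jump from one diagonal to a neighboring one marks a gap.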

What else can it do (and how)?
- Gaps
- Inverted substrings
- Repeats
- Palindromes
- Gene conservation and gene order studies