Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center clin@winona.edu.


1 Sequence Alignments Chi-Cheng Lin, Ph.D. Associate Professor Department of Computer Science Winona State University – Rochester Center

2 Sequence Alignments
  Cornerstone of bioinformatics
  What is a sequence?
    Nucleotide sequence
    Amino acid sequence
  Pairwise and multiple sequence alignments
  What alignments can help with
    Determining the function of a newly discovered gene sequence
    Determining evolutionary relationships among genes, proteins, and species
    Predicting the structure and function of proteins
Intro to Bioinformatics – Sequence Alignment
Acknowledgement: These notes are adapted from lecture notes of Wright State University’s Bioinformatics Program.

3 DNA Replication
  Prior to cell division, all the genetic instructions must be “copied” so that each new cell will have a complete set.

4 Over time, genes accumulate mutations
  Environmental factors
    Radiation
    Oxidation
  Mistakes in replication or repair
  Deletions, Duplications
  Insertions, Inversions
  Translocations
  Point mutations

5 Deletions
  Codon deletion:
    ACG ATA GCG TAT GTA TAG CCG…
    Effect depends on the protein, position, etc.
    Almost always deleterious
    Sometimes lethal
  Frame shift mutation:
    ACG ATA GCG TAT GTA TAG CCG…
    ACG ATA GCG ATG TAT AGC CG?…
    Almost always lethal
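The frame shift on this slide can be reproduced with a short sketch. It assumes the shifted sequence was produced by deleting the first T of the TAT codon (index 9, counting from 0), which is the single-base deletion consistent with the two lines shown. The helper names `codons` and `delete_base` are illustrative only.

```python
def codons(seq: str) -> str:
    """Split a nucleotide string into space-separated groups of three."""
    return " ".join(seq[i:i + 3] for i in range(0, len(seq), 3))


def delete_base(seq: str, index: int) -> str:
    """Return seq with the single base at the given 0-based index removed."""
    return seq[:index] + seq[index + 1:]


original = "ACGATAGCGTATGTATAGCCG"

# Assumption: the deleted base is the first T of the fourth codon (TAT).
shifted = delete_base(original, 9)

print(codons(original))  # ACG ATA GCG TAT GTA TAG CCG
print(codons(shifted))   # ACG ATA GCG ATG TAT AGC CG
```

Every codon downstream of the deletion changes, which is why a frame shift is usually far more damaging than the loss of one whole codon.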

6 Indels
  Comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
    ACGTCTGATACGCCGTATCGTCTATCT
    ACGTCTGAT---CCGTATCGTCTATCT

7 The Genetic Code
  Substitutions are mutations accepted by natural selection.
  Synonymous: CGC → CGA
  Non-synonymous: GAU → GAA
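As a small illustration of the two substitution types, the sketch below looks up the amino acid for each codon and compares them. The table is deliberately partial: it holds only the four mRNA codons named on the slide (CGC and CGA both encode arginine; GAU encodes aspartate and GAA glutamate). The function name `classify` is a hypothetical helper, not part of any library.

```python
# Partial codon table: only the four mRNA codons used in the slide's examples.
CODON_TO_AA = {
    "CGC": "Arg",
    "CGA": "Arg",
    "GAU": "Asp",
    "GAA": "Glu",
}


def classify(before: str, after: str) -> str:
    """Label a codon substitution by whether the encoded amino acid changes."""
    same = CODON_TO_AA[before] == CODON_TO_AA[after]
    return "synonymous" if same else "non-synonymous"


print(classify("CGC", "CGA"))  # synonymous
print(classify("GAU", "GAA"))  # non-synonymous
```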

8 Point Mutation Example: Sickle-cell Disease
  Wild-type hemoglobin DNA   3’----CTT----5’
  mRNA                       5’----GAA----3’
  Normal hemoglobin          ------[Glu]------

  Mutant hemoglobin DNA      3’----CAT----5’
  mRNA                       5’----GUA----3’
  Mutant hemoglobin          ------[Val]------
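The DNA-to-mRNA step in this example can be sketched as a base-by-base complement. The sketch assumes the DNA shown is the template strand written 3’ to 5’, so the complement read in the same order is the mRNA written 5’ to 3’. The codon table is partial (only the two codons on the slide), and `transcribe` is an illustrative name.

```python
# DNA template base -> complementary mRNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Partial codon table: only the two codons in the sickle-cell example.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') for a template strand given 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe(dna)
    print(label, dna, mrna, CODON_TO_AA[mrna])
```

A single changed base in the DNA (T to A in the middle position) is enough to swap glutamate for valine in the protein.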

9 Image credit: U.S. Department of Energy Human Genome Program

10 Comparing Two Sequences
  Point mutations, easy:
    ACGTCTGATACGCCGTATAGTCTATCT
    ACGTCTGATTCGCCCTATCGTCTATCT
  Indels are difficult, must align sequences:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGATTCGCATCGTCTATCT

    ACGTCTGATACGCCGTATAGTCTATCT
    ----CTGATTCGC---ATCGTCTATCT
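The "easy" case is a position-by-position comparison of two equal-length sequences. A minimal sketch, with `mismatch_positions` as an illustrative helper name, applied to the first pair of sequences on this slide:

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return the 0-based positions where two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"

print(mismatch_positions(seq1, seq2))  # [9, 14, 18]
```

This direct comparison stops being useful once the sequences differ in length, which is the indel case that requires alignment.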

11 Why Align Sequences?
  The draft human genome is available
  Automated gene finding is possible
  Gene: AGTACGTATCGTATAGCGTAA
    What does it do?
  One approach: Is there a similar gene in another species?
    Align sequences with known genes
    Find the gene with the “best” match

12 Scoring a Sequence Alignment
  Match score: +1
  Mismatch score: +0
  Gap penalty: –1

    ACGTCTGATACGCCGTATAGTCTATCT
        ||||| |||   || ||||||||
    ----CTGATTCGC---ATCGTCTATCT

  Matches: 18 × (+1)
  Mismatches: 2 × 0
  Gaps: 7 × (–1)
  Score = +11
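The arithmetic above can be checked with a short sketch that walks the two aligned rows column by column. The scoring values are the slide's (+1, 0, –1); the function name `score_alignment` and its return shape are illustrative choices.

```python
def score_alignment(top: str, bottom: str, match=1, mismatch=0, gap=-1):
    """Score a gapped pairwise alignment; '-' marks a gap in either row."""
    if len(top) != len(bottom):
        raise ValueError("aligned rows must have the same length")
    counts = {"matches": 0, "mismatches": 0, "gaps": 0}
    for x, y in zip(top, bottom):
        if x == "-" or y == "-":
            counts["gaps"] += 1
        elif x == y:
            counts["matches"] += 1
        else:
            counts["mismatches"] += 1
    score = (counts["matches"] * match
             + counts["mismatches"] * mismatch
             + counts["gaps"] * gap)
    return score, counts


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
score, counts = score_alignment(top, bottom)
print(counts)  # {'matches': 18, 'mismatches': 2, 'gaps': 7}
print(score)   # 11
```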

13 How can we find an optimal alignment?
  Finding the alignment is computationally hard:
    ACGTCTGATACGCCGTATAGTCTATCT
    CTGAT---TCG-CATCGTC--T-ATCT
  There are ~888,000 possibilities to align the two sequences given above.
  Algorithms using a technique called “dynamic programming” are used; these are outside the scope of this workshop.
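For readers curious about the dynamic programming idea, here is a minimal sketch of a Needleman-Wunsch style global alignment score. The slides do not name a specific algorithm, so treat this as one common choice rather than the workshop's method. It assumes the scoring scheme from the earlier scoring slide (match +1, mismatch 0, gap –1) and returns only the best score, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match=1, mismatch=0, gap=-1) -> int:
    """Best global alignment score, computed row by row over a DP table."""
    # prev[j] holds the best score for aligning the processed prefix of a
    # with the first j characters of b.
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap in b
            left = curr[j - 1] + gap  # gap in a
            curr.append(max(diag, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGT", "AGT"))  # 2 (three matches, one gap)
```

Instead of enumerating every possible alignment, the table reuses the best score of each pair of prefixes, so the work grows with the product of the two sequence lengths.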

14 Global and Local alignments
  Global alignment: score the entire alignment
  Local alignment: find the best matching subsequence
  Why local sequence alignment?
    Subsequence comparison between a DNA sequence and a genome
    Protein function domains
    Exon matching

15 Example
  Compare the two sequences:
    TTGACACCCTCCCAATT
    ACCCCAGGCTTTACACAG
  Global alignment (does it look good?)
    TTGACACCCTCC-CAATT
        ||  ||   ||
    ACCCCAGGCTTTACACAG
  Local alignment (does it look good?)
             TTGACACCCTCCCAATT
             || ||||
    ACCCCAGGCTTTACACAG
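A local alignment score can be computed with a small variation on the same dynamic programming table, in the style of Smith-Waterman: cell values are never allowed to drop below zero, and the answer is the largest cell anywhere in the table. The scoring values here (match +1, mismatch –1, gap –1) are an assumption for illustration; the slides do not give a scheme for the local case.

```python
def local_alignment_score(a: str, b: str, match=1, mismatch=-1, gap=-1) -> int:
    """Best local alignment score; scores are floored at 0 so that a
    poorly matching region never penalizes a good one elsewhere."""
    best = 0
    prev = [0] * (len(b) + 1)
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cell = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


s1 = "TTGACACCCTCCCAATT"
s2 = "ACCCCAGGCTTTACACAG"
print(local_alignment_score(s1, s2))
```

Under this assumed scheme, the local alignment shown on the slide (TTGACAC against TTTACAC, six matches and one mismatch) scores 5, so the function returns at least that.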

16 Dot Plots
  One of the simplest and oldest methods for sequence alignment
  Visualization of regions of similarity
    Assign one sequence to the horizontal axis
    Assign the other to the vertical axis
    Place a dot wherever the two sequences match
    Diagonal lines mean adjacent regions of identity

17 A Simple Example
  Construct a simple dot plot for
    TAGTCGATG
    TGGTCATC
  The alignment is
    TAGTCGATG
    TGGTC-ATC
  (Dot plot figure: axes labeled with the bases T, A, G, C; matches marked with *)
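Since the figure did not survive in this transcript, the dot plot for these two sequences can be regenerated with a short sketch. It places the first sequence on the horizontal axis and the second on the vertical axis, as described on the previous slide. The function name `dot_plot` and the use of `.` for empty cells are illustrative choices.

```python
def dot_plot(horizontal: str, vertical: str) -> list[str]:
    """One row per vertical base; '*' where it equals the horizontal base."""
    return [
        "".join("*" if h == v else "." for h in horizontal)
        for v in vertical
    ]


horizontal = "TAGTCGATG"
vertical = "TGGTCATC"
rows = dot_plot(horizontal, vertical)

print("  " + horizontal)
for base, row in zip(vertical, rows):
    print(base + " " + row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences are identical, and a jump from one diagonal to a neighboring one corresponds to the gap in the alignment.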

18 What else can it do (and how)?
  Gaps
  Inverse substring
  Repeat
  Palindrome
  Gene conservation and order study

