1
Summer Bioinformatics Workshop 2008
Sequence Alignments
Chi-Cheng Lin, Ph.D.
Associate Professor, Department of Computer Science
Winona State University – Rochester Center
clin@winona.edu
2
Sequence Alignments
Cornerstone of bioinformatics
What is a sequence?
  Nucleotide sequence
  Amino acid sequence
Pairwise and multiple sequence alignments
What alignments can help with
  Determine the function of a newly discovered gene sequence
  Determine evolutionary relationships among genes, proteins, and species
  Predict the structure and function of a protein
3
Why Align Sequences?
The draft human genome is available
Automated gene finding is possible
Gene: AGTACGTATCGTATAGCGTAA
  What does it do?
One approach: Is there a similar gene in another species?
  Align sequences with known genes
  Find the gene with the “best” match
4
Visualization of Sequence Alignment
Dot Plot
  One of the simplest and oldest methods for sequence alignment
  Visualization of regions of similarity
    Assign one sequence to the horizontal axis
    Assign the other to the vertical axis
    Place a dot wherever the two sequences match
    Diagonal lines mean adjacent regions of identity
5
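A Simple Example
Construct a simple dot plot for
  TAGTCGATG
  TGGTCATC
The alignment is
  TAGTCGATG
  TGGTC-ATC
Dot plot (* marks a match, . marks no match):
    T A G T C G A T G
  T * . . * . . . * .
  G . . * . . * . . *
  G . . * . . * . . *
  T * . . * . . . * .
  C . . . . * . . . .
  A . * . . . . * . .
  T * . . * . . . * .
  C . . . . * . . . .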
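The dot plot above can be generated with a few lines of code. The following is a minimal sketch, not part of the original slides: the function name `dot_plot` and the use of `.` for empty cells are choices made here for illustration.

```python
def dot_plot(horizontal: str, vertical: str) -> list[str]:
    """Return one text row per residue of `vertical`.

    Each row starts with the residue, followed by one cell per residue of
    `horizontal`: '*' where the two residues are identical, '.' otherwise.
    """
    rows = []
    for v in vertical:
        cells = ["*" if v == h else "." for h in horizontal]
        rows.append(v + " " + " ".join(cells))
    return rows


# The two sequences from the slide.
print("  " + " ".join("TAGTCGATG"))
for row in dot_plot("TAGTCGATG", "TGGTCATC"):
    print(row)
```

Runs of `*` along a diagonal correspond to stretches where the two sequences are identical, which is what the eye picks out in a dot plot.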
6
Genes Accumulate Mutations over Time
Mistakes in gene replication or repair
  Deletions, duplications
  Insertions, inversions
  Translocations
  Point mutations
Environmental factors
  Radiation
  Oxidation
7
Deletions
Codon deletion:
  ACG ATA GCG TAT GTA TAG CCG…
  Effect depends on the protein, position, etc.
  Almost always deleterious
  Sometimes lethal
Frame shift mutation:
  ACG ATA GCG TAT GTA TAG CCG…
  ACG ATA GCG ATG TAT AGC CG?…
  Almost always lethal
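The difference between the two kinds of deletion is easy to see by re-splitting the sequence into codons after each edit. This is a small sketch with hypothetical helper names (`split_codons`, `delete_bases`). The frame shift reproduces the slide's second line by deleting the first T of the fourth codon; which codon the slide deletes in the codon-deletion case is not stated, so removing the fourth codon here is an assumption.

```python
def split_codons(seq: str) -> list[str]:
    """Split a sequence into consecutive triplets; the last one may be short."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def delete_bases(seq: str, start: int, count: int = 1) -> str:
    """Remove `count` bases beginning at 0-based position `start`."""
    return seq[:start] + seq[start + count:]


gene = "ACGATAGCGTATGTATAGCCG"

# Codon deletion (assumed here: the 4th codon, TAT): the reading frame is kept.
codon_deleted = delete_bases(gene, 9, 3)

# Single-base deletion: every codon downstream of the deletion changes.
frame_shifted = delete_bases(gene, 9, 1)

print(" ".join(split_codons(gene)))
print(" ".join(split_codons(codon_deleted)))
print(" ".join(split_codons(frame_shifted)))
```

After the codon deletion, the downstream codons are unchanged. After the single-base deletion, every downstream codon is different, which is why frame shifts are usually far more damaging.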
8
Indels
When comparing two genes, it is generally impossible to tell whether an indel is an insertion in one gene or a deletion in the other, unless ancestry is known:
  ACGTCTGATACGCCGTATCGTCTATCT
  ACGTCTGAT---CCGTATCGTCTATCT
9
The Genetic Code
Substitutions
Substitutions are mutations accepted by natural selection.
  Synonymous: CGC → CGA (both code for Arg)
  Non-synonymous: GAU → GAA (Asp becomes Glu)
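Whether a substitution is synonymous can be checked by translating the codon before and after the change. The sketch below builds the standard genetic code as a lookup table. The names `CODON_TABLE` and `classify_substitution` are illustrative, and only the standard code is assumed (no organism-specific variants).

```python
from itertools import product

# Standard genetic code, with codons enumerated in U, C, A, G order
# (UUU, UUC, UUA, UUG, UCU, ...). '*' marks a stop codon.
BASES = "UCAG"
AMINO_ACIDS = (
    "FFLLSSSSYY**CC*W"
    "LLLLPPPPHHQQRRRR"
    "IIIMTTTTNNKKSSRR"
    "VVVVAAAADDEEGGGG"
)
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def classify_substitution(before: str, after: str) -> str:
    """Compare the amino acids encoded by two mRNA codons."""
    if CODON_TABLE[before] == CODON_TABLE[after]:
        return "synonymous"
    return "non-synonymous"


for before, after in [("CGC", "CGA"), ("GAU", "GAA")]:
    print(before, "->", after, ":", classify_substitution(before, after))
```

The table uses one-letter amino acid codes, so Arg is `R`, Asp is `D`, and Glu is `E`.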
10
Point Mutation Example: Sickle-cell Disease
  Wild-type hemoglobin DNA  3’----CTT----5’
  mRNA                      5’----GAA----3’
  Normal hemoglobin         ------[Glu]------

  Mutant hemoglobin DNA     3’----CAT----5’
  mRNA                      5’----GUA----3’
  Mutant hemoglobin         ------[Val]------
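The chain from DNA to mRNA to amino acid in this example can be traced in code. This is a minimal sketch: `transcribe` is a hypothetical helper that assumes the template strand is written 3’ to 5’, as on the slide, and `CODON_TO_AA` is only a two-entry excerpt of the genetic code, enough for this example.

```python
# Base pairing used in transcription: DNA template base -> mRNA base.
COMPLEMENT = {"A": "U", "T": "A", "G": "C", "C": "G"}

# Excerpt of the genetic code covering only the two codons on the slide.
CODON_TO_AA = {"GAA": "Glu", "GUA": "Val"}


def transcribe(template_3_to_5: str) -> str:
    """Return the mRNA (5' to 3') for a DNA template strand written 3' to 5'."""
    return "".join(COMPLEMENT[base] for base in template_3_to_5)


for label, dna in [("wild-type", "CTT"), ("mutant", "CAT")]:
    mrna = transcribe(dna)
    print(label, dna, "->", mrna, "->", CODON_TO_AA[mrna])
```

A single base change in the DNA (T to A in the middle position) changes one mRNA base and, in this case, the encoded amino acid.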
11
Image credit: U.S. Department of Energy Human Genome Program, http://www.ornl.gov/hgmis
12
Comparing Two Sequences
Point mutations, easy:
  ACGTCTGATACGCCGTATAGTCTATCT
  ACGTCTGATTCGCCCTATCGTCTATCT
Indels are difficult, must align sequences:
  ACGTCTGATACGCCGTATAGTCTATCT
  CTGATTCGCATCGTCTATCT

  ACGTCTGATACGCCGTATAGTCTATCT
  ----CTGATTCGC---ATCGTCTATCT
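The "easy" case, where the two sequences have the same length and differ only by point mutations, needs nothing more than a position-by-position comparison. A minimal sketch follows; the function name `mismatch_positions` is illustrative, and positions are reported 0-based.

```python
def mismatch_positions(a: str, b: str) -> list[int]:
    """Return the 0-based positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("sequences must have the same length; align them first")
    return [i for i, (x, y) in enumerate(zip(a, b)) if x != y]


seq1 = "ACGTCTGATACGCCGTATAGTCTATCT"
seq2 = "ACGTCTGATTCGCCCTATCGTCTATCT"
for i in mismatch_positions(seq1, seq2):
    print(f"position {i}: {seq1[i]} -> {seq2[i]}")
```

This approach breaks down as soon as an indel is present, because every position after the indel is shifted. That is why the second pair on the slide must be aligned before it can be compared.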
13
Scoring a Sequence Alignment
Example
  Match score: +1
  Mismatch score: +0
  Gap penalty: –1

  ACGTCTGATACGCCGTATAGTCTATCT
      ||||| |||   || ||||||||
  ----CTGATTCGC---ATCGTCTATCT

  Matches: 18 × (+1)
  Mismatches: 2 × 0
  Gaps: 7 × (–1)
  Score = 18 + 0 + (–7) = +11
Various scoring schemes exist.
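The arithmetic on this slide can be reproduced by walking over the alignment one column at a time. The sketch below assumes the slide's scoring scheme as default parameters; the function name `score_alignment` is illustrative, and each gap column is penalized independently (a linear gap penalty).

```python
def score_alignment(top: str, bottom: str,
                    match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Score a gapped pairwise alignment column by column.

    Both strings must have the same length, with '-' marking a gap.
    """
    if len(top) != len(bottom):
        raise ValueError("aligned sequences must have the same length")
    score = 0
    for a, b in zip(top, bottom):
        if a == "-" or b == "-":
            score += gap          # gap in either sequence
        elif a == b:
            score += match        # identical residues
        else:
            score += mismatch     # different residues
    return score


top = "ACGTCTGATACGCCGTATAGTCTATCT"
bottom = "----CTGATTCGC---ATCGTCTATCT"
print(score_alignment(top, bottom))  # 18 matches, 2 mismatches, 7 gaps -> 11
```

Changing the three parameters gives other scoring schemes. For example, a negative mismatch score makes the alignment less tolerant of substitutions.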
14
How can we find an optimal alignment?
Finding the alignment is computationally hard:
  ACGTCTGATACGCCGTATAGTCTATCT
  CTGAT---TCG-CATCGTC--T-ATCT
There are ~888,000 possible ways to align the two sequences given above.
Algorithms using a technique called “dynamic programming” are used – outside the scope of this workshop.
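For readers curious about what dynamic programming looks like here, the following is a minimal sketch of global alignment scoring in the style of the Needleman-Wunsch algorithm. It is not from the slides: the function name is illustrative, the scoring scheme is an assumed +1 match, 0 mismatch, –1 per gap position, and only the optimal score is computed, not the alignment itself.

```python
def global_alignment_score(a: str, b: str,
                           match: int = 1, mismatch: int = 0, gap: int = -1) -> int:
    """Best global alignment score of `a` and `b` under a linear gap penalty.

    Row i of the table holds the best scores for aligning a[:i] with every
    prefix of b; only the previous row is needed to compute the next one.
    """
    prev = [j * gap for j in range(len(b) + 1)]   # a[:0] against b[:j]: all gaps
    for i in range(1, len(a) + 1):
        curr = [i * gap]                          # a[:i] against b[:0]: all gaps
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            curr.append(max(
                prev[j - 1] + pair,   # align a[i-1] with b[j-1]
                prev[j] + gap,        # a[i-1] aligned to a gap
                curr[j - 1] + gap,    # b[j-1] aligned to a gap
            ))
        prev = curr
    return prev[-1]


print(global_alignment_score("ACGTCTGATACGCCGTATAGTCTATCT",
                             "CTGATTCGCATCGTCTATCT"))
```

Instead of enumerating every possible alignment, the table is filled in a number of steps proportional to the product of the two sequence lengths, which is what makes the problem tractable.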
15
Global and Local Alignments
Global alignment – score the entire alignment
Local alignment – find the best matching subsequence
Why local sequence alignment?
  Global alignment is useful only if the sequences to be aligned are very similar
  Subsequence comparison between a DNA sequence and a genome
  Identify
    Conserved regions
    Protein function domains
16
Example
Compare the two sequences:
  TTGACACCCTCCCAATT
  ACCCCAGGCTTTACACAG
Global alignment (does it look good?)
  TTGACACCCTCC-CAATT
      ||  ||   ||
  ACCCCAGGCTTTACACAG
Local alignment (does it look good?)
  ---------TTGACACCCTCCCAATT
           || ||||
  ACCCCAGGCTTTACACAG--------
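A local alignment score can be computed with a small change to the dynamic programming idea: scores are never allowed to drop below zero, and the best cell anywhere in the table is reported. The sketch below follows the Smith-Waterman approach. It is not from the slides, the function name is illustrative, and the scoring scheme (+1 match, –1 mismatch, –1 gap) is an assumption chosen so that poorly matching flanks are not worth including.

```python
def local_alignment_score(a: str, b: str,
                          match: int = 1, mismatch: int = -1, gap: int = -1) -> int:
    """Best local alignment score between any substring of `a` and of `b`."""
    best = 0
    prev = [0] * (len(b) + 1)             # an empty alignment always scores 0
    for i in range(1, len(a) + 1):
        curr = [0]
        for j in range(1, len(b) + 1):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            cell = max(
                0,                    # start a fresh alignment here
                prev[j - 1] + pair,   # extend by aligning a[i-1] with b[j-1]
                prev[j] + gap,        # extend with a gap in b
                curr[j - 1] + gap,    # extend with a gap in a
            )
            curr.append(cell)
            best = max(best, cell)
        prev = curr
    return best


print(local_alignment_score("TTGACACCCTCCCAATT", "ACCCCAGGCTTTACACAG"))
```

Because the score resets to zero, the unrelated ends of the two sequences cost nothing, which is the practical difference from the global alignment shown above.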
17
Where do we get sequences to work with?
Biological databases
  NCBI Entrez (http://www.ncbi.nlm.nih.gov/gquery/gquery.fcgi?term=)
Wet labs
Simulations
Other people’s results
On-line education resources
  BEDROCK (http://www.bioquest.org/bedrock/)
BLAST results