Biocomputation: Comparative Genomics Tanya Talkar Lolly Kruse Colleen O’Rourke
DNA
What is Biocomputation? Statistics, Computer Science, Molecular Biology
Four Main Parts: Biomolecular Computation, Biological Computation, Computational Biology, Bioinformatics
Bioinformatics: Biology, Computer Science, Information Technology
Sequence Analysis: Very functional! Compare DNA between species; reassemble small fragments to return the full sequence
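Reassembling a full sequence from small overlapping fragments can be sketched with a greedy overlap merge: repeatedly join the two fragments that overlap the most. This is a minimal illustration under stated assumptions, not a production assembler: it assumes error-free fragments, no fragment contained inside another, and a hypothetical minimum overlap `min_len`; `overlap` and `greedy_assemble` are illustrative names.

```python
def overlap(a: str, b: str, min_len: int = 3) -> int:
    """Length of the longest suffix of a that is also a prefix of b (>= min_len), else 0."""
    start = 0
    while True:
        start = a.find(b[:min_len], start)  # candidate start of the overlap in a
        if start == -1:
            return 0
        if b.startswith(a[start:]):  # the suffix of a matches the prefix of b
            return len(a) - start
        start += 1


def greedy_assemble(reads: list[str], min_len: int = 3) -> str:
    """Greedily merge the pair of reads with the largest overlap until one remains."""
    reads = list(reads)
    while len(reads) > 1:
        best_len, best_i, best_j = 0, -1, -1
        for i, a in enumerate(reads):
            for j, b in enumerate(reads):
                if i != j:
                    olen = overlap(a, b, min_len)
                    if olen > best_len:
                        best_len, best_i, best_j = olen, i, j
        if best_len == 0:
            break  # no overlaps left: concatenate the remaining pieces
        merged = reads[best_i] + reads[best_j][best_len:]
        reads = [r for k, r in enumerate(reads) if k not in (best_i, best_j)]
        reads.append(merged)
    return "".join(reads)


print(greedy_assemble(["ATGCGTAC", "GTACGTTA", "GTTAGC"]))  # ATGCGTACGTTAGC
```

Greedy merging is simple but can go wrong on repeated regions; real assemblers use more robust strategies.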
Computational Genomics: Needleman–Wunsch is not used much (exact dynamic programming is too slow for large-scale searches); more mapped genomes = more computational genomics!
Alignment
Global Alignment: Needleman–Wunsch; O(N²) time for two length-N sequences; fewest edit operations; best for similar strings
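A minimal sketch of Needleman–Wunsch global alignment follows. The scoring values (match +1, mismatch -1, linear gap -1) are hypothetical defaults chosen for illustration. The table has (N+1)×(M+1) cells, each filled in constant time, which is where the O(N²) cost comes from.

```python
def needleman_wunsch(a: str, b: str, match: int = 1, mismatch: int = -1, gap: int = -1):
    """Global alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # F[i][j] = best score for aligning the prefixes a[:i] and b[:j]
    F = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        F[i][0] = i * gap  # a[:i] aligned entirely against gaps
    for j in range(1, m + 1):
        F[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            F[i][j] = max(F[i - 1][j - 1] + s,  # match or mismatch
                          F[i - 1][j] + gap,    # gap in b
                          F[i][j - 1] + gap)    # gap in a

    # Trace back from the bottom-right corner: a global alignment uses both full strings
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        diag = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + diag:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return F[n][m], "".join(reversed(top)), "".join(reversed(bottom))


print(needleman_wunsch("ACGT", "AGT"))  # (2, 'ACGT', 'A-GT')
```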
Local Alignment: Smith–Waterman; also O(N²); for strings that are dissimilar overall; finds regions of high similarity
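Smith–Waterman differs from the global version in two places: every cell is floored at 0 (so a poor prefix can be dropped), and the traceback starts from the highest-scoring cell and stops at the first 0. The sketch below assumes hypothetical scores of match +2, mismatch -1, linear gap -1.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -1):
    """Local alignment by dynamic programming; returns (score, aligned_a, aligned_b)."""
    n, m = len(a), len(b)
    # H[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    H = [[0] * (m + 1) for _ in range(n + 1)]
    best, best_pos = 0, (0, 0)
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0,                      # start a fresh alignment here
                          H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap,
                          H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)

    # Trace back from the best cell until a zero cell ends the local alignment
    top, bottom = [], []
    i, j = best_pos
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1])
            bottom.append(b[j - 1])
            i, j = i - 1, j - 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


print(smith_waterman("TTACGTAA", "GGACGTCC"))  # (8, 'ACGT', 'ACGT')
```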
Comparison
S1: PQRAXABCSTVQ
S2: XYAXBACSLT
Aligned region:
S1: AXAB_CS
S2: AX_BACS
Score
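The score of an aligned region like the one in this comparison can be computed column by column. The sketch below uses a linear gap penalty and treats both `_` and `-` as gap symbols; the values match +2, mismatch -3, gap -2 are assumptions for illustration, not the values behind the slide's own score.

```python
def score_alignment(top: str, bottom: str, match: int = 2, mismatch: int = -3,
                    gap: int = -2, gap_chars: str = "-_") -> int:
    """Score two already-aligned, equal-length strings with a linear gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    for x, y in zip(top, bottom):
        if x in gap_chars or y in gap_chars:
            total += gap        # each gap column costs the same
        elif x == y:
            total += match
        else:
            total += mismatch
    return total


print(score_alignment("AXAB_CS", "AX_BACS"))  # 6
print(score_alignment("AXABCS", "AXBACS"))    # 2 (same region without gaps)
```

Under these assumed values, opening two gaps to line up five identical letters beats the ungapped version, which has two mismatches.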
Advantages: Global Alignment
Advantages: Local Alignment
BLAST (Basic Local Alignment Search Tool): a faster heuristic successor to FASTA
Improvements: increased speed; locates initial alignment hot spots; statistical significance
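Terminology: Segment pair = two equal-length substrings, one from each sequence, aligned without gaps; Locally maximal segment pair = its score cannot be improved by extending or shortening it; Maximal segment pair (MSP) = the highest-scoring segment pair of the two sequences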
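Because a segment pair has no gaps, it always lies on a single diagonal of the comparison grid. A brute-force way to find the MSP, sketched below, is to run Kadane's maximum-subarray rule along every diagonal. This takes time proportional to len(a)·len(b), which is the cost BLAST's word-hit heuristics try to avoid. The scores (match +2, mismatch -3) are assumed, and `maximal_segment_pair` is a hypothetical name.

```python
def maximal_segment_pair(a: str, b: str, match: int = 2, mismatch: int = -3):
    """Best ungapped local alignment: returns (score, start_in_a, start_in_b, length)."""
    best = (0, 0, 0, 0)
    # Every segment pair lies on one diagonal d = i - j, so scan each diagonal
    for d in range(-(len(b) - 1), len(a)):
        i, j = max(d, 0), max(-d, 0)
        run_score, run_start, k = 0, 0, 0
        while i + k < len(a) and j + k < len(b):
            s = match if a[i + k] == b[j + k] else mismatch
            if run_score <= 0:
                # Kadane's rule: a non-positive running score never helps, so restart here
                run_score, run_start = s, k
            else:
                run_score += s
            if run_score > best[0]:
                best = (run_score, i + run_start, j + run_start, k - run_start + 1)
            k += 1
    return best


print(maximal_segment_pair("TTACGTAA", "GGACGTCC"))  # (8, 2, 2, 4)
```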
How it works: query sequence P is compared against a database; reported segment pairs must score over cutoff C! Multiple segment pairs can be combined. Example database: ABCDEFG, AGCBFDE, BEDGAFB, GFBEDCA
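How it works: extends each hit in both directions; done efficiently; truncates an extension once its score falls too far below the best seen so far; heuristic, so it doesn't find all segment pairs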
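The extend-and-truncate step can be sketched as an ungapped "X-drop" extension: grow the hit one column at a time in each direction, remember the best score reached, and stop once the running score falls more than `x_drop` below it. The scores (match +2, mismatch -3) and `x_drop=5` are hypothetical values, and `extend_hit` is an illustrative name, not BLAST's actual code.

```python
def extend_hit(query: str, subject: str, q_pos: int, s_pos: int, w: int,
               match: int = 2, mismatch: int = -3, x_drop: int = 5):
    """Extend a length-w word hit without gaps; returns (score, q_start, s_start, length)."""
    def pair_score(i: int, j: int) -> int:
        return match if query[i] == subject[j] else mismatch

    # Score of the seed word itself
    seed = sum(pair_score(q_pos + k, s_pos + k) for k in range(w))

    # Extend rightwards from the end of the seed
    best_right, best_r_len, run, k = 0, 0, 0, 0
    while q_pos + w + k < len(query) and s_pos + w + k < len(subject):
        run += pair_score(q_pos + w + k, s_pos + w + k)
        if run > best_right:
            best_right, best_r_len = run, k + 1
        elif best_right - run > x_drop:
            break  # fell more than x_drop below the best: truncate
        k += 1

    # Extend leftwards from the start of the seed
    best_left, best_l_len, run, k = 0, 0, 0, 1
    while q_pos - k >= 0 and s_pos - k >= 0:
        run += pair_score(q_pos - k, s_pos - k)
        if run > best_left:
            best_left, best_l_len = run, k
        elif best_left - run > x_drop:
            break
        k += 1

    score = seed + best_left + best_right
    return score, q_pos - best_l_len, s_pos - best_l_len, w + best_l_len + best_r_len


print(extend_hit("AAAACAAAA", "AAAAGAAAA", 0, 0, 2))            # (13, 0, 0, 9)
print(extend_hit("AAAACAAAA", "AAAAGAAAA", 0, 0, 2, x_drop=2))  # (8, 0, 0, 4)
```

With the larger `x_drop` the extension survives the single mismatch and recovers; with the smaller one it is truncated early. This is why the method is fast but can miss some segment pairs.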
Proteins: list query words of fixed length W (typically 3), plus neighboring words that score above a threshold; each database hit is extended
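The protein word step can be sketched as follows. For each length-W word in the query, enumerate every length-W word over the 20 amino acids that scores at least `threshold` against it, then scan the database for exact occurrences of those words. To keep the example self-contained, `toy_score` (+5 identical, -1 otherwise) is a hypothetical stand-in for BLOSUM62, and the parameter values are assumptions.

```python
from itertools import product

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residues


def toy_score(x: str, y: str) -> int:
    # HYPOTHETICAL stand-in for BLOSUM62: +5 for identical residues, -1 otherwise
    return 5 if x == y else -1


def neighborhood(word: str, threshold: int, score=toy_score, alphabet: str = AMINO_ACIDS) -> set[str]:
    """All words of len(word) whose score against `word` is at least `threshold`."""
    return {
        "".join(cand)
        for cand in product(alphabet, repeat=len(word))
        if sum(score(x, y) for x, y in zip(word, cand)) >= threshold
    }


def find_hits(query: str, database: str, w: int = 3, threshold: int = 11,
              score=toy_score) -> list[tuple[int, int]]:
    """(query_pos, database_pos) pairs where a neighborhood word occurs in the database."""
    lookup: dict[str, list[int]] = {}
    for i in range(len(query) - w + 1):
        for nw in neighborhood(query[i:i + w], threshold, score):
            lookup.setdefault(nw, []).append(i)
    hits = []
    for j in range(len(database) - w + 1):
        for i in lookup.get(database[j:j + w], []):
            hits.append((i, j))
    return hits


print(find_hits("MACDK", "WWACDWW", w=3, threshold=11))  # [(1, 2)]
print(find_hits("MACDK", "WWACDWW", w=3, threshold=9))   # [(0, 1), (1, 2), (2, 3)]
```

Lowering the threshold admits near-matches such as WAC for MAC. This produces more hits to extend, which increases sensitivity at the cost of speed.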
DNA: word list of exact matches only; hits found by table lookup, NOT dynamic programming
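For DNA, the word step reduces to an exact-match lookup table. Index every length-w word in the database once, then look up each query word, with no dynamic programming involved. The word length in the example and the function names are hypothetical.

```python
from collections import defaultdict


def build_word_index(database: str, w: int) -> dict[str, list[int]]:
    """Map every length-w word in the database to the positions where it occurs."""
    index = defaultdict(list)
    for j in range(len(database) - w + 1):
        index[database[j:j + w]].append(j)
    return dict(index)


def exact_word_hits(query: str, database: str, w: int = 11) -> list[tuple[int, int]]:
    """(query_pos, database_pos) for every exact length-w word match."""
    index = build_word_index(database, w)
    hits = []
    for i in range(len(query) - w + 1):
        for j in index.get(query[i:i + w], []):  # constant-time dictionary lookup
            hits.append((i, j))
    return hits


print(exact_word_hits("GGACGTCC", "TTACGTAAACGT", w=4))  # [(2, 2), (2, 8)]
```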
Scoring: proteins use the BLOSUM62 matrix; DNA uses match (+2), mismatch (-3); gaps penalized
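BLAST penalizes gaps with two costs: a gap-existence (open) cost and a per-position extension cost. A gap of length k therefore costs `gap_open + k * gap_extend`, so one long gap is cheaper than several short ones. The sketch below combines the DNA match/mismatch values from the slide with hypothetical gap costs (open 5, extend 2). `affine_score` is an illustrative name.

```python
def affine_score(top: str, bottom: str, match: int = 2, mismatch: int = -3,
                 gap_open: int = 5, gap_extend: int = 2) -> int:
    """Score an alignment ('-' = gap) with an affine gap penalty."""
    if len(top) != len(bottom):
        raise ValueError("aligned strings must have equal length")
    total = 0
    gap_in_top = gap_in_bottom = False  # are we inside a gap that is already open?
    for x, y in zip(top, bottom):
        if x == "-":
            total -= gap_extend + (0 if gap_in_top else gap_open)
            gap_in_top, gap_in_bottom = True, False
        elif y == "-":
            total -= gap_extend + (0 if gap_in_bottom else gap_open)
            gap_in_top, gap_in_bottom = False, True
        else:
            total += match if x == y else mismatch
            gap_in_top = gap_in_bottom = False
    return total


print(affine_score("AC--GT", "ACTTGT"))  # -1: one gap of length 2
print(affine_score("A-C-GT", "ATCTGT"))  # -6: two separate gaps of length 1
```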
Substitution Matrix: a table giving the score for aligning each pair of residues; represents the scoring function
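A substitution matrix is simply a lookup table from a residue pair to a score. The sketch below builds a small, entirely hypothetical 4×4 DNA matrix. It scores identical bases +2, transitions (purine to purine or pyrimidine to pyrimidine) -1, and transversions -3, then uses the matrix to score an ungapped alignment. Real protein matrices such as BLOSUM62 have the same shape, just 20×20 with empirically derived values.

```python
BASES = "ACGT"
PURINES = {"A", "G"}  # A and G are purines; C and T are pyrimidines


def build_toy_matrix(match: int = 2, transition: int = -1, transversion: int = -3) -> dict:
    """HYPOTHETICAL DNA substitution matrix keyed by (base1, base2)."""
    matrix = {}
    for x in BASES:
        for y in BASES:
            if x == y:
                matrix[x, y] = match
            elif (x in PURINES) == (y in PURINES):
                matrix[x, y] = transition    # purine<->purine or pyrimidine<->pyrimidine
            else:
                matrix[x, y] = transversion  # purine<->pyrimidine
    return matrix


def score_ungapped(a: str, b: str, matrix: dict) -> int:
    """Sum the matrix entries for each aligned column of two equal-length strings."""
    if len(a) != len(b):
        raise ValueError("strings must have equal length")
    return sum(matrix[x, y] for x, y in zip(a, b))


M = build_toy_matrix()
for x in BASES:
    print(x, [M[x, y] for y in BASES])
print(score_ungapped("ACGT", "GCGA", M))  # 0
```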
Multiple Sequence Alignment
Methods of MSA: Progressive Alignment Construction; Iterative Methods; Hidden Markov Models; Genetic Algorithms and Simulated Annealing
Comparative Genomics: compare species to find evolutionary significance! Low level and high level comparisons; importance of non-coding DNA