1
BNFO 235 Lecture 5 Usman Roshan
2
What we have done to date
Basic Perl
–Data types: numbers, strings, arrays, and hashes
–Control structures: if-else, for loop, while loop
–Input/Output: reading from a file
–Subroutines
–Regular expressions: =~, !~, =~ s/ / /g
–Perl functions: index, substr, scalar, length
Bioinformatics
–Number of matches and mismatches in two DNA sequences of equal length
–Read DNA scoring matrix
–Translate DNA string into protein
–IUPAC code
3
Notice
Make sure you know how to do ALL of the homeworks handed out to date. Being able to do them improves your chances of an A.
4
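DNA Sequence Evolution
[Figure: evolutionary tree with a time axis at -3 mil yrs, -2 mil yrs, -1 mil yrs, and today. The ancestral sequence AAGACTT evolves through intermediate sequences into the present-day sequences. Later panels show the same tree with gaps (_) marking insertions and deletions.]
Ancestral sequence: AAGACTT
Intermediate sequences: T_GACTT, AAGGCTT, _GGGCTT, TAGACCTT, A_CACTT
Present-day sequences: ACCTT (Cat), ACACTTC (Lion), TAGCCCTTA (Monkey), TAGGCCTT (Human), GGCTT (Mouse)
Present-day sequences with gaps: TAGGCCTT (Human), TAGCCCTTA (Monkey), A_C_CTT (Cat), A_CACTTC (Lion), _G_GCTT (Mouse)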
5
Pairwise sequence alignment
Similarity in DNA sequence implies an evolutionary relationship, from which a great deal can be inferred.
Similar genes have similar function: a classic example is the connection between cancer and uncontrolled cell growth, which was established by observing a high similarity between cancer and cell growth genes.
Detecting similarity is also important for management and analysis purposes: storing and retrieving the genes of tens of thousands of species effectively.
6
Pairwise sequence alignment
Example:
–Lion: ACACTTC
–Cat: ACCTT
–Both have undergone insertions/deletions and are unequal in length (recall previous figure of sequences evolving on tree)
To compare we have to align first, i.e. pair up similar nucleotides and identify insertions and deletions.
Alignment 1 (gaps written as _):
–ACACTTC
–AC_CTT_
–1 1 0 1 1 1 0 = 5
Alignment 2 (no gaps inside the Cat sequence):
–ACACTTC
–ACCTT
–1 1 0 0 1 = 3
Similarity score: the number of columns in which the two nucleotides match.
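The similarity score on this slide can be checked with a few lines of code. The sketch below is in Python rather than the course's Perl. It assumes gaps are written as `_` and that the Cat row is padded with gaps to the length of the Lion row. The function name `similarity_score` is made up for illustration.

```python
def similarity_score(row_a, row_b, gap="_"):
    """Count the columns where both aligned rows hold the same nucleotide.

    A gap never counts as a match. Both rows must have the same length.
    """
    if len(row_a) != len(row_b):
        raise ValueError("aligned rows must have equal length")
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


lion = "ACACTTC"
print(similarity_score(lion, "AC_CTT_"))  # alignment with gaps: 5
print(similarity_score(lion, "ACCTT__"))  # Cat shifted fully left: 3
```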
7
Pairwise sequence alignment
The alignment with score 5 is better because it maximizes similarity.
The pairwise alignment problem can be solved efficiently and automatically by a computer program.
8
How to compute optimal alignment?
For two sequences of length m and n this problem can be solved in polynomial time and space (O(mn) for the standard algorithm), i.e. efficiently, at least for short sequences.
We use a standard computer science algorithmic technique called dynamic programming.
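A minimal sketch of the dynamic programming idea follows, in Python rather than the course's Perl. It assumes the scoring used in the Lion/Cat example (match = 1, mismatch = 0, gap = 0). The function name `align` and its parameters are illustrative, not taken from the lecture. The table uses O(mn) time and space.

```python
def align(x, y, match=1, mismatch=0, gap=0):
    """Return (best score, aligned x, aligned y) for a global alignment."""
    m, n = len(x), len(y)

    def sub(i, j):
        # Score for pairing x[i-1] with y[j-1].
        return match if x[i - 1] == y[j - 1] else mismatch

    # score[i][j] = best score for aligning x[:i] with y[:j]
    score = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        score[i][0] = score[i - 1][0] + gap
    for j in range(1, n + 1):
        score[0][j] = score[0][j - 1] + gap
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # pair the two characters
                score[i - 1][j] + gap,            # gap in y
                score[i][j - 1] + gap,            # gap in x
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    row_x, row_y = [], []
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            row_x.append(x[i - 1]); row_y.append(y[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            row_x.append(x[i - 1]); row_y.append("_"); i -= 1
        else:
            row_x.append("_"); row_y.append(y[j - 1]); j -= 1
    return score[m][n], "".join(reversed(row_x)), "".join(reversed(row_y))


best, lion_row, cat_row = align("ACACTTC", "ACCTT")
print(best)  # 5
print(lion_row)
print(cat_row)
```

Several alignments can share the best score, so the rows printed are one optimal alignment and need not be the one drawn on the slide.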
9
Retrieving similar sequences from a database
Scenario: you isolate a piece of DNA from a cell and are interested in its functional role. One way to determine this is to compare (align) it against sequences of known function in a large database.
Algorithm: we align the query against each sequence in the database and output the top k sequences with the highest sequence similarity.
One such program for doing this is BLAST (online example with tp53 and beta-globin).
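The align-against-everything procedure described above can be sketched as follows, in Python rather than the course's Perl. This illustrates only the exhaustive algorithm on the slide and is not how BLAST works internally, since BLAST relies on heuristics for speed. The toy database reuses the five present-day sequences from the evolution figure. The names `alignment_score` and `top_k` are hypothetical, and the scoring (match = 1, mismatch = 0, gap = 0) is an assumption carried over from the Lion/Cat example.

```python
import heapq


def alignment_score(x, y):
    """Best similarity score (match=1, mismatch=0, gap=0), one DP row at a time."""
    prev = [0] * (len(y) + 1)
    for cx in x:
        cur = [0]
        for j, cy in enumerate(y, start=1):
            cur.append(max(prev[j - 1] + (cx == cy), prev[j], cur[j - 1]))
        prev = cur
    return prev[-1]


def top_k(query, database, k):
    """Return the k (score, name) pairs with the highest similarity to query."""
    scored = ((alignment_score(query, seq), name) for name, seq in database.items())
    return heapq.nlargest(k, scored)


# Toy database: the present-day sequences from the evolution figure.
database = {
    "Cat": "ACCTT",
    "Lion": "ACACTTC",
    "Monkey": "TAGCCCTTA",
    "Human": "TAGGCCTT",
    "Mouse": "GGCTT",
}

for score, name in top_k("ACACTTC", database, k=3):
    print(name, score)
```

Scoring every database entry costs one full alignment per sequence, which is why real tools avoid this exhaustive scan on large databases.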
10
Problems
–Convert multiple sequence alignment from ClustalW format into FASTA format
–Compute sum-of-pairs score of a multiple alignment
–Extract high scoring pairs (HSPs) from BLAST output
–Determine the conserved columns in a multiple alignment
–Compute a distance matrix from a multiple alignment
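As a starting point for two of these problems, here is a hedged sketch in Python rather than the course's Perl. It assumes the alignment is already loaded as a list of equal-length rows with `_` for gaps, that a column is conserved when every row has the same nucleotide and none has a gap, and that the sum-of-pairs score adds up the match count (match = 1, otherwise 0) over all pairs of rows. The course may use a different scoring scheme. The three-row alignment is invented for illustration.

```python
from itertools import combinations


def conserved_columns(rows, gap="_"):
    """Indices (0-based) of columns where all rows agree and no row has a gap."""
    return [
        i for i, col in enumerate(zip(*rows))
        if gap not in col and len(set(col)) == 1
    ]


def pair_score(row_a, row_b, gap="_"):
    """Number of matching, non-gap columns between two aligned rows."""
    return sum(1 for a, b in zip(row_a, row_b) if a == b and a != gap)


def sum_of_pairs(rows):
    """Sum of pair_score over every unordered pair of rows."""
    return sum(pair_score(a, b) for a, b in combinations(rows, 2))


# Made-up toy alignment, not taken from the lecture.
rows = ["ACG_T", "ACGAT", "AC_AT"]
print(conserved_columns(rows))  # [0, 1, 4]
print(sum_of_pairs(rows))       # 4 + 3 + 4 = 11
```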