Download presentation
Presentation is loading. Please wait.
1
Local alignments Seq X: Seq Y:
2
Local alignment What’s local? –Allow only parts of the sequence to match –Results in High Scoring Segments –Locally maximal: can not make it better by trimming/extending the alignment Seq X: Seq Y:
3
Local alignment Seq X: Seq Y: Why local? –Parts of sequence diverge faster evolutionary pressure does not put constraints on the whole sequence –Proteins have modular construction sharing domains between sequences
4
Global local alignment Take the old, good equation Look at the result of the global alignment (on the right) -seque - s e q CAGCACTTGGATTCTCG- CA-C-----GATTCGT-G a) global align b) retrieve the result c) score along the result align. pos.
5
Local alignment – breaking the alignment A recipe –Just don’t let the score go below 0 –Start the new alignment when it happens –Where is the result in the matrix? Before:After: align. pos. q e s 0- euqes- q e 0s 0- euqes-
6
Local alignment – the equation Init the matrix with 0’s Read the maximal value from anywhere in the matrix Find the result with backtracking M[i, j] = M[i, j- 1 ] – g M[i- 1, j] – g M[i- 1, j- 1 ] + score(X[i],Y[j]) max 0 Great contribution to science! q e s - euqes-
7
Amino acid substitution matrices
8
Point Accepted Mutations distance and matrices Accepted by natural selection –not lethal –not silent Def.: S 1 and S 2 are PAM 1 distant if on avg. there was one mutation per 100 aa Q.: If the seqs are PAM 8 distant, how many residues may be diffent?
9
PAM matrix Created from “easy”alignments –pairwise –gapless –85% id
10
PAM matrix How to calculate M for PAM 2 distance? –Take more distant sequences –or extrapolate...
11
Alignments BLAST - Basic Local Alignment Search Tool
12
Genome vs. gene
13
The amount of genetic information in organisms
14
http://www.cbs.dtu.dk/staff/dave/roanoke/genetics980313.html Largest genome: amoeba Chaos chaos (200x human genome) http://www.lawrence.edu/dept/biology/animal/
15
Sequence searching - challenges Exponential growth of databases
16
Sequence searching – definition Task: –Query: short, new sequence (~1000 letters) –Database (searching space): very many sequences –Goal: find seqs homologous to the query
17
Sequence searching – definition We want: –fast tool –primarily a filter: most sequences will be unrelated to the query –fine-tune the alignment later
18
Database Search Algorithms: Sensitivity, Selectivity True Positive (TP) – a homology detected (positive) correctly (true)
19
Database Search Algorithms: Sensitivity, Selectivity Sensitivity =TP/(TP+FN) Selectivity =TN/(TN+FP) Sensitivity Selectivity Courtesy of Gary Benson (ISSCB 2003)
20
What is BLAST Basic Local Alignment Search Tool Bad news: it is only a heuristics –Heuristics: A rule of thumb that often helps in solving a certain class of problems, but makes no guarantees. Perkins, DN (1981) The Mind's Best Work Basic idea: –High scoring segments have well conserved (almost identical) part –As well conserved part are identified, extend it to the real alignment q e s - euqes-
21
What means well conserved for BLAST? BLAST works with k-words (words of length k) –k is a parameter –different for DNA (>10) and proteins (2..4) word w 1 is T-similar to w 2 if the sum of pair scores is at least T (e.g. T=12) Similar 3-words W 1 :R K P W 2 :R R P Score:9 –1 7 sum = 15
22
BLAST algorithm 3 basic steps Preprocess the query: extract all the k-words Scan for T-similar matches in database Extend them to alignments 1) Preprocess 2) Scan 3) Extend
23
BLAST, Step 1: Preprocess the query Take the query (e.g. LVNRKPVVP ) Chop it into overlapping k-words (k=3 in this case) Query:LVNRKPVVP Word1:LVN Word2: VNR Word3: NRK … For each word find all similar words (scoring at least T) E.g. for RKP the following 3-words are similar: QKP KKP RQP REP RRP RKP 1) Preprocess 2) Scan 3) Extend
24
Finite state machine AC*T|GGC abstract machine constant amount of memory (states) used in computation and languages recognizes regular expressions –cp dmt*.pdf /home/john 1) Preprocess 2) Scan 3) Extend
25
BLAST, Step 2: Find ¨exact¨ matches with scanning Use all the T-similar k-words to build the Finite State Machine Scan for exact matches...VLQKPLKKPPLVKRQPCCEVVRKPLVKVIRCLA... QKP KKP RQP REP RRP RKP... movement 1) Preprocess 2) Scan 3) Extend
26
BLAST, Step 3: Extending ¨exact¨ matches Having the list of exact matches we extend alignment in both directions Query: L V N R K P V V P T-similar: R R P Subject: G V C R R P L K C Score:-3 4 -3 5 2 7 1 -2 -3 …till the sum of scores drops below some level X (e.g. X=-100) from the best known - what with gaps? 1) Preprocess 2) Scan 3) Extend
27
Gapped BLAST (now standard) gapped local alignments are computed: much, much, much slower therefore: modified “Hit criteria” 1) Preprocess 2) Scan 3) Extend
28
Hit criteria Extends the alignment only if there are close two hits on the same diagonal –sensitivity would drop without lowering T –reduces extensions (90% time is spend on extensions) Gapped local alignments are computed –increased sensitivity allows us raise T –raising T speeds up the search 1) Preprocess 2) Scan 3) Extend dbpos query pos close hit, same diag
29
Gapped BLAST v BLAST We end up with –same speed (even a bit faster overall) –gapped alignments! –much higher sensitivity
30
BLAST flavours blastp : protein query, protein db blastn : DNA query, DNA db blastx : DNA query, protein db –in all reading frames. Used to find potential translation products of an unknown nucleotide sequence. tblastn : protein query, DNA db –database dynamically translated in all reading frames. tblastx : DNA query, DNA db –all translations of query against all translations of db
Similar presentations
© 2024 SlidePlayer.com. Inc.
All rights reserved.