1
Paper study for class presentation on Nov 16th, 2005; slides by 陳奕先
tPatternHunter: gapped, fast and sensitive translated homology search
Derek Kisman, Ming Li, Bin Ma, Li Wang
Bioinformatics, 21(4), February 2005
2
tPatternHunter: "t" for translated search
What issues arise when we try to apply the PatternHunter technique to translated search?
Proteins have 20 different letters, many more than DNA's 4 letters.
Three DNA letters make a codon, so at the hit extension stage a DNA gap may cause a frameshift.
3
Proteins have 20 different letters, many more than DNA's 4 letters
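The hash table for protein seeds would take significantly more space than the one for DNA sequences.
PatternHunter uses weight-11 seeds for DNA sequences. How large should the seeds be for proteins?
11 * log10(4) ≈ 6.62 and 5 * log10(20) ≈ 6.51, so a weight-5 seed over 20 letters needs a table of about the same size as a weight-11 seed over 4 letters.
tPH uses weight-5 spaced seeds (the default seed is )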
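The size comparison above can be checked with a short calculation. This is only a sketch: the function names `table_entries` and `equivalent_weight` are made up for illustration, and it assumes the hash table has one entry per possible combination of letters at the seed's "1" positions.

```python
import math

def table_entries(alphabet_size: int, weight: int) -> int:
    """Number of distinct keys for a seed of the given weight (assumed table size)."""
    return alphabet_size ** weight

def equivalent_weight(src_alphabet: int, src_weight: int, dst_alphabet: int) -> float:
    """Weight over dst_alphabet that yields the same number of keys."""
    return src_weight * math.log(src_alphabet) / math.log(dst_alphabet)

if __name__ == "__main__":
    print("DNA, weight 11:", table_entries(4, 11))
    for w in (4, 5, 6):
        print(f"protein, weight {w}:", table_entries(20, w))
    print("equivalent protein weight:", round(equivalent_weight(4, 11, 20), 2))
```

Under this assumption, weight 5 is the largest protein weight whose table is no bigger than the DNA table for weight 11; weight 6 would be roughly fifteen times larger.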
4
Only the five letters at the "1" positions of the seed are checked for hits.
Letter pairs are evaluated with BLOSUM62 scores. A "hit": each of the five positions scores at least 0, and the total score is above a threshold T.
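The hit rule can be sketched as below. The seed pattern `"110111"` and the scoring function `toy_score` are placeholders chosen for illustration; `toy_score` is NOT BLOSUM62, and a real implementation would look up BLOSUM62 entries instead.

```python
from typing import Callable

def toy_score(a: str, b: str) -> int:
    """Placeholder scoring (not BLOSUM62): +5 for identical letters, -1 otherwise."""
    return 5 if a == b else -1

def is_hit(query: str, target: str, seed: str,
           score: Callable[[str, str], int], threshold: int) -> bool:
    """Apply the hit rule to two windows that are as long as the seed."""
    if not (len(query) == len(target) == len(seed)):
        raise ValueError("windows and seed must have the same length")
    total = 0
    for q, t, s in zip(query, target, seed):
        if s != "1":          # "0" positions are ignored entirely
            continue
        pair = score(q, t)
        if pair < 0:          # every checked position must score at least 0
            return False
        total += pair
    return total > threshold  # the total must be above T

if __name__ == "__main__":
    seed = "110111"  # hypothetical weight-5 seed
    print(is_hit("MKVLAW", "MKALAW", seed, toy_score, 20))  # mismatch at a "0" position
    print(is_hit("MKVLAW", "MRVLAW", seed, toy_score, 20))  # mismatch at a "1" position
```

Passing the scoring function as a parameter keeps the rule itself separate from the choice of matrix.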
5
BLOSUM62 Scoring Matrix
6
And what about the frameshift issue?
When performing a DNA-protein or DNA-DNA search, tPH treats each DNA sequence as a sequence of overlapping codons.
Example: the DNA sequence TTTGCA is read as the overlapping codons TTT, TTG, TGC, GCA, which translate to F, L, C, A.
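A minimal sketch of reading a DNA string as overlapping codons, using the standard genetic code. The function name `overlapping_codons` is made up for illustration, and the sketch assumes uppercase input containing only A, C, G and T; it is not taken from tPH itself.

```python
BASES = "TCAG"
# Standard genetic code, codons ordered TTT, TTC, TTA, TTG, TCT, ... ("*" = stop).
AMINO_ACIDS = ("FFLLSSSSYY**CC*W"
               "LLLLPPPPHHQQRRRR"
               "IIIMTTTTNNKKSSRR"
               "VVVVAAAADDEEGGGG")

CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}

def overlapping_codons(dna: str) -> str:
    """Translate the codon starting at every position, so all three frames interleave."""
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(len(dna) - 2))

if __name__ == "__main__":
    print(overlapping_codons("TTTGCA"))  # FLCA
```

Letters at positions i, i+3, i+6, ... of the result belong to the same reading frame, so a frameshift corresponds to moving between neighbouring residue classes modulo 3.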
8
To improve sensitivity, we can use more than one seed.
By default, tPH uses four weight-5 seeds (of length 6 or 7) and a threshold T = 20 for BLOSUM62.
How fast and how sensitive is tPH?
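To illustrate why several seeds help, the sketch below scans two sequences along one ungapped diagonal and records every offset where at least one seed hits. The two seed patterns and the scoring function `toy_score` are hypothetical stand-ins (they are neither the tPH default seeds nor BLOSUM62), and `multi_seed_hits` is a name made up for this example.

```python
from typing import Callable, Iterable, Set

def toy_score(a: str, b: str) -> int:
    """Placeholder scoring (not BLOSUM62): +5 for identical letters, -1 otherwise."""
    return 5 if a == b else -1

def multi_seed_hits(query: str, target: str, seeds: Iterable[str],
                    score: Callable[[str, str], int], threshold: int) -> Set[int]:
    """Offsets where any seed gives a hit, with query[i] aligned to target[i]."""
    hits: Set[int] = set()
    n = min(len(query), len(target))
    for seed in seeds:
        care = [i for i, c in enumerate(seed) if c == "1"]  # checked positions
        for off in range(n - len(seed) + 1):
            scores = [score(query[off + i], target[off + i]) for i in care]
            # Hit rule: every checked position >= 0 and total above the threshold.
            if min(scores) >= 0 and sum(scores) > threshold:
                hits.add(off)
    return hits

if __name__ == "__main__":
    q, t = "ACDEFGHIKL", "ACDXFGHIKL"   # one mismatch at index 3
    print(sorted(multi_seed_hits(q, t, ["111011"], toy_score, 20)))
    print(sorted(multi_seed_hits(q, t, ["111011", "1101011"], toy_score, 20)))
```

In this toy input the second seed contributes an offset that the first seed misses, which is the effect a multiple-seed design relies on.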