Published by Melinda Richard. Modified over 9 years ago.
2016/1/27 Summer Course
Pattern Search Problems
Part I: Fundamental Concepts
FASTA: http://www.ebi.ac.uk/Tools/fasta33/index.html
FastP and FastA
FastA is an algorithm that attempts to speed up string matching relative to the standard optimal alignment. The FastA algorithm is implemented in the following 6 stages:
– Locate hot spots
– Find the 10 best regions in the matrix
– Score using a substitution matrix
– Combine initial regions from different diagonals
– Optimal alignment
– Presentation
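The first two stages can be sketched roughly as follows. This is a minimal illustration, not the FASTA implementation: the function name `hot_spot_diagonals`, the parameters `k` and `top`, and the choice to rank diagonals by raw hot-spot count are all assumptions made for the example. Rescoring with a substitution matrix and the later stages are not shown.

```python
from collections import defaultdict

def hot_spot_diagonals(query, text, k=2, top=10):
    """Count k-tuple hot spots per diagonal and return the `top` busiest diagonals."""
    # Lookup table: every k-tuple of the query -> list of its start positions.
    positions = defaultdict(list)
    for i in range(len(query) - k + 1):
        positions[query[i:i + k]].append(i)

    # A hot spot at query position i and text position j lies on diagonal j - i.
    hits = defaultdict(int)
    for j in range(len(text) - k + 1):
        for i in positions.get(text[j:j + k], ()):
            hits[j - i] += 1

    # Rank diagonals by hot-spot count (ties broken by diagonal index).
    ranked = sorted(hits.items(), key=lambda item: (-item[1], item[0]))
    return ranked[:top]

# Hypothetical usage: all three shared 2-tuples fall on diagonal 2.
best = hot_spot_diagonals("ACGT", "TTACGTTT", k=2)
```

Diagonals that collect many hot spots are the candidate regions that a later stage would rescore and join.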
BLAST: http://blast.ncbi.nlm.nih.gov/Blast.cgi
BLAST
The BLAST database consists of three files for every FastA file input.
– The first contains all of the sequence headers: textual information about the amino acid or nucleotide sequence.
– The second contains the compressed sequences (2 bits for each nucleotide, 5 bits for each amino acid).
– The third contains an index of the compressed sequences so that they can be matched with the corresponding headers.
The program runs in 3 rounds.
– Database scanning (table search or finite state machine)
– Seed growing
– Combining alignments
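The first two rounds can be illustrated with a small sketch. It uses a word table for the scan (the finite state machine variant is not shown) and a simple ungapped extension for seed growing. The names `scan_database` and `extend_seed`, the +1/-1 scoring, and the `x_drop` cutoff are assumptions chosen for the example, not BLAST's actual parameters. Combining alignments is omitted.

```python
def scan_database(query, subject, k):
    """Round 1 (table search): return (query_pos, subject_pos) of every shared k-letter word."""
    table = {}
    for i in range(len(query) - k + 1):
        table.setdefault(query[i:i + k], []).append(i)
    return [(i, j)
            for j in range(len(subject) - k + 1)
            for i in table.get(subject[j:j + k], [])]

def extend_seed(query, subject, qi, sj, k, match=1, mismatch=-1, x_drop=2):
    """Round 2 (seed growing): extend an exact seed in both directions without gaps.

    Returns (score, query_start, query_end) with query_end exclusive.
    """
    def grow(step, q, s):
        best = score = 0
        best_len = length = 0
        while 0 <= q < len(query) and 0 <= s < len(subject):
            score += match if query[q] == subject[s] else mismatch
            length += 1
            if score > best:
                best, best_len = score, length
            elif best - score > x_drop:
                break  # score fell too far below the best seen so far
            q += step
            s += step
        return best, best_len

    right_score, right_len = grow(+1, qi + k, sj + k)
    left_score, left_len = grow(-1, qi - 1, sj - 1)
    return k * match + left_score + right_score, qi - left_len, qi + k + right_len

# Hypothetical usage: scan, then grow the seed found at query 2 / subject 2.
seeds = scan_database("GGACGTAC", "TTACGTAA", 3)
grown = extend_seed("GGACGTAC", "TTACGTAA", 2, 2, 3)
```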
Pattern matching
(Character to Character Comparison)
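As a baseline for character-to-character comparison, a brute-force matcher can be sketched as below. The function name `naive_search` is an assumption for this example; it simply tries every alignment of the pattern against the text and compares one character at a time.

```python
def naive_search(text, pattern):
    """Return every start index at which pattern occurs in text."""
    n, m = len(text), len(pattern)
    matches = []
    for start in range(n - m + 1):
        offset = 0
        # Compare character by character until a mismatch or the pattern ends.
        while offset < m and text[start + offset] == pattern[offset]:
            offset += 1
        if offset == m:
            matches.append(start)
    return matches

# Hypothetical usage on a short DNA string.
hits = naive_search("ACGTACGT", "CGT")
```

In the worst case this performs on the order of n × m comparisons, which is the cost that preprocessing-based methods try to avoid.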
(Under a preprocessing, path)
Sliding Window Comparison
Sliding Windows
Coding the sequence
– DNA/RNA: A: 00, T: 01, G: 10, C: 11
– Protein: 20 amino acids
K-tuple overlapping sliding windows
Sorting
– Bucket sort
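The coding and sorting steps above can be sketched as follows for DNA. The 2-bit codes are the ones listed on the slide; the function names `ktuple_codes` and `bucket_sort_ktuples` and the rolling-code update are assumptions made for this illustration.

```python
CODE = {"A": 0b00, "T": 0b01, "G": 0b10, "C": 0b11}  # coding from the slide

def ktuple_codes(seq, k):
    """Yield (code, start) for every overlapping k-tuple, using a rolling 2-bit code."""
    mask = (1 << (2 * k)) - 1  # keeps only the last k bases (2k bits)
    code = 0
    for pos, base in enumerate(seq):
        code = ((code << 2) | CODE[base]) & mask
        if pos >= k - 1:
            yield code, pos - k + 1

def bucket_sort_ktuples(seq, k):
    """Place each window start into one of 4**k buckets, one bucket per k-tuple code."""
    buckets = [[] for _ in range(4 ** k)]
    for code, start in ktuple_codes(seq, k):
        buckets[code].append(start)
    return buckets

# Hypothetical usage: the 2-tuple AT (code 0b0001) starts at positions 0 and 4.
buckets = bucket_sort_ktuples("ATGCAT", 2)
```

Because every k-tuple maps to an integer below 4^k, reading the buckets in index order lists the windows in sorted order without any comparisons.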
Table Search
Table Search
Indexing table
– Overlapping or non-overlapping
– Indexing for the text or the patterns
How to reduce the table size?
How to do the search?
How to do the filtration?
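One possible answer to these questions, sketched under stated assumptions: index the text's k-tuples with a stride `step` (1 for overlapping windows, k for non-overlapping ones, which shrinks the table by roughly a factor of k), use the table only to propose candidate positions, and verify each candidate against the text. The names `build_index` and `table_search` and the `step` parameter are hypothetical.

```python
def build_index(text, k, step=1):
    """Map each k-tuple starting at a multiple of `step` to its text positions."""
    index = {}
    for pos in range(0, len(text) - k + 1, step):
        index.setdefault(text[pos:pos + k], []).append(pos)
    return index

def table_search(text, index, pattern, k, step=1):
    """Exact search using the index as a filter, then verifying every candidate."""
    m = len(pattern)
    if m < k + step - 1:
        # Shorter patterns might not contain any indexed k-tuple.
        raise ValueError("pattern too short for this k and step")
    found = set()
    # With a sparse index, try every pattern offset that could align with a table entry.
    for offset in range(step):
        for pos in index.get(pattern[offset:offset + k], ()):
            start = pos - offset
            # Filtration: verify the candidate proposed by the table.
            if start >= 0 and text[start:start + m] == pattern:
                found.add(start)
    return sorted(found)

# Hypothetical usage with a non-overlapping index (step == k).
text = "ACGTACGTTACG"
sparse = build_index(text, 2, step=2)
occurrences = table_search(text, sparse, "TACG", 2, step=2)
```

The overlapping index (`step=1`) needs only one lookup per pattern, while the non-overlapping index trades a smaller table for `step` lookups per pattern.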
Approximate string matching? (It is still very hard to do…)
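For reference, one textbook formulation of approximate matching is the dynamic-programming scan below, which reports every text position where the pattern ends with at most a given number of edits (substitutions, insertions, deletions). The slide does not name a method, so the choice of this algorithm and the names `approximate_matches` and `max_errors` are assumptions for illustration.

```python
def approximate_matches(text, pattern, max_errors):
    """Return (end_index, errors) wherever pattern matches text with <= max_errors edits."""
    m = len(pattern)
    # column[i] = fewest edits between pattern[:i] and some substring ending at the current text position.
    column = list(range(m + 1))
    results = []
    for j, ch in enumerate(text):
        prev_diag = column[0]  # column[0] stays 0: a match may start anywhere in the text
        for i in range(1, m + 1):
            cost = 0 if pattern[i - 1] == ch else 1
            current = min(prev_diag + cost,   # match or substitution
                          column[i] + 1,      # extra character in the text
                          column[i - 1] + 1)  # pattern character left unmatched
            prev_diag = column[i]
            column[i] = current
        if column[m] <= max_errors:
            results.append((j, column[m]))
    return results

# Hypothetical usage: allow one edit when looking for "CG" in "ACGT".
ends = approximate_matches("ACGT", "CG", 1)
```

The scan costs time proportional to the text length times the pattern length, which is part of why approximate matching on genome-scale text remains hard.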
Bio-Problems
SNP finding?
ESTs align to whole genome?
Genome assembly?
Consensus and signature pattern finding?
Motif finding?
Part II: Advanced Concepts
Indexing Methods for Pattern Search and Motif Finding Problems
BLAT: http://genome.ucsc.edu/cgi-bin/hgBlat?command=start
BLAT
Non-overlapping indexing table
Exact and approximate match (by statistical method)
Order concept
Using Single UMs for indexing table
Multiple-Unique Marker
Sandwich DP
MEME: http://meme.nbcr.net/meme/intro.html
(not the traditional motif definition)
Degenerate motif discovery problem
Given a set of sequences S = {S_1, S_2, …, S_m | S_i ∈ {A, G, C, T}* for all i} and three nonnegative integers k, l and d, find all degenerate (l, d)-motifs, each of which has occurrences in at least k sequences in S.
A degenerate (l, d)-motif is defined as a pattern of length l over the IUPAC code with no more than d degenerate positions. (A degenerate position is a position occupied by a character other than A, G, C or T.)
e.g. ARATTYT is a degenerate (7,2)-motif (see the supplementary material)
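The definition can be made concrete with a small checker. It does not discover motifs; it only tests whether one candidate pattern satisfies the (l, d) and k conditions for a given sequence set. The IUPAC table is the standard nucleotide code; the function names and the example sequences are assumptions made for this sketch.

```python
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def degenerate_positions(motif):
    """Count positions occupied by a character other than A, G, C or T."""
    return sum(1 for symbol in motif if symbol not in "ACGT")

def occurs_in(motif, sequence):
    """True if some window of the sequence matches the motif under the IUPAC code."""
    length = len(motif)
    return any(
        all(sequence[start + i] in IUPAC[motif[i]] for i in range(length))
        for start in range(len(sequence) - length + 1)
    )

def is_supported(motif, sequences, k, d):
    """True if motif has at most d degenerate positions and occurs in at least k sequences."""
    if degenerate_positions(motif) > d:
        return False
    return sum(occurs_in(motif, seq) for seq in sequences) >= k

# Hypothetical sequence set; ARATTYT occurs in the first two sequences only.
example = ["CCAAATTCTGG", "AGATTTT", "ACGTACGT"]
ok = is_supported("ARATTYT", example, k=2, d=2)
```

A full solver would still have to enumerate candidate patterns of length l, which is where the indexing methods of Part II come in.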
New Challenge
Solexa and 454 short reads
New hardware support