Published by Brendan Harmon. Modified over 9 years ago.
2
Position Weight Matrices for Representing Signals in Sequences Triinu Tasa, Koke 04.02.05
3
Definitions
Sequence, string: an ordered arrangement of letters from {'A', 'C', 'G', 'T'}.
Pattern: a simplified regular expression over the alphabet {'A', 'C', 'G', 'T', '.'}, where '.' is a wild-card of length 1 (it matches 'A', 'C', 'G' or 'T').
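The pattern definition above can be tried out with a few lines of Python. This is only a minimal sketch: the function name pattern_matches is hypothetical, and the wild-card is translated to the character class [ACGT] so that it matches exactly one letter of the sequence alphabet.

```python
import re

def pattern_matches(pattern, sequence):
    """Return all start positions where `pattern` occurs in `sequence`."""
    if set(pattern) - set("ACGT."):
        raise ValueError("pattern may only contain A, C, G, T and '.'")
    # '.' stands for exactly one of A, C, G, T; the lookahead lets us
    # report overlapping occurrences as well.
    regex = re.compile("(?=(" + pattern.replace(".", "[ACGT]") + "))")
    return [match.start() for match in regex.finditer(sequence)]

print(pattern_matches("G.GATGAG.T", "GAGATGAGAT"))
```

A letter outside the alphabet in the sequence (for example 'N') is not matched by the wild-card in this sketch.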
4
What is a weight matrix?
GATGAG
GATGAT
TGATAT
GATGAT or [GT][AG][TA][GT]A[GT]
5
What is a weight matrix?
Better:
GATGAG
GATGAT
TGATAT
Alignment matrix C:
A  0  2  1  0  3  0
C  0  0  0  0  0  0
G  2  1  0  2  0  1
T  1  0  2  1  0  2
Frequency matrix F:
A  0    0.7  0.3  0    1    0
C  0    0    0    0    0    0
G  0.7  0.3  0    0.7  0    0.3
T  0.3  0    0.7  0.3  0    0.7
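The two matrices above can be recomputed from the three example sequences. The following is a small sketch; the function names alignment_matrix and frequency_matrix are hypothetical, and the frequencies are rounded to one decimal only for printing, as on the slide.

```python
LETTERS = "ACGT"

def alignment_matrix(sequences):
    """Count how often each letter occurs in each column."""
    length = len(sequences[0])
    if any(len(seq) != length for seq in sequences):
        raise ValueError("all sequences must have the same length")
    counts = {letter: [0] * length for letter in LETTERS}
    for seq in sequences:
        for j, letter in enumerate(seq):
            counts[letter][j] += 1
    return counts

def frequency_matrix(counts, n_sequences):
    """Divide every count by the number of sequences."""
    return {letter: [c / n_sequences for c in row] for letter, row in counts.items()}

sequences = ["GATGAG", "GATGAT", "TGATAT"]
C = alignment_matrix(sequences)
F = frequency_matrix(C, len(sequences))
for letter in LETTERS:
    print(letter, C[letter], [round(f, 1) for f in F[letter]])
```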
6
What is a weight matrix?
Or weight matrix W (the formula itself is missing here), where
N: number of sequences used
(symbol missing): a priori probability of letter i
7
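What is a weight matrix?
Importance matrix I: I(i, j) = C(i, j) * F(i, j)
(each entry is the count times the frequency, e.g. 2 * 0.7 = 1.4, 1 * 0.3 = 0.3, 3 * 1 = 3)
A  0    1.4  0.3  0    3    0
C  0    0    0    0    0    0
G  1.4  0.3  0    1.4  0    0.3
T  0.3  0    1.4  0.3  0    1.4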
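The importance values shown here are consistent with multiplying each count by its (rounded) frequency for the sequences GATGAG, GATGAT, TGATAT. Assuming that this product is the intended definition, which is an assumption read off the numbers rather than something the transcript states, the table can be reproduced as follows. The function name importance_matrix and the digits parameter are hypothetical.

```python
LETTERS = "ACGT"

def importance_matrix(sequences, digits=1):
    """Assumed definition: I(i, j) = C(i, j) * F(i, j), with F rounded as on the slide."""
    n = len(sequences)
    length = len(sequences[0])
    importance = {letter: [0.0] * length for letter in LETTERS}
    for j in range(length):
        column = [seq[j] for seq in sequences]
        for letter in LETTERS:
            count = column.count(letter)
            freq = round(count / n, digits)  # e.g. 2/3 is shown as 0.7
            importance[letter][j] = round(count * freq, digits)
    return importance

I = importance_matrix(["GATGAG", "GATGAT", "TGATAT"])
for letter in LETTERS:
    print(letter, I[letter])
```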
8
Applications
Pattern clustering
1. G.GATGAG.T 62/75 1:39/49 2:23/26 R:17.3026 BP:1.12008e-37
2. G.GATGAG 89/110 1:45/60 2:44/50 R:10.436 BP:1.61764e-34
3. GATGAG.T 124/148 1:52/70 2:72/78 R:7.36961 BP:2.79148e-33
4. TG.AAA.TTT 132/145 1:53/61 2:79/84 R:6.84578 BP:1.83509e-32
5. AAAATTTT 200/231 1:63/77 2:137/154 R:4.69239 BP:1.19109e-30
6. TGAAAA.TTT 104/114 1:45/53 2:59/61 R:7.78277 BP:3.86086e-29
7. AAA.TTTT 343/537 1:79/145 2:264/392 R:3.05349 BP:5.66833e-29
8. G.AAA.TTTT 135/156 1:51/62 2:84/94 R:6.19534 BP:5.69933e-29
9. TG.GATGAG 49/57 1:30/35 2:19/22 R:16.1117 BP:9.35765e-28
10. TG.AAA.TTTT 86/91 1:40/43 2:46/48 R:8.87311 BP:1.1124e-27
...
9
Applications - Clustering
G.GATGAG.T:
GAGATGAGAT
GTGATGAGAT
GAGATGAGGT
...
A  -6.9   0.98  -6.9   1.38  -6.9  -6.9   1.38  -6.9   0.98  -6.9
C  -6.9  -6.9   -6.9  -6.9   -6.9  -6.9  -6.9   -6.9  -6.9   -6.9
G   1.38 -6.9    1.38 -6.9   -6.9   1.38 -6.9    1.38  0.29  -6.9
T  -6.9   0.29  -6.9  -6.9    1.38 -6.9  -6.9   -6.9  -6.9    1.38
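The non-negative entries of this matrix agree with natural-log odds ln(f / 0.25) for the three sequences listed: ln(4) is about 1.386, ln((2/3) / 0.25) about 0.98 and ln((1/3) / 0.25) about 0.29. The sketch below recomputes them under that reading. The uniform prior of 0.25 and the floor used for letters that never occur are assumptions chosen to land near the -6.9 shown; the formula actually used for the slide is not preserved, and all names here are hypothetical.

```python
import math

LETTERS = "ACGT"
PRIOR = 0.25      # assumed uniform a priori probability of each letter
FLOOR = 0.00025   # hypothetical stand-in for a zero frequency: ln(FLOOR / PRIOR) is about -6.9

def log_odds_matrix(sequences, prior=PRIOR, floor=FLOOR):
    """Natural-log odds of the column frequency against the prior."""
    n = len(sequences)
    length = len(sequences[0])
    weights = {letter: [] for letter in LETTERS}
    for j in range(length):
        column = [seq[j] for seq in sequences]
        for letter in LETTERS:
            freq = column.count(letter) / n
            weights[letter].append(math.log(max(freq, floor) / prior))
    return weights

W = log_odds_matrix(["GAGATGAGAT", "GTGATGAGAT", "GAGATGAGGT"])
for letter in LETTERS:
    print(letter, [round(w, 2) for w in W[letter]])
```

Rounding to two decimals gives 1.39 and -6.91 where the slide shows 1.38 and -6.9, so the slide values were probably truncated or computed slightly differently.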
10
Applications - Clustering
Compare matrices with each other using the dynamic programming approach (the recurrence for D is missing here), where
A, B: matrices
i, j: columns
If D(m, n) > threshold, the matrices are different.
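Since the recurrence itself is not preserved, the following is not the method from the slide. It is a generic edit-distance-style dynamic program over matrix columns, given only to illustrate what a comparison of the form "D(m, n) > threshold" could look like. The Euclidean column distance, the gap_cost parameter and all function names are assumptions.

```python
import math

def column_distance(col_a, col_b):
    """Euclidean distance between two columns (an assumed choice of measure)."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(col_a, col_b)))

def matrix_distance(A, B, gap_cost=1.0):
    """D(m, n) for matrices given as lists of columns; gap_cost is hypothetical."""
    m, n = len(A), len(B)
    D = [[0.0] * (n + 1) for _ in range(m + 1)]
    for i in range(1, m + 1):
        D[i][0] = i * gap_cost
    for j in range(1, n + 1):
        D[0][j] = j * gap_cost
    for i in range(1, m + 1):
        for j in range(1, n + 1):
            D[i][j] = min(
                D[i - 1][j - 1] + column_distance(A[i - 1], B[j - 1]),  # pair column i with column j
                D[i - 1][j] + gap_cost,  # column i of A left unpaired
                D[i][j - 1] + gap_cost,  # column j of B left unpaired
            )
    return D[m][n]

def are_different(A, B, threshold, gap_cost=1.0):
    """Apply the rule 'D(m, n) > threshold => matrices are different'."""
    return matrix_distance(A, B, gap_cost) > threshold
```

Each column is a tuple of four values in the order A, C, G, T. Identical matrices get distance 0, and every unpaired column adds gap_cost.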
11
Applications - Clustering
G.GATGAG.T   TG.AAA.TTT    AAAATTTT
G.GATGAG     TGAAAA.TTT    AAA.TTTT
GATGAG.T     TG.AAA.TTTT
We want to represent the clusters by logos.
We need to align the patterns first, positioning the similar parts of the patterns above each other:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
or the logo will look like this: [logo of the unaligned patterns, image not preserved]
12
Applications – Multiple alignment
Importance matrix I represents the aligned patterns.
Example:
G.GATGAG.T
GATGAG.T
G.GATGAG
1. Insert the first pattern into I ('.' gives 0.25 to each letter):
A  0  0.25  0  1  0  0  1  0  0.25  0
C  0  0.25  0  0  0  0  0  0  0.25  0
G  1  0.25  1  0  0  1  0  1  0.25  0
T  0  0.25  0  0  1  0  0  0  0.25  1
2. Align the second pattern with I using a dynamic programming approach:
13
Applications – Multiple alignment
Dynamic programming matrix:
          G     .     G     A     T     G     A     G     .     T
G   0.00  0.10  0.01  0.10  0.00  0.00  0.10  0.00  0.10  0.01  0.00
A   0.00  0.00  0.11  0.00  0.20  0.00  0.00  0.20  0.00  0.11  0.00
T   0.00  0.00  0.01  0.00  0.00  0.30  0.00  0.00  0.00  0.01  0.21
G   0.00  0.10  0.01  0.11  0.00  0.00  0.40  0.00  0.10  0.01  0.00
A   0.00  0.00  0.11  0.00  0.21  0.00  0.00  0.50  0.00  0.11  0.00
G   0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01  0.00
.   0.00  0.00  0.10  0.01  0.21  0.00  0.00  0.10  0.00  0.60  0.01
T   0.00  0.00  0.01  0.00  0.00  0.31  0.00  0.00  0.00  0.01  0.70
G.GATGAG.T
--GATGAG.T
14
Applications – Multiple alignment
3. Add the pattern '--GATGAG.T' to I; if necessary, add columns to the matrix.
4. Repeat the procedure for every pattern.
Output:
G.GATGAG.T
G.GATGAG--
--GATGAG.T
Why importance matrix?
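Steps 1 and 3 of the procedure (inserting a pattern and adding an already aligned pattern) can be sketched as below. This covers only the accumulation: the dynamic programming alignment of step 2 is not reproduced, new columns are appended on the right only, and plain contributions (1 per letter, 0.25 per letter for '.') are summed without any further importance weighting. The function name add_aligned_pattern is hypothetical.

```python
LETTERS = "ACGT"

def add_aligned_pattern(matrix, aligned_pattern):
    """Add one aligned pattern to `matrix`, a list of per-column dicts, in place."""
    # Add columns if the aligned pattern is longer than the matrix so far.
    while len(matrix) < len(aligned_pattern):
        matrix.append({letter: 0.0 for letter in LETTERS})
    for column, symbol in zip(matrix, aligned_pattern):
        if symbol == "-":
            continue  # a gap contributes nothing
        if symbol == ".":
            for letter in LETTERS:
                column[letter] += 0.25  # '.' gives 0.25 to each letter
        else:
            column[symbol] += 1.0
    return matrix

matrix = []
for aligned in ["G.GATGAG.T", "G.GATGAG--", "--GATGAG.T"]:
    add_aligned_pattern(matrix, aligned)
for letter in LETTERS:
    print(letter, [column[letter] for column in matrix])
```

After the first pattern alone, the matrix equals the one shown in step 1.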
15
Applications – Multiple alignment
Example:
Pattern: GATG
So far aligned:
GATGATGTA-
---GATGTGG
We want: w(G, 4) > w(G, 1) > w(G, 9)
Solution: importance matrix
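The ordering asked for here can be checked numerically. The sketch reads the aligned block as the two rows GATGATGTA- and ---GATGTGG (an assumption about the flattened slide) and takes the importance of a letter in a column to be its count times its frequency among the non-gap letters of that column (also an assumption). The function names are hypothetical.

```python
def count_and_frequency(aligned, letter, position):
    """Count and frequency of `letter` in a 1-based column, ignoring gaps."""
    column = [row[position - 1] for row in aligned if row[position - 1] != "-"]
    count = column.count(letter)
    frequency = count / len(column) if column else 0.0
    return count, frequency

def importance(aligned, letter, position):
    """Assumed importance: count times frequency."""
    count, frequency = count_and_frequency(aligned, letter, position)
    return count * frequency

aligned = ["GATGATGTA-", "---GATGTGG"]
for position in (4, 1, 9):
    count, frequency = count_and_frequency(aligned, "G", position)
    print(position, count, frequency, importance(aligned, "G", position))
```

Under these assumptions, counts alone tie columns 1 and 9 (one G each) and frequencies alone tie columns 4 and 1 (both 1.0), while the product separates all three.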
16
Applications – Weight matrix matching
● Weight Matrix Matching
Purpose: find the sequences that the weight matrix describes best in a given text file
...CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC...
1. Calculate the score for each position
2. If score > threshold => signal
Problem: finding a good threshold
● Threshold – 99.5% quantile
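The two steps above can be sketched as a sliding window over the text. The toy matrix (+1 for the expected letter of GATG, -1 otherwise) is hypothetical and stands in for a real weight matrix. The slide does not say which score distribution the 99.5% quantile is taken from, so quantile_threshold simply takes a list of background scores supplied by the caller, which is an assumption.

```python
import math

def score_positions(text, weights):
    """Score every window of len(weights) letters; weights is a list of per-column dicts."""
    width = len(weights)
    return [
        sum(weights[k][text[start + k]] for k in range(width))
        for start in range(len(text) - width + 1)
    ]

def quantile_threshold(background_scores, q=0.995):
    """Empirical q-quantile (nearest rank) of a non-empty list of scores."""
    ordered = sorted(background_scores)
    rank = max(1, math.ceil(q * len(ordered)))
    return ordered[rank - 1]

def find_signals(text, weights, threshold):
    """Positions whose score exceeds the threshold."""
    return [pos for pos, score in enumerate(score_positions(text, weights)) if score > threshold]

# Hypothetical toy matrix for GATG: +1 for the expected letter, -1 for any other.
TOY_WEIGHTS = [
    {letter: (1.0 if letter == expected else -1.0) for letter in "ACGT"}
    for expected in "GATG"
]
TEXT = "CATAGGAAATTCCACCTCTTTGGCTTTGCCCAGTCTTCCCTTGAGGATGCCTACGTTC"
print(find_signals(TEXT, TOY_WEIGHTS, threshold=3.0))
```

With this toy matrix only a perfect GATG window scores above 3.0. For a real matrix the threshold would come from quantile_threshold applied to scores of a suitable background text.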
17
Questions?