Published by Madeline Lawrence. Modified over 9 years ago.
1
Special Topics in Genomics: Motif Analysis
2
Sequence motif – a pattern of nucleotide or amino acid sequences

Example sequences, each containing one binding site of a transcription factor (TF):
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG

Transcription Factor Binding Sites (TFBS), aligned:
123456789
TGGGTGGTC
TGGGTGGTA
TGGGAGGTC
TGGGTGGTG
TGAGTGGTC
TGGGTGGTC

[Figures: DNA motif; Protein motif]
3
Motif representation
4
Consensus sequence
Example: CACSTG
5
Sequence Logo
Schneider & Stephens, Nucleic Acids Res. 18:6097-6100 (1990)
Entropy (Shannon) – a measurement of uncertainty
The amount of uncertainty reduced by observing sequences is the amount of information (or information content) obtained. This is the height of each position in the logo plot.
Within a position, the height of each nucleotide is proportional to its frequency.
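
A minimal sketch of the per-position calculation, assuming the standard definitions: the Shannon entropy of a position is H = -sum over bases b of f_b * log2(f_b), and for DNA the information content is 2 - H bits (2 = log2(4) is the maximum uncertainty). The small-sample correction used by Schneider & Stephens is left out here, and the function name is only illustrative.

```python
import math

def information_content(sites):
    """Per-position information content (bits) of aligned DNA sites."""
    n = len(sites)
    heights = []
    for column in zip(*sites):          # one column per motif position
        entropy = 0.0
        for base in "ACGT":
            f = column.count(base) / n  # observed frequency of this base
            if f > 0:                   # treat 0 * log2(0) as 0
                entropy -= f * math.log2(f)
        heights.append(2.0 - entropy)   # log2(4) = 2 bits maximum for DNA
    return heights

# The six aligned binding sites from the sequence motif slide.
sites = ["TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
         "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC"]

for pos, bits in enumerate(information_content(sites), start=1):
    print(pos, round(bits, 2))
```

Fully conserved positions reach the 2-bit maximum; positions with mixed bases get shorter stacks in the logo.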
6
Two questions in motif analysis
1. Known motif mapping: finding occurrences of a motif in nucleotide or amino acid sequences
2. De novo motif discovery: finding motifs that are previously unknown
7
Known motif mapping
Consensus mapping
STEP 1: provide a motif (e.g. CACSTG = CAC[C,G]TG)
STEP 2: specify the number of mismatches allowed (e.g. <=1)
STEP 3: scan the sequence
CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT
Example windows: m=3 mismatches, no; m=1 mismatch, yes
A useful tool: CisGenome (http://www.biostat.jhsph.edu/~hji/cisgenome)
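
The three steps can be sketched in a few lines. This is an illustrative scan of the forward strand only, not CisGenome's implementation; the function name and the tuple layout of the hits are assumptions made for the example. Degenerate letters follow the IUPAC nucleotide codes (S = C or G).

```python
# IUPAC nucleotide codes: each letter maps to the bases it allows.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def scan_consensus(seq, consensus, max_mismatch=1):
    """Return (start, window, mismatches) for every window within the cutoff."""
    k = len(consensus)
    hits = []
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        # Count positions where the base is not allowed by the consensus code.
        m = sum(base not in IUPAC[code] for base, code in zip(window, consensus))
        if m <= max_mismatch:
            hits.append((i, window, m))
    return hits

seq = "CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT"
for start, window, m in scan_consensus(seq, "CACSTG", max_mismatch=1):
    print(start, window, m)
```

On the slide's sequence, the window CACATG is reported with one mismatch (A at the S position). A real scan would usually also check the reverse complement.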
8
Known motif mapping
Motif matrix mapping (CisGenome)
STEP 1: provide a motif and background model
STEP 2: specify a likelihood ratio cutoff (e.g. LR>=500)
STEP 3: scan the sequence
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
Example windows: LR>500, yes; LR<500, no

Motif (columns are motif positions 1-9):
      1     2     3     4     5     6     7     8     9
A  0.00  0.00  0.17  0.00  0.17  0.00  0.00  0.00  0.17
C  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.66
G  0.00  1.00  0.83  1.00  0.00  1.00  1.00  0.00  0.17
T  1.00  0.00  0.00  0.00  0.83  0.00  0.00  1.00  0.00

Background (rows and columns in the order A, C, G, T):
    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3

Another tool for matrix mapping: MAST (http://meme.sdsc.edu/meme/mast-intro.html)
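
A rough sketch of matrix mapping with a likelihood ratio cutoff follows. To keep it short it assumes an independent, uniform background (0.25 per base) instead of the 4x4 background table on the slide, so the LR values it prints are not the ones CisGenome would report. The function and variable names are illustrative.

```python
# Motif matrix from the slide: probability of each base at positions 1-9.
MOTIF = {
    "A": [0.00, 0.00, 0.17, 0.00, 0.17, 0.00, 0.00, 0.00, 0.17],
    "C": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.66],
    "G": [0.00, 1.00, 0.83, 1.00, 0.00, 1.00, 1.00, 0.00, 0.17],
    "T": [1.00, 0.00, 0.00, 0.00, 0.83, 0.00, 0.00, 1.00, 0.00],
}
# Assumption for this sketch: uniform, position-independent background.
BACKGROUND = {base: 0.25 for base in "ACGT"}

def scan_matrix(seq, motif, background, cutoff=500.0):
    """Return (start, window, LR) for windows whose likelihood ratio >= cutoff."""
    width = len(motif["A"])
    hits = []
    for i in range(len(seq) - width + 1):
        window = seq[i:i + width]
        lr = 1.0
        for j, base in enumerate(window):
            # P(base | motif position j) / P(base | background)
            lr *= motif[base][j] / background[base]
        if lr >= cutoff:
            hits.append((i, window, lr))
    return hits

seq = ("GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
       "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA")
for start, window, lr in scan_matrix(seq, MOTIF, BACKGROUND):
    print(start, window, round(lr))
```

Because the matrix contains exact zeros, any window with a disallowed base gets LR = 0; in practice pseudocounts are often added so that a single unusual base does not rule a site out completely.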
9
De novo motif discovery
Two major classes of methods:
1. Word enumeration
2. Matrix updating
10
Word enumeration
Example: Sinha & Tompa, Nucleic Acids Res. 30: 5549-5560 (2002)
STEP 1: enumerate possible words;
STEP 2: count word occurrences;
STEP 3: compare observed word count with random expectation.
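
The three steps can be illustrated with a deliberately simplified sketch. It is not the Sinha & Tompa method: the expectation here comes from an independent-base background estimated from the input itself, and words are ranked by observed minus expected count rather than by a proper significance score. All names are illustrative.

```python
from itertools import product

def enumerate_words(seqs, k):
    """Return (word, observed, expected) tuples, most over-represented first."""
    # STEP 1: enumerate all 4^k possible DNA words of length k.
    counts = {"".join(w): 0 for w in product("ACGT", repeat=k)}

    # STEP 2: count occurrences of each word over all windows.
    n_windows = 0
    for s in seqs:
        for i in range(len(s) - k + 1):
            word = s[i:i + k]
            if word in counts:          # skips windows with non-ACGT letters
                counts[word] += 1
                n_windows += 1

    # STEP 3: compare with the count expected if bases were independent.
    total = sum(len(s) for s in seqs)
    freq = {b: sum(s.count(b) for s in seqs) / total for b in "ACGT"}
    results = []
    for word, observed in counts.items():
        expected = float(n_windows)
        for b in word:
            expected *= freq[b]
        results.append((word, observed, expected))
    results.sort(key=lambda r: r[1] - r[2], reverse=True)
    return results

seqs = ["GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
        "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
        "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG"]
for word, observed, expected in enumerate_words(seqs, 6)[:5]:
    print(word, observed, round(expected, 3))
```

Enumerating all 4^k words is only practical for short k, which is one reason word-based methods usually work with short or gapped words.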
11
Matrix updating
CONSENSUS (Stormo & Hartzell, PNAS, 86: 1183-1187, 1990)
STEP 1: use all k-mers in the first sequence as seeds;
STEP 2: find matches (often the best matches) of each seed in the second sequence;
STEP 3: update the seed matrices and exclude matrices with low information content;
STEP 4: repeat steps 2 and 3 for all remaining sequences.
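
A greedy sketch in the spirit of these steps is shown below. It is not the published CONSENSUS program: it assumes one site per sequence, picks for each seed the single window that maximizes information content against a uniform background, and keeps a fixed number of alignments (the hypothetical `keep` parameter) after each sequence.

```python
import math

def info_content(sites):
    """Total information content (bits) of aligned sites, uniform background."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for base in "ACGT":
            f = column.count(base) / n
            if f > 0:
                total += f * math.log2(f / 0.25)
    return total

def greedy_consensus(seqs, k, keep=20):
    """Return the highest-information alignment found, one site per sequence."""
    first = seqs[0]
    # STEP 1: every k-mer of the first sequence seeds an alignment.
    alignments = [[first[i:i + k]] for i in range(len(first) - k + 1)]
    for seq in seqs[1:]:
        windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
        # STEP 2: extend each alignment with its best-matching window.
        extended = [max((aln + [w] for w in windows), key=info_content)
                    for aln in alignments]
        # STEP 3: keep only the alignments with the highest information content.
        extended.sort(key=info_content, reverse=True)
        alignments = extended[:keep]
    # STEP 4 is the loop itself: steps 2 and 3 repeat for every sequence.
    return alignments[0]

seqs = ["GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
        "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
        "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
        "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
        "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
        "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG"]
best = greedy_consensus(seqs, 9)
print("\n".join(best), round(info_content(best), 2))
```

The result depends on the order of the sequences, since seeds come only from the first one; this order dependence is a known property of greedy approaches of this kind.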
12
Matrix updating
Mixture model
EM: Lawrence and Reilly (1990), Bailey and Elkan (1994), etc.
Gibbs Sampler: Lawrence et al. (1993), Liu (1994), Liu et al. (1995), etc.

S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
A: 000000000000001000000000000000000000000001000000000000000000000000000000

Mixture proportions: q = [q0, q1]

Motif W (columns are motif positions 1-9):
      1     2     3     4     5     6     7     8     9
A  0.00  0.00  0.17  0.00  0.17  0.00  0.00  0.00  0.17
C  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.00  0.66
G  0.00  1.00  0.83  1.00  0.00  1.00  1.00  0.00  0.17
T  1.00  0.00  0.00  0.00  0.83  0.00  0.00  1.00  0.00

Background (rows and columns in the order A, C, G, T):
    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3

Inference of the background, W, q and A by iterative estimation/sampling
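
To make the iterative estimation concrete, here is a toy EM sketch for a two-component mixture over sequence windows. It is a simplification, not the algorithm of any of the cited papers: every window is treated as an independent draw that is a motif site with probability q1 or background with probability q0 = 1 - q1, overlaps between windows are ignored, the background is fixed to uniform, and the motif matrix is initialized from a user-supplied seed word. The names `em_mixture`, `seed` and `pseudo` are illustrative.

```python
def em_mixture(seqs, seed, n_iter=50, pseudo=0.1):
    """Toy EM: returns (W, q1, [(window, posterior), ...])."""
    k = len(seed)
    windows = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    # Initial motif matrix: favor the seed base at each position.
    W = [{b: (0.7 if b == seed[j] else 0.1) for b in "ACGT"} for j in range(k)]
    q1 = len(seqs) / len(windows)   # start near one site per sequence
    bg = 0.25 ** k                  # assumed uniform background likelihood
    post = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif site.
        post = []
        for w in windows:
            p_motif = q1
            for j, b in enumerate(w):
                p_motif *= W[j][b]
            p_back = (1.0 - q1) * bg
            post.append(p_motif / (p_motif + p_back))
        # M-step: re-estimate q1 and W from posterior-weighted counts.
        q1 = sum(post) / len(windows)
        for j in range(k):
            col = {b: pseudo for b in "ACGT"}
            for w, z in zip(windows, post):
                col[w[j]] += z
            total = sum(col.values())
            W[j] = {b: col[b] / total for b in "ACGT"}
    return W, q1, list(zip(windows, post))

seqs = ["GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
        "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
        "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG"]
W, q1, scored = em_mixture(seqs, "TGGGTGGTC")
for window, z in sorted(scored, key=lambda x: x[1], reverse=True)[:3]:
    print(window, round(z, 3))
```

The posteriors play the role of soft site indicators A. A Gibbs sampler would instead draw hard 0/1 indicators from the same conditional probabilities and update the matrix from the sampled sites.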
13
Other issues
Dependencies within motif
Functions of novel motifs