
1 Special Topics in Genomics: Motif Analysis

2 Sequence motif – a pattern of nucleotide or amino acid sequences
Example sequences:
GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA
TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA
CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA
TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG
AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC
ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG
Aligned sites bound by the transcription factor (TF), one per sequence:
123456789
TGGGTGGTC
TGGGTGGTA
TGGGAGGTC
TGGGTGGTG
TGAGTGGTC
TGGGTGGTC
DNA motif: Transcription Factor Binding Sites (TFBS)
Protein motif

3 Motif representation

4 Consensus sequence
Example: CACSTG (S = C or G)

5 Sequence Logo
Schneider & Stephens, Nucleic Acids Res. 18:6097-6100 (1990)
Entropy (Shannon) – a measurement of uncertainty.
The amount of uncertainty reduced by observing the sequences is the amount of information (or information content) obtained. This is the height of each position in the logo plot.
Within a position, the height of each nucleotide is proportional to its frequency.
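The logo heights described above can be sketched in a few lines. This is a minimal illustration, assuming the six aligned sites from slide 2 as input and taking the information at a position as 2 bits minus the Shannon entropy of its base frequencies. The small-sample correction used by Schneider & Stephens is left out, and the function name `logo_heights` is made up for this example.

```python
import math
from collections import Counter

# Aligned binding sites from the slide 2 example.
SITES = [
    "TGGGTGGTC", "TGGGTGGTA", "TGGGAGGTC",
    "TGGGTGGTG", "TGAGTGGTC", "TGGGTGGTC",
]

def logo_heights(sites):
    """Return one (information_in_bits, {base: letter_height}) pair per position."""
    n = len(sites)
    result = []
    for column in zip(*sites):
        freqs = {base: count / n for base, count in Counter(column).items()}
        # Shannon entropy of the observed base frequencies at this position.
        entropy = -sum(p * math.log2(p) for p in freqs.values())
        # log2(4) = 2 bits is the maximum uncertainty for DNA.
        info = 2.0 - entropy
        # Each letter's height is its frequency times the position's information.
        heights = {base: p * info for base, p in freqs.items()}
        result.append((info, heights))
    return result

for pos, (info, heights) in enumerate(logo_heights(SITES), start=1):
    print(pos, round(info, 2), {b: round(h, 2) for b, h in heights.items()})
```

Fully conserved positions (such as position 1, always T) reach the full 2 bits, while variable positions such as position 9 are much shorter.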

6 Two questions in motif analysis
Known motif mapping: finding occurrences of a known motif in nucleotide or amino acid sequences.
De novo motif discovery: finding motifs that are previously unknown.

7 Known motif mapping: consensus mapping
STEP 1: provide a motif (e.g. CACSTG = CAC[C,G]TG);
STEP 2: specify the number of mismatches allowed (e.g. <=1);
STEP 3: scan the sequence.
Example sequence: CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT
With at most 1 mismatch allowed, a window with m=3 mismatches is rejected (no) and a window with m=1 mismatch is reported (yes).
A useful tool: CisGenome (http://www.biostat.jhsph.edu/~hji/cisgenome)
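The three steps above can be sketched as a sliding-window scan. This is a minimal illustration, not CisGenome's implementation: the function name `scan_consensus` is hypothetical, only the forward strand is scanned, and degenerate letters are expanded with the standard IUPAC nucleotide codes.

```python
# IUPAC nucleotide codes mapped to the bases they stand for.
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}

def scan_consensus(seq, consensus, max_mismatches=1):
    """Return (start, window, mismatches) for every window within the limit."""
    k = len(consensus)
    hits = []
    for start in range(len(seq) - k + 1):
        window = seq[start:start + k]
        # A position is a mismatch when the base is not allowed by the code.
        mismatches = sum(
            base not in IUPAC[code] for base, code in zip(window, consensus)
        )
        if mismatches <= max_mismatches:
            hits.append((start, window, mismatches))
    return hits

# Sequence and motif from the slide; positions are 0-based.
SEQ = "CGCCGGGACCAGATCAACGCCGAGATCCGGCACATGAAGGAGCT"
for hit in scan_consensus(SEQ, "CACSTG", max_mismatches=1):
    print(hit)
```

On the slide's sequence the window CACATG is reported because it differs from CAC[C,G]TG only at the fourth position.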

8 Known motif mapping: motif matrix mapping (CisGenome)
STEP 1: provide a motif and a background model;
STEP 2: specify a likelihood ratio cutoff (e.g. LR>=500);
STEP 3: scan the sequence.
Example sequence: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
Windows with LR>500 are reported (yes); windows with LR<500 are not (no).
Motif:
     1    2    3    4    5    6    7    8    9
A  0.00 0.00 0.17 0.00 0.17 0.00 0.00 0.00 0.17
C  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.66
G  0.00 1.00 0.83 1.00 0.00 1.00 1.00 0.00 0.17
T  1.00 0.00 0.00 0.00 0.83 0.00 0.00 1.00 0.00
Background:
    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3
Another tool for matrix mapping: MAST (http://meme.sdsc.edu/meme/mast-intro.html)
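A likelihood-ratio scan along these lines might look like the sketch below. It is not CisGenome's or MAST's code. Two things are assumptions made for the example: the 4 x 4 background table is read as a first-order Markov transition matrix (row = previous base, column = current base), and the very first base of the sequence, which has no predecessor, is given probability 0.25. The name `scan_matrix` is hypothetical.

```python
# Motif matrix from the slide: MOTIF[base][j] is the probability of base at position j+1.
MOTIF = {
    "A": [0.00, 0.00, 0.17, 0.00, 0.17, 0.00, 0.00, 0.00, 0.17],
    "C": [0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.00, 0.66],
    "G": [0.00, 1.00, 0.83, 1.00, 0.00, 1.00, 1.00, 0.00, 0.17],
    "T": [1.00, 0.00, 0.00, 0.00, 0.83, 0.00, 0.00, 1.00, 0.00],
}
# ASSUMPTION: BACKGROUND[prev][cur] = P(cur | prev), a first-order Markov chain.
BACKGROUND = {
    "A": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
    "C": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "G": {"A": 0.2, "C": 0.3, "G": 0.3, "T": 0.2},
    "T": {"A": 0.3, "C": 0.2, "G": 0.2, "T": 0.3},
}

def scan_matrix(seq, motif, background, cutoff=500.0):
    """Return (start, window, LR) for windows whose likelihood ratio >= cutoff."""
    width = len(motif["A"])
    hits = []
    for start in range(len(seq) - width + 1):
        window = seq[start:start + width]
        p_motif = 1.0
        p_background = 1.0
        for j, base in enumerate(window):
            p_motif *= motif[base][j]
            if start + j == 0:
                p_background *= 0.25  # ASSUMPTION: uniform for the first base
            else:
                p_background *= background[seq[start + j - 1]][base]
        likelihood_ratio = p_motif / p_background
        if likelihood_ratio >= cutoff:
            hits.append((start, window, likelihood_ratio))
    return hits

SEQ = ("GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA"
       "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA")
for start, window, lr in scan_matrix(SEQ, MOTIF, BACKGROUND):
    print(start, window, round(lr))
```

Because the slide's matrix contains exact zeros, any window with a disallowed base gets LR = 0. Matrices estimated from few sites are usually smoothed with pseudocounts before scanning so that unseen bases are penalized rather than forbidden.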

9 De novo motif discovery
Two major classes of methods:
1. Word enumeration
2. Matrix updating

10 Word enumeration
Example: Sinha & Tompa, Nucleic Acids Res. 30: 5549-5560 (2002)
STEP 1: enumerate possible words;
STEP 2: count word occurrences;
STEP 3: compare the observed word count with the random expectation.
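The three steps can be illustrated with a deliberately simple sketch. It is not the Sinha & Tompa method: here the random expectation assumes independent bases with the frequencies observed in the input, only words that actually occur are enumerated (all other words have count zero), and overlaps between occurrences are ignored. The function name `word_scores` is made up for this example.

```python
import math
from collections import Counter

# The six example sequences from slide 2.
SEQS = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]

def word_scores(seqs, k):
    """Map each observed k-mer to (observed count, expected count, z-score)."""
    # STEP 1 and 2: enumerate the words present and count their occurrences.
    counts = Counter(s[i:i + k] for s in seqs for i in range(len(s) - k + 1))
    # Background: independent bases with the input's own base frequencies.
    base_counts = Counter("".join(seqs))
    total = sum(base_counts.values())
    freq = {base: c / total for base, c in base_counts.items()}
    n_windows = sum(max(0, len(s) - k + 1) for s in seqs)
    # STEP 3: compare observed counts with the binomial expectation.
    scores = {}
    for word, observed in counts.items():
        p = math.prod(freq[base] for base in word)
        expected = n_windows * p
        z = (observed - expected) / math.sqrt(expected * (1.0 - p))
        scores[word] = (observed, expected, z)
    return scores

scores = word_scores(SEQS, k=8)
top = sorted(scores.items(), key=lambda item: item[1][2], reverse=True)[:5]
for word, (observed, expected, z) in top:
    print(word, observed, round(expected, 4), round(z, 1))
```

With so few short sequences the z-scores are only a rough ranking; published methods use richer background models and account for overlapping occurrences.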

11 Matrix updating: CONSENSUS
Stormo & Hartzell, PNAS, 86: 1183-1187 (1990)
STEP 1: use all k-mers in the first sequence as seeds;
STEP 2: find matches (often the best matches) of each seed in the second sequence;
STEP 3: update the seed matrices and exclude matrices with low information content;
STEP 4: repeat steps 2 and 3 for all sequences.
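A greedy procedure in the spirit of these four steps is sketched below. It is a simplified illustration, not the CONSENSUS program itself: each partial alignment is extended by its single best-scoring k-mer from the next sequence, information content is measured against a uniform background, and the `keep` parameter (how many alignments survive each round) is an assumption of this example, as are the function names.

```python
import math
from collections import Counter

SEQS = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]

def information_content(sites):
    """Total information (bits) of aligned sites against a uniform background."""
    n = len(sites)
    total = 0.0
    for column in zip(*sites):
        for count in Counter(column).values():
            p = count / n
            total += p * math.log2(p / 0.25)
    return total

def best_extension(sites, seq, k):
    """The k-mer of seq that gives the highest information when added to sites."""
    windows = [seq[i:i + k] for i in range(len(seq) - k + 1)]
    return max(windows, key=lambda w: information_content(sites + [w]))

def greedy_consensus(seqs, k, keep=20):
    """Return the best alignment found, one k-mer per sequence."""
    # STEP 1: every k-mer of the first sequence seeds an alignment.
    alignments = [[seqs[0][i:i + k]] for i in range(len(seqs[0]) - k + 1)]
    # STEP 4: repeat steps 2 and 3 for each remaining sequence.
    for seq in seqs[1:]:
        # STEP 2: extend each alignment with its best match in this sequence.
        extended = [sites + [best_extension(sites, seq, k)] for sites in alignments]
        # STEP 3: rank by information content and drop the weakest alignments.
        extended.sort(key=information_content, reverse=True)
        alignments = extended[:keep]
    return alignments[0]

best = greedy_consensus(SEQS, k=9)
print(best, round(information_content(best), 2))
```

The result depends on the order of the sequences and on `keep`, which is the usual trade-off of a greedy search: it is fast, but it can discard the seed that would have led to the best final matrix.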

12 Matrix updating: mixture model
Each position of the sequence S has an indicator A: A = 1 where a motif site of width W starts, A = 0 where the sequence comes from the background. q = [q0, q1] are the mixture proportions of background and motif.
S: GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGACTGGGAGGTCCTCGGTTCAGAGTCACAGAGCA
A: 000000000000001000000000000000000000000001000000000000000000000000000000
Motif:
     1    2    3    4    5    6    7    8    9
A  0.00 0.00 0.17 0.00 0.17 0.00 0.00 0.00 0.17
C  0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.66
G  0.00 1.00 0.83 1.00 0.00 1.00 1.00 0.00 0.17
T  1.00 0.00 0.00 0.00 0.83 0.00 0.00 1.00 0.00
Background:
    A   C   G   T
A  .3  .2  .2  .3
C  .2  .3  .3  .2
G  .2  .3  .3  .2
T  .3  .2  .2  .3
EM: Lawrence and Reilly (1990); Bailey and Elkan (1994), etc.
Gibbs sampler: Lawrence et al. (1993); Liu (1994); Liu et al. (1995), etc.
Inference by iterative estimation/sampling of the model parameters (motif matrix, background, W, q) and the indicators A.
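The iterative estimation idea can be sketched with a small EM loop. This is an illustration under simplifying assumptions, not any of the cited algorithms: every k-mer window is treated as an independent draw from a two-component mixture, the background is an independent-base model fixed to the input's base frequencies (not the Markov table above), the width W is fixed by a user-supplied seed word, and the starting values (0.7 on the seed base, q1 = 0.02, pseudocount 0.1) are arbitrary choices. The name `em_motif` is hypothetical.

```python
from collections import Counter

BASES = "ACGT"

SEQS = [
    "GTATGTACTTACTATGGGTGGTCAACAAATCTATGTATGA",
    "TAACATGTGACTCCTATAACCTCTTTGGGTGGTACATGAA",
    "CTGGGAGGTCCTCGGTTCAGAGTCACAGAGCAGATAATCA",
    "TTAGAGGCACAATTGCTTGGGTGGTGCACAAAAAAACAAG",
    "AACAGCCTTGGATTAGCTGCTGGGGGGGTGAGTGGTCCAC",
    "ATCAGAATGGGTGGTCCATATATCCCAAAGAAGAGGGTAG",
]

def em_motif(seqs, seed, n_iter=50, q1=0.02, pseudo=0.1):
    """Return (motif matrix, q1, {window: posterior}) after n_iter EM rounds."""
    k = len(seed)
    windows = [s[i:i + k] for s in seqs for i in range(len(s) - k + 1)]
    # Fixed background: independent bases with the input's base frequencies.
    base_counts = Counter("".join(seqs))
    total = sum(base_counts[b] for b in BASES)
    background = {b: base_counts[b] / total for b in BASES}
    # Initial motif matrix built from the seed word.
    theta = [{b: (0.7 if b == c else 0.1) for b in BASES} for c in seed]
    posteriors = []
    for _ in range(n_iter):
        # E-step: posterior probability that each window is a motif site (A = 1).
        posteriors = []
        for w in windows:
            p_motif, p_bg = q1, 1.0 - q1
            for j, base in enumerate(w):
                p_motif *= theta[j][base]
                p_bg *= background[base]
            posteriors.append(p_motif / (p_motif + p_bg))
        # M-step: re-estimate the motif matrix from posterior-weighted counts.
        theta = []
        for j in range(k):
            column = {b: pseudo for b in BASES}
            for w, z in zip(windows, posteriors):
                column[w[j]] += z
            col_sum = sum(column.values())
            theta.append({b: column[b] / col_sum for b in BASES})
        # M-step: re-estimate the mixture proportion of motif windows.
        q1 = sum(posteriors) / len(windows)
    return theta, q1, dict(zip(windows, posteriors))

theta, q1, post = em_motif(SEQS, seed="TGGGTGGTC")
print(round(q1, 3), round(post["TGGGTGGTC"], 3))
```

A Gibbs sampler follows the same loop but draws the indicators A from their posterior probabilities instead of averaging over them, which helps it escape poor starting points at the cost of run-to-run variability.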

13 Other issues
Dependencies within motif
Functions of novel motifs

