MOPAC: Motif-finding by Preprocessing and Agglomerative Clustering from Microarrays Thomas R. Ioerger 1 Ganesh Rajagopalan 1 Debby Siegele 2 1 Department.

1 MOPAC: Motif-finding by Preprocessing and Agglomerative Clustering from Microarrays
Thomas R. Ioerger (1), Ganesh Rajagopalan (1), Debby Siegele (2)
(1) Department of Computer Science, (2) Department of Biology, Texas A&M University

2 Analyzing Gene Expression Patterns
DNA microarrays: ~4000 genes for E. coli, ~6000 genes for yeast
Compare expression levels between conditions
Example: starvation response in E. coli
–starve cells for nutrient sources
–reintroduce => recovery => exponential growth
–which genes show changes in response?

3 types of response:
–up-regulation
–down-regulation
–transient response (spike)
–(arbitrary temporal patterns)
Problem: can cluster genes based on response pattern, but then what?
–not all genes in cluster are regulated the same way

4 Couple with genomic analysis
–search for common motifs in upstream regions
–subsets of co-regulated genes within clusters
Assumptions:
1. regulation occurs by interaction of transcription factors with small motifs (~10-20bp) within several hundred bp of the transcription start site
2. among many motifs, the ones of interest will be common to some genes in a cluster, but not found in any genes outside it (with different responses)
3. the motif does not have to be shared by all genes in the cluster, only a subset

5 Related Work
Many algorithms exist for motif finding
–assume cluster (gene set) is already defined
–word/string analysis models
–probabilistic models
  Gibbs sampling (AlignACE, MotifSampler)
  Expectation Maximization (MEME)
  HMMs
–graph algorithms (e.g. clique)
  Pevzner and Sze
–what if motif only appears in a subset of genes?
  count as parameter in MotifSampler, MEME

6 Overview: Our Approach
1. Definition of regulation patterns
2. Extraction of upstream sequences (for up-regulated genes)
3. Define control set (genes with no change)
4. Make a list of all 12-mers in upstream regions
5. Find motifs that occur (more than once) in the up-regulated set, but not at all in the control set
6. Group the motifs using clustering, form consensus of patterns

7 Define Regulation Patterns
measured at 0, 5, and 15min after recovery
discrete representation of changes in expression levels relative to exp. growth phase conditions
+1: >2-fold increase
-1: >2-fold decrease
0: otherwise (no significant change)
up-regulation patterns: (0,1,1) (0,1,0) (0,0,1) (-1,1,1) (-1,1,0) (-1,0,1)
define control set: (0,0,0) (1,1,1) (-1,-1,-1)
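The discretization on this slide can be sketched in a few lines of Python. This is only an illustration: the function names, the "other" label, and the assumption that each gene comes with three expression ratios relative to exponential growth (at 0, 5, and 15 min) are not taken from the slides.

```python
# Pattern sets copied from the slide; everything else is a hypothetical sketch.
UP_PATTERNS = {(0, 1, 1), (0, 1, 0), (0, 0, 1), (-1, 1, 1), (-1, 1, 0), (-1, 0, 1)}
CONTROL_PATTERNS = {(0, 0, 0), (1, 1, 1), (-1, -1, -1)}


def discretize(ratio, fold=2.0):
    """Map an expression ratio (condition / exponential growth) to +1, -1, or 0."""
    if ratio > fold:
        return 1       # more than 2-fold increase
    if ratio < 1.0 / fold:
        return -1      # more than 2-fold decrease
    return 0           # no significant change


def classify(ratios):
    """Classify a gene from its ratios at the three time points (0, 5, 15 min)."""
    pattern = tuple(discretize(r) for r in ratios)
    if pattern in UP_PATTERNS:
        return "up"
    if pattern in CONTROL_PATTERNS:
        return "control"
    return "other"     # assumed label for patterns in neither set
```

For example, a gene with ratios (0.4, 2.5, 1.0) maps to the pattern (-1,1,0) and is treated as up-regulated, while (3.0, 3.0, 3.0) maps to (1,1,1) and lands in the control set.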

8 Extraction of Upstream Sequences
nominally, 600bp upstream of translation start site (i.e. start of the ORF; not transcription start)
If gene is a member of an operon:
–take 300bp upstream of gene
–plus 300bp upstream of translation start of first gene in operon
databases:
–K12 sequence: GOLD
–operon relationships: E. coli Linkage Map (Berlyn et al.)
use reverse complement if the gene is transcribed in the reverse direction
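A minimal sketch of the extraction rule, assuming the genome is available as a plain string, gene starts are 0-based coordinates of the first base of the start codon, and the genome can be treated as linear (the real E. coli chromosome is circular, which this sketch ignores). All function and parameter names are hypothetical. The two 300bp pieces for an operon member are returned separately so that no 12-mer spans their junction; the slides do not say how the pieces are combined.

```python
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA string."""
    return seq.translate(_COMPLEMENT)[::-1]


def upstream(genome, start, strand, length):
    """Sequence of `length` bp immediately upstream of a translation start.

    `start` is the 0-based index of the first base of the start codon.
    For a '-' strand gene that is the highest coordinate of the ORF, and the
    upstream region lies at higher coordinates on the '+' strand.
    """
    if strand == "+":
        return genome[max(0, start - length):start]
    region = genome[start + 1:start + 1 + length]
    return reverse_complement(region)


def upstream_regions(genome, gene_start, strand, operon_first_start=None,
                     full=600, half=300):
    """600bp for a stand-alone gene; 300bp + 300bp for an operon member."""
    if operon_first_start is None:
        return [upstream(genome, gene_start, strand, full)]
    return [
        upstream(genome, gene_start, strand, half),
        upstream(genome, operon_first_start, strand, half),
    ]
```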

9 Pre-processing
extract all 12-mers (overlapping) from upstream regions of up-regulated genes
note: better than DFS
remove those that appear in the control set
remove those that are dissimilar to everything else (“de-noising”)
–score=mean distance to all motifs not in same upstream region or operon
–remove if score>~9/12 mismatches
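The pre-processing steps might look roughly like the sketch below. It makes several assumptions that the slides do not state: distance is the Hamming distance (number of mismatching positions), each upstream region is keyed by a gene or operon identifier, "not in same upstream region or operon" is read as "shares no region with this 12-mer", and a 12-mer with nothing to compare against is dropped. Function names are hypothetical.

```python
def kmers(seq, k=12):
    """All overlapping k-mers of a sequence, as a set."""
    return {seq[i:i + k] for i in range(len(seq) - k + 1)}


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def candidate_motifs(up_regions, control_regions, k=12):
    """Map each k-mer found upstream of up-regulated genes, and absent from
    every control region, to the set of region ids in which it occurs.

    up_regions: dict of region id (gene or operon) -> upstream sequence
    control_regions: iterable of upstream sequences of control genes
    """
    control = set()
    for seq in control_regions:
        control |= kmers(seq, k)
    candidates = {}
    for region_id, seq in up_regions.items():
        for word in kmers(seq, k) - control:
            candidates.setdefault(word, set()).add(region_id)
    return candidates


def denoise(candidates, max_mean=9.0):
    """Drop k-mers whose mean distance to k-mers from other regions exceeds
    `max_mean` mismatches (about 9 of 12 on the slide)."""
    kept = {}
    for word, regions in candidates.items():
        dists = [hamming(word, other)
                 for other, other_regions in candidates.items()
                 if not (other_regions & regions)]
        if dists and sum(dists) / len(dists) <= max_mean:
            kept[word] = regions
    return kept
```

One way to read the ~9/12 threshold: two unrelated 12-mers with uniform base composition differ at 12 × 3/4 = 9 positions on average, so a score above that level means the 12-mer looks no more similar to the rest than chance would predict.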

10 Clustering
compute similarity matrix among motifs
repeatedly merge closest neighbors
–minimum spanning tree
–single-linkage clustering
Stop merging when dist>3/12 mismatches
Form consensus: relax constraints on nucleotides at each position by disjunction
–ACCATGGTATC
–ACGATGGTATT
–ACTATAGTATC
–AC(CTG)AT(AG)GTAT(TC)
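Single-linkage clustering stopped at a fixed distance yields the connected components of the graph that links every pair of motifs within that distance, so the merging loop can be sketched with a union-find structure. The code below is an illustration under that equivalence, assuming Hamming distance; it is not the authors' implementation, and the function names are hypothetical. Letters inside parentheses are sorted, so their order can differ from the slide; the order carries no meaning.

```python
from itertools import combinations


def hamming(a, b):
    """Number of mismatching positions between two equal-length strings."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(motifs, max_dist=3):
    """Group motifs so that any two motifs within `max_dist` mismatches end up
    in the same cluster (single linkage cut at `max_dist`)."""
    parent = list(range(len(motifs)))

    def find(i):
        # Follow parent links to the cluster root, compressing the path.
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i, j in combinations(range(len(motifs)), 2):
        if hamming(motifs[i], motifs[j]) <= max_dist:
            parent[find(i)] = find(j)

    clusters = {}
    for i, motif in enumerate(motifs):
        clusters.setdefault(find(i), []).append(motif)
    return list(clusters.values())


def consensus(cluster):
    """Per-position disjunction: one letter if all members agree, otherwise
    the observed letters in parentheses."""
    parts = []
    for column in zip(*cluster):
        letters = "".join(sorted(set(column)))
        parts.append(letters if len(letters) == 1 else f"({letters})")
    return "".join(parts)
```

Applied to the three sequences on the slide, `consensus` gives `AC(CGT)AT(AG)GTAT(CT)`, the same pattern as the slide's `AC(CTG)AT(AG)GTAT(TC)` up to letter order.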

11 Experiments
Starvation of E. coli for glucose in medium
3 time-points: starved (0min), 5min, 15min
Data collected in Siegele lab
up-regulated: 22 genes
control set: 1361 genes

12 Motifs Found

13 Sequence Logos

14 Distance to Transcription Start

15 Other Forms of Validation
Palindromicity: 11/13 motifs have index>0.5
TRANSFAC database:
–e.g. motif 2 matches pattern for MetJ-MetF site
–a number of other hits for known transcription factors
biological verification awaits...
–role in regulation pathway for starvation response?
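The slides do not define the palindromicity index. One plausible definition, used here purely as an assumption, is the fraction of positions at which a motif agrees with its own reverse complement: 1.0 for a perfect DNA palindrome such as GAATTC, lower for less symmetric motifs. The sketch handles plain sequences only, not consensus patterns with parenthesized alternatives.

```python
_COMPLEMENT = str.maketrans("ACGT", "TGCA")


def palindromicity(motif):
    """Fraction of positions where the motif matches its reverse complement.

    Assumed definition for illustration; the index used in the talk may differ.
    """
    revcomp = motif.translate(_COMPLEMENT)[::-1]
    matches = sum(a == b for a, b in zip(motif, revcomp))
    return matches / len(motif)
```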

16 Conclusions
Augment cluster-analysis of expression patterns with motif analysis
Efficient method for generating candidates
–from 12-mers in upstream regions
Efficient method for screening them
–empirically, against a control set, rather than probabilistic background model
Advantage: Pattern does not have to be in all the genes in a set
Challenges: defining appropriate upstream regions and the right control set (as filter)
