Cis-regulatory module 10/24/07. TFs often work synergistically (Harbison 2004)

1 Cis-regulatory module 10/24/07

2 TFs often work synergistically (Harbison 2004)

3 Combinatorial control

4 Lysogenic growth and lytic growth of λ phage in E. coli (source: Gary Kaiser)

5 λ operon: O_R, cI, cro

6 λ operon: O_R, cI on, cro off (lysogenic growth)

7 λ operon: O_R (O_R1, O_R2, O_R3), cI off, cro on (lytic growth)

8 λ operon. Lysogenic: cro, cI, Pol II. Lytic: cro, cI, Pol II

9 Cis-regulatory module (CRM) “A CRM is a DNA segment, typically a few hundred base pairs in length containing multiple binding sites, that recruits several cooperating factors to a particular genomic location” –Ji and Wong (2006)

10 Statistical Methods. Predict modules when the motifs are known (simpler): LRA, by Wasserman and Fickett (1998). Predict modules when the motifs also need to be discovered (more difficult): CisModule, by Zhou and Wong (2004); EMCModule, by Gupta and Liu (2005).

11 LRA

12 Cooperative motifs. Basic idea: true regulatory regions are likely to have multiple motif sites. [Figure label: probability of being regulatory]

13 LRA. Training data contain a subset of known regulatory and control regions. The model relates the probability of being a regulatory region to the highest motif matching score within a given sequence, with one regression coefficient per motif score.
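
The equation on this slide did not survive extraction. Its labels are consistent with a standard logistic regression, so the following is a sketch under that assumption rather than the slide's exact formula. The symbols are chosen here for illustration: p is the probability of being a regulatory region, x_i is the highest matching score for motif i within the sequence, β_i is its regression coefficient, K is the number of motifs, and the intercept β_0 is an assumption.

```latex
\[
\operatorname{logit}(p) \;=\; \log\frac{p}{1-p} \;=\; \beta_0 + \sum_{i=1}^{K} \beta_i x_i ,
\qquad
p \;=\; \frac{1}{1 + \exp\!\bigl(-\beta_0 - \sum_{i=1}^{K} \beta_i x_i\bigr)} .
\]
```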

14 Application: skeletal-muscle gene regulation. Five muscle-specific TFs are known: Mef-2, Myf, SRF, Tef, Sp-1. 29 regulatory regions are known. Can we predict the regulatory regions just from sequence motif information?

15 Computational Procedure. Motif matrices are identified by Gibbs sampling using sequence information from the 29 regulatory regions. For some TFs, motifs cannot be found by the de novo approach; literature motifs are used instead. The top two matching scores for each TF are included as covariates. Apply the LRA model. Use leave-one-out cross-validation to evaluate model performance.
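
The covariates in this procedure (the top two matching scores per TF) could be computed roughly as follows. This is a minimal sketch, not the original implementation: the log-odds scoring, the uniform background, the pseudocount, and the function names `pwm_log_odds` and `top_two_scores` are all assumptions made for illustration.

```python
import math

def pwm_log_odds(counts, background=0.25, pseudocount=0.5):
    """Turn per-position base counts into a log-odds scoring matrix.

    counts is a list of dicts (one per motif position) mapping base -> count.
    The uniform background and the pseudocount are illustrative assumptions.
    """
    matrix = []
    for column in counts:
        total = sum(column.get(b, 0) for b in "ACGT") + 4 * pseudocount
        matrix.append({
            b: math.log(((column.get(b, 0) + pseudocount) / total) / background)
            for b in "ACGT"
        })
    return matrix

def top_two_scores(sequence, matrix):
    """Score every window of the sequence and return the two highest scores.

    Returns fewer than two values if the sequence has fewer than two windows.
    """
    width = len(matrix)
    scores = [
        sum(matrix[j][sequence[start + j]] for j in range(width))
        for start in range(len(sequence) - width + 1)
    ]
    scores.sort(reverse=True)
    return scores[:2]

# Hypothetical 3-column motif with consensus ACG, scanned over a toy sequence.
toy_counts = [{"A": 10}, {"C": 10}, {"G": 10}]
toy_matrix = pwm_log_odds(toy_counts)
covariates = top_two_scores("TTACGTTACGTT", toy_matrix)
```

Repeating `top_two_scores` for each TF would give two covariates per TF for the regression.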

16 Results. Single motifs are highly non-specific. Simple multi-site analysis improves specificity at the cost of reduced sensitivity.

18 Results. Single motifs are highly non-specific. Simple multi-site analysis improves specificity at the cost of reduced sensitivity. Logistic regression further improves specificity at a smaller cost in sensitivity.

19 Limitations of LRA. Motifs must be known in advance. When known regulatory sequences are few, it is difficult to identify motifs by traditional methods. Objective: integrate motif discovery and module finding in a single statistical model.

20 De novo module identification. Two tasks: identify TF motifs; identify CRMs.

21 Why a module approach can help motif discovery. Because specificity is poor, a short sequence can appear enriched simply by chance. The probability of random matches is much smaller for motif co-occurrence.
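
The argument on this slide can be illustrated with a back-of-the-envelope calculation. The sketch below assumes independent positions, independent motifs, and a uniform background; the 6-mer, the 200 bp window, and the function names are hypothetical choices, not values from the lecture.

```python
def p_at_least_one_match(p_site, n_positions):
    """Chance that a motif matches at one or more of n_positions by luck,
    treating positions as independent (an approximation)."""
    return 1.0 - (1.0 - p_site) ** n_positions

def p_co_occurrence(p_sites, n_positions):
    """Chance that every motif in p_sites matches at least once in the same
    window, assuming the motifs match independently of each other."""
    prob = 1.0
    for p in p_sites:
        prob *= p_at_least_one_match(p, n_positions)
    return prob

# Hypothetical setting: an exact 6-mer under a uniform background has a
# per-position match probability of 0.25 ** 6; a 200 bp window has 195 starts.
p_site = 0.25 ** 6
single = p_at_least_one_match(p_site, 195)
pair = p_co_occurrence([p_site, p_site], 195)
print(f"one motif by chance: {single:.4f}; two motifs together: {pair:.5f}")
```

Under these assumptions the chance of two motifs co-occurring is the product of the single-motif chances, which is why it is so much smaller.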

22 CisModule (Zhou and Wong 2004). Basic idea: a two-level hierarchical mixture model (HMx). Level 1: modules → sequences.

23 CisModule (Zhou and Wong 2004). Basic idea: a two-level hierarchical mixture model (HMx). Level 1: modules → sequences. Level 2: motifs → modules.

24 HMx Model as a Stochastic Process (Zhou and Wong 2004). Treat the HMx model as a stochastic machine that generates sequences.
– Starting from the first sequence position, make a series of random decisions: either initiate a module of length l or generate a letter from the background model.
– Inside a module, if a site for the kth motif is initiated at position n, generate w_k letters from its PWM and place them at [n, n + w_k - 1]; otherwise generate a letter from the background.
– After reaching the end of the current module, decide whether to sample from the background or initiate a new module.
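
The generative process described on this slide can be simulated in a few lines. This is a simplified sketch, not the CisModule implementation: the parameter names `r` (module initiation probability) and `site_probs` (per-motif site initiation probabilities), and the rule that a site or module is only started if it fits in the remaining space, are assumptions made here.

```python
import random

BASES = "ACGT"

def sample_letter(rng, probs):
    """Draw one base from a dict of base probabilities."""
    return rng.choices(BASES, weights=[probs[b] for b in BASES])[0]

def choose_motif(rng, site_probs):
    """Return motif index k with probability site_probs[k], or None for no site."""
    u = rng.random()
    cumulative = 0.0
    for k, q in enumerate(site_probs):
        cumulative += q
        if u < cumulative:
            return k
    return None

def generate_sequence(length, module_len, r, site_probs, pwms, background, rng):
    """Generate one sequence plus the list of (position, motif index) sites.

    pwms[k] is a list of per-position base-probability dicts of width w_k.
    """
    seq, sites = [], []
    while len(seq) < length:
        # Background state: decide whether to initiate a module here.
        if rng.random() < r and len(seq) + module_len <= length:
            end = len(seq) + module_len
            while len(seq) < end:
                k = choose_motif(rng, site_probs)
                if k is not None and len(seq) + len(pwms[k]) <= end:
                    # Site for motif k: emit w_k letters from its PWM.
                    sites.append((len(seq), k))
                    seq.extend(sample_letter(rng, col) for col in pwms[k])
                else:
                    seq.append(sample_letter(rng, background))
        else:
            seq.append(sample_letter(rng, background))
    return "".join(seq), sites
```

Inference in the model runs this process in reverse: given only the sequences, it estimates where the modules and sites are and what the PWMs look like.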

25 Model inference: Gibbs sampling. Given the alignment, update the model parameters; given the model parameters, update the module/motif locations.
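
The two alternating steps can be seen in a much simpler setting than CisModule. The sketch below is a plain single-motif site sampler with one site per sequence and no modules; it is swapped in here only to illustrate the alternation and is not the sampler of Zhou and Wong. The function names, the pseudocount, and the uniform background are assumptions.

```python
import random

BASES = "ACGT"

def estimate_pwm(sequences, positions, width, skip, pseudocount=1.0):
    """Step 1: given the current site positions (the alignment), re-estimate
    the PWM from all sequences except the one at index `skip`."""
    counts = [{b: pseudocount for b in BASES} for _ in range(width)]
    for i, (seq, pos) in enumerate(zip(sequences, positions)):
        if i == skip:
            continue
        for j in range(width):
            counts[j][seq[pos + j]] += 1
    return [{b: col[b] / sum(col.values()) for b in BASES} for col in counts]

def sample_position(seq, pwm, rng, background=0.25):
    """Step 2: given the PWM, sample a new site position for one sequence,
    with probability proportional to its likelihood ratio vs. the background."""
    width = len(pwm)
    weights = []
    for start in range(len(seq) - width + 1):
        ratio = 1.0
        for j in range(width):
            ratio *= pwm[j][seq[start + j]] / background
        weights.append(ratio)
    return rng.choices(range(len(weights)), weights=weights)[0]

def gibbs_site_sampler(sequences, width, sweeps, rng):
    """Alternate the two steps, one sequence at a time, for a number of sweeps."""
    positions = [rng.randrange(len(s) - width + 1) for s in sequences]
    for _ in range(sweeps):
        for i, seq in enumerate(sequences):
            pwm = estimate_pwm(sequences, positions, width, skip=i)
            positions[i] = sample_position(seq, pwm, rng)
    return positions
```

Because the sampler is stochastic, different seeds can end at different alignments, which is one reason multiple runs from new seeds are typically needed.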

26 A numerical experiment. Merge the 29 regulatory regions with a set of sequences randomly selected from ENSEMBL promoters. Test the ability of CisModule to identify motifs in a “noisy” environment.

27 Results

28 Limitations of CisModule. The module length and the number of motifs must be provided externally. Convergence can be slow. Multiple cycles are needed, each starting from a new seed. The model assumes that combinations of different motifs are independent.

29 EMCModule. Gupta and Liu (2005) developed a similar approach called EMCModule. Main differences: They use the collection of literature motifs as initial “seeds” for motif discovery. Their method improves the convergence speed. Their definition of a CRM is a little different: the number of motifs is fixed within one module, but the order of, and distance between, different motifs can vary.

30 Further issues. A comparative genomic approach can also be incorporated into module discovery (Zhou and Wong 2007). The modules identified by these methods can be viewed as belonging to one “type”. New methods need to be developed to discover multiple module types. While a module-based approach is helpful for finding cooperative motifs, it may hurt the discovery of single motifs.

31 (Yuh et al. 1998)

35 Reading List. Wasserman and Fickett (1998): LRA. One of the first works on cis-regulatory modules. Zhou and Wong (2004): CisModule. A statistical method to identify cis-regulatory modules without knowledge of motif information. Yuh et al. (1998): An influential biological paper on how information can be integrated from different modules to regulate gene expression.

