1
MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS
Wyeth Wasserman
Jan. 18, 2012
www.cmmt.ubc.ca
opossum.cisreg.ca/oPOSSUM3
2
Welcome
If you encounter any technical difficulties during the webinar:
–Type a report using the chat option
Slide presentation: ~20 min
Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
During the discussion session, audience members will be able to speak
3
Webinar Format
Introduction
Walk-Through
Summary
Q&A
4
INTRODUCTION
5
Overview
Given co-expressed gene sets, what are the key mediators of co-expression?
–Focus on transcription factors (TFs)
Web-based software system for motif enrichment analysis
–Co-expressed genes or sequences
–Multiple sets of analysis methods
–Available for human, mouse, fly, worm, yeast
6
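Motif Enrichment Analysis
Finds over-represented transcription factor binding sites (TFBS) in co-expressed gene sets
[Figure: TFBS occurrences in background vs. target gene sets, with example p-values of 0.04, 0.55, and 0.66]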
7
What do we need?
Region selection
–Where to look for enriched binding sites
–Use conservation filter to restrict search space
TFBS profiles to search for
–Need a pool of validated profiles
Scoring metrics for enrichment
–How to measure motif over-representation
8
Conserved Region Selection
[Figure: phastCons score plotted against genomic position for a gene, with a threshold line and conserved regions CR1 through CR4 marked]
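The conserved region selection shown on this slide can be sketched as a scan for runs of positions whose conservation score meets a threshold. This is only an illustration: the function name `conserved_regions`, the `min_length` parameter, and the example scores are hypothetical, and the slide does not state how oPOSSUM actually derives its regions from phastCons scores.

```python
from typing import List, Tuple


def conserved_regions(
    scores: List[float], threshold: float, min_length: int = 1
) -> List[Tuple[int, int]]:
    """Return half-open (start, end) index ranges in which every
    per-position score is >= threshold and the run is at least
    min_length positions long."""
    regions: List[Tuple[int, int]] = []
    start = None  # start index of the run currently being extended
    for i, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = i
        elif start is not None:
            # The run just ended at position i (exclusive).
            if i - start >= min_length:
                regions.append((start, i))
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Hypothetical per-position conservation scores (not real phastCons data).
example_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.75, 0.95, 0.1]
print(conserved_regions(example_scores, threshold=0.7, min_length=2))
# → [(1, 3), (4, 7)]
```

Raising the threshold or `min_length` shrinks the search space, which is the trade-off the conservation filter is meant to control.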
9
TFBS Profiles
JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
Expanded collection of TFBS profiles
–130 vertebrate profiles
–105 insect profiles
–5 nematode profiles
–177 yeast profiles
–PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
Standardized 2-level TF classification (class, family)
10
Scoring Metrics
Z scores
–Based on the number of occurrences of the TFBS relative to background
–Normalized for sequence length
–Simple binomial distribution model
Fisher scores
–Fisher exact probability test
–Fisher score = -log(Fisher p-value)
–Based on the number of genes containing the TFBS relative to background
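The Z score described here can be sketched as a normal approximation to a binomial model. This is a minimal illustration under stated assumptions, not oPOSSUM's confirmed formula: the function name `z_score`, the 0.5 continuity correction, and the input values are all assumptions. Here `n` is the number of trials (for example, nucleotides searched in the target set) and `p` is the per-trial rate estimated from the background.

```python
import math


def z_score(observed: float, n: float, p: float) -> float:
    """Z score of an observed count under Binomial(n, p), using the
    normal approximation with a 0.5 continuity correction (assumed)."""
    mean = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return (observed - mean - 0.5) / sd


# Hypothetical inputs: 10,000 target nucleotides searched, a background
# rate of 0.009, and 150 nucleotides covered by predicted sites.
print(z_score(observed=150, n=10_000, p=0.009))
```

Because the expected count scales with `n`, the score is normalized for sequence length, as the slide notes.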
11
Additional Metric for Seq-Based Analysis
KS scores
–Kolmogorov-Smirnov test
–Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) against the background distribution
–Expect real binding sites to be centered around the MPC
–KS score = -log(KS test p-value)
[Figure: foreground and background binding-site distributions around the MPC]
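The KS score can be sketched as a two-sample Kolmogorov-Smirnov test on site-to-MPC distances. This is a rough illustration only: the function names, the asymptotic p-value series, the natural logarithm, and the distances are assumptions, and the slide does not specify which KS variant or log base oPOSSUM uses.

```python
import math
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Largest absolute gap between the two empirical CDFs."""
    a_sorted, b_sorted = sorted(a), sorted(b)
    d = 0.0
    for x in a_sorted + b_sorted:
        cdf_a = bisect_right(a_sorted, x) / len(a_sorted)
        cdf_b = bisect_right(b_sorted, x) / len(b_sorted)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d: float, n: int, m: int, terms: int = 100) -> float:
    """Asymptotic two-sided p-value from the Kolmogorov series."""
    lam = d * math.sqrt(n * m / (n + m))
    if lam < 0.2:
        return 1.0  # the series converges slowly here; p is ~1 anyway
    total = sum(
        (-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
        for j in range(1, terms + 1)
    )
    return min(1.0, max(0.0, 2.0 * total))


def ks_score(foreground: Sequence[float], background: Sequence[float]) -> float:
    """-log(p), using the natural log (the log base is an assumption)."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return math.inf if p <= 0.0 else -math.log(p)


# Hypothetical absolute distances (bp) of predicted sites from the MPC.
fg = [5, 12, 8, 20, 3, 15, 9, 7]           # clustered near the MPC
bg = [150, 40, 220, 90, 10, 300, 180, 60]  # spread out
print(ks_statistic(fg, bg), ks_score(fg, bg))
```

The asymptotic series is only approximate for samples this small; an exact test would be preferable in practice.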
12
Analysis Methods
13
WALK-THROUGH
14
http://opossum.cisreg.ca/oPOSSUM3
15
Human SSA - Input
16
16
17
17
18
Human SSA - Results
19
TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259
20
Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009
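The gene hit counts on this slide form a 2x2 table, so a one-tailed Fisher exact test on them can be sketched as below. This is an illustration under assumptions: the function name, the choice of the "at least as many target hits" tail, treating target and background as disjoint sets, and the natural logarithm are not stated in the presentation. For this TF the deck reports a Fisher score of 3.646; this sketch is not guaranteed to reproduce that value.

```python
import math


def fisher_one_tailed(
    target_hits: int,
    target_non_hits: int,
    background_hits: int,
    background_non_hits: int,
) -> float:
    """P(at least target_hits hits in the target set) under the
    hypergeometric null with all table margins fixed."""
    n = target_hits + target_non_hits            # target set size
    k_total = target_hits + background_hits      # all genes with a hit
    total = n + background_hits + background_non_hits
    # Sum exact integer counts of tables as or more extreme than observed.
    numerator = sum(
        math.comb(k_total, k) * math.comb(total - k_total, n - k)
        for k in range(target_hits, min(n, k_total) + 1)
    )
    return numerator / math.comb(total, n)


# Counts taken from the slide.
p = fisher_one_tailed(19, 36, 1113, 3887)
# Natural log is an assumption; the slide only says "-log".
print(p, -math.log(p))
```

Working with exact integers via `math.comb` avoids floating-point underflow in the intermediate binomial coefficients.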
21
Z-score: 15.134
Fisher score: 3.646
22
22
23
oPOSSUM methods
24
24
25
Human aCSA - Input
26
Human aCSA - Input
27
Human aCSA - Input
28
Human aCSA - Results
29
29
30
30
31
TFBS Cluster Analysis
[Figure: TFBS profile cluster]
32
TFBS Cluster Analysis (TCA)
[Figure: TFBSs in conserved regions CR1 through CR4 of a gene are merged into TFBS cluster hits]
Over-representation analysis is based on merged TFBS cluster hits
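One plausible reading of the merge step is that overlapping hits from profiles in the same TFBS cluster are collapsed into a single cluster hit. The sketch below illustrates that reading with a standard interval merge. The function name, the inclusive coordinate convention, and the example positions are hypothetical; the slide does not describe oPOSSUM's exact merge rule.

```python
from typing import List, Tuple


def merge_cluster_hits(hits: List[Tuple[int, int]]) -> List[Tuple[int, int]]:
    """Merge overlapping (start, end) hits, with inclusive coordinates,
    into non-overlapping cluster hits."""
    merged: List[Tuple[int, int]] = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster hit: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hits from several profiles belonging to one cluster.
hits = [(10, 20), (15, 25), (40, 50), (50, 55), (70, 80)]
print(merge_cluster_hits(hits))
# → [(10, 25), (40, 55), (70, 80)]
```

Counting merged hits rather than raw hits keeps similar profiles in a cluster from inflating the over-representation statistics by matching the same site repeatedly.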
33
Human TCA – TFBS cluster selection
34
Human TCA - Results
35
TFCluster Info Page
36
36
37
Seq SSA - Input
38
Seq SSA - Input
39
39
40
40
41
41
42
42
43
43
44
44
45
Seq SSA - Results
46
KS score
47
47
48
Seq TCA - Input
49
SUMMARY
50
oPOSSUM-3
Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
Important functionalities
–Gene-based vs. sequence-based
–Single site vs. anchored combination site
–Individual vs. clusters of TFBS profiles
–Human, mouse, fly, worm and yeast
51
Development Team
Version 1: Ho Sui, SJ; Mortimer, JR; Arenillas, DJ; Brumm, J; Walsh, CJ; Kennedy, BP; Wasserman, WW
CSA: Huang, S; Fulton, DL; Arenillas, DJ; Perco, P; Ho Sui, SJ; Mortimer, JR; Wasserman, WW
Version 2: Ho Sui, SJ; Fulton, DL; Arenillas, DJ; Kwon, AT; Wasserman, WW
Version 3: Kwon, AT; Arenillas, DJ; Worsley Hunt, R; Wasserman, WW
52
QUESTIONS & ANSWERS
Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.