MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS. Wyeth Wasserman, Jan. 18, 2012. www.cmmt.ubc.ca, opossum.cisreg.ca/oPOSSUM3

1 MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS
Wyeth Wasserman
Jan. 18, 2012
www.cmmt.ubc.ca
opossum.cisreg.ca/oPOSSUM3

2 Welcome
If you encounter any technical difficulties during the webinar
  – Type a report using the chat option
Slide presentation: ~20 min
Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
During the discussion session, audience members will be able to speak

3 Webinar Format
Introduction
Walk-Through
Summary
Q&A

4 INTRODUCTION

5 Overview
Given co-expressed gene sets, what are the key mediators of co-expression?
  – Focus on transcription factors (TFs)
Web-based software system for motif enrichment analysis
  – Co-expressed genes or sequences
  – Multiple sets of analysis methods
  – Available for human, mouse, fly, worm, yeast

6 Motif Enrichment Analysis
Finds over-represented TFBS in co-expressed gene sets
[Figure: Background vs. Target sets, with example p-values p=0.04, p=0.55, p=0.66]

7 What do we need?
Region selection
  – Where to look for enriched binding sites
  – Use conservation filter to restrict search space
TFBS profiles to search for
  – Need a pool of validated profiles
Scoring metrics for enrichment
  – How to measure motif over-representation

8 Conserved Region Selection
[Figure: phastCons Score vs. Genomic Position along a Gene, with a Threshold and conserved regions CR1 to CR4]
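
The conservation filter on this slide can be pictured as thresholding a per-base score track and keeping the runs that stay above the threshold. The following is only a minimal sketch under that assumption, not oPOSSUM's actual procedure: the function name `conserved_regions`, the `min_length` parameter, and the example scores are all hypothetical.

```python
def conserved_regions(scores, threshold, min_length=1):
    """Return half-open (start, end) intervals where every score >= threshold.

    Illustrative only: `scores` stands in for a per-base phastCons track,
    and `min_length` is a hypothetical filter on region size.
    """
    regions = []
    start = None
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos  # a new candidate region opens here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))
            start = None
    # Close a region that runs to the end of the track.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Made-up scores for a 9-base stretch.
example_scores = [0.1, 0.8, 0.9, 0.2, 0.7, 0.1, 0.9, 0.9, 0.9]
example_regions = conserved_regions(example_scores, threshold=0.7, min_length=2)
print(example_regions)
```

Binding-site searches would then be restricted to the returned intervals, which is what shrinks the search space.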

9 TFBS Profiles
JASPAR 2010: Portales-Casamar et al. Nucleic Acids Research 2009.
Expanded collection of TFBS profiles
  – 130 vertebrate profiles
  – 105 insect profiles
  – 5 nematode profiles
  – 177 yeast profiles
  – PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
Standardized 2-level TF classification (class, family)

10 Scoring Metrics
Z scores
  – Based on the number of occurrences of the TFBS relative to background
  – Normalized for sequence length
  – Simple binomial distribution model
Fisher scores
  – Fisher exact probability test
  – Fisher score = -log(Fisher p-value)
  – Based on the number of genes containing the TFBS relative to background
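
As a rough illustration of the Z-score idea, the sketch below uses the normal approximation to a binomial model: the background hit rate per nucleotide gives the expected number of hits in the target sequence, and the Z score measures how far the observed count lies from that expectation. This is an assumption-laden sketch, not oPOSSUM's exact formula; the function name, its arguments, and the example numbers are hypothetical.

```python
import math


def z_score(target_hits, target_length, background_hits, background_length):
    """Z score of the target hit count under a simple binomial model.

    Assumption: each target position is an independent trial whose success
    probability equals the background hit rate per nucleotide.
    """
    rate = background_hits / background_length      # length-normalized rate
    expected = target_length * rate                 # binomial mean n*p
    sd = math.sqrt(target_length * rate * (1.0 - rate))  # binomial std. dev.
    return (target_hits - expected) / sd


# Made-up example: 20 hits in 1,000 nt versus 100 hits in 10,000 nt.
z = z_score(20, 1000, 100, 10000)
print(round(z, 3))
```

Dividing by the background length is what makes sets of different total sequence length comparable.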

11 Additional Metric for Sequence-Based Analysis
KS scores
  – Kolmogorov-Smirnov test
  – Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
  – Expect real binding sites to be centered around the MPC
KS score = -log(KS test p-value)
[Figure: Foreground and Background distributions around the MPC]
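
The KS score can be sketched as a two-sample Kolmogorov-Smirnov comparison of site-to-MPC distances. The code below is a hedged illustration only: it assumes a two-sample test, uses the standard asymptotic series for the p-value, and assumes the natural logarithm, none of which the slide specifies. All names and the example distances are hypothetical.

```python
import math
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Largest gap between the two empirical cumulative distributions."""
    a, b = sorted(sample_a), sorted(sample_b)
    d = 0.0
    for x in a + b:  # the maximum gap is reached at an observed value
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


def ks_p_value(d, n, m, terms=100):
    """Asymptotic two-sample p-value (an approximation for small samples)."""
    lam = math.sqrt(n * m / (n + m)) * d
    if lam < 0.2:
        return 1.0  # the series is effectively 1 in this range
    series = sum((-1) ** (j - 1) * math.exp(-2.0 * j * j * lam * lam)
                 for j in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * series))


def ks_score(foreground, background):
    """-log(p), assuming the natural log; infinite if p underflows to 0."""
    d = ks_statistic(foreground, background)
    p = ks_p_value(d, len(foreground), len(background))
    return float("inf") if p == 0.0 else -math.log(p)


# Made-up distances: foreground sites cluster near the MPC (distance 0),
# background sites are spread evenly over a wider window.
fg = list(range(50))
bg = list(range(0, 500, 5))
print(ks_statistic(fg, bg), ks_score(fg, bg))
```

A large score here means the foreground distances are concentrated much closer to the MPC than the background distances are.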

12 Analysis Methods

13 WALK-THROUGH

14 http://opossum.cisreg.ca/oPOSSUM3

15 Human SSA (Single Site Analysis) - Input

18 Human SSA - Results

19 TF: HNF1A
JASPAR ID: MA0046.1
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC: 15.548
GC Content: 0.259

20 Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate: 0.0269
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009

21 Z-score: 15.134
Fisher score: 3.646
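
The gene counts on slide 20 form a 2x2 table, so a Fisher score can be recomputed from them. The sketch below makes several assumptions that the slides do not confirm: a one-tailed (over-representation) test, target and background gene sets treated as disjoint, and the natural logarithm. The function name is hypothetical, and the result is not guaranteed to match the reported value.

```python
import math


def fisher_right_tail(target_hits, target_nonhits, bg_hits, bg_nonhits):
    """One-tailed Fisher exact p-value: P(hits in target >= observed).

    Assumes the target and background sets are disjoint, so the table
    margins are simple sums of the four cells.
    """
    n_target = target_hits + target_nonhits
    n_hits = target_hits + bg_hits
    n_total = n_target + bg_hits + bg_nonhits
    total = math.comb(n_total, n_target)
    # Sum hypergeometric probabilities for the observed count and above.
    tail = sum(
        math.comb(n_hits, k) * math.comb(n_total - n_hits, n_target - k)
        for k in range(target_hits, min(n_target, n_hits) + 1)
    )
    return tail / total


# Gene counts from slide 20.
p_value = fisher_right_tail(19, 36, 1113, 3887)
fisher_score = -math.log(p_value)  # natural log is an assumption
print(p_value, fisher_score)
```

Comparing the printed score with the 3.646 on slide 21 is a quick way to check whether these assumptions match the tool's conventions.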

23 oPOSSUM methods

25 Human aCSA (anchored Combination Site Analysis) - Input

26 Human aCSA - Input

27 Human aCSA - Input

28 Human aCSA - Results

31 TFBS Cluster Analysis
[Figure labels: TFBS Profile, Cluster]

32 TFBS Cluster Analysis (TCA)
Overrepresentation analysis based on merged TFBS cluster hits
[Figure: Gene with conserved regions CR1 to CR4; TFBSs are merged into TFBS Cluster Hits]
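
The merge step on this slide can be illustrated as collapsing overlapping hits from profiles in the same cluster into single intervals before counting. This is a generic interval-merging sketch, assuming hits are half-open (start, end) coordinates; it is not taken from oPOSSUM's code, and the function name and example coordinates are hypothetical.

```python
def merge_cluster_hits(hits):
    """Merge overlapping or touching (start, end) hits into cluster hits."""
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster hit: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Made-up hits from three profiles assumed to share one cluster.
raw_hits = [(15, 25), (10, 20), (40, 50)]
cluster_hits = merge_cluster_hits(raw_hits)
print(cluster_hits)
```

Counting merged intervals instead of raw hits avoids scoring the same site several times when similar profiles match the same location.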

33 Human TCA - TFBS cluster selection

34 Human TCA - Results

35 TFCluster Info Page

37 Seq SSA - Input

38 Seq SSA - Input

45 Seq SSA - Results

46 KS score

48 Seq TCA - Input

49 SUMMARY

50 oPOSSUM-3
Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
Important functionalities
  – Gene-based vs. sequence-based
  – Single site vs. anchored combination site
  – Individual vs. clusters of TFBS profiles
  – Human, mouse, fly, worm and yeast

51 Development Team
Version 1: Ho Sui, SJ; Mortimer, JR; Arenillas, DJ; Brumm, J; Walsh, CJ; Kennedy, BP; Wasserman, WW
CSA: Huang, S; Fulton, DL; Arenillas, DJ; Perco, P; Ho Sui, SJ; Mortimer, JR; Wasserman, WW
Version 2: Ho Sui, SJ; Fulton, DL; Arenillas, DJ; Kwon, AT; Wasserman, WW
Version 3: Kwon, AT; Arenillas, DJ; Worsley Hunt, R; Wasserman, WW

52 QUESTIONS & ANSWERS
Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.

