www.cmmt.ubc.ca MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS Wyeth Wasserman Jan. 18, 2012 opossum.cisreg.ca/oPOSSUM3

Presentation transcript:

MOTIF ENRICHMENT ANALYSIS IN CO-EXPRESSED GENE SETS AND HIGH-THROUGHPUT SEQUENCE SETS
Wyeth Wasserman
Jan. 18, 2012
opossum.cisreg.ca/oPOSSUM3

Welcome
If you encounter any technical difficulties during the webinar
– Type a report using the chat option
Slide presentation: ~20 min
Questions will be compiled as they are submitted and answered during the final Q&A/discussion period
During the discussion session, audience members will be able to speak

Webinar Format
– Introduction
– Walk-Through
– Summary
– Q&A

INTRODUCTION

Overview
Given co-expressed gene sets, what are the key mediators of co-expression?
– Focus on TFs
Web-based software system for motif enrichment analysis
– Co-expressed genes or sequences
– Multiple sets of analysis methods
– Available for human, mouse, fly, worm, yeast

Motif Enrichment Analysis
Finds over-represented TFBSs in co-expressed gene sets
[Figure: Background vs. Target, with p=0.04, p=0.55, p=0.66]

What do we need?
Region selection
– Where to look for enriched binding sites
– Use conservation filter to restrict search space
TFBS profiles to search for
– Need a pool of validated profiles
Scoring metrics for enrichment
– How to measure motif over-representation

Conserved Region Selection
[Figure: phastCons Score vs. Genomic Position for a gene, with a Threshold line and conserved regions CR1, CR2, CR3, CR4]
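As a rough illustration of the conservation filter, the Python sketch below turns a per-base phastCons score track into conserved regions by thresholding. The function name, the threshold value, the minimum length and the toy scores are all assumptions made for illustration; the slide does not state how oPOSSUM-3 actually selects its regions.

```python
def conserved_regions(scores, threshold, min_length=1):
    """Return (start, end) half-open intervals where every score >= threshold.

    scores: per-base conservation scores (hypothetical phastCons values).
    min_length: shortest run to keep (an illustrative parameter).
    """
    regions = []
    start = None
    for pos, score in enumerate(scores):
        if score >= threshold:
            if start is None:
                start = pos                      # a conserved run begins here
        elif start is not None:
            if pos - start >= min_length:
                regions.append((start, pos))     # close the run just before pos
            start = None
    # Close a run that extends to the end of the sequence.
    if start is not None and len(scores) - start >= min_length:
        regions.append((start, len(scores)))
    return regions


# Toy score track, not real phastCons data.
toy_scores = [0.1, 0.9, 0.8, 0.2, 0.95, 0.1, 0.7, 0.9, 0.9]
toy_regions = conserved_regions(toy_scores, threshold=0.7, min_length=2)
print(toy_regions)
```

TFBS searching would then be restricted to the returned intervals, which is what shrinks the search space.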

TFBS Profiles
JASPAR 2010: Portales-Casamar et al., Nucleic Acids Research
Expanded collection of TFBS profiles
– 130 vertebrate profiles
– 105 insect profiles
– 5 nematode profiles
– 177 yeast profiles
– PBM (104), PBM_HOMEO (176), PBM_BHLH (19)
Standardized 2-level TF classification (class, family)

Scoring Metrics
Z scores
– Based on the number of occurrences of the TFBS relative to background
– Normalized for sequence length
– Simple binomial distribution model
Fisher scores
– Fisher exact probability test
– Fisher score = -log(Fisher p-value)
– Based on the number of genes containing the TFBS relative to background
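The Z score idea can be sketched in Python as follows. This is a minimal sketch under a normal approximation to a binomial model, in which each target nucleotide is covered by a site with the background rate. The function name, its arguments and the example numbers are hypothetical, and the exact oPOSSUM-3 formula may differ (for example, by including a continuity correction).

```python
from math import sqrt


def z_score(target_hits, target_len, bg_hits, bg_len, motif_width):
    """Binomial Z score for TFBS occurrences, normalized for sequence length.

    All arguments are illustrative: hit counts, total nucleotides searched
    in the target and background sets, and the motif width in bp.
    """
    # Fraction of background nucleotides covered by predicted sites.
    bg_rate = bg_hits * motif_width / bg_len
    # Expected site nucleotides in the target if it behaved like background.
    expected = bg_rate * target_len
    observed = target_hits * motif_width
    # Standard deviation of a binomial with n = target_len, p = bg_rate.
    sd = sqrt(target_len * bg_rate * (1.0 - bg_rate))
    return (observed - expected) / sd


# Made-up numbers: the target has twice the background hit density.
z_enriched = z_score(target_hits=20, target_len=1000,
                     bg_hits=100, bg_len=10000, motif_width=10)
print(f"Z score: {z_enriched:.2f}")
```

A target set whose hit density matches the background gives a Z score near zero; higher densities give positive scores.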

Additional Metric for Seq-Based
KS scores
– Kolmogorov-Smirnov test
– Compares the empirical distribution of the distances of the binding sites from the maximum point of confidence (MPC) to the background
– Expect real binding sites to be centered around the MPC
KS score = -log(KS test p-value)
[Figure: Foreground and Background distributions around the MPC]
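A pure-Python sketch of a two-sample Kolmogorov-Smirnov comparison is given below. It computes the KS statistic D from two lists of site-to-MPC distances and converts it to a p-value with the standard asymptotic series approximation. The function names and the distance values are invented for illustration, the natural logarithm is an assumption, and the way oPOSSUM-3 computes its KS p-value is not specified on the slide.

```python
from math import exp, log, sqrt


def ks_statistic(a, b):
    """Largest gap between the empirical CDFs of samples a and b."""
    a, b = sorted(a), sorted(b)
    i = j = 0
    d = 0.0
    while i < len(a) and j < len(b):
        x = min(a[i], b[j])
        while i < len(a) and a[i] <= x:   # advance past all values <= x
            i += 1
        while j < len(b) and b[j] <= x:
            j += 1
        d = max(d, abs(i / len(a) - j / len(b)))
    return d


def ks_pvalue(d, n1, n2, terms=100):
    """Asymptotic two-sided p-value (series approximation, small samples are rough)."""
    if d == 0:
        return 1.0
    ne = n1 * n2 / (n1 + n2)              # effective sample size
    lam = (sqrt(ne) + 0.12 + 0.11 / sqrt(ne)) * d
    total = sum((-1) ** (k - 1) * exp(-2.0 * k * k * lam * lam)
                for k in range(1, terms + 1))
    return min(1.0, max(0.0, 2.0 * total))


# Hypothetical distances (bp) of predicted sites from the MPC.
foreground = [2, 5, 8, 10, 12, 15, 20, 25]
background = [5, 30, 60, 90, 120, 150, 180, 200]
d_stat = ks_statistic(foreground, background)
p_val = ks_pvalue(d_stat, len(foreground), len(background))
ks_score = -log(p_val) if p_val > 0 else float("inf")
print(d_stat, p_val, ks_score)
```

Foreground sites clustered near the MPC produce a large D and hence a high KS score.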

Analysis Methods

WALK-THROUGH

Human SSA - Input

Human SSA - Results

TF: HNF1A
JASPAR ID: MA
Class: Helix-Turn-Helix
Family: Homeo
Tax Group: Vertebrates
IC:
GC Content: 0.259

Target Gene Hits: 19
Target Gene Non-Hits: 36
Background Gene Hits: 1113
Background Gene Non-Hits: 3887
Target TFBS Hits: 41
Target TFBS Nucleotide Rate:
Background TFBS Hits: 2127
Background TFBS Nucleotide Rate: 0.009

Z-score:
Fisher score: 3.646
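As a worked illustration, the gene hit counts reported for HNF1A (19 and 36 in the target, 1113 and 3887 in the background) can be fed to a one-tailed Fisher exact test. The Python sketch below assumes that the target and background gene sets are disjoint and that the score uses the natural logarithm. Neither assumption is confirmed by the slides, so the printed value may or may not match the reported Fisher score.

```python
from math import comb, log


def fisher_one_tailed_p(target_hits, target_non_hits, bg_hits, bg_non_hits):
    """P(at least target_hits hits in the target) under the hypergeometric null.

    Assumes target and background are disjoint gene sets (an assumption).
    """
    n = target_hits + target_non_hits     # target set size
    k_total = target_hits + bg_hits       # genes with a hit, both sets
    n_total = n + bg_hits + bg_non_hits   # all genes, both sets
    denom = comb(n_total, n)
    # Sum the upper tail; exact integer arithmetic until the final division.
    tail = sum(comb(k_total, k) * comb(n_total - k_total, n - k)
               for k in range(target_hits, min(n, k_total) + 1))
    return tail / denom


# Gene hit counts taken from the results table above.
p_value = fisher_one_tailed_p(19, 36, 1113, 3887)
fisher_score = -log(p_value)              # natural log is an assumption
print(f"p = {p_value:.4g}, Fisher score = {fisher_score:.3f}")
```

The test asks whether 19 hits among 55 target genes is more than expected given the hit frequency in the background.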

oPOSSUM methods

Human aCSA - Input

Human aCSA - Results

TFBS Cluster Analysis
[Figure: TFBS Profile Cluster]

TFBS Cluster Analysis (TCA)
[Figure: Gene with conserved regions CR1, CR2, CR3, CR4; TFBSs are merged into TFBS Cluster Hits]
Over-representation analysis is based on merged TFBS cluster hits
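The merge step can be sketched as a simple interval merge in Python: hits from all profiles in one cluster are pooled, and overlapping hits collapse into a single cluster hit. The function name, the inclusive coordinate convention and the example hits are illustrative assumptions rather than details taken from oPOSSUM-3.

```python
def merge_cluster_hits(hits):
    """Merge overlapping (start, end) hits, using inclusive coordinates.

    hits: sites predicted by any profile in one TFBS cluster on one sequence.
    """
    merged = []
    for start, end in sorted(hits):
        if merged and start <= merged[-1][1]:
            # Overlaps the previous cluster hit: extend it if needed.
            prev_start, prev_end = merged[-1]
            merged[-1] = (prev_start, max(prev_end, end))
        else:
            merged.append((start, end))
    return merged


# Hypothetical hits from several profiles in the same cluster.
profile_hits = [(40, 50), (10, 20), (15, 25), (50, 55), (60, 70)]
cluster_hits = merge_cluster_hits(profile_hits)
print(cluster_hits)
```

Counting merged hits rather than raw profile hits avoids counting the same site several times when similar profiles in a cluster match the same position.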

Human TCA - TFBS cluster selection

Human TCA - Results

TFCluster Info Page

Seq SSA - Input

Seq SSA - Results

KS score

Seq TCA - Input

SUMMARY

oPOSSUM-3
Web-based system for motif enrichment analysis in co-expressed gene sets and sequences from high-throughput experiments
Important functionalities
– Gene-based vs. sequence-based
– Single site vs. anchored combination site
– Individual vs. clusters of TFBS profiles
– Human, mouse, fly, worm and yeast

Development Team
Version 1: Ho Sui, SJ; Mortimer, JR; Arenillas, DJ; Brumm, J; Walsh, CJ; Kennedy, BP; Wasserman, WW
CSA: Huang, S; Fulton, DL; Arenillas, DJ; Perco, P; Ho Sui, SJ; Mortimer, JR; Wasserman, WW
Version 2: Ho Sui, SJ; Fulton, DL; Arenillas, DJ; Kwon, AT; Wasserman, WW
Version 3: Kwon, AT; Arenillas, DJ; Worsley Hunt, R; Wasserman, WW

QUESTIONS & ANSWERS
Please take a moment to type questions/comments into the chat box. The questions will be answered shortly.