ChIP-seq and its applications in GRN construction Jin Chen 2012 Fall CSE891-001.

1 ChIP-seq and its applications in GRN construction Jin Chen 2012 Fall CSE891-001


3 Layout Genome-scale evidence from microarray measurements may be used to identify regulatory interactions between TFs and their targets. Hu et al. used a genetic approach to identify targets of transcription factors in yeast and to reconstruct a functional regulatory network. Reimand et al. re-analyzed Hu et al.'s data using improved statistical techniques.

4 Hu et al.'s work Each of 263 transcription factor knockout strains was grown, and its mRNA expression was compared with that of a wild-type strain using microarrays. The unrefined transcription factor target network was defined as the cumulative set of significantly differentially expressed genes in each deletion strain. There was overlap between the transcription factor targets identified in the unrefined network and the targets identified by ChIP-chip.

5 2-level Refinement First level of network refinement – If TF A activated TF B and gene M, TF B activated gene M, and the confidence of A regulating gene M was lower than that of B regulating gene M, then the regulation of gene M by A was presumed to be indirect and was therefore erased. Additional refinement step – Similar to the previous step, except that the indirect edge that was removed bridged a three-step direct interaction series at the preceding level, resulting in a level 3 refined network. Note that the logical consistency of the regulatory edges was maintained at all times.
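The first refinement level can be sketched as follows. This is only an illustration under stated assumptions: the edge dictionary, the confidence scores, and the function name `refine_once` are hypothetical, and the sketch ignores the sign (activation or repression) of each edge, which the original work tracked to keep the network logically consistent.

```python
def refine_once(edges):
    """Drop edge A->M when A->B and B->M exist and conf(A->M) < conf(B->M).

    `edges` maps (regulator, target) to a confidence score where a
    higher value means more confidence (an assumption of this sketch).
    """
    to_remove = set()
    for (a, m), conf_am in edges.items():
        for (a2, b) in edges:
            # Only consider other targets B of the same regulator A.
            if a2 != a or b == m or b == a:
                continue
            conf_bm = edges.get((b, m))
            if conf_bm is not None and conf_am < conf_bm:
                to_remove.add((a, m))
                break
    return {e: c for e, c in edges.items() if e not in to_remove}


# Hypothetical toy network.
edges = {
    ("A", "B"): 0.9,
    ("B", "M"): 0.8,
    ("A", "M"): 0.3,  # weaker than B->M, so presumed indirect
    ("A", "N"): 0.7,
}
refined = refine_once(edges)
```

Here the edge from A to M is erased because the path through B explains it with higher confidence, while A to N is kept because no intermediate TF accounts for it.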


7 Hu et al.'s work When a transcription factor bound to a promoter was deleted, the expression of the downstream gene was much more likely to be affected than the background. Expression from promoters that were detectably occupied by a single TF was even more likely to be affected by deletion of that potentially major or sole TF. Thus, there was significant overlap between binding targets defined by ChIP-chip and functional targets defined by TF deletion.


9 Hu et al.'s work – problems However, Hu et al.'s study used relatively dated and insensitive approaches for microarray data processing. As a result, the published P-values and target-gene rankings are likely to be unreliable – P-values were not corrected for multiple testing – Background and print-tip correction were missing during normalization. Reimand et al. re-analyzed the same dataset with state-of-the-art software and obtained a much larger network. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

10 False Discovery Rate The false discovery rate (FDR) is a statistical approach used in multiple hypothesis testing to correct for multiple comparisons. The q-value is defined as the FDR analogue of the p-value. FDR is the expected proportion of false positives among all hypotheses called significant. For example, if 1000 observations were experimentally predicted to be different and the FDR for these observations was 0.1, then 100 of them would be expected to be false positives. FDR is determined from the observed p-value distribution and hence adapts to the number of records tested.
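As a rough illustration of FDR control, the sketch below applies the Benjamini-Hochberg procedure, which is one common way to obtain FDR-adjusted p-values. The slide does not say which estimator was intended, so the choice of procedure, the function name, and the toy p-values are all assumptions.

```python
def benjamini_hochberg(pvalues):
    """Return Benjamini-Hochberg adjusted p-values in the original order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down so adjusted values stay monotone.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Hypothetical p-values from five tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.20]
adj = benjamini_hochberg(pvals)

# Keep the tests whose adjusted value passes an FDR cut-off of 0.05.
significant = [p for p, q in zip(pvals, adj) if q <= 0.05]
```

With these toy values only the two smallest p-values survive the 0.05 cut-off, even though four of the raw p-values are below 0.05.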

11 Redo the Preprocessing Microarrays were normalized using the VSN package, including print-tip and background correction. Differential expression was calculated using a moderated eBayes t-test as implemented in the limma Bioconductor package. An FDR cut-off of 0.05 was used to detect significant differential gene expression. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

12 Re-analyze TF binding data DNA–protein interactions derived from ChIP-chip experiments were obtained, and those with a P-value < 0.001 were considered. A set of ‘trusted’ position weight matrices (PWMs) for 72 regulatory factors was derived by running the PROCSE and PhyloGibbs algorithms on a set of experimentally derived TF binding sites from SCPD. These PWMs were then used to scan multiple alignments of each intergenic region in yeast with the orthologous regions of four other yeast species. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

13 Re-analyze knockout expression and ChIP binding data Overlap between TF-binding and TF knockout data – Collect binding sites for 142 TFs, comprising 5,188 ChIP-chip interactions and 17,091 motif predictions – Calculate the intersection between the list of differentially expressed genes from the TF knockout and the targets identified by ChIP-chip or binding-site predictions – 2,230 regulation relations. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777
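The intersection step can be sketched with plain set operations. The TF and gene names below are placeholders and the function name `supported_targets` is hypothetical; the sketch only shows the shape of the computation, not the published pipeline.

```python
# Hypothetical toy data: differentially expressed genes per TF knockout,
# plus targets from ChIP-chip and from motif (binding-site) predictions.
knockout_de = {"TF1": {"g1", "g2", "g3"}, "TF2": {"g4"}}
chip = {"TF1": {"g2", "g5"}, "TF2": {"g6"}}
motif = {"TF1": {"g3"}, "TF2": {"g4", "g7"}}


def supported_targets(knockout_de, chip, motif):
    """For each TF, keep knockout-affected genes that also have binding evidence."""
    result = {}
    for tf, de_genes in knockout_de.items():
        # Binding evidence from either source counts.
        bound = chip.get(tf, set()) | motif.get(tf, set())
        result[tf] = de_genes & bound
    return result


relations = supported_targets(knockout_de, chip, motif)
n_relations = sum(len(genes) for genes in relations.values())
```

In this toy example three regulation relations remain: TF1 with g2 and g3, and TF2 with g4.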

14 Re-analyze knockout expression and ChIP binding data Checked the expression levels of the TFs – Intuitively, one expects the TF under consideration to have lower expression in the mutant strain than in the wild-type strain – The analysis confirms this for 155 TFs – 78 TFs display a negative fold change at statistically non-significant levels – 36 TFs are lethal. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

15 Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

16 Re-analyze knockout expression and ChIP binding data Examine functional annotations of differentially expressed genes – As most TFs are considered to regulate distinct cellular processes, their target genes should be associated with a coherent set of molecular and biological functions – g:Profiler was used to identify GO, KEGG and Reactome pathway annotations – Across all TF knockouts, the re-analysis scores higher than the original analysis. Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

17 Reimand et al, Nucleic Acids Research, 2010, Vol. 38, No. 14 pp 4768–4777

18 SUMMARY - exploring biological networks

19 Topology Approaches What comes next after constructing biological networks? First of all, simple approaches – Degree, betweenness, clustering coefficient, topological coefficient, shortest path – Shared neighbors, neighborhood connectivity, closeness centrality

20 Clustering Coefficient The clustering coefficient is a measure of the degree to which nodes in a graph tend to cluster together. Clustering coefficient (local version): do my neighbors connect with each other? Evidence suggests that in most real-world networks, nodes tend to create tightly knit groups characterized by a relatively high density of ties. C_i = 2 e_i / (k_i (k_i - 1)), where k_i is the number of neighbors of node i and e_i is the number of connected pairs between all neighbors of node i. Luciano da F. Costa, Francisco A. Rodrigues, Alexandre S. Cristino. Complex networks: the key to systems biology. Genet. Mol. Biol. vol.31 no.3. 2008
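A minimal sketch of the local clustering coefficient, assuming an undirected graph stored as a dictionary of neighbor sets and the standard form C_i = 2 e_i / (k_i (k_i - 1)). The function name and the toy graph are illustrative, and assigning 0 to nodes with fewer than two neighbors is a convention chosen for this sketch.

```python
def local_clustering(adj, i):
    """Local clustering coefficient of node i in an undirected graph."""
    neighbors = adj[i]
    k = len(neighbors)
    if k < 2:
        return 0.0  # convention used here: undefined cases are reported as 0
    # e counts the connected pairs among the neighbors of i (each pair once).
    e = sum(1 for u in neighbors for v in neighbors if u < v and v in adj[u])
    return 2.0 * e / (k * (k - 1))


# Hypothetical toy graph: a triangle a-b-c with a pendant node d on a.
adj = {
    "a": {"b", "c", "d"},
    "b": {"a", "c"},
    "c": {"a", "b"},
    "d": {"a"},
}
c_a = local_clustering(adj, "a")
c_b = local_clustering(adj, "b")
c_d = local_clustering(adj, "d")
```

Node b scores 1 because its two neighbors are linked, while node a scores 1/3 because only one of its three neighbor pairs is linked.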

21 Average Clustering Coefficient Distribution Define the function C(k) as the average clustering coefficient of all nodes with k links. For many real networks, C(k) ~ k^(-1). Nodes with only a few links have a high C(k) and belong to highly interconnected small modules. By contrast, the highly connected hubs have a low C(k); their role is to link different, otherwise non-communicating modules.
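The function C(k) can be computed by grouping nodes by degree and averaging their local clustering coefficients. The sketch below assumes an undirected graph given as a dictionary of neighbor sets; the function name and the toy graph (a hub joining two small modules) are hypothetical, and nodes with fewer than two neighbors are skipped.

```python
from collections import defaultdict


def clustering_by_degree(adj):
    """Return {k: average local clustering coefficient of nodes with k links}."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for node, nbrs in adj.items():
        k = len(nbrs)
        if k < 2:
            continue  # the coefficient is undefined for fewer than two neighbors
        linked = sum(1 for u in nbrs for v in nbrs if u < v and v in adj[u])
        sums[k] += 2.0 * linked / (k * (k - 1))
        counts[k] += 1
    return {k: sums[k] / counts[k] for k in sums}


# Hypothetical toy graph: hub h links two small modules {a, b} and {c, d}.
adj = {
    "h": {"a", "b", "c", "d"},
    "a": {"h", "b"},
    "b": {"h", "a"},
    "c": {"h", "d"},
    "d": {"h", "c"},
}
c_of_k = clustering_by_degree(adj)
```

In this toy graph the degree-2 nodes have C(2) = 1, whereas the degree-4 hub has C(4) = 1/3, matching the qualitative pattern described on the slide.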

22 Closeness centrality Closeness centrality is a measure of how many steps are required to reach every other node from a given node. Closeness centrality: how long will it take information to spread from a given node to the other reachable nodes in the network? In its definition, d_G(i, t) is the length of the shortest path from i to t, and V is the set of nodes in G. Freeman, 1978; Opsahl et al., 2010; Wasserman and Faust, 1994
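The slide's formula did not survive in this transcript, so the sketch below assumes one widely used variant: the number of reachable nodes divided by the sum of shortest-path lengths d_G(i, t) to them, computed by breadth-first search on an unweighted, undirected graph. The function name and the toy graph are illustrative.

```python
from collections import deque


def closeness(adj, source):
    """Closeness of `source`: reachable nodes / sum of shortest-path lengths."""
    dist = {source: 0}
    queue = deque([source])
    # Breadth-first search gives shortest path lengths in an unweighted graph.
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    total = sum(dist.values())
    reachable = len(dist) - 1
    return reachable / total if total > 0 else 0.0


# Hypothetical toy graph: a path a - b - c.
adj = {"a": {"b"}, "b": {"a", "c"}, "c": {"b"}}
close_a = closeness(adj, "a")
close_b = closeness(adj, "b")
```

The middle node b reaches both other nodes in one step and scores 1, while the end node a needs three steps in total and scores 2/3.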

23 Distribution of closeness centrality Closeness centrality is successful in distinguishing the important members of the community. Its distribution resembles a normal curve, whereas the other centrality measures have long-tailed distributions similar to a power law.

24 Limitations of simple approaches Each node or edge is studied individually, so enrichment studies cannot be applied. The analysis is purely topological, so it is difficult to integrate other knowledge. Nodes with high scores are not necessarily key genes/proteins. Instead, study a group of genes simultaneously.

25 Advanced approaches Dense subgraph detection; network motif detection; graph clustering; graph classification; etc.

26 Dense subgraph detection Software available at

27 Dense subgraph detection A subgraph is considered coherent and dense if and only if every edge is well supported and its corresponding second-order graph is dense. CODENSE

28 Network Motif Detection

29 Network Motif Detection

30 Perform a graph join operation to find repeated size-k graphs. Join each tree with its cousins to produce the frequent motif candidates C_k. [Figure: trees t4_1 and t4_2 and graphs h1 to h5]

31 Graph Clustering Graph clustering is an organization process whose goal is to put similar nodes together; the result is a partition of the network into a set of communities. The MCL algorithm is a fast and scalable unsupervised clustering algorithm for graphs based on simulation of stochastic flow in graphs, available at Van Dongen, S. (2000) Graph Clustering by Flow Simulation. PhD Thesis, University of Utrecht, The Netherlands
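The flow simulation behind MCL alternates two matrix steps: expansion (squaring a column-stochastic matrix) and inflation (raising entries to a power and re-normalizing columns). The pure-Python sketch below is a simplified teaching version, not the MCL software itself; the function names, the added self-loops, the inflation value of 2.0, the fixed iteration count, and the 1e-6 threshold are all assumptions.

```python
def normalize_columns(m):
    """Scale every column of a square matrix so that it sums to 1 (in place)."""
    n = len(m)
    for j in range(n):
        s = sum(m[i][j] for i in range(n))
        if s > 0:
            for i in range(n):
                m[i][j] /= s
    return m


def mcl(adjacency, inflation=2.0, iterations=50):
    """Simplified Markov clustering on a 0/1 adjacency matrix."""
    n = len(adjacency)
    # Add self-loops, then turn the matrix into column-stochastic form.
    m = [[float(adjacency[i][j]) + (1.0 if i == j else 0.0) for j in range(n)]
         for i in range(n)]
    normalize_columns(m)
    for _ in range(iterations):
        # Expansion: flow spreads along paths of length two.
        m = [[sum(m[i][k] * m[k][j] for k in range(n)) for j in range(n)]
             for i in range(n)]
        # Inflation: strong flows are boosted, weak flows fade.
        m = [[x ** inflation for x in row] for row in m]
        normalize_columns(m)
    # Rows that still carry flow are attractors; their columns form a cluster.
    clusters = set()
    for i in range(n):
        members = frozenset(j for j in range(n) if m[i][j] > 1e-6)
        if members:
            clusters.add(members)
    return clusters


# Hypothetical toy graph: triangles {0, 1, 2} and {3, 4, 5} joined by edge 2-3.
edges = [(0, 1), (0, 2), (1, 2), (3, 4), (3, 5), (4, 5), (2, 3)]
adjacency = [[0] * 6 for _ in range(6)]
for u, v in edges:
    adjacency[u][v] = adjacency[v][u] = 1
clusters = mcl(adjacency)
```

On this toy graph the flow concentrates inside each triangle and the single bridging edge is cut, so the two triangles come out as separate communities. A larger inflation value generally yields finer clusters.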

32 Graph Clustering [Figure: a graph and its graph clusters]
