Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques Min Wenwen
Background: eQTL
Background: eQTL Expression quantitative trait loci (eQTLs). Nat. Rev. Cardiol. doi: /nrcardio
Motivation Previous studies examined the relationship between pairs of correlated quantitative traits such as mRNA and clinical phenotypes (Mehrabian et al, 2005; Schadt et al, 2005; Yang et al, 2009). We applied a variation of a previously described statistical procedure (Schadt et al, 2005) to identify mRNAs that respond to changes in miRNA expression levels (miRNA targets), as well as mRNAs that perturb expression levels of miRNAs.
Summary Integrative genomics and genetics approaches have proven to be useful tools for elucidating the complex relationships often found in gene regulatory networks. Our analysis reveals that the transcript abundances of miRNAs are subject to regulatory control by many more loci than previously observed for mRNA expression. Our results: miRNAs exist as highly connected hub-nodes and function as key sensors within the transcriptional network; miRNAs can act cooperatively or redundantly to regulate a given pathway; and miRNAs play a subtle role by dampening expression of their target genes through the use of feedback loops.
Idea and data This approach leverages DNA sequence variation as a causal anchor to identify the best-fitting model that describes the relationship between pairs of traits (miRNA, mRNA) that are linked to the same genetic locus. Using an F2 mouse cross, we collected both mRNA expression and genotype information from liver. the mRNA and 183 miRNA transcripts. From the panel of 5000 SNP markers, 2804 markers informative for the BXD cross and evenly spaced across all chromosomes, excluding the Y chromosome, were selected for use in all analyses. MSB2011.SI\msb s5.xls (markers)
Methods
① Linkage analysis techniques were applied to infer regulatory relationships between DNA loci and the two classes of expression traits, that is, mRNAs and miRNAs.
② The miRNA–mRNA relationships were characterized using a simple correlation analysis.
③ A variation of a previously developed statistical inference technique was applied to infer regulatory relationships between mRNAs and miRNAs.
mRNA and miRNA eQTL mapping in the BXD mouse study
Using standard parametric linkage analysis techniques, we treated the expression levels of both mRNAs and miRNAs as quantitative traits to identify regulatory loci generally referred to as expression quantitative trait loci (eQTLs).
LOD score: LOD = Z = log10(probability of birth sequence with a given linkage value / probability of birth sequence with no linkage)
Figure 1 Top 15 microRNA expression quantitative trait loci (eQTL) plots. The x-axis represents genomic coordinates in base pairs.
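As a rough illustration of how a LOD score can be computed for one marker and one expression trait, the sketch below uses a single-marker regression approximation, LOD = (n/2) log10(RSS0/RSS1), rather than the pedigree likelihood ratio quoted on the slide. The additive 0/1/2 genotype coding and the function name `lod_score` are assumptions made for this example, not details taken from the study.

```python
import math

def lod_score(genotypes, trait):
    """Single-marker regression LOD: (n/2) * log10(RSS_null / RSS_marker).

    genotypes: additive codes (0, 1, 2), an assumption for this sketch.
    trait: expression values for the same animals.
    """
    n = len(trait)
    mean_g = sum(genotypes) / n
    mean_y = sum(trait) / n
    # Residual sum of squares under the null model (mean only).
    rss0 = sum((y - mean_y) ** 2 for y in trait)
    sxx = sum((g - mean_g) ** 2 for g in genotypes)
    if sxx == 0 or rss0 == 0:
        return 0.0  # monomorphic marker or constant trait: no evidence
    sxy = sum((g - mean_g) * (y - mean_y) for g, y in zip(genotypes, trait))
    slope = sxy / sxx
    # Residual sum of squares after regressing the trait on the genotype.
    rss1 = sum((y - mean_y - slope * (g - mean_g)) ** 2
               for g, y in zip(genotypes, trait))
    return (n / 2) * math.log10(rss0 / rss1)

genotypes = [0, 0, 1, 1, 2, 2]
linked = [1.0, 1.2, 2.1, 1.9, 3.0, 3.2]    # tracks the genotype
unlinked = [1.0, 2.0, 1.0, 2.0, 1.0, 2.0]  # unrelated to the genotype
print(lod_score(genotypes, linked), lod_score(genotypes, unlinked))
```

A trait that follows the genotype gives a large LOD, while an unrelated trait gives a LOD near zero. Thresholds such as the 4.9 and 4.3 used in the slides would then be applied to these scores.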
mRNA and miRNA eQTL mapping in the BXD mouse study In contrast, we identified 5293 eQTLs for 5107 of the mRNA transcripts (~13%) at a LOD score threshold of >4.9 (corresponding to an FDR <5%). Of these, 2712 (or 37%) were cis eQTLs. Thus by percentage, at the 10% FDR threshold, more than three times as many mRNA eQTLs were detected when compared with the miRNA expression traits.
Decrease the FDR of detecting miRNA eQTLs For each miRNA, we identified a set of mRNA expression traits that contained at least one hexamer seed region within the 3’ UTR. These gene sets were then filtered to contain only genes that were significantly negatively correlated with the corresponding miRNA.
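The two filtering steps on this slide can be sketched as follows. This is a minimal illustration under stated assumptions: `seed_site` is the hexamer as it would appear in the 3’ UTR sequence, the fixed cutoff `max_r` stands in for the significance test used in the study, and all gene names and sequences are made up.

```python
import math

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def candidate_targets(mirna_expr, seed_site, utrs, mrna_expr, max_r=-0.5):
    """Keep genes whose 3' UTR contains the hexamer and whose expression
    is negatively correlated with the miRNA (r <= max_r, an assumed cutoff)."""
    kept = []
    for gene, utr in utrs.items():
        if seed_site not in utr:
            continue  # step 1: require at least one hexamer site in the 3' UTR
        if pearson(mirna_expr, mrna_expr[gene]) <= max_r:
            kept.append(gene)  # step 2: require a negative correlation
    return kept

# Hypothetical toy data, for illustration only.
mirna = [1.0, 2.0, 3.0, 4.0, 5.0]
utrs = {"g1": "AAACTACCTCAAA", "g2": "GGGGGGGG", "g3": "TTCTACCTCATT"}
expr = {"g1": [5.0, 4.0, 3.0, 2.0, 1.0],
        "g2": [5.0, 4.0, 3.0, 2.0, 1.0],
        "g3": [1.0, 2.0, 3.0, 4.0, 5.0]}
print(candidate_targets(mirna, "CTACCT", utrs, expr))
```

Only `g1` passes both filters here: `g2` lacks the hexamer and `g3` is positively correlated with the miRNA.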
Decrease the FDR of detecting miRNA eQTLs Figure 2 Detection thresholds of miRNA eQTLs. (B) Illustration of the procedure used to increase the statistical power of detecting miRNA eQTLs: messenger RNA expression traits that were negatively correlated with a given miRNA and contained at least one corresponding hexamer seed region in the 3’ UTR. (C) False-discovery rates (FDRs) as a function of LOD score threshold for miRNA eQTLs.
We next sought to determine whether there were key loci involved in regulating many miRNAs.
Distribution of eQTLs for mRNA and miRNA Supplementary figure 1. Distribution of eQTLs in 2 and 20 cM bins for mRNA and miRNA eQTLs, respectively, across the genome at a 10% FDR threshold. The top panel illustrates all mRNA eQTLs with LOD scores > 4.3, while the bottom panel illustrates all miRNA eQTLs with LOD scores >
Key loci regulating many miRNAs and mRNAs We identified a strong eQTL hotspot on chr 13 and a weaker hotspot on chr 17. Of the 72 eQTLs identified, 42% mapped to chr 13, suggesting the presence of a key regulator influencing the expression levels of many miRNAs. Overall, we detected seven mRNA eQTL hotspots, where each hotspot is defined to comprise 41% of the total number of eQTLs (computed using a Poisson distribution with mean 9.52). These hotspots localize to chr 2, 4, 7, 9, 12, 13, and 17.
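One way to use a Poisson distribution for hotspot calling is to ask how unlikely each bin's eQTL count would be if eQTLs fell into bins at random. The sketch below takes the mean of 9.52 from the slide. The Bonferroni-style cutoff, the function names, and the bin counts are assumptions for illustration and are not the hotspot rule used in the study.

```python
import math

def poisson_upper_tail(k, lam):
    """P(X >= k) for X ~ Poisson(lam)."""
    cdf_below_k = sum(math.exp(-lam) * lam ** i / math.factorial(i)
                      for i in range(k))
    return 1.0 - cdf_below_k

def hotspot_bins(counts, lam, alpha=0.05):
    """Indices of bins whose count is improbably high under Poisson(lam).

    The cutoff alpha / number_of_bins is an assumed multiple-testing rule.
    """
    cutoff = alpha / len(counts)
    return [i for i, c in enumerate(counts)
            if poisson_upper_tail(c, lam) < cutoff]

# Hypothetical eQTL counts per genomic bin.
counts = [8, 10, 40, 9, 11]
print(hotspot_bins(counts, lam=9.52))
```

With these made-up counts only the bin holding 40 eQTLs is flagged. Counts near the mean are entirely consistent with a random scatter.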
Overlap eQTLs for miRNAs and mRNAs To better compare the location of miRNA eQTL hotspots with mRNA eQTL hotspots, we recomputed the probabilities of an miRNA eQTL hotspot using 2 cM bins (1 cM is approximately 1000 kb). The eQTL hotspots for miRNAs and mRNAs on chromosome 13 are <4 cM apart.
Correlation analysis between miRNA and mRNA expression levels in mice mRNAs and 183 miRNAs: we identified miRNA–mRNA trait pairs that were significantly correlated at an FDR of 0.1% (P-value < 3.98e-4). A number of miRNAs (hub-nodes) were very broadly connected to tens of thousands of mRNAs. Each miRNA: ~2545 mRNA transcripts. Each miRNA: at least one mRNA transcript.
miRNA signature set: compute the seed enrichment levels for each set
The distribution of seed enrichment Distribution of seed enrichment using the full miRNA–mRNA correlation results. Distribution of seed enrichment using only positive correlations between miRNA–mRNAs. Distribution of seed enrichment using only negative correlations between miRNA–mRNAs. Feedback loops
Enrichment analysis using (GO, KEGG)
Supplementary figure 3. Summary of miRNA-mRNA correlation analysis.
A. Illustration of the enrichment analysis. Fisher's exact test statistics are computed for all pairwise comparisons between each miRNA signature set and each category in GO Biological Process. P-values are corrected for multiple hypothesis testing using Bonferroni's correction. The same analysis is repeated using sets in the KEGG Pathways and Body Atlas Tissue Enrichment databases. Significant enrichment between sets is defined as a corrected p-value of less than
B. Histogram showing the top 10 categories in GO Biological Process in terms of the number of enriched miRNA signature sets.
C. Histogram showing the number of enriched miRNA signature sets in each KEGG pathway category.
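The enrichment step in panel A can be sketched with a hypergeometric upper tail, which is the one-sided form of Fisher's exact test, followed by a Bonferroni correction. The choice of a one-sided test, the function names, and the example counts are assumptions for illustration. The slide does not state which alternative was tested.

```python
from math import comb

def fisher_enrichment_p(overlap, set_size, category_size, universe):
    """One-sided Fisher's exact test p-value: the probability of seeing at
    least `overlap` category genes in a signature set of `set_size` genes
    drawn from `universe` genes, `category_size` of which are annotated."""
    max_overlap = min(set_size, category_size)
    favourable = sum(comb(category_size, k) *
                     comb(universe - category_size, set_size - k)
                     for k in range(overlap, max_overlap + 1))
    return favourable / comb(universe, set_size)

def bonferroni(pvalues):
    """Bonferroni correction: multiply each p-value by the number of tests,
    capped at 1."""
    m = len(pvalues)
    return [min(1.0, p * m) for p in pvalues]

# Hypothetical example: a signature set of 5 genes, all of which fall in a
# 5-gene category, out of a 20-gene universe.
p = fisher_enrichment_p(overlap=5, set_size=5, category_size=5, universe=20)
print(p, bonferroni([p, 0.5, 0.04]))
```

In practice one p-value would be computed for every signature-set and category pair, and the correction applied across all of them.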
We opted to annotate the miRNA signature sets using only genes that contained at least one 6mer seed region in the 3’ UTR of the gene.
Causal associations between miRNAs and mRNAs
First, we identified all miRNA and mRNA trait pairs linked to a common genomic region at a LOD score threshold of 3.4.
Next, we identified miRNA–mRNA trait pairs with closely linked eQTLs (<15 cM).
Causal inference: (a) causal, where an eQTL for miRNA expression leads to changes in mRNA expression (miRNA targets); (b) reactive, where an eQTL for mRNA levels leads to changes in miRNA expression (miRNA regulators); and (c) independent, where an eQTL independently drives miRNA and mRNA levels.
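The pairing step can be sketched as a search for miRNA and mRNA traits that each have an eQTL above the LOD threshold on the same chromosome, less than 15 cM apart. The thresholds come from the slide. The data layout (lists of `(chromosome, position_cM, lod)` tuples), the function names, and the trait names are hypothetical.

```python
def has_close_eqtls(hits_a, hits_b, min_lod, max_dist_cm):
    """True if any pair of sufficiently strong eQTLs, one from each trait,
    lies on the same chromosome within max_dist_cm of each other."""
    strong_a = [(c, p) for c, p, lod in hits_a if lod >= min_lod]
    strong_b = [(c, p) for c, p, lod in hits_b if lod >= min_lod]
    return any(ca == cb and abs(pa - pb) < max_dist_cm
               for ca, pa in strong_a for cb, pb in strong_b)

def linked_pairs(mirna_eqtls, mrna_eqtls, min_lod=3.4, max_dist_cm=15.0):
    """All (miRNA, mRNA) trait pairs with closely linked eQTLs."""
    return [(mi, m)
            for mi, mi_hits in mirna_eqtls.items()
            for m, m_hits in mrna_eqtls.items()
            if has_close_eqtls(mi_hits, m_hits, min_lod, max_dist_cm)]

# Hypothetical eQTL tables: trait -> [(chromosome, position in cM, LOD)].
mirna_eqtls = {"miR-a": [("13", 40.0, 5.1)], "miR-b": [("17", 10.0, 2.0)]}
mrna_eqtls = {"gene1": [("13", 48.0, 4.0)],
              "gene2": [("13", 70.0, 6.0)],
              "gene3": [("17", 12.0, 7.0)]}
print(linked_pairs(mirna_eqtls, mrna_eqtls))
```

Each pair returned here would then go on to the causal, reactive, and independent model comparison.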
Inference Method (Schadt et al, 2005)
Data
BXD mice: F2 offspring from C57BL/6J (B6) and DBA/2J (DBA).
C57BL/6J: the ob mutation in the C57BL/6J mouse background (B6-ob/ob) causes obesity, but only mild and transient diabetes (Coleman and Hummel, 1973).
DBA/2J: mice show a low susceptibility to developing atherosclerotic aortic lesions.
Gene expression: liver extracted at 16 months of age; expression of 23,574 genes measured using Agilent arrays.
Genetic loci: 139 autosomal genetic loci (microsatellite markers, 13 cM).
Disease: omental fat pad mass (OFPM) traits (>4).
Model
Models for causality
– Causal Model (M1): L → mRNA → Disease
– Reactive Model (M2): L → Disease → mRNA
– Independent Model (M3): L → mRNA and L → Disease
M1 Likelihood
Causal Model (diagram: L, mRNA, Disease)
– Joint probability
– Likelihood
L: Genotype; R: mRNA level; D: Disease
M2 Likelihood
Reactive Model (diagram: L, mRNA, Disease)
– Joint probability
– Likelihood
L: Genotype; R: mRNA level; D: Disease
M3 Likelihood
Independent Model (diagram: L, Disease, mRNA)
– Joint probability
– Likelihood
L: Genotype; R: mRNA level; D: Disease
Model Selection
Likelihood-based Causality Model Selection (LCMS)
– Calculate the likelihood of each model from the data.
– The model best supported by the data is the one with the smallest AIC (Akaike Information Criterion).
Simulation: simple regression models
Simulation study The model with an AIC significantly smaller than the AICs of the competing models was noted. (Diagram: L, T1)
Application to real data (A) predicted regulators; (B) predicted targets; (C) log ratio of the number of predicted regulators over the number of predicted targets.
Simulations The error in T2 (mRNA) is larger than in T1 (miRNA). microarray data, qPCR while the number of predicted causal regulators of miRNA is likely to be an underestimate of the actual number.
Conclusion eQTL (miRNAs, mRNAs); correlation analysis; hub-nodes; cooperatively; feedback loops; positive correlations between miRNA–mRNAs; Loci -> mRNA -> miRNA