Characterizing the role of miRNAs within gene regulatory networks using integrative genomics techniques
Min Wenwen, 2012.04.20


Background: eQTL

Expression quantitative trait loci (eQTLs). Nat. Rev. Cardiol. doi: /nrcardio

Motivation
- Previous studies examined the relationship between pairs of correlated quantitative traits, such as mRNA and clinical phenotypes (Mehrabian et al, 2005; Schadt et al, 2005; Yang et al, 2009).
- We applied a variation of a previously described statistical procedure (Schadt et al, 2005) to identify mRNAs that respond to changes in miRNA expression levels (miRNA targets), as well as mRNAs that perturb expression levels of miRNAs.

Summary
- Integrative genomics and genetics approaches have proven useful in elucidating the complex relationships often found in gene regulatory networks.
- Our analysis reveals that the transcript abundances of miRNAs are subject to regulatory control by many more loci than previously observed for mRNA expression.

Our results:
- miRNAs exist as highly connected hub-nodes and function as key sensors within the transcriptional network.
- miRNAs can act cooperatively or redundantly to regulate a given pathway.
- miRNAs play a subtle role by dampening expression of their target genes through the use of feedback loops.

Idea and data
This approach leverages DNA sequence variation as a causal anchor to identify the best-fitting model that describes the relationship between pairs of traits (miRNA, mRNA) that are linked to the same genetic locus.
- Using an F2 mouse cross, we collected both mRNA expression and genotype information from liver.
- Traits: mRNA transcripts and 183 miRNA transcripts.
- From the panel of 5000 SNP markers, 2804 markers informative for the BXD cross and evenly spaced across all chromosomes, excluding the Y chromosome, were selected for use in all analyses.
- Markers: MSB2011.SI\msb s5.xls

Methods
① Linkage analysis techniques were applied to infer regulatory relationships between DNA loci and the two classes of expression traits, that is, mRNAs and miRNAs.
② miRNA–mRNA relationships were characterized using a simple correlation analysis.
③ A variation of a previously developed statistical inference technique was applied to infer regulatory relationships between mRNAs and miRNAs.

mRNA and miRNA eQTL mapping in the BXD mouse study
Figure 1. Top 15 microRNA expression quantitative trait loci (eQTL) plots. The x-axis represents genomic coordinates in base pairs.
Using standard parametric linkage analysis techniques, we treated the expression levels of both mRNAs and miRNAs as quantitative traits to identify regulatory loci, generally referred to as expression quantitative trait loci (eQTLs).
LOD score: LOD = Z = log10(probability of birth sequence with a given linkage value / probability of birth sequence with no linkage)
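As a hedged illustration of the LOD definition above, the sketch below computes a classical two-point LOD score from recombinant counts. The binomial likelihood, the function name `lod_score`, and the example counts are assumptions made for illustration; eQTL mapping software typically derives LOD scores from a regression of the expression trait on genotype rather than from counted recombinants.

```python
import math

def lod_score(recombinants, meioses, theta):
    """log10 likelihood ratio: linkage at recombination fraction theta vs. no linkage (theta = 0.5)."""
    if not 0 < theta <= 0.5:
        raise ValueError("theta must lie in (0, 0.5]")
    if not 0 <= recombinants <= meioses:
        raise ValueError("recombinants must lie between 0 and meioses")
    non_recombinants = meioses - recombinants
    # Binomial likelihood under linkage, on the log10 scale.
    log_linked = recombinants * math.log10(theta) + non_recombinants * math.log10(1 - theta)
    # Under no linkage every meiosis is recombinant with probability 0.5.
    log_unlinked = meioses * math.log10(0.5)
    return log_linked - log_unlinked

# Hypothetical example: 1 recombinant in 10 informative meioses, evaluated at theta = 0.1.
example_lod = lod_score(recombinants=1, meioses=10, theta=0.1)
```

A positive value favors linkage at the tested recombination fraction, and the score is zero when theta equals 0.5.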

mRNA and miRNA eQTL mapping in the BXD mouse study
- In contrast, we identified 5293 eQTLs for 5107 of the mRNA transcripts (~13%) at a LOD score threshold of >4.9 (corresponding to an FDR <5%).
- Of these, 2712 (or 37%) were cis eQTLs.
- Thus, by percentage, at the 10% FDR threshold, more than three times as many mRNA eQTLs were detected as for the miRNA expression traits.

Decrease the FDR of detecting miRNA eQTLs
- For each miRNA, we identified a set of mRNA expression traits that contained at least one hexamer seed region within the 3’ UTR.
- These gene sets were then filtered to contain only genes that were significantly negatively correlated with the corresponding miRNA.
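The two filtering steps above can be sketched as follows. This is a minimal illustration under stated assumptions: the seed is taken as miRNA nucleotides 2-7, a target site is the reverse complement of that seed in the 3’ UTR, and a negative Pearson coefficient stands in for the significance test that the slides mention but do not specify. The function names and the toy data are hypothetical.

```python
import math

COMPLEMENT = {"A": "U", "U": "A", "G": "C", "C": "G"}

def seed_site(mirna):
    """Return the 6-mer an mRNA must contain to pair with miRNA nucleotides 2-7."""
    seed = mirna[1:7]
    return "".join(COMPLEMENT[base] for base in reversed(seed))

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length numeric lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

def candidate_targets(mirna_seq, mirna_expr, genes):
    """Keep genes whose 3' UTR has a seed site and whose expression falls as the miRNA rises.

    genes maps a gene name to (3' UTR RNA sequence, expression values).
    """
    site = seed_site(mirna_seq)
    return [name for name, (utr, expr) in genes.items()
            if site in utr and pearson(mirna_expr, expr) < 0]

# Toy data (hypothetical): let-7a sequence and three made-up genes.
let7a = "UGAGGUAGUAGGUUGUAUAGUU"
toy_genes = {
    "geneA": ("AAUACCUCAA", [3.0, 2.0, 1.0]),  # seed site, negative correlation
    "geneB": ("AAUACCUCAA", [1.0, 2.0, 3.0]),  # seed site, positive correlation
    "geneC": ("AAAAAAAAAA", [3.0, 2.0, 1.0]),  # no seed site
}
targets = candidate_targets(let7a, [1.0, 2.0, 3.0], toy_genes)
```

Restricting the gene set in this way reduces the number of traits tested per miRNA, which is the stated route to a lower FDR.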

Decrease the FDR of detecting miRNA eQTLs
Figure 2. Detection thresholds of miRNA eQTLs. (B) Illustration of the procedure used to increase the statistical power of detecting miRNA eQTLs. The procedure uses messenger RNA expression traits that were negatively correlated with a given miRNA and contained at least one corresponding hexamer seed region in the 3’ UTR. (C) False-discovery rates (FDRs) as a function of LOD score threshold for miRNA eQTLs.

We next sought to determine if there were key loci involved in regulating many miRNAs.

Distribution of eQTLs for mRNA and miRNA
Supplementary figure 1. Distribution of eQTLs in 2 and 20 cM bins for mRNA and miRNA eQTLs, respectively, across the genome at a 10% FDR threshold. The top panel illustrates all mRNA eQTLs with LOD scores > 4.3, while the bottom panel illustrates all miRNA eQTLs above their LOD score threshold.

Key loci regulating many miRNAs and mRNAs
- We identified a strong eQTL hotspot on chr 13 and a weaker hotspot on chr 17.
- Of the 72 eQTLs identified, 42% mapped to chr 13, suggesting the presence of a key regulator influencing the expression levels of many miRNAs.
- Overall, we detected seven mRNA eQTL hotspots, where each hotspot is defined to comprise 41% of the total number of eQTLs (computed using a Poisson distribution with mean 9.52).
- These hotspots localize to chr 2, 4, 7, 9, 12, 13, and 17.
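One way to read the Poisson-based hotspot definition is as a tail-probability test on the number of eQTLs per bin. The sketch below is only an illustration of that idea: the mean of 9.52 comes from the slide, while the significance level `alpha`, the function names, and the rule "smallest count whose tail probability falls below alpha" are assumptions, not the procedure confirmed by the source.

```python
import math

def poisson_pmf(k, mean):
    """Poisson probability of exactly k events, computed on the log scale for stability."""
    return math.exp(-mean + k * math.log(mean) - math.lgamma(k + 1))

def poisson_tail(k, mean, extra_terms=500):
    """P(X >= k), summed directly over the upper tail to avoid cancellation in 1 - CDF."""
    return min(1.0, sum(poisson_pmf(i, mean) for i in range(k, k + extra_terms)))

def hotspot_threshold(mean, alpha):
    """Smallest per-bin eQTL count whose Poisson tail probability is below alpha."""
    k = 0
    while poisson_tail(k, mean) >= alpha:
        k += 1
    return k

# Hypothetical significance level; the slide does not state the one actually used.
min_count = hotspot_threshold(mean=9.52, alpha=1e-6)
```

Bins holding at least `min_count` eQTLs would then be flagged as hotspots under this reading.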

Overlapping eQTLs for miRNAs and mRNAs
- To better compare the location of miRNA eQTL hotspots to mRNA eQTL hotspots, we recomputed the probabilities of an miRNA eQTL hotspot using 2 cM bins (1 cM is approximately 1000 kb).
- The eQTL hotspots for miRNAs and mRNAs on chromosome 13 are <4 cM apart.

Correlation analysis between miRNA and mRNA expression levels in mice
- Traits: mRNAs and 183 miRNAs.
- We identified miRNA–mRNA trait pairs that were significantly correlated at an FDR of 0.1% (P-value < 3.98e-4).
- A number of miRNAs (hub-nodes) were very broadly connected to tens of thousands of mRNAs.
- Per miRNA: ~2545 correlated mRNA transcripts.
- Every miRNA was correlated with at least one mRNA transcript.
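The slides report an FDR cutoff for the correlation P-values without naming the procedure. As a hedged illustration of how a P-value cutoff can follow from an FDR level, here is a minimal Benjamini-Hochberg sketch. The choice of Benjamini-Hochberg, the function name, and the toy P-values are assumptions and may differ from what the study used.

```python
def benjamini_hochberg(p_values, fdr):
    """Return one boolean per test: True if the test passes at the given FDR level."""
    m = len(p_values)
    # Indices of the tests, ordered from smallest to largest P-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        # Step-up rule: remember the largest rank whose P-value is under rank/m * fdr.
        if p_values[idx] <= rank / m * fdr:
            cutoff_rank = rank
    significant = [False] * m
    for idx in order[:cutoff_rank]:
        significant[idx] = True
    return significant

# Toy P-values (hypothetical), tested at a 5% FDR.
toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
calls = benjamini_hochberg(toy_p, fdr=0.05)
```

The largest P-value that still passes acts as the effective P-value threshold for the chosen FDR.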

miRNA signature set: compute the seed enrichment levels for each set.

The distribution of seed enrichment
- Distribution of seed enrichment using the full miRNA–mRNA correlation results.
- Distribution of seed enrichment using only positive correlations between miRNAs and mRNAs.
- Distribution of seed enrichment using only negative correlations between miRNAs and mRNAs.
Feedback loops

Enrichment analysis using GO and KEGG
Supplementary figure 3. Summary of miRNA-mRNA correlation analysis.
A. Illustration of the enrichment analysis. Fisher's exact test statistics are computed for all pairwise comparisons between each miRNA signature set and each category in GO Biological Process. P-values are corrected for multiple hypothesis testing using a Bonferroni correction. The same analysis is repeated using sets in the KEGG Pathways and Body Atlas Tissue Enrichment databases. Significant enrichment between sets is defined by a cutoff on the corrected P-value.
B. Histogram showing the top 10 GO Biological Process categories in terms of number of enriched miRNA signature sets.
C. Histogram showing the number of enriched miRNA signature sets in each KEGG pathway category.
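The enrichment test described in panel A can be sketched as a one-sided Fisher's exact test on the overlap between a signature set and an annotation category, followed by a Bonferroni correction. The function names, the one-sided formulation, and the toy numbers are assumptions for illustration only.

```python
import math

def enrichment_p_value(overlap, set_size, category_size, universe_size):
    """One-sided Fisher's exact (hypergeometric) P-value for at least `overlap` shared genes."""
    total = math.comb(universe_size, set_size)
    upper = min(set_size, category_size)
    tail = sum(
        math.comb(category_size, k) * math.comb(universe_size - category_size, set_size - k)
        for k in range(overlap, upper + 1)
    )
    return tail / total

def bonferroni(p_value, n_tests):
    """Bonferroni-corrected P-value, capped at 1."""
    return min(1.0, p_value * n_tests)

# Toy example (hypothetical): a 5-gene signature set fully contained in a 5-gene
# category, drawn from a universe of 20 genes, with 100 set-category comparisons.
raw_p = enrichment_p_value(overlap=5, set_size=5, category_size=5, universe_size=20)
corrected_p = bonferroni(raw_p, n_tests=100)
```

With many signature sets and many categories, `n_tests` is the number of set-category pairs compared.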

We opted to annotate the miRNA signature sets using only genes that contained at least one 6mer seed region in the 3’ UTR.

Causal associations between miRNAs and mRNAs
- First, we identified all miRNA and mRNA trait pairs linked to a common genomic region at a LOD score threshold of 3.4.
- Next, we identified miRNA–mRNA trait pairs with closely linked eQTLs (<15 cM).

Causal inference:
- (a) causal, where an eQTL for miRNA expression leads to changes in mRNA expression (miRNA targets);
- (b) reactive, where an eQTL for mRNA levels leads to changes in miRNA expression (miRNA regulators); and
- (c) independent, where an eQTL independently drives miRNA and mRNA levels.

Inference method (Schadt et al, 2005)

Data
BXD mice: F2 offspring from C57BL/6J (B6) and DBA/2J (DBA).
- C57BL/6J: the ob mutation in the C57BL/6J mouse background (B6-ob/ob) causes obesity, but only mild and transient diabetes (Coleman and Hummel, 1973).
- DBA/2J: mice show a low susceptibility to developing atherosclerotic aortic lesions.
Gene expression: liver extracted at 16 months of age; expression of 23,574 genes measured using Agilent arrays.
Genetic loci: 139 autosomal genetic loci (microsatellite markers, 13 cM).
Disease: omental fat pad mass (OFPM) traits (>4).

Model

Models for causality
- Causal model (M1): L → mRNA → Disease
- Reactive model (M2): L → Disease → mRNA
- Independent model (M3): L → mRNA and L → Disease

M1 likelihood
Causal model (L → mRNA → Disease): joint probability and likelihood.
L: genotype; R: mRNA level; D: disease.

M2 likelihood
Reactive model (L → Disease → mRNA): joint probability and likelihood.
L: genotype; R: mRNA level; D: disease.

M3 likelihood
Independent model (L → mRNA and L → Disease): joint probability and likelihood.
L: genotype; R: mRNA level; D: disease.
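The formulas on these three slides did not survive extraction. As a hedged reconstruction, the joint probabilities implied by the three graphs can be written with the slide's own symbols (L: genotype, R: mRNA level, D: disease). The M3 line uses the simplest reading of "independent", in which the locus drives both traits separately; published formulations may instead condition D on both L and R to allow residual correlation between the traits. The symbols \theta_M, k_M, and n are introduced here for illustration.

```latex
\begin{align}
M_1:\quad P(L, R, D) &= P(L)\,P(R \mid L)\,P(D \mid R) \\
M_2:\quad P(L, R, D) &= P(L)\,P(D \mid L)\,P(R \mid D) \\
M_3:\quad P(L, R, D) &= P(L)\,P(R \mid L)\,P(D \mid L)
\end{align}
% Likelihood of model M over n individuals, and the AIC used to compare models
% (k_M is the number of free parameters of model M):
\begin{align}
\mathcal{L}(\theta_M) &= \prod_{i=1}^{n} P(L_i, R_i, D_i \mid \theta_M) \\
\mathrm{AIC}_M &= 2 k_M - 2 \ln \mathcal{L}(\hat{\theta}_M)
\end{align}
```

Because the factor P(L) appears in all three models, it cancels when the models are compared.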

Model selection
Likelihood-based Causality Model Selection (LCMS)
- Calculate the likelihood of each model based on the data.
- The model best supported by the data is the one with the smallest AIC (Akaike Information Criterion).

Simulation: simple regression models

Simulation study
The model with an AIC significantly smaller than the AICs of the competing models was noted.
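A small simulation in the spirit of this slide can be sketched as follows. It is a minimal illustration under stated assumptions, not the study's simulation: data are generated from the causal chain L → T1 → T2 with unit-variance Gaussian noise, each conditional distribution is a simple linear regression, the independent model is taken as T1 and T2 each regressed on L, and the model with the smallest AIC is picked without testing whether the difference is significant. All names and settings are hypothetical.

```python
import math
import random

def regression_loglik(x, y):
    """Maximized Gaussian log-likelihood of the simple linear regression of y on x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((yi - intercept - slope * xi) ** 2 for xi, yi in zip(x, y))
    sigma2 = rss / n  # maximum-likelihood estimate of the residual variance
    return -0.5 * n * (math.log(2 * math.pi * sigma2) + 1)

def aic(loglik, n_params):
    """Akaike Information Criterion: smaller is better."""
    return 2 * n_params - 2 * loglik

def compare_models(genotype, t1, t2):
    """AIC of the causal, reactive, and independent models (the shared P(L) term is omitted)."""
    n_params = 6  # two regressions, each with intercept, slope, and variance
    logliks = {
        "causal": regression_loglik(genotype, t1) + regression_loglik(t1, t2),
        "reactive": regression_loglik(genotype, t2) + regression_loglik(t2, t1),
        "independent": regression_loglik(genotype, t1) + regression_loglik(genotype, t2),
    }
    return {name: aic(ll, n_params) for name, ll in logliks.items()}

# Simulate an F2-like locus (genotypes 0, 1, 2 in a 1:2:1 ratio) acting through T1 on T2.
rng = random.Random(0)
L = [rng.choice([0, 1, 1, 2]) for _ in range(500)]
t1 = [g + rng.gauss(0, 1) for g in L]
t2 = [v + rng.gauss(0, 1) for v in t1]

scores = compare_models(L, t1, t2)
best = min(scores, key=scores.get)
```

Repeating the simulation with larger noise on one trait shows how measurement error can shift which model attains the smallest AIC.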

Application to real data
(A) Predicted regulators; (B) predicted targets; (C) log ratio of the number of predicted regulators over the number of predicted targets.

Simulations
- The error in T2 (mRNA) is larger than in T1 (miRNA) (microarray data, qPCR).
- The number of predicted causal regulators of miRNAs is likely to be an underestimate of the actual number.

Conclusion
- eQTLs (miRNAs, mRNAs)
- Correlation analysis
- Hub-nodes
- miRNAs act cooperatively
- Feedback loops
- Positive correlations between miRNAs and mRNAs
- Loci → mRNA → miRNA
