Genetical Genomics in the Mouse


Genetical Genomics in the Mouse: Finding Genes with Microarray Expression Data

Genetical Genomics: Jansen, R.C. and J.P. Nap (2001). Genetical genomics: the added value from segregation. Trends Genet 17(7): 388-91.

Mouse Genetical Genomics: BXD recombinant inbred lines; 21 strains plus parental and F1 genotypes; 508 markers. Traits: forebrain RNA assayed by Affymetrix U74Av2 (PM probe sequences and MM probe sequences); 1 to 4 microarrays per RI line (average 2.5).

QTL mapping by regression: Trait vs genotype association. Genetically determined difference in expressed RNA level, in hybridization of probe sequence, or in competing hybridization. Measured by LRS (likelihood ratio statistic).
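
The trait-versus-genotype regression on this slide can be sketched as follows. This is a minimal illustration, not the presenters' software: it assumes each strain's genotype at a marker is coded -1 or +1 for the two parental alleles, and it uses the standard likelihood ratio form for a normal linear model, LRS = n * ln(RSS_null / RSS_full). The function name `marker_lrs` and the example values are hypothetical.

```python
import math

def marker_lrs(trait, genotype):
    """LRS for a simple linear regression of a trait on one marker's genotype codes."""
    n = len(trait)
    mean_y = sum(trait) / n
    mean_x = sum(genotype) / n
    sxx = sum((x - mean_x) ** 2 for x in genotype)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(genotype, trait))
    rss_null = sum((y - mean_y) ** 2 for y in trait)  # intercept-only model
    if sxx == 0 or rss_null == 0:
        return 0.0  # marker does not segregate, or trait is constant
    rss_full = rss_null - sxy ** 2 / sxx  # residual sum of squares with the marker
    if rss_full <= 0:
        return math.inf  # perfect fit
    return n * math.log(rss_null / rss_full)

# Made-up expression values for six strains and two made-up markers.
trait = [7.9, 8.1, 8.0, 9.0, 9.2, 9.1]
linked = [-1, -1, -1, 1, 1, 1]    # genotype tracks the trait
unlinked = [-1, 1, -1, 1, -1, 1]  # genotype unrelated to the trait

print(marker_lrs(trait, linked), marker_lrs(trait, unlinked))
```

A genome scan repeats this calculation at every marker and keeps the largest LRS.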

BXD Marker Distribution: Distribution of 508 markers on the BXD genome. Marker location (recombination-based map distance) is plotted against marker number across the whole genome, and the most proximal marker on each chromosome is assigned a location of 0. If markers were perfectly evenly distributed, they would form straight parallel lines with no gaps.

Trait Data Preparation: At the probeset level: 12,422 probesets (traits), each with 16 PM and 16 MM probes (oligonucleotides); average the PM-MM differences; log2-transform the average difference; normalize the data of each microarray to a common mean and standard deviation; average replicate microarrays. At the probe level: 400,000 PM and MM probes (cells); log2-transform the cell intensity; normalize and average replicate arrays. Log-transformed data were normalized by subtracting the chip mean from each value, multiplying by (2/chip standard deviation), then adding 8. This gives values with a chip mean of 8.0 and a chip standard deviation of 2. Values from replicate chips were then averaged.
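
The normalization described in the notes (subtract the chip mean, multiply by 2 over the chip standard deviation, add 8) can be written out directly. The sketch below assumes the values are already log2-transformed and uses the sample standard deviation; the slide does not say which standard deviation was used, and the function names are hypothetical.

```python
import math
import statistics

def normalize_chip(log_values, target_mean=8.0, target_sd=2.0):
    """Rescale one chip's log2 values to a chip mean of 8 and a chip SD of 2."""
    chip_mean = statistics.mean(log_values)
    chip_sd = statistics.stdev(log_values)  # sample SD: an assumption
    return [(v - chip_mean) * (target_sd / chip_sd) + target_mean for v in log_values]

def average_replicates(chips):
    """Average normalized values position by position across replicate chips."""
    return [statistics.mean(values) for values in zip(*chips)]

# Two made-up replicate chips with five cells each.
chip_a = normalize_chip([6.0, 7.5, 9.0, 10.5, 12.0])
chip_b = normalize_chip([5.0, 7.0, 8.0, 11.0, 12.5])
print(average_replicates([chip_a, chip_b]))
```

Normalizing every chip to the same mean and spread before averaging keeps a bright or dim replicate from dominating the strain average.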

Multiple testing problem: Two levels of multiple testing: each trait or probe vs 508 loci; 12,422 traits or 400,000 probes. Strategy: an empirical p-value for multiple loci measures the significance of the single best association; the Benjamini-Hochberg procedure for multiple traits or probes may declare many significant associations and assumes at least one significant association. There are two levels of multiple testing in this analysis, and we use a different method for each. To handle the fact that we are testing multiple loci, we choose the single best association from each genome scan and establish its p-value by comparing it with a null distribution generated by a permutation test. This converts the multiple test into a single test of the best association (the maximum LRS) against the appropriate null distribution. To handle the multiple trait tests, we apply a Benjamini-Hochberg procedure to the p-values from each test. This procedure applies a graduated significance threshold: the most significant cases are tested stringently, but as more cases are declared significant, the test becomes more lenient. The Benjamini-Hochberg method has one potential trap: it assumes that at least one case will be declared significant.

Empirical p-value: Measures genome-wide significance; converts the multiple test into a single test of the significance of the best association among all loci. Permutation test for the distribution under the null: up to 10^6 scans with permuted trait values, recording the largest LRS for each permutation. The p-value of the original regression is found from its rank in the null distribution.
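
The permutation scheme on this slide can be sketched as below. It is an illustration under stated assumptions, not the presenters' code: genotypes are coded -1/+1, the LRS is computed from the squared correlation (LRS = -n * ln(1 - r^2), equivalent to the likelihood ratio for a single-marker regression), and the p-value uses the common (exceed + 1) / (n_perm + 1) estimate so that it is never exactly zero. All names and data are hypothetical.

```python
import math
import random

def lrs(trait, genotype):
    """LRS of a single-marker regression, via LRS = -n * ln(1 - r^2)."""
    n = len(trait)
    my = sum(trait) / n
    mx = sum(genotype) / n
    sxx = sum((x - mx) ** 2 for x in genotype)
    syy = sum((y - my) ** 2 for y in trait)
    sxy = sum((x - mx) * (y - my) for x, y in zip(genotype, trait))
    if sxx == 0 or syy == 0:
        return 0.0
    r2 = min(sxy ** 2 / (sxx * syy), 1.0 - 1e-12)  # guard against log(0)
    return -n * math.log(1.0 - r2)

def max_lrs(trait, markers):
    """Best association in one genome scan: the largest LRS over all markers."""
    return max(lrs(trait, g) for g in markers)

def empirical_p(trait, markers, n_perm=500, seed=0):
    """Genome-wide p-value: rank of the observed maximum LRS among permuted scans."""
    rng = random.Random(seed)
    observed = max_lrs(trait, markers)
    shuffled = list(trait)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # break the trait-genotype link
        if max_lrs(shuffled, markers) >= observed:
            exceed += 1
    return (exceed + 1) / (n_perm + 1)

# Made-up data: 12 strains, 3 markers, trait linked to the first marker.
trait = [7.9, 8.1, 8.0, 8.2, 7.8, 8.0, 9.0, 9.2, 9.1, 8.9, 9.3, 9.0]
markers = [[-1] * 6 + [1] * 6, [-1, 1] * 6, [-1, -1, 1, 1] * 3]
print(empirical_p(trait, markers, n_perm=500, seed=1))
```

Permuting the trait values while leaving the genotypes untouched preserves the correlation among markers, so the null distribution of the maximum LRS reflects the real marker map.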

Outliers: Examine the permutation test distribution for bimodality; compare 37th and 95th percentile values; find the outlier and assign it the next most extreme value; redo the permutation test and regression. Among probeset data, about 5% of cases are corrected for an outlier. Among analyses with individual probes, about 12% of cases are corrected, but among individual probes declared significant, the rate is about 4%.
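
Only the replacement step ("find outlier and assign next most extreme value") is illustrated here. The slide's detection rule, based on bimodality of the permutation distribution, is not reproduced; as a stand-in assumption, the sketch treats the value farthest from the median as the outlier. The function name is hypothetical.

```python
import statistics

def replace_most_extreme(values):
    """Copy of values with the one farthest from the median set to the next most extreme value."""
    med = statistics.median(values)
    ordered = sorted(values)
    out = list(values)
    if ordered[-1] - med >= med - ordered[0]:
        # Outlier is on the high side: pull it down to the second-largest value.
        out[out.index(ordered[-1])] = ordered[-2]
    else:
        # Outlier is on the low side: pull it up to the second-smallest value.
        out[out.index(ordered[0])] = ordered[1]
    return out

print(replace_most_extreme([8.0, 8.1, 7.9, 8.2, 14.0]))
```

After the replacement, the regression and its permutation test would be rerun on the corrected values, as the slide describes.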

Benjamini-Hochberg test: Test of 100 uniformly distributed p-values (p-values from non-significant results). P-values are shown as blue dots and the significance threshold for FDR = 0.2 as a red line. An idealized experiment in which 100 cases, none of which are significant, are tested with the Benjamini-Hochberg procedure, controlling the false discovery rate at 20%. The blue dots are the ranked p-values from the 100 cases, and the red line is the significance threshold established by the Benjamini-Hochberg procedure. None of the cases can be declared significant.

Benjamini-Hochberg test: Test of 10 low p-values (significant results) mixed with 90 p-values from non-significant results. P-values are shown as blue dots and the significance threshold for FDR = 0.2 as a red line. Eleven cases are declared significant. An idealized experiment in which 10 cases with significantly low p-values are mixed with 90 cases that are not significant. All cases can be declared significant up to the highest-ranked case that falls below the significance threshold.
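
The two idealized experiments can be reproduced in miniature with the standard Benjamini-Hochberg step-up rule: sort the p-values, compare the p-value at rank i with i * FDR / m, and declare significant every case up to the largest rank that falls at or below the line. The p-values below are made up (evenly spaced rather than random), so the counts need not match the eleven cases on the slide.

```python
def benjamini_hochberg(p_values, fdr=0.2):
    """Return a list of booleans marking the cases declared significant."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * fdr / m:  # the "red line" at this rank
            cutoff_rank = rank             # keep the largest rank under the line
    significant = [False] * m
    for i in order[:cutoff_rank]:
        significant[i] = True
    return significant

# 100 evenly spaced p-values standing in for non-significant results.
null_only = [(i - 0.5) / 100 for i in range(1, 101)]
# 10 very small p-values mixed with 90 evenly spaced ones.
mixed = [k * 1e-4 for k in range(1, 11)] + [(j - 0.5) / 90 for j in range(1, 91)]

print(sum(benjamini_hochberg(null_only)), sum(benjamini_hochberg(mixed)))
```

In the mixed case the threshold rises with each case declared significant, so a few of the smallest null p-values can also fall under the line; this is the false discovery that the procedure tolerates at the chosen rate.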

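Empirical P-value Calculation (flowchart): Marker regression mapping gives the maximum genome-wide LRS. A 500x permutation test follows; at each decision point (?) the case either receives a p-value or passes to the next stage: 5000x permutations, then 50000x permutations, then 1000000x permutations, which gives the final p-value.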
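
The staged scheme in the flowchart can be sketched as a loop that adds permutations only while the p-value is still poorly resolved. The slide does not state the rule used at each "?" decision point, so the stopping rule here (stop once at least `min_exceed` permuted scans reach the observed LRS) is purely an assumption, as are the function name and the `draw_null` callback, which is expected to return the requested number of maximum-LRS values from permuted scans.

```python
def staged_empirical_p(observed, draw_null, stages=(500, 5000, 50000, 1000000), min_exceed=10):
    """Escalate the permutation count stage by stage; return (p_value, permutations_used)."""
    done = 0
    exceed = 0
    for target in stages:
        # Top up to the stage total rather than starting over.
        for value in draw_null(target - done):
            if value >= observed:
                exceed += 1
        done = target
        if exceed >= min_exceed:  # assumed stopping rule, not from the slide
            break
    return (exceed + 1) / (done + 1), done

# Deterministic stand-in for permuted scans: n values spread evenly over [0, 1).
def fake_null(n):
    return [k / n for k in range(n)]

print(staged_empirical_p(0.5, fake_null, stages=(500, 5000)))
print(staged_empirical_p(2.0, fake_null, stages=(500, 5000)))
```

The design saves computation: an unremarkable association is settled after the first stage, and only the strongest associations consume the largest permutation counts.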

Trait-locus associations: Ranked p-values as blue dots (90 smallest from 12,422); significance threshold as red line; cases below the red line are significant for FDR = 0.2; 75 significant trait-locus associations. Sorted p-values from about 12,000 QTL scans with microarray trait data. In the figure, blue dots show p-value plotted against rank. The red line shows the significance threshold established by the Benjamini-Hochberg procedure for a false discovery rate of 20%, declaring 75 trait-locus associations as significant.

Probe-locus associations: Ranked p-values as blue dots (600 smallest from ~400,000); significance threshold as red line; cases below the red line are significant for FDR = 0.2; 576 significant probe-locus associations. Sorted p-values from about 400,000 QTL scans with data from individual microarray cells. In the figure, blue dots show p-value plotted against rank. The red line shows the significance threshold established by the Benjamini-Hochberg procedure for a false discovery rate of 20%, declaring 576 cell-locus associations as significant. P-values up to a rank of 419 are established by 10^6 permutations; beyond that, most p-values are established by 50,000 permutations.

QTLs from MM probes: 576 QTLs defined by single microarray probes: 454 (79%) by PM probes and 122 (21%) by MM probes. The proportion of PM-probe QTLs declines as p-value increases. The 576 QTLs defined by single-cell QTL mapping include both QTLs defined by match cells and QTLs defined by mismatch cells. Overall, 79% of the QTLs are defined by match cells. The figure shows a moving-window average of the proportion of cell QTLs defined by match cells across cases ranked by increasing p-value. The proportion of QTLs defined by match cells is highest for those with the smallest p-values (the most significant associations). The larger deviations from linearity may be artifacts of two facts: the data were sorted both by p-value and by LRS, and p-values were defined by different numbers of permutations. Three lettered regions in the figure (A, B, C) show how these conditions apply. Cases in region A had p-values that were less than 10^-6 and not well defined; in this region cases were sorted by likelihood ratio statistic. Cases in regions A and B had p-values established with 1,000,000 permutations; those in region C were mostly established with 50,000 permutations.
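
The moving-window average described in the notes can be sketched as a running proportion over the ranked cases. The window width used for the original figure is not given, so `window` is a free parameter here, and the input (1 for a PM probe, 0 for an MM probe, in order of increasing p-value) is made up.

```python
def moving_proportion(is_pm, window):
    """Proportion of PM probes in each run of `window` consecutive ranked cases."""
    if window < 1 or window > len(is_pm):
        raise ValueError("window must be between 1 and the number of cases")
    running = sum(is_pm[:window])
    out = [running / window]
    for i in range(window, len(is_pm)):
        running += is_pm[i] - is_pm[i - window]  # slide the window by one case
        out.append(running / window)
    return out

# Made-up ranking: PM probes concentrated among the most significant cases.
ranked = [1, 1, 0, 1, 0, 0]
print(moving_proportion(ranked, window=2))
```

A falling curve corresponds to the pattern reported on the slide: PM probes make up a larger share of the strongest associations.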

QTLs from cell-level mapping: 576 cell-marker associations (QTLs); 339 traits (probesets) represented; most probesets represented by a single probe; rarely, two or more significant probes from the same probeset; all probes from one probeset identify the same locus; 79% of probes are PM.

QTLs from PM cells only: 454 PM cells defining QTLs; 288 traits (probesets) represented: 184 controlled by a location on the same chr, 88 controlled by a location on a different chr, 16 with unknown location for the probeset; 147 locations (marker loci) with nearby QTLs, distributed on all chromosomes.

Probe-locus associations among traits: 339 traits (probesets) with probes identifying significant QTLs; 186 traits represented by a single probe; 2 traits represented by 10 probes.

QTL distribution among marker loci: 147 loci identified by at least one significant probe-locus association. Multiple associations to one locus arise from multiple probes from one probeset or from multiple QTLs near the locus. This chart shows the distribution of QTLs among marker loci, where each marker locus represents the QTLs near it. Of the 508 loci distinguishable in this data set, 147 have at least one QTL nearby and 67 have exactly one QTL nearby.

Profiles of probe sensitivity: Li and Wong reported a year and a half ago that different probes within a probeset differed greatly in their ability to detect the target sequence for which they were designed. Li, C & Wong, WH (2001) Model-based analysis of oligonucleotide arrays: Expression index computation and outlier detection. PNAS 98: 31-36.

Probe profiles (best): LRS vs probe number for probesets with the highest significance in probeset-level mapping. Comparison of probe-level mapping and probeset-level mapping. The four probesets were among the most significant of the 74 declared significant by probeset-level mapping. The charts show LRS for each probe as a function of probe number. Circled probes were significant when tested as single probes. Note that one MM probe has a higher LRS than the corresponding PM probe and that both achieved significance as single probes. For each probeset, significant probes (including MM probes) had regression coefficients of the same sign; coefficients were positive for 93269 and negative for the others. Chart legend: PM, MM.

Probe profiles (worst): LRS vs probe number for probesets with the lowest significant association in probeset-level mapping. Comparison of probe-level mapping and probeset-level mapping. The four probesets were among the least significant of the 74 declared significant by probeset-level mapping. The charts show LRS for each probe as a function of probe number. The LRS scores are generally lower than those on the previous slide, and an MM probe sometimes has a stronger association than the PM probe. None of these probes achieved significant LRS scores when tested as single probes. Chart legend: PM, MM.

Distribution of controlled loci: Distribution across chromosomes of 256 probes detecting a QTL (having a significant association with a marker locus). “Syn” and “Nonsyn” indicate those probes for which the detected QTL is syntenic and nonsyntenic, respectively. The number of probes on each chromosome is normalized to the approximate genetic length of the chromosome, with syntenic and nonsyntenic cases normalized separately. Lengths used, from L. Silver, “Mouse Genetics”, are: 107, 107, 85, 85, 107, 77, 77, 85, 77, 77, 77, 68, 77, 68, 68, 68, 55, 55, 55, 77 for chrs 1 through X, respectively.
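
One way to read the length normalization is as a density relative to the genome-wide average: the count on a chromosome divided by its length, divided again by the total count over the total length, so that 1.0 means "as many as expected for a chromosome of this size". The slide does not give the exact formula, so this scaling is an assumption; the lengths are the ones listed above, and the counts in the example are made up.

```python
CHR_NAMES = [str(i) for i in range(1, 20)] + ["X"]
CHR_LENGTHS = [107, 107, 85, 85, 107, 77, 77, 85, 77, 77,
               77, 68, 77, 68, 68, 68, 55, 55, 55, 77]
LENGTH_BY_CHR = dict(zip(CHR_NAMES, CHR_LENGTHS))

def normalized_frequency(counts):
    """Per-chromosome density relative to the genome-wide density (1.0 = average)."""
    total_count = sum(counts.get(name, 0) for name in CHR_NAMES)
    if total_count == 0:
        raise ValueError("no counts to normalize")
    genome_density = total_count / sum(CHR_LENGTHS)
    return {
        name: (counts.get(name, 0) / LENGTH_BY_CHR[name]) / genome_density
        for name in CHR_NAMES
    }

# Made-up counts: every chromosome at the average density except a threefold excess on chr 9.
example_counts = dict(LENGTH_BY_CHR)
example_counts["9"] = 3 * LENGTH_BY_CHR["9"]
freqs = normalized_frequency(example_counts)
print(round(freqs["9"], 2), round(freqs["1"], 2))
```

Syntenic and nonsyntenic cases would each be passed through the function separately, since the slide states that the two classes were normalized separately.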

Distribution of controlling loci: Distribution across chromosomes of 272 QTLs. Sixteen QTLs for which the probe location is unknown are included in the nonsyntenic class. Frequencies are normalized to chromosome length as on the previous slide. Chr 9 appears to have a higher frequency of nonsyntenic QTLs than any other chromosome. Of the 21 sequences affected by chr 9 QTLs, 18 are distributed across 11 chromosomes and 3 have unknown locations.

Chr 9 QTLs: Unusual number of chr 9 QTLs (22) controlling sequences on other chrs; normalized frequency 3-fold greater than the average chr; many of these QTLs cluster near 2 loci on chr 9.

Acknowledgments: Jintao Wang, Robert W Williams, Ram Varma, Jianxin Wang, Lu Lu, S Shou, Yanhua Qu, Elissa Chesler, John D Mountz, Hui Chen Hsu, David Threadgill, Gene Hwang, Dan Nettleton, Jintao Wang, Ram Varma, Jianxin Wang, Mark Brady, Gene Sobel. U Tennessee, Memphis; Gene Expression Core; Bioinformatics; U Alabama, Birmingham; GOG; U North Carolina; Cornell U; Iowa State U.