A Fine Mapping Theorem to Refine Results from Association Genetics Studies

S.J. Schrodi, V.E. Garcia, C.M. Rowland
Celera, Alameda, CA

Poster panels: THEORETICAL RESULTS; Figure 1. Performance Under Disease Models; Figure 2. Simulation Results; Figure 3. Error Analyses; Use of the Fine Mapping Theorem for Association Studies; Multipoint Determination of the Most Likely Causal Site in a Region; Figure 4. T1D Fine Mapping; Figure 5. Decay of Association for TRAF1 Region in RA

Justification of Fine Mapping Theorem

Following the Fundamental Theorem of the HapMap originally described in Lai et al. (1994), the derivation of the Fine Mapping Theorem directly follows.

ABSTRACT

Much remains to be explained about human genetic architecture and the specific variants underlying important traits such as disease phenotypes, both of which are critical to successful fine mapping following GWAS. High density mapping and inference of susceptibility variants rely heavily on the positional pattern of disease association peaks. In this work we describe the nature of the decay curve of association patterns due to declining LD from a causative site. Under a variety of disease models, we show that the central tendency approximation $\chi^2_M \approx r^2 \chi^2_D$ holds, where $\chi^2_M$ and $\chi^2_D$ are the chi-square association statistics at a marker and the disease-causing site, respectively, and $r^2$ is the standard measure of LD between the two sites. We use the phrase "fine mapping theorem" for this approximation to underscore its potential utility in discovering specific variants underlying traits studied in very high density mapping studies. Monte Carlo simulations were used to characterize the amount of error in the approximation. These results showed that the maximum mean squared error is a concave function of $r^2$, peaking at intermediate levels of $r^2$ across all the disease models screened.
Next, given a potential causative polymorphism and several closely linked sites with disease association data, a method was developed to quantify the departure from the fine mapping theorem. Calculating this departure metric for all SNPs in an associated region gives a measure of correspondence with the fine mapping theorem for each polymorphism, and enables one to determine the most likely disease-causing variants (i.e. those with the smallest departure metric) under a theoretical model (i.e. a single disease-predisposing variant and numerous closely linked markers associated with disease solely through LD). Lastly, we applied these approaches to previously published fine mapping datasets for type 1 diabetes (IL2RA region consisting of 305 SNPs) and rheumatoid arthritis (TRAF1 region consisting of 138 SNPs). In both datasets, single SNPs with the highest correspondence to the theoretical association decay patterns were identified. Conversely, SNPs whose chi-square values deviate from those expected under the theorem may constitute additional susceptibility polymorphisms in the region studied. Similar applications of this fine mapping theorem may prove to be a pragmatic approach to delineating the genes, gene regions, or functional motifs responsible for disease etiology subsequent to initial genetic results from GWAS.

Figure 1. The decay of disease association (as measured by a P-value) at a marker as a function of decreasing LD with the disease-causing site. Three different instances of each of the classic disease models were evaluated. Decay patterns approximately follow a linear relationship between log P and $r^2$.

Figure 2. The figure shows that the central tendency of the chi-square statistic at the marker is closely approximated by the product of the $r^2$ value and the chi-square association statistic at the disease site. The two-locus simulation was performed under Haldane recombination and an additive disease model at one SNP.
The result shows the correlation between two quantities: the product of the standard measure of LD ($r^2$) and the chi-square disease association statistic at the disease site, and the chi-square disease association statistic at the marker.

Two-locus disease models were analytically modeled and simulated under additive, multiplicative, recessive and dominant effects. The results demonstrate the wide-ranging applicability of the fine mapping theorem across disease models and aid in characterizing the error in the approximation.

Figure 5. Combined analyses of P-values across rheumatoid arthritis sample sets (data from Chang et al., 2008) are plotted as a function of LD with rs. If rs were the sole driving force of the association observed in this region, then these data should fall along the theoretical line. Departures from this theoretical result can indicate markers that independently contribute to disease risk. Patterns show that the most likely causative SNPs are found in the TRAF1 gene.

Use of the Fine Mapping Theorem for Association Studies

There are four direct uses of the fine mapping theorem:
1) The fine mapping theorem gives insight into how to select SNPs for fine mapping studies given an initial association finding. Having good LD coverage (i.e. SNPs in varying ranges of LD from the initial hit) is a key feature of fine mapping coverage.
2) The fine mapping theorem graphically shows which markers are good candidates for association tests of conditional independence, which identify markers in the region that are independently associated with disease status.
3) The fine mapping theorem enables one to directly calculate the minimum LD with a causative site that is required to detect association at a marker.
4) The fine mapping theorem provides a framework to test for the best-fit causative marker given all of the genotyping data in a region (see below).

Multipoint Determination of the Most Likely Causal Site in a Region

An analytic/computational method has been developed to test the fit of the decay of association with decreasing LD for every marker in a fine mapped region against the theoretical prediction from the fine mapping theorem.
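To make uses 3) and 4) concrete, here is a minimal sketch in Python. It is not the poster's implementation. It assumes 1-df chi-square tests, measures deviations on the chi-square scale (the poster does not say which scale its measure uses), and takes the pairwise LD matrix as given. All function names (`chi2_threshold_1df`, `min_r2_for_detection`, `departure`, `rank_candidates`) and the toy numbers are hypothetical.

```python
from statistics import NormalDist


def chi2_threshold_1df(alpha):
    """Critical value of a 1-df chi-square test at level alpha (squared normal quantile)."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)
    return z * z


def min_r2_for_detection(chi2_disease, alpha):
    """Smallest r^2 for which the predicted marker statistic r^2 * chi2_disease
    still reaches the significance threshold (assumption: 1-df test)."""
    if chi2_disease <= 0:
        raise ValueError("chi2_disease must be positive")
    return min(1.0, chi2_threshold_1df(alpha) / chi2_disease)


def departure(candidate, chi2, r2):
    """Sum of squared deviations between the observed chi-squares and the
    values predicted if `candidate` were the only causal site.

    chi2: list of observed association statistics, one per SNP.
    r2:   square matrix, r2[i][j] = LD between SNP i and SNP j.
    """
    predicted = [r2[candidate][j] * chi2[candidate] for j in range(len(chi2))]
    return sum((obs - pred) ** 2 for obs, pred in zip(chi2, predicted))


def rank_candidates(chi2, r2):
    """SNP indices ordered from smallest to largest departure."""
    return sorted(range(len(chi2)), key=lambda i: departure(i, chi2, r2))


if __name__ == "__main__":
    # Toy region (made-up numbers): SNP 0 is causal and the others follow the
    # approximation exactly, so SNP 0 should have zero departure.
    chi2 = [40.0, 32.0, 20.0, 4.0]
    r2 = [
        [1.0, 0.8, 0.5, 0.1],
        [0.8, 1.0, 0.4, 0.1],
        [0.5, 0.4, 1.0, 0.2],
        [0.1, 0.1, 0.2, 1.0],
    ]
    print("ranking by departure:", rank_candidates(chi2, r2))
    print("minimum r^2 at alpha = 0.05:", min_r2_for_detection(chi2[0], 0.05))
```

In this sketch a smaller departure means a better fit to the single-causal-site model. A SNP with a large residual against the best-fitting candidate would be a natural choice for a conditional test of independent association.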
The best-fitting marker is the most likely causative site under the model constructed. The initial measure used is the sum of squared deviations from the decay pattern predicted by the model, so the best-fitting marker is the one with the smallest sum. A Bayesian method is currently under development.

CONCLUSIONS

In this poster we have presented a simple theoretical result that could potentially aid fine mapping efforts to refine association signals in a region of linkage disequilibrium. While this theorem is an approximation, it nonetheless provides an elementary, multipoint association pattern expected under a basic disease model with one causal site and a closely linked set of associated markers. Use of the theorem enables higher-powered analyses to detect causative sites as well as markers with independent effects.

REFERENCES

1. Schrodi SJ. A Fine Mapping Theorem. Manuscript in preparation.
2. Lai C, Lyman RF, Long AD, Langley CH, Mackay TF (1994) Science 266:
3. Pritchard JK, Przeworski M (2001) AJHG 69:
4. Schrodi SJ, Garcia VE, Rowland C, Jones HB (2007) EJHG 15:
5. Lowe CE, Cooper JD, Brusko T, et al. (2007) Nat Genet 39:
6. Chang M, Rowland CM, Garcia VE, et al. (2008) PLoS Genet 4(6):e

EMPIRICAL RESULTS

Figure 4. Analysis results from the Lowe et al. (2007) data at IL2RA-linked markers. The T1D P-values for 305 SNPs were plotted as a function of LD with the most significant SNP. The diagonal line is the prediction made by the fine mapping theorem approximation. The circled SNP is rs, which deviates significantly from the expected pattern. In addition, this analysis shows how information is borrowed from neighboring sites to further support association at the top left marker.

THEORETICAL RESULTS

Figure 3. The mean squared error between the left-hand and right-hand sides of the approximation is presented as a function of LD under simulated data. Two disease models of small effect and 2-5% disease prevalence at the disease site were employed.
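The error analysis described in the poster (mean squared error between the two sides of the approximation as a function of LD, under simulated data) can be mimicked with a small Monte Carlo experiment. The sketch below is not the authors' simulation. It assumes a multiplicative allelic model, population controls, equal allele frequencies at the two sites (so every $r^2$ in [0, 1] is attainable), an allelic 2x2 chi-square test, and the population value of $r^2$ rather than a sample estimate. All function names and parameter defaults are hypothetical.

```python
import math
import random
from collections import Counter

# Haplotypes as (disease-site allele, marker allele); allele 1 is the risk/linked allele.
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]


def haplotype_freqs(p_d, p_m, r2):
    """Population haplotype frequencies for positive LD of strength r2."""
    d = math.sqrt(r2 * p_d * (1 - p_d) * p_m * (1 - p_m))
    freqs = [p_d * p_m + d, p_d * (1 - p_m) - d,
             (1 - p_d) * p_m - d, (1 - p_d) * (1 - p_m) + d]
    return [max(0.0, f) for f in freqs]  # guard against rounding just below zero


def case_freqs(freqs, rel_risk):
    """Haplotype frequencies in cases under a multiplicative allelic model."""
    weighted = [f * (rel_risk if hap[0] == 1 else 1.0)
                for f, hap in zip(freqs, HAPLOTYPES)]
    total = sum(weighted)
    return [w / total for w in weighted]


def allele_counts(freqs, n_hap, rng):
    """Draw n_hap haplotypes; return counts of allele 1 at (disease site, marker)."""
    draws = Counter(rng.choices(range(4), weights=freqs, k=n_hap))
    return draws[0] + draws[1], draws[0] + draws[2]


def allelic_chi2(a, n_case, c, n_ctrl):
    """1-df chi-square for a 2x2 table of allele counts (cases vs controls)."""
    b, d = n_case - a, n_ctrl - c
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        return 0.0
    return (a + b + c + d) * (a * d - b * c) ** 2 / denom


def simulate(r2, p_d=0.3, p_m=0.3, rel_risk=1.3, n_hap=2000, reps=200, seed=1):
    """Return (mean chi2_M, mean r2 * chi2_D, mean squared error) over replicates."""
    rng = random.Random(seed)
    pop = haplotype_freqs(p_d, p_m, r2)
    cases = case_freqs(pop, rel_risk)
    sum_m = sum_pred = sum_sq = 0.0
    for _ in range(reps):
        case_d, case_m = allele_counts(cases, n_hap, rng)
        ctrl_d, ctrl_m = allele_counts(pop, n_hap, rng)
        chi2_d = allelic_chi2(case_d, n_hap, ctrl_d, n_hap)
        chi2_m = allelic_chi2(case_m, n_hap, ctrl_m, n_hap)
        pred = r2 * chi2_d  # right-hand side of the approximation
        sum_m += chi2_m
        sum_pred += pred
        sum_sq += (chi2_m - pred) ** 2
    return sum_m / reps, sum_pred / reps, sum_sq / reps


if __name__ == "__main__":
    for r2 in (0.0, 0.25, 0.5, 0.75, 1.0):
        mean_m, mean_pred, mse = simulate(r2)
        print(f"r2={r2:.2f}  mean chi2_M={mean_m:.2f}  "
              f"mean r2*chi2_D={mean_pred:.2f}  MSE={mse:.2f}")
```

Comparing the two means column by column gives a quick check of the central tendency claim, and the MSE column shows how the error changes with $r^2$. Varying `rel_risk`, `p_d` and `n_hap` is a simple way to probe other effect sizes and sample sizes.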