Data Mining Applied to Linkage Disequilibrium Mapping

Slides:



Advertisements
Similar presentations
Functional Analysis of the Neurofibromatosis Type 2 Protein by Means of Disease- Causing Point Mutations Renee P. Stokowski, David R. Cox The American.
Advertisements

Association analysis Genetics for Computer Scientists Biomedicum & Department of Computer Science, Helsinki Päivi Onkamo.
1 TreeDT:Gene Mapping by Tree Disequilibrium Test Author:Pettri Sevon Dept. of computer science & Finnish Genome center. Univ. of Helsinki Hannu T.T. Toivonen.
Sofia A. Oliveira, Yi-Ju Li, Maher A
Extensive Linkage Disequilibrium in Small Human Populations in Eurasia
Itsik Pe’er, Yves R. Chretien, Paul I. W. de Bakker, Jeffrey C
Daniel J. Schaid, Charles M. Rowland, David E. Tines, Robert M
Claudio Verzilli, Tina Shah, Juan P
Comparison of Microsatellites Versus Single-Nucleotide Polymorphisms in a Genome Linkage Screen for Prostate Cancer–Susceptibility Loci  Daniel J. Schaid,
D. E. Grice, K. A. Halmi, M. M. Fichter, M. Strober, D. B. Woodside, J
Soraya Beiraghi, Swapan K. Nath, Matthew Gaines, Desh D
Genomewide Linkage Scan Identifies a Novel Susceptibility Locus for Restless Legs Syndrome on Chromosome 9p  Shenghan Chen, William G. Ondo, Shaoqi Rao,
Genetic Linkage Analysis of a Dichotomous Trait Incorporating a Tightly Linked Quantitative Trait in Affected Sib Pairs  Jian Huang, Yanming Jiang  The.
Integrating Gene Expression with Summary Association Statistics to Identify Genes Associated with 30 Complex Traits  Nicholas Mancuso, Huwenbo Shi, Pagé.
Comparing Algorithms for Genotype Imputation
Meta-analysis of Genetic-Linkage Analysis of Quantitative-Trait Loci
Haplotype Estimation Using Sequencing Reads
A Combined Linkage-Physical Map of the Human Genome
Caroline Durrant, Krina T. Zondervan, Lon R
David H. Spencer, Kerry L. Bubb, Maynard V. Olson 
Accuracy of Haplotype Frequency Estimation for Biallelic Loci, via the Expectation- Maximization Algorithm for Unphased Diploid Genotype Data  Daniele.
QTL Fine Mapping by Measuring and Testing for Hardy-Weinberg and Linkage Disequilibrium at a Series of Linked Marker Loci in Extreme Samples of Populations 
Rounak Dey, Ellen M. Schmidt, Goncalo R. Abecasis, Seunggeun Lee 
Dissecting the Genetic Complexity of the Association between Human Leukocyte Antigens and Rheumatoid Arthritis  Damini Jawaheer, Wentian Li, Robert R.
Gene-Expression Variation Within and Among Human Populations
Arpita Ghosh, Fei Zou, Fred A. Wright 
Highly Significant Linkage to the SLI1 Locus in an Expanded Sample of Individuals Affected by Specific Language Impairment    The American Journal of.
Jingjing Li, Xiumei Hong, Sam Mesiano, Louis J
Reduction of Sample Heterogeneity through Use of Population Substructure: An Example from a Population of African American Families with Sarcoidosis 
Genomic Signatures of Selective Pressures and Introgression from Archaic Hominins at Human Innate Immunity Genes  Matthieu Deschamps, Guillaume Laval,
A Flexible Bayesian Framework for Modeling Haplotype Association with Disease, Allowing for Dominance Effects of the Underlying Causative Variants  Andrew.
Towfique Raj, Manik Kuchroo, Joseph M
A Selection Operator for Summary Association Statistics Reveals Allelic Heterogeneity of Complex Traits  Zheng Ning, Youngjo Lee, Peter K. Joshi, James.
Genetic Investigations of Kidney Disease: Core Curriculum 2013
Random-Effects Model Aimed at Discovering Associations in Meta-Analysis of Genome- wide Association Studies  Buhm Han, Eleazar Eskin  The American Journal.
Lue Ping Zhao, Ross Prentice, Fumin Shen, Li Hsu 
Are Rare Variants Responsible for Susceptibility to Complex Diseases?
Studying Gene and Gene-Environment Effects of Uncommon and Common Variants on Continuous Traits: A Marker-Set Approach Using Gene-Trait Similarity Regression 
Genes, Demography, and Life Span: The Contribution of Demographic Data in Genetic Studies on Aging and Longevity  A.I. Yashin, G. De Benedictis, J.W.
Christoph Lange, Nan M. Laird  The American Journal of Human Genetics 
Jon Wakefield  The American Journal of Human Genetics 
Multipoint Approximations of Identity-by-Descent Probabilities for Accurate Linkage Analysis of Distantly Related Individuals  Cornelis A. Albers, Jim.
Accurate Non-parametric Estimation of Recent Effective Population Size from Segments of Identity by Descent  Sharon R. Browning, Brian L. Browning  The.
Increased Level of Linkage Disequilibrium in Rural Compared with Urban Communities: A Factor to Consider in Association-Study Design  Veronique Vitart,
Molecular Analysis of the β-Globin Gene Cluster in the Niokholo Mandenka Population Reveals a Recent Origin of the βS Senegal Mutation  Mathias Currat,
Daniel J. Schaid, Shannon K. McDonnell, Scott J. Hebbring, Julie M
D. E. Grice, K. A. Halmi, M. M. Fichter, M. Strober, D. B. Woodside, J
A Unified Approach to Genotype Imputation and Haplotype-Phase Inference for Large Data Sets of Trios and Unrelated Individuals  Brian L. Browning, Sharon.
Gad Kimmel, Ron Shamir  The American Journal of Human Genetics 
Pei-Lung Chen, Dimitrios Avramopoulos, Virginia K. Lasseter, John A
Douglas F. Levinson, Matthew D. Levinson, Ricardo Segurado, Cathryn M
A Fast, Powerful Method for Detecting Identity by Descent
Nonpaternity in Linkage Studies of Extremely Discordant Sib Pairs
Population Structure in Admixed Populations: Effect of Admixture Dynamics on the Pattern of Linkage Disequilibrium  C.L. Pfaff, E.J. Parra, C. Bonilla,
Extensive Linkage Disequilibrium in Small Human Populations in Eurasia
Xiaoquan Wen, Yeji Lee, Francesca Luca, Roger Pique-Regi 
L-GATOR: Genetic Association Testing for a Longitudinally Measured Quantitative Trait in Samples with Related Individuals  Xiaowei Wu, Mary Sara McPeek 
Xiang Wan, Can Yang, Qiang Yang, Hong Xue, Xiaodan Fan, Nelson L. S
Yu Zhang, Tianhua Niu, Jun S. Liu 
Jung-Ying Tzeng, Chih-Hao Wang, Jau-Tsuen Kao, Chuhsing Kate Hsiao 
Tao Wang, Robert C. Elston  The American Journal of Human Genetics 
Whole-Genome Scan, in a Complex Disease, Using 11,245 Single-Nucleotide Polymorphisms: Comparison with Microsatellites  Sally John, Neil Shephard, Guoying.
Simone Sanna-Cherchi, Gianluca Caridi, Patricia L
Enhanced Localization of Genetic Samples through Linkage-Disequilibrium Correction  Yael Baran, Inés Quintela, Ángel Carracedo, Bogdan Pasaniuc, Eran Halperin 
Genotype-Imputation Accuracy across Worldwide Human Populations
Genetic Linkage Analysis of a Dichotomous Trait Incorporating a Tightly Linked Quantitative Trait in Affected Sib Pairs  Jian Huang, Yanming Jiang  The.
Jung-Ying Tzeng, Daowen Zhang  The American Journal of Human Genetics 
Gonçalo R. Abecasis, Janis E. Wigginton 
B. Yuan, R. Neuman, S. H. Duan, J. L. Weber, P. Y. Kwok, N. L
Bruce Rannala, Jeff P. Reeve  The American Journal of Human Genetics 
Presentation transcript:

Data Mining Applied to Linkage Disequilibrium Mapping Hannu T.T. Toivonen, Päivi Onkamo, Kari Vasko, Vesa Ollikainen, Petteri Sevon, Heikki Mannila, Mathias Herr, Juha Kere  The American Journal of Human Genetics  Volume 67, Issue 1, Pages 133-145 (July 2000) DOI: 10.1086/302954 Copyright © 2000 The American Society of Human Genetics Terms and Conditions

Figure 1 a, Examples of strongly disease-associated haplotype patterns discovered in a data set with A=10%; only the first 8 markers of 101 are shown. b, Corresponding marker frequencies (“Don’t care” symbols (*) in gaps are included in marker frequencies.) c, Marker frequencies in the same data set with all strongly disease-associated haplotype patterns. d, Plot of the true versus the predicted locations of the DS gene for 100 data sets (A=10%). The American Journal of Human Genetics 2000 67, 133-145DOI: (10.1086/302954) Copyright © 2000 The American Society of Human Genetics Terms and Conditions

Figure 2 The effect of various factors on prediction accuracy. The y axis shows which fraction of simulated data sets is within the error bound given on the x axis (i.e., y=P[error ≤ x]). The lowest, dotted curve is the prediction accuracy of random, uniform guesses. a, Effect of A, the proportion of DS mutation carrying chromosomes. b, Effect of doubled sample size (400 disease-associated and 400 control chromosomes). c, Effect of corrupted data. d, Effect of missing data. e, Comparison of prediction methods. The three topmost curves have been obtained with 0% corrupted and 0% missing data; the lower curves with 1% corrupted and 20% missing data. f, Effect of pattern search parameters. “HPM baseline”: haplotype pattern searching as before; “no gaps”: haplotype pattern searching without gaps; “single”: the middle point of single most strongly associated haplotype without gaps is used for predicting the localization; “long gaps”: gaps of up to three markers allowed, “long haplotypes”: no length limit on the pattern lengths. The American Journal of Human Genetics 2000 67, 133-145DOI: (10.1086/302954) Copyright © 2000 The American Society of Human Genetics Terms and Conditions

Figure 3 Permutation tests with a simulated data set with A=7.5%. a, The permutation surface. The height of the surface at point (i,pi) is the marker frequency of marker mi that has an estimated markerwise P value of pi. The observed frequency is plotted on the surface by projecting it from the marker-frequency plane onto the permutation surface. The closer the line gets to the ‘back wall’, the more significant is the marker frequency. b, Marker frequencies for different P values. The solid line shows the observed marker frequencies in the simulated data; the dashed lines have been plotted by connecting marker frequencies for which the markerwise P values are the same. c, Marker frequencies for different P values in an unsuccessful localization. The solid line shows the observed marker frequencies in the simulated data; the dashed lines have been plotted by connecting marker frequencies for which the markerwise P values are the same. d, The effect of permutation tests on prediction accuracy, with 100 data sets where A=5%. The solid line represents localization accuracy without permutations, and the dashed lines show the prediction accuracy with the smallest marker-wise P value (“min p”), or with the smallest P value at most .01 or .001. If the smallest P value is >.01 or .001, no prediction is made at all; the fraction on y axis is computed among the predictions made. The lowest, dotted curve is the prediction accuracy of random, uniform guesses. The American Journal of Human Genetics 2000 67, 133-145DOI: (10.1086/302954) Copyright © 2000 The American Society of Human Genetics Terms and Conditions

Figure 4 Effect of A, Proportion of DS mutation–carrying chromosomes, in the SNP data. The American Journal of Human Genetics 2000 67, 133-145DOI: (10.1086/302954) Copyright © 2000 The American Society of Human Genetics Terms and Conditions

Figure 5 Results on the real HLA data. a, Marker frequencies and background LD (BGLD), as measured by the markerwise mean of the 10 highest frequencies obtained by 10,000 permutations. b, Negative logarithm of the markerwise P values. The vertical line shows the gene location. The flat interval of −log p≈9.21 is the upper limit of the score, due to the limited number of permutations. The ratio of the marker frequency to BGLD (dashed curve) was used for estimating the gene location inside this interval. The American Journal of Human Genetics 2000 67, 133-145DOI: (10.1086/302954) Copyright © 2000 The American Society of Human Genetics Terms and Conditions