Admixture Mapping. Qunyuan Zhang, Division of Statistical Genomics. GEMS Course M21-621: Computational Statistical Genetics.


Admixture Mapping. Qunyuan Zhang, Division of Statistical Genomics, GEMS Course: Computational Statistical Genetics. March 25,

Three Mapping Strategies
- Linkage analysis (linkage): genotype and phenotype data from a family or families.
- Association scan (LD): genotype and phenotype data from one or more populations, or from families.
- Admixture mapping (LD): genotype data from the admixed and ancestral populations; phenotype data from the admixed population. Ancestry information is used for (1) ancestry-phenotype association mapping and (2) control of population structure.

Genetic Admixture
[Figure: ancestral population 1 (Africans) and ancestral population 2 (Caucasians) combine to form the admixed population (African Americans). Admixture information from ancestry analysis is the input to admixture mapping.]

Rationale of Admixture Mapping
- Suppose a disease has a genetic component and the disease allele is more frequent in population 2 than in population 1. After populations 1 and 2 admix, affected individuals in the admixed generations will carry disease alleles that descend more often from population 2 than from population 1.
- If a marker is linked to the disease gene, then, because of linkage disequilibrium, affected individuals will also carry marker copies that descend more often from population 2 than from population 1.
- Conversely, if the population-2 ancestry at a marker or locus differs significantly between the affected and unaffected groups, we infer that the marker or locus is linked to (or is part of) a disease gene.
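The rationale above can be checked with a toy simulation. This is a minimal sketch under assumed conditions: one disease locus, each of an individual's two copies independently descending from population 2 with a fixed probability, and an additive risk per disease allele. All parameter values and the function name are hypothetical, chosen only for illustration.

```python
import random


def simulate_case_control_ancestry(n=20000, admix_p2=0.2, freq_pop1=0.1,
                                   freq_pop2=0.5, base_risk=0.02,
                                   risk_per_allele=0.08, seed=1):
    """Toy admixture model (hypothetical parameters).

    Each individual has two copies of a disease locus. Each copy descends
    from population 2 with probability admix_p2 and carries the disease
    allele with the frequency of its source population. Returns the mean
    population-2 ancestry at the locus in cases and in controls.
    """
    rng = random.Random(seed)
    cases, controls = [], []
    for _ in range(n):
        ancestry2 = 0      # copies (0-2) descending from population 2
        risk_alleles = 0   # disease alleles carried (0-2)
        for _copy in range(2):
            from_pop2 = rng.random() < admix_p2
            freq = freq_pop2 if from_pop2 else freq_pop1
            ancestry2 += from_pop2
            risk_alleles += rng.random() < freq
        affected = rng.random() < base_risk + risk_per_allele * risk_alleles
        (cases if affected else controls).append(ancestry2 / 2)
    return sum(cases) / len(cases), sum(controls) / len(controls)


case_anc, control_anc = simulate_case_control_ancestry()
print(f"mean pop-2 ancestry at the disease locus: "
      f"cases {case_anc:.3f}, controls {control_anc:.3f}")
```

Setting `freq_pop2` equal to `freq_pop1` removes the allele-frequency difference, and the case-control ancestry difference should then disappear apart from sampling noise. That is why admixture mapping needs loci whose frequencies differ between the ancestral populations.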

[Figure: Illustration of Admixture]

Advantages of Admixture Mapping
- An admixed population carries more genetic variation and polymorphism than the relatively homogeneous ancestral populations.
- Admixture creates new LD in the admixed population. Because admixture is recent, there have been few generations of recombination to break this LD down (a long population history erodes LD). The admixed population therefore retains more LD than the ancestral populations, and LD can be detected even between loosely linked loci.
- Ancestry information can be used to control population stratification caused by admixture.
- In simulations, admixture mapping has shown higher power than conventional methods and required smaller sample sizes.
- Flexible design: case-control or case-only samples, qualitative or quantitative traits, and no pedigree information required.
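The point about a short admixture history preserving LD follows from the standard decay rule for LD under random mating, D_t = D_0 (1 - θ)^t, where θ is the recombination fraction and t the number of generations. The sketch below evaluates that rule. The generation counts are illustrative assumptions (a handful of generations for recent admixture versus thousands for an old population), and θ = 0.1 is chosen only because it is roughly comparable to the ~10 cM marker spacing in the study described later.

```python
def ld_remaining(theta, generations):
    """Fraction of the initial LD (D_0) left after `generations` rounds of
    random mating, using D_t = D_0 * (1 - theta) ** t."""
    if not 0.0 <= theta <= 0.5:
        raise ValueError("recombination fraction must lie in [0, 0.5]")
    return (1.0 - theta) ** generations


# Illustrative generation counts (assumed): recent admixture vs. old population.
for t in (6, 10, 2000):
    print(f"theta=0.1, t={t}: {ld_remaining(0.1, t):.4g} of initial LD remains")
```

After only a few generations, about half of the admixture-generated LD between markers 0.1 apart survives. Over a long population history it is effectively zero, which is why LD in admixed populations extends over much longer distances.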

Ancestry
Ancestry is the proportion of genetic material descending from each founding population. It is defined at three levels:
- Population level: population admixture proportion
- Individual level: individual admixture proportion
- Individual-locus level: locus-specific ancestry
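The three levels nest inside one another. The sketch below makes this concrete under a simplifying assumption: we already have, for each individual and locus, the number of the two copies inherited from founding population 1. The data are hypothetical. In practice these counts are not observed and must be estimated, as described in the following slides. Here, individual ancestry is the average over loci and population ancestry is the average over individuals.

```python
from statistics import mean

# Hypothetical local-ancestry calls: z[i][l] = number of the two copies (0, 1 or 2)
# that individual i inherited from founding population 1 at locus l.
z = [
    [2, 2, 1, 2],
    [1, 0, 1, 1],
    [2, 1, 2, 2],
]


def locus_specific_ancestry(z):
    """Individual-locus level: fraction of the two copies from population 1."""
    return [[copies / 2 for copies in row] for row in z]


def individual_ancestry(z):
    """Individual level: average locus-specific ancestry over loci."""
    return [mean(row) for row in locus_specific_ancestry(z)]


def population_ancestry(z):
    """Population level: average individual ancestry over individuals."""
    return mean(individual_ancestry(z))


print(individual_ancestry(z))
print(population_ancestry(z))
```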

Two Ways of Using Ancestral Info.
- Individual ancestry (IA) can be used as a genetic-background covariate to control population structure:
  Phenotype = a + b * Genotype + c * IA + Error
- Locus-specific ancestry (LSA) can be used directly to detect association (admixture mapping):
  Phenotype = a + b * LSA + Error
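Both uses are ordinary linear regressions. The sketch below fits each model by least squares, using the normal equations with no external libraries. The data are hypothetical and noise-free, so the recovered coefficients can be checked by eye. A real analysis would also need standard errors and p-values, for example from a statistics package, and those are not computed here.

```python
def ols(X, y):
    """Ordinary least squares: solve the normal equations (X'X) beta = X'y
    by Gaussian elimination with partial pivoting."""
    p = len(X[0])
    A = [[sum(row[i] * row[j] for row in X) for j in range(p)] for i in range(p)]
    b = [sum(row[i] * yi for row, yi in zip(X, y)) for i in range(p)]
    for col in range(p):
        # Swap in the row with the largest pivot for numerical stability.
        pivot = max(range(col, p), key=lambda r: abs(A[r][col]))
        A[col], A[pivot] = A[pivot], A[col]
        b[col], b[pivot] = b[pivot], b[col]
        for r in range(col + 1, p):
            factor = A[r][col] / A[col][col]
            for c in range(col, p):
                A[r][c] -= factor * A[col][c]
            b[r] -= factor * b[col]
    # Back substitution.
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (b[r] - sum(A[r][c] * beta[c] for c in range(r + 1, p))) / A[r][r]
    return beta


# Hypothetical, noise-free toy data.
genotype = [0, 1, 2, 1, 0, 2]            # allele counts at a test SNP
ia = [0.8, 0.6, 0.9, 0.3, 0.5, 0.7]      # individual ancestry
lsa = [0.0, 0.5, 1.0, 0.5, 1.0, 0.0]     # locus-specific ancestry at a test locus

# Use 1: Phenotype = a + b*Genotype + c*IA + Error (IA as structure covariate)
pheno1 = [1.0 + 0.5 * g + 2.0 * q for g, q in zip(genotype, ia)]
a, b, c = ols([[1.0, g, q] for g, q in zip(genotype, ia)], pheno1)

# Use 2: Phenotype = a + b*LSA + Error (admixture mapping)
pheno2 = [0.3 + 1.5 * s for s in lsa]
a2, b2 = ols([[1.0, s] for s in lsa], pheno2)

print(round(a, 6), round(b, 6), round(c, 6), round(a2, 6), round(b2, 6))
```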

Individual Ancestry (IA) Estimation using MLE
G: observed genotypes of the admixed and ancestral populations
Q: allele frequencies in the ancestral populations
P: individual ancestry to be estimated
Goal: find the P that maximizes Pr(G | P, Q).
1. Assign initial values to Q (randomly, or estimated from the ancestral-population genotype data) and to P (randomly).
2. Compute P(i) given the current Q.
3. Compute Q(i) given P(i).
4. Iterate steps 2 and 3 until convergence.
Tang et al. Genetic Epidemiology, 2005(28): 289–301
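The P-update (step 2) can be sketched as an EM iteration for one individual, assuming biallelic markers and holding Q fixed at values estimated from the ancestral samples (one of the initialization options listed above). This is a generic EM for the admixture likelihood, in which each allele copy's population of origin is treated as missing data. It is not a reproduction of Tang et al.'s software, and it omits the Q-update of step 3. The function name is hypothetical, and it starts from equal proportions rather than random ones.

```python
def estimate_individual_ancestry(genotypes, Q, n_iter=500, tol=1e-10):
    """EM estimate of one individual's ancestry proportions P (length K).

    genotypes[l]: count (0, 1, 2) of the reference allele at biallelic locus l.
    Q[k][l]: reference-allele frequency at locus l in ancestral population k,
             held fixed and assumed to lie strictly between 0 and 1.
    """
    K, L = len(Q), len(genotypes)
    P = [1.0 / K] * K  # start from equal proportions
    for _ in range(n_iter):
        # E-step: expected number of allele copies contributed by each population.
        expected = [0.0] * K
        for l, g in enumerate(genotypes):
            ref = [P[k] * Q[k][l] for k in range(K)]
            alt = [P[k] * (1.0 - Q[k][l]) for k in range(K)]
            s_ref, s_alt = sum(ref), sum(alt)
            for k in range(K):
                if g > 0:
                    expected[k] += g * ref[k] / s_ref
                if g < 2:
                    expected[k] += (2 - g) * alt[k] / s_alt
        # M-step: each individual carries 2L allele copies in total.
        new_P = [e / (2 * L) for e in expected]
        converged = max(abs(x - y) for x, y in zip(new_P, P)) < tol
        P = new_P
        if converged:
            break
    return P


# Hypothetical example: population 1 has reference-allele frequency 0.9 at all
# 20 loci and population 2 has 0.1; the individual is homozygous reference.
Q = [[0.9] * 20, [0.1] * 20]
print(estimate_individual_ancestry([2] * 20, Q))
```

Each iteration cannot decrease the likelihood Pr(G | P, Q), and the proportions always sum to 1. Alternating this update with a Q-update would follow the full scheme of steps 2 to 4.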

Locus-specific Ancestry Estimation using MCMC
Observed, G: genotypes of the admixed and ancestral populations
Unknown, Z: the admixed individuals' locus-specific ancestries from the ancestral populations
Problem: how to estimate Z?
Maximum likelihood estimate (MLE): find the Z that maximizes Pr(G | Z). However, Z spans a very large parameter space, which makes a likelihood search difficult.
Bayesian and Markov chain Monte Carlo (MCMC) methods:
1. Assume the number of ancestral populations, K.
2. Define a prior distribution Pr(Z) under K.
3. Use MCMC to sample from the posterior distribution Pr(Z | G) ∝ Pr(Z) · Pr(G | Z).
4. Average over a large number of MCMC samples to obtain the estimate of Z.
Falush et al. Genetics, 2003(164):1567–1587
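Steps 2 to 4 can be illustrated on a deliberately tiny problem: one diploid locus in one individual, with Z = number of copies (0, 1 or 2) from population 1. The sketch assumes a Binomial(2, IA) prior on Z and fixed, known allele frequencies. It runs a Metropolis sampler and averages Z/2 over the samples. All function names and numbers are hypothetical. Real programs such as STRUCTURE also sample P and Q and link neighboring loci, which this toy omits. It is small enough that the exact posterior mean can be computed for comparison.

```python
import math
import random


def binom_pmf(x, n, p):
    if x < 0 or x > n:
        return 0.0
    return math.comb(n, x) * p**x * (1 - p) ** (n - x)


def unnormalized_posterior(z, g, ia, q1, q2):
    """Pr(Z = z) * Pr(G = g | Z = z) at one diploid locus (toy model).
    z: copies from pop 1 (prior Binomial(2, ia)); g: reference-allele count;
    q1, q2: reference-allele frequencies in populations 1 and 2."""
    prior = binom_pmf(z, 2, ia)
    # j reference alleles on the z pop-1 copies, g - j on the 2 - z pop-2 copies.
    like = sum(binom_pmf(j, z, q1) * binom_pmf(g - j, 2 - z, q2)
               for j in range(z + 1))
    return prior * like


def mcmc_locus_ancestry(g, ia, q1, q2, n_samples=50000, burn_in=1000, seed=7):
    """Metropolis sampler over Z with a symmetric uniform proposal;
    returns the average of Z/2 (locus-specific ancestry) over the samples."""
    rng = random.Random(seed)
    z, total = 1, 0.0
    for step in range(burn_in + n_samples):
        proposal = rng.choice((0, 1, 2))
        ratio = (unnormalized_posterior(proposal, g, ia, q1, q2)
                 / unnormalized_posterior(z, g, ia, q1, q2))
        if rng.random() < ratio:
            z = proposal
        if step >= burn_in:
            total += z / 2
    return total / n_samples


def exact_locus_ancestry(g, ia, q1, q2):
    weights = [unnormalized_posterior(z, g, ia, q1, q2) for z in range(3)]
    return sum(w * z / 2 for z, w in enumerate(weights)) / sum(weights)


# Hypothetical values: IA = 0.8, q1 = 0.2, q2 = 0.7, homozygous reference.
print(mcmc_locus_ancestry(2, 0.8, 0.2, 0.7), exact_locus_ancestry(2, 0.8, 0.2, 0.7))
```

Here the genotype pulls the locus-specific ancestry (about 0.53) well below the individual's genome-wide ancestry of 0.8, because the observed alleles are more common in population 2.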

Software
- STRUCTURE: Falush D, Stephens M, Pritchard JK (2003) Inference of population structure using multilocus genotype data: linked loci and correlated allele frequencies. Genetics 164:1567–1587.
- ADMIXMAP: Hoggart CJ, Parra EJ, Shriver MD, Bonilla C, Kittles RA, Clayton DG, McKeigue PM (2003) Control of confounding of genetic associations in stratified populations. Am J Hum Genet 72:1492–1504.
- ANCESTRYMAP: Patterson N, Hattangadi N, Lane B, Lohmueller KE, Hafler DA, Oksenberg JR, Hauser SL, Smith MW, O'Brien SJ, Altshuler D, Daly MJ, Reich D (2004) Methods for high-density admixture mapping of disease genes. Am J Hum Genet 74:979–1000.

References
- Rife DC (1954) Populations of hybrid origin as source material for the detection of linkage. Am J Hum Genet 6:26–33.
- Chakraborty R et al. (1988) Admixture as a tool for finding linked genes and detecting that difference from allelic association between loci. Proc Natl Acad Sci 85
- Risch N (1992) Mapping genes for complex disease using association studies with recently admixed populations. Am J Hum Genet Suppl 51:13.
- …
- McKeigue PM (2005) Prospects for admixture mapping of complex traits. Am J Hum Genet 76:1–7.
- Zhu X et al. (2005) Admixture mapping for hypertension loci with genome-scan markers. Nature Genetics 37(2)
- Zhang Q et al. Genome-wide admixture mapping for coronary artery calcification in African Americans: the NHLBI Family Heart Study. Genet Epidemiol, Apr; 32(3)

Marker Information Content (MIC) Distribution Used for Simulation (300 Loci)
[Figure: distribution of MIC across the 300 simulated loci; mean = 0.22]
The MIC of locus i is computed from the frequency of allele k at locus i in Caucasians, the frequency of allele k at locus i in Africans, and the number of alleles at locus i.

Admixture Mapping of CAC Loci in African Americans
- Sample: 622 African Americans from 211 families
- Markers: 400 microsatellite markers, average spacing 10 cM
- Phenotype: coronary and aortic artery calcium (CAC), quantified by CT as calcified plaque

Data
- Samples: 1672 subjects from 3 populations: 622 African Americans (211 families) from FHS-SCAN, 893 Caucasians (320 families) from FHS-SCAN, and 157 unrelated Africans from the Marshfield Center
- Genotypes: 302 microsatellite loci in all subjects; average marker spacing 11.9 cM
- Phenotype: coronary and aortic artery calcium (CAC) of the 622 African Americans, Blom-transformed
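CAC scores are strongly skewed, with many zeros, so a rank-based normalizing transformation is applied before regression. The sketch below assumes that "BLOM transformation" refers to Blom's rank-based inverse normal transform, Φ⁻¹((r - 3/8) / (n + 1/4)), where r is the rank and n the sample size. Averaging ranks across tied values is our own choice for handling the many zero scores. The slides do not say how ties were treated.

```python
from statistics import NormalDist


def blom_transform(values):
    """Rank-based inverse normal (Blom) transform: Phi^-1((r - 3/8) / (n + 1/4)).
    Tied values receive the average of their 1-based ranks."""
    n = len(values)
    order = sorted(range(n), key=lambda i: values[i])
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg_rank = (i + j) / 2 + 1  # average of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    inv = NormalDist().inv_cdf
    return [inv((r - 0.375) / (n + 0.25)) for r in ranks]


# Hypothetical CAC scores, including ties at zero.
print([round(v, 3) for v in blom_transform([0.0, 0.0, 12.5, 310.0, 0.0, 48.0])])
```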

Statistical Procedure
Step 1: Randomly draw one subject from each family to create a sample of 688 unrelated subjects: 211 African Americans from 211 families (FHS-SCAN), 320 whites from 320 families (FHS-SCAN), and 157 unrelated Africans (Marshfield Center).
Step 2: Estimate ancestry with STRUCTURE 2.1.
Step 3: Ancestry-CAC association analysis: regress the 211 African Americans' CAC scores on their locus-specific ancestries from Africans.
Step 4: Repeat steps 1 to 3 100 times and obtain the average p-value of each locus.
Step 5: For each locus, run a permutation test on the average p-value. Number of random permutations:
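Two building blocks of this procedure can be sketched: the one-subject-per-family draw (step 1) and a permutation test (step 5). For brevity, the permutation test here is simplified. It permutes trait values within a single draw and uses the absolute Pearson correlation between locus-specific ancestry and trait as the statistic. In the study, the test was applied to the p-value averaged over 100 replicate draws. The data layout (a dict from family ID to subject IDs) and all function names are hypothetical.

```python
import math
import random


def draw_one_per_family(families, rng):
    """Step 1: pick one subject at random from each family.
    families: dict mapping family id -> list of subject ids (hypothetical layout)."""
    return [rng.choice(members) for members in families.values()]


def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def permutation_pvalue(ancestry, trait, n_perm=999, rng=None):
    """Permutation p-value for |corr(ancestry, trait)|: shuffle the trait
    values, recompute the statistic, and count permutations at least as extreme."""
    rng = rng or random.Random(0)
    observed = abs(pearson_r(ancestry, trait))
    shuffled = list(trait)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)
        if abs(pearson_r(ancestry, shuffled)) >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction keeps p > 0


# Hypothetical toy data.
fams = {"f1": ["a", "b"], "f2": ["c"], "f3": ["d", "e", "f"]}
print(draw_one_per_family(fams, random.Random(3)))
lsa = [0.5, 1.0, 0.5, 1.0, 0.0, 1.0, 0.5, 0.0, 1.0, 0.5]
cac = [0.1, 0.9, 0.2, 1.1, -0.8, 0.7, 0.0, -1.2, 1.3, -0.1]
print(permutation_pvalue(lsa, cac, 999, random.Random(1)))
```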

RESULTS: Sources of Variation in Ancestry from Africans
Table columns: source of variation, variance component, percent (%)
Sources of variation: families; subjects within family; loci within subject; replications within locus

RESULTS: Ancestry Analysis at the Population Level
Population admixture proportions in African Americans
Table columns: founding population, ancestry (%)
Founding populations: Caucasians; Africans

RESULTS: Ancestry Analysis at the Individual Level
Individual ancestry distribution of the 622 African Americans: ancestry from Africans averages 77.96% (range 3.1% to 96.9%).

RESULTS: Ancestry Analysis at the Individual-locus Level
[Figure: distribution of locus-specific ancestries from Africans in an example African American, across 302 microsatellite loci ordered by chromosome and position, from chromosome 1 (4.22 cM) to chromosome 23 (104.83 cM)]

RESULTS: Locus-specific Ancestry-CAC Association Analysis
Table columns: No., locus, chromosome, position, permutation p, regression coefficient, R²
Loci, in table order: AFM063XF (10p14), GATA64D (6q12), GATA42H (4q32), AFMB337ZH, GGAA20G, GATA73H, GGAA3F, UT, UT, GATA163B, GATA88F, GATA26D, ATA1B, ATA4E, GATA137H, GATA4D, ATA31G

[Figure: -log(p value) of markers on chromosome 4; labeled marker GATA42H02]

[Figure: -log(p value) of markers on chromosome 6; labeled marker GATA64D02]

[Figure: -log(p value) of markers on chromosome 10; labeled marker AFM063XF4]