Understanding Principal Component Approach of Detecting Population Structure. Jianzhong Ma. PI: Chris Amos.

1 Understanding Principal Component Approach of Detecting Population Structure. Jianzhong Ma. PI: Chris Amos

2 Introduction. Association analysis between markers and disease-causing loci, which relies on strong linkage (i.e. linkage disequilibrium), is more efficient than linkage analysis. When samples arise from different ethnic groups, or from an admixed population, spurious association occurs, resulting in false positives.

3 Introduction. Genomic control approach (GC); transmission/disequilibrium test (TDT); structured association (MCMC); principal component approaches: traditional (marker-oriented) and Eigenstrat (sample-oriented). Eigenstrat theory: implemented in EIGENSTRAT and HelixTree.

4 Eigenstrat References
1. Price AL, Patterson NJ, Plenge RM, Weinblatt ME, Shadick NA, Reich D (2006) Principal components analysis corrects for stratification in genome-wide association studies. Nature Genetics 38: 904-909.
2. Patterson N, Price AL, Reich D (2006) Population structure and eigenanalysis. PLoS Genet 2(12): e190. doi:10.1371/journal.pgen.0020190.

5 Eigenstrat Theory: Model. Data for association test (M individuals, N markers):

        MK1   MK2   ...   MKN
Ind1    g11   g12   ...   g1N
...     ...   ...   ...   ...
IndM    gM1   gM2   ...   gMN

6 Eigenstrat Theory: Model. Define a random vector with M components, one for each of the M individuals. The genotype values of the M individuals at any marker are a particular realization of this random vector.

7 Eigenstrat Theory: Model. The randomness comes from both drawing genotypes and choosing the allele frequency. Under this model, genetically independent individuals are not statistically independent of each other. The covariance between individuals from different subpopulations is smaller than the covariance between individuals from the same subpopulation.
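The two covariance statements on this slide can be traced to the law of total covariance. The sketch below uses notation that is an assumption, not taken from the slides: $p_k$ is the random allele frequency of a marker in subpopulation $k$, $k(i)$ is the subpopulation of individual $i$, and, given the frequencies, genotypes $g_i \in \{0, 1, 2\}$ of distinct individuals are independent with $E[g_i \mid p] = 2 p_{k(i)}$.

```latex
\begin{aligned}
\operatorname{Cov}(g_i, g_j)
  &= E\big[\operatorname{Cov}(g_i, g_j \mid p)\big]
   + \operatorname{Cov}\big(E[g_i \mid p],\, E[g_j \mid p]\big) \\
  &= 0 + 4\,\operatorname{Cov}\big(p_{k(i)},\, p_{k(j)}\big), \qquad i \neq j .
\end{aligned}
```

For two individuals in the same subpopulation this equals $4\operatorname{Var}(p_k) > 0$, so they are correlated even though their genotypes are drawn independently. For individuals in different subpopulations it equals $4\operatorname{Cov}(p_k, p_l) \le 4\sqrt{\operatorname{Var}(p_k)\operatorname{Var}(p_l)}$, which, when the subpopulations have equal frequency variance, cannot exceed the within-subpopulation value.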

8 Eigenstrat Theory: Model Only population properties of the PCA are considered (no sample properties considered), in order to gain some theoretical guidelines for interpreting PC-PC plots

9 Case 1: one-subpopulation Covariance matrix

10 Case 1: one-subpopulation Large eigenvalue Eigenvector: Small eigenvalue

11 Case 1: one-subpopulation. The large eigenvalue reflects co-variation of individuals. The small eigenvalues reflect variation between individuals. Neither is for population stratification!
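As a hedged illustration of the one-subpopulation case, the sketch below assumes the covariance matrix has a common variance `a` on the diagonal and a common covariance `b` between every pair of individuals. The slide's own matrix is not reproduced in this transcript, so that form, the function name, and the numeric values are all assumptions.

```python
import numpy as np

def compound_symmetric(m, a, b):
    """M x M covariance: variance a on the diagonal, covariance b elsewhere.
    (Assumed form of the one-subpopulation covariance matrix.)"""
    return (a - b) * np.eye(m) + b * np.ones((m, m))

# Illustrative values only; they do not come from the slides.
m, a, b = 6, 1.0, 0.3
cov = compound_symmetric(m, a, b)

# eigh returns eigenvalues in ascending order for a symmetric matrix.
eigvals, eigvecs = np.linalg.eigh(cov)
largest = eigvals[-1]       # a + (m - 1) * b, shared co-variation
smallest = eigvals[:-1]     # m - 1 copies of a - b, individual variation
top_vec = eigvecs[:, -1]    # constant across individuals (up to sign)

print("largest eigenvalue:", largest)
print("small eigenvalues:", smallest)
print("top eigenvector:", top_vec)
```

Under this assumed form the top eigenvector assigns the same value to every individual, so it cannot separate anyone, which matches the slide's point that neither kind of eigenvalue signals stratification here.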

12 Case 1: one-subpopulation Zero-mean transform

13 Case 2: two-subpopulations Random vector Covariance matrix

14 Case 2: two-subpopulation. There are two large eigenvalues, and the corresponding eigenvectors take constant values for individuals in the same subpopulation. They are a mixture of the variance caused by stratification and the intra-population co-variation. The small eigenvalues are the same as in a homogeneous population.

15 Case 2: Two-subpopulation Zero-mean transform

16 Case 2: two-subpopulation The two large eigenvalues and corresponding eigenvectors

17 Case 2: two-subpopulation case. Reflecting variation caused by stratification. If there are only two subpopulations, do NOT plot a PC vs PC figure; only the eigenvector of the largest eigenvalue shows the population structure.
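The two-subpopulation case can be sketched numerically. The code below assumes a block covariance matrix (variance `a`, within-subpopulation covariance `b_within`, smaller between-subpopulation covariance `b_between`) and takes the zero-mean transform to be double centering with `I - 11^T / M`. Both choices, along with the function names and numbers, are assumptions made for illustration, since the slide's formulas are not in this transcript.

```python
import numpy as np

def block_covariance(sizes, a, b_within, b_between):
    """Population covariance for individuals grouped into subpopulations."""
    labels = np.repeat(np.arange(len(sizes)), sizes)
    same = labels[:, None] == labels[None, :]
    cov = np.where(same, b_within, b_between).astype(float)
    np.fill_diagonal(cov, a)
    return cov, labels

def center(cov):
    """Zero-mean transform, assumed here to be C @ cov @ C with C = I - 11^T / M."""
    m = cov.shape[0]
    c = np.eye(m) - np.ones((m, m)) / m
    return c @ cov @ c

# Illustrative sizes and covariances (not from the slides).
cov, labels = block_covariance([4, 6], a=1.0, b_within=0.3, b_between=0.1)
eigvals, eigvecs = np.linalg.eigh(center(cov))   # ascending order
top = eigvecs[:, -1]

print("eigenvalues:", np.round(eigvals, 4))
print("top eigenvector:", np.round(top, 4))
```

After centering, only one eigenvalue stands apart from the rest, and its eigenvector takes one constant value in the first subpopulation and a constant of opposite sign in the second. The second eigenvector carries no group information under these assumptions, which is why a PC-vs-PC plot adds nothing in the two-subpopulation case.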

18 Case 3: Three subpopulations. There are now three subpopulations.

19 Case 3: Three subpopulations Zero-mean transform

21 Case 3

33 General Case: K subpopulations

43 Summary. Only large eigenvalues reflect variation caused by stratification. There are K-1 large eigenvalues if there are K subpopulations. If there are only two subpopulations, only the eigenvector of the largest eigenvalue reveals the population structure; no two-dimensional PC-PC plot should be inspected. In the case of multiple subpopulations, all K-1 eigenvectors of the large eigenvalues should be carefully inspected in order to classify individuals into K subpopulations and infer the inter-population relationships.
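The K-1 rule in the summary can be checked on simulated data. The sketch below is only an illustration under stated assumptions: subpopulation allele frequencies are drawn from a beta distribution around an ancestral frequency (a Balding-Nichols style model with an assumed `fst`), genotypes are binomial draws, markers are centered and scaled by the binomial standard deviation, and "large" is taken to mean more than three times the median eigenvalue. None of these choices come from the slides.

```python
import numpy as np

def simulate_genotypes(rng, k, n_per, n_markers, fst):
    """Simulate 0/1/2 genotypes for k subpopulations of n_per individuals each."""
    ancestral = rng.uniform(0.1, 0.9, size=n_markers)
    a = ancestral * (1.0 - fst) / fst
    b = (1.0 - ancestral) * (1.0 - fst) / fst
    freqs = rng.beta(a, b, size=(k, n_markers))       # one row per subpopulation
    per_individual = np.repeat(freqs, n_per, axis=0)  # one row per individual
    genotypes = rng.binomial(2, per_individual)
    labels = np.repeat(np.arange(k), n_per)
    return genotypes, labels

def sample_eigenvalues(genotypes):
    """Eigenvalues (descending) of the individual-by-individual covariance."""
    g = genotypes.astype(float)
    p_hat = g.mean(axis=0) / 2.0
    keep = (p_hat > 0.0) & (p_hat < 1.0)              # drop monomorphic markers
    g, p_hat = g[:, keep], p_hat[keep]
    x = (g - 2.0 * p_hat) / np.sqrt(2.0 * p_hat * (1.0 - p_hat))
    cov = x @ x.T / x.shape[1]
    return np.linalg.eigvalsh(cov)[::-1]

def count_large(eigvals, factor=3.0):
    """Count eigenvalues above factor * median (an assumed, ad hoc threshold)."""
    return int(np.sum(eigvals > factor * np.median(eigvals)))

rng = np.random.default_rng(0)
genotypes, labels = simulate_genotypes(rng, k=3, n_per=50, n_markers=2000, fst=0.1)
eigvals = sample_eigenvalues(genotypes)
n_large = count_large(eigvals)

print("top five eigenvalues:", np.round(eigvals[:5], 2))
print("number of large eigenvalues:", n_large)
```

With well-separated subpopulations the count of large eigenvalues is expected to equal K-1. With weak differentiation or few markers, the structure eigenvalues can sink into the bulk and the threshold would need rethinking.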

44 Helix Manual: …… First off, if you choose as many components as there are markers, if that's possible, you will wind up subtracting out ALL effects, thus getting nothing from your tests! The best answer consists of first simply obtaining the components themselves and their corresponding eigenvectors. (Do this either while running uncorrected tests or from the separate PCA window.) Then look at the pattern of the eigenvalues. If the first few are very large compared with the remaining eigenvalues, then use that many components in a second analysis in which you DO apply the PCA technique. …….
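The manual's advice to look at the eigenvalue pattern can be approximated by a simple gap rule. The helper below is hypothetical and is not part of HelixTree or EIGENSTRAT: it finds the largest ratio between consecutive sorted eigenvalues and returns the number of components above that gap, or zero if no ratio reaches an assumed `min_ratio`.

```python
import numpy as np

def choose_n_components(eigenvalues, min_ratio=2.0):
    """Hypothetical gap heuristic: number of eigenvalues above the largest
    ratio between consecutive sorted eigenvalues, or 0 if no clear gap."""
    vals = np.sort(np.asarray(eigenvalues, dtype=float))[::-1]
    if vals.size == 0:
        return 0
    # Drop numerically zero or negative eigenvalues before taking ratios.
    vals = vals[vals > 1e-12 * vals[0]]
    if vals.size < 2:
        return 0
    ratios = vals[:-1] / vals[1:]
    best = int(np.argmax(ratios))
    return best + 1 if ratios[best] >= min_ratio else 0

# Made-up spectra for illustration.
with_structure = choose_n_components([10.0, 9.0, 1.2, 1.1, 1.0, 0.9])
without_structure = choose_n_components([1.2, 1.1, 1.0, 0.9])
print(with_structure, without_structure)
```

This is a rough stand-in for visual inspection. A very small trailing eigenvalue can dominate the ratios, so plotting the spectrum, as the manual suggests, remains the safer check.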

