Evaluation of a new tool for use in association mapping Structure Reinhard Simon, 2002/10/29.


1 Evaluation of a new tool for use in association mapping: Structure. Reinhard Simon, 2002/10/29

2 Software: Structure 2.0, http://pritch.bsd.uchicago.edu. Pritchard JK, Stephens M, Donnelly P (2000): Inference of population structure using multilocus genotype data. Genetics, 155: 945-959

3 Associations – the ideal. Figure panels: Cases, Controls

4 Test for association. A diploid locus: Pearson's chi-square test

5 Example: Contingency table

            Frequency of allele       Total # of alleles
            A         A*              (diploid)
Cases       q_a       1 - q_a         2m_a
Controls    q_o       1 - q_o         2m_o
Total       n_A       n_A*            2m
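The test on such a table can be sketched in a few lines of Python. This is a minimal illustration, not part of the original slides: the function name `allele_chi_square` and the allele counts are hypothetical, and the table is assumed to hold allele counts (not frequencies) with no empty row or column.

```python
import math

def allele_chi_square(case_a, case_other, control_a, control_other):
    """Pearson chi-square for a 2x2 table of allele counts (1 df).

    Rows are cases and controls, columns are allele A and allele A*.
    Assumes every row and column total is non-zero.
    """
    observed = [[case_a, case_other], [control_a, control_other]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed[i][j] - expected) ** 2 / expected
    # For 1 df the chi-square tail probability is erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value

# Hypothetical counts: 100 diploid cases and 100 diploid controls,
# i.e. 2m_a = 2m_o = 200 alleles per group.
chi2, p_value = allele_chi_square(120, 80, 90, 110)
print(f"chi2 = {chi2:.3f}, p = {p_value:.4f}")
```

A small p-value here only indicates a difference in allele frequencies between the two groups; the following slides explain why that difference need not reflect a true association.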

6 Associations – the less ideal. Figure panels: Cases, Controls

7 Associations – simple admixture. Figure panels: Cases, Controls

8 Associations – admixture complications. Figure panels: Cases, Controls

9 Associations – admixture complications. Figure panels: Cases, Controls. A high frequency of associated loci may indicate problems with the underlying population structure (stratification).

10 Associations – accounted for. Figure panels: Cases, Controls

11 Questions: Is there stratification? If so: - how many subpopulations are there? - which individual belongs to which subpopulation?

12 Test for stratification – principle. Summarizing over all loci: X_i is the chi-square statistic at the i-th locus. Null hypothesis: no differences between allele frequencies over all loci. The df equal the sum of the df at the individual loci (Pritchard, 1999).
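The summing principle can be sketched as follows. This is an illustrative reading of the slide, not the original implementation: the function names and the example counts are hypothetical, the loci are assumed to be unlinked, and each locus is given as a table of allele counts with one row per group. The p-value is left out; it could be obtained from a chi-square distribution function such as `scipy.stats.chi2.sf` if SciPy is available.

```python
def locus_chi_square(table):
    """Pearson chi-square statistic and df for one locus.

    `table` has one row per group (e.g. cases, controls) and one
    column per allele; entries are allele counts.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    chi2 = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (observed - expected) ** 2 / expected
    df = (len(table) - 1) * (len(col_totals) - 1)
    return chi2, df

def stratification_statistic(tables):
    """Sum the per-locus statistics and their degrees of freedom."""
    per_locus = [locus_chi_square(t) for t in tables]
    return sum(c for c, _ in per_locus), sum(d for _, d in per_locus)

# Hypothetical allele counts at three loci (rows: cases, controls).
loci = [
    [[120, 80], [90, 110]],
    [[50, 50], [50, 50]],
    [[30, 40, 30], [20, 40, 40]],
]
total_chi2, total_df = stratification_statistic(loci)
print(f"summed chi2 = {total_chi2:.3f} on {total_df} df")
```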

13 Test for stratification – ctd. Observations: strong positive selection requires a larger number of loci; subgroup-specific markers decrease the number of loci needed (Pritchard, 1999).

14 How to group individuals? - based on distance measures - based on models

15 Pairwise distance measures: Jaccard; Nei & Li; Sokal & Michener
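For binary (presence/absence) marker profiles, the three measures named above can be sketched as below. The slide does not give formulas, so this uses the commonly cited definitions as an assumption: with a = bands shared, b and c = bands present in only one individual, d = bands absent in both, the similarities are Jaccard a/(a+b+c), Nei & Li 2a/(2a+b+c) and Sokal & Michener (a+d)/(a+b+c+d), and each distance is one minus the similarity. Function names and the example profiles are hypothetical.

```python
def match_counts(x, y):
    """Count shared presences (a), mismatches (b, c) and shared absences (d)."""
    a = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 1)
    b = sum(1 for xi, yi in zip(x, y) if xi == 1 and yi == 0)
    c = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 1)
    d = sum(1 for xi, yi in zip(x, y) if xi == 0 and yi == 0)
    return a, b, c, d

def jaccard_distance(x, y):
    a, b, c, _ = match_counts(x, y)
    # Two profiles without any band are treated as identical.
    return 0.0 if a + b + c == 0 else 1.0 - a / (a + b + c)

def nei_li_distance(x, y):
    a, b, c, _ = match_counts(x, y)
    return 0.0 if a + b + c == 0 else 1.0 - 2 * a / (2 * a + b + c)

def sokal_michener_distance(x, y):
    a, b, c, d = match_counts(x, y)
    return 1.0 - (a + d) / (a + b + c + d)

# Hypothetical band profiles of two genotypes at six marker positions.
g1 = [1, 1, 0, 0, 1, 0]
g2 = [1, 0, 0, 1, 1, 0]
print(jaccard_distance(g1, g2), nei_li_distance(g1, g2),
      sokal_michener_distance(g1, g2))
```

Jaccard and Nei & Li ignore shared absences, whereas Sokal & Michener counts them as agreement, so the measures can rank the same pairs differently.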

16 Model-based Bayesian inference. Bayesian statistics: uncertainty is modeled using probabilities; probability statements are made about model parameters. Advantages: very general framework; assumptions are made explicit and are quantified.

17 Bayesian inference – how? Bayesian inference centers on the posterior distribution p(theta|X), e.g. under a genetic model of the distribution of allele frequencies. However, analytic evaluation is seldom possible.
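For reference, the posterior on this slide follows from Bayes' theorem. The formula below is the standard statement, added here as background rather than taken from the slides; the prior p(θ) and the integration variable θ' are notation introduced for this sketch.

```latex
p(\theta \mid X) \;=\; \frac{p(X \mid \theta)\, p(\theta)}{\int p(X \mid \theta')\, p(\theta')\, d\theta'}
```

The integral in the denominator runs over all parameter values, which is typically why a closed-form evaluation is out of reach for realistic genetic models.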

18 Bayesian inference – methods. Alternatives: numerical evaluation; approximation; simulation, e.g. Markov chain Monte Carlo (MCMC) methods

19 Simulation methods for Bayesian inference – general. Generate random samples from a probability distribution (e.g. normal). Construct a histogram. If the sample is large enough, mean, variance, ... can be calculated from it. MCMC makes it possible to generate large samples from any probability distribution.
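The general idea can be illustrated with direct sampling from a normal distribution. This is a minimal sketch: the mean 10, standard deviation 2, sample size and seed are arbitrary values chosen for the illustration.

```python
import math
import random
import statistics

rng = random.Random(42)  # fixed seed so the run is reproducible

# Draw a large sample from a normal distribution (mean 10, sd 2).
samples = [rng.gauss(10.0, 2.0) for _ in range(50_000)]

# Summaries computed from the sample approximate the true values.
sample_mean = statistics.fmean(samples)
sample_var = statistics.variance(samples)

# Crude histogram: number of samples in each unit-wide bin.
histogram = {}
for value in samples:
    left_edge = math.floor(value)
    histogram[left_edge] = histogram.get(left_edge, 0) + 1

print(f"mean ~ {sample_mean:.2f}, variance ~ {sample_var:.2f}")
for left_edge in sorted(histogram):
    print(f"[{left_edge:3d}, {left_edge + 1:3d}) {histogram[left_edge]}")
```

With MCMC the same summaries are computed from the sampled chain instead of from independent draws.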

20 Markov chain behaviour: the chain reaches an equilibrium (basic MCMC theorem), and the present state depends only on the preceding state: “The future depends on the past only through the present.”
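The equilibrium behaviour can be seen in a toy two-state chain. The transition probabilities below are made up for the illustration, and the convergence shown relies on the chain being irreducible and aperiodic, which this example is.

```python
def step(dist, transition):
    """Advance a probability distribution over states by one step."""
    n = len(transition)
    return [sum(dist[i] * transition[i][j] for i in range(n)) for j in range(n)]

def evolve(dist, transition, n_steps):
    """Apply `step` repeatedly; each step uses only the current distribution."""
    for _ in range(n_steps):
        dist = step(dist, transition)
    return dist

# Hypothetical chain: transition[i][j] = Pr(next state = j | current state = i).
transition = [[0.9, 0.1],
              [0.5, 0.5]]

# Two different starting points end up at the same equilibrium (5/6, 1/6).
from_state_0 = evolve([1.0, 0.0], transition, 100)
from_state_1 = evolve([0.0, 1.0], transition, 100)
print(from_state_0, from_state_1)
```

In MCMC the chain is constructed so that this equilibrium is the posterior distribution of interest.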

21 MCMC – strengths: freedom in inference (e.g. simultaneous estimation, estimation of arbitrary functions of model parameters like ranks or threshold exceedance); coherently integrates uncertainty; only available method for complex problems

22 MCMC – drawbacks: computationally intensive; often requires specialized software

23 Inferring population structure. Known: X = genotypes of sampled individuals. Unknown: Z = population of origin; P = allele frequencies in all populations; Q = proportion of an individual's genome that originates from population k. Pr(Z, P, Q|X) ∝ Pr(Z) * Pr(P) * Pr(Q) * Pr(X|Z,P,Q). Solution: use MCMC for Bayesian inference, with simultaneous estimation of Q, Z and P.

24 Basic MCMC algorithm – no admixture (without Q). Initialize: random values for Z (pop), e.g. from Pr(z) = 1/k. Repeat for m = 1, 2, ...: 1. Sample P(m) from Pr(P|X, Z(m-1)) (estimate allele frequencies). 2. Sample Z(m) from Pr(Z|X, P(m)) (estimate population of origin for each individual).
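The two steps above can be sketched as a small Gibbs sampler. This is an illustrative toy, not the Structure program: the function names, the data layout (one tuple of allele codes per locus), the Dirichlet(1, ..., 1) prior on allele frequencies, the uniform prior on Z, and the choice to discard the first half of the run as burn-in are all assumptions made for this sketch.

```python
import math
import random

def sample_dirichlet(rng, params):
    """Draw from a Dirichlet distribution via normalized gamma variates."""
    draws = [rng.gammavariate(a, 1.0) for a in params]
    total = sum(draws)
    return [d / total for d in draws]

def gibbs_no_admixture(genotypes, n_pops, n_alleles, n_iter, seed=0):
    """Return, per individual, the fraction of kept iterations spent in each population.

    genotypes[i][l] is a tuple of allele codes (0 .. n_alleles-1) of
    individual i at locus l, one entry per chromosome copy.
    """
    rng = random.Random(seed)
    n_ind, n_loci = len(genotypes), len(genotypes[0])
    z = [rng.randrange(n_pops) for _ in range(n_ind)]  # Pr(z) = 1/K
    visits = [[0] * n_pops for _ in range(n_ind)]
    burn_in = n_iter // 2
    for m in range(n_iter):
        # Step 1: sample allele frequencies P given X and the current Z.
        counts = [[[0] * n_alleles for _ in range(n_loci)] for _ in range(n_pops)]
        for i, individual in enumerate(genotypes):
            for l, copies in enumerate(individual):
                for allele in copies:
                    counts[z[i]][l][allele] += 1
        p = [[sample_dirichlet(rng, [1.0 + c for c in counts[k][l]])
              for l in range(n_loci)] for k in range(n_pops)]
        # Step 2: sample the population of origin Z given X and P.
        for i, individual in enumerate(genotypes):
            log_w = [sum(math.log(p[k][l][a])
                         for l, copies in enumerate(individual) for a in copies)
                     for k in range(n_pops)]
            top = max(log_w)  # subtract the maximum for numerical stability
            weights = [math.exp(w - top) for w in log_w]
            z[i] = rng.choices(range(n_pops), weights=weights)[0]
            if m >= burn_in:
                visits[i][z[i]] += 1
    kept = n_iter - burn_in
    return [[v / kept for v in row] for row in visits]

# Hypothetical data: 10 diploid individuals fixed for allele 0 and
# 10 fixed for allele 1 at each of 8 loci.
data = [[(0, 0)] * 8 for _ in range(10)] + [[(1, 1)] * 8 for _ in range(10)]
probs = gibbs_no_admixture(data, n_pops=2, n_alleles=2, n_iter=400, seed=1)
for i, row in enumerate(probs):
    print(i + 1, [round(x, 2) for x in row])
```

The population labels are arbitrary: which group ends up as population 0 depends on the random start, so only the grouping itself is meaningful.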

25 Basic MCMC algorithm – with admixture (Q). Initialize: random values for Z (pop), e.g. from Pr(z) = 1/k. Repeat for m = 1, 2, ...: 1. Sample P(m), Q(m) from Pr(P, Q|X, Z(m-1)) (estimate allele frequencies and admixture proportions). 2. Sample Z(m) from Pr(Z|X, P(m), Q(m)). 3. Update alpha (Dirichlet parameter for the degree of admixture).
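The two admixture-specific updates can be sketched for a single individual as below. This is a simplified illustration under stated assumptions, not the program's code: Z is taken to be assigned per allele copy, Q for an individual is drawn from a Dirichlet with parameters alpha plus the number of its allele copies currently assigned to each population, and the update of alpha (step 3) is omitted. The function names and example values are hypothetical.

```python
import random

def sample_q(rng, copies_per_pop, alpha):
    """Draw one individual's admixture proportions Q given Z.

    copies_per_pop[k] is the number of this individual's allele copies
    currently assigned to population k.
    """
    draws = [rng.gammavariate(alpha + m, 1.0) for m in copies_per_pop]
    total = sum(draws)
    return [d / total for d in draws]

def sample_origin(rng, q, p_allele):
    """Draw the population of origin of one allele copy given P and Q.

    p_allele[k] is the frequency of the observed allele in population k;
    the weight of population k is q[k] * p_allele[k].
    """
    weights = [qk * pk for qk, pk in zip(q, p_allele)]
    return rng.choices(range(len(q)), weights=weights)[0]

rng = random.Random(3)
# Hypothetical individual: 16 allele copies, all currently assigned to population 0.
q = sample_q(rng, [16, 0], alpha=1.0)
# The observed allele has frequency 0.3 in population 0 and 0.9 in population 1.
origin = sample_origin(rng, q, [0.3, 0.9])
print(q, origin)
```

Because each allele copy gets its own origin, an individual's genome can be split between populations, which is what distinguishes this model from the no-admixture case.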

26 Program – parameters: MCMC

27 Program – parameters: Q

28 Program – parameters: P

29 Program – parameters: Z, K

30 Program – data types: - markers: SNPs, microsatellites, AFLP, RFLP, ... (biallelic) - ploidy: >1 - optional extra information for inclusion: - prior knowledge on groups (e.g. geographic location) - genetic map location of markers

31 Program – data format

32 Example – S.t. tuberosum vs andigena. Other: 1st: 30 genotypes from tuberosum; 2nd: 20 genotypes from andigena

33 Example – S.t. tuberosum vs andigena PNA:

34 Example – S.t. tuberosum vs andigena PNA: Estimation of k (table columns: simulation #, k, Pr(k))

35 Example – S.t. tuberosum vs andigena PNA: assignment 1 = tbr; 2 = adg genotypes #31-#3: adg from India genotype #49: adg from Ecuador

36 Example – S.t. tuberosum vs andigena. Parameter change: allow admixture
Ancestry Model Info
  Use Admixture Model
  * Infer Alpha
  * Initial Value of ALPHA (Dirichlet Parameter for Degree of Admixture): 1.0
  * Use Same Alpha for all Populations
  * Use a Uniform Prior for Alpha
  ** Maximum Value for Alpha: 10.0
  ** SD of Proposal for Updating Alpha: 0.025
Frequency Model Info
  Allele Frequencies are Independent among Pops
  * Infer LAMBDA
  ** Use a Uniform Lambda for All Population
  ** Initial Value of Lambda: 1.0

37 Example – S.t. tuberosum vs andigena Parameter change: allow admixture

38 Example – S.t. tuberosum vs andigena Parameter change: allow admixture

39 Example – S.t. tuberosum vs andigena Parameter change: allow admixture

40 Example – andigena

41 Example – andigena: data

42 Example – andigena K = 2

43 Example – andigena K = 3

44 Example – andigena K = 3

45 Example – andigena: genetic distance K = 3

46 Example – andigena: geographic distribution - 1 K = 3

47 Example – andigena: geographic distribution - 2 K = 3

48 Example – andigena: geographic distribution - 3 K = 3

49 Example – I. batatas

50 Example – I. batatas: settings

51 Example – I. batatas: k K = 2

52 Example – I. batatas: k = 2 1=PAN, 2=HON, 3=GTM, 4=NIC, 5=MEX, 6=COL, 7=VEN, 8=ECU, 9=PER

53 Example – I. batatas: k = 3 1=PAN, 2=HON, 3=GTM, 4=NIC, 5=MEX, 6=COL, 7=VEN, 8=ECU, 9=PER

54 Example – I. batatas: k = 4 1=PAN, 2=HON, 3=GTM, 4=NIC, 5=MEX, 6=COL, 7=VEN, 8=ECU, 9=PER

55 Example – I. batatas: genetic distance

56 Example – S. paucissectum

57 Example – paucissectum: data

58 Example – paucissectum: configuration

59 Example – paucissectum: results: k =2

60 Example – paucissectum: results: k =3

61

62 Example – paucissectum: results: k =4

63 Example – paucissectum: results: k =5

64 Example – paucissectum: results: k =6

65 Summary: The software was tested with population data from diploid, tetraploid and hexaploid species, using microsatellite and biallelic markers. The algorithm seems stable and delivers sensible results under a variety of settings. Great advantage: it assigns each individual a probability of being a member of a certain subgroup.

66 Thanks for your attention!

