Published by Monique Bouchard. Modified over 5 years ago.
1
Goals:
To identify subpopulations (subsets of the sample with distinct allele frequencies)
To assign individuals (probabilistically) to subpopulations
2
Ancestry models
No admixture: each individual is derived completely from a single subpopulation.
Admixture: individuals may have mixed ancestry; some fraction $q_k$ of the genome of individual i is derived from subpopulation k.
3
Typical data:

Individual  Locus 1  Locus 2  Locus 3  Locus 4
1           A,A      A,A      A,C      A,A
2           A,B      A,A      A,B      A,A
3           B,B      A,B      A,A      A,A
4           C,C      D,E      D,E      B,C
5           C,C      C,D      D,D      B,D
6           B,C      E,E      A,E      C,E
7           A,C      D,D      C,D      A,D

{A,B,C,D,E} are labels for the different alleles at each locus.
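One way to hold such a table in code is sketched below. The grouping of the 28 listed genotypes into seven individuals by four loci (row by row) is an assumption, and the variable names are illustrative only.

```python
# Genotype table, one string per individual, one "allele,allele" cell per locus.
# Assumption: the 28 genotypes on the slide are read row by row (7 individuals x 4 loci).
rows = [
    "A,A A,A A,C A,A",
    "A,B A,A A,B A,A",
    "B,B A,B A,A A,A",
    "C,C D,E D,E B,C",
    "C,C C,D D,D B,D",
    "B,C E,E A,E C,E",
    "A,C D,D C,D A,D",
]

# genotypes[i][l] is the pair of allele copies of individual i at locus l.
genotypes = [[tuple(cell.split(",")) for cell in row.split()] for row in rows]

# Allele labels actually observed at each locus.
n_loci = len(genotypes[0])
alleles = [sorted({a for g in genotypes for a in g[l]}) for l in range(n_loci)]
```

Each individual is a list of unordered allele pairs, which is the shape assumed by the likelihood under Hardy-Weinberg equilibrium.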
4
More on the model...
Let $P_1, P_2, \dots, P_K$ represent the (unknown) allele frequencies in each subpopulation.
Let $Z_1, Z_2, \dots, Z_m$ represent the (unknown) subpopulation of origin of the sampled individuals (no admixture model).
Let $X_{ija}$ be the genotype data for allele copy $a$ ($a = 1, 2$) of individual i at locus j.
Assuming Hardy-Weinberg and linkage equilibrium within subpopulations, the likelihood of the genotype $G_i$ of individual i, given that it originates from subpopulation k, is the product of the relevant allele frequencies:

$$\Pr(G_i \mid Z_i = k, P_k) = \prod_{\text{loci } j} p^{(k)}_{X_{ij1}} \, p^{(k)}_{X_{ij2}}$$

where $p^{(k)}_{x}$ is the frequency of allele $x$ at locus j in subpopulation k.
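The product above can be evaluated as follows. This is a minimal sketch, not the original software: the function name, the dictionary layout and the example frequencies are all assumptions made for illustration. It works in logs because products over many loci underflow quickly.

```python
import math

def genotype_log_likelihood(genotype, freqs_k):
    """Log of Pr(G_i | Z_i = k, P_k) under Hardy-Weinberg and linkage equilibrium.

    genotype: list of (allele, allele) pairs, one pair per locus.
    freqs_k:  list of dicts, one per locus, mapping allele label -> frequency
              in subpopulation k (all frequencies assumed strictly positive).
    """
    total = 0.0
    for (a1, a2), locus_freqs in zip(genotype, freqs_k):
        # Each allele copy contributes its frequency; loci multiply independently.
        total += math.log(locus_freqs[a1]) + math.log(locus_freqs[a2])
    return total

# Hypothetical two-locus example; the frequencies are made up.
freqs_k = [{"A": 0.8, "B": 0.2}, {"A": 0.5, "C": 0.5}]
genotype = [("A", "B"), ("A", "A")]
likelihood = math.exp(genotype_log_likelihood(genotype, freqs_k))  # 0.8 * 0.2 * 0.5 * 0.5
```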
5
Then, adopting a Bayesian framework, we can write down the probability that individual i is from subpopulation k:

$$\Pr(Z_i = k \mid G_i, P) = \frac{\Pr(G_i \mid Z_i = k, P_k)\,\Pr(Z_i = k)}{\sum_{\text{pops } j} \Pr(G_i \mid Z_i = j, P_j)\,\Pr(Z_i = j)}$$

Here, $\Pr(Z_i = k)$ is the prior probability that individual i is from subpopulation k. Assigning each individual at random to a subpopulation according to these probabilities is an example of Gibbs sampling.
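A small sketch of this Bayes' rule step and of the random assignment is given below. The function names and the input numbers are hypothetical; the log-likelihoods are assumed to have been computed beforehand for each candidate subpopulation.

```python
import math
import random

def assignment_probabilities(log_liks, priors):
    """Pr(Z_i = k | G_i, P) for each k via Bayes' rule.

    log_liks[k] = log Pr(G_i | Z_i = k, P_k); priors[k] = Pr(Z_i = k).
    """
    log_post = [ll + math.log(pr) for ll, pr in zip(log_liks, priors)]
    m = max(log_post)  # subtract the maximum before exponentiating, for stability
    weights = [math.exp(v - m) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

def sample_assignment(probs, rng=random):
    """Draw one subpopulation index according to probs (one Gibbs update of Z_i)."""
    return rng.choices(range(len(probs)), weights=probs, k=1)[0]

# Hypothetical likelihoods 0.04 and 0.01 with a uniform prior over two subpopulations.
probs = assignment_probabilities([math.log(0.04), math.log(0.01)], [0.5, 0.5])
z_i = sample_assignment(probs, random.Random(0))
```

With equal priors the posterior is just the normalised likelihoods, so the first subpopulation gets probability 0.04 / 0.05 = 0.8 here.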
6
Similarly, a natural estimate of the allele frequencies in subpopulation k is:

$$\text{Frequency of allele } j \text{ at locus } l \text{ in pop } k = \frac{\#\text{ copies of allele } j \text{ in individuals from } k}{2 \times (\#\text{ individuals from subpopulation } k)}$$

But because we are Bayesian, and doing MCMC, we sample from a posterior distribution for the frequency that also depends on the prior.
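The counting estimate and a posterior draw can be sketched as follows. The slide does not name the prior; this sketch assumes a symmetric Dirichlet prior with parameter `lam`, a common conjugate choice, so the posterior is Dirichlet(`lam` + counts). All names and the toy data are hypothetical.

```python
import random
from collections import Counter

def allele_counts(genotypes, assignments, k, locus):
    """Count allele copies at one locus among individuals currently assigned to k."""
    counts = Counter()
    for genotype, z in zip(genotypes, assignments):
        if z == k:
            counts.update(genotype[locus])  # adds both allele copies
    return counts

def sample_frequencies(counts, alleles, lam=1.0, rng=random):
    """Draw frequencies from Dirichlet(lam + counts) using normalised gamma draws.

    Assumption: symmetric Dirichlet(lam) prior; the slide does not specify one.
    """
    draws = [rng.gammavariate(lam + counts[a], 1.0) for a in alleles]
    total = sum(draws)
    return {a: d / total for a, d in zip(alleles, draws)}

# Toy data: three individuals at one locus, the first two assigned to subpopulation 0.
genotypes = [[("A", "A")], [("A", "B")], [("B", "B")]]
assignments = [0, 0, 1]
counts = allele_counts(genotypes, assignments, k=0, locus=0)
naive = counts["A"] / (2 * assignments.count(0))  # counting estimate: 3 / 4
freqs = sample_frequencies(counts, ["A", "B"], rng=random.Random(1))
```

The prior keeps every sampled frequency strictly positive, even for alleles not yet observed in the subpopulation, which the counting estimate would set to zero.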
7
MCMC algorithm (for fixed K)
Start at random initial values $Z^{(0)}$ for the population assignments. Then iterate the following steps for n = 1, 2, …:
Step 1: Sample $P^{(n)}$ from $\Pr(P \mid Z^{(n-1)}, G)$
Step 2: Sample $Z^{(n)}$ from $\Pr(Z \mid P^{(n)}, G)$
For large n, the distribution of $(P^{(n)}, Z^{(n)})$ converges to the joint posterior distribution $\Pr(P, Z \mid G)$.
K is estimated separately (approximately).
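The two steps can be put together in a compact sketch for the no-admixture model. It assumes a symmetric Dirichlet prior on allele frequencies and a uniform prior on assignments, neither of which is stated on the slide; the function name, parameters and toy data are hypothetical.

```python
import math
import random

def gibbs_sampler(genotypes, alleles, K, n_iter=200, lam=1.0, seed=0):
    """No-admixture Gibbs sampler for fixed K (uniform prior on Z assumed).

    genotypes[i][l] is a pair of allele labels; alleles[l] lists the labels at locus l.
    Returns the final (P, Z); a real analysis would discard burn-in and average draws.
    """
    rng = random.Random(seed)
    n_loci = len(alleles)
    Z = [rng.randrange(K) for _ in genotypes]  # random initial assignments Z(0)
    P = None
    for _ in range(n_iter):
        # Step 1: sample P from Pr(P | Z, G), one Dirichlet(lam + counts) per (k, locus).
        P = []
        for k in range(K):
            pop = []
            for l in range(n_loci):
                counts = {a: 0 for a in alleles[l]}
                for g, z in zip(genotypes, Z):
                    if z == k:
                        for a in g[l]:
                            counts[a] += 1
                draws = {a: rng.gammavariate(lam + c, 1.0) for a, c in counts.items()}
                total = sum(draws.values())
                pop.append({a: d / total for a, d in draws.items()})
            P.append(pop)
        # Step 2: sample each Z_i from Pr(Z_i | P, G_i), proportional to the likelihood.
        for i, g in enumerate(genotypes):
            log_w = [sum(math.log(P[k][l][a]) for l in range(n_loci) for a in g[l])
                     for k in range(K)]
            m = max(log_w)
            weights = [math.exp(v - m) for v in log_w]
            Z[i] = rng.choices(range(K), weights=weights, k=1)[0]
    return P, Z

# Made-up toy data: six individuals, two loci.
toy = [
    [("A", "A"), ("A", "A")], [("A", "A"), ("A", "B")], [("A", "B"), ("A", "A")],
    [("C", "C"), ("D", "D")], [("C", "D"), ("D", "D")], [("C", "C"), ("C", "D")],
]
toy_alleles = [["A", "B", "C", "D"], ["A", "B", "C", "D"]]
P, Z = gibbs_sampler(toy, toy_alleles, K=2, n_iter=50)
```

Subpopulation labels are arbitrary, so two runs may return the same grouping with the labels swapped.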
8
Example: Taita Thrush data
Three main sampling locations in Kenya
Low migration rates (radio-tagging study)
155 individuals, genotyped at 7 microsatellite loci
9
Neighbor-joining tree of data
15
Since 2000
Model more features of heredity
Infer more complex histories
Faster algorithms
Better data summaries
More data