The new Y Chromosome Haplotype Reference Database (YHRD) and optimized approaches for the forensic Y-STR analysis Sascha Willuweit & Lutz Roewer Institut für Rechtsmedizin und Forensische Wissenschaften Charité – Universitätsmedizin Berlin ©Charité – Universitätsmedizin Berlin, Dept. Forensic Genetics 2015
Workshop schedule: 2015, September 1st, 2:30 pm – 6:30 pm. Topics: different frequency estimation methods implemented in the YHRD; mixture analysis using the YHRD; kinship analysis using the YHRD; ancestry information retrievable from YHRD; subpopulation analysis (AMOVA) using YHRD; discussion of casework examples
YHRD - Increasing numbers
Frequency estimation
Frequency estimation methods. Constant estimators: augmented counting (1/(n+1)); counting with database inflation (Brenner's κ). Variable estimators: surveying method (Krawczak); coalescence-based estimation (Caliebe); discrete Laplace method (Andersen). (Legend: Enabled in YHRD)
Frequency estimation for Y-STR profiles
Frequency estimation for rare haplotypes (0 observations) with "kappa inflation". Columns: 23 loci, 17 loci, 9 loci. Rows: count; singletons; with kappa. Surviving entries: n.a.; κ = 0.78; κ = 0.24; (1.4 x )*; (7.9 x )*. * counting. κ: proportion of singletons, an estimator of the proportion of unsampled rare haplotypes in the database
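The constant estimators named above can be sketched in a few lines. This is a minimal illustration, not YHRD's implementation: the function names and the toy database are hypothetical, and the kappa formula used here, κ = (singletons + 1)/(n + 1) with an estimate of (1 − κ)/(n + 1) for a previously unobserved haplotype, is the form commonly attributed to Brenner and is taken as an assumption.

```python
from collections import Counter


def counting(x, n):
    """Plain counting: observed matches x divided by database size n."""
    return x / n


def augmented_counting(x, n):
    """Augmented counting: the queried haplotype is added to the database."""
    return (x + 1) / (n + 1)


def brenner_kappa(database):
    """Kappa-inflated estimate for a haplotype with 0 observations.

    Assumption: kappa is the proportion of singletons in the database
    after adding the new (unobserved) haplotype.
    """
    n = len(database)
    singletons = sum(1 for c in Counter(database).values() if c == 1)
    kappa = (singletons + 1) / (n + 1)
    return kappa, (1 - kappa) / (n + 1)


# Hypothetical toy database of haplotype labels (not real data).
toy_db = ["h1", "h1", "h2", "h3", "h3", "h3", "h4", "h5"]

kappa, freq = brenner_kappa(toy_db)
print(augmented_counting(0, len(toy_db)), kappa, freq)
```

The more singletons a database contains, the larger κ becomes and the smaller the estimate for an unobserved haplotype, which is the "inflation" effect the slide refers to.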
Comparison of estimators for rare haplotypes: discrete Laplace vs. counting, kappa and surveying methods, using a simulated population of 1 million, a database size of 1000 and a proportion of singletons of κ = 0.864. Courtesy of M.M. Andersen (Copenhagen)
Fig. 1 Comparison of (1) the relative frequency of a haplotype (number of times it has been observed divided by the database size) and (2) the estimated haplotype frequency using the discrete Laplace method. Note, that for frequently observed haplotypes, t... Source: Mikkel Meyer Andersen, Poul Svante Eriksen, Niels Morling. Cluster analysis of European Y-chromosomal STR haplotypes using the discrete Laplace method. Forensic Science International: Genetics, Volume 11, 2014.
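To make the discrete Laplace idea concrete, here is a minimal sketch of how such an estimate could be computed once a model has been fitted. It assumes the usual formulation of the method: each allele deviates from a cluster's central haplotype by d repeats with probability (1 − p)/(1 + p) · p^|d|, loci are independent within a cluster, and the haplotype frequency is a weighted sum over clusters. All parameter values and names below are hypothetical; fitting the model to a database is not shown.

```python
def discrete_laplace_pmf(d, p):
    """P(D = d) = (1 - p) / (1 + p) * p**|d| for integer d and 0 < p < 1."""
    return (1 - p) / (1 + p) * p ** abs(d)


def haplotype_frequency(haplotype, clusters):
    """Mixture estimate: sum over clusters of weight * product over loci."""
    total = 0.0
    for weight, centre, dispersions in clusters:
        prob = weight
        for allele, central, p in zip(haplotype, centre, dispersions):
            prob *= discrete_laplace_pmf(allele - central, p)
        total += prob
    return total


# Hypothetical fitted model: two clusters over three loci.
clusters = [
    (0.7, (14, 13, 30), (0.2, 0.1, 0.3)),
    (0.3, (15, 12, 29), (0.25, 0.15, 0.2)),
]

# An estimate is available even for a haplotype never seen in the database.
print(haplotype_frequency((14, 13, 31), clusters))
```

Because the estimate depends on distance from cluster centres rather than on observed counts, it varies from haplotype to haplotype, which is what distinguishes it from the constant estimators.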
Interpretation tools implemented in YHRD: mixture analysis (frequency and LR based); kinship calculation (frequency and LR based); population substructure (AMOVA, Fst/Rst, MDS); ancestry information (AI)
Mixture analysis
Casework example. Offence: sexual assault. Evidence: contact stain on clothing. Autosomal analysis: only ♀ component, no ♂ admixture in AMELOGENIN. Y chromosomal analysis: male mixture (major, minor component)
Analyse with Mixture analysis tool (partial Y23 profiles)
Result for PowerPlex Y23 (20 loci)
Reanalysis using reduced PPY12 profiles
Result for PowerPlex Y12 (10 loci)
Reanalysis using further reduced 9-locus minHt profiles
Result for minHt (7 loci)
Kinship
The Y chromosome - a linearly inherited, haploid marker system
For which cases?
Likelihood ratio (LR) calculation / brotherhood: probability of observing the haplotypes given the same father (X) vs. probability of observing the haplotypes given different fathers (Y). A, B: same or different fathers? L(X) = µ/2 × [f(A) + f(B)]; L(Y) = f(A) × f(B); µ = mutation rate, f = haplotype frequency (YHRD). Locus-specific µ for one-step mutations: see YHRD. For the X hypothesis, the probability of "non-mutation" (1 − µ) at each locus is also considered. Rolf et al. (Int J Legal Med, 2001); Buckleton et al. (CRC Press, 2005)
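The two likelihoods translate directly into code. The sketch below is a minimal illustration with hypothetical function names; it omits the per-locus non-mutation factors (1 − µ). The input values follow the brothers example in this section: f(B) = 2.3 × 10^-5 is as shown, while the exponents of f(A) = 1.4 × 10^-4 and µ = 3.6 × 10^-3 are assumptions, chosen because they reproduce the stated LR of 91.

```python
def likelihood_related(f_a, f_b, mu):
    """L(X) = mu/2 * [f(A) + f(B)]: same father, one one-step mutation."""
    return mu / 2 * (f_a + f_b)


def likelihood_unrelated(f_a, f_b):
    """L(Y) = f(A) * f(B): two independent draws from the population."""
    return f_a * f_b


def likelihood_ratio(f_a, f_b, mu):
    """LR (X/Y) = L(X) / L(Y)."""
    return likelihood_related(f_a, f_b, mu) / likelihood_unrelated(f_a, f_b)


# Assumed exponents for f(A) and mu (see the note above).
f_a, f_b, mu = 1.4e-4, 2.3e-5, 3.6e-3

print(likelihood_related(f_a, f_b, mu))
print(likelihood_unrelated(f_a, f_b))
print(round(likelihood_ratio(f_a, f_b, mu)))
```

An LR above 1 supports the relatedness hypothesis X; the rarer the two haplotypes, the smaller L(Y) and the larger the LR.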
Brothers? A, B: same or different fathers? Haplotypes: 14, 13, 31, 24, 11, 13, 14, 11-11, 14, 13 and 14, 13, 31, 25, 11, 13, 14, 11-11, 14, 13. f A = 1.4 × 10^-4*, f B = 2.3 × 10^-5*, µ = 3.6 × 10^-3*, meioses: 1 each (* YHRD). Related: L(X) = 1.4 × 10^-4 × 1 × µ/2 + 2.3 × 10^-5 × 1 × µ/2 = 2.9 × 10^-7. Unrelated: L(Y) = 1.4 × 10^-4 × 2.3 × 10^-5 = 3.2 × 10^-9. LR (X/Y) = 91
Father – son or unrelated? Influence of the local mutation rate on LR. Haplotypes: 14, 13, 31, 24, 11, 13, 14, 11-11, 14, 13 and 14, 13, 31, 24, 11, 14, 14, 11-11, 14, 13. f A = 2.3 × 10^-5*, f B = 1.4 × 10^-4*, µ = 2.1 × 10^-3 (moderate)* (* YHRD). L(X) = 1.4 × 10^-4 × 1 × µ/2 = 1.5 × 10^-7. L(Y) = 1.4 × 10^-4 × 2.3 × 10^-5 = 3.2 × 10^-9. LR (X/Y) = L(X)/L(Y)
Father – son or unrelated? Influence of the local mutation rate on LR. Haplotypes: 14, 13, 30, 24, 11, 13, 14, 11-12, 14, 13 and 14, 13, 30, 24, 11, 14, 14, 11-12, 14, 13. f A = 2.3 × 10^-5*, f B = 1.4 × 10^-4*, µ = 1.2 × 10^-2 (rapid)* (* YHRD). L(X) = 1.4 × 10^-4 × 1 × µ/2 = 8.4 × 10^-7. L(Y) = 1.4 × 10^-4 × 2.3 × 10^-5 = 3.2 × 10^-9. LR (X/Y) = L(X)/L(Y)
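A short derivation shows why the locus mutation rate matters here. It assumes the father–son likelihoods take the form used on these two slides, L(X) = f(B) × 1 × µ/2 and L(Y) = f(A) × f(B), and ignores the non-mutation factors at the remaining loci:

```latex
\[
\mathrm{LR}(X/Y)
  = \frac{L(X)}{L(Y)}
  = \frac{f(B)\cdot 1\cdot \mu/2}{f(A)\, f(B)}
  = \frac{\mu}{2\, f(A)}
\]
```

Under these assumptions f(B) cancels and the LR is directly proportional to µ, so the same one-step difference gives a larger LR at a rapidly mutating locus than at a moderately mutating one.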
Common ancestor? Meioses: A 7, B 5. Haplotypes: 14, 13, 31, 24, 11, 13, 13, 11-12, 14, 13 and 14, 13, 31, 24, 11, 14, 13, 11-12, 14, 13. f obs = 1.4 × 10^-4*, f obs = 2.3 × 10^-5*, µ = 2.1 × 10^-3 (moderate)* (* YHRD). L(X) = 1.4 × 10^-4 × 7 × µ/2 + 2.3 × 10^-5 × 5 × µ/2 = 1.1 × 10^-6. L(Y) = 1.4 × 10^-4 × 2.3 × 10^-5 = 3.2 × 10^-9. LR (X/Y) = L(X)/L(Y)
Common ancestor? Meioses: A 7, B 5. Haplotypes: 14, 13, 31, 24, 11, 13, 13, 11-12, 14, 13 and 14, 13, 31, 24, 11, 14, 13, 11-12, 14, 13. f obs = 1.4 × 10^-4*, f obs = 2.3 × 10^-5*, µ = 1.2 × 10^-2 (rapid)* (* YHRD). L(X) = 1.4 × 10^-4 × 7 × µ/2 + 2.3 × 10^-5 × 5 × µ/2 = 6.6 × 10^-6. L(Y) = 1.4 × 10^-4 × 2.3 × 10^-5 = 3.2 × 10^-9. LR (X/Y) = L(X)/L(Y)
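For more distant relationships the slides weight each haplotype frequency by the number of meioses separating that individual from the common ancestor. A minimal sketch of that generalisation follows; the function name is hypothetical, and the exponents of f and µ are assumptions inferred from the surviving mantissas. With one meiosis on each side the expression reduces to µ/2 × [f(A) + f(B)].

```python
def likelihood_common_ancestor(f_a, f_b, meioses_a, meioses_b, mu):
    """L(X) = f(A) * m_A * mu/2 + f(B) * m_B * mu/2 for one one-step mutation."""
    return f_a * meioses_a * mu / 2 + f_b * meioses_b * mu / 2


# Assumed values (exponents inferred, not shown in the source).
f_a, f_b = 1.4e-4, 2.3e-5
l_unrelated = f_a * f_b

for label, mu in (("moderate", 2.1e-3), ("rapid", 1.2e-2)):
    l_related = likelihood_common_ancestor(f_a, f_b, 7, 5, mu)
    print(label, l_related, l_related / l_unrelated)
```

More intervening meioses give a mutation more opportunities to occur, so L(X), and with it the LR, grows with both the meiosis counts and the locus mutation rate.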
Ranking of Y-STR mutation rates (original columns: Loci, Mutation Rate [95% CI], Meioses, Position [MutRate], Group [MutRate])
Locus            Mutation rate   Group
dys438           2.96E           slow
dys392           4.04E           slow
dys393           1.09E           slow
dys437           1.19E           slow
dys448           1.65E           slow
dys390           2.06E           medium
dys385 mc        2.30E           medium
dys19            2.32E           medium
ygatah4          2.47E           medium
dys391           2.54E           medium
dys389i          2.68E           medium
dys635           3.72E           medium
dys389ii         3.78E           medium
dys456           4.19E           medium
dys481           4.97E           medium
dys              E               medium
dys439           5.35E           medium
dys460           6.22E           medium
dys458           6.74E           medium
dys518           1.84E           fast
dyf387S1ab mc    1.59E           fast
dys576           1.43E           fast
dys570           1.24E           fast
dys627           1.23E           fast
dys449           1.22E           fast
Likelihood Ratio (LR, KI) calculation for Y-STRs. Individuals D and A: f(A) = 1/123*, f(D) = 1/388**. * Program uses counting (discrete Laplace extrapolation: 1/311). ** Program uses counting (discrete Laplace extrapolation: 1/821)
Population analysis (AMOVA)
YHRD: Test on population substructure (Fst, Rst) (Example: 17,278 Chinese individuals in 52 populations)
Ancestry information
Fast and slowly mutating Y markers. Y-SNPs (slow µ): irreversible mutations, stable phylogeny. Y-STRs (fast µ): recurrent mutations, networks. Figure (time axis): Y-SNP example sequences TCGAGGTATTAAC, TCTAGGTATTAAC, TCGAGGCATTAAC, TCTAGGTGTTAAC, TCGAGGTATTAGC, TCTAGGTATCAAC (* marks mutated positions); Y-STR example haplotypes that are variants of 13,30,25,10,11,13 differing by single repeat steps
Y-STR gradients (7 loci). Roewer et al., Hum Genet 2005
Y-SNP gradients (R1a). Fechner et al., Am J Phys Anthropology 2008
Is ancestry prediction possible? Case: skeleton in a trolley, 5 g femur extracted. Y marker analysis (Geppert et al. 2010): haplotype 14,13,30,22,10,11,12,13-16,...; haplogroup J2a (Semino et al, n = 2400). Biogeographical analysis using Y markers does not predict nationality, residency or phenotype. Y markers do provide very useful information on the deep ancestry of a paternal lineage and its proliferation (radiation) over time until today
Unknown skeletonized person: extract, type, search and add "ancestry information"
Ancestry information – three features and heat map
Heat Map (searched haplotypes are reduced to the most representatively sampled minHt)
Searched haplotype is compared with a database of STR+SNP typed samples
Haplogroup (Hg) prediction is prone to identity-by-state (IBS) errors (as evidenced by YHRD)! Mandatory: Y-SNP analysis using the (mini)sequencing SNaPshot method (Hierarchical Multiplex Analysis). Result: J2a (Turkey, Fertile Crescent, Caucasus, Mediterranean). Geppert M & Roewer L (2012) SNaPshot® minisequencing analysis of multiple ancestry-informative Y-SNPs using capillary electrophoresis. Methods Mol Biol. 830:
Caution! "Most frequent neighbour": 15,13,29,22,10,11,12,15-16 (22 matches); 15,13,30,22,10,11,12,15-16 (2 matches to YHRD). Legend: each dot is one population sample (on average 120 individuals), with matching populations marked in red
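One way to picture a "most frequent neighbour" search is sketched below. It rests on an assumption that is not spelled out on the slide: a neighbour is taken to be a haplotype that differs from the searched one by a single repeat step at exactly one locus. The function names and the toy database are hypothetical; the database only mirrors the match counts on the slide, with the two DYS385 alleles stored as separate values.

```python
from collections import Counter


def one_step_neighbours(haplotype):
    """Yield all haplotypes differing by +/-1 repeat at exactly one position."""
    for i, allele in enumerate(haplotype):
        for step in (-1, 1):
            yield haplotype[:i] + (allele + step,) + haplotype[i + 1:]


def most_frequent_neighbour(haplotype, database):
    """Return the one-step neighbour with the most database matches."""
    counts = Counter(database)
    best = max(one_step_neighbours(haplotype), key=lambda h: counts[h])
    return best, counts[best]


searched = (15, 13, 30, 22, 10, 11, 12, 15, 16)
neighbour = (15, 13, 29, 22, 10, 11, 12, 15, 16)

# Hypothetical toy database reproducing the slide's match counts.
toy_db = [neighbour] * 22 + [searched] * 2

print(most_frequent_neighbour(searched, toy_db))
```

A neighbour with many more matches than the searched haplotype can dominate the geographic picture, which is one reason the slide flags this result with a caution.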
But: SNaPshot analysis yields haplogroup E-M2, which has its highest frequency in West Africa (~80%) and Central Africa (~60%), not India. Discrepancy between Y-STR and Y-SNP distribution!
Part II: Casework examples (frequency estimation, mixture, kinship, ancestry)