Published by Mildred Singleton. Modified over 9 years ago.
1
The Ashkenazi Genome Project
Shai Carmi
Pe’er lab, Columbia University, and The Ashkenazi Genome Consortium (TAGC)
Boston, September 2013
2
Outline
Ashkenazi Jewish (AJ) Genetics and TAGC
Basic Variant Statistics
Utility in AJ Medical Genetics
Demographic History of AJ and Europeans
Summary
3
Ashkenazi Jewish (AJ) Genetics & TAGC
4
Recent History of Ashkenazi Jews (AJ)
Mediterranean origin (?)
Ca. 1000: Small communities in Northern France and the Rhineland
Migration east
Expansion
Migration to the US and Israel
≈10M today
Relative isolation
5
Ashkenazi Jewish Genetics
Recently, AJ shown to be genetically distinct
Close to Middle-Easterners & South-Europeans
300 Jewish individuals; SNP arrays
References: Price et al., PLoS Genet., 2008; Olshen et al., BMC Genet., 2008; Need et al., Genome Biol., 2009; Kopelman et al., BMC Genet., 2009; Atzmon et al., AJHG, 2010; Behar et al., Nature, 2010; Bray et al., PNAS, 2010; Guha et al., Genome Biol., 2012
[Figure labels: AJ, Jewish non-AJ, Middle-Eastern, Europeans]
6
Recent Demography & IBD
[Figure: individuals A and B carrying a shared segment]
Recent, strong genetic drift leads to long identical-by-descent (IBD) haplotypes.
IBD sharing is common in AJ (Gusev et al., MBE, 2011 and others).
Inferred bottleneck of just ≈300 individuals ≈800 ya (Palamara et al., AJHG, 2012).
7
Ashkenazi-Jewish (AJ) Genetic Risk Factors
Multitude of Mendelian disorders
– Carrier screening: A success story
Breast and ovarian cancer: BRCA1, BRCA2
Parkinson’s disease: LRRK2, GBA
[Figure: Tay-Sachs births (Gravel et al., 2001)]
8
AJ Genetics: Summary & Prospects
Large population (≈10M)
Narrow bottleneck (≈300)
Mostly isolated
Recruitable
Well studied
Insight on both European and Middle-Eastern past
× No genealogies
× Mobile
× Some recent admixture
× Significant ancient admixture
9
The Ashkenazi Genome Consortium
11+5 labs, mostly from the NY area
Goal: Sequence to high coverage hundreds of healthy AJ
o Use as a reference panel for imputation and clinical interpretation
o Improve understanding of population history and functional genetic variation in AJ
Phase I: 128 AJ personal genomes
o Healthy controls
o Unrelated, PCA-validated AJ
o Technology: Complete Genomics
10
Basic Variant Statistics
11
Variant Statistics & Comparison to Europeans
Comparison panels:
o 1000 Genomes Europeans
o 26 Flemish from Belgium, sequenced by Complete Genomics
Projection method: Gravel et al., PNAS, 2011
12
Allele Frequency Spectrum
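For readers unfamiliar with the term, here is a minimal sketch of how an allele frequency spectrum can be tabulated. It assumes genotypes are coded as alternate-allele counts (0, 1, or 2) in a sites-by-individuals array. The function name and the toy data are illustrative only. This is a plain tabulation, not the projection method of Gravel et al.

```python
import numpy as np

def allele_frequency_spectrum(genotypes):
    """Count sites by alternate-allele count in a diploid sample.

    genotypes: array-like of shape (n_sites, n_individuals), entries 0/1/2.
    Returns an array of length 2*n_individuals + 1; entry k is the number
    of sites at which the alternate allele is observed exactly k times.
    """
    genotypes = np.asarray(genotypes)
    n_chromosomes = 2 * genotypes.shape[1]
    allele_counts = genotypes.sum(axis=1)
    return np.bincount(allele_counts, minlength=n_chromosomes + 1)

# Toy data (hypothetical): 4 sites, 3 diploid individuals.
toy = [[0, 1, 0], [1, 1, 2], [0, 0, 1], [2, 2, 2]]
afs = allele_frequency_spectrum(toy)
```

Entry 1 of the result counts singletons, and the last entry counts sites fixed for the alternate allele in the sample.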
13
Utility in AJ Medical Genetics
14
Screening AJ Genomes An ancestry-matched reference panel is expected to filter more benign variants in clinical genomes.
15
A Catalog of Mutations in Known AJ Disease Genes
Tens of genes harbor known mutations for AJ-prevalent Mendelian disorders or risk factors for multifactorial diseases.
o Tay-Sachs disease, Gaucher disease, Familial dysautonomia, Niemann-Pick disease, Torsion dystonia, Canavan disease, Bloom syndrome, etc.
o Breast cancer (BRCA1/2), Colon cancer (APC), Parkinson’s (LRRK2), etc.
We mapped 73 mutations in 48 genes.
Detected carriers of 35 known disease mutations.
Detected 184 missense and 18 loss-of-function variants that are novel relative to dbSNP135.
o Catalog will be made available.
16
Imputing AJ Arrays
The AJ reference panel outperforms CEU, even for a larger CEU panel
Accuracy improved across all frequencies and by all measures
– Discordance rate, r², false negatives/positives, Impute2 metrics
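Two of the accuracy measures named on this slide can be illustrated with a short sketch. It assumes that true genotypes are coded 0/1/2 and that imputed genotypes are given as expected dosages in [0, 2]. The function name and the example values are hypothetical, and the talk's exact definitions may differ.

```python
import numpy as np

def imputation_accuracy(true_genotypes, imputed_dosages):
    """Return (discordance rate, r^2) for one variant across individuals.

    Discordance: fraction of individuals whose best-guess (rounded)
    imputed genotype differs from the true genotype.
    r^2: squared Pearson correlation between true genotype and dosage.
    """
    true_g = np.asarray(true_genotypes, dtype=float)
    dosage = np.asarray(imputed_dosages, dtype=float)
    hard_calls = np.rint(dosage)
    discordance = float(np.mean(hard_calls != true_g))
    r = np.corrcoef(true_g, dosage)[0, 1]
    return discordance, float(r ** 2)

# Hypothetical example: five individuals at one variant.
disc, r2 = imputation_accuracy([0, 1, 2, 0, 1], [0.1, 0.9, 1.8, 0.2, 0.4])
```

Here the last individual is a heterozygote imputed with dosage 0.4, which rounds to 0 and contributes the single discordant call.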
17
Imputation by IBD
Impute by copying long IBD segments from a fully sequenced genome into a sparsely genotyped one.
– Only 1-2 recent mutations per segment are expected
IBD detected using Germline with additional filtering (segments >3 cM).
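The copying idea can be sketched in a few lines. This is a toy, single-haplotype illustration under stated assumptions: missing study alleles are marked `None`, positions are in cM, and IBD segments are already known as (start, end) pairs. The function name and all data are hypothetical, and the actual pipeline (Germline plus additional filtering) is not reproduced here.

```python
def impute_by_ibd(study, reference, positions_cm, segments, min_len_cm=3.0):
    """Fill missing study alleles from a reference haplotype inside long IBD segments.

    study, reference: allele lists aligned to positions_cm (None = untyped).
    segments: (start_cm, end_cm) IBD segments shared with the reference.
    Only segments longer than min_len_cm are used.
    """
    imputed = list(study)
    for start, end in segments:
        if end - start <= min_len_cm:
            continue  # too short to be trusted as IBD
        for i, pos in enumerate(positions_cm):
            if start <= pos <= end and imputed[i] is None:
                imputed[i] = reference[i]
    return imputed

# Hypothetical data: six sites, two of them typed on the array.
positions = [0.5, 1.0, 2.0, 4.5, 6.0, 9.0]
study = [0, None, None, 1, None, None]
reference = [0, 1, 0, 1, 1, 0]
result = impute_by_ibd(study, reference, positions, [(0.0, 5.0), (8.5, 9.5)])
```

Sites outside any sufficiently long segment stay missing, which is why the fraction of the genome covered by IBD matters for this approach.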
18
A Short Detour: A Model for the Expected Coverage
19
Coverage by IBD: Theory
Problem statement:
– Reference panel (say, fully sequenced) of size n_r
– Study panel (say, sparsely genotyped) of size n_s
– Detect all IBD segments of length >m (Morgan) between study and reference panels
– What is the average fraction of a study genome covered by IBD segments to the reference panel?
Assumptions:
– Haploid (phased), infinite genomes
– All segments can be detected
– Coalescent with recombination
– Recombination breaks a shared segment (B>>1)
[Figure: time axis in generations, from the present back through generations g and g+1; labels B and "Prob. 1-α"]
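The quantity asked about in the problem statement has a simple empirical counterpart, sketched below. The sketch takes the detected segments between one study genome and the whole reference panel and computes the covered fraction as a union of intervals. It assumes a finite genome of known length in Morgans, unlike the infinite-genome theory. The function name and the example segments are hypothetical.

```python
def covered_fraction(segments, genome_length, min_len=0.0):
    """Fraction of a genome covered by the union of IBD segments longer than min_len.

    segments: (start, end) pairs in Morgans; overlapping segments count once.
    """
    kept = sorted(s for s in segments if s[1] - s[0] > min_len)
    covered = 0.0
    cur_start = cur_end = None
    for start, end in kept:
        if cur_end is None or start > cur_end:
            if cur_end is not None:
                covered += cur_end - cur_start  # close the previous block
            cur_start, cur_end = start, end
        else:
            cur_end = max(cur_end, end)  # extend the current block
    if cur_end is not None:
        covered += cur_end - cur_start
    return covered / genome_length

# Hypothetical segments on a genome of length 1 Morgan, threshold m = 0.03.
frac = covered_fraction([(0.10, 0.30), (0.25, 0.50), (0.80, 0.90), (0.60, 0.62)],
                        genome_length=1.0, min_len=0.03)
```

Averaging this fraction over study genomes gives the empirical value that a theoretical prediction would be compared against.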
20
Coverage by IBD: Theory
21
Demographic History of AJ & Europeans
22
Recent AJ History Using IBD Palamara et al., AJHG 2012
23
Ancient History, One Population at a Time
Fit the allele frequency spectrum, computed using diffusion (∂a∂i, Gutenkunst et al., PLoS Genetics, 2009)
24
A Consequence
25
Principal Component Analysis
26
Ancient History
What we know/learned so far:
AJ are a Middle-Eastern:European mix
Slightly higher heterozygosity (+2.4%)
– Larger ancient population size
– Admixture
– Recent explosive growth
Many more AJ-specific variants
– +14% for 25x25 genomes
Out-of-Africa (Henn et al., PNAS, 2012)
– ≈50-60 kya
– Serial founder model: Africa → Middle-East → Europe
– Hunter-gatherers in Europe at ≈40-45 kya (Higham et al., Nature, 2011)
– Bottleneck and expansion at each step
27
The Joint AFS
Allele frequencies are correlated, but substructure exists.
Experimenting with inference using the joint AFS
– For our sample size, can infer at most ≈10 parameters
– Hard to infer very recent history
– Hard to infer migration rates
28
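A Proposed Model
[Figure: demographic model for the Flemish and AJ populations, with time running up to the present. Parameters: N_0, N_{b,OOA}, T_{b,OOA}, N_{b,EU}, T_{b,EU}, N_{f,EU}, N_{f,AJ}, T_a, f_a]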
29
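The Inferred Model
[Figure: the inferred model for the Flemish and AJ populations, time in years ago up to the present. Values shown: 6500, 2300, 52,000, 1800, 58,000, 10,800, 1700, 55%, 7500. Annotations: Out-of-Africa? Early Neolithic migrants? Jewish diaspora? Middle-East/Levant?]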
30
European Origins
Farming began in Europe ≈5-8 kya (“the Neolithic revolution”)
– Spread of ideas (“cultural diffusion”)
– Human migration (“demic diffusion”)
For cultural diffusion, split from Middle-Easterners at ≈40-45 kya.
We estimate ≈11 kya
Earlier than ≈5-8 kya, perhaps due to
– Early substructure before actual migration
– Incomplete replacement of hunter-gatherers
– Traces of recovery from the Last Glacial Maximum
31
Confidence Intervals
[Parameter symbols not recovered; rows in slide order]
Maximum likelihood | Bias-corrected mean ± SD | 95% confidence interval
6543 | 6523±25 | [6475, 6572]
2256 | 2314±47 | [2223, 2406]
53,050 | 52,007±1561 | [48,947, 55,067]
7632 | 7494±193 | [7116, 7872]
1556 | 1802±28 | [1748, 1857]
10,600 | 10,835±188 | [10,467, 11,202]
56,519 | 57,977±2912 | [52,270, 63,685]
1940 | 1686±98 | [1495, 1878]
55% | 55%±1% | [53%, 57%]
Parametric bootstrap:
o Simulate whole genomes with the maximum likelihood parameters
o MaCS, Chen et al., Genome Res., 2009
o Infer using the simulated datasets
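The reported intervals are numerically consistent with a normal approximation, mean ± 1.96 SD, applied to the bias-corrected bootstrap mean. The slide does not state how the intervals were computed, so the sketch below is a check under that assumption, with a hypothetical function name.

```python
def normal_ci(mean, sd, z=1.96):
    """Two-sided interval mean ± z*sd (z = 1.96 gives ≈95% under normality)."""
    return mean - z * sd, mean + z * sd

# Third row of the table: bias-corrected mean 52,007 with SD 1561.
low, high = normal_ci(52007, 1561)
```

Rounded to integers, this reproduces the tabulated interval [48,947, 55,067] for that row.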
32
Hmmm… Mutation rate Model specification
33
Mutation Rate
34
Model Specification We tried several alternative models All models support >50% European ancestry in AJ and European-Middle-Eastern split 10-15 kya. For example, a two-wave model for the population of Europe supports LGM recovery + Neolithic replacement:
35
Summary & Outlook
We sequenced 128 healthy AJ genomes to high coverage.
Our reference panel will improve:
– Screening of AJ clinical genomes or known disease genes
– Imputation of AJ SNP arrays
IBD sharing indicates a very recent bottleneck and expansion.
The AJ-European joint allele frequency spectrum suggests:
– Over 50% European ancestry in AJ
– Europeans diverged from Middle-Easterners only ≈10-15 kya
– Made possible by sequencing a population with partly Middle-Eastern ancestry
In the future:
– Sequence ≈200 more genomes to cover the entire bottleneck
– Use genomes from more populations to fine-tune demographic models
36
Thank you!
TAGC consortium members:
Columbia University Computer Science: Itsik Pe’er, Fillan Grady, Ethan Kochav, James Xue, Shlomo Hershkop
Long-Island Jewish Medical Center: Todd Lencz, Semanti Mukherjee, Saurav Guha
Columbia University Medical Center: Lorraine Clark, Xinmin Liu
Albert Einstein College of Medicine: Gil Atzmon, Harry Ostrer, Nir Barzilai, Kinnari Upadhyay, Danny Ben-Avraham
Mount Sinai School of Medicine: Inga Peter, Laurie Ozelius
Memorial Sloan Kettering Cancer Center: Ken Offit, Joseph Vijai
Yale School of Medicine: Judy Cho, Ken Hui, Monica Bowen
The Hebrew University of Jerusalem: Ariel Darvasi
Funding: Human Frontier Science Program
VIB, Gent, Belgium: Herwig Van Marck, Stephane Plaisance
Complete Genomics
Omicia
37
AJ Genetics
[Figure: effective population size N_t against time, years ago to present (Palamara et al., AJHG 2012). Values shown: 2,300; 45,000; 270; 4,300,000; 800 years ago]
[Figure: power of imputation by IBD, AJ and UK]
38
Complete Genomics WGS
39
Quality Control
Property | Genome (exome)
Coverage | ≈56x
Fraction called | 96.7±0.3% (98.1%)
Fraction with coverage > 20x | 92.7±1.6% (94.9%)
Concordance with SNP array | 99.67±0.25%
Ti/Tv ratio | 2.14±0.004 (3.05)
[Figure: Ti/Tv]
128 samples from two labs were sequenced in 3 batches
Minimal batch effects
Some results are for the first batch of 57 genomes
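The Ti/Tv ratio in this table is the number of transitions (A↔G, C↔T) divided by the number of transversions (all other single-base changes). Below is a minimal sketch, assuming SNVs are given as (ref, alt) base pairs. The function name and the toy input are illustrative.

```python
TRANSITIONS = {("A", "G"), ("G", "A"), ("C", "T"), ("T", "C")}

def ti_tv_ratio(snvs):
    """Ratio of transitions to transversions among (ref, alt) SNV pairs."""
    ti = sum(1 for ref, alt in snvs if (ref, alt) in TRANSITIONS)
    tv = len(snvs) - ti
    return ti / tv if tv else float("inf")

# Toy input (hypothetical): three transitions and two transversions.
ratio = ti_tv_ratio([("A", "G"), ("C", "T"), ("G", "A"), ("A", "C"), ("G", "T")])
```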
40
Quality Control
False positive rate assessment
– Counting (the few) hets inside long runs of homozygosity
– A duplicate sample
Genome-wide extrapolation:
– SNVs: ≈10-40k FP per genome (FDR: 0.3-1.3%)
– Indels: ≈10-30k FP per genome (FDR: 2-6%)
QC:
– Remove indels and poly-allelic variants
– Remove HWE violations, low call rate
FP after QC: ≈5k per genome.
[Figure labels: hets, roh]
41
Concordance with Arrays
[Figure: asymptotic discordance 0.05%]
42
Processing and Cleaning Pipeline
[Flowchart, flattened. Stage labels: local cleaning, cohort-based cleaning, initial filtering. Intermediate outputs: VCF file, testvariants file, Plink file. Tool labels: CGA tools, custom script, Plink/Seq, SHAPEIT, seqphase]

AJ: 58 Complete Genomics masterVar (hg19)
– VCF file (CGA tools mkvcf)
– Remove low-quality, half-called, or non-SNVs
– Remove variants not fully called in at least one individual
– Remove inbred individual
– Remove poly-allelic variants
– Remove variants with high no-call rate or that are not in Hardy-Weinberg equilibrium
– Plink file

Flemish: 26 Complete Genomics masterVar (hg18)
– testvariants file (CGA tools)
– Liftover hg18 => hg19
– Remove low-quality, half-called, or non-SNVs
– Remove variants not fully called in at least one individual
– Plink file, VCF file

Merge AJ-Flemish genotypes
– Initial filtering: remove coordinates with reference mapping problem; remove variants with AJ-Flemish incompatible alleles
– Variant in both cleaned files? Keep
– Variant in one cleaned file and in the VCF of the other? Discard
– Variant in one cleaned file and not at all in the other? Keep and set other as hom-ref
– Remove variants incompatible with 1000 Genomes
– Phase and impute sporadically missing genotypes (SHAPEIT; using 1000 Genomes panel)
– Phase using molecular phasing information (seqphase)

AJ complete project: 128 Complete Genomics masterVar (hg19)
– testvariants file (CGA tools)
– Remove low-quality, half-called, or non-SNVs
– Remove variants not fully called in at least one individual
– Remove poly-allelic variants
– Remove variants with high no-call rate or that are not in Hardy-Weinberg equilibrium
– Plink file
– Phase and impute sporadically missing values (SHAPEIT)
– Validate AJ ancestry
– Validate no cryptic relatedness
– Summary stats, array concordance, and duplicates analyses
– Ti/Tv statistics
– Monomorphic non-ref and runs-of-homozygosity analyses
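The three-way rule for merging AJ and Flemish genotypes can be written as a small decision function. This is a sketch under assumptions: variants are represented as hashable keys, each cohort has a "cleaned" set and a "raw" set (standing in for the cleaned Plink file and the unfiltered VCF), and the raw set contains the cleaned set. All names and the example keys are hypothetical.

```python
def merge_decisions(clean_aj, raw_aj, clean_fl, raw_fl):
    """Apply the slide's merge rule to every variant seen in either cleaned set."""
    decisions = {}
    for variant in clean_aj | clean_fl:
        in_aj, in_fl = variant in clean_aj, variant in clean_fl
        if in_aj and in_fl:
            decisions[variant] = "keep"
        else:
            # Present in only one cleaned file: check the other cohort's raw calls.
            other_raw = raw_fl if in_aj else raw_aj
            if variant in other_raw:
                decisions[variant] = "discard"  # called but filtered out in the other cohort
            else:
                decisions[variant] = "keep_other_hom_ref"
    return decisions

# Hypothetical variant keys.
clean_aj, raw_aj = {"v1", "v2", "v3"}, {"v1", "v2", "v3", "v4"}
clean_fl, raw_fl = {"v1", "v4", "v5"}, {"v1", "v2", "v4", "v5"}
decisions = merge_decisions(clean_aj, raw_aj, clean_fl, raw_fl)
```

In this example "v2" and "v4" are discarded because each was called in the other cohort but did not survive that cohort's cleaning, while "v3" and "v5" are kept with the other cohort set to homozygous reference.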
43
Mobile Element Insertions (MEIs) & Copy Number Variants (CNVs)
Initial validation efforts suggested high false discovery rate, at least for novel events.
Novel MEIs: 3/11 validated
Strong batch effect
[Figure label: 1000 Genomes MEIs]
44
Variant Statistics
Statistic | Per genome (exome)
Total SNPs | 3.4M (22k)
Novel SNPs | 3.8% (4.1%)
Het/hom ratio | 1.65 (1.67)
Insertions count | 220k (242)
Deletions count | 235k (223)
Substitutions count | 83k (374)
Synonymous SNPs | 10,536
Non-synonymous SNPs | 9706
Nonsense SNPs | 72
Other disrupting | 255
CNV count | 302
SV count | 1480
MEI count | 4090
45
Imputing AJ Arrays
Compare imputation accuracy of AJ SNP arrays when using either AJ or European reference panels.
[Figure: experiment design. Inputs: AJ arrays (1000); phased AJ sequences (57), split into 50 and 7; 1000 Genomes CEU (87). Reference Panel 1 (50), Reference Panel 2 (87), Reference Panel 3 (137). Study Panel (1007): AJ arrays plus sequences reduced to unphased arrays. Steps: Phase (ShapeIT), Impute (Impute2). Outputs: Imputed Study Panels 1, 2, 3]
46
Mutation Burden in AJ
Theoretically, a narrow bottleneck should increase the load of deleterious variants (e.g., Lohmueller et al., Nature, 2008)
o Or not? (Simons et al., arXiv, 2013)
o Expect higher load in AJ.
Define deleterious:
o Derived? Minor? Non-reference? Rare?
o How to weight each variant?
o Account for demography, sequencing errors?
o Define significance?
Compare 26 AJ and 26 Flemish.
AJ have 1-10% more deleterious variants than expected (using Flemish as baseline).
P-values between 0.2 and 10^-60.
47
Mutation Burden in Disease Categories
Many diseases have been suggested to be more prevalent in AJ (Goodman 1979)
o Several Mendelian disorders
o Some cancers
o Inflammatory bowel diseases
o Diabetes, obesity
o Some psychiatric diseases, myopia
Annotate genes according to disease category (Omicia Inc).
Compare non-synonymous variant load between AJ and Flemish.
Disease category | #genes | AJ/FL ratio
Aging | 106 | 1.07
Infectious | 70 | 1.03
Neonatal | 956 | 1.02
Gastrointestinal | 254 | 1.02
Dental | 86 | 1.01
Immunological | 474 | 1.01
Hemic | 202 | 1.01
Cardiovascular | 502 | 1.01
Endocrinological | 750 | 1.01
Oncological | 471 | 1.01
Women’s | 39 | 1.00
Drug | 82 | 1.00
Neurological | 980 | 1.00
Nutrition | 29 | 0.99
Respiratory | 187 | 0.99
Kidney | 285 | 0.96
Psychiatric | 21 | 0.93
No category comes out significant in Gene Set Enrichment Analysis.
48
[Figure labels: AJ, EU; IBD observed; Het/Hom ratio; time t, years ago to present]