Accelerating positional cloning in mice using ancestral haplotype patterns
Mark Daly, Whitehead Institute for Biomedical Research
Kerstin Lindblad-Toh, Whitehead/MIT Center for Genome Research
Mouse sequence reveals great similarity with the human genome
Extremely high conservation: 560,000 “anchors”
Mouse-human comparison:
- Both genomes are billions of bp long
- > 99% of genes have homologs
- > 95% of the genome is “syntenic”
Mouse history (source: L. Silver, Mouse Genetics)
Recent mouse history
Pictured: W.E. Castle, C.C. Little
- Fancy mouse breeding in Asia and Europe (last few centuries)
- Retired schoolteacher Abbie Lathrop collects and breeds these mice (Granby, MA, 1900)
- Castle, Little and others form the most commonly used inbred strains from Lathrop stock (1908 on)
Critical components of inbred strain diversity
- Asian musculus and European domesticus mice dominate the world but have evolved separately for ~1 million years
- Thousands of years of fancy mouse breeding produced highly homogeneous versions of these wild mice, which were traded and ended up in Lathrop’s schoolhouse
Structure of variation in the laboratory mouse
- Study 1: compare finished BACs from strain 129 to the recent C57BL/6J genome assembly
- Study 2: extrapolate general observations using whole-genome shotgun (WGS) reads from 129, C3H and BALB/c generated as part of the Mouse Genome Sequencing Consortium (MGSC) effort
Distribution of variation rates: 70 unlinked 50 kb segments (129 vs. B6)
[Figure: the segments fall into two classes, one with <1 SNP/10 kb and one with ~40 SNP/10 kb. Only 1/3 of candidate SNPs in the low-rate class validate, versus ~99% in the high-rate class.]
Low and high SNP rates suggest recent and distant common ancestry, respectively
SNP discovery analysis summary
- Comparisons of 129 and C57BL/6 show alternating regions of high SNP density (~1 per 250 bp) and low SNP density (~1 per 20,000 bp)
- Genome-wide shotgun data suggest these segments average ~1 Mb
- Comparisons of C3H and BALB/c to C57BL/6 give the same picture, although the locations of the divergent and identical regions differ
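The two SNP-rate modes suggest a simple way to split a pairwise strain comparison into ancestrally identical and divergent stretches. The Python sketch below is an illustration under stated assumptions, not the analysis used in these studies: it counts SNPs in fixed 10 kb windows, calls a window "identical" when the count falls below a hypothetical threshold of 5 (an arbitrary cut between the <1 and ~40 SNP/10 kb modes), and merges neighbouring windows with the same call. The names `call_segments` and `Segment` are illustrative.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    start: int  # bp, inclusive
    end: int    # bp, exclusive
    state: str  # "identical" or "divergent"


def call_segments(snp_positions: list[int], region_length: int,
                  window: int = 10_000, threshold: int = 5) -> list[Segment]:
    """Count SNPs per fixed window, label each window, and merge runs of equal labels.

    threshold is a hypothetical cut-off (SNPs per window) placed between the
    low (<1 SNP/10 kb) and high (~40 SNP/10 kb) modes.
    """
    n_windows = (region_length + window - 1) // window
    counts = [0] * n_windows
    for pos in snp_positions:
        if 0 <= pos < region_length:
            counts[pos // window] += 1

    segments: list[Segment] = []
    for i, count in enumerate(counts):
        state = "identical" if count < threshold else "divergent"
        start, end = i * window, min((i + 1) * window, region_length)
        if segments and segments[-1].state == state:
            segments[-1].end = end  # extend the current run of equal calls
        else:
            segments.append(Segment(start, end, state))
    return segments


# Synthetic example: one SNP every 250 bp for 50 kb, then a single SNP at 75 kb.
snps = list(range(0, 50_000, 250)) + [75_000]
for seg in call_segments(snps, 100_000):
    print(seg)
```

On this synthetic input the sketch reports one divergent segment (0 to 50 kb) followed by one identical segment (50 to 100 kb). Real data would also need to account for the low validation rate of SNP calls in low-density regions, which this sketch ignores.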
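Genetic background of the inbred lab mice
[Figure: C57BL/6, C3H and DBA chromosomes drawn as mosaics of segments of M. m. musculus, M. m. domesticus and M. m. castaneus origin; average segment size ~1-2 Mb]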
Positional cloning
[Figure: C3H (susceptible) × B6 (resistant) cross; 20 Mb QTL interval]
Traditionally, positional cloning is painful (e.g., generating thousands of mice for fine mapping, breeding congenics). As a result, many significant QTLs have been identified in mapping crosses, but only a handful of them have been successfully cloned.
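Using ancestral haplotypes to localize QTLs
[Figure: 20 Mb critical region in C3H (susc.), B6 (res.) and DBA (susc.)]
If susceptibility in C3H and DBA has a common ancestral origin, the responsible variant should lie in segments where the two susceptible strains share an ancestral haplotype that differs from B6's. One can then also use the map to:
- examine the correlation of genotype to phenotype in other strains across the critical segments
- choose optimal additional strains for crossing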
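A minimal Python sketch of the genotype-to-phenotype check, assuming each critical-region segment has already been assigned an ancestral haplotype label per strain, and assuming susceptibility has a single ancestral origin (so all susceptible strains must share one haplotype that no resistant strain carries). The segment names, haplotype labels and the function `concordant_segments` are hypothetical.

```python
# Hypothetical input: segment name -> {strain: ancestral haplotype label}.
SEGMENTS = {
    "A": {"C3H": "h1", "DBA": "h1", "B6": "h1"},  # all three strains share one haplotype
    "B": {"C3H": "h1", "DBA": "h2", "B6": "h2"},  # DBA shares B6's haplotype
    "C": {"C3H": "h1", "DBA": "h1", "B6": "h2"},  # susceptible strains share h1, B6 differs
}
PHENOTYPE = {"C3H": "susceptible", "DBA": "susceptible", "B6": "resistant"}


def concordant_segments(segments: dict[str, dict[str, str]],
                        phenotype: dict[str, str]) -> list[str]:
    """Keep segments where every susceptible strain carries the same haplotype
    and no resistant strain carries it (single-origin assumption)."""
    keep = []
    for name, haps in segments.items():
        susceptible = {h for strain, h in haps.items() if phenotype[strain] == "susceptible"}
        resistant = {h for strain, h in haps.items() if phenotype[strain] == "resistant"}
        if len(susceptible) == 1 and susceptible.isdisjoint(resistant):
            keep.append(name)
    return keep


print(concordant_segments(SEGMENTS, PHENOTYPE))  # ['C']
```

Adding more typed strains with known phenotypes to both dictionaries tightens the filter: each extra strain can rule out segments where its haplotype contradicts its phenotype.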
Pilot haplotype map
- ~150 SNPs across 25 Mb of chromosome 4
- Typed in 37 inbred lines and 10 wild-derived isolates of potential founder strains
- Roughly 3-fold less dense than projected to be needed for a finished picture
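The density figures can be checked with simple arithmetic, reading "3-fold less dense" as meaning roughly three times as many SNPs (about 450) would be needed over the same 25 Mb:

```latex
\frac{25\ \text{Mb}}{150\ \text{SNPs}} \approx 167\ \text{kb per SNP},
\qquad
\frac{25\ \text{Mb}}{3 \times 150\ \text{SNPs}} = \frac{25\ \text{Mb}}{450\ \text{SNPs}} \approx 56\ \text{kb per SNP}
```

At the pilot density, a segment of the ~1 Mb average size estimated from the shotgun data carries only about six SNPs.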
Strains proposed
Wild-derived ancestral strains:
- CAST/Ei (M. m. castaneus)
- WSB/Ei (M. m. domesticus)
- PERA/Ei (M. m. domesticus)
- MOLF/Ei (M. m. molossinus)
- MAI/Pas (M. m. musculus)
- CZECHII/Ei (M. m. musculus)
- SPRET/Ei (M. spretus)
- SEG/Pas (M. spretus)
- BACT/Bon (M. m. bactrianus)
Inbred strains: 129S1/SvImJ, C57BL/6J, DDK, NZW/LacJ, 129X1/SvJ, C57BLKS/J, FVB/NJ, PL/J, A/J, C57BR/cdJ, I/LnJ, RIIIS/J, AKR/J, C57L/J, KK/HlJ, RF/J, BALB/cByJ, C58/J, LG/J, SEA/GnJ, BTBR+Ttf/tf, CBA/J, LP/J, SJL/J, BUB/BnJ, CE/J, MA/MyJ, SM/J, C3H/HeJ, DBA/2, NOD/LtJ, ST/bJ, C57BL/10, 020, NON/LtJ, SWR/J
First few Mb…
Strain   SNP alleles along the chromosome (Chr, Mb)
129S1    TA*CCC*CGGTACGAGGG
AKR      AGTTTAATGGTACGAGGG
A_J      AGTTTAATGGTACGAGGG
BALB_c   TA*CCCGCGGTACGAGGG
C3H      AGTTTAATGGTACGAGGG
C57B6    AGTTTAATCTAGTACCCA
CBA      AGTTTAATCTAGTACCCA
DBA2     AGTTTAATCTAGTACCCA
FVB      AGTTTAATCTAGTACCCA
I        AGTTTAATGGTACGAGGG
NOD      AGTTTAATGGTACGAGGG
NZB      *ACCCC*CCT*GTACCCA
SJL      AGTTTAATCTAGTACCCA
SWR      AGTTTAATCTAGTACCCA
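The allele table can be read programmatically to find strains that share a haplotype in this window. The Python sketch below transcribes the rows of the table, treats "*" as an uncalled genotype (an assumption; the slide does not define the symbol), and greedily groups strains whose called alleles agree at every SNP. `mismatches` and `group_strains` are hypothetical helper names.

```python
# Allele strings transcribed from the table; "*" is assumed to mean "no call".
HAPLOTYPES = {
    "129S1": "TA*CCC*CGGTACGAGGG",
    "AKR": "AGTTTAATGGTACGAGGG",
    "A_J": "AGTTTAATGGTACGAGGG",
    "BALB_c": "TA*CCCGCGGTACGAGGG",
    "C3H": "AGTTTAATGGTACGAGGG",
    "C57B6": "AGTTTAATCTAGTACCCA",
    "CBA": "AGTTTAATCTAGTACCCA",
    "DBA2": "AGTTTAATCTAGTACCCA",
    "FVB": "AGTTTAATCTAGTACCCA",
    "I": "AGTTTAATGGTACGAGGG",
    "NOD": "AGTTTAATGGTACGAGGG",
    "NZB": "*ACCCC*CCT*GTACCCA",
    "SJL": "AGTTTAATCTAGTACCCA",
    "SWR": "AGTTTAATCTAGTACCCA",
}
MISSING = "*"


def mismatches(a: str, b: str) -> int:
    """Count SNPs where both strains are called and their alleles differ."""
    return sum(1 for x, y in zip(a, b) if MISSING not in (x, y) and x != y)


def group_strains(haps: dict[str, str]) -> list[list[str]]:
    """Greedily put each strain in the first group whose founding member it matches."""
    groups: list[list[str]] = []
    for strain, alleles in haps.items():
        for group in groups:
            if mismatches(alleles, haps[group[0]]) == 0:
                group.append(strain)
                break
        else:  # no compatible group found: start a new one
            groups.append([strain])
    return groups


for group in group_strains(HAPLOTYPES):
    print(group)
```

Run as written, it prints four groups: 129S1 with BALB_c; AKR, A_J, C3H, I and NOD; C57B6 with CBA, DBA2, FVB, SJL and SWR; and NZB alone. Because an uncalled position matches either allele, the greedy grouping depends on input order. If the columns are in chromosomal order, NZB matches 129S1 at every called SNP in the first eight columns and C57B6 at every called SNP in the remaining ten, consistent with a switch in ancestry inside this window.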
Regional comparison: 129/S1 vs. C57BL/6J
Ancestrally divergent region (red):
GGGAGCATGGC*CCC*AT
ACCCATGATCTAATTTGA
Ancestrally identical region (blue):
GGAACCGTGATCCACAAGAC
GGAACCGT*ATCCACAAGAC
QTL identified in two crosses: 129 × B6 and DBA × B6
QTL identified in three crosses
- QTL identified by three crosses: 1×2, 1×3 and 1×4
- Across 25 Mb, only 4 segments (each 300-700 kb) are ancestrally consistent with the QTL mapping data
- In total, 1.9 of the 25 Mb is identified as most likely to contain the responsible mutation
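The cross-based filtering step can be sketched as follows, assuming each segment carries an ancestral haplotype label per strain. A cross can only detect a QTL in segments where its two parents differ in ancestry, so segments where the parents of any QTL-positive cross are ancestrally identical are discarded and the remaining lengths are summed. Strain names "1" to "4" follow the cross notation above; the segment coordinates, labels and function names are hypothetical.

```python
from typing import NamedTuple


class Segment(NamedTuple):
    start_kb: int
    end_kb: int
    haplotype: dict[str, str]  # strain -> ancestral haplotype label


def consistent(seg: Segment, crosses: list[tuple[str, str]]) -> bool:
    """True if the parents of every QTL-positive cross differ in ancestry here."""
    return all(seg.haplotype[a] != seg.haplotype[b] for a, b in crosses)


def candidate_segments(segments: list[Segment],
                       crosses: list[tuple[str, str]]) -> tuple[list[Segment], int]:
    """Return the segments consistent with all crosses and their total length in kb."""
    kept = [s for s in segments if consistent(s, crosses)]
    return kept, sum(s.end_kb - s.start_kb for s in kept)


CROSSES = [("1", "2"), ("1", "3"), ("1", "4")]

# Hypothetical haplotype assignments for three adjacent segments.
SEGMENTS = [
    Segment(0, 400, {"1": "A", "2": "B", "3": "B", "4": "C"}),      # 1 differs from 2, 3, 4
    Segment(400, 1500, {"1": "A", "2": "A", "3": "B", "4": "B"}),   # 1 and 2 identical: excluded
    Segment(1500, 2100, {"1": "B", "2": "A", "3": "A", "4": "A"}),  # 1 differs from 2, 3, 4
]

kept, total_kb = candidate_segments(SEGMENTS, CROSSES)
print([(s.start_kb, s.end_kb) for s in kept], total_kb)  # [(0, 400), (1500, 2100)] 1000
```

The test only requires strain 1 to differ from each of the other three; it does not require strains 2, 3 and 4 to share a haplotype with one another.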
A/J vs. C3H: not a software bug. The two strains are identical at every SNP typed across the 25 Mb interval.
Colleagues
Claire Wade, Andrew Kirby
Whitehead Genome Center: Kerstin Lindblad-Toh, EJ Kulbokas, Elinor Karlsson, Mike Zody, Eric Lander
Mouse Genome Sequencing Consortium