A multi-strain, high-resolution mouse haplotype map reveals three distinctive genetic signatures
Laboratory of Population Genetics
Motivations
  An accurate, high-resolution haplotype map of the mouse genome enables prioritization of QTL candidate genes
  Different studies have reported different haplotype block structures
    >10 Mb block size in the GNF study (Wiltshire et al., PNAS 2003)
    1.0-2.0 Mb block size in the WI study (Wade et al., Nature 2002)
    100-150 kb block size in an 8 Mb region of chr 19 (Park et al., Genome Research 2003)
  Analysis of a 10 Mb region on chromosome 7 using the Celera mouse SNPs reveals a different genetic variation pattern
  Celera mouse chromosome 16 SNP data are publicly available
Objectives
  Develop an integrated, high-resolution, multi-strain mouse haplotype map
  Compare the haplotype structure derived from high-density SNPs with that derived from low-density markers
  Perform experimental validation in regions of conflict and in regions of interest across 20 inbred strains
  Analyze biological factors that have contributed to the formation of mouse genetic variation patterns
Data Sources
  Chromosome 16 reference sequence: MGSCv3 (NCBI build 30, Feb. 2003)
  SNP Data
Construction of Multi-Strain Haplotype Blocks with High Density SNP Markers
  Method
    Greedy algorithm that starts with two haplotypes per block
    Seed: a minimum of two adjacent SNPs with no ambiguity in haplotype assignment
    A singleton SNP that breaks the two-haplotype configuration does not affect block extension
  Results
    2,083 blocks
    65,068 (95%) Celera SNPs in 5 laboratory inbred strains
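The method above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the implementation used for the map: alleles are coded 0/1 with None for missing calls, a "singleton" is taken to mean a lone incompatible SNP that is immediately followed by a compatible one, and the names partition, compatible, and build_blocks are hypothetical.

```python
from typing import List, Optional, Tuple

Allele = Optional[int]  # 0 or 1; None marks a missing call (assumed coding)


def partition(snp: List[Allele]) -> Optional[Tuple[int, ...]]:
    """Canonical strain partition of a SNP, or None if any call is missing."""
    if any(a is None for a in snp):
        return None
    # Relabel so the first strain always carries label 0.
    return tuple(a ^ snp[0] for a in snp)


def compatible(snp: List[Allele], pattern: Tuple[int, ...]) -> bool:
    """True if the known calls match the block pattern directly or with alleles flipped."""
    known = [(a, p) for a, p in zip(snp, pattern) if a is not None]
    same = all(a == p for a, p in known)
    flipped = all(a == 1 - p for a, p in known)
    return same or flipped


def build_blocks(snps: List[List[Allele]]) -> List[Tuple[int, int]]:
    """Greedy left-to-right scan; returns (first_snp_index, last_snp_index) per block."""
    blocks = []
    i, n = 0, len(snps)
    while i < n - 1:
        seed = partition(snps[i])
        # Seed: two adjacent, unambiguous SNPs sharing one two-haplotype pattern.
        if seed is None or len(set(seed)) < 2 or partition(snps[i + 1]) != seed:
            i += 1
            continue
        end, j = i + 1, i + 2
        while j < n:
            if compatible(snps[j], seed):
                end, j = j, j + 1
            elif j + 1 < n and compatible(snps[j + 1], seed):
                # Assumed singleton rule: skip one breaking SNP and keep extending.
                end, j = j + 1, j + 2
            else:
                break
        blocks.append((i, end))
        i = end + 1
    return blocks


example = [
    [0, 0, 1, 1, 1],
    [0, 0, 1, 1, 1],
    [0, 1, 0, 0, 0],     # singleton that breaks the pattern
    [0, 0, 1, None, 1],  # missing call, still compatible
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
    [0, 1, 1, 0, 0],
]
print(build_blocks(example))  # [(0, 3), (4, 6)]
```

Rows are SNPs and columns are strains. The first block absorbs the breaking SNP at index 2 because the next SNP restores the seed pattern.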
Distribution of Haplotype Block Size
Blocks with Different Size Have Similar SNP Density Distribution
A 2.4-Mb Haplotype Block with Varying SNP Density
  [Figure: allele tracks for DBA/2J, A/J, 129X1/SvJ, 129S1/SvImJ, and C57BL/6J; x-axis ticks every 400,000 bp up to 2,400,000; SNP density track in #SNP/10kb with bins 1, 2-5, 6-10, 11-20, >20; legend: SNP, Experimental Validation, B6 Allele, Non-B6 Allele]
  374 SNPs over 2.4 Mb; average density = 0.156/kb
  153 of these SNPs were in hotspots (red and orange)
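The density figures on this slide follow from simple arithmetic, reproduced here as a quick check. The helper name snp_density_per_kb is hypothetical; the inputs are the numbers quoted on the slide.

```python
def snp_density_per_kb(n_snps: int, region_bp: int) -> float:
    """Average number of SNPs per kilobase over a region."""
    return n_snps / (region_bp / 1_000)


density = snp_density_per_kb(374, 2_400_000)
hotspot_fraction = 153 / 374

print(f"{density:.3f} SNPs/kb")                       # 0.156 SNPs/kb
print(f"{density * 10:.2f} SNPs/10kb")                # 1.56 SNPs/10kb
print(f"{hotspot_fraction:.1%} of SNPs in hotspots")  # 40.9% of SNPs in hotspots
```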
A 2.4-Mb Region with High SNP Density but Heterogeneous Variation Pattern (Erosion)
  Gene in region: Ataxin 2 binding protein 1 (nucleic acid binding, RNA binding)
  [Figure: allele tracks for 129S1, DBA/2J, A/J, 129X1, and B6; SNP density track in #SNP/10kb with bins 1, 2-5, 6-10, 11-20, >20; legend: B6 Allele, Non-B6 Allele, Missing Data]
Details of Haplotype Erosion Across 160 kb
  Location: 5,721,639-5,878,633 bp on chr16
  Blocks
    179 SNPs in 14 blocks with the major pattern
    116 SNPs in 19 blocks with the other patterns
    49 singleton SNPs
  [Figure: allele tracks for 129S1/SvImJ, DBA/2J, A/J, 129X1/SvJ, and C57BL/6J, with a SNP density track]
Other Heterogeneous Haplotype Patterns
  2) Segmentation
  [Figure: allele tracks for 129S1, DBA/2J, A/J, 129X1, and C57BL/6J, with a SNP density track]
Other Heterogeneous Haplotype Patterns
  3) Segmentation with Erosion
  4) Random
Three Major Variation Patterns
  SNP Deserts: >1 Mb with <0.5 SNP per 10 kb
  Large Blocks: >300 kb "melded" haplotype blocks with consistent variation patterns
  Block Breakers: regions with heterogeneous variation patterns
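The SNP desert criterion lends itself to a simple sliding-window scan. The sketch below is one possible reading of the definition, not the procedure used for the map: it slides a 1 Mb window in 10 kb steps, flags windows whose average density is below 0.5 SNP per 10 kb (fewer than 50 SNPs per Mb), and merges overlapping flagged windows. The function name find_snp_deserts, the step size, and the merging rule are all assumptions.

```python
from bisect import bisect_left
from typing import List, Tuple


def find_snp_deserts(
    positions: List[int],        # sorted SNP coordinates in bp
    chrom_length: int,
    min_len: int = 1_000_000,    # window length: 1 Mb
    max_density: float = 0.5,    # SNPs per `unit`
    unit: int = 10_000,
    step: int = 10_000,          # assumed step between window starts
) -> List[Tuple[int, int]]:
    """Return merged half-open [start, end) intervals made of low-density windows."""
    max_snps = max_density * min_len / unit  # 50 SNPs per 1 Mb window
    deserts: List[List[int]] = []
    start = 0
    while start + min_len <= chrom_length:
        end = start + min_len
        # Count SNPs with start <= position < end.
        count = bisect_left(positions, end) - bisect_left(positions, start)
        if count < max_snps:
            if deserts and start <= deserts[-1][1]:
                deserts[-1][1] = end          # extend the current desert
            else:
                deserts.append([start, end])  # open a new desert
        start += step
    return [(s, e) for s, e in deserts]


# Toy chromosome: 1 SNP/kb everywhere except an empty stretch from 2.0 to 3.5 Mb.
toy = list(range(0, 2_000_000, 1_000)) + list(range(3_500_000, 5_000_000, 1_000))
print(find_snp_deserts(toy, 5_000_000))  # [(1960000, 3540000)]
```

Because a window only needs an average density below the threshold, the reported interval extends slightly into the dense flanks (40 SNPs on each side in the toy example); a stricter boundary rule would trim those edges.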
Predictive Power of Haplotype Structures
  Test the ability to use the haplotype structure from one study to predict allelic variation in another study
  Our haplotype blocks
    98% accuracy on WI B6/129S1 SNPs that do not overlap with Celera SNPs
    92% accuracy on GNF B6/129S1/AJ/DBA haplotypes
  WI B6/129S1 haplotype blocks
    74% accuracy on Celera B6/129S1 genotypes
    85% accuracy on GNF B6/129S1 genotypes
  80% of GNF markers are non-polymorphic across the inbred strains used in Celera and WI shotgun sequencing
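The slides do not state how prediction accuracy was scored, so the following is only one plausible scoring scheme, written as a sketch. It assumes two-haplotype blocks with strain labels 0/1, scores only SNPs that fall inside a block and are typed in at least two of the block's strains, and counts a SNP as correctly predicted when its allele partition matches the block partition (directly or with alleles flipped). All names here are hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Block = Tuple[int, int, Dict[str, int]]  # (start_bp, end_bp, strain -> haplotype label 0/1)
Snp = Tuple[int, Dict[str, int]]         # (position_bp, strain -> allele 0/1)


def block_predicts_snp(pattern: Dict[str, int], alleles: Dict[str, int]) -> Optional[bool]:
    """None if the SNP cannot be scored; otherwise whether the block pattern explains it."""
    shared = [s for s in alleles if s in pattern]
    if len(shared) < 2:
        return None
    same = all(alleles[s] == pattern[s] for s in shared)
    flipped = all(alleles[s] == 1 - pattern[s] for s in shared)
    return same or flipped


def prediction_accuracy(blocks: List[Block], snps: List[Snp]) -> float:
    """Fraction of scorable SNPs whose strain partition agrees with their block."""
    scored = correct = 0
    for pos, alleles in snps:
        for start, end, pattern in blocks:
            if start <= pos <= end:
                result = block_predicts_snp(pattern, alleles)
                if result is not None:
                    scored += 1
                    correct += int(result)
                break  # assume blocks do not overlap
    return correct / scored if scored else float("nan")


blocks = [(100, 500, {"B6": 0, "129S1": 1, "AJ": 1, "DBA": 1})]
snps = [
    (150, {"B6": 0, "129S1": 1}),  # matches the block partition
    (200, {"B6": 1, "129S1": 0}),  # matches with alleles flipped
    (300, {"B6": 0, "129S1": 0}),  # contradicts the block partition
    (900, {"B6": 0, "129S1": 1}),  # outside every block, not scored
]
accuracy = prediction_accuracy(blocks, snps)
print(f"{accuracy:.2f}")  # 0.67
```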
SNP Deserts in Chromosome 16
  Six SNP deserts of >1 Mb in the five inbred strains used for Celera shotgun sequencing
  All six SNP deserts overlap with WI SNP deserts conserved across all WI strains
    0.21% of WI B6/SvJ SNPs fall in our SNP deserts
    0.97% of all WI SNPs fall in our SNP deserts
  Five of the six deserts have at least one end within a large haplotype block
  SNP deserts are not genetically homogeneous
    There are STRP polymorphisms
    There are indel polymorphisms
Validation of a SNP Desert
  Genotype rows as shown on the slide:
    0000100000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 001000 11111 01
    0000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000000 000000 00000 00
    1110100011111111111111111111111111111111111111111111111111111111111111111111111111111111111111110111111111110111111111101 000111 N0101 11
    111111111111111101111111NNNNNN111111111111111111111111101111111011111NN011111N1000111111111111111111111111111111111111110 110001 11
  Row labels, in the order listed on the slide: Other Lab inbred; B6, AKRJ; Skive; Czech
  5 STRPs and 1 SNP in a 15 kb SNP desert in all laboratory inbred strains
  Among the laboratory inbred strains, the STRPs and the 1 SNP have the same variation pattern as the neighboring regions with high SNP density
  An additional 120 SNPs were discovered between the laboratory inbred strains and the feral inbred strains
A Gene-Coding Region with Varying SNP Density
  [Figure: WI SNPs and Celera SNPs along the gene; labels: Mis-sense??, silent, e3, e12, down, UTR3, e4]
  The mRNA sequence is an MGC clone from mammary tissue metastasized to lung
  The 10 kb region is included in a 77 kb haplotype block with 44 SNPs
  Variations in the mRNA sequence do not overlap with WI and Celera SNPs
  Are there >=2 haplotypes in the region?
Results of Experimental Validation
  >down
  hap1 011110 129s1;129x1;AJ;BALB;C3HHe;DBA
  hap2 000000 AKRJ;C57BL
  hap3 110001 Czech;Skive
  >UTR3 /num_SNP=25 /num_strain=10 /num_hap=4
  hap1 1100011000000100000000010 129s1;129x1;AJ;BALB;C3HHe;DBA2J;
  hap4 110001010011100N110001000 AKRJ;
  hap3 1111110011100111011110101 Czech;Skive;
  hap2 0000000000000000000000000 C57BL;
  >NM_145481_e12 /num_SNP=7 /num_strain=10 /num_hap=4
  hap1 0000100 129s1;129x1;AJ;BALB;
  hap2 0010001 AKRJ;
  hap3 0000000 C3HHe;C57BL;DBA2J;
  hap4 1101011 Czech;Skive;
  >e4
  hap1 0 Others
  hap2 1 Czech;Skive
  Mis-sense SNP does not validate
  Silent SNP validates
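Records like the ones above can be produced by grouping strains that share an identical allele string within an amplicon. The sketch below illustrates that grouping using the NM_145481_e12 calls from this slide; the function name group_haplotypes and the ordering rule (largest group first) are assumptions, so the hapN numbering will not necessarily match the slide, and missing calls (N) are treated as an ordinary character rather than resolved.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_haplotypes(genotypes: Dict[str, str]) -> List[Tuple[str, str, List[str]]]:
    """Group strains by identical allele string; returns (name, alleles, strains)."""
    groups: Dict[str, List[str]] = defaultdict(list)
    for strain, alleles in genotypes.items():
        groups[alleles].append(strain)
    # Assumed ordering: most strains first, ties broken by allele string.
    ranked = sorted(groups.items(), key=lambda kv: (-len(kv[1]), kv[0]))
    return [
        (f"hap{i}", alleles, sorted(strains))
        for i, (alleles, strains) in enumerate(ranked, start=1)
    ]


# Per-strain calls expanded from the NM_145481_e12 record on the slide.
e12 = {
    "129s1": "0000100", "129x1": "0000100", "AJ": "0000100", "BALB": "0000100",
    "AKRJ": "0010001",
    "C3HHe": "0000000", "C57BL": "0000000", "DBA2J": "0000000",
    "Czech": "1101011", "Skive": "1101011",
}
for name, alleles, strains in group_haplotypes(e12):
    print(name, alleles, ";".join(strains))
```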
Mouse cSNPs
  Synonymous: 185
  Non-synonymous: 100
Haplotype Diversity in 43 Target Regions Assayed by 94 Amplicons
Conclusions
  We have compiled an accurate, multi-strain, high-resolution haplotype map for mouse chromosome 16
  We have discovered three distinctive genetic variation patterns in laboratory inbred mice: SNP deserts, large blocks, and block breakers
  Large haplotype blocks may consist of regions with varying SNP density
  Selection in inbreeding may have an effect on SNP distribution in protein coding regions as well as on SNP rate in gene coding regions
  Our method is scalable for whole-genome analysis
Acknowledgement
  Laboratory of Population Genetics: Ken Buetow, Kent Hunter, Michael Gandolph, Bill Rowe, Michael Edmonson, Jenny Kelly
  University of Wisconsin: Rob Williams