Kerstin Lindblad-Toh, Whitehead/MIT Center for Genome Research; Michael Kamal, Broad/MIT Center for Genome Research
A First Look at the Mouse Genome Preliminary mouse genome analysis Future directions (briefly) Article available online:
Mouse Genome Sequencing Consortium: Whitehead Institute; Washington University, St. Louis; Sanger Institute; EBI. Strain: C57BL/6J, female. Strategy: draft BAC map; 6.5x shotgun coverage; genome assembly; BAC-based coverage and finishing; finished sequence.
Mouse Genome Sequencing Consortium
- 41 M reads
- 2 and 4 kb plasmids (90%)
- 10 kb plasmids (5%)
- 40 kb fosmids (5%)
- 155 kb and 200 kb BACs (RPCI-23 & 24)
- WI: 54% of reads
Assembly: 88 ultracontigs, covering 96% of the genome. Contig: 25 kb; Super: 17 Mb; Ultra: 50 Mb.
Regions of conserved synteny: ~95% of genome Extremely high conservation: 560,000 anchors
Regions of conserved synteny: ~95% of genome
Genome size: Mouse < Human (2.5 vs 2.9 Gb). [Figure: expansion ratio (M/H) for autosomes and chromosome X]
Genome size: Mouse < Human (2.5 vs 2.9 Gb). Total transposon-derived repeat: human 46%, mouse 37%. Less transposon activity in the mouse lineage? No! More transposon activity, and more deletion, in mouse. [Figure: ancestral vs. lineage-specific repeat content in human and mouse; axis marks at 100 Mb and 400 Mb]
Transposons: Accumulate in same regions
GC-content distribution: human has larger tails than mouse
Protein-coding gene count falling (<30,000); ~22,500 evidence-based gene predictions. Mouse-human comparison: ~99% have homologs (maybe 100%); ~96% have a homolog in a region of conserved synteny; ~80% have a 1:1 ortholog.
Gene family expansions: reproduction, immunity. 25 mouse-specific gene family cluster expansions: 14 reproduction; 5 host defense, immunity.
Large conserved elements (>100 bp): coding and non-coding (example: PPAR). [Figure: exons vs. non-exons; values 75% and 90%]
How much of the genome is under selection? Extremely high conservation: 560,000 anchors Less than half are coding exons (~220,000)
Nucleotide-level alignment: ~40% of genomes. Why so much? Given the neutral substitution rate between mouse and human, the vast majority of truly orthologous sequence can be aligned! Alignable does NOT imply functional.
Nucleotide-level alignment: ~40% of genomes. Why so little? Suppose: ancestral genome ~2.9 Gb; new transposons are offset by deletion. Ancestral genome remaining: in human = 73%; in mouse = 57%; in both = 73% x 57% = 42%.
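The arithmetic on this slide can be checked with a short sketch. It assumes, as the slide's multiplication implies, that loss of ancestral sequence is independent in the two lineages; the function and constant names are illustrative and not from the talk.

```python
# Fractions of the ancestral genome still present today (values from the slide).
RETAINED_HUMAN = 0.73
RETAINED_MOUSE = 0.57
ANCESTRAL_GB = 2.9  # supposed ancestral genome size in Gb (slide's assumption)


def retained_in_both(p_human, p_mouse):
    """Expected fraction of ancestral sequence surviving in both lineages.

    Assumes deletions in the two lineages are independent, so the
    probabilities multiply.
    """
    return p_human * p_mouse


shared = retained_in_both(RETAINED_HUMAN, RETAINED_MOUSE)
print(f"ancestral sequence retained in both genomes: {shared:.1%}")
print(f"equivalent to about {shared * ANCESTRAL_GB:.2f} Gb")
```

Under these assumptions the shared fraction comes out close to the ~40% of the genomes that align at the nucleotide level, which is the point of the slide.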
Neutral substitution rate: ~0.46 per site. Mouse 2x faster than human over 75 Myr. Substitutions in ancestral repeats: roughly normal distribution.
Neutral substitution rate: ~0.46 per site. [Figure panels: introns, coding exons, 5’-UTR, 3’-UTR, upstream, downstream, CpG islands, known regulatory]
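A rough sketch of what these numbers imply per lineage. It assumes the ~0.46 substitutions per site is the total divergence accumulated along both branches since the mouse-human split, and that "2x faster" means the mouse branch carries twice the substitutions of the human branch. Both readings are interpretations of the slide, not statements from it.

```python
TOTAL_SUBS_PER_SITE = 0.46   # neutral mouse-human divergence (from the slide)
MOUSE_TO_HUMAN_RATIO = 2.0   # mouse lineage evolves ~2x faster (from the slide)
DIVERGENCE_YEARS = 75e6      # time since the split (from the slide)


def split_branches(total, ratio):
    """Split total divergence into (mouse, human) branch lengths.

    Assumes total = mouse + human and mouse = ratio * human.
    """
    human = total / (1.0 + ratio)
    mouse = total - human
    return mouse, human


mouse_branch, human_branch = split_branches(TOTAL_SUBS_PER_SITE, MOUSE_TO_HUMAN_RATIO)
mouse_rate = mouse_branch / DIVERGENCE_YEARS  # substitutions per site per year
human_rate = human_branch / DIVERGENCE_YEARS

print(f"mouse branch: {mouse_branch:.3f} subs/site ({mouse_rate:.2e} per year)")
print(f"human branch: {human_branch:.3f} subs/site ({human_rate:.2e} per year)")
```

With these inputs the mouse branch is roughly 0.31 and the human branch roughly 0.15 substitutions per site.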
Proportion of genome under selection: ~5%. Neutral sequence: ancestral repeats. Whole genome: alignable portion. The excess conservation of the whole genome over the neutral sequence gives the proportion under selection. Coding exons: only ~1.5%. What is the rest? UTRs, regulatory elements, RNA genes, structural elements?
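One simple way to read "excess conservation" is as a two-component mixture: genome-wide conservation scores are a blend of a neutral distribution (measured in ancestral repeats) and a selected component. The sketch below is an illustrative assumption, not the consortium's actual procedure. The histograms are made-up toy counts chosen only to exercise the function, and `selected_fraction` is a hypothetical name.

```python
def selected_fraction(genome_hist, neutral_hist):
    """Estimate the fraction of windows not explained by the neutral model.

    Both inputs are counts per conservation-score bin (same bins, low to high).
    The neutral share p0 is taken as the largest value for which
    p0 * neutral never exceeds the genome-wide density in any bin;
    the remainder, 1 - p0, is attributed to selection.
    """
    g_total = sum(genome_hist)
    n_total = sum(neutral_hist)
    g = [count / g_total for count in genome_hist]
    n = [count / n_total for count in neutral_hist]
    p0 = min(gi / ni for gi, ni in zip(g, n) if ni > 0)
    return 1.0 - min(p0, 1.0)


# Toy counts only (NOT real data): neutral windows never reach the top two bins.
toy_neutral = [10, 30, 40, 20, 0, 0]
toy_genome = [95, 285, 380, 190, 30, 20]

fraction = selected_fraction(toy_genome, toy_neutral)
print(f"estimated fraction under selection: {fraction:.1%}")
```

This estimator is sensitive to noise in sparsely populated bins, so a real analysis would need smoothing or a fitted model.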
TNFα enhancer [Figure tracks: Conserved, RefSeq, Genscan]
Human  ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC
       ||||||||||||| ||||| |||||| ||||||||||||||||||||||||  ||||||||||     |||||||||||
Mouse  ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC
Annotated sites: NFat/Ets, CRE, k3-Nfat, Ets, Nfat, AP1, SP1
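The degree of conservation in this alignment can be quantified directly. The sketch below counts identical columns in the two aligned strings copied from the slide; the helper name `alignment_identity` is illustrative.

```python
HUMAN = "ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC"
MOUSE = "ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC"


def alignment_identity(a, b, gap="-"):
    """Return (matches, gap_columns, total_columns) for two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = 0
    gap_columns = 0
    for x, y in zip(a, b):
        if x == gap or y == gap:
            gap_columns += 1   # a column with a gap is never a match
        elif x == y:
            matches += 1
    return matches, gap_columns, len(a)


matches, gap_columns, columns = alignment_identity(HUMAN, MOUSE)
ungapped = columns - gap_columns
print(f"identity over all columns:      {matches}/{columns} = {matches / columns:.1%}")
print(f"identity over ungapped columns: {matches}/{ungapped} = {matches / ungapped:.1%}")
```

Reporting identity both with and without gap columns avoids ambiguity, since the two conventions give different percentages for the same alignment.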
Genome evolving at non-uniform rate
Mouse Genome summary: 2.5 Gb in size (smaller than human, due to deletion). More lineage-specific repeats. ~99% of genes have homologs in human. Evolves 2x faster than human. 95% of genome in blocks of conserved synteny. 5% under selection (1.5% coding, the rest is unknown). Large haplotype blocks of domesticus or musculus ancestry in inbred strains.
Implications of mouse sequence: cloning of classical mutations; new mutagenesis programs; identification of quantitative trait loci (QTLs); engineering knock-outs, knock-ins, BAC transgenics; modeling human disease; understanding gene regulation.
Future directions: finish mouse genome; sequence more mammals (dog, chimp, marsupial); “genomic accounting”; identify regulatory elements; mouse haplotype map.
Genomic alignments for multiple species: sequence more mammals (dog, chimp, marsupial); “genomic accounting”; identify regulatory elements; mouse haplotype map …. integrated with gene expression analysis.
Acknowledgements
Whitehead Institute: Kerstin Lindblad-Toh, Michael C. Zody, David Jaffe, Claire Wade, Mark Daly, Jade Vinson, Elinor Karlsson, EJ Kulbokas, Nicole Stange-Thomann, Rob Nicol, Tim Holzer, Toby Bloom, Jill Mesirov, Chad Nusbaum, Bruce Birren, Eric Lander
Washington University: John McPherson, Bob Waterston
Sanger Institute: Jim Mullikin, Jane Rogers
Analysis Group: David Haussler, Jim Kent, Arian Smit, Chris Ponting, Webb Miller, Ross Hardison, Laura Elnitski, Inna Dubchak, Lior Pachter, Sean Eddy, Michael Brent, Roderic Guigo, Wayne Frankel, Carol Bult
Ensembl: Ewan Birney
Mouse Liaison Group; University of Oklahoma; Albert Einstein/Harvard; NIH; ISC; TIGR; CHORI
Mouse Genome: SNPs: