Kerstin Lindblad-Toh, Whitehead/MIT Center for Genome Research; Michael Kamal, Broad/MIT Center for Genome Research.

Presentation transcript:

A First Look at the Mouse Genome
- Preliminary mouse genome analysis
- Future directions (briefly)
Article available online:

Mouse Genome Sequencing Consortium (strain C57BL/6J, female)
- Draft BAC map
- 6.5x shotgun coverage
- Genome assembly
- Finished sequence, BAC-based coverage
- Finishing
Centers: Whitehead Institute, Washington University St. Louis, Sanger Institute, EBI

Mouse Genome Sequencing Consortium
- 41 M reads
- 2 and 4 kb plasmids (90%)
- 10 kb plasmids (5%)
- 40 kb fosmids (5%)
- 155 kb and 200 kb BACs (RPCI-23 & 24)
- Whitehead Institute (WI): 54% of reads

Assembly: 88 ultracontigs, covering 96% of the genome
- Contig: 25 kb
- Supercontig: 17 Mb
- Ultracontig: 50 Mb

Regions of conserved synteny: ~95% of genome
Extremely high conservation: 560,000 anchors

Regions of conserved synteny: ~95% of genome

Genome size: Mouse < Human (2.5 vs 2.9 Gb)
Figure: expansion ratio (M/H) for autosomes and chromosome X

Genome size: Mouse < Human (2.5 vs 2.9 Gb)
- Total transposon-derived repeat: 46% (human), 37% (mouse)
- Less transposon activity in the mouse lineage? No! More transposon activity, and more deletion, in mouse
Figure: ancestral and lineage-specific repeat content in human and mouse (scale marks at 100 Mb and 400 Mb)
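As a rough back-of-the-envelope check on these numbers, the following Python sketch converts the repeat percentages into absolute amounts of sequence. It assumes, beyond what the slide states, that 46% is the human figure and 37% the mouse figure, and that each percentage is a fraction of that species' total genome size. All names are illustrative.

```python
# Genome sizes in Gb, as quoted on the slide.
GENOME_SIZE_GB = {"human": 2.9, "mouse": 2.5}

# Transposon-derived fraction of each genome.
# Assumption: 46% is the human figure and 37% the mouse figure.
REPEAT_FRACTION = {"human": 0.46, "mouse": 0.37}


def repeat_content_gb(species: str) -> float:
    """Absolute amount of transposon-derived sequence, in Gb."""
    return GENOME_SIZE_GB[species] * REPEAT_FRACTION[species]


human_repeat = repeat_content_gb("human")
mouse_repeat = repeat_content_gb("mouse")
size_gap = GENOME_SIZE_GB["human"] - GENOME_SIZE_GB["mouse"]

print(f"human repeat content: {human_repeat:.2f} Gb")
print(f"mouse repeat content: {mouse_repeat:.2f} Gb")
print(f"repeat difference:    {human_repeat - mouse_repeat:.2f} Gb")
print(f"genome size gap:      {size_gap:.2f} Gb")
```

Under these assumptions the difference in repeat content is of the same order as the difference in genome size, which fits the slide's point that the smaller mouse genome reflects more deletion rather than less transposon activity.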

Transposons: Accumulate in same regions

GC content: human has larger tails than mouse

Protein-coding gene count falling (<30,000)
Mouse-human comparison:
- ~99% have homologs (maybe 100%)
- ~96% have a homolog in a region of conserved synteny
- ~80% have a 1:1 ortholog
- ~22,500 evidence-based gene predictions

Gene family expansions: reproduction, immunity
25 mouse-specific gene family cluster expansions:
- 14 reproduction
- 5 host defense, immunity

Large conserved elements (>100 bp): coding and non-coding
Figure: exons vs. non-exons (marks at 75% and 90%); example locus: PPAR

How much of the genome is under selection?
- Extremely high conservation: 560,000 anchors
- Less than half are coding exons (~220,000)

Nucleotide-level alignment: ~40% of genomes
Why so much?
- Given the neutral substitution rate between mouse and human, the vast majority of truly orthologous sequence can be aligned!
- Alignable does NOT imply functional

Nucleotide-level alignment: ~40% of genomes
Why so little?
Suppose:
- Ancestral genome ~2.9 Gb
- New transposons are offset by deletion
Ancestral genome remaining:
- in human = 73%
- in mouse = 57%
- in both = 73% x 57% = 42%
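The retention arithmetic on this slide can be written out as a short Python sketch. It assumes, as the multiplication on the slide implies, that loss of ancestral sequence is independent in the two lineages. The function and variable names are illustrative, not part of the original analysis.

```python
# Fractions of the ancestral genome still present in each lineage (from the slide).
RETAINED_HUMAN = 0.73
RETAINED_MOUSE = 0.57

# Assumed ancestral genome size in Gb (from the slide).
ANCESTRAL_SIZE_GB = 2.9


def shared_fraction(p_a: float, p_b: float) -> float:
    """Fraction of ancestral sequence expected to survive in both lineages,
    assuming loss happens independently in each."""
    return p_a * p_b


shared = shared_fraction(RETAINED_HUMAN, RETAINED_MOUSE)
shared_gb = shared * ANCESTRAL_SIZE_GB

print(f"ancestral sequence retained in both: {shared:.0%}")
print(f"equivalent to about {shared_gb:.1f} Gb of the ancestral genome")
```

Under the independence assumption, about 42% of the ancestral genome is expected to be shared, which is close to the ~40% of the genomes that aligns at the nucleotide level.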

Neutral substitution rate: ~0.46 per site
- Mouse evolving 2x faster than human over 75 Myr
- Substitutions in ancestral repeats: roughly normal distribution
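To make these figures concrete, here is a minimal Python sketch that splits the total divergence between the two lineages. It assumes that the ~0.46 substitutions per site is the sum of the mouse and human branches, that the mouse branch is exactly twice the human branch, and that both branches span the full 75 Myr. These are simplifications for illustration, not the consortium's actual estimation procedure.

```python
TOTAL_DIVERGENCE = 0.46        # substitutions per site, mouse vs human (from the slide)
MOUSE_TO_HUMAN_RATIO = 2.0     # "mouse 2x faster" (from the slide)
DIVERGENCE_TIME_YEARS = 75e6   # 75 Myr (from the slide)


def split_divergence(total: float, ratio: float):
    """Split a pairwise divergence into two branch lengths whose
    lengths are in the given ratio (first branch : second branch)."""
    second = total / (1.0 + ratio)
    first = total - second
    return first, second


mouse_branch, human_branch = split_divergence(TOTAL_DIVERGENCE, MOUSE_TO_HUMAN_RATIO)

# Per-year rates, assuming each branch spans the full divergence time.
mouse_rate = mouse_branch / DIVERGENCE_TIME_YEARS
human_rate = human_branch / DIVERGENCE_TIME_YEARS

print(f"mouse branch: {mouse_branch:.3f} substitutions/site ({mouse_rate:.2e} per year)")
print(f"human branch: {human_branch:.3f} substitutions/site ({human_rate:.2e} per year)")
```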

Neutral substitution rate: ~0.46 per site
Feature classes compared: introns, coding exons, 5'-UTR, 3'-UTR, upstream, downstream, CpG islands, known regulatory

Proportion of genome under selection: ~5%
- Neutral sequence: ancestral repeats
- Whole genome: alignable portion
- Excess conservation
- Coding exons only ~1.5%
- What is the rest? UTRs, regulatory elements, RNA genes, structural elements?

TNFα enhancer
Tracks: Conserved, RefSeq, Genscan, Human, Mouse
ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC
||||||||||||| ||||| |||||| ||||||||||||||||||||||||  ||||||||||     |||||||||||
ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC
Annotated binding sites: NFat/Ets, CRE, k3-Nfat, Ets, Nfat, AP1, SP1
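The degree of conservation in this alignment can be quantified with a short Python sketch that counts matching columns in the two rows shown on the slide. The sequences are copied from the slide. Which row is human and which is mouse is not assumed, and counting identity over ungapped columns only is a choice made here for illustration.

```python
from typing import Tuple

# The two aligned rows from the slide; "-" marks an alignment gap.
# No assumption is made about which row is human and which is mouse.
ROW_A = "ACCGCTTCCTCCACATGAGATCATGGTTTTCTCCACCAAGGAAGTTTTCCGAGGGTTGAATGAGAGCTTTTCCCCGCCC"
ROW_B = "ACCGCTTCCTCCAGATGAGCTCATGGGTTTCTCCACCAAGGAAGTTTTCCGCTGGTTGAATGA--TTCTTTCCCCGCCC"


def alignment_stats(a: str, b: str) -> Tuple[int, int, int]:
    """Return (matches, mismatches, gap_columns) for two aligned rows."""
    if len(a) != len(b):
        raise ValueError("aligned rows must have equal length")
    matches = mismatches = gaps = 0
    for x, y in zip(a, b):
        if x == "-" or y == "-":
            gaps += 1          # column with a gap in either row
        elif x == y:
            matches += 1       # identical bases
        else:
            mismatches += 1    # substitution
    return matches, mismatches, gaps


matches, mismatches, gaps = alignment_stats(ROW_A, ROW_B)

# Identity is computed over ungapped columns only (a choice made for this sketch).
identity = matches / (matches + mismatches)

print(f"{matches} matches, {mismatches} mismatches, {gaps} gap columns")
print(f"identity over ungapped columns: {identity:.1%}")
```

Excluding gap columns from the denominator is only one convention. Counting gaps as mismatches would give a slightly lower figure.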

Genome evolving at non-uniform rate

Mouse Genome summary
- 2.5 Gb in size (smaller than human, due to deletion)
- More lineage-specific repeats
- 99% of genes with homologs in human
- Evolves 2x faster than human
- 95% of genome in blocks of conserved synteny
- 5% under selection (1.5% coding, the rest is unknown)
- Large haplotype blocks of domesticus or musculus ancestry in inbred strains

Implications of mouse sequence
- Cloning of classical mutations
- New mutagenesis programs
- Identification of Quantitative Trait Loci (QTLs)
- Engineering: knock-outs, knock-ins, BAC transgenics
- Modeling human disease
- Understanding gene regulation

Future directions
- Finish mouse genome
- Sequence more mammals (dog, chimp, marsupial)
- “Genomic accounting”
- Identify regulatory elements
- Mouse haplotype map

Genomic Alignments for Multiple Species
- Sequence more mammals (dog, chimp, marsupial)
- “Genomic accounting”
- Identify regulatory elements
- Mouse haplotype map
… integrated with gene expression analysis

Acknowledgements
- Whitehead Institute: Kerstin Lindblad-Toh, Michael C. Zody, David Jaffe, Claire Wade, Mark Daly, Jade Vinson, Elinor Karlsson, EJ Kulbokas, Nicole Stange-Thomann, Rob Nicol, Tim Holzer, Toby Bloom, Jill Mesirov, Chad Nusbaum, Bruce Birren, Eric Lander
- Washington University: John McPherson, Bob Waterston
- Sanger Institute: Jim Mullikin, Jane Rogers
- Analysis Group: David Haussler, Jim Kent, Arian Smit, Chris Ponting, Webb Miller, Ross Hardison, Laura Elnitski, Inna Dubchak, Lior Pachter, Sean Eddy, Michael Brent, Roderic Guigo, Wayne Frankel, Carol Bult
- Ensembl: Ewan Birney
- Mouse Liaison group
- University of Oklahoma
- Albert Einstein/Harvard
- NIH
- ISC
- TIGR
- CHORI

Mouse Genome: SNPs: