The Human Genome Project and 100 Million Years of Human Evolution David Haussler Center for Biomolecular Science and Engineering University of California, Santa Cruz
The human genome is a recipe for an entire body and brain. The genome is organized into 23 pairs of human chromosomes (1-22 and the pair X,Y or X,X). Each chromosome consists of DNA, a molecular string of A, C, G, and T (bases), 3 billion in all. All cells in the body have the same DNA that was in the original fertilized egg. Genes are DNA sequences that code for proteins (only about 1.5% of the human genome). The reference human genome sequence is a computer file of about 3 billion bases.
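As a rough back-of-the-envelope sketch of the numbers on this slide: with four possible letters, each base needs at least 2 bits, and about 1.5% of the 3 billion bases code for proteins. The constant names and the 2-bit packing are illustrative assumptions, not a description of how the reference file is actually stored.

```python
GENOME_BASES = 3_000_000_000   # approximate genome size quoted on the slide
CODING_PER_MILLE = 15          # about 1.5% of the genome codes for proteins
BITS_PER_BASE = 2              # assumption: A, C, G, T packed into 2 bits each

# Number of bases that fall in protein-coding sequence.
coding_bases = GENOME_BASES * CODING_PER_MILLE // 1000

# Size of the whole sequence if every base took exactly 2 bits.
packed_megabytes = GENOME_BASES * BITS_PER_BASE // 8 // 1_000_000

print(f"protein-coding bases: ~{coding_bases:,}")
print(f"2-bit packed size: ~{packed_megabytes} MB")
```

Under these assumptions, the coding portion is on the order of tens of millions of bases, and the whole sequence fits in well under a gigabyte.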
To what extent does a person’s genome define them?
On July 7, 2000, UCSC posted human genome on the web Outgoing UCSC internet traffic for year 2000
The UCSC Genome Browser: a new kind of web-based genome microscope Data from all over the world are fed into nightly updates of the UCSC browser database, analysis, and display Every day, more than 7,000 biomedical researchers use it to scan the genome at ever greater detail, dimension, and depth, making more than 300,000 web page requests Explore the genome at http://genome.ucsc.edu UCSC Genome Bioinformatics Group
Large-scale Operations in Genome Evolution Zack Sanborn
Example: evolutionary history of a mammalian chromosome. History of rat chromosome X, shown in roughly 200 segments; we also have versions with 1,000 and 50,000 segments. Jian Ma, Bernard Suh, Brian Raney
Morpheus: new genes by segmental duplication. Figure 2: Human vs. baboon duplication architecture. The organization of five LCR16 (low copy repeats on chromosome 16) segmental duplications is compared between human and baboon. In humans, the duplications range in size from 19 to 75 kb, are 97-99.5% identical, and are distributed in different permutations to 27 different map positions shown along the ideogram. In baboons (and other Old World monkeys), the corresponding segments are not duplicated and map to a single locus. The data suggest a dramatic expansion of segmental duplications during hominoid evolution on this chromosome. Note: The LCR16a duplication (red) contains a novel gene family (morpheus) that shows positive selection only in humans and the great apes. Figure panels: baboon chromosome, human chromosome. Evan Eichler
The demise of a gene. Jing Zhu, Zack Sanborn, Craig Lowe. This was the last protein with the Pfam domain Acyl_transf_3 (InterPro: IPR002656 Acyltransferase 3; GO: transferase activity, transferring groups other than amino-acyl groups). There are 925 proteins in InterPro with this domain, covering all branches of life. Mouse has 2. Human and chimp have none. Codon TGG for the amino acid tryptophan became a stop codon in this gene before the human-chimp ancestor, killing the gene. Proteins of this type (acyltransferase 3) appear in all branches of life; this was the last in the hominid genome.
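To illustrate how a single base change can turn the tryptophan codon TGG into a stop codon, here is a minimal sketch that enumerates every one-base neighbor of a codon and keeps those that are stop codons in the standard genetic code. The slide does not say which stop codon arose in this gene, and the function names are hypothetical.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}   # stop codons of the standard genetic code
BASES = "ACGT"

def single_base_neighbors(codon):
    """Return every codon that differs from `codon` at exactly one position."""
    neighbors = []
    for pos, original in enumerate(codon):
        for base in BASES:
            if base != original:
                neighbors.append(codon[:pos] + base + codon[pos + 1:])
    return neighbors

def stop_mutations(codon):
    """Return the one-base changes of `codon` that create a stop codon."""
    return sorted(n for n in single_base_neighbors(codon) if n in STOP_CODONS)

tgg_to_stop = stop_mutations("TGG")
print(tgg_to_stop)
```

Of the nine possible single-base changes to TGG, two produce a stop codon, so a premature stop of this kind can arise from one point mutation.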
Project to reconstruct the evolutionary history of the genomes of placental mammals 12 key nodes in human evolution (+neanderthal and human itself makes 14) Data from NHGRI Comparative Genome Sequencing Program
Homo sapiens sapiens
Homo sapiens neanderthalensis. Node: Homo sapiens. Neanderthal picture: www.neanderthal-man.com/
Chimpanzee (Pan troglodytes). Node: Homo/Pan. Chimp picture: www.goodcommitment.tv/category/blog/
Mountain gorilla. Node: Homo/Pan/Gorilla. Picture: http://safariweb.com/images/gorilla.jpg
Orangutan. Node: Hominidae (great apes). Picture: http://www.african-safari-journals.com/image-files/orangutan-pictures.jpg
Gibbon. Node: Hominoidea (apes). Picture: http://www.ms-starship.com/journal/aug00/images/GIBBON-md34P52083.jpg
Rhesus macaque. Node: Catarrhini (Old World monkeys and apes). Picture: http://www.aaas.org/news/releases/2007/images/0416macaque_lone.jpg
Common marmoset (Callithrix jacchus). Node: Anthropoidea. Picture: http://www.dkimages.com/discover/previews/752/261371.JPG
Tarsier. Node: Haplorhines. Picture: http://dsc.discovery.com/convergence/quest/borneo/field-guide/gallery/guide_tarsier.jpg
Lesser bushbaby (Galago moholi). Node: Primates. Picture: http://lemur.duke.edu/animals/lesserbushbaby/general.jpg
Pygmy tree shrew. Node: Euarchonta. Picture: http://hoglezoo.org/animal.photos/pygmy.tree.shrew1.jpg
Mouse (Mus musculus “genomicus”!!). Node: Euarchontoglires. Picture: http://www.sanger.ac.uk/Info/Press/gfx/021205_mouse_300.jpg
Common shrew (Sorex araneus). Node: Boreoeutheria. Picture: http://www.naturephoto-cz.com/photos/andera/common-shrew-3806.jpg
Elephant shrew (Macroscelidea). Node: Eutheria (placental mammals). Picture: http://www.wordinfo.info/words/images/shrew-elephant.gif
Not all descendants of the eutherian ancestor are shrew-like. Tursiops truncatus: http://www.farallones.org/e_newsletter/2006-08/images/bottlenose_jumping.jpg http://mariewin.server304.com/marieblog/uploaded_images/BROWN%20BAT-768944.jpg http://www.elephantcountryweb.com/AfricanElephant111.jpg
We found 49 genomic regions that showed extremely accelerated evolution in humans. Katie Pollard and Sofie Salama. Human Accelerated Region 1 (HAR1)
Computational prediction of structure conserved throughout amniotes. HAR1 produces a structured RNA sequence that is expressed in the fetal brain. New interactions in the human version of this gene. Jakob Pedersen
The six layers of the cerebral cortex are built during fetal brain development During development, the cerebral cortex is built “inside-out” by neurons migrating radially from the subventricular zone to the pial surface. This process is guided by the neurodevelopmental gene Reelin. Human embryo looks human at this stage (roughly Carnegie stage 23), but is only about 1 inch long Image: www.thebrain.mcgill.ca
HAR1 is expressed in the same cells as Reelin (the Cajal-Retzius neurons), and during the same period of development (8-20 GW) At 19GW, HAR1F and reelin show the same pattern in the MZ/SGL (arrows), and HAR1F remains expressed in the upper CP (arrowheads), whereas the HAR1F sense probe shows no signal. Lower panels centred on the MZ illustrate double in situ hybridization/ immunohistochemistry experiments. Lower left and middle panels show that HAR1F (in dark blue) and reelin (in dark brown) are co-localized in most cells, corresponding to Cajal–Retzius neurons (black arrows), although some reelin-positive cells do not express HAR1F (brown arrows). The lower right panel illustrates that HAR1F cells (blue arrows) seem to be mainly distinct from interneurons (brown arrows) expressing vesicular GABA transporter (vGAT). Scale bar: 250 microns. Nelle Lambert, Marie-Alexandra Lambot, Sandra Coppens, Pierre Vanderhaeghen
We are pursuing the hypothesis that HAR1 functions in cortical development and was involved in the evolution of the human brain. Optional movie. Method: peroxidase-driven tyramide signal amplification (TSA). Horseradish peroxidase (HRP) is conjugated directly to RNA(?) or oligodeoxynucleotide (ODN) probes, which are then used to study the transcript of interest. This movie takes you on a tour from the bottom to the top of the nuclei of a group of mouse neuroblastoma cells, Neuro-2A. These have been transfected with a vector that encodes the predicted full-length HAR1 transcript. We are visualizing HAR1 using a FISH probe against the structured region of the molecule. To our surprise, although the HAR1 RNA is a capped, spliced, and polyadenylated transcript, we do not see it in the cytoplasm. Instead, it collects in individual spots in the nucleus. This is especially evident in the cell at the center of the movie, which has the highest expression. We are currently performing experiments to co-localize these bodies with known nuclear subcompartments and to figure out whether HAR1 gets stuck in the nucleus or has already left the nucleus after splicing and been re-imported, a biogenesis pathway similar to that of snRNAs.
Grand challenge of human molecular evolution Reconstruct the evolutionary history of each base in the human genome Discover functional elements of the genome Find the human evolutionary innovations Map the important human genetic variation Map the genome adaptations in individual cancer tumors that make them dangerous
The UCSC Team: Katie Pollard and Gill Bejerano, Jim Kent, Sofie Salama, Adam Siepel
Extended Credits Thanks to Jim Kent, Sofie Salama, Gill Bejerano*, Katie Pollard*, Adam Siepel*, Robert Baertsch, Galt Barber, Hiram Clawson, Mark Diekhans, Jorge Garcia, Rachel Harte, Angie Hinrichs, Fan Hsu, Donna Karolchik, Sol Katzman, Andy Kern, Bryan King, Robert Kuhn, Victoria Lin, Andre Love, Craig Lowe, Yontao Lu, Jian Ma, Chester Manuel, Courtney Onodera, Jakob Pedersen, Andy Pohl, Brian Raney, Brooke Rhead, Kate Rosenbloom, Krishna Roskin, Zack Sanborn, Kayla Smith, Mario Stanke, Bernard Suh, Paul Tatarsky, Archana Thakkapallayil, Daryl Thomas, Heather Trumbower, Jason Underwood, Ting Wang, Erich Weiler, Chen-Hsiang Yeang, Jing Zhu, and Ann Zweig, in my group at UCSC And to Webb Miller, Nadav Ahituv, Manny Ares, Mathieu Blanchette, Rico Burhans, Michele Clamp, Richard Gibbs, Eric Green, Haller Igel, John Karro, Eric Lander, Kerstin Lindblad-Toh, Jim Mullikin, Tom Pringle, Eddy Rubin, Armen Shamamian, Pierre Vanderhaeghen, and many other outside collaborators
Single nucleotide polymorphisms (SNPs). When we compare the genomes of many people, we see ~3 million variable bases (SNPs). That is about one every 1,000 bases. Each SNP is a change that happened only once. The more ancient the SNP, the more common it is; most SNPs date from before a population bottleneck about 100,000 years ago, before our ancestors migrated out of Africa. Each of your kids has about 175 new DNA changes, but nearly all changes are lost within 20 generations. SNPs inherited together with no recombination form “haplotype blocks”.
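The "one every 1,000 bases" figure follows directly from the two counts on the slide. A minimal sketch of that arithmetic, using the approximate 3-billion-base genome size given earlier in the talk; the function names and the example window size are hypothetical:

```python
GENOME_BASES = 3_000_000_000   # approximate genome size from the talk
VARIABLE_BASES = 3_000_000     # approximate number of SNPs from the slide

def bases_per_snp(genome_bases, snp_count):
    """Average spacing between SNPs, in bases."""
    return genome_bases // snp_count

def expected_snps(window_bases, spacing):
    """Expected SNP count in a window, assuming SNPs are evenly spread."""
    return window_bases // spacing

spacing = bases_per_snp(GENOME_BASES, VARIABLE_BASES)
snps_in_window = expected_snps(50_000, spacing)   # hypothetical 50 kb region

print(spacing, snps_in_window)
```

Real SNP density varies along the genome, so the even-spacing assumption is only a first approximation.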
Polymorphism Data is Used to Help Locate Disease Genes. With new genotyping technology, there has been a revolution in our ability to discover disease-related genes. New discoveries have been made for diabetes, cancer, cardiovascular disease, autoimmune diseases, and neurological diseases. The ability to interactively explore the genome on the web is accelerating biomedical research and will eventually help us to better diagnose and cure disease. Collins: the SIFTER project stores linkage and association results mapped to golden path coordinates, with custom tracks containing markers they have genotyped or plan to genotype, and other information delineating the boundaries of the region that is likely to contain the disease gene. pchines@nhgri.nih.gov (unpublished)
Genomes and the Central Dogma of Molecular Biology
The Tree of Life DNA -> DNA (molecular evolution) DNA -> RNA -> protein (molecular cell biology)
Neutral drift: a genetic change that does not affect the organism. Note: alanine (A) is 4-fold degenerate, so all third-position changes there are presumably neutral. The same holds for the threonine change in mouse. Mutations occur all the time in protein-coding regions; some do not change the protein, and so do not affect the fitness of the organism. Changing the third DNA base in this codon does not change the amino acid it encodes, alanine (A). Browser: Kent et al.; conservation track: Siepel and Rosenbloom
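The 4-fold degeneracy mentioned in the note can be checked against the standard genetic code. The sketch below builds the standard codon table and tests whether every third-position change leaves the amino acid unchanged. The slide does not name the specific codons involved, so GCT (alanine) and ACC (threonine) are used purely as illustrative examples, and the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def third_position_is_neutral(codon):
    """True if every change to the third base keeps the same amino acid."""
    aa = CODON_TABLE[codon]
    return all(CODON_TABLE[codon[:2] + b] == aa for b in BASES if b != codon[2])

# Amino acids that have at least one fully degenerate (4-fold) codon family.
fourfold = sorted({CODON_TABLE[c] for c in CODON_TABLE if third_position_is_neutral(c)})

print(third_position_is_neutral("GCT"))   # illustrative alanine codon
print(third_position_is_neutral("TGG"))   # tryptophan, for contrast
print(fourfold)
```

A synonymous change is only presumed neutral here; the code checks that the protein is unchanged, not that fitness is unaffected.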
Negative selection: rejecting a change that decreases fitness Some mutations would change the protein and thereby reduce fitness Such changes are rejected by natural selection, and the DNA is conserved Browser: Kent et al; conservation track: Siepel and Rosenbloom
Positive selection: a genetic change that increases fitness. OMIM 605317: one of these two changes in exon 7 creates a potential phosphorylation site for protein kinase C and a minor change in secondary structure (not sure which one). Enard, W.; Przeworski, M.; Fisher, S. E.; Lai, C. S. L.; Wiebe, V.; Kitano, T.; Monaco, A. P.; Paabo, S.: Molecular evolution of FOXP2, a gene involved in speech and language. Nature 418: 869-872, 2002. PubMed ID: 12192408. Some mutations have a positive effect: this change from C to A in the gene FOXP2 changed the amino acid from threonine (T) to asparagine (N), which may have improved fitness. Possible role in the evolution of speech. Browser: Kent et al.; conservation track: Siepel and Rosenbloom; FOXP2 results: Enard et al., Nature, 2002
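To see how a single C-to-A change can convert threonine to asparagine, the sketch below searches the standard genetic code for every threonine codon in which replacing one C with A yields an asparagine codon. The slide does not state which codon FOXP2 actually uses at this position, so the output lists the possibilities rather than the real one; the function name is hypothetical.

```python
from itertools import product

# Standard genetic code, codons ordered with bases T, C, A, G; '*' marks a stop.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}

def c_to_a_changes(from_aa, to_aa):
    """List (before, after) codon pairs where one C -> A change maps from_aa to to_aa."""
    hits = []
    for codon, aa in CODON_TABLE.items():
        if aa != from_aa:
            continue
        for pos, base in enumerate(codon):
            if base == "C":
                mutated = codon[:pos] + "A" + codon[pos + 1:]
                if CODON_TABLE[mutated] == to_aa:
                    hits.append((codon, mutated))
    return sorted(hits)

thr_to_asn = c_to_a_changes("T", "N")   # threonine (T) to asparagine (N)
print(thr_to_asn)
```

In both candidate cases the change falls at the second codon position, which is why it alters the amino acid instead of being silent.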