BB30055: Genes and genomes Genomes - Dr. MV Hejmadi (bssmvh@bath.ac.uk)
Applications genome projects BB30055: Genomes - MVH 3 broad areas Genomes Applications genome projects (C) Genome evolution
Why sequence the genome? 3 main reasons description of sequence of every gene valuable. Includes regulatory regions which help in understanding not only the molecular activities of the cell but also ways in which they are controlled. identify & characterise important inheritable disease genes or bacterial genes (for industrial use) Role of intergenic sequences e.g. satellites, intronic regions etc
History of Human Genome Project (HGP) 1953 – DNA structure (Watson & Crick) 1972 – Recombinant DNA (Paul Berg) 1977 – DNA sequencing (Maxam, Gilbert and Sanger) 1985 – PCR technology (Kary Mullis) 1986 – automated sequencing (Leroy Hood & Lloyd Smith 1988 – IHGSC established (NIH, DOE) Watson leads 1990 – IHGSC scaled up, BLAST published (Lipman+Myers) 1992 – Watson quits, Venter sets up TIGR 1993 – F Collins heads IHGSC, Sanger Centre (Sulston) 1995 – cDNA microarray 1998 – Celera genomics (J Craig Venter) 2001 – Working draft of human genome sequence published 2003 – Finished sequence announced
Human Genome Project (HGP) Goal: Obtain the entire DNA sequence of human genome Players: International Human Genome Sequence Consortium (IHGSC) - public funding, free access to all, started earlier - used mapping overlapping clones method (B) Celera Genomics – private funding - used whole genome shotgun strategy
Whose genome is it anyway? International Human Genome Sequence Consortium (IHGSC) - composite from several different people generated from 10-20 primary samples taken from numerous anonymous donors across racial and ethnic groups (B) Celera Genomics – 5 different donors (one of whom was J Craig Venter himself !!!)
sequencing genomes Mapping phase Sequencing phase
Strategies for sequencing the human genome IHGSC Celera
Result…. ~30 - 40,000 protein-coding genes estimated based on known genes and predictions IHGSC Celera definite genes 24,500 26,383 possible genes 5000 12,000
Other genomes sequenced Sept 2003 18,473 human orthologs 1997 4,200 genes 1998 19,099 genes June 2006 Sept 2007 diploid genome ‘HuRef’ 2002 38,000 genes 2002 36,000 genes Science (26 Sep 2003)Vol301(5641)pp1854-1855
Genomics: World's smallest genome the smallest genome known is the DNA of a 'nucleomorph' of Bigelowiella natans, a single-celled algae of the group known as chlorarachniophytes. 373,000 base pairs and a mere 331 genes The nucleomorph is an evolutionary vestige that was originally the nucleus of a eukaryotic cell. The eukaryotic cell swallowed a cyanobacterium to acquire a photosynthetic 'plastid' organelle, and that cell was in turn engulfed by another cell to produce B. natans as we know it. Now, most of the nucleomorph's genome is concerned with its own maintenance, and just 17 of its genes still exert any control over the plastid. Its small size suggests it is heading for evolutionary oblivion. Proc. Natl Acad. Sci. USA 103, 9566–9571 (2006) by G McFadden, University of Melbourne, Australia
Organisation of human genome Mitochondrial genome Nuclear genome (3.2 Gbp) 24 types of chromosomes Y- 51Mb and chr1 -279Mbp http://www.ncbi.nlm.nih.gov/Genomes/
General organisation of human genome
Basic structure of a gene Figure 21.11 Fig. 21.11
Polypeptide-coding regions
Gene organisation Rare bicistronic transcription units E.g. UBA52 transcription generates ubiquitin and a ribosomal protein S27a
Non polypeptide–coding: RNA encoding 700-800 rRNA and 497 tRNA genes
Class of RNA Example types Function Ribosomal RNA 16,23,18,28S Ribosomal subunits Transfer RNA 22 mitochondrial 49 cytoplasmic mRNA binding Small nuclear RNA(snRNA) U1,U2,U4,U5 etc RNA splicing Small nucleolar RNA (snoRNA) U3,U8 etc rRNA modification and processing microRNA (miRNA) >200 types Regulatory RNA XIST RNA Inactivation of X chromosome Imprinting associated RNA H19 RNA Genomic imprinting Antisense RNA >1500 types Suppression of gene expression Telomerase RNA Telomere formation
General organisation of human genome
Pseudogenes () non functional copies of an active gene. May be either a) Nonprocessed pseudogenes May contain exons, introns & promoters but are inactive due to inappropriate termination codons Arise by gene duplication events usually in gene clusters (e.g. a and b–globin gene clusters)
Pseudogenes in globin gene cluster
Gene fragments or truncated genes Gene fragments: small segments of a gene (e.g. single exon from a multiexon gene) Truncated genes: Short components of functional genes (e.g. 5’ or 3’ end) Thought to arise due to unequal crossover or exchange
b) processed pseudogenes - Thought to arise by genomic insertion of a cDNA as a result of retrotransposition Contributes to overall repetitive elements (<1%) Retrogene – functional copy - testis
processed pseudogenes - Retrogene – functional copy - testis
General organisation of human genome
Unique or low copy number sequences Non –coding, non repetitive and single copy sequences of no known function or significance
General organisation of human genome
Repetitive elements Main classes based on origin Tandem repeats Interspersed repeats Segmental duplications
References Chapter 9 pp 265-268 Chapter 10: pp 351-366 HMG 3 by Strachan and Read Chapter 10: pp 351-366 Genetics from genes to genomes by Hartwell et al (3/e) Nature (2001) 409: pp 879-891 http://www.bath.ac.uk/library/subjects/bs/links.html
http://www.ncbi.nlm.nih.gov/Genomes/