The Human Genome (part 1 of 2) Wednesday, November 5, 2003 Introduction to Bioinformatics ME:440.714 J. Pevsner

Many of the images in this powerpoint presentation are from Bioinformatics and Functional Genomics by J Pevsner (ISBN ). Copyright © 2003 by Wiley. These images and materials may not be used without permission from the publisher. Visit Copyright notice

Announcements
-- Today: Human genome
-- Friday Nov. 7: computer lab
-- Monday Nov. 10: Human disease (West Lecture Hall)
-- Wednesday Nov. 12: Final exam (in class); find-a-gene project due

Final exam on November 12
Format:
-- closed book
-- one hour, in-class (ok to take longer)
-- to practice, do the self-test quizzes at the ends of chapters
Some of the questions will be based on the recent article on human chromosome 6: Mungall AJ et al., The DNA sequence and analysis of human chromosome 6. Nature 425, , 23 October. See also the accompanying News & Views: Grimwood J and Schmutz J, Six is seventh, Nature 425, , 23 October 2003.

Outline of today’s lecture
1. Summary of major findings of Human Genome Project
2. Web resources for the human genome
3. We will follow the outline of the February 2001 Nature paper describing the human genome.
(Page 607)

Main conclusions of the human genome project Page 608

Main web sites for the human genome Genome Hub National Human Genome Research Institute (NHGRI) NCBI Genome Central Ensembl Page 608

1. There are about 30,000 to 40,000 human genes. This number is far smaller than earlier estimates. The public consortium estimated 31,000, while Celera estimated 38,500.
But note:
-- Many predicted genes are unique to each group
-- There are many transcripts of unknown function
Current estimates (2003) are ~30,000 genes. (Page 608)

1. We have about the same number of genes as fish and plants, and not that many more genes than worms and flies.
-- Fugu rubripes (pufferfish): 31,000 to 38,000
-- Arabidopsis thaliana (thale cress): 26,000
-- Caenorhabditis elegans (worm): 19,000
-- Drosophila melanogaster (fly): 13,000
(Page 608)

2. The human proteome is far more complex than the set of proteins encoded by invertebrate genomes. Vertebrates have a more complex mixture of protein domain architectures. Additionally, the human genome displays greater complexity in its processing of mRNA transcripts by alternative splicing. (Page 608)

3. Hundreds of human genes were acquired from bacteria by lateral gene transfer, according to the initial report.
Evidence: compare the proteomes of human, fly, worm, yeast, Arabidopsis, eukaryotic parasites, and all completed prokaryotic genomes. Find some genes shared exclusively by humans and bacteria. But according to TIGR, only about 40 of these genes (or fewer?) were acquired by LGT (see Salzberg et al., Science 292:1903, 2001).
Reasons for artifactually high estimates include:
-- gene loss
-- small sample size of species
(Page 608)
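The filtering logic behind this comparison can be sketched with set operations. This is only a toy illustration: the function name `lgt_candidates` and all family identifiers (`famA`, `famB`, ...) are hypothetical, and real analyses rely on sequence similarity searches rather than exact labels. It shows why a small sample of species inflates the estimate: each added eukaryotic proteome can only remove candidates.

```python
def lgt_candidates(human, bacteria, other_eukaryotes):
    """Return families present in human and bacteria but in no other eukaryote."""
    shared = human & bacteria
    # Union of every family seen in the non-human eukaryotes sampled so far.
    seen_elsewhere = set().union(*other_eukaryotes.values())
    return shared - seen_elsewhere


# Hypothetical protein-family labels, invented for illustration only.
human = {"famA", "famB", "famC", "famD", "famE"}
bacteria = {"famB", "famC", "famD", "famX"}

few_species = {"fly": {"famA"}, "worm": {"famA", "famE"}}
more_species = dict(few_species, yeast={"famB"}, parasite={"famC"})

few_result = lgt_candidates(human, bacteria, few_species)
more_result = lgt_candidates(human, bacteria, more_species)

print(sorted(few_result))   # candidates with only fly and worm sampled
print(sorted(more_result))  # candidates after adding yeast and a parasite
```

With only two comparison species, three families look exclusive to human and bacteria; sampling two more species leaves one. Gene loss in the sampled lineages has the same inflating effect, because a family lost from fly and worm looks absent from non-vertebrate eukaryotes.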

4. 98% of the genome does not code for genes.
>50% of the genome consists of repetitive DNA derived from transposable elements (also called interspersed repeats):
-- LINEs (20%)
-- SINEs (13%)
-- LTR retrotransposons (8%)
-- DNA transposons (3%)
There has been a decline in activity of some of these elements in the human lineage. (Page 608)
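As a quick arithmetic check on the percentages listed above, the following sketch sums the four repeat classes and reports each class's share of that sum. The variable names are illustrative. The four classes add up to 44% of the genome, so the ">50%" figure presumably includes repetitive DNA that the slide does not itemize; that reading is an inference, not something the slide states.

```python
# Percent of the genome per interspersed-repeat class, as listed on the slide.
repeat_percent = {
    "LINEs": 20,
    "SINEs": 13,
    "LTR retrotransposons": 8,
    "DNA transposons": 3,
}

listed_total = sum(repeat_percent.values())
largest_class = max(repeat_percent, key=repeat_percent.get)

print(f"Listed classes cover {listed_total}% of the genome")
for name, pct in sorted(repeat_percent.items(), key=lambda kv: kv[1], reverse=True):
    share = 100 * pct / listed_total
    print(f"{name}: {pct}% of genome, {share:.0f}% of the listed repeats")
```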

5. Segmental duplication is a frequent occurrence in the human genome.
-- tandem duplications (rare)
-- retrotransposition (intronless paralogs)
-- segmental duplications (common)
(Page 608)

6. There are 300,000 Alu repeats in the human genome. These are about 300 base pairs and contain an AluI restriction enzyme site. They occupy 3% of the genome. We saw an example of an Alu repeat in Chapter 16. Their distribution is non-random: they are retained in GC-rich regions and may confer some benefit. (Page 608)
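The slide's three numbers can be checked against each other, and the AluI site can be located by a simple string scan. The sketch below uses only the slide's figures plus the AluI recognition sequence AGCT; the helper name `find_alui_sites` and the toy sequence are hypothetical, chosen for illustration.

```python
ALU_COPIES = 300_000      # from the slide
ALU_LENGTH_BP = 300       # from the slide (approximate)
ALU_GENOME_PERCENT = 3    # from the slide

total_alu_bp = ALU_COPIES * ALU_LENGTH_BP
# Genome size implied if that much sequence is 3% of the genome.
implied_genome_bp = total_alu_bp * 100 // ALU_GENOME_PERCENT

ALUI_SITE = "AGCT"  # recognition sequence of the AluI restriction enzyme


def find_alui_sites(sequence):
    """Return 0-based start positions of every AGCT site in a DNA string."""
    sequence = sequence.upper()
    return [i for i in range(len(sequence) - len(ALUI_SITE) + 1)
            if sequence[i:i + len(ALUI_SITE)] == ALUI_SITE]


toy_sequence = "TTAGCTAAGCT"  # invented sequence, not a real Alu element
print(total_alu_bp, implied_genome_bp)
print(find_alui_sites(toy_sequence))
```

The product works out to 90 million base pairs, which is 3% of a 3-billion-base-pair genome, so the slide's count, length, and percentage are mutually consistent.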

7. The mutation rate is about twice as high in male meiosis as in female meiosis. Most mutation probably occurs in males. (Page 609)

8. More than 1.4 million single nucleotide polymorphisms (SNPs; single base pair changes) were identified. Celera initially identified 2.1 million SNPs. Currently, dbSNP at NCBI (build 118) has about 5.8 million human SNPs (2.4 million validated).
-- A SNP occurs every 100 to 300 base pairs.
-- A random pair of haploid genomes differs at a rate of 1 base pair every 1250, on average (Celera).
-- Fewer than 1% of SNPs alter protein sequence.
(Page 609)
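To put these rates in absolute terms, here is a back-of-the-envelope sketch. The genome size is an assumption (a round 3 billion base pairs per haploid genome, not a figure given on this slide); every other number comes from the slide.

```python
# ASSUMPTION: round haploid genome size, used only for this estimate.
GENOME_BP = 3_000_000_000

# Slide: two random haploid genomes differ at 1 base pair in every 1250.
pairwise_differences = GENOME_BP // 1250

# Slide: dbSNP build 118 holds ~5.8 million human SNPs, 2.4 million validated.
dbsnp_total = 5_800_000
dbsnp_validated = 2_400_000
validated_fraction = dbsnp_validated / dbsnp_total
bp_per_dbsnp_entry = GENOME_BP / dbsnp_total

print(f"Expected differences between two haploid genomes: {pairwise_differences:,}")
print(f"Validated fraction of dbSNP entries: {validated_fraction:.0%}")
print(f"Average spacing of dbSNP entries: one per {bp_per_dbsnp_entry:.0f} bp")
```

Under that assumed genome size, two haploid genomes would differ at roughly 2.4 million positions, and the dbSNP entries would average about one per 500 base pairs.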

Three gateways to access the human genome
-- NCBI map viewer
-- Ensembl Project (EBI/Sanger Institute)
-- UCSC (Golden Path)
Each of these three sites provides essential resources to study the human genome (and other genomes). (Pages 608-609)

Fig Page 610 NCBI offers a human map viewer

Fig Page 611 Map viewer: RBP4 on chromosome 10 Click to customize the tracks on this map

Map viewer figure labels: LocusLink; DNA (contig); OMIM; sequence viewer; protein; evidence viewer; model maker; HomoloGene; confirmed gene model; orientation

Fig Page 613 NCBI’s evidence viewer provides data on gene models (e.g. mapping ESTs to genomic DNA)

Fig Page 613 NCBI evidence viewer: gene structures Evidence for a discrepancy (e.g. sequencing error or polymorphism)

The Ensembl project currently includes genome browsers for nine organisms: human, mouse, zebrafish, Fugu, mosquito, fruit fly, C. elegans, C. briggsae, and rat. Visit Ensembl (Page 610)

Fig Page 614 Ensembl human genome browser

Fig Page 615 Ensembl: GeneView for RBP4

Fig Page 616 Ensembl: GeneView for RBP4

Fig Page 617 Ensembl human genome browser: ContigView

Fig Page 618 Ensembl human genome browser: TransView

Fig Page 619 Ensembl: ProteinView for RBP4

Fig Page 620 Ensembl: MapView for chromosome 10

Fig Page 621 Ensembl: SyntenyView for chromosome 10

The UCSC human genome browser
The University of California at Santa Cruz (UCSC) offers a genome browser with the “golden path” annotation of the human genome. The browser supports searches by keyword, gene name, or other text. UCSC also offers BLAT, a very fast BLAST-like alignment tool (see Chapter 5). A key feature of this browser is its customizable annotation tracks. About half of these tracks are offered by users of the site throughout the world. (Page 614)
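User-supplied annotation tracks are commonly uploaded as plain text in BED format: a `track` header line followed by one feature per line (chromosome, 0-based start, end, name). The sketch below builds such a text; the helper name `make_bed_track` is hypothetical, and the coordinates are placeholders, not the real position of RBP4 or any other gene.

```python
def make_bed_track(name, description, features):
    """Build a minimal BED custom-track text from (chrom, start, end, label) tuples.

    BED starts are 0-based and ends are exclusive, so start must be < end.
    """
    lines = [f'track name="{name}" description="{description}"']
    for chrom, start, end, label in features:
        if not 0 <= start < end:
            raise ValueError(f"invalid interval for {label}: {start}-{end}")
        lines.append(f"{chrom}\t{start}\t{end}\t{label}")
    return "\n".join(lines) + "\n"


# Placeholder coordinates for illustration only.
demo_features = [
    ("chr10", 1000, 2000, "placeholder_feature_1"),
    ("chr10", 5000, 5300, "placeholder_feature_2"),
]
demo_track = make_bed_track("demo", "Example track", demo_features)
print(demo_track, end="")
```

The resulting text could be pasted into the browser's custom-track form; consult the current UCSC documentation for the full set of optional track-line attributes and BED columns.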

Fig Page 622

Fig Page 623

This lecture continues with part 2 of 2…