BB30055: Genes and genomes Major insights from the HGP.


What makes us human?
How does this... become this?

What makes us human?
Single-nucleotide substitutions between the human and chimpanzee genomes occur at a mean rate of 1.23%
Nature 437, (1 September 2005)

Major insights from the HGP
Nature (2001) 15th Feb, Vol 409 special issue; pgs 814 &
1) Gene size, content and distribution
2) Proteome content
3) SNP identification
4) Distribution of GC content
5) CpG islands
6) Recombination rates
7) Repeat content

1) Gene size

Gene content
- More genes: twice as many as Drosophila / C. elegans
- Uneven gene distribution: gene-rich and gene-poor regions
- More paralogs: some gene families have expanded their number of paralogs, e.g. the olfactory gene family has about 1000 genes
- More alternative transcripts: more RNA splice variants are produced, expanding the primary set of proteins about 5-fold (e.g. neurexin genes)

Uneven gene distribution
- Gene-rich regions: e.g. MHC on chromosome 6 has 60 genes with a GC content of 54%
- Gene-poor regions: 82 gene deserts identified (possibly large or as yet unidentified genes?)
What is the functional significance of these variations?

2) Proteome content
Proteome more complex than in invertebrates
Protein domains (sections with identifiable shape/function)
Domain arrangements in humans:
- largest total number of domains is 130
- largest number of domain types per protein is 9
- mostly identical arrangement of domains

Proteome more complex than in invertebrates...
- No huge difference in domain number in humans
- BUT the frequency of domain sharing is very high in human proteins (structural proteins and proteins involved in signal transduction and immune function)
- However, there are only 3 cases where a combination of 3 domain types is shared by human and yeast proteins, e.g. carbamoyl-phosphate synthase (involved in the first 3 steps of de novo pyrimidine biosynthesis) has 7 domain types; this arrangement occurs once in human and yeast but twice in Drosophila

3) SNPs (single nucleotide polymorphisms)
- Point mutations in single base pairs
- > 1.4 million SNPs identified (~1 in every 1.9 kb on average)
- ~60,000 SNPs lie within exons and untranslated regions (85% of exons lie within 5 kb of a SNP)
- May or may not affect the ORF (synonymous or non-synonymous)
- Most SNPs may be regulatory
- Densities vary over regions and chromosomes, e.g. the HLA region has a high SNP density, reflecting maintenance of diverse haplotypes over many millions of years
Nature (2001) 15th Feb, Vol 409 special issue; pgs & 928
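The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers quoted above (1.4 million SNPs, one SNP per 1.9 kb, 60,000 SNPs in exons and untranslated regions); the constant and function names are illustrative and not from the lecture.

```python
# Back-of-the-envelope check of the SNP figures quoted on the slide.
TOTAL_SNPS = 1_400_000       # "> 1.4 million SNPs identified"
SPACING_BP = 1_900           # "~1 in every 1.9 kb"
EXONIC_UTR_SNPS = 60_000     # "~60,000 SNPs lie within exons and UTRs"


def implied_sequence_length(total_snps: int, spacing_bp: int) -> int:
    """Length of sequence (bp) implied by a SNP count and a mean spacing."""
    return total_snps * spacing_bp


def fraction(part: int, whole: int) -> float:
    """Fraction of `whole` represented by `part`."""
    return part / whole


if __name__ == "__main__":
    length_bp = implied_sequence_length(TOTAL_SNPS, SPACING_BP)
    print(f"Implied sequence surveyed: {length_bp / 1e9:.2f} Gb")
    print(f"SNPs in exons/UTRs: {fraction(EXONIC_UTR_SNPS, TOTAL_SNPS):.1%}")
```

Under these figures the SNP map implies roughly 2.66 Gb of surveyed sequence, and only about 4% of the SNPs fall in exons or untranslated regions.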

How does one distinguish sequence errors from polymorphisms?
Sequence errors
- Each piece of genome is sequenced at least 10 times to reduce the error rate (0.01%)
Polymorphisms
- Sequence variation between individuals (0.1%)
- To be defined as a polymorphism, the altered sequence must be present in a significant fraction of the population
- Rate of polymorphisms in the diploid human genome is about 1 in 500 bp
Nature (2001) 15th Feb, Vol 409 special issue; pgs & 928
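The frequency criterion for a polymorphism can be sketched as follows. This is a minimal illustration, not the consortium's method: the slide does not give a numerical cut-off, so the 1% minor-allele threshold used here is an assumption (a commonly used convention), and all names are hypothetical.

```python
from collections import Counter

# Assumption: a variant counts as a polymorphism when the minor allele reaches
# 1% frequency. The slide only requires presence in a significant part of the
# population; 1% is a commonly used convention, not a figure from the slide.
MIN_MINOR_ALLELE_FREQ = 0.01


def minor_allele_frequency(alleles: str) -> float:
    """Frequency of the second most common allele at one site.

    `alleles` holds one base per sampled chromosome, e.g. "AAAG".
    """
    counts = Counter(alleles).most_common()
    if len(counts) < 2:
        return 0.0  # only one allele observed: the site is monomorphic
    return counts[1][1] / len(alleles)


def is_polymorphism(alleles: str, threshold: float = MIN_MINOR_ALLELE_FREQ) -> bool:
    """True if the minor allele is at or above the frequency threshold."""
    return minor_allele_frequency(alleles) >= threshold


if __name__ == "__main__":
    common = "A" * 90 + "G" * 10   # minor allele on 10% of chromosomes
    rare = "A" * 999 + "G"         # seen once in 1000 chromosomes
    print(is_polymorphism(common))
    print(is_polymorphism(rare))
```

A variant seen only once in a large sample falls below the threshold, so it is treated as a possible sequencing error or a rare mutation rather than a polymorphism.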

International HapMap Project
- identifying common haplotypes in four populations from different parts of the world
- identifying "tag" SNPs that uniquely identify these haplotypes
Haplotype (haploid genotype)
- A haplotype is a set of single nucleotide polymorphisms (SNPs) on a single chromatid that are statistically associated
- Haplotypes are generally shared between populations, but their frequencies can vary
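The idea of a tag SNP can be illustrated with a toy search: given a handful of haplotypes, find the smallest set of SNP positions whose alleles are enough to tell every haplotype apart. This is a brute-force sketch on made-up haplotypes, not the method used by the HapMap Project; the function name and example data are hypothetical.

```python
from itertools import combinations


def find_tag_snps(haplotypes: list[str]) -> tuple[int, ...]:
    """Return the smallest set of SNP positions that tells all haplotypes apart.

    Brute force: try every combination of positions, smallest sets first.
    Adequate for a toy example; real haplotype blocks need smarter methods.
    """
    n_sites = len(haplotypes[0])
    distinct = set(haplotypes)
    for size in range(1, n_sites + 1):
        for positions in combinations(range(n_sites), size):
            # Alleles seen at the chosen positions, one signature per haplotype
            signatures = {tuple(h[p] for p in positions) for h in distinct}
            if len(signatures) == len(distinct):
                return positions
    return tuple(range(n_sites))


if __name__ == "__main__":
    # Four invented haplotypes over six SNP sites (illustrative data only)
    haps = ["AACGTA", "AATGTC", "GACGCA", "GATGCC"]
    print(find_tag_snps(haps))  # (0, 2)
```

In this example sites 0 and 2 are sufficient: genotyping those two SNPs identifies which of the four haplotypes is present, so the other four sites need not be typed.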

Copy number variants (CNVs) challenge the SNP concept
- CNV: a DNA segment > 1 kb with a variable copy number compared with a reference genome
- Variations in the copy number of sequences (>500 bp)
- Caused by insertions/deletions ('indels'), inversions or translocations
Nature, Vol 447, 10 May 2007, pp161

4) Distribution of GC content
- Genome-wide average of 41%
- Huge regional variations exist, e.g. the distal 48 Mb of chromosome 1p has 47%, but chromosome 13 has only 36%
- Confirms cytogenetic staining with G-bands (Giemsa):
  dark G-bands: low GC content (37%)
  light G-bands: high GC content (45%)
Nature (2001) 15th Feb, Vol 409 special issue; pg
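Regional variation in GC content is usually measured by computing the GC fraction in windows along a sequence. A minimal sketch follows; the window size and the example sequence are arbitrary choices for illustration, and the function names are not from the lecture.

```python
def gc_content(seq: str) -> float:
    """Fraction of G and C bases in `seq` (case-insensitive); 0.0 if empty."""
    seq = seq.upper()
    if not seq:
        return 0.0
    return (seq.count("G") + seq.count("C")) / len(seq)


def windowed_gc(seq: str, window: int) -> list[float]:
    """GC fraction for consecutive non-overlapping windows of `window` bases.

    A shorter final window is included as-is.
    """
    return [gc_content(seq[i:i + window]) for i in range(0, len(seq), window)]


if __name__ == "__main__":
    toy = "GCGCGCGCATATATATGCAT"
    print(windowed_gc(toy, 8))  # [1.0, 0.0, 0.5]
```

On real chromosomes the windows would be tens of kilobases or more, and the resulting profile is what is compared with the dark and light G-bands.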

5) CpG islands
Significance of CpG islands
1) Non-methylated CpG islands are associated with the 5' ends of genes
2) They usually overlap the promoter region
3) Aberrant methylation of CpG islands is linked to pathologies like cancer or epigenetic diseases like Rett syndrome
Diagram: CpG, methylated at C, gives methyl-CpG; deamination (C to T) then gives TpG. CpG islands show no methylation.

CpG islands
- CpG dinucleotides are greatly under-represented in the human genome (5 times less than expected)
- ~28,890 CpG islands in number
- ~56% of human genes and 47% of mouse genes have CpG islands
- Variable density, e.g. Y has 2.9/Mb but chromosomes 16, 17 and 22 have 19-22/Mb
- Average is 10.5/Mb
Nature (2001) 15th Feb, Vol 409 special issue; pg
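Under-representation of CpG is usually expressed as an observed/expected ratio. The sketch below uses the widely used formula (number of CpG x sequence length) / (number of C x number of G); whether the genome paper used exactly this formula is an assumption, and the function name and test sequences are illustrative.

```python
def cpg_obs_exp(seq: str) -> float:
    """Observed/expected CpG ratio for a DNA sequence.

    Expected CpG count assumes C and G occur independently:
    expected = (count of C * count of G) / length.
    Returns 0.0 when the sequence has no C or no G.
    """
    seq = seq.upper()
    c_count = seq.count("C")
    g_count = seq.count("G")
    if c_count == 0 or g_count == 0:
        return 0.0
    cpg_count = seq.count("CG")  # "CG" cannot overlap itself, so count() is exact
    return cpg_count * len(seq) / (c_count * g_count)


if __name__ == "__main__":
    print(cpg_obs_exp("CGCGCGCG"))      # CpG-rich toy sequence: 2.0
    print(cpg_obs_exp("CATGCATGCATG"))  # no CpG at all: 0.0
```

A ratio near 0.2 corresponds to the "5 times less than expected" quoted for bulk human DNA, while CpG islands stand out because their ratio is much closer to 1.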

6) Recombination rates
2 main observations:
- Recombination rate increases with decreasing arm length
- Recombination rate is suppressed near the centromeres and increases towards the distal 20-35 Mb

7) Repeat content
a) Age distribution
b) Comparison with other genomes
c) Variation in distribution of repeats
d) Distribution by GC content
e) Y chromosome
Nature (2001) 409: pp

a) Age distribution
Overall decline in interspersed repeat activity in the hominid lineage over the past 35-40 Myr, compared with the mouse genome, which is younger and more dynamic

Repeat content...
a) Age distribution
- Most interspersed repeats predate the eutherian radiation (confirms the slow rate of clearance of nonfunctional sequence from vertebrate genomes)
- LINEs and SINEs have extremely long lives
- 2 major peaks of transposon activity
- No DNA transposition in the past 50 Myr
- LTR retroposons teetering on the brink of extinction

b) Comparison with other genomes
- Higher density of transposable elements in the euchromatic portion of the genome
- Higher abundance of ancient transposons
- 60% of interspersed repeats (IR) are made up of LINE1 and Alu repeats, whereas DNA transposons represent only 6%

c) Variation in distribution of repeats
Some regions show either:
- High repeat density, e.g. chromosome Xp11, where a 525 kb region shows 89% repeat density
- Low repeat density, e.g. the HOX homeobox gene cluster (<2% repeats), indicative of regulatory elements that have low tolerance for insertions

d) Distribution by GC content
- High GC: gene rich; high AT: gene poor
- LINEs are abundant in AT-rich regions
- SINEs are lower in AT-rich regions
- Alu repeats in particular are retained in actively transcribed GC-rich regions, e.g. chromosome 19 has 5% Alus compared to Y chromosome

e) The Y chromosome!
- Unusually young genome (high tolerance to gaining insertions)
- Mutation rate is 2.1X higher in the male germline

Working draft published: Feb 2001
Finished sequence: April 2003
Annotation of genes is ongoing (refer: International Human Genome Sequencing Consortium. Finishing the euchromatic sequence of the human genome. Nature, 21 October 2004; doi: /nature03001)

References
- Chapter 9 pp, HMG 3 by Strachan and Read
- Chapter 10: pp, Genetics: from genes to genomes by Hartwell et al (3/e)
- Nature (2001) 409: pp
- Nature (2005) for Chimp genome