Repetitive Elements May Comprise Over Two-Thirds of the Human Genome

2012-03-05

Abstract
Transposable elements (TEs) are conventionally identified in eukaryotic genomes by alignment to consensus element sequences. Using this approach, about half of the human genome has previously been identified as TEs and low-complexity repeats. We recently developed a highly sensitive alternative de novo strategy, P-clouds, that instead searches for clusters of high-abundance oligonucleotides that are related in sequence space (oligo “clouds”). We show here that P-clouds predicts >840 Mbp of additional repetitive sequence in the human genome, suggesting that 66%–69% of the human genome is repetitive or repeat-derived. To investigate this remarkable difference, we conducted detailed analyses of the ability of both P-clouds and a commonly used conventional approach, RepeatMasker (RM), to detect fragments of different sizes from the highly abundant human Alu and MIR SINEs. RM can have surprisingly low sensitivity even for moderately long fragments, whereas P-clouds retains good sensitivity down to small fragment sizes (~25 bp). Because short fragments have a high intrinsic probability of being false positives, we performed a probabilistic annotation that reflects this uncertainty. We further developed “element-specific” P-clouds (ESPs) to identify novel Alu and MIR SINE elements, and with them we identified ~100 Mb of previously unannotated human elements. ESP estimates of new MIR sequence agree well with RM-based predictions of the amount that RM missed. These results highlight the need for combined, probabilistic genome annotation approaches and suggest that the human genome contains substantially more repetitive sequence than previously believed.
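The abstract describes P-clouds only at a high level: find high-abundance oligonucleotides and group those that are close in sequence space into “clouds”, then mark genome positions covered by cloud oligos. The Python sketch below illustrates that idea under simplifying assumptions of our own: a single strand, clouds grown through one-mismatch neighbors, and two hypothetical count thresholds (`core_min` for seed oligos, `member_min` for cloud members). It is not the published P-clouds implementation, and all function names are illustrative.

```python
from collections import Counter

BASES = "ACGT"


def count_kmers(seq: str, k: int) -> Counter:
    """Count every overlapping k-mer in seq (one strand only, for simplicity)."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def hamming_neighbors(kmer: str):
    """Yield every k-mer that differs from `kmer` at exactly one position."""
    for i, base in enumerate(kmer):
        for b in BASES:
            if b != base:
                yield kmer[:i] + b + kmer[i + 1:]


def build_clouds(counts: Counter, core_min: int, member_min: int):
    """Group abundant k-mers into clouds of sequence-related oligos.

    Seeds are k-mers with count >= core_min (hypothetical threshold).
    A cloud grows transitively by adding any one-mismatch neighbor whose
    count is >= member_min (also hypothetical).
    Returns (list of clouds, dict mapping k-mer -> cloud index).
    """
    assigned = {}
    clouds = []
    for seed, c in counts.most_common():  # descending by count
        if c < core_min:
            break
        if seed in assigned:
            continue
        cloud_id = len(clouds)
        cloud = {seed}
        assigned[seed] = cloud_id
        frontier = [seed]
        while frontier:
            cur = frontier.pop()
            for nb in hamming_neighbors(cur):
                if nb not in assigned and counts.get(nb, 0) >= member_min:
                    assigned[nb] = cloud_id
                    cloud.add(nb)
                    frontier.append(nb)
        clouds.append(cloud)
    return clouds, assigned


def annotate(seq: str, k: int, assigned: dict) -> list:
    """Mark each base covered by at least one k-mer that belongs to a cloud."""
    mask = [False] * len(seq)
    for i in range(len(seq) - k + 1):
        if seq[i:i + k] in assigned:
            for j in range(i, i + k):
                mask[j] = True
    return mask
```

In this toy version a diverged copy of a repeat is still annotated because its mutated oligos sit one mismatch away from abundant seed oligos, which is the intuition behind searching “related in sequence space” rather than for exact matches. Real-genome use would need both strands, much larger k, and thresholds calibrated against the false-positive rate.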

Repeated sequences
  Tandem repeats: satellite DNA, minisatellites, microsatellites
  Interspersed repeats (transposons)
    Retrotransposons (copy and paste)
      SINEs (Alu, MIR)
      LINEs (LINE1, LINE2)
      LTRs (HERV, MER4, retroposon)
    DNA transposons (cut and paste)
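Tandem repeats differ from interspersed repeats in that their copies sit directly next to each other. For the shortest class, microsatellites, that adjacency can be detected with a regular-expression backreference. The sketch below assumes a unit length of 1 to 6 bp and at least 4 consecutive copies; both thresholds are illustrative choices, not values from the slides, and longer-unit minisatellites and satellites are out of its scope.

```python
import re

# Illustrative thresholds (assumptions): unit of 1-6 bp, >= MIN_COPIES copies.
MIN_COPIES = 4
# Lazy {1,6}? tries the shortest unit first at each start position;
# \1{n,} then requires the same unit to repeat immediately and greedily.
PATTERN = re.compile(r"([ACGT]{1,6}?)\1{%d,}" % (MIN_COPIES - 1))


def find_microsatellites(seq: str):
    """Return (start, end, unit) for each non-overlapping tandem run found."""
    return [(m.start(), m.end(), m.group(1)) for m in PATTERN.finditer(seq.upper())]


if __name__ == "__main__":
    print(find_microsatellites("GGCATATATATATACGC"))
```

Interspersed elements such as Alu or L1 cannot be found this way, because their copies are scattered across the genome; they need library alignment (as in RepeatMasker) or abundance-based methods.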

What is the human genome sequence made of?

Motivation
Evolution would have heavily altered substantial amounts of TE-derived sequence.
The relationships among large clusters of related sequences may nonetheless make them detectable.
The commonly used approach, RepeatMasker, relies on the Repbase library of consensus sequences.
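One way to see why heavily altered TE copies escape exact-oligo or consensus-based detection: if each site of an old copy has independently diverged from the ancestral sequence with probability d (substitutions only, no indels; a simplifying assumption), a given k-mer survives intact with probability (1 − d)^k. The function below computes that standard approximation; the parameter values in the demo are arbitrary illustrations.

```python
def intact_kmer_fraction(divergence: float, k: int) -> float:
    """Expected fraction of k-mers in a diverged copy that still exactly
    match the ancestral sequence.

    Assumes independent substitutions at per-site probability `divergence`
    and no insertions or deletions.
    """
    if not 0.0 <= divergence <= 1.0:
        raise ValueError("divergence must be in [0, 1]")
    return (1.0 - divergence) ** k


if __name__ == "__main__":
    for d in (0.05, 0.15, 0.30):
        for k in (12, 16):
            print(f"d={d:.2f} k={k}: {intact_kmer_fraction(d, k):.4f}")
```

The fraction falls off exponentially in k, so old, highly diverged copies retain few exact oligos. Methods that tolerate mismatches (such as grouping related oligos into clouds) can recover more of them than exact matching can.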

Outline
The P-clouds method
De novo repeat annotation of the human genome with P-clouds
Detection capability of P-clouds and RepeatMasker for fragments of known elements
Performance of element-specific P-clouds (ESPs) in annotating novel Alu and MIR elements
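The fragment-detection comparison raises the abstract's point that short fragments (~25 bp) have a high intrinsic probability of being false positives. A back-of-the-envelope model, under the strong assumption of an i.i.d. uniform random genome, gives the expected number of chance matches to a fragment of length L with at most m substitutions: 2(G − L + 1) · Σ_{i≤m} C(L, i)·3^i / 4^L, counting both strands. The genome size of roughly 3.1 Gbp used in the demo is an approximate figure chosen for illustration.

```python
from math import comb


def neighborhood_size(length: int, max_mismatches: int) -> int:
    """Number of distinct DNA strings of `length` within Hamming distance
    `max_mismatches` of a fixed string (3 alternative bases per mismatch)."""
    return sum(comb(length, i) * 3 ** i for i in range(max_mismatches + 1))


def expected_chance_hits(length: int, genome_size: float, max_mismatches: int = 0) -> float:
    """Expected number of positions, on both strands of an i.i.d. uniform
    random genome, matching a given fragment with <= max_mismatches substitutions."""
    windows = max(genome_size - length + 1, 0)
    p_match = neighborhood_size(length, max_mismatches) / 4 ** length
    return 2 * windows * p_match


if __name__ == "__main__":
    G = 3.1e9  # approximate human genome size (illustrative assumption)
    for length, m in ((25, 0), (25, 3), (25, 4), (50, 8)):
        print(f"L={length} m={m}: {expected_chance_hits(length, G, m):.3g}")
```

Exact 25-mers almost never match by chance under this model, but once a detector tolerates the divergence typical of old elements, chance matches of the same length become plausible. Real genomes, with their compositional bias and low-complexity sequence, can deviate substantially from the uniform model, which is one reason to report short-fragment annotations probabilistically.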

Potential applications
The C-value paradox
Next-generation sequencing
Population structure based on TEs