
Do not reproduce without permission 1 Gerstein.info/talks (c) (c) Mark Gerstein, 2002, Yale, bioinfo.mbb.yale.edu Permissions Statement This Presentation is copyright Mark Gerstein, Yale University, 2004, Feel free to use images in it with PROPER acknowledgement.

Pseudogenes in the ENCODE Regions: Consensus Annotation, Analysis of Transcription and Evolution
Deyou Zheng, Adam Frankish, Robert Baertsch, Philipp Kapranov, Alexandre Reymond, Siew Woh Choo, Yontao Lu, France Denoeud, Stylianos Antonarakis, Michael Snyder, Yijun Ruan, Chia-Lin Wei, Thomas Gingeras, Roderic Guigo, Jennifer Harrow, Mark Gerstein
Yale, Sanger, UCSC, GIS, AFFX, U Geneva, IMIM
A GT effort, with great thanks to MSA, VAR, TR
Talk at ENCODE 2006, 20:30-21:30

Pseudogenes are among the most interesting intergenic elements
Other intergenic elements include regulatory regions, repeats, non-coding RNA and origins of replication.
Formal properties of pseudogenes (ΨG):
- Inheritable
- Homologous to a functioning element
- Non-functional*
With no selection pressure, they are free to accumulate mutations:
- Frameshifts and stops
- Small indels
- Inserted repeats (LINE/Alu)
*What does "non-functional" mean: no transcription, no translation? [Mighell et al., FEBS Lett., 2000]
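Once a pseudogene has been aligned to its parent, the stop and frameshift disablements listed above can be detected mechanically. The following is a minimal sketch under two assumptions: the pseudogene sequence is already in the parent's reading frame, and indel lengths have already been extracted from the alignment. `premature_stops` and `frameshifting_indels` are hypothetical helper names, not part of any published pipeline.

```python
from __future__ import annotations

# Stop codons of the standard genetic code.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def premature_stops(seq: str) -> list[int]:
    """Return 0-based indices of in-frame stop codons, reading from the first base.

    Assumes seq is already in the parent gene's reading frame. A stop in the
    last complete codon is treated as the normal terminator and ignored.
    """
    seq = seq.upper()
    n_codons = len(seq) // 3
    return [
        i
        for i in range(n_codons - 1)  # skip the final codon (normal terminator)
        if seq[3 * i : 3 * i + 3] in STOP_CODONS
    ]


def frameshifting_indels(indel_lengths: list[int]) -> list[int]:
    """Keep indels (lengths in bp, from a pseudogene-vs-parent alignment) whose
    length is not a multiple of 3 and therefore shifts the reading frame."""
    return [n for n in indel_lengths if n % 3 != 0]


# A TAA at codon 1 disables this toy copy; the terminal TGA does not count.
print(premature_stops("ATGTAAGGGTGA"))     # [1]
print(frameshifting_indels([3, 1, 6, 2]))  # [1, 2]
```

Real pipelines work on gapped protein-vs-DNA alignments. This sketch only illustrates the two disablement tests on pre-processed inputs.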

Pseudogenes (ΨG) as Disabled Homologies
[Figure: a Cyc gene compared with a pseudogene]

Why Study Pseudogenes?
- Important for accurate gene annotation
  - Abundant: > 8000 retropseudogenes in human
  - High sequence similarity with genes
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  - Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer, 1999]
- Some pseudogenes have regulatory roles
- ΨG are "genomic fossils"
  - Study the evolution of genes and genomes
  - Measure mutation/insertion rates

Why Study Pseudogenes?
- Cause errors in sequence databases
  - > 8000 retropseudogenes in human
  - Contamination in Ensembl
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- "Interfere" with functional genes
  - Cross-hybridization in microarray and PCR (cytokeratin 19, Int. J. Cancer, 1999)
  - In rare cases, this interference gives some pseudogenes regulatory roles
- ΨG are "genomic fossils"
  - Study the evolution of genes and genomes
  - Measure mutation/insertion rates
In mouse, a pseudogene up-regulates expression of the Makorin1 gene, by binding either a transcriptional repressor or an RNA-digesting enzyme [Hirotsune et al., Nature].

Why Study Pseudogenes?
- Cause errors in sequence databases
  - > 8000 retropseudogenes in human
  - Contamination in Ensembl
  - 25% in C. elegans? [Mounsey, Genome Research, 2002]
- Interfere with the study of functional genes
  - Cross-hybridization in microarray and RT-PCR [Ruud, Int. J. Cancer, 1999]
- Some pseudogenes have regulatory roles
- ΨG are "genomic fossils"
  - Study the evolution of genes and genomes
  - Illuminate the important genomic remodeling processes of duplication and retrotransposition
  - Measure mutation/insertion rates

Duplicated Pseudogenes
Original gene → gene duplication → mutations
- Retain intron/exon structure
- e.g. globins, Hox cluster and the Arabidopsis genome
- Sometimes can be transcribed

Retro-pseudogenes (Processed ΨG)
Original gene → LINE-1-mediated retrotransposition
- Mostly dead-on-arrival (DOA)
- Intronless, with a poly-A tail and flanking direct repeats
- Target-primed reverse transcription: -TT|AAA- AACATA AAAAAA
Other types: Numt (nuclear mitochondrial DNA)
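Two of the processed-pseudogene hallmarks listed above, a 3' poly-A tail and flanking direct repeats, can be checked directly on genomic sequence. The sketch below makes three simplifying assumptions. The insertion boundaries are already known. Repeats must match exactly, whereas real target-site duplications often carry mismatches. The `min_len` of 8 bp is an illustrative threshold, not a value from the talk. Both function names are hypothetical.

```python
def trailing_polya(seq: str) -> int:
    """Length of the uninterrupted run of A's at the 3' end of seq."""
    s = seq.upper()
    return len(s) - len(s.rstrip("A"))


def target_site_duplication(left_flank: str, right_flank: str, min_len: int = 8) -> str:
    """Longest k-mer (k >= min_len) that ends the 5' flank and also starts the 3' flank.

    left_flank: genomic sequence immediately upstream of the insertion.
    right_flank: genomic sequence immediately downstream of the poly-A tail.
    Returns "" when no exact repeat of at least min_len bp is found.
    """
    left, right = left_flank.upper(), right_flank.upper()
    best = ""
    for k in range(min_len, min(len(left), len(right)) + 1):
        if left[-k:] == right[:k]:
            best = right[:k]
    return best


copy_seq = "ATGGCC" + "A" * 12   # toy processed copy ending in a poly-A tail
left = "GGGGCCAATTGCAT"           # toy 5' flank
right = "CCAATTGCATTTTT"          # toy 3' flank
print(trailing_polya(copy_seq))                # 12
print(target_site_duplication(left, right))    # CCAATTGCAT
```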

Overlap of Pseudogenes by 5 Different Methods
4 automatic pipelines (comparing protein or transcript vs. genomic DNA, filtering, application of rules) + HAVANA manual annotation
[Figure label: GIS]
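One way to compute the overlap between pseudogene sets from several methods is to merge overlapping calls into loci and record which methods support each locus. The sketch below rests on several assumptions: all calls are half-open [start, end) intervals on one chromosume, strand is ignored, and any overlap counts as agreement. The method names `pipeline_A` and `pipeline_B` are hypothetical placeholders.

```python
from __future__ import annotations


def consensus_clusters(
    calls: dict[str, list[tuple[int, int]]],
) -> list[tuple[int, int, set[str]]]:
    """Merge overlapping half-open intervals [start, end) from all methods.

    Returns one (start, end, supporting_methods) tuple per merged locus.
    Assumes all coordinates are on the same chromosome; strand is ignored.
    """
    events = sorted((s, e, m) for m, ivs in calls.items() for s, e in ivs)
    clusters: list[tuple[int, int, set[str]]] = []
    for start, end, method in events:
        if clusters and start < clusters[-1][1]:
            c_start, c_end, methods = clusters[-1]
            clusters[-1] = (c_start, max(c_end, end), methods | {method})
        else:
            clusters.append((start, end, {method}))
    return clusters


calls = {
    "pipeline_A": [(100, 200), (500, 600)],
    "pipeline_B": [(150, 250)],
    "HAVANA": [(520, 580), (900, 950)],
}
for start, end, methods in consensus_clusters(calls):
    print(start, end, sorted(methods))
# 100 250 ['pipeline_A', 'pipeline_B']
# 500 600 ['HAVANA', 'pipeline_A']
# 900 950 ['HAVANA']
```

The number of merged loci gives the size of the union. Loci supported by every method form the consensus core.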

Complexities in Pseudogene Annotation
[Figure: HNRPA1, MTND2, MTND4, CYTB]
- HNRPA1: ribonucleoprotein A1 processed pseudogene
- Inserted mitochondrial sequence resulting in 3 pseudogenes (MTND2, MTND4, CYTB)

Regional Distribution
201 pseudogenes: 77 non-processed, 124 processed
[Figure label: OR]

Example: Pseudogene Intersecting Transcriptional Evidence
[Figure tracks: TARs, CAGE, diTAG, ChIP-chip]

Intersection of Pseudogenes with Transcriptional Evidence
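Intersecting pseudogenes with transcriptional evidence amounts to asking, for each pseudogene, which evidence tracks have at least one overlapping feature. The sketch below assumes half-open coordinates on a single chromosome, ignores strand, and accepts any 1-bp overlap as support. The pseudogene IDs and coordinates are made up. The track names are taken from the example slide.

```python
from __future__ import annotations


def overlaps(a: tuple[int, int], b: tuple[int, int]) -> bool:
    """True if half-open intervals [a0, a1) and [b0, b1) share at least one base."""
    return a[0] < b[1] and b[0] < a[1]


def evidence_support(
    pseudogenes: dict[str, tuple[int, int]],
    tracks: dict[str, list[tuple[int, int]]],
) -> dict[str, set[str]]:
    """For each pseudogene, list the evidence tracks with at least one overlapping feature."""
    return {
        pid: {name for name, feats in tracks.items() if any(overlaps(iv, f) for f in feats)}
        for pid, iv in pseudogenes.items()
    }


pseudogenes = {"psi_1": (100, 200), "psi_2": (1000, 1100)}
tracks = {
    "TARs": [(150, 160)],
    "CAGE": [(195, 205)],
    "diTAG": [(2000, 2100)],
}
support = evidence_support(pseudogenes, tracks)
print(sorted(support["psi_1"]))                # ['CAGE', 'TARs']
print(sum(1 for s in support.values() if s))   # 1 pseudogene with any evidence
```

This quadratic scan is fine for a few hundred pseudogenes. At genome scale, a sorted sweep or an interval tree would be the usual choice.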

Targeted Transcription Experiments
RACE experiments
- Interrogated 160 pseudogenes (49 non-processed and 111 processed)
- In 51 cases (26 non-processed and 25 processed pseudogenes), distinguishing primers could be designed (>4 mismatched bp vs. the parent)
- The resulting data supported transcription from 14 of the 160 pseudogenes (8 processed and 6 non-processed), 9 of them with pseudogene-specific primers
- These numbers may be a conservative estimate, since a RACEfrag was assigned to its parent gene by default if it could be mapped to both a parent locus and a pseudogene locus
RACE experiments + sequencing (CAGE, PET, EST and mRNA)
- Unambiguous evidence for pseudogene transcription
- Altogether, these data indicate that 38 of the 201 pseudogenes are the source of novel RNA transcripts
- 5 of these had cryptic promoters (from TR analysis)
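The primer criterion and the default RACEfrag assignment rule described above can be written as two small checks. This is a minimal sketch that assumes an ungapped, equal-length comparison between the primer and the corresponding parent site. The function names are hypothetical, and the real primer design and fragment mapping were certainly more involved.

```python
def mismatches(primer: str, site: str) -> int:
    """Count mismatched bases in an ungapped, equal-length comparison."""
    if len(primer) != len(site):
        raise ValueError("primer and site must have the same length")
    return sum(a != b for a, b in zip(primer.upper(), site.upper()))


def is_distinguishing(primer: str, parent_site: str) -> bool:
    """Criterion from the slide: more than 4 mismatched bp versus the parent."""
    return mismatches(primer, parent_site) > 4


def assign_racefrag(maps_to_parent: bool, maps_to_pseudogene: bool) -> str:
    """Conservative default: a fragment mapping to both loci goes to the parent."""
    if maps_to_parent:
        return "parent"
    if maps_to_pseudogene:
        return "pseudogene"
    return "unassigned"


print(is_distinguishing("ACGTACGTAC", "TGCAACGTAC"))  # False (4 mismatches)
print(is_distinguishing("ACGTACGTAC", "TGCATCGTAC"))  # True (5 mismatches)
print(assign_racefrag(True, True))                    # parent
print(round(100 * 38 / 201, 1))                       # 18.9 (% of the 201 pseudogenes with transcript support)
```

Because ambiguous fragments go to the parent, any mis-assignment can only lower the pseudogene count. This is why the slide calls the RACE numbers conservative.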

History of Pseudogene Preservation
[Figure categories: absent; present with disablement; present without disablement]
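The three categories above can be assigned by examining the orthologous locus of a human pseudogene in each other genome. The sketch below assumes the orthologous sequence (or its absence) has already been found. It also assumes the number of shared disablements has been counted separately, for example from premature stops and frameshifting indels. The species and values are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def preservation_state(ortholog_seq: str | None, n_disablements: int) -> str:
    """Classify the orthologous locus of a human pseudogene in another genome."""
    if ortholog_seq is None:
        return "absent"
    if n_disablements > 0:
        return "present with disablement"
    return "present without disablement"


# Hypothetical observations for one human pseudogene:
# (orthologous sequence or None, number of shared disablements)
observations = {
    "chimp": ("ATGGCCTAA", 1),
    "macaque": ("ATGGCCAAA", 0),
    "mouse": (None, 0),
}
states = {sp: preservation_state(seq, n) for sp, (seq, n) in observations.items()}
print(states["chimp"])                      # present with disablement
print(Counter(states.values())["absent"])   # 1
```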

Retrotransposition within Last 45 MYA Created Many Processed Pseudogenes
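Processed pseudogenes are commonly dated from their sequence divergence from the parent gene. The sketch below applies a Jukes-Cantor correction followed by a molecular clock. It assumes the parent barely changes under purifying selection, so nearly all divergence accumulates on the pseudogene lineage. The neutral rate of 2.2e-9 substitutions per site per year is an assumed mammalian average, not a value from the talk, and the talk's actual dating method may differ.

```python
import math


def jukes_cantor(p: float) -> float:
    """Jukes-Cantor corrected substitutions per site from the observed mismatch fraction p."""
    if not 0.0 <= p < 0.75:
        raise ValueError("p must be in [0, 0.75)")
    return -0.75 * math.log(1.0 - 4.0 * p / 3.0)


def age_myr(p: float, rate: float = 2.2e-9) -> float:
    """Approximate age of a processed pseudogene in millions of years.

    Assumptions: (1) divergence accumulated almost entirely on the pseudogene
    lineage; (2) a neutral rate of `rate` substitutions/site/year (the default
    is an assumed mammalian average).
    """
    return jukes_cantor(p) / rate / 1e6


for p in (0.05, 0.09, 0.10):
    print(p, round(age_myr(p), 1))
```

Under these assumptions, a copy that differs from its parent at about 9% of sites dates to roughly 44 Myr, just inside the 45 MYA window. A copy at 10% divergence falls just outside it.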

Sequence Decay of Pseudogenes, Approximately Neutral

Sequence Decay of Pseudogenes Relative to their Immediate Genomic Context
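One simple way to compare a pseudogene's decay with its immediate genomic context is to divide its divergence between two species by the divergence of its flanking DNA. A ratio near 1 is what approximately neutral decay would predict. The sketch below assumes the pseudogene and flanks are already aligned between the two species. It uses uncorrected p-distance and skips gapped columns. All inputs are hypothetical.

```python
def p_distance(a: str, b: str) -> float:
    """Fraction of differing bases over gap-free columns of two aligned sequences."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    pairs = [(x, y) for x, y in zip(a.upper(), b.upper()) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no gap-free columns")
    return sum(x != y for x, y in pairs) / len(pairs)


def decay_ratio(pg_a: str, pg_b: str, flank_a: str, flank_b: str) -> float:
    """Divergence inside the pseudogene divided by divergence in its flanking DNA.

    Raises ZeroDivisionError if the flanks are identical.
    """
    return p_distance(pg_a, pg_b) / p_distance(flank_a, flank_b)


print(p_distance("ACGT", "ACGA"))  # 0.25
print(decay_ratio("ACGTACGTAC", "ACGTACGTAA", "TTGCATTGCA", "TTGCATTGCC"))  # 1.0
```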

Scaling Issues
- 201 pseudogenes × 100 = ~20K, which agrees with previous estimates for the whole genome
- Interplay between manual annotation and automatic pipelines
- Dynamic interplay with gene annotation (pseudogene and gene annotations can't overlap)
- Need to have a protein alignment
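The ×100 factor reflects the fact that the ENCODE pilot regions cover roughly 1% of the human genome (about 30 Mb). The extrapolation assumes that pseudogene density in those regions is representative of the whole genome. With f as the covered fraction:

```latex
N_{\text{genome}} \approx \frac{N_{\text{ENCODE}}}{f} = \frac{201}{0.01} \approx 2.0 \times 10^{4}
```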

Using phastOdds Values to Examine Neutral Evolution of Pseudogenes
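phastOdds (from the PHAST package) scores an alignment with the log-odds of a conserved versus a non-conserved phylogenetic model. The sketch below is a toy stand-in, not PHAST's computation: it replaces both phylogenetic models with a single per-column probability that the column is invariant. The two default probabilities are illustrative assumptions. It shows only the sign convention. Positive totals favour "conserved"; totals near or below zero are what one would expect for neutrally decaying pseudogenes.

```python
from __future__ import annotations

import math


def log_odds_score(
    columns: list[str],
    p_invariant_conserved: float = 0.9,
    p_invariant_neutral: float = 0.6,
) -> float:
    """Toy log-odds score of 'conserved' vs 'neutral' for an alignment block.

    Each column is a string of the bases aligned at one position across species.
    Each model is reduced to one probability that a column is invariant.
    """
    score = 0.0
    for col in columns:
        if len(set(col.upper())) == 1:  # all species share the same base
            score += math.log(p_invariant_conserved / p_invariant_neutral)
        else:
            score += math.log((1 - p_invariant_conserved) / (1 - p_invariant_neutral))
    return score


print(log_odds_score(["AAA", "CCC", "GGG"]) > 0)  # True: looks conserved
print(log_odds_score(["AAG", "CTC", "GGA"]) < 0)  # True: looks neutral
```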