ENCODE pseudogene updates Adam Frankish, HAVANA 13/10/05.


Not added - AK
The transcripts on which this pseudogene is based do not appear to have a valid translation (only BC has a translation, which looks spurious).
Figure labels: Reverse strand mRNAs; Ral-GDS related protein Rgr (Rgr) pseudogene; Translation

Not added - YalePgene_139
I have been able to reconstruct a coding gene with a full-length CDS at this locus (AC ) and, as discussed previously, would not annotate a coding gene and a pseudogene at the same locus.
The majority of the gene (3' end of exon 3 to the final exon, exon 8) is supported by 100% matching (best-in-genome hits) human EST (Em:DN , Em:BG ) and mRNA (Em:BC ) evidence. Together these support a structure with an ORF extending from the start to the final exon, although there is a small gap in support in exon 5.
Using human ESTs not from this locus, e.g. Em:BM (approx. 70% identity at this locus; its best hit in the genome, by Ensembl SSAHA, is a 100% match to the KIR2DL4 gene, also on chr19), the 5' end of exon 3 and two further upstream exons can be clearly identified (all splice sites are clearly intact).
The structure contains a CDS which starts in exon 1 (which shares homology with the N-terminal sequence of several KIR2D family members), ends in the final exon, and contains three immunoglobulin domains.
Despite the lack of transcript evidence from the 5' end of the locus and the fairly high degree of divergence between this locus and other gene family members, these splice sites are preserved. This suggests that the structure is correct and is a coding gene rather than a pseudogene.
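The argument on this slide rests on two checks: the reconstructed CDS reads as an open reading frame from start to stop, and the splice sites are intact. The sketch below illustrates those two checks on toy sequences. It is not the HAVANA annotation procedure; the function names and sequences are hypothetical, and treating "intact" as the canonical GT...AG intron boundary is a simplifying assumption.

```python
# Illustrative only: function names and the toy sequences used with them
# are hypothetical, not taken from the presentation.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def first_disabling_feature(cds):
    """Return a description of the first ORF-disrupting feature, or None.

    `cds` is the spliced coding sequence (exons joined, introns removed).
    """
    cds = cds.upper()
    if not cds.startswith("ATG"):
        return "no start codon"
    if len(cds) % 3 != 0:
        # A crude proxy: real frameshift detection needs an alignment
        # to the parent protein or transcript.
        return "length not a multiple of three"
    codons = [cds[i:i + 3] for i in range(0, len(cds), 3)]
    # Every codon except the last must be a sense codon.
    for index, codon in enumerate(codons[:-1]):
        if codon in STOP_CODONS:
            return f"in-frame stop at codon {index + 1}"
    if codons[-1] not in STOP_CODONS:
        return "no stop codon"
    return None


def has_canonical_splice_sites(intron):
    """True if the intron starts with GT (donor) and ends with AG (acceptor)."""
    intron = intron.upper()
    return intron.startswith("GT") and intron.endswith("AG")
```

A locus for which `first_disabling_feature` returns `None` and every intron passes `has_canonical_splice_sites` is consistent with a coding gene; the decision itself still depends on the supporting evidence described above.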

Not added - YalePgene_139
Supporting evidence: Protein, EST, mRNA

Not added - YalePgene_139 Dot plot of EST Splice donor

Havana+, Yale-, UCSC-
AC AC AC AF RP11-143H AC Z Z AC AC AC AC AC AC AC AC AC AC AL
We think the annotation of these as pseudogenes can be supported.

ENm001 - AC , AC heterogeneous nuclear ribonucleoprotein A1 (Hnrpa1) pseudogene NADH dehydrogenase 2 (MTND2) pseudogene NADH dehydrogenase 4 (MTND4) pseudogene Yale pseudo UCSC pseudo New cytochrome b (CYTB) pseudogene

ENm002 - AC Dot plot Alignment

ENm004 - RP1-127L4.3 UCSC pseudo Yale pseudo HAVANA pseudo

ENm006 - AF olfactory receptor family pseudogene

ENm006 - RP11-143H17.1 HAVANA pseudo Frameshift

ENm007 - AC HAVANA LIR pseudogene

ENm008 - Z HAVANA hemoglobin, alpha pseudogene

ENm009 - AC olfactory receptor, family 51, subfamily N, member 1 pseudogene Frameshift

ENm009 - AC olfactory receptor, family 52, subfamily Y, member 1 pseudogene

ENm009 - AC olfactory receptor, family 52, subfamily Z, member 1 pseudogene No Met First possible Met

ENm009 - AC olfactory receptor, family 51, subfamily A, member 10 pseudogene Frameshift

ENm009 - AC Novel pseudogene

ENm013 - AC ribosomal protein L5 (RPL5) pseudogene

ENr121 - AC hydroxytryptamine (serotonin) receptor 5B (HTR5B) pseudogene Frameshift

ENr131 - AC UDP glycosyltransferase 1 family, polypeptide A2 pseudogene Frameshift

ENr233 - AC Novel pseudogene 3’ truncation ~350aa missing, no stop

ENr233 - AC stereocilin (STRC) pseudogene Stop codon in exon 20

ENr322 - AL pseudogene similar to part of ribosomal protein L3 (RPL3) Protein dot plot mRNA dot plot

HAVANA pseudogene overlaps exon
Non-coding locus
–AC , AC , AC , AC , AC , AC , AC , AC , RP3-477O4.5
Coding locus opposite strand
–AC , RP11-143H17.1, AC , RP11-398K22.9, RP3-477O4.4
Coding locus same strand
–AC , Z , AC
We believe all these pseudogenes are valid.
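The three overlap categories on this slide can be expressed as a small classifier. This is an illustrative sketch rather than the procedure HAVANA used; the `Feature` class, its field names, and the coordinates in the usage example are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Feature:
    """A genomic interval (hypothetical model; 1-based, inclusive coordinates)."""
    start: int
    end: int
    strand: str           # "+" or "-"
    coding: bool = False  # True if the feature belongs to a coding locus


def overlaps(a, b):
    """True if the two intervals share at least one base."""
    return a.start <= b.end and b.start <= a.end


def classify_overlap(pseudogene, exon):
    """Place a pseudogene/exon pair into one of the slide's categories."""
    if not overlaps(pseudogene, exon):
        return "no overlap"
    if not exon.coding:
        return "non-coding locus"
    if exon.strand == pseudogene.strand:
        return "coding locus same strand"
    return "coding locus opposite strand"
```

For example, `classify_overlap(Feature(100, 500, "+"), Feature(450, 600, "-", coding=True))` returns `"coding locus opposite strand"`.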

Non-coding locus HAVANA sialyltransferase pseudogene Putative novel transcript Supporting EST Aligned proteins (column collapsed)

Coding locus opposite strand Protein alignment HAVANA novel pseudogene Non-coding exon ENm001 Pseudogene: AC

Coding locus same strand Frameshift LILRA3 LILR pseudogene

But not… In-frame stop codon KIR2DL3 – coding gene