Prediction of selenoprotein genes in eukaryotic genomes. Roderic Guigó i Serra, Bioinformatica, UPF, course 2005/2006. 11/29/2018 Bioinformatica UPF, March 2006.

what are selenoproteins?
Selenoproteins are proteins that incorporate selenocysteine, the 21st amino acid.
Mostly redox enzymes.
Distributed across the three domains of life.
About 25 selenoproteins are known in mammals, but the number varies among taxa.

selenocysteine

the selenocysteine codon?

the selenocysteine codon: UGA

recoding of UGA

the dual function of UGA (stop codon and selenocysteine codon) complicates the identification of selenoproteins
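To make the ambiguity concrete, here is a minimal sketch of reading one frame of a DNA sequence with and without UGA (TGA in DNA) recoding. The function names and the `recode_tga` flag are illustrative assumptions, not part of any tool mentioned in these slides.

```python
STOPS = {"TAA", "TAG", "TGA"}

def codons(seq, frame=0):
    """Split a DNA sequence into complete codons, starting at the given frame."""
    seq = seq.upper()
    return [seq[i:i + 3] for i in range(frame, len(seq) - 2, 3)]

def read_frame(seq, frame=0, recode_tga=False):
    """Return the codons read before the first terminating stop codon.

    With recode_tga=True, TGA is read through (as selenocysteine)
    instead of terminating translation. Hypothetical helper.
    """
    out = []
    for codon in codons(seq, frame):
        if codon in STOPS and not (recode_tga and codon == "TGA"):
            break
        out.append(codon)
    return out

seq = "ATGGCTTGAGGTTAA"
standard = read_frame(seq)                   # stops at the in-frame TGA
recoded = read_frame(seq, recode_tga=True)   # reads through TGA up to TAA
```

A standard gene finder sees only the short reading of this sequence, so a selenoprotein gene appears truncated at the in-frame TGA.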

selenoprotein search: SECIS search
SECIS elements come in a variety of sequences.

SECIS search: PatScan

SECIS search in the Drosophila genome
35,876 potential SECIS elements.
1,220 thermodynamically stable.
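The first step of such a search is a pattern scan. The sketch below uses a Python regular expression as a stand-in for a pattern matcher. The motif is an assumption made for illustration (an ATGA core, an AA pair, and a GA dinucleotide with invented spacer lengths). It is not the PatScan pattern used in the slides, and it omits the helix pairing constraints and the thermodynamic stability filter.

```python
import re

# Hypothetical, heavily simplified motif. NOT the pattern used in the slides:
# the spacer lengths are invented and no base pairing is checked.
TOY_SECIS = re.compile(r"(?=(ATGA[ACGT]{8,12}AA[ACGT]{10,20}GA))")

def find_candidates(dna):
    """Return (start, end) of every candidate match, overlapping ones included."""
    dna = dna.upper()
    return [(m.start(), m.start() + len(m.group(1)))
            for m in TOY_SECIS.finditer(dna)]

# Synthetic example sequence built to contain exactly one match.
dna = "CCC" + "ATGA" + "C" * 10 + "AA" + "C" * 12 + "GA" + "CCC"
hits = find_candidates(dna)
```

A loose pattern of this kind matches very often in a whole genome, which is consistent with the large drop from potential to thermodynamically stable elements reported above.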

selenoprotein search: codon bias

selenoprotein search: codon bias
[Figure: codon bias along a selenoprotein gene and a non-selenoprotein gene. Labels: protein-coding codon bias; no codon bias; selenoprotein (TGA, STOP); non-selenoprotein (TGA, STOP).]

selenoprotein search: codon bias

Coding potential   Coding region   TGA - STOP   STOP - STOP
10 SPs             10.21            9.90        -0.16
1169 non-SPs        8.37           -0.83        -2.52

Coding potential is derived from the bias in the use of amino acids and, in addition, of synonymous codons.
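One common way to turn codon bias into a score is a log-likelihood ratio of codon frequencies in known coding sequence against a background. The sketch below shows that idea under stated assumptions: the function names, the pseudocount, and the uniform background are illustrative choices, and this is not necessarily how the values in the table were computed.

```python
import math
from collections import Counter

ALL_CODONS = [a + b + c for a in "ACGT" for b in "ACGT" for c in "ACGT"]

def codon_freqs(seqs, pseudo=1.0):
    """Estimate codon frequencies from in-frame sequences, with a pseudocount."""
    counts = Counter()
    for s in seqs:
        s = s.upper()
        for i in range(0, len(s) - 2, 3):
            counts[s[i:i + 3]] += 1
    total = sum(counts[c] for c in ALL_CODONS) + pseudo * len(ALL_CODONS)
    return {c: (counts[c] + pseudo) / total for c in ALL_CODONS}

def coding_potential(region, coding, background):
    """Sum of log-odds (coding vs. background) over the codons of a region."""
    region = region.upper()
    score = 0.0
    for i in range(0, len(region) - 2, 3):
        codon = region[i:i + 3]
        if codon in coding:  # skip codons with ambiguous bases
            score += math.log(coding[codon] / background[codon])
    return score

# Toy training set and a uniform background, both assumptions for the example.
coding = codon_freqs(["GCTGCTGCTGCT"])
background = {c: 1.0 / 64 for c in ALL_CODONS}
biased = coding_potential("GCTGCT", coding, background)
unbiased = coding_potential("TTTTTT", coding, background)
```

Scored this way, the stretch between an in-frame TGA and the true stop codon of a selenoprotein would be expected to look coding, while the same stretch after a genuine TGA stop would not.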

selenoprotein search: SECIS + exon prediction
Predict SECIS elements with PatScan.
Predict genes with geneid, allowing TGA-interrupted exons.
geneid uses dynamic programming to chain input exons into gene structures that maximize a log-likelihood function. SECIS predictions and TGA-interrupted exons are now among the input exons.
Chaining rules state that SECIS elements can only be chained if they terminate genes containing TGA exons, and that genes containing TGA exons can only be terminated by SECIS predictions.
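The two chaining rules can be illustrated with a small dynamic program. This is a simplified sketch and not geneid's actual implementation: it assumes a single strand, ignores reading-frame and splice-site compatibility, and finds only the single best gene. The `Element` class and the `kind` labels are hypothetical names.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Element:
    start: int
    end: int
    score: float
    kind: str  # "exon", "tga_exon" or "secis" (labels assumed for this sketch)

def best_gene(elements):
    """Return (score, chain) of the best-scoring gene under the two rules:
    a SECIS may only close a chain that contains a TGA exon, and a chain
    that contains a TGA exon is only complete once a SECIS closes it."""
    elems = sorted(elements, key=lambda e: e.end)
    # best[i][has_tga] = (score, chain) of the best open chain ending at elems[i]
    best = [dict() for _ in elems]
    answer = (0.0, [])
    for i, e in enumerate(elems):
        preds = {False: (0.0, [])}  # the empty chain contains no TGA exon
        for j in range(i):
            if elems[j].end < e.start:
                for flag, cand in best[j].items():
                    if flag not in preds or cand[0] > preds[flag][0]:
                        preds[flag] = cand
        if e.kind == "secis":
            if True in preds:  # rule 1: needs an upstream TGA exon
                score, chain = preds[True]
                if score + e.score > answer[0]:
                    answer = (score + e.score, chain + [e])
            continue  # a SECIS closes the gene; nothing is chained after it
        for flag, (score, chain) in preds.items():
            new_flag = flag or e.kind == "tga_exon"
            cand = (score + e.score, chain + [e])
            if new_flag not in best[i] or cand[0] > best[i][new_flag][0]:
                best[i][new_flag] = cand
        # rule 2: only chains without a TGA exon may end without a SECIS
        if False in best[i] and best[i][False][0] > answer[0]:
            answer = best[i][False]
    return answer

a = Element(0, 100, 5.0, "exon")
b = Element(200, 300, 4.0, "tga_exon")
c = Element(400, 500, 3.0, "exon")
s = Element(700, 760, 2.0, "secis")
with_secis = best_gene([a, b, c, s])
without_secis = best_gene([a, b, c])
```

In the example, the TGA exon is used only when a downstream SECIS is available to close the gene. Without the SECIS, the best gene falls back to the two ordinary exons.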

selenoprotein search
[Figure: predictions along the DNA sequence, 5’ to 3’.]
SECIS elements and genes are predicted independently along the DNA sequence, but joined in the final gene prediction in such a way that SECIS elements are only allowed after a gene containing an in-frame TGA (within a defined range).

selenoprotein search: independent but coordinated TGA in-frame gene and SECIS prediction
[Figure labels: putative selenoprotein; 5’; 3’.]
SECIS elements and genes are predicted independently along the DNA sequence, but joined in the final gene prediction in such a way that SECIS elements are only allowed after a gene containing an in-frame TGA (within a defined range).
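The "within a defined range" condition amounts to a distance window downstream of the gene. A minimal sketch, assuming forward-strand coordinates and a sorted list of SECIS start positions. The function name and the `min_gap` and `max_gap` parameters are hypothetical, and the slides do not state the actual range.

```python
from bisect import bisect_left, bisect_right

def secis_in_range(gene_end, secis_starts, min_gap, max_gap):
    """Return the SECIS starts located between min_gap and max_gap
    bases downstream of gene_end. secis_starts must be sorted."""
    lo = bisect_left(secis_starts, gene_end + min_gap)
    hi = bisect_right(secis_starts, gene_end + max_gap)
    return secis_starts[lo:hi]

# Illustrative coordinates and window sizes only.
nearby = secis_in_range(1000, [500, 1200, 1900, 5000], min_gap=100, max_gap=1000)
```

Upstream elements and elements too far downstream are discarded, so only the candidates that could plausibly belong to the gene's 3’ region remain.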

selenoprotein search in Drosophila (Castellano et al., EMBO Reports 2:697-702, 2001)
SECIS predicted: 35876
SECIS after thermodynamic assessment: 1220
Genes predicted: 12194
Predicted selenoproteins: 4
Real selenoproteins: 3

dSelG

dSelH

dSelG and dSelH are ubiquitous selenoproteins

dSelH has selenoprotein homologues in vertebrates

selenoprotein search in mammalian genomes
Larger genomes: much more room for false positive SECIS predictions.
Poorer gene predictions.

conserved SECIS between human and mouse

characterization of mammalian selenoproteins (Kryukov et al., Science 300:1439-1443, 2003)

selenoprotein search in other vertebrate genomes

human vs. fugu

SelU: a novel selenoprotein family (Castellano et al., EMBO Reports 5:71-77, 2004)

SelU: exonic structure and SECIS elements

SelU: a novel selenoprotein family

SelU: scattered phylogenetic distribution

Fig. 1. SelJ gene and SECIS structure. Castellano, Sergi et al. (2005) Proc. Natl. Acad. Sci. USA 102, 16188-16193. Copyright ©2005 by the National Academy of Sciences.

Fig. 2. 75Se labeling. Castellano, Sergi et al. (2005) Proc. Natl. Acad. Sci. USA 102, 16188-16193. Copyright ©2005 by the National Academy of Sciences.

Fig. 3. Subcellular localization of SelJ. Castellano, Sergi et al. (2005) Proc. Natl. Acad. Sci. USA 102, 16188-16193. Copyright ©2005 by the National Academy of Sciences.

SelJ and crystallins

Fig. 4. Expression pattern of the SelJ gene during development in zebrafish embryos. Castellano, Sergi et al. (2005) Proc. Natl. Acad. Sci. USA 102, 16188-16193. Copyright ©2005 by the National Academy of Sciences.

the eukaryotic selenoproteome

SELENOPROTEINS
University of Nebraska: Gregory V. Kryukov, Sergey V. Novoselov, Vadim N. Gladyshev
IBMC, Strasbourg: Alain Lescure, Alain Krol
IMIM, Barcelona: Sergi Castellano, Charles Chapple
Universitat de Barcelona: Marta Morey, Montserrat Corominas, Florenci Serras
Harvard University, Boston: Nadia Morozova, Marla J. Berry

sergi in hawaii