Searching for selenoproteins in the genomes of eukaryotic organisms

Presentation transcript:

Searching for selenoproteins in the genomes of eukaryotic organisms. Bioinformatics. Didac Santesmasses, PhD student, didac.santesmasses@crg.eu. Bioinformatics and Genomics programme, Roderic Guigó's group, Centre for Genomic Regulation, Barcelona

Structure of a selenoprotein gene

Selenoprotein families include selenoproteins and cysteine homologues (= orthologues or paralogues)

Selenoproteins are generally misannotated. Protein databases have the same limitations. Percentages are computed by comparing Selenoprofiles and Ensembl annotations; see Mariotti and Guigó, 2010, Bioinformatics.

Bioinformatics methods for selenoproteins
De novo:
- Selenogeneid (Castellano et al. 2001)
Homology-based approaches: UGA / Sec or UGA / Cys alignments (e.g. Kryukov et al. 2003)
- Selenoprofiles (Mariotti and Guigó 2010)
- Seblastian (Mariotti et al. 2013)
SECIS prediction:
- SECISearch (Kryukov et al. 2003)
- SECISearch3 (Mariotti et al. 2013), explained later
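The homology-based idea (a known Sec or Cys residue aligned to an in-frame stop in the target) can be illustrated with a minimal sketch. The function name and the input convention are assumptions made for this example: both arguments are rows of a pairwise alignment of equal length, the query is a known protein, and the target is a conceptual translation of genomic DNA in which `*` marks a stop codon. The sketch does not check that the stop is actually TGA; that would have to be confirmed on the nucleotide sequence.

```python
def candidate_sec_columns(query_aln, target_aln):
    """Return (column, query_residue) pairs where a stop ('*') in the
    translated target is aligned to Sec (U) or Cys (C) in the query."""
    if len(query_aln) != len(target_aln):
        raise ValueError("alignment rows must have the same length")
    candidates = []
    for column, (q, t) in enumerate(zip(query_aln, target_aln)):
        # A stop facing U or C suggests a recoded codon rather than a real stop.
        if t == "*" and q in ("U", "C"):
            candidates.append((column, q))
    return candidates


# Hypothetical toy alignment: column 2 is a Sec/stop pair, column 5 a Cys/stop pair.
print(candidate_sec_columns("MKU-GC", "MK*AG*"))
```

A stop aligned to Cys is as informative as one aligned to Sec, because the query may be a cysteine homologue of a selenoprotein in the target genome.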

Why selenocysteine? What is cysteine used for? Mainly for disulfide bonds: inter- or intramolecular, important for the folding of proteins and their stability, and in many cases necessary for catalysis. Thioredoxins are a large class of proteins that perform redox reactions using thiol/disulfide switches (catalytic cysteines).

Why selenocysteine? The term “thioredoxin” generally designates small oxidoreductase proteins that constitute a pool of enzymes for antioxidant defense. The thioredoxin system includes these proteins and those operating on them, either using them as electron donors or keeping them reduced. Many other proteins possess a thioredoxin-like fold, relying on the same local structure that includes a thiol/disulfide switch, and perform redox functions. Selenocysteine is almost always found replacing a catalytic cysteine in redox proteins. Many selenoproteins possess a thioredoxin-like fold, and some are involved in the thioredoxin system. Consistently, selenocysteine is more reactive than cysteine, and it is a better reductant. However, some selenoproteins have totally unrelated functions (e.g. SelJ = eye crystallin, SelP = selenium storage and transport).

Examples: Thioredoxin Reductases (TR)
Reaction (catalysed by TR): Trx-S2 + NADPH + H+ → Trx-(SH)2 + NADP+
Three paralogous genes are present for this family in human, all of which are selenoproteins. Sec is the penultimate residue, and it is catalytic. TRs are the only enzymes responsible for the reduction of thioredoxins in the cell.
- TR1: cytosolic.
- TGR (TR2): can reduce glutathione disulfide. Contains a glutaredoxin (Grx) domain. Found only in tetrapods.
- TR3: mitochondrial. Found in all vertebrates.

Glutathione Peroxidases (GPx)
Reaction (catalysed by GPx): R-OOH + 2 GSH → R-OH + H2O + GSSG
Reduce hydroperoxides at the expense of glutathione (GSH). 8 paralogues in human, 5 with Sec. 3 groups: GPx1/GPx2, GPx3/GPx5/GPx6, GPx4/GPx7/GPx8.
- GPx1: cytosolic, abundant in liver and red blood cells
- GPx2: cytosolic, found in liver and gastrointestinal system
- GPx3: secreted, found in plasma and intestine
- GPx4: cytosolic / mitochondrial, abundant in testis. Can reduce phospholipid hydroperoxides
- GPx5: secreted / membrane bound. Found only in epididymis
- GPx6: secreted (?). Found in olfactory epithelium and embryonic tissues
- GPx7: secreted (?)
- GPx8: membrane bound (?)
In GPx6, Sec was converted to Cys independently in 3 species.

Selenoproteins in mammals / vertebrates. Figure labels: 20 duplications; 9 gene losses; 13 Sec → Cys conversions; 28.

Among vertebrates, selenoproteins are quite conserved: most of them are found in all species, and few transformations occurred. This is very different from the situation in insects (Chapple and Guigó, 2008).

Selenoproteins as a test case. Selenoproteins have the peculiar characteristic of possessing an in-frame UGA codon, which is recoded to selenocysteine because of the presence of the SECIS element. If you learn how to predict selenoproteins, you will be able to do the same with any “standard” protein family. Bioinformatics project: find all selenoproteins in a given genome.
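To see why the recoded UGA trips up standard gene prediction, here is a minimal sketch of translation with and without recoding. The function name and the `recode_tga` flag are hypothetical, and the rule used (every in-frame TGA before the last codon is read as Sec) is a deliberate simplification: in a real gene only the TGA served by a SECIS element is recoded.

```python
# Standard genetic code, built from the conventional T, C, A, G ordering.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def translate(cds, recode_tga=False):
    """Translate a coding sequence up to the first stop codon.

    With recode_tga=True, an in-frame TGA that is not the final codon is
    read as selenocysteine (U) instead of stop (simplifying assumption).
    """
    cds = cds.upper().replace("U", "T")
    codons = [cds[i:i + 3] for i in range(0, len(cds) - len(cds) % 3, 3)]
    protein = []
    for index, codon in enumerate(codons):
        residue = CODON_TABLE.get(codon, "X")  # X for codons with ambiguous bases
        if recode_tga and codon == "TGA" and index < len(codons) - 1:
            residue = "U"
        if residue == "*":
            break
        protein.append(residue)
    return "".join(protein)


toy_cds = "ATGTGCTGAGGTTAA"  # invented example: ATG TGC TGA GGT TAA
print(translate(toy_cds))                   # MC
print(translate(toy_cds, recode_tga=True))  # MCUG
```

A standard translation truncates the protein at the Sec codon, which is one reason selenoproteins are so often misannotated.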

UPF Biology. Courses 2007-17
- 2007/08 to 2008/09: find all selenoproteins in a given protist genome
- 2009/10 to 2011/12: find a given selenoprotein family in all protist genomes
- 2012/13 to 2016/17: find all selenoproteins in a given vertebrate genome

Project 2016-17: selenoproteins in vertebrates. Objective: find all selenoproteins in a given vertebrate genome. http://bioinformatica.upf.edu/

Project 2016-17: selenoproteins in vertebrates
- Viquipèdia: species description
- selenoDB: insert your selenoprotein gene predictions into a real-world database, available to the scientific community.

Insert genes form

Useful resources. Sequences of your assigned species: Ensembl, a collection of genomes (and annotations). Your assigned genomes will be available on the UPF computers when you start the project.

Useful resources. Sequences of your assigned species: NCBI nucleotide, a collection of all sequences (genomes, ESTs, etc.). If you do not find something that you expect to be there, look for other sources of sequences. Most genomes today are low/medium-quality.

Useful resources. Selenoprotein sequences:
- SelenoDB 2.0 (and 1.0): database containing manual annotations for human, and automated annotations (Selenoprofiles) for other vertebrate species.
- NCBI protein: NCBI hosts the sets of all known proteins. Noisy, but comprehensive. Here you can find more selenoprotein sequences by searching by homology (BLAST) or by keywords.

Useful resources. Tools:
- BLAST, typically tblastn
- Exonerate, protein2genome mode
- GeneWise
- T-Coffee
- Webserver with SECISearch3 and Seblastian: http://seblastian.crg.es/
Related course sessions: S14. Genome annotation (I); S15. Genome annotation (II)
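Scripting around these tools usually starts with reading their output. The sketch below assumes tblastn was run with the tabular output option (`-outfmt 6`), whose default twelve columns are listed in `FIELDS`; the function name, the E-value cutoff, and the example lines are invented for illustration. For tblastn the subject coordinates are nucleotide positions, so a start greater than the end indicates a hit on the minus strand.

```python
FIELDS = ["qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
          "qstart", "qend", "sstart", "send", "evalue", "bitscore"]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular_hits(lines, max_evalue=1e-5):
    """Parse tab-separated BLAST hits and keep those at or below max_evalue."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        hit = dict(zip(FIELDS, line.split("\t")))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        # Nucleotide subject: reversed coordinates mean the minus strand.
        hit["strand"] = "+" if hit["sstart"] <= hit["send"] else "-"
        if hit["evalue"] <= max_evalue:
            hits.append(hit)
    return hits


# Invented example lines, not real search results.
example = [
    "GPx1_query\tscaffold_12\t85.0\t200\t30\t0\t1\t200\t5000\t5599\t2e-30\t110",
    "GPx1_query\tscaffold_40\t30.0\t50\t35\t0\t10\t59\t900\t751\t0.5\t25",
]
for hit in parse_tabular_hits(example):
    print(hit["sseqid"], hit["sstart"], hit["send"], hit["strand"])
```

The surviving hit regions can then be passed to Exonerate or GeneWise to build a proper gene structure, since tblastn alone does not model introns.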

SECISearch3 (Mariotti et al. 2013)
- Based on a manually curated secondary structure alignment
- Combines up to 3 methods to ensure maximum sensitivity
- Filter and grading procedure based on manual inspection of hundreds of SECIS elements

Seblastian (Mariotti et al. 2013)
Assumptions:
- the presence of a detectable SECIS within acceptable genomic distance from the Sec-UGA
- annotated homologue(s) (Sec/Cys) in the reference protein database
Concept: you look for SECIS elements, then look upstream for selenoprotein coding sequences.
We are running a large-scale project, and we are confident that we will identify new selenoproteins in basal eukaryotes.
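The distance assumption can be sketched as a simple pairing step. This is not Seblastian's implementation: the `Feature` type, the function names, and the `max_distance` default of 5000 nt are placeholders chosen for illustration. The sketch pairs each candidate Sec-UGA with the nearest SECIS on the same strand that lies downstream in the direction of transcription.

```python
from typing import NamedTuple, Optional


class Feature(NamedTuple):
    start: int   # 1-based, start <= end regardless of strand
    end: int
    strand: str  # "+" or "-"


def downstream_distance(uga: Feature, secis: Feature) -> Optional[int]:
    """Distance from the UGA to a SECIS located 3' of it, or None if the
    SECIS is on the other strand or not downstream."""
    if uga.strand != secis.strand:
        return None
    if uga.strand == "+":
        distance = secis.start - uga.end
    else:
        distance = uga.start - secis.end
    return distance if distance >= 0 else None


def pair_uga_with_secis(ugas, secis_elements, max_distance=5000):
    """Return (uga, secis, distance) for each UGA with a SECIS in range."""
    pairs = []
    for uga in ugas:
        best = None
        for secis in secis_elements:
            distance = downstream_distance(uga, secis)
            if distance is None or distance > max_distance:
                continue
            if best is None or distance < best[1]:
                best = (secis, distance)
        if best is not None:
            pairs.append((uga, best[0], best[1]))
    return pairs


# Invented coordinates for illustration.
ugas = [Feature(100, 102, "+"), Feature(5000, 5002, "-")]
secis_elements = [Feature(900, 970, "+"), Feature(50, 90, "+"), Feature(4200, 4270, "-")]
for uga, secis, distance in pair_uga_with_secis(ugas, secis_elements):
    print(uga, secis, distance)
```

A candidate UGA with no SECIS in range is more likely a genuine stop codon or an assembly artefact, which is why the project notes ask for an identified SECIS downstream of each gene.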

Notes for the project
Results must be presented in a web page with the structure of a scientific paper, including:
- amino acid sequences
- SECIS sequences
- genes in GFF format (to insert through the selenoDB form)
All genes should be as complete as possible: starting with an AUG, ending with a stop codon, and with an identified SECIS element downstream.
Ignore alternative isoforms (if any); just choose one.
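The completeness criteria and the GFF requirement lend themselves to a small script. In this sketch the function names, the scaffold name, and the attribute string are hypothetical, and the exact attributes expected by the selenoDB form are not specified here. The GFF line follows the general nine-column, tab-separated layout (sequence, source, feature, start, end, score, strand, phase, attributes).

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def is_complete_cds(cds):
    """True if the coding sequence starts with ATG, ends with a stop codon,
    and has a length that is a multiple of three."""
    cds = cds.upper().replace("U", "T")
    return (
        len(cds) >= 6
        and len(cds) % 3 == 0
        and cds.startswith("ATG")
        and cds[-3:] in STOP_CODONS
    )


def gff_line(seqid, source, feature, start, end, strand, phase=".", attributes="."):
    """Format one feature as a nine-column, tab-separated GFF line."""
    if start > end:
        raise ValueError("GFF coordinates require start <= end")
    fields = [seqid, source, feature, str(start), str(end), ".", strand,
              str(phase), attributes]
    return "\t".join(fields)


# Invented example values.
print(is_complete_cds("ATGTGCTGAGGTTAA"))
print(gff_line("scaffold_1", "manual", "CDS", 100, 114, "+", 0, "gene_id=example_gene"))
```

The check says nothing about the SECIS element, which has to be verified separately, for example with SECISearch3.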

Notes for the project Report also the genes of selenoprotein machinery: SecS, eEFsec, pstk, SBP2, SPS1, (SPS2). Ignore tRNAsec, for technical reasons

Common pitfalls
- Zero, one or many genes? Be careful with superfamilies and gene duplications
- Know what to expect
- Genome assemblies are not perfect!

Evaluation
The projects will be evaluated based on:
- results: you are expected to find all selenoproteins in your assembly
- discussion: interpret your results logically
- methods: scripting is encouraged (but not compulsory)
- presentation: the web page should present the work as clearly as possible