Functional Annotation Final Results

Annotations: Coding Regions

Pipeline: Coding Regions

Structure annotation
Lipoproteins: LipoP
Signal peptides: SignalP
Transmembrane proteins: TMHMM
Protein domain identification: InterProScan and CD-Search

LipoP

InterProScan and CD-Search
The results from the two tools were similar, but we used CD-Search because the cas genes were well annotated by CDD, which is not included in InterProScan.

Function annotation
Specific annotation:
Virulence factors: VFDB
Antibiotic resistance proteins: CARD
Overall annotation:
UniProt
RefSeq

VFDB and CARD results
This table shows the number of virulence genes and antibiotic resistance genes annotated by each database. The results show that the virulence genes are mostly conserved across the serogroups.

Gene Ontology
The table shows the classification of Gene Ontology terms for our samples.

Pathway annotation
MetaCyc
UniPathway
KOBAS: KEGG Orthology Based Annotation

Pathway Results: Annotated Genes
KEGG Orthology based annotation gave us more annotations than the other two databases. However, we also wanted to include an experimentally curated database, MetaCyc, which is available through InterPro.

Pathway Results: total number of genes vs. annotated genes (figure)

Gene Naming
Criteria: E-value < 10e-50, alignment length > 100, coverage > 60%
Absolute Function: all 3 criteria fulfilled
Conserved Hypothetical Function: all criteria met, with the term "hypothetical" in the hit
Hypothetical Proteins: no criteria fulfilled
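The naming rule above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the group's actual script: the function name `naming_category`, its parameter names, and the example hit descriptions are hypothetical. The slide does not say what happens when only some criteria are met, so the sketch assumes that any hit failing at least one criterion is treated as a hypothetical protein.

```python
# Thresholds as written on the slide (10e-50 is kept literally).
MAX_EVALUE = 10e-50
MIN_ALIGN_LEN = 100
MIN_COVERAGE = 60.0  # percent


def naming_category(evalue, align_len, coverage, description):
    """Classify a best database hit into one of the three naming categories.

    Assumption: a hit that fails any one criterion is handled the same way
    as a hit that fails all of them (the slide only covers the two extremes).
    """
    meets_all = (
        evalue < MAX_EVALUE
        and align_len > MIN_ALIGN_LEN
        and coverage > MIN_COVERAGE
    )
    if not meets_all:
        return "hypothetical protein"
    if "hypothetical" in description.lower():
        return "conserved hypothetical function"
    return "absolute function"


if __name__ == "__main__":
    # Made-up example hits, for illustration only.
    print(naming_category(1e-80, 250, 95.0, "CRISPR-associated protein Cas1"))
    print(naming_category(1e-80, 250, 95.0, "hypothetical protein"))
    print(naming_category(1e-5, 40, 20.0, "some weak hit"))
```

Keeping the thresholds as module-level constants makes it easy to rerun the naming step with stricter or looser cutoffs.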

Number of Genes as per naming

Total Annotated Genes vs Total number of Genes Predicted

1827 834 640 355

Annotations: Non-Coding Regions

RNAs and CRISPRs
This table shows the number of RNA elements found in each sample. CRISPRs were fairly difficult to annotate because each tool (MinCED, Rfam, CRISPRFinder) uses its own algorithm to identify CRISPRs, which led to inconsistent results. We decided to use CRISPRFinder through CRISPRdb.

Deliverables
Completely annotated GFF files were uploaded to the server. The annotations include the following information:
id=<gene ID>;Name=<gene symbol>;signature=<gene name>;GO=<GO ID>;Interpro=<Interpro ID>;KEGG=<KEGG ID>;UniPathway=<UniPathway ID>;MetaCyc=<MetaCyc ID>;TMHMM=<Yes/No>;Coils=<Yes/No>;SignalP=<Yes/No>;LipoP=<Yes/No>;comment=<LOCATION;CATALYTIC ACTIVITY;SIMILARITY;PATHWAY>
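As an illustration of how such an attribute string could be assembled, here is a minimal Python sketch. The function name `format_attributes` and all example values are hypothetical, and the keys simply mirror the list above. Reserved characters such as ";" and "=" inside a value are percent-encoded in this sketch so that a free-text comment cannot be misread as additional key=value pairs. That escaping is an assumption about good practice for GFF attribute columns, not something stated on the slide.

```python
from urllib.parse import quote


def format_attributes(fields):
    """Build a GFF attribute column from a dict of key -> value.

    Booleans become Yes/No, missing (None) values are skipped, and
    reserved characters inside values are percent-encoded.
    """
    parts = []
    for key, value in fields.items():
        if value is None:
            continue
        if isinstance(value, bool):
            value = "Yes" if value else "No"
        # Spaces, colons and slashes are left readable; ';' '=' ',' '&'
        # and '%' are encoded.
        parts.append(f"{key}={quote(str(value), safe=' :/')}")
    return ";".join(parts)


if __name__ == "__main__":
    # Made-up values, for illustration only.
    record = {
        "id": "gene0001",
        "Name": "cas1",
        "GO": "GO:0043571",
        "KEGG": None,
        "TMHMM": False,
        "SignalP": True,
        "comment": "LOCATION: cytoplasm; PATHWAY: none",
    }
    print(format_attributes(record))
```

Skipping empty fields keeps the attribute column short for genes that have no hit in a given database.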

Questions???