1
Tools For Vertebrate Gene Naming
Bethan Yates – HGNC SAB 2015
2
Key aims of the VGNC project
To coordinate the naming of genes across vertebrate species. Initial work has focused on identifying a consensus set of 1:1 orthologs between chimpanzee and human that could be named in a semi-automated manner, and on creating a system to allow this.
To assign gene names within complex gene families across multiple vertebrate species. We are developing a curatorial interface for gene superfamily annotation by expert collaborators, focusing initially on cytochrome P450s and olfactory receptors.
3
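1:1 consensus orthologs in chimp
[Venn diagram of 1:1 chimp-human ortholog predictions from Ensembl, NCBI, Panther and OMA. 10,834 genes are predicted by all four sources; the remaining region counts (58, 96, 375, 92, 545, 1153, 68, 100, 4555, 5837, 296, 229, 2915) cannot be matched to their regions in this transcript.]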
4
Curation Database
A MySQL database schema has been designed to store vertebrate gene symbols and their associated data. This database has been populated with a set of chimp genes identified as consensus 1:1 orthologs of human genes using our HCOP tool. In this case consensus was based on agreement between OMA, Panther, Ensembl Compara and NCBI’s “ortholog gene group data”. This seed set of 10,834 chimp genes is being used to test the database schema and curation tools. Our curators have so far been able to use the system to assign approved gene names and symbols to 6000 chimpanzee genes.
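The consensus step described above can be sketched as a set intersection. This is only an illustration under stated assumptions, not the HCOP implementation: the function name, the input layout (one set of chimp/human pairs per source) and the gene identifiers are all hypothetical.

```python
from collections import Counter


def consensus_one_to_one(predictions):
    """Return (chimp, human) pairs that every source reports as 1:1.

    predictions maps a source name to a set of (chimp_gene, human_gene)
    pairs. This layout is an assumption made for the sketch.
    """
    one_to_one_sets = []
    for pairs in predictions.values():
        chimp_counts = Counter(chimp for chimp, _ in pairs)
        human_counts = Counter(human for _, human in pairs)
        # Keep a pair only if neither gene appears in another pair
        # from the same source, i.e. the relationship is 1:1 there.
        one_to_one_sets.append(
            {
                (chimp, human)
                for chimp, human in pairs
                if chimp_counts[chimp] == 1 and human_counts[human] == 1
            }
        )
    if not one_to_one_sets:
        return set()
    # Consensus means all sources agree on the pair.
    return set.intersection(*one_to_one_sets)


# Hypothetical identifiers, for illustration only.
predictions = {
    "Ensembl": {("c1", "H1"), ("c2", "H2"), ("c3", "H3")},
    "NCBI": {("c1", "H1"), ("c2", "H2"), ("c3", "H3"), ("c4", "H3")},
    "Panther": {("c1", "H1"), ("c2", "H2")},
    "OMA": {("c1", "H1"), ("c2", "H9")},
}
consensus = consensus_one_to_one(predictions)
print(sorted(consensus))
```

In this toy input only the c1/H1 pair survives: c2 is paired with a different human gene in OMA, and c3 is missing from two sources and is one-to-many in NCBI.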
5
Database schema
6
Curation Tools Website
A new website has been developed. It provides tools for VGNC curators to input, access and edit vertebrate gene nomenclature data. Access is restricted to people with user accounts. This website is for internal use only and will not be made accessible to the general public. New curation tools will be added to this site as needed.
8
Quick curate tool
9
Quick curate user interface
10
Tool to restrict human nomenclature data to human genes only
11
Symbol list
12
Preview Symbol Report
13
Family upload tool – work in progress!
14
VGNC database
Database schema is identical to the curation database.
Contains vertebrate genes that have had their nomenclature data approved by our curators.
Updated daily.
Provides the data that will be displayed on the public website.
Used to generate download files for the FTP site.
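A daily refresh of this kind could look roughly like the sketch below. Everything specific here is an assumption: the table name `gene`, its columns, the `'Approved'` status value and the placeholder records are hypothetical, and in-memory SQLite stands in for the MySQL databases so the example is self-contained.

```python
import sqlite3

# Hypothetical single-table schema shared by both databases.
SCHEMA = """
CREATE TABLE gene (
    gene_id TEXT PRIMARY KEY,
    symbol  TEXT,
    name    TEXT,
    status  TEXT
)
"""


def sync_approved(curation, public):
    """Replace the public table contents with approved curation records."""
    rows = curation.execute(
        "SELECT gene_id, symbol, name, status FROM gene "
        "WHERE status = 'Approved'"
    ).fetchall()
    # The connection context manager commits on success and rolls back
    # on error, so the public table is never left half-updated.
    with public:
        public.execute("DELETE FROM gene")
        public.executemany("INSERT INTO gene VALUES (?, ?, ?, ?)", rows)
    return len(rows)


curation_db = sqlite3.connect(":memory:")
public_db = sqlite3.connect(":memory:")
curation_db.execute(SCHEMA)
public_db.execute(SCHEMA)

# Placeholder records, not real nomenclature data.
curation_db.executemany(
    "INSERT INTO gene VALUES (?, ?, ?, ?)",
    [
        ("GENE_0001", "SYMBOL1", "example gene 1", "Approved"),
        ("GENE_0002", "SYMBOL2", "example gene 2", "Pending"),
        ("GENE_0003", "SYMBOL3", "example gene 3", "Approved"),
    ],
)
curation_db.commit()

copied = sync_approved(curation_db, public_db)
public_symbols = [
    row[0] for row in public_db.execute("SELECT symbol FROM gene ORDER BY symbol")
]
print(copied, public_symbols)
```

Because both databases share one schema, the copy needs no column mapping; only the approval filter separates the internal data from the public data.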
15
This will be the public-facing website and will mirror the website for human data. The site is currently in development; we hope to have a beta site out early in 2016. Initially the site will be very simple and will host symbol reports and a basic search facility for chimp genes, as well as download files for the curated chimp data.
16
Approved symbol list/search
17
Symbol report
18
Statistics and Downloads
19
Future Plans: New Species
We need to identify which species we should be working on next and are happy to take input from the SAB members.
Species      # of protein coding genes we can name using HCOP    Total # of protein coding genes (taken from Ensembl)
Chimp        10,834                                              18,749
Cow          11,574                                              19,994
Dog          10,748                                              19,856
Horse         9,909                                              20,449
Macaque       9,745                                              21,905
Chicken       7,649                                              15,508
Opossum       6,605                                              21,327
Zebrafish     4,974                                              25,642
Platypus      1,415                                              21,698
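One way to compare the candidate species is the fraction of protein-coding genes that HCOP could name. The short sketch below computes that ratio from the counts on this slide; the ranking is only an illustration, not a stated selection criterion, and the Dog count is read as 10,748 (the slide text shows "10.748").

```python
# (nameable via HCOP, total protein-coding genes from Ensembl), from the slide.
species_counts = {
    "Chimp": (10834, 18749),
    "Cow": (11574, 19994),
    "Dog": (10748, 19856),  # assumed reading of "10.748"
    "Horse": (9909, 20449),
    "Macaque": (9745, 21905),
    "Chicken": (7649, 15508),
    "Opossum": (6605, 21327),
    "Zebrafish": (4974, 25642),
    "Platypus": (1415, 21698),
}

# Fraction of each species' protein-coding genes that could be named.
fractions = {
    species: nameable / total
    for species, (nameable, total) in species_counts.items()
}

# Species ordered from highest to lowest nameable fraction.
ranked = sorted(fractions, key=fractions.get, reverse=True)

for species in ranked:
    print(f"{species:<10} {fractions[species]:.1%}")
```

On these counts cow and chimp both come out near 58%, while platypus is below 7%, which shows how quickly 1:1 consensus coverage drops with evolutionary distance from human.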
20
Future Plans: New Tools/Features
For herd.genenames.org:
  Gene family curation tool
  Comprehensive gene curation tool
  Gene mapping tool
For vgnc.genenames.org:
  SOLR powered search and REST service
  BioMart server/Custom downloads tool?
  Data Submission tools
  Sequence Alignment tool
21
Future Plans: Gene family information
Complete work on the family upload tool to allow data that has already been curated by our gene family experts to be entered into the VGNC database and website.
Enable display of vertebrate gene family data on the VGNC website, replicating the displays we have for human gene family data on the HGNC site.
Continue working with our family experts to provide tools that let them curate gene family data more easily.