Genetic Literature Curation at FlyBase-Cambridge
Steven Marygold
ABC meeting, December 2007
A Database of Drosophila Genes & Genomes
Talk Outline
1. Group structure
2. The FlyBase bibliography
3. Prioritizing curation
4. Curation practice
5. Curation support
6. Future directions
Group structure
FlyBase
FB-Indiana
- website
- fly stocks
- image curation
FB-Harvard
- database
- genome annotation
- expression curation
FB-Cambridge
- bibliography
- gene and phenotype curation
- ontologies
FB-Cambridge staff
Principal Investigators: Michael Ashburner, Nick Brown
Group Manager: Steven Marygold
Literature Curators: 3.25 FTEs
GO Curator: 1 FTE
Reactome Curator: 1 FTE
Developer: 1 FTE
FB Ontology Editor: 0.25 FTE
Bibliography
Search for string ‘Drosophil*’ in title, abstract or keywords
Semi-automated search of publication databases
–Medline, BIOSIS, ZooRec
Manual searches of journal issues
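The wildcard search above can be illustrated with a minimal Python sketch. It is not the FlyBase search tooling: the record layout (a dict with `title`, `abstract` and `keywords` fields), the function name and the case-insensitive matching are all assumptions made for illustration.

```python
import re

# 'Drosophil*' treated as a word prefix followed by any word characters.
# Case-insensitive matching is an assumption, not stated in the talk.
PATTERN = re.compile(r"\bDrosophil\w*", re.IGNORECASE)


def mentions_drosophila(record):
    """Return True if the pattern occurs in the title, abstract or keywords.

    `record` is a hypothetical dict; missing fields are treated as empty.
    """
    keywords = record.get("keywords", [])
    texts = [record.get("title", ""), record.get("abstract", ""), *keywords]
    return any(PATTERN.search(text) for text in texts)


records = [
    {"title": "Eye development in Drosophila melanogaster"},
    {"title": "Limb patterning in the mouse", "keywords": ["Mus musculus"]},
    {"title": "A survey of fruit flies", "keywords": ["Drosophilidae"]},
]
hits = [r["title"] for r in records if mentions_drosophila(r)]
```

Here `hits` keeps the first and third records: the first matches on its title and the third on a keyword.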
Curation prioritization
Types of publication curated:
–Primary research papers
–Supplemental information
–Errata
–Personal communications to FlyBase
–Conference abstracts
–Reviews
–Books/Book chapters
–Miscellaneous others
Curation prioritization
1. Prioritization of selected journals:
–Set of (~50) journals publishing on Drosophila biology
–Chronological, issue-by-issue curation
2. Prioritization of selected papers:
–Flagged by ‘skim curation’
–Flagged by stock center
–Genes prioritized by GO project
–Alerted to by research community
Curation practice
Access pdf
Identify/select relevant paper
Read abstract; skim-read intro
Highlight curatable material within Results, Methods, Figures & legends, Tables
Curate material into individual ‘proformae’ to form a ‘curation record’
Error-checking:
- spelling
- consistency
- validity
Completed records submitted for loading into Chado database
Curation practice
Curated data classes (proforma types):
–Publication
–Gene
–Allele
–Aberration
–Transgenic constructs
–Transgenic insertions
–Natural transposons
Curation practice
Gene-level curated data:
–valid FlyBase gene symbol/name
–gene symbol/name used in paper
–action gene rename or merge
–action creation or deletion of gene
–etymology of gene name
–Sequence Ontology (SO) terms
–cytological map position
–relationship to cDNA/genomic clone
–Gene Ontology (GO) terms
–y/n flags to indicate paper has expression or annotation information
Curation practice
Allele-level curated data:
–valid FlyBase allele symbol/name
–allele symbol/name used in paper
–action allele rename or merge
–action creation or deletion of allele
–allele class
–mutagen
–nucleotide/amino acid changes
–phenotype: class, anatomy, free text
–genetic interaction: class, anatomy, free text
–complementation data
–associated transgenic construct/insertion
–associated tag
Curation practice
! GENE PROFORMA                    Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G1b. Gene symbol used in reference :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24b. GO -- Molecular function | evidence [CV] :calcium channel activity ; GO: | IDA
! G24c. GO -- Biological process | evidence [CV] :eye-antennal disc development ; GO: | IMP

! ALLELE PROFORMA                  Version 39: 6 July 2007
!
! GA1a. Allele symbol to use in database :ey[46]
! GA1b. Allele symbol used in paper :ey[461]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
! GA17. Phenotype [CV, body part(s) where manifest] :eye anterior vertical bristle
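To make the proforma layout concrete, the following Python sketch reads field lines of the form `! <code>. <label> :<value>` into a dictionary. It is an illustration only, not the FlyBase loading software: the regular expression, the function name `parse_proforma` and the rule that the value starts after the first colon are assumptions inferred from the example on this slide.

```python
import re

# Assumed line shape: "! G1a. Gene symbol to use in database :ey".
# Header lines ("! GENE PROFORMA ...") and bare "!" lines do not match.
FIELD_RE = re.compile(
    r"^!\s*(?P<code>[A-Z]+\d+[a-z]?)\.\s*(?P<label>.*?)\s*:(?P<value>.*)$"
)


def parse_proforma(text):
    """Map each field code (e.g. 'G1a') to its value; blank values are kept."""
    fields = {}
    for line in text.splitlines():
        match = FIELD_RE.match(line.strip())
        if match:
            fields[match.group("code")] = match.group("value").strip()
    return fields


SAMPLE = """\
! GENE PROFORMA                    Version 50: 05 Oct 2007
!
! G1a. Gene symbol to use in database :ey
! G24a. GO -- Cellular component | evidence [CV] :
! G24c. GO -- Biological process | evidence [CV] :eye-antennal disc development ; GO: | IMP
! GA1a. Allele symbol to use in database :ey[46]
! GA56. Phenotypic | dominance class [bipartite CV] :visible | recessive
"""

fields = parse_proforma(SAMPLE)
```

Because the label is matched non-greedily, the value begins after the first colon on the line, so later colons inside the value (as in the `GO:` fragment) are preserved.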
Curation support
Curation support files
–Text files of data from latest DB instance
Ontology files
–GO, SO, FB-anatomy, FB-phenotypes etc.
PeeVeS
–Proforma Validation Software
Other custom scripts
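As a rough illustration of the kind of consistency and validity checks a proforma validator might run, here is a small Python sketch. It does not reproduce PeeVeS, whose rules are not described in this talk. The function name, the short list of evidence codes and the rule that an allele symbol is the gene symbol followed by a bracketed suffix (inferred from the `ey` / `ey[46]` example) are all assumptions.

```python
# A small subset of Gene Ontology evidence codes, for illustration only.
KNOWN_EVIDENCE_CODES = {"IDA", "IMP", "IGI", "IPI", "IEP", "ISS", "TAS", "NAS", "IC", "ND", "IEA"}


def check_record(gene_fields, allele_fields):
    """Return a list of problem messages for one gene/allele proforma pair.

    Both arguments are hypothetical dicts keyed by proforma field code.
    """
    problems = []

    gene = gene_fields.get("G1a", "").strip()
    if not gene:
        problems.append("G1a: gene symbol is missing")

    # Consistency (assumed rule): allele symbol is '<gene>[<suffix>]'.
    allele = allele_fields.get("GA1a", "").strip()
    if gene and allele and not (allele.startswith(gene + "[") and allele.endswith("]")):
        problems.append(f"GA1a: allele symbol {allele!r} does not match gene {gene!r}")

    # Validity: a filled-in GO field should end with '| <known evidence code>'.
    for code in ("G24a", "G24b", "G24c"):
        value = gene_fields.get(code, "").strip()
        if not value:
            continue
        evidence = value.rsplit("|", 1)[-1].strip()
        if evidence not in KNOWN_EVIDENCE_CODES:
            problems.append(f"{code}: unrecognised evidence code in {value!r}")

    return problems


ok = check_record(
    {"G1a": "ey", "G24b": "calcium channel activity ; GO: | IDA"},
    {"GA1a": "ey[46]"},
)
bad = check_record(
    {"G1a": "ey", "G24c": "eye-antennal disc development ; GO: | XYZ"},
    {"GA1a": "toy[46]"},
)
```

In this sketch `ok` is an empty list, while `bad` contains two messages, one for the mismatched allele symbol and one for the unknown evidence code.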
Future directions
More paper-by-paper prioritization
‘Skim curation’
–Manual curation
–Automated curation?
–User-submitted data
Use of text-mining aids for ‘deep curation’
Review breadth and depth of curation
Enhanced curation interface
Acknowledgements
FB-Cambridge:
Michael Ashburner (co-PI)
Nick Brown (co-PI)
Steven Marygold (Manager)
Gillian Millburn (Literature curator)
David Osumi-Sutherland (Ontology Editor and Literature curator)
Ruth Seal (Literature curator)
Peter McQuilton (Literature curator)
Paul Leyland (Developer)
Susan Tweedie (GO curator)
Mark Williams (Reactome curator)
Rachel Drysdale (former FB-Cambridge co-PI)
Genetics Dept., University of Cambridge, UK
The FlyBase Consortium
NHGRI at the NIH