Published by Silas Underwood. Modified over 9 years ago.
1
Ontology Driven Data Collection for EuPathDB
Jie Zheng, Omar Harb, Chris Stoeckert
Center for Bioinformatics, University of Pennsylvania
2
Issues associated with Data Collection
Heterogeneity of free text
Difficulty in data integration; requires human intervention
Complex queries are limited
3
Examples: GenBank
4
Data Collection for EuPathDB
Apply ontology to data submission form design
– Form to collect sequence data and information on isolates of pathogens
    Geographic location where the isolate specimen was collected
    Host organism information: species, age, clinical information
– Genetic manipulation with resulting phenotype data collection form
    Mutation method
    Effects of genetic modification on the parasite and on the location, function, and involvement in biological process of the resultant modified protein
These data are important for parasite epidemiology and research on vaccines and anti-parasitic drugs
Enable Queries
– Compare sequence data from Plasmodium isolates that are restricted to East Africa to those from West Africa and are controlled for age and health of hosts
– List genes that when knocked out result in a defect in parasite growth during the erythrocytic cycle
– List genes fused to green fluorescent protein (GFP) that when expressed are located in the cell membrane
5
EuPathDB
EuPathDB (Eukaryotic Pathogen Database Resources) is a NIAID Bioinformatics Resource Center covering Eukaryotic Parasites
EuPathDB: a portal to eukaryotic pathogen databases. Aurrecoechea C, et al. Nucleic Acids Res. 2010
6
Isolate Data
Need to import and integrate datasets from GenBank
But GenBank did not specify needed metadata for isolates
Manual curation required
– Harmonize to enable host queries: Human -> Homo sapiens
– Deconvolute descriptions in free text:
    isolated from storm waters
    isolated from Homo sapiens patient infected with HIV
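The two curation steps named on this slide can be sketched in a few lines of Python. This is only an illustration, not the EuPathDB curation pipeline: the synonym table, the function names, and the output fields are all assumptions, and a real workflow would draw host names from a curated taxonomy rather than a hand-written dictionary.

```python
# Hypothetical synonym table (assumption): free-text host value -> species name.
HOST_SYNONYMS = {
    "human": "Homo sapiens",
    "homo sapiens": "Homo sapiens",
    "cattle": "Bos taurus",
}


def harmonize_host(raw):
    """Return a controlled species name for a free-text host value, or None."""
    return HOST_SYNONYMS.get(raw.strip().lower())


def deconvolute(description):
    """Split an 'isolated from ...' note into host, environment, and note fields."""
    text = description.strip()
    prefix = "isolated from "
    if text.lower().startswith(prefix):
        text = text[len(prefix):]
    record = {"host": None, "environment": None, "note": None}
    lowered = text.lower()
    for synonym, species in HOST_SYNONYMS.items():
        if lowered.startswith(synonym):
            # A known host leads the description; the remainder is kept as a note.
            record["host"] = species
            record["note"] = text[len(synonym):].strip() or None
            return record
    # No known host found: treat the text as an environmental source.
    record["environment"] = text
    return record
```

Applied to the slide's two examples, `deconvolute` would file "storm waters" under the environment field and would separate "Homo sapiens" from the clinical remark that follows it. Anything the table does not recognize still needs a human curator.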
7
Isolate Data: GenBank -> EuPathDB
8
Isolate Submission Form
Target isolate information
Geographic location
Source organism samples information or Environmental samples information
Sequence information
9
Ontology-based Representation of Isolate Data
The data collected in the submission form are shown in bold font. Fields that require ontology terms are shown in thick-bordered boxes.
10
Isolate Submission Form
11
Ontology Selection
12
Excel Format
Generally already collected in this format according to our community advisors
– Lowers the barrier for usage
Easily converted to GenBank submission-ready format automatically
Allows multiple sequence submission
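As a rough illustration of the automatic conversion, the sketch below turns spreadsheet rows into FASTA records whose definition lines carry bracketed source modifiers. It is not the parser shown in the presentation. It assumes the sheet has been exported to CSV, and the column names (`Sequence_ID`, `Organism`, `Isolate`, `Host`, `Country`, `Sequence`) are illustrative choices rather than the actual EuPathDB form fields.

```python
import csv
import io

# Assumed spreadsheet column -> source-modifier name (illustrative, not the real form).
MODIFIERS = [
    ("Organism", "organism"),
    ("Isolate", "isolate"),
    ("Host", "host"),
    ("Country", "country"),
]


def rows_to_fasta(csv_text):
    """Convert CSV text (one sequence per row) into FASTA with bracketed modifiers."""
    reader = csv.DictReader(io.StringIO(csv_text))
    records = []
    for row in reader:
        # Emit only the modifiers the submitter actually filled in.
        tags = " ".join(
            f"[{modifier}={row[column].strip()}]"
            for column, modifier in MODIFIERS
            if (row.get(column) or "").strip()
        )
        header = f">{row['Sequence_ID'].strip()} {tags}".rstrip()
        records.append(f"{header}\n{row['Sequence'].strip()}")
    return "\n".join(records) + "\n"
```

Because each row becomes one record, a single sheet can carry many sequences, which matches the multiple-sequence submission point above.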
13
Parser for GenBank Submission
14
Genetic Manipulation and Phenotype Data
T. brucei RNAi knockdowns
Integrate phenotype data from other resources (GeneDB)
Allow individuals to submit phenotype data via User Comments on Gene pages of the EuPathDB web site
Either way, these are free-text descriptions, which limits their utility for data exploration
15
Genetic Manipulation and Phenotype Submission Form
Genetic Manipulation
– Mutation method including selective marker, report if available
– Mutation type (effect on gene function)
Phenotype data: impact of genetic manipulation on four possible observed features
– Quality of the organism
– Cellular location of gene product
– Molecular function of gene product
– Biological process of gene product
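To show why structured fields help, the sketch below models one submission as a record and answers a query of the kind the talk wants to enable (genes whose knock out causes a growth defect at a given lifecycle stage). The class, the field names, and the example values are all hypothetical placeholders. In the real form the values would be ontology terms, and the gene identifiers here are invented for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class PhenotypeRecord:
    gene_id: str
    mutation_method: str
    mutation_type: str
    lifecycle_stage: str
    feature: str       # one of the four observed features on the form
    observation: str


def find_genes(records, mutation_type, lifecycle_stage, feature, observation):
    """Return sorted gene IDs whose records match every given field exactly."""
    return sorted({
        r.gene_id
        for r in records
        if r.mutation_type == mutation_type
        and r.lifecycle_stage == lifecycle_stage
        and r.feature == feature
        and r.observation == observation
    })


# Placeholder data (not real submissions or ontology terms).
EXAMPLE_RECORDS = [
    PhenotypeRecord("GENE_A", "homologous recombination", "knock out",
                    "erythrocytic cycle", "quality of the organism", "growth defect"),
    PhenotypeRecord("GENE_B", "homologous recombination", "knock out",
                    "erythrocytic cycle", "quality of the organism", "normal growth"),
    PhenotypeRecord("GENE_C", "GFP fusion", "tagged",
                    "erythrocytic cycle", "cellular location", "cell membrane"),
]
```

Exact matching only works because every submitter picks from the same controlled vocabulary, which is the argument for ontology terms over free-text comments.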
16
Ontology-based Representation of Genetic Manipulation with Resulting Phenotype Data
The data collected in the submission form are shown in bold font. Fields that require ontology terms are shown in thick-bordered boxes. The Ontology for Parasite Lifecycle (OPL) will be used to annotate the life cycle stage.
17
Ontology-based Representation of Genetic Manipulation – Gene Knock Out
18
Genetic Manipulation Section OBI
19
Phenotype Section
[Figure: form fields, including Cellular location and Biological process, labeled with their source ontologies: GO, OBI, OPL, PATO]
20
Web-based Form
Collect the data directly from specific components of the EuPathDB web site
Change dynamically based on user's inputs (lifecycle stage based on species; display selective marker, report, etc. sections when needed)
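The dynamic behaviour described on this slide amounts to computing which options and sections to show from the inputs given so far. The sketch below expresses that logic in Python. It is not the EuPathDB web form: the function name, the section names, and the lifecycle stage lists are assumptions made for illustration, and the stage names are plain-text placeholders rather than OPL terms.

```python
# Placeholder stage lists (assumption); the real form would use OPL terms.
LIFECYCLE_STAGES = {
    "Plasmodium falciparum": ["sporozoite", "merozoite", "gametocyte"],
    "Trypanosoma brucei": ["bloodstream form", "procyclic form"],
}


def form_layout(species, has_selective_marker=False, has_report=False):
    """Return the stage options and form sections to display for these inputs."""
    sections = ["mutation_method", "mutation_type"]
    # Optional sections appear only when the submitter says they apply.
    if has_selective_marker:
        sections.append("selective_marker")
    if has_report:
        sections.append("report")
    return {
        "lifecycle_stage_options": LIFECYCLE_STAGES.get(species, []),
        "sections": sections,
    }
```

For example, `form_layout("Trypanosoma brucei", has_selective_marker=True)` would offer only the two T. brucei stages and add the selective marker section, while an unrecognized species yields an empty stage list.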
21
Future Work
Submission forms are at the prototype stage
Distribute isolate submission forms to EuPathDB users
Incorporate genetic manipulation and phenotype form into EuPathDB website
Evaluation of submission forms based on the data collected
Improve the submission forms based on feedback
22
Acknowledgements
Stoeckert Lab Haiming Wang and EuPathDB Team
EuPathDB Community
Dr. G Robinson, Dr. R Chalmers, Dr. CJ Janse, Dr. G. Widmer, Dr. L. Xiao, Dr. SM Khan
Funding
– NIH grant 5R01GM93132-1
– National Institute of Allergy and Infectious Diseases at the National Institutes of Health Award NO1-AI900038C Contract No. HHSN272200900038C
23
Thank You!