Elucidating effects of nerve injury on gene expression using RegenBase. Bio-Ontologies SIG, July 8 2016. Alison Callahan, Matthew C. Danzi, Giulia Zunino, Daniel J. Cooper, Nigam H. Shah, Ubbo Visser, John L. Bixby, and Vance P. Lemmon
Spinal cord injury (SCI) is a significant burden on individuals and the U.S. healthcare system: ~12,500 new SCI cases each year in the U.S.; ~276,000 individuals affected in total as of 2014; average expenses in the first year after injury range from $350K to >$1M, depending on severity.
There are no effective therapies for spinal cord injury: is this because experiments are not replicable? Steward et al. 2012. Replication and reproducibility in spinal cord injury research. Experimental Neurology. Prinz et al. 2011. Believe it or not: how much can we rely on published data on potential drug targets? Nature Reviews Drug Discovery. Image credit (mechanisms of injury): Ruff and Fehlings. 2010. Neural stem cells in regenerative medicine: bridging the gap. Panminerva Medica 52(2):125-147.
Aggregating and linking data across studies and experiment types is currently left to individual scientists. Sejnowski et al. 2014. Putting big data to good use in neuroscience. Nature Neuroscience.
We have lots of data at our disposal
Goal: structure and integrate data relevant to SCI research
The RegenBase ontology defines classes and properties specific to the SCI research domain: 435 classes; 18 object properties; 8 data properties; mappings to FMA and MPO based on lexical matching of class labels.
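The lexical mapping to FMA and MPO can be pictured roughly as follows. This is a minimal sketch, assuming each ontology is available as a dictionary from class IRI to label and that a "lexical match" means identical labels after case and punctuation normalization. The IRIs and labels in the example are hypothetical, and the actual RegenBase matching rules may differ.

```python
import re
from collections import defaultdict


def normalize_label(label: str) -> str:
    """Lowercase, collapse runs of non-alphanumeric characters to one space, trim."""
    return re.sub(r"[^a-z0-9]+", " ", label.lower()).strip()


def lexical_mappings(source: dict[str, str], target: dict[str, str]) -> list[tuple[str, str]]:
    """Pair source and target class IRIs whose normalized labels are identical.

    Both arguments map a class IRI to its label (an assumed input format).
    """
    # Index target classes by normalized label; several classes may share one.
    index: dict[str, list[str]] = defaultdict(list)
    for iri, label in target.items():
        index[normalize_label(label)].append(iri)

    pairs = []
    for iri, label in source.items():
        for match in index.get(normalize_label(label), []):
            pairs.append((iri, match))
    return sorted(pairs)


# Hypothetical IRIs and labels, for illustration only.
rb = {
    "http://example.org/rb#DorsalRootGanglion": "dorsal root ganglion",
    "http://example.org/rb#Laminectomy": "laminectomy",
}
fma = {
    "http://example.org/fma#X1": "Dorsal root ganglion",
    "http://example.org/fma#X2": "Spinal cord",
}
print(lexical_mappings(rb, fma))
```

Exact matching after normalization is deliberately conservative: it misses synonyms but rarely produces a wrong mapping, which is why purely lexical mappings are usually reviewed before use.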
Getting information from the literature into RegenBase: SCI- and regeneration-related publications, followed by expert curation, an RDF conversion pipeline, and entity identifier mapping.
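As a rough illustration of the RDF conversion step, the sketch below serializes one curated record as N-Triples using only the standard library. The namespace `http://example.org/regenbase/`, the class `CuratedStatement`, and the properties `fromPublication`, `hasSection`, and `hasText` are hypothetical placeholders, not the RegenBase ontology's actual terms, and the record layout is assumed.

```python
# Hypothetical namespace: NOT the real RegenBase IRIs.
EX = "http://example.org/regenbase/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def iri(local: str) -> str:
    """Build an N-Triples IRI term in the placeholder namespace."""
    return f"<{EX}{local}>"


def literal(value: str) -> str:
    """Build a plain N-Triples string literal, escaping special characters."""
    escaped = (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    return f'"{escaped}"'


def record_to_ntriples(record: dict[str, str]) -> list[str]:
    """Convert one curated record (assumed keys: id, publication, section, text)."""
    subject = iri(f"statement/{record['id']}")
    triples = [
        (subject, RDF_TYPE, iri("CuratedStatement")),
        (subject, iri("fromPublication"), iri(f"publication/{record['publication']}")),
        (subject, iri("hasSection"), literal(record["section"])),
        (subject, iri("hasText"), literal(record["text"])),
    ]
    # Each N-Triples line is "subject predicate object ."
    return [f"{s} {p} {o} ." for s, p, o in triples]


rec = {"id": "s1", "publication": "pub-001", "section": "surgery", "text": 'a "quoted" note'}
print("\n".join(record_to_ntriples(rec)))
```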
MIASCI Online: a tool for SCI researchers to curate publications according to MIASCI (Minimum Information About a Spinal Cord Injury experiment), organized into 11 major sections: investigator, organism, surgery, perturbagen, cell transplantation, biomaterials, histology, immunohistochemistry, imaging, behavior, and data analysis and statistics.
Literature-sourced data model
Getting assay data into RegenBase: raw assay data (kinase activity assays, neurite outgrowth assays), followed by data processing, RDF conversion, and entity identifier mapping.
Assay data model
Getting gene expression data into RegenBase: raw RNA-seq or microarray data, followed by data processing, RDF conversion, and entity identifier mapping.
Gene expression data model
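A simplified, hypothetical record type for a single expression change, using only fields that appear elsewhere in this talk (gene symbol, species, tissue, injury, fold change, p-value, time after injury). It is not the RegenBase gene expression data model itself; the field names, the example values, and the validation rule are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ExpressionChange:
    """One differential expression measurement (hypothetical, simplified)."""

    gene_symbol: str                    # e.g. "Cirbp"
    species: str                        # "rat" or "mouse"
    tissue: str                         # e.g. "DRG"
    injury: str                         # e.g. "peripheral nerve injury"
    fold_change: float                  # as reported by the source dataset
    p_value: float
    time_hours: Optional[float] = None  # time after injury, when reported

    def __post_init__(self) -> None:
        # Reject p-values outside [0, 1], a common sign of a parsing error.
        if not 0.0 <= self.p_value <= 1.0:
            raise ValueError(f"p-value out of range: {self.p_value}")

    def is_significant(self, alpha: float = 0.05) -> bool:
        """True when the p-value falls below the chosen threshold."""
        return self.p_value < alpha


# Placeholder values for illustration.
change = ExpressionChange("GeneA", "rat", "DRG", "peripheral nerve injury", 1.2, 0.01, 24)
print(change.is_significant())
```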
RegenBase content includes literature-sourced data (~20,000 statements from 42 publications); kinase activity data (effects of ~52,000 compounds on 476 kinases); neurite outgrowth data (effects of ~1,600 compounds on neurite outgrowth); and gene expression data (changes in gene expression after injury in rats and mice for >40,000 genes and gene probes). Callahan et al. 2016. RegenBase: a knowledge base of spinal cord injury biology for translational research. Database (Oxford) 2016: baw040.
The gene expression data model is motivated by three use cases. Image credits: protein, Thomas Splettstoesser (www.scistyle.com), https://en.wikipedia.org/wiki/Protein_domain#/media/File:Pyruvate_kinase_protein_domains.png; sequence homology, Thomas Shafee, https://upload.wikimedia.org/wikipedia/commons/b/b5/Histone_Alignment.png; gene expression, http://bmccellbiol.biomedcentral.com/articles/10.1186/1471-2121-11-7
Use case #1: Which genes that are significantly differentially expressed in DRGs after a peripheral nerve injury have a protein product with an RNA-recognition motif?

| Symbol | Fold change | P-value | Time (hours) |
|---|---|---|---|
| A1cf | -0.744789 | 0.00178355 | 1 |
| A1cf | -0.544549 | 0.0147546 | 3 |
| A1cf | -0.449241 | 0.0453885 | 24 |
| A1cf | -0.532934 | 0.0173952 | 28 |
| Acin1 | -0.881353 | 0.0245572 | 72 |
| Cirbp | 1.23386 | 0.0115497 | |
| Cirbp | 1.13581 | 0.0184748 | 8 |
| Cirbp | 1.40815 | 0.0202129 | 12 |
| Cpsf6 | 1.03969 | 0.00164697 | |
| Cpsf7 | -0.840796 | 0.00846955 | |
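In RegenBase this question is answered with a SPARQL query; the in-memory sketch below only illustrates the underlying logic: keep genes with at least one significant measurement, then intersect them with genes whose protein product carries the motif. The motif annotation set is a hypothetical input (in practice it would come from an external protein domain resource), the 0.05 threshold is an assumption, and the gene names in the usage example are placeholders.

```python
from typing import Iterable


def de_genes_with_motif(
    rows: Iterable[tuple[str, float, float]],
    motif_genes: set[str],
    alpha: float = 0.05,
) -> list[str]:
    """Genes with at least one significant row whose protein has the motif.

    Each row is (gene_symbol, fold_change, p_value).
    """
    # A gene counts as differentially expressed if any time point passes alpha.
    significant = {symbol for symbol, _fold_change, p_value in rows if p_value < alpha}
    return sorted(significant & motif_genes)


# Placeholder data: GeneA and GeneC pass the threshold; GeneA and GeneB carry the motif.
rows = [("GeneA", -0.7, 0.002), ("GeneB", 1.1, 0.2), ("GeneC", 1.0, 0.03)]
print(de_genes_with_motif(rows, {"GeneA", "GeneB"}))  # ['GeneA']
```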
Use case #2: Does the mouse gene CALM2 have any rat orthologues that are significantly differentially expressed in DRG neurons after a peripheral nerve injury?
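The orthologue question combines two lookups: mouse gene to rat orthologues, then rat orthologues to significant expression results. A minimal sketch, assuming the orthologue map and per-gene p-values are already available as dictionaries (both hypothetical inputs here, with placeholder gene names) and a 0.05 threshold.

```python
def significant_orthologues(
    mouse_gene: str,
    orthologues: dict[str, set[str]],
    rat_p_values: dict[str, list[float]],
    alpha: float = 0.05,
) -> list[str]:
    """Rat orthologues of mouse_gene with at least one p-value below alpha."""
    candidates = orthologues.get(mouse_gene, set())
    return sorted(
        gene
        for gene in candidates
        if any(p < alpha for p in rat_p_values.get(gene, []))
    )


# Placeholder inputs for illustration.
orthologues = {"MouseGene1": {"RatGene1", "RatGene2"}}
rat_p_values = {"RatGene1": [0.2, 0.01], "RatGene2": [0.3]}
print(significant_orthologues("MouseGene1", orthologues, rat_p_values))  # ['RatGene1']
```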
Use case #3: Which genes are differentially regulated at early time points after injury but later move back toward their homeostatic levels (or even change in the opposite direction)?

| Gene symbol | Fold change (1st) | P-value (1st) | Fold change (2nd) | P-value (2nd) |
|---|---|---|---|---|
| Pdpk1 | 5.59 | 4.56E-10 | 4.39 | 7.57E-08 |
| Cdh22 | 5.26 | 2.85E-06 | -0.056 | 0.999 |
| Tcf4 | 4.52 | 9.24E-05 | -0.088 | |
| Flrt3 | 4.51 | 1.07E-13 | 2.86 | 3.17E-08 |
| Il6 | 4.42 | 1.42E-13 | 3.08 | 4.25E-09 |
| Cacna2d1 | 4.35 | 1.73E-08 | 3.80 | 2.14E-07 |
| Kcna1 | 4.33 | 0.000425 | 0.00793 | |
| Ap3s1 | 4.32 | 2.02E-09 | 4.13 | 3.70E-09 |
| Gnb1 | 4.05 | 0.000169 | 3.56 | 0.000608 |
| Gda | 3.98 | 0.000372 | 3.85 | 0.000451 |
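One way to operationalize use case #3, assuming a gene qualifies when its change at the 1st time point is significant and its change at the 2nd time point is either no longer significant or significant in the opposite direction. The sign test assumes the reported fold changes are on a log scale (so the sign gives the direction), which the negative values in these tables suggest but the slides do not state. The rows are the complete rows of the use case #3 table; Tcf4 and Kcna1 are left out because a value is missing.

```python
# (gene, 1st fold change, 1st p-value, 2nd fold change, 2nd p-value),
# complete rows from the use case #3 table.
ROWS = [
    ("Pdpk1", 5.59, 4.56e-10, 4.39, 7.57e-08),
    ("Cdh22", 5.26, 2.85e-06, -0.056, 0.999),
    ("Flrt3", 4.51, 1.07e-13, 2.86, 3.17e-08),
    ("Il6", 4.42, 1.42e-13, 3.08, 4.25e-09),
    ("Cacna2d1", 4.35, 1.73e-08, 3.80, 2.14e-07),
    ("Ap3s1", 4.32, 2.02e-09, 4.13, 3.70e-09),
    ("Gnb1", 4.05, 0.000169, 3.56, 0.000608),
    ("Gda", 3.98, 0.000372, 3.85, 0.000451),
]


def returns_toward_baseline(fc1: float, p1: float, fc2: float, p2: float, alpha: float = 0.05) -> bool:
    """Early significant change that is gone, or reversed, at the later time point."""
    if p1 >= alpha:
        return False  # not significantly changed at the 1st time point
    if p2 >= alpha:
        return True  # no longer significantly changed at the 2nd time point
    # Still significant: qualifies only if the direction flipped (assumes log-scale fold changes).
    return (fc1 > 0) != (fc2 > 0)


hits = [gene for gene, *values in ROWS if returns_toward_baseline(*values)]
print(hits)  # ['Cdh22']
```

With a 0.05 threshold only Cdh22 qualifies among these rows, since every other gene remains significantly changed in the same direction at the 2nd time point. A looser definition (for example, a large drop in magnitude while still significant) would need an extra, explicitly chosen cut-off.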
Example SPARQL queries for these use cases: http://regenbase.org/example-sparql-queries
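Queries like these can also be sent programmatically. The sketch below follows the standard SPARQL 1.1 Protocol (an HTTP GET with a `query` parameter, requesting JSON results) against the SPARQL endpoint listed on the final slide, http://regenbase.stanford.edu:8890/sparql, which may no longer be online. The query shown is a generic placeholder, not one of the RegenBase example queries, and the helper names are hypothetical.

```python
import json
import urllib.parse
import urllib.request

# Endpoint from the slides; availability is not guaranteed.
ENDPOINT = "http://regenbase.stanford.edu:8890/sparql"

# Generic placeholder query, not a RegenBase example query.
QUERY = "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 5"


def build_query_url(endpoint: str, query: str) -> str:
    """Encode a SPARQL query as a GET request URL (SPARQL 1.1 Protocol)."""
    return endpoint + "?" + urllib.parse.urlencode({"query": query})


def run_select(endpoint: str, query: str, timeout: float = 30.0) -> list[dict]:
    """Run a SELECT query and return one {variable: value} dict per result row."""
    request = urllib.request.Request(
        build_query_url(endpoint, query),
        headers={"Accept": "application/sparql-results+json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        results = json.load(response)
    # Each binding maps a variable name to {"type": ..., "value": ...}.
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in results["results"]["bindings"]
    ]
```

Calling `run_select(ENDPOINT, QUERY)` performs the network request; building the URL separately makes it easy to inspect or log the exact request first.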
RegenBase and the Linked Data Web enable faster, easier SCI data search and analysis. We have extended RegenBase with an important new data source, and the code we developed to do this is reusable. Each of the three research use cases requires many researcher hours if executed “by hand”, every time a gene of interest is identified; RegenBase reduces this to minutes for query formulation and seconds for query response. Using URI patterns, identifier mapping services, and Bio2RDF gives us data integration essentially for free.
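The URI-pattern point can be illustrated with Bio2RDF's normalized URI form, `http://bio2rdf.org/{namespace}:{identifier}`. In this sketch the symbol-to-identifier dictionary stands in for an identifier mapping service and is hypothetical, as are the gene symbols and identifier in the usage example; symbols that cannot be mapped are returned separately rather than guessed.

```python
BIO2RDF_PATTERN = "http://bio2rdf.org/{namespace}:{identifier}"


def bio2rdf_uri(namespace: str, identifier: str) -> str:
    """Fill the Bio2RDF URI pattern for one identifier."""
    return BIO2RDF_PATTERN.format(namespace=namespace.strip(), identifier=identifier.strip())


def map_symbols(
    symbols: list[str], symbol_to_id: dict[str, str], namespace: str
) -> tuple[dict[str, str], list[str]]:
    """Mint URIs for mappable symbols; collect the rest for manual review."""
    mapped: dict[str, str] = {}
    unmapped: list[str] = []
    for symbol in symbols:
        identifier = symbol_to_id.get(symbol)
        if identifier is None:
            unmapped.append(symbol)  # do not guess an identifier
        else:
            mapped[symbol] = bio2rdf_uri(namespace, identifier)
    return mapped, unmapped


# Hypothetical mapping-service output, for illustration only.
mapped, unmapped = map_symbols(["GeneA", "GeneB"], {"GeneA": "123"}, "ncbigene")
print(mapped, unmapped)
```

Because any dataset that mints URIs with the same pattern produces identical strings for the same entity, results from different sources can be joined on the URI directly, which is the sense in which integration comes "for free".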
What next? A RegenBase search tool; new methods and data sources for adding content to RegenBase; and working with the broader neuroscience community to extend RegenBase to new domains.
Acknowledgements: literature curators; the RegenBase team; John Bixby, Vance Lemmon, Ubbo Visser; the Shah Lab @ Stanford. Funding: NLM R01s HD057632 and NS080145.
Thank you! Questions? More information is available online: http://regenbase.org (project description, simple paper browser, example queries, data download); http://bioportal.bioontology.org/ontologies/RB (RegenBase ontology in BioPortal); http://regenbase.stanford.edu:8890/sparql (SPARQL endpoint).