How to publish in a format that enhances literature-based discovery?
All course materials: sci.ai/fsci2017
Join community: sci.ai/community
@sci_ai
Key question of our work: How can we extract molecules of knowledge from research results so that new discoveries can be built from them?
Knowledge Graph of the Literature-Based Discovery
Defining molecules of knowledge in the text means:
1. Conceptualizing terms: “cyclooxygenase-2” is a http://identifiers.org/uniprot/P35354
2. Identifying relationships between terms
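The two steps above can be sketched as a small data structure: terms mapped to concept URIs, and relationships stored as subject-predicate-object triples. Only the cyclooxygenase-2 mapping comes from the slide. The second term, its placeholder URI, and the predicate URI are hypothetical and serve only as illustration.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One 'molecule of knowledge': subject, predicate, object."""
    subject: str
    predicate: str
    obj: str


# Step 1: conceptualize terms (lowercased term string -> concept URI).
CONCEPTS = {
    # Mapping taken from the slide.
    "cyclooxygenase-2": "http://identifiers.org/uniprot/P35354",
    # Hypothetical placeholder URI, not a real identifier.
    "prostaglandin e2": "http://example.org/concept/prostaglandin-e2",
}

# Hypothetical predicate URI, used only for this sketch.
SYNTHESIZES = "http://example.org/relation/synthesizes"


def conceptualize(term: str) -> str:
    """Return the concept URI for a term; raises KeyError if unknown."""
    return CONCEPTS[term.strip().lower()]


def relate(subject_term: str, predicate: str, object_term: str) -> Triple:
    """Step 2: record a relationship between two conceptualized terms."""
    return Triple(conceptualize(subject_term), predicate, conceptualize(object_term))


fact = relate("Cyclooxygenase-2", SYNTHESIZES, "prostaglandin E2")
print(fact)
```

Storing URIs rather than raw strings means that two papers using different spellings of the same term still point to the same node in the knowledge graph.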
Information Representation and Operations with Information
Linguistic Levels
Linguistic level: corresponding algorithms of reading and understanding in computational linguistics
Pragmatics: sentiment analysis, topic segmentation and recognition, anaphora resolution, question answering
Semantics: named entity recognition (NER), word sense disambiguation, relationship extraction
Syntax: parse (syntax) tree
Morphology: stemming, lemmatization, part-of-speech (POS) tagging
* The line between pragmatics and semantics is very blurred.
https://www.boundless.com/psychology/textbooks/boundless-psychology-textbook/language-10/introduction-to-language-60/the-structure-of-language-234-12769/
Basic information retrieval operates on a syntactic representation
Document and query are represented in the vector space model: if a term occurs in the text, the corresponding value in the vector is non-zero.
Relevant document ~ higher syntactic (string form) similarity to the query
1. https://en.wikipedia.org/wiki/Vector_space_model
2. https://nlp.stanford.edu/IR-book/html/htmledition/a-first-take-at-building-an-inverted-index-1.html
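A minimal sketch of the vector space model described above, assuming binary term vectors and cosine similarity as the ranking score (the slide does not fix a weighting scheme). The query and documents are toy examples written for this illustration.

```python
import math
import re


def tokenize(text):
    """Lowercase the text and split it into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def to_vector(tokens, vocabulary):
    """Binary vector: 1 if the vocabulary term occurs in the tokens, else 0."""
    return [1 if term in tokens else 0 for term in vocabulary]


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank(query, documents):
    """Return (score, document) pairs, most similar to the query first."""
    doc_tokens = [set(tokenize(d)) for d in documents]
    query_tokens = set(tokenize(query))
    vocabulary = sorted(query_tokens.union(*doc_tokens))
    query_vector = to_vector(query_tokens, vocabulary)
    scored = [
        (cosine(query_vector, to_vector(tokens, vocabulary)), doc)
        for tokens, doc in zip(doc_tokens, documents)
    ]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


QUERY = "COX-2 inflammation"
DOCUMENTS = [
    "COX-2 inhibitors reduce inflammation",
    "Prostaglandin-endoperoxide synthase 2 inhibitors reduce inflammation",
    "Zebrafish fin regeneration",
]

ranked = rank(QUERY, DOCUMENTS)
for score, doc in ranked:
    print(f"{score:.3f}  {doc}")
```

The second document names the same enzyme under a synonym, yet it scores lower than the first because only the string forms are compared. That gap is what conceptualizing terms is meant to close.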
Semantic Biomedicine
Semantic Web features:
1. Web of Data
2. The same language for data exchange
3. Relationships and ontologies
Royer L., Linse B., Wächter T., Furch T., Bry F., Schroeder M. (2007) Querying Semantic Web Contents. In: Baker C.J.O., Cheung KH. (eds) Semantic Web. Springer, Boston, MA. https://link.springer.com/content/pdf/10.1007%2F978-0-387-48438-9.pdf
Literature-based discovery: knowledge extraction, reasoning, hypothesis generation
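The slide does not name a specific reasoning method. One well-known pattern for hypothesis generation is Swanson's ABC model: if A is linked to B and B is linked to C, but no direct A to C link is recorded, then A to C is a candidate hypothesis. The sketch below assumes that pattern, with toy facts modeled on Swanson's fish oil and Raynaud's syndrome example.

```python
from collections import defaultdict


def generate_hypotheses(facts):
    """Return sorted (a, b, c) triples where a-b and b-c are known but a-c is not."""
    linked = defaultdict(set)
    for source, target in facts:
        linked[source].add(target)

    hypotheses = set()
    for a, intermediates in list(linked.items()):
        for b in intermediates:
            # .get avoids creating new keys while we iterate.
            for c in linked.get(b, ()):
                if c != a and c not in linked[a]:
                    hypotheses.add((a, b, c))
    return sorted(hypotheses)


# Toy extracted facts (knowledge extraction step), for illustration only.
FACTS = [
    ("fish oil", "blood viscosity"),
    ("blood viscosity", "Raynaud's syndrome"),
    ("fish oil", "platelet aggregation"),
]

hypotheses = generate_hypotheses(FACTS)
for a, b, c in hypotheses:
    print(f"Candidate: {a} -> {c} (via {b})")
```

Each candidate is only a lead for further investigation. The quality of the output depends on how reliably the facts were extracted and conceptualized in the first place.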
How to publish in a format that enhances literature-based discovery? That is how search engines see your paper.
Standard vs XML-centered Publishing Process
[Bazargan K. A complete end-to-end publishing system based on JATS. https://www.ncbi.nlm.nih.gov/books/NBK279828/]
Semanticized Paper Publishing Workflow
Semanticization: conceptualizing terms and defining relationships
Practice. Semanticizing a single paper
Step 1. Write research in Google Docs (support for other text editors is upcoming)
Step 2. Send for automatic semanticization
Step 3. Validate results in app.sci.ai
Practice. Validation of the semanticization results
Validate biomedical concepts in the text:
1. It is / it is not a bio object
2. Confirm / reject the proposed relationship between a term and an ontology concept
3. Add a custom ontology concept
Practice. Validate facts extracted from the text
Validate and label new facts:
1. Confirm / reject facts
2. Create new facts
Practice. Single paper knowledge graph
Practice. Export semanticized paper in JATS and HTML
Recipients of the Publishing Formats
Humans: HTML, PDF, Printed
Machines: JATS, RDF, RDF/XML, CSV / XML / JSON, Data API
That is how search engines see your paper.
Extended JATS with biomedical metadata http://demo.sci.ai/jats/full-example.xml
Extended JATS validation
http://demo.sci.ai/jats/full-example.xml
JATS-specific validator (XML validator against a specific tag set): https://www.ncbi.nlm.nih.gov/pmc/tools/xmlchecker/
Generic RDF/XML validator: https://www.w3.org/RDF/Validator/
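Before submitting a file to the online validators above, a quick local well-formedness check can catch basic XML errors. The sketch below uses only the Python standard library. It does not validate against the JATS tag set, and the sample document is a hypothetical minimal fragment, not the demo file.

```python
import xml.etree.ElementTree as ET


def check_well_formed(xml_text):
    """Return (True, root tag) if the XML parses, else (False, error message)."""
    try:
        root = ET.fromstring(xml_text)
    except ET.ParseError as error:
        return False, str(error)
    return True, root.tag


# Hypothetical minimal JATS-like fragment, for illustration only.
GOOD_XML = """<article>
  <front>
    <article-meta>
      <title-group>
        <article-title>Sample semanticized paper</article-title>
      </title-group>
    </article-meta>
  </front>
</article>"""

# Same fragment with a missing closing tag.
BAD_XML = "<article><front></article>"

good_result = check_well_formed(GOOD_XML)
bad_result = check_well_formed(BAD_XML)
print(good_result)
print(bad_result)
```

A file that passes this check can still fail the JATS-specific validator, which checks element names and nesting against the tag set.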
HTML + RDFa with Biomedical Microdata
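To give a rough idea of what a machine can pick up from HTML + RDFa, the sketch below collects RDFa attributes from a small HTML fragment with the standard-library parser. The fragment and its use of the schema.org vocabulary are assumptions made for illustration; the actual markup produced by sci.ai may differ. This is not a full RDFa processor.

```python
from html.parser import HTMLParser

# Core RDFa attributes this sketch looks for.
RDFA_ATTRIBUTES = ("vocab", "typeof", "property", "resource", "about")


class RDFaAttributeCollector(HTMLParser):
    """Collect (tag, {attribute: value}) for every tag carrying RDFa attributes."""

    def __init__(self):
        super().__init__()
        self.annotations = []

    def handle_starttag(self, tag, attrs):
        found = {name: value for name, value in attrs if name in RDFA_ATTRIBUTES}
        if found:
            self.annotations.append((tag, found))


# Hypothetical annotated sentence; only the UniProt URI comes from the slides.
SAMPLE_HTML = """
<p vocab="http://schema.org/">
  Inhibition of
  <span typeof="Thing" resource="http://identifiers.org/uniprot/P35354">
    <span property="name">cyclooxygenase-2</span>
  </span>
  reduces inflammation.
</p>
"""

collector = RDFaAttributeCollector()
collector.feed(SAMPLE_HTML)
for tag, attributes in collector.annotations:
    print(tag, attributes)
```

A human reader sees an ordinary sentence, while a parser also sees that the term resolves to a specific concept URI.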
HTML Page Layout
Downloadable version of the paper in JATS, RDF/XML, RDF etc.
HTML with hints for reading
https://doi.org/10.3389/fncel.2017.00074
How Google reads HTML+RDFa
https://search.google.com/structured-data/testing-tool/u/0/#url=http%3A%2F%2Fdemo.sci.ai%2Fsgc%2Fhdac6.html
Practice. Publishing preprint. Annotating with hypothes.is
How would you use a semantic data layer in your research communication?
E-mail: roman.gurinovich@sci.ai
All course materials: sci.ai/fsci2017
Join community: sci.ai/community
Bonus. Semantic Biomedicine Fundamentals
Bonus. Semantic Biomedicine Applications Applications and Technologies
Nanopublications
Mons, Barend and Velterop, Jan. "Nano-Publication in the e-science era." Paper presented at the meeting of the International Semantic Web Conference, 2009. https://www.w3.org/wiki/images/4/4a/HCLS$$ISWC2009$$Workshop$Mons.pdf
http://nanopub.org
Bonus Practice. Author authentication and labeling with ORCID iD
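The slide does not describe how authors are labeled. As one small hygiene step when attaching ORCID iDs to author metadata, the identifier's check character can be verified locally. The sketch below assumes the ISO 7064 MOD 11-2 checksum that ORCID documents for its identifiers. It checks only the format of an iD, not that it belongs to the author; authentication itself goes through ORCID's sign-in flow and is not shown here.

```python
def orcid_check_character(base_digits: str) -> str:
    """Compute the ISO 7064 MOD 11-2 check character for the first 15 digits."""
    total = 0
    for digit in base_digits:
        total = (total + int(digit)) * 2
    result = (12 - total % 11) % 11
    return "X" if result == 10 else str(result)


def is_valid_orcid(orcid: str) -> bool:
    """Check length, digit content, and check character of an ORCID iD."""
    compact = orcid.replace("-", "").upper()
    if len(compact) != 16 or not compact[:15].isdigit():
        return False
    return orcid_check_character(compact[:15]) == compact[15]


# Sample iDs published in ORCID's own documentation.
print(is_valid_orcid("0000-0002-1825-0097"))
print(is_valid_orcid("0000-0002-1694-233X"))
# Same iD with the last character altered.
print(is_valid_orcid("0000-0002-1825-0098"))
```

Running the check before export prevents a mistyped iD from being written into the JATS or RDF metadata.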