Ontological Framework for Enabling Free-Form Search in Scientific Discovery Chaitali Gupta, Madhusudhan Govindaraju Grid Computing Research Laboratory SUNY Binghamton 5/5/2019 E-science Microsoft Workshop 2008: Semantics Birds of a Feather Session:
Motivation
Most computer users today do not have to write programs
  Most end users of Grid and scientific data sets should likewise be shielded from low-level details
Web search engines search billions of web pages
  Use Natural Language Processing (NLP) and Information Retrieval (IR) technologies
  Return many links for any given search
XML-based technologies and ontologies can be used to categorize and organize information
  In a machine-readable and understandable manner
  To retrieve specific information from Grid and scientific services
Project Vision
Our vision is that Web semantics can be leveraged to build search-engine-like interfaces even for Grid and scientific application metadata.
  Abstract away the fundamental complexity of XML-based service specifications and toolkits
  Add a search box on portal dashboards
  Automatically convert queries to job description specification formats
Existing query schemes work well for scientists who have a working knowledge of the query system. Our work extends the features provided by these systems with a free-form, query-based interface that provides ease of use for domain scientists without requiring them to learn any specific XML technology or query language details.
Related Work
MDS. WSRF-compliant service to publish and retrieve resource information.
Condor ClassAds. Combines schema, data, and query in a simple but powerful query specification language.
Condor Gangmatching. Overcomes the bilateral matching limitations of ClassAds.
Comparing with SPARQL
SPARQL query:
PREFIX dc: <http://example.org/dc/element/1.1/>
PREFIX ns: <http://example.org/ns#>
SELECT ?machineName ?cpu
WHERE {
  ?x ns:cpu ?cpu .
  FILTER (?cpu > 2.0)
  ?x dc:machine-name ?machineName .
}
Query in the proposed framework:
“All machine names with CPU speed greater than 2.0 GHz”
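To make the comparison concrete, here is a minimal Python sketch of how the free-form query could be reduced to the same constraint that the SPARQL FILTER expresses. It is an illustration only, not the framework's implementation: the regular expression, the `COMPARATORS` table, the function names, and the machine records are all assumptions made up for this example.

```python
import operator
import re

# Hypothetical mapping from comparison phrases to operators (illustration only).
COMPARATORS = {
    "greater than": operator.gt,
    "less than": operator.lt,
    "at least": operator.ge,
}

# Made-up machine records; names and CPU speeds (GHz) are not from the talk.
MACHINES = [
    {"machine-name": "node01", "cpu": 2.4},
    {"machine-name": "node02", "cpu": 1.8},
    {"machine-name": "node03", "cpu": 3.0},
]

# Assumed pattern for one restricted phrasing of a CPU speed constraint.
PATTERN = re.compile(
    r"cpu speed (greater than|less than|at least) (\d+(?:\.\d+)?)\s*ghz",
    re.IGNORECASE,
)


def parse_cpu_constraint(query):
    """Return (comparison function, threshold in GHz), or None if absent."""
    match = PATTERN.search(query)
    if match is None:
        return None
    phrase = match.group(1).lower()
    threshold = float(match.group(2))
    return COMPARATORS[phrase], threshold


def machine_names(query, machines):
    """Return the names of machines satisfying the CPU constraint in the query."""
    constraint = parse_cpu_constraint(query)
    if constraint is None:
        return []
    compare, threshold = constraint
    return [m["machine-name"] for m in machines if compare(m["cpu"], threshold)]


result = machine_names(
    "All machine names with CPU speed greater than 2.0 GHz", MACHINES
)
print(result)
```

The user types only the English sentence; the variable bindings and the filter that SPARQL requires explicitly are derived from it.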
Scope of Free-Form Queries
Processing and acting upon arbitrary English is an extremely challenging problem, actively addressed in the AI community
  Uses many techniques from NLP and the Semantic Web
The scope of our work is therefore limited
  Cannot accept arbitrary free-form queries
  Designed to accept a limited form of English with a vocabulary taken from the ontology
Example queries for New York State Grid (NYSGrid)
List all sites of NYSGrid
All sites of NYSGrid with Xeon processors
Processor configuration of nodes at Binghamton site of NYSGrid
All machine names in NYSGrid with CPU speed greater than 2.0 GHz
Status of job ID 117 running on NYSGrid
Names of 16 free nodes on the NYSGrid with at least 4 GB of memory
List all nodes of NYSGrid having CPU speed greater than 1 GHz and less than 4 GHz
Example ontology model
System Components
WSDL Processor
User Query Interface
Query Processor
Match Processor
Ontology Matcher
Dictionary Matcher: direct matching, stripped matching, hypernyms, hyponyms
Lexicon: how people use words, etc.
Relevance Checker: glossary, input and output parameters of the Web service
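The Dictionary Matcher stages listed above (direct, stripped, hypernym, hyponym) can be sketched as a simple fallback chain. This is a hedged illustration, not the authors' code: the tiny `HYPERNYMS` and `HYPONYMS` tables stand in for a lexical database such as WordNet, their entries are made up, and "stripped matching" is assumed here to mean removing punctuation and a plural suffix.

```python
# Ontology vocabulary (concept names mentioned in the talk).
VOCABULARY = {"cpu", "memory", "storage", "job"}

# Hand-written stand-ins for a lexical database; entries are illustrative.
HYPERNYMS = {"ram": ["memory"], "disk": ["storage"]}  # term -> broader terms
HYPONYMS = {"work": ["job"]}                          # term -> narrower terms


def strip_term(term):
    """Assumed 'stripped' form: lowercase, alphanumeric only, no plural 's'."""
    cleaned = "".join(ch for ch in term.lower() if ch.isalnum())
    if cleaned.endswith("s") and len(cleaned) > 3:
        cleaned = cleaned[:-1]
    return cleaned


def match_term(term):
    """Return (concept, stage) for the first stage that matches, else None."""
    lowered = term.lower()
    if lowered in VOCABULARY:
        return lowered, "direct"
    stripped = strip_term(term)
    if stripped in VOCABULARY:
        return stripped, "stripped"
    for candidate in HYPERNYMS.get(stripped, []):
        if candidate in VOCABULARY:
            return candidate, "hypernym"
    for candidate in HYPONYMS.get(stripped, []):
        if candidate in VOCABULARY:
            return candidate, "hyponym"
    return None


for word in ["memory", "Jobs", "RAM", "work", "banana"]:
    print(word, "->", match_term(word))
```

Ordering the stages from cheapest to most expensive means a term that matches the vocabulary directly never triggers a dictionary lookup.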
Example query that lights up the model
The Ontology Matcher retrieves the ontologies from the ontology repository and matches them with the user query.
Ontologies are built in OWL to store the vocabularies
  Concepts include “CPU”, “memory”, “storage”, “job”, etc.
Jena is used to process OWL models and statements of the form <subject, predicate, object>
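As a rough illustration of matching a user query against ontology statements, the sketch below uses plain Python tuples in place of OWL models processed with Jena. The statements, predicate names, and helper functions are hypothetical and chosen only to show how a query can "light up" part of a model.

```python
from typing import NamedTuple


class Statement(NamedTuple):
    """A simplified stand-in for an ontology statement (triple)."""
    subject: str
    predicate: str
    object: str


# Hypothetical ontology fragment; class and property names are assumptions.
ONTOLOGY = [
    Statement("Site", "hasNode", "Node"),
    Statement("Node", "hasCPU", "CPU"),
    Statement("Node", "hasMemory", "Memory"),
    Statement("Node", "runsJob", "Job"),
    Statement("CPU", "hasSpeed", "Speed"),
]


def tokenize(query):
    """Lowercase the query and split it into a set of words."""
    return {word.strip(".,?").lower() for word in query.split()}


def match_statements(query, ontology):
    """Return statements whose subject or object appears in the query."""
    tokens = tokenize(query)
    return [
        s for s in ontology
        if s.subject.lower() in tokens or s.object.lower() in tokens
    ]


matched = match_statements(
    "All machine names with CPU speed greater than 2.0 GHz", ONTOLOGY
)
for statement in matched:
    print(statement)
```

Only the statements touching "CPU" and "Speed" are selected here; terms such as "machine" that are not ontology concepts in this fragment would fall through to the later matching stages.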
System Components
Queries resolved by the Ontology Matcher alone perform on average 95% to 96% better than those requiring both the Ontology Matcher and the Dictionary Matcher.
Performance of System Components
Execution time taken by the major components
System Components
Recall and precision increase when domain-dependent ontologies are considered.
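For reference, precision and recall follow the standard information retrieval definitions: precision is the fraction of retrieved items that are relevant, and recall is the fraction of relevant items that are retrieved. The sketch below computes both; the service names are hypothetical and the numbers are not results from this work.

```python
def precision_recall(retrieved, relevant):
    """Compute (precision, recall) for a set of retrieved items."""
    retrieved, relevant = set(retrieved), set(relevant)
    hits = len(retrieved & relevant)
    precision = hits / len(retrieved) if retrieved else 0.0
    recall = hits / len(relevant) if relevant else 0.0
    return precision, recall


# Hypothetical services returned for a query versus those actually relevant.
retrieved_services = ["cpu-info", "mem-info", "job-status", "site-list"]
relevant_services = ["cpu-info", "mem-info", "node-config"]

precision, recall = precision_recall(retrieved_services, relevant_services)
print(f"precision={precision:.2f} recall={recall:.2f}")
```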
Research Challenges
Design algorithms to automatically infer the context of user queries and map them to an appropriate set of Grid and scientific services.
Automatically extend and update domain knowledge using Semantic Web techniques and WordNet.
Build a feedback loop for cases that don’t work
Enable construction of simple workflows
  Multiple Grid services may be needed for a query
  Merging results from different services