BAA - Big Mechanism using SIRA Technology
Chuck Rehberg, CTO at Trigent Software and Chief Scientist at Semantic Insights™
The Big Mechanism Vision
Big Mechanisms are causal, explanatory models of complicated systems in which interactions have important causal effects. The collection of Big Data is increasingly automated, but the creation of Big Mechanisms remains a human endeavor made increasingly difficult by the fragmentation and distribution of knowledge. To the extent that we can automate the construction of Big Mechanisms, we can change how science is done. The Big Mechanism program will develop technology to read research abstracts and papers to extract fragments of causal mechanisms, assemble these fragments into more complete causal models, and reason over these models to produce explanations. The domain of the program will be cancer biology with an emphasis on signaling pathways.
- Broad Agency Announcement - Big Mechanism, DARPA-BAA-14-14, January 30, 2014
The Big Challenge
– Valuable information is locked up in research abstracts and papers
– The rate of change of information can be high
– The information is fragmented
– Keyword search and statistical methods for locating the information of interest in documents still leave the human to determine and extract the useful information
– Effectively automating the identification and extraction of useful information requires some level of natural language understanding
– Automating the extraction of useful information from natural language texts (i.e., without human interpreters) requires the application of both domain and world knowledge
About Semantic Insights™
– Who are we? “Semantic Insights” is the R&D division of Trigent Software, Inc.
– What is our Mission? Automate research tasks (faster, better, cheaper)
– Why Semantics? Semantics allows us to operate at the “meaning level”, to “separate the know from the show”
– Why us? Bright People, Proprietary Technology (IP), Passion for Excellence and Track Record of Delivery
The Semantic Insights Research Assistant (SIRA) Project
Mission:
– Research: The SIRA Technology was developed to automate research tasks requiring natural language, domain knowledge, understanding, and reasoning.
– Development: All SIRA-based products must be easy to use, requiring little or no training beyond what the user already understands.
Mission Status:
– Trigent Software has spent nearly 10 years doing fundamental, self-funded research, resulting in 6 patents for a fast, scalable inference engine as well as numerous aspects of natural language processing and natural language understanding
– Our current clients are “early adopters” with a time-critical need for specific detailed information that cannot be met using conventional technology
– We have begun testing products based on this research
– More fundamental research needs to be done to realize its full potential
Today the SIRA Technology can:
1. Semantically understand a statement of your interest expressed in natural language (i.e., your research statements)
2. Use that understanding to read through a vast number of documents
3. Quickly identify the semantically relevant information of interest in a large corpus of natural language text
4. Restructure and report the findings in useful ways, including natural language text
Goal of the SIRA Technology
Goal: Automate the capture and application of domain knowledge, world knowledge and experience to automate the extraction and reporting of useful information from natural language texts.
1. Automate
– To the extent possible, remove the requirement for human guidance or interpretation
2. Knowledge
– Semantic Quanta in relation, representing a model of what is, or is possible (e.g. a robust kind of Ontology)
– Linguistic representation of Semantic Quanta (e.g. a rich dictionary)
– Inferences and Implications
– Experience
3. Extract
– Recognize and map natural language into semantic item clusters
4. Useful information
– Information that is sought, already known, or directly related (skip over the rest)
5. Report
– Translate “report requests” into various aggregations and representations of only the useful information
6. Natural language texts
– Natural language prose formatted as text, pdf, doc, docx, html…
What is required to automate extraction and presentation of useful information from natural language text?
1. Natural Language Understanding
– Natural Language Processing (NLP)
– Determine possible meanings in linguistic and semantic context
– Requires enough domain and world knowledge
2. Adding to the Understanding
– Using the system adds to the same domain and world knowledge already used in Natural Language Understanding
3. Reasoning, Querying and Reporting
– Finding only the required information in domain and world knowledge
– Applying reasoning algorithms to add to the domain and world knowledge
Requirements for Natural Language Understanding
1. The various relationships expressed in each sentence need to be identified and understood in context.
2. The relationships expressed across sentences need to be identified and understood in context.
3. The possibly valid senses of each term in a sentence need to be identified and evaluated in context.
4. Multiple ways of expressing semantically identical relationships need to be recognized.
5. Multiple ways of expressing semantically identical terms need to be recognized.
6. The mapping of natural language expressions to ontological expressions needs to be identified and processed in context.
7. Evidentiary and implication relationships need to be identified and evaluated in context.
8. Context in natural language text needs to be identified and exploited at various levels of scoping.
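Requirements 4 and 5 above can be illustrated with a toy sketch. This is not the SIRA implementation, which the slides do not describe; the pattern table, the synonym dictionary, and the function names `canonical_term` and `to_triple` are all hypothetical, and real text would need far more than three hand-written patterns. The sketch only shows the idea that different surface phrasings and different names for the same entity should map to one semantic relation.

```python
import re

# Hypothetical dictionary: lower-cased surface forms -> canonical term identifiers.
TERM_SYNONYMS = {
    "mek1": "MAP2K1",
    "map2k1": "MAP2K1",
    "erk2": "MAPK1",
    "mapk1": "MAPK1",
}

# Hypothetical patterns: three phrasings of the same underlying relation.
RELATION_PATTERNS = [
    (re.compile(r"^(?P<agent>\w+) phosphorylates (?P<target>\w+)$", re.IGNORECASE),
     "phosphorylates"),
    (re.compile(r"^(?P<target>\w+) is phosphorylated by (?P<agent>\w+)$", re.IGNORECASE),
     "phosphorylates"),
    (re.compile(r"^phosphorylation of (?P<target>\w+) by (?P<agent>\w+)$", re.IGNORECASE),
     "phosphorylates"),
]


def canonical_term(surface):
    """Map a surface form to its canonical identifier, or return it unchanged."""
    return TERM_SYNONYMS.get(surface.lower(), surface)


def to_triple(sentence):
    """Return (agent, relation, target) for a recognized phrasing, else None."""
    text = sentence.strip().rstrip(".")
    for pattern, relation in RELATION_PATTERNS:
        match = pattern.match(text)
        if match:
            return (canonical_term(match["agent"]),
                    relation,
                    canonical_term(match["target"]))
    return None


if __name__ == "__main__":
    for s in ["MEK1 phosphorylates ERK2.",
              "ERK2 is phosphorylated by MEK1.",
              "Phosphorylation of MAPK1 by MAP2K1."]:
        print(to_triple(s))
```

All three sentences reduce to the same triple, which is the property the two requirements ask for; handling context, word senses, and cross-sentence relationships (requirements 1 to 3 and 6 to 8) is outside what this sketch attempts.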
Requirements for Adding to the Understanding
1. An adequate representation of the semantic items, including concepts, relationships, instances, properties and units of measure, needs to be identified and populated.
2. Representations of time and identity need to be defined and populated.
3. An adequate representation of the definition of semantic items, including senses, synonyms, and rich linguistic metadata, needs to be identified and populated.
4. The mapping of natural language expressions to ontological expressions resulting from “machine reading” needs to be exploited to update the ontology (concepts, relationships, instances, properties and units of measure) and dictionary.
5. Evidentiary and implication relationships need to be updated based on analysis of the updated ontology.
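One way to picture requirements 1, 3 and 4 is a small data structure that holds concepts with their synonyms and accepts new assertions produced by machine reading. The sketch below is an assumption for illustration only: the `Concept` and `Ontology` classes and the `add_reading` method are hypothetical names, and the slides do not say how SIRA represents its ontology or dictionary. Time, identity, instances, properties and units of measure are deliberately left out.

```python
from dataclasses import dataclass, field


@dataclass
class Concept:
    """A semantic item with the surface forms (synonyms) that express it."""
    name: str
    synonyms: set = field(default_factory=set)


@dataclass
class Ontology:
    """Hypothetical minimal store: concepts by name plus relation triples."""
    concepts: dict = field(default_factory=dict)   # name -> Concept
    relations: set = field(default_factory=set)    # (subject, relation, object)

    def add_reading(self, subject, relation, obj, surface_forms=None):
        """Merge one machine-read assertion; return True if the relation is new."""
        # Make sure both ends of the relation exist as concepts.
        for name in (subject, obj):
            self.concepts.setdefault(name, Concept(name))
        # Record any newly observed surface forms in the dictionary part.
        for name, forms in (surface_forms or {}).items():
            self.concepts.setdefault(name, Concept(name)).synonyms.update(forms)
        triple = (subject, relation, obj)
        is_new = triple not in self.relations
        self.relations.add(triple)
        return is_new


if __name__ == "__main__":
    onto = Ontology()
    print(onto.add_reading("MAP2K1", "phosphorylates", "MAPK1",
                           surface_forms={"MAP2K1": {"MEK1"}}))
    print(onto.add_reading("MAP2K1", "phosphorylates", "MAPK1",
                           surface_forms={"MAPK1": {"ERK2"}}))
    print(sorted(onto.concepts))
```

The return value distinguishes genuinely new relations from repeats, which is the kind of signal a later step could use when re-evaluating evidentiary relationships (requirement 5).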
Requirements for Reasoning, Querying and Reporting
1. The resulting ontology needs to be made widely available for ad hoc access (e.g. as a linked data endpoint queryable using SPARQL).
2. Researchers also need to be able to query the natural language corpus directly using natural language to discover new relationships not yet known in the ontology.
3. Various reasoning algorithms, including pattern discovery, analogy, and implication, need to be employed over the ontology to discover non-explicit knowledge.
4. Reports in various representations (including fresh and quoted natural language prose) need to be generated from the ontology.
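Requirements 1 and 3 can be sketched in miniature with an in-memory set of triples. The code below is a stand-in, not a SPARQL endpoint and not SIRA's inference engine: `query` is a hypothetical wildcard pattern match, and `infer_transitive` applies a single implication rule (transitivity) by naive forward chaining. The relation name `upstream_of` and the node names A to D are placeholders, and treating a relation as transitive is a modeling assumption.

```python
def query(triples, subject=None, relation=None, obj=None):
    """Return the sorted triples matching the pattern; None is a wildcard."""
    return sorted(
        t for t in triples
        if (subject is None or t[0] == subject)
        and (relation is None or t[1] == relation)
        and (obj is None or t[2] == obj)
    )


def infer_transitive(triples, relation):
    """Forward-chain 'x R y and y R z implies x R z' until nothing new appears."""
    known = set(triples)
    changed = True
    while changed:
        changed = False
        # Snapshot the current edges so the set can grow during the pass.
        edges = [(s, o) for s, r, o in known if r == relation]
        for s1, o1 in edges:
            for s2, o2 in edges:
                if o1 == s2 and (s1, relation, o2) not in known:
                    known.add((s1, relation, o2))
                    changed = True
    return known


if __name__ == "__main__":
    facts = {
        ("A", "upstream_of", "B"),
        ("B", "upstream_of", "C"),
        ("C", "upstream_of", "D"),
    }
    inferred = infer_transitive(facts, "upstream_of")
    # Explicit knowledge holds one fact about A; inference adds two more.
    print(query(facts, subject="A"))
    print(query(inferred, subject="A"))
```

The difference between the two printed results is the non-explicit knowledge that requirement 3 refers to. A production system would more likely expose the triples through a standard query language and use a more efficient rule engine than this quadratic loop.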
SIRA Approach to Natural Language Understanding
SIRA Approach to adding to Understanding
SIRA Approach to Reasoning, Querying and Reporting
BAA – Big Mechanism
SIRA Technology to Big Mechanism “Read”
SIRA Technology to Big Mechanism “Assembly”
SIRA Technology to Big Mechanism “Explanation”