Published by Albert Larose. Modified over 6 years ago.
1
Cyc
Patrick McCauley, CMSC 691S - Semantic Web, Spring 2009
Semantic Web in miniature
2
Why do we need Cyc?
Evolution of rule-based expert systems:
MYCIN - diagnosis of blood infections
DENDRAL - chemical analysis
Rule-based systems are brittle:
Cannot detect typos
Only work within a specific domain
Brittleness is due to a lack of “common sense”
Rule-based systems were big in the 1970s.
Examples of brittleness:
Typos - on a credit loan application, the applicant claims to have been at a job for 20 years but is only 18 years old
An expert system can diagnose “rust spots” on a car as measles
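The loan example above can be sketched as a single common sense check. This is a minimal illustration, not how Cyc represents such knowledge; the `LoanApplication` type, the field names, and the minimum working age are all assumptions made up for this sketch.

```python
from dataclasses import dataclass
from typing import List

# Hypothetical threshold, chosen only for illustration.
MIN_WORKING_AGE = 14


@dataclass
class LoanApplication:
    age: int
    years_at_current_job: int


def common_sense_violations(app: LoanApplication) -> List[str]:
    """Return the problems a narrow, domain-only rule base would not notice."""
    problems = []
    # Nobody can have worked longer than the time since they reached working age.
    if app.years_at_current_job > app.age - MIN_WORKING_AGE:
        problems.append(
            f"{app.years_at_current_job} years at a job is implausible at age {app.age}"
        )
    return problems


suspicious = LoanApplication(age=18, years_at_current_job=20)
plausible = LoanApplication(age=45, years_at_current_job=20)
```

The point of the slide is that such checks cannot be enumerated one by one for every form; the check has to follow from general background knowledge.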
3
What is Cyc?
All apps can benefit from common sense
Begun in 1984 by Doug Lenat
Initial goals:
KB and Ontology Building (pump priming)
NL Understanding / Interactive Dialog
Whether designing an online credit loan form or recording a patient’s medical history, all apps can benefit from common sense.
Lenat realized there was “no free lunch” and took a brute force approach to encoding basic human knowledge.
4
Vocab 101
Knowledge - underlying heuristics that allow us to reason
Data - facts or statements about specific items in the world
Knowledge must be hand-crafted and entered into Cyc.
Data can be gleaned or even referenced from external sources.
5
Where to start?
How much does a system need to know in order to be useful?
What kinds of knowledge are necessary?
How should this knowledge be represented?
These three basic questions guide the design of Cyc.
6
Priming the Pump
Need to encode basic, common sense knowledge
“representing human consensus reality” - facts so obvious it would be insulting to state them to another person
E.g. “You have ten fingers.” Assumes ten is a number and that a person has a specific number of fingers.
E.g. “Cardinals are red.” Assumes that cardinals are a type of bird and that birds have feathers which, in this case, are red. Also assumes “red” is a color.
The first question is tough to answer - it depends on what the system will be asked.
Answer to second question: There is a basic core understanding that is needed by ANY system. It includes facts so basic that another person would be insulted if you asserted them.
7
As data grows, so do inconsistencies
Too much data gives rise to inconsistencies
Microtheory:
Internally consistent data module
Explicitly represented logical context
Cyc knows or is told which MTs should be used to solve a problem
Dracula example:
“Who was Dracula?” “A vampire” - MICROTHEORY = Fictional Literature
“Are vampires real?” “No” - MICROTHEORY = the real world
Subdividing microtheories adds context and helps performance.
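The Dracula example can be sketched as assertions scoped to a context. This is a toy model only: the microtheory names, the predicate names, and the tuple layout are assumptions for illustration and are not Cyc's actual vocabulary or storage format.

```python
# Each microtheory is an internally consistent set of assertions.
# Names below are hypothetical stand-ins for the two contexts on the slide.
KB = {
    "FictionalLiteratureMt": {
        ("isa", "Dracula", "Vampire"),
        ("vampiresExist", True),
    },
    "RealWorldMt": {
        ("vampiresExist", False),
    },
}


def holds(assertion, microtheory):
    """An assertion is only checked against the microtheory chosen for the query."""
    return assertion in KB[microtheory]


# The two contradictory facts never meet, because each query names its context.
dracula_is_vampire = holds(("isa", "Dracula", "Vampire"), "FictionalLiteratureMt")
vampires_are_real = holds(("vampiresExist", True), "RealWorldMt")
```

Restricting a query to one context also shrinks the set of assertions that must be searched, which is the performance benefit the slide mentions.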
8
Cyc KB Topic Map
9
How to represent knowledge?
CycL - augmented FOPC (First Order Predicate Calculus)
Each assertion in the KB carries a “truth value”:
Monotonically false
Default false
Unknown
Default true
Monotonically true
Answer to third question: use CycL.
If a conflict in truth arises, Monotonically wins out (e.g. if something is Default false and Monotonically true, it is true).
Most assertions are Monotonically true or Default true.
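The "Monotonically wins out" rule can be sketched as a small resolution function over the five truth values. The slide only states the monotonic-versus-default case; how this sketch handles two conflicting defaults (it returns Unknown) and two conflicting monotonic values (it raises an error) are assumptions added here, not behavior taken from Cyc.

```python
from enum import Enum


class TruthValue(Enum):
    MONOTONICALLY_FALSE = "monotonically false"
    DEFAULT_FALSE = "default false"
    UNKNOWN = "unknown"
    DEFAULT_TRUE = "default true"
    MONOTONICALLY_TRUE = "monotonically true"


# Higher strength overrides lower strength.
STRENGTH = {
    TruthValue.MONOTONICALLY_FALSE: 2,
    TruthValue.MONOTONICALLY_TRUE: 2,
    TruthValue.DEFAULT_FALSE: 1,
    TruthValue.DEFAULT_TRUE: 1,
    TruthValue.UNKNOWN: 0,
}


def resolve(a: TruthValue, b: TruthValue) -> TruthValue:
    """Combine two truth values asserted for the same sentence."""
    if a is b:
        return a
    if STRENGTH[a] != STRENGTH[b]:
        # Monotonic beats default, and anything beats unknown.
        return a if STRENGTH[a] > STRENGTH[b] else b
    if STRENGTH[a] == 2:
        # Assumption: two opposing monotonic values are a genuine contradiction.
        raise ValueError("contradictory monotonic assertions")
    # Assumption: two opposing defaults leave the question open.
    return TruthValue.UNKNOWN
```

For the slide's own example, `resolve(TruthValue.DEFAULT_FALSE, TruthValue.MONOTONICALLY_TRUE)` yields `TruthValue.MONOTONICALLY_TRUE`.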
10
What about external data?
SKSI - Semantic Knowledge Source Integration
CycL is used to describe external DB columns
SKSI is a tool kit developed by Cycorp.
Helps keep the KB small - just access data that exists elsewhere
Also works well for data that changes rapidly (e.g. stock prices, news headlines)
In effect, CycL can be used to create and execute SQL statements.
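The idea of describing database columns so that a logical query becomes SQL can be sketched as follows. This does not use or imitate the real SKSI tool kit: the `stockPrice` predicate, the `quotes` table, its columns, and both helper functions are hypothetical, and the data lives in an in-memory SQLite database created by the sketch itself.

```python
import sqlite3

# Hypothetical mapping: predicate -> (table, key column, value column).
PREDICATE_TO_COLUMNS = {
    "stockPrice": ("quotes", "ticker", "price"),
}


def build_sql(predicate):
    """Turn a two-place predicate into a parameterized SELECT statement."""
    table, key_col, value_col = PREDICATE_TO_COLUMNS[predicate]
    return f"SELECT {value_col} FROM {table} WHERE {key_col} = ?"


def ask(conn, predicate, subject):
    """Answer a query like (stockPrice ACME ?PRICE) from the external table."""
    rows = conn.execute(build_sql(predicate), (subject,)).fetchall()
    return [row[0] for row in rows]


# Stand-in for an external, rapidly changing data source.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE quotes (ticker TEXT, price REAL)")
conn.execute("INSERT INTO quotes VALUES ('ACME', 12.5)")
```

Because the price is looked up at query time, nothing has to be copied into the KB or kept in sync when the value changes.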
11
Cyc I/O
Communication between applications and external data sources is easy - well-defined interfaces that can be implemented by computers
Communication with users/authors is hard - they need to define new knowledge without being technically savvy
E.g. a historian of ancient Rome knows a lot about Tarquinius Superbus but nothing about regular expressions
12
Natural Language Processing
Extremely difficult, since human speech/language is ridiculously complex
Written text often violates proper grammar, but its meaning is still understood by humans
Fred saw the plane flying over Zurich.
Fred saw the mountains flying over Zurich.
13
CycNL to the rescue!
Lexicon - “contains syntactic and semantic information about English words”
Relationships between English words and Cyc constants are stored
CycNL is divided into three parts - the first part is the Lexicon.
14
CycNL - Syntactic Parser
Uses a phrase-structure grammar with context-free rules
Builds multiple tree structures for each phrase/sentence
However, some trees do not make “syntactic” sense
The Syntactic Parser is the second part.
Some trees do not make sense… this is where the third part comes in.
E.g. in "John saw the light with a telescope", the phrase “with a telescope” can be interpreted in two ways:
John used a telescope to see the light
John saw the light which had a telescope
15
CycNL - Semantic Interpreter
Transforms results into CycL formulas
Result is “pure” CycL
Pure CycL can be used to make queries against the KB or extract data from external sources.
The KB is consulted to see whether telescopes are used as instruments in seeing or whether lights are things that usually have telescopes. Since the second reading causes a contradiction in the KB, it is rejected.
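The filtering step described above can be sketched as checking each candidate reading against a toy knowledge base. The dictionaries, keys, and reading labels here are invented for illustration; the real Semantic Interpreter produces CycL formulas rather than Python dicts.

```python
# Toy KB (assumed facts): what can serve as an instrument for an action,
# and what parts a kind of thing typically has.
INSTRUMENTS_FOR = {"seeing": {"Telescope", "Binoculars"}}
TYPICAL_PARTS = {"Light": set()}

# The two parse trees for "with a telescope", reduced to their key claims.
candidates = [
    {"reading": "instrument", "action": "seeing", "instrument": "Telescope"},
    {"reading": "attachment", "owner": "Light", "part": "Telescope"},
]


def consistent_with_kb(candidate):
    """Keep a reading only if the KB supports the relation it asserts."""
    if candidate["reading"] == "instrument":
        allowed = INSTRUMENTS_FOR.get(candidate["action"], set())
        return candidate["instrument"] in allowed
    allowed = TYPICAL_PARTS.get(candidate["owner"], set())
    return candidate["part"] in allowed


accepted = [c["reading"] for c in candidates if consistent_with_kb(c)]
```

Only the reading in which John uses the telescope survives, matching the outcome described on the slide.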
16
How is this useful to Humans?
Ambient Research Assistant
Flexibility and ease of communication are key
Must be capable of “learning”:
Deciding what facts to learn
Learning those facts
Learning of rules
Generalizing rules
Testing and revision
We now have vast quantities of data and complex relationships, so how do we find what we really want? We need to create an Assistant (Agent) that is capable of learning and anticipating needs.
1. Deciding what facts to learn. An assistant system must reason about what knowledge gaps would be most cost-effective to fill in any given context. If a researcher is considering submitting a paper to an upcoming conference, finding submission dates and contact information is likely to be more useful than organizing older work, and should be a higher priority task.
2. Learning those facts. The factual gaps should be filled, from available documentation, online sources, and/or communication with the scientist being assisted. In the aforementioned example, the system should set out to learn any missing facts by appropriately querying all its available sources, both online ones and people, starting with the conference web site or call-for-papers and progressing to information that requires some knowledge of the research in question. The submission information may depend on the track to which the paper is being submitted, which requires knowledge of the research topic.
3. Learning of rules. Once knowledge is acquired, it is possible to hypothesize general rules. If several conferences have been identified, an assistant might correlate information about each of them and conclude that conferences in some broad field (e.g. machine learning) are often of interest, or that knowing submission dates is often useful. Such a rule can then guide the selection and prioritization of tasks.
4. Generalizing rules. Carrying this example through, an effective assistant might learn from one or more identified rules that, for some particular user or researcher, learning and then tracking dates by which some particular action must occur is valuable.
5. Testing and revision. The rules, especially the generalized rules, will need to be tested independently of how they were produced. For example, when a general rule about tracking dates is hypothesized, a system might discover after experimentation that it is less helpful to track and remind a user of recurring dates, such as a weekly report that must be made to an overview body. This discovery would force revision (tightening) of the generalized rule.
17
Benefits of Assistant
Capable of searching much faster than humans
Availability - supersedes the 9-to-5
18
“Truly Intelligent” Assistant
Plan Recognition
Learning
NL
19
Acknowledgements
CYC Website: http://www.cyc.com/
CYC: A Large-Scale Investment in Knowledge Infrastructure
Mapping Ontologies into Cyc
Common Sense Reasoning – From Cyc to Intelligent Assistant
24
CycL is Cyc’s language
"Bill Clinton belongs to the collection of U.S. presidents."
(#$isa #$BillClinton #$UnitedStatesPresident)
"All trees are plants."
(#$genls #$Tree-ThePlant #$Plant)
"Paris is the capital of France."
(#$capitalCity #$France #$Paris)
A fact about sets:
(#$implies (#$and (#$isa ?OBJ ?SUBSET) (#$genls ?SUBSET ?SUPERSET)) (#$isa ?OBJ ?SUPERSET))
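The "fact about sets" rule above can be read as: if an object is an instance of a collection, and that collection is a subcollection of another, then the object is also an instance of the larger one. A minimal sketch of applying that rule until nothing new is derived follows. The `#$` prefixes are dropped, the function name is made up for this sketch, and the `UnitedStatesPresident` to `Person` link is an assumed fact added so the rule has something to derive.

```python
# Assertions as (instance, collection) and (subcollection, supercollection) pairs.
ISA = {("BillClinton", "UnitedStatesPresident")}
GENLS = {
    ("UnitedStatesPresident", "Person"),  # assumption, not from the slide
    ("Tree-ThePlant", "Plant"),
}


def infer_isa(isa, genls):
    """Apply (isa OBJ SUBSET) and (genls SUBSET SUPERSET) => (isa OBJ SUPERSET)
    repeatedly until no new isa assertion appears."""
    derived = set(isa)
    changed = True
    while changed:
        changed = False
        for obj, subset in list(derived):
            for sub, superset in genls:
                if sub == subset and (obj, superset) not in derived:
                    derived.add((obj, superset))
                    changed = True
    return derived


all_isa = infer_isa(ISA, GENLS)
```

Running the rule to a fixed point means chains of `genls` links are followed as well, without needing a separate transitivity rule.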