Dr. Douglas B. Lenat, 7718 Wood Hollow Drive, Austin, TX 78731 Phone: 512-342-4001 Computers versus Common Sense C YC: Software that.

Presentation transcript:

Dr. Douglas B. Lenat, 7718 Wood Hollow Drive, Austin, TX. Computers versus Common Sense. CYC: Software that partially understands. The impact of that on the AIM dream. May 4, 2009

ELIZA (DOCTOR), 1965, Joe Weizenbaum, MIT
- Carl Rogers advocated reflection
- Patient: “I swear a lot.” “How do you feel about the fact that you swear a lot?”
- Counted a lot on anthropomorphism
- Counted on patient-to-doctor respect/awe/…
- Patient: “My dog’s mother died recently.” “Tell me more about your mother.”
Veneer of intelligence, not intelligence

ALICE (40 years post-Eliza)
JUDGE11 [9:26:0] I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS
PROGRAM [9:26:0] Do you think your plan will succeed?
JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
PROGRAM [9:26:16] WHAT COLOR IS a blue car
JUDGE11 [9:26:35] WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE
PROGRAM [9:26:35] Where it belongs. Try searching the Web.

2009: Software is still Brittle “How old was Martin van Buren when he was elected President of the U.S.?” “Is the Space Needle taller than the Eiffel Tower?”

Natural Language Understanding requires having lots of knowledge
1. The pen is in the box. The box is in the pen.
2. The police watched the demonstrators because they feared violence. The police watched the demonstrators because they advocated violence.
3. Mary and Sue are sisters. Mary and Sue are mothers.
4. Every American has a mother. Every American has a president.
5. John saw his brother skiing on TV. The fool didn’t have a coat on! John saw his brother skiing on TV. The fool didn’t recognize him!

7. “…include all the re-do CABG procedures utilizing ITA and SVG in 1991”. “And” usually does mean “and”. But in this query, “and” really must mean “or”. Medical knowledge, not grammar, disambiguates this: a single CABG will not have both an ITA and a SVG.
8. “…that the tumor cells are stopping dividing or dying…” Do they mean “stopping dividing or stopping dying”? Of course not, but in 16 of 30 randomly selected syntactically similar constructions from the coordination (i.e., the wider scope of the modifier, in this case the word “stopping”) was the intended meaning. In each case, only one choice “makes sense” (is consistent with medical knowledge and common sense).
9. “Adult patients who underwent MAZE III with or without Mitral Valve Repair or Replacements.” Is the second half of that query just a waste of space? Discourse pragmatics says no, the physician must have had some reason for saying that. Medical knowledge provides a plausible interpretation: “Adult patients who underwent MAZE III with no concomitant procedures other than Mitral Valve Repair or Replacements”

The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
Okay, so let’s tell the computer the same sorts of things that human beings know about cars, and colors, heights, movies, time, driving to a place, etc. → all the other stuff that everybody knows.

The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
MicrowaveOven is a type of Kitchen-Appliance
Dishwasher is a type of Kitchen-Appliance

Rthagide-disjaks is a type of Kitchen-Appliance
Gracinimumples is a type of Kitchen-Appliance
Rthagide-disjaks alorxes Vorawnistz.
Gracinimumples alorxes Vorawnistz and Buzqa.
Buzqa is a Thwarn and supplied through Epluns.
You can’t use X if it alorxes Y but lacks any Y
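
The nonsense-word slide shows how such facts look to a program: the rule still fires, even though the symbols mean nothing to it. A minimal Python sketch of that single rule, using the slide's invented terms. The triple encoding and the function name `can_use` are assumptions made for illustration; they are not Cyc's representation.

```python
# Facts from the slide, encoded as (subject, relation, object) triples.
# This encoding is a hypothetical illustration, not CycL.
FACTS = {
    ("Rthagide-disjaks", "isa", "Kitchen-Appliance"),
    ("Gracinimumples", "isa", "Kitchen-Appliance"),
    ("Rthagide-disjaks", "alorxes", "Vorawnistz"),
    ("Gracinimumples", "alorxes", "Vorawnistz"),
    ("Gracinimumples", "alorxes", "Buzqa"),
}


def can_use(thing, available):
    """Rule from the slide: you can't use X if it alorxes Y but lacks any Y."""
    needed = {obj for (subj, rel, obj) in FACTS
              if subj == thing and rel == "alorxes"}
    return needed.issubset(set(available))


print(can_use("Rthagide-disjaks", ["Vorawnistz"]))  # True
print(can_use("Gracinimumples", ["Vorawnistz"]))    # False: no Buzqa
```

The program answers correctly without any idea what "alorxes" means, which is the slide's point: each added rule narrows the possible interpretations of the symbols.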

The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
Eventually, after writing millions of these rules, the system knows as much about pipes, liquids, water, electricity, microwave ovens, dishwashers, cars, colors, movies, heights, etc. as you and I do.
Ultimately, there is just 1 interpretation of that model, and it corresponds to the real world.
etc. → all the other stuff that everybody knows.
Long before that, incrementally, the system gains competence and trustworthiness

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
–The typical bird has 1 beak, 1 heart, lots of feathers,…
–Hearts are internal organs; feathers are external protrusions
–Most vehicles are steered by an awake, sane, adult,… human
–Tangible objects can’t be in 2 (disjoint) places at once
–Badly injuring a child is much worse than killing a dog
–Causes temporally precede (i.e., start before) their effects
–A stabbing requires 2 cotemporal and proximate actors
–etc.

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
-Each of these represented in formal logic
-Info. about a set of hundreds of thousands of terms
-Language-independent
[Diagram labels: Penitentiary; EnglishWord-Plume; EnglishWord-Pen; FrenchWord-Plume; …; WritingPen; BirdFeather; Authoring; ChineseWordForWritingPen]

Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
-Each of these represented in formal logic
-Info. about a set of hundreds of thousands of terms
An inference engine that produces the same sorts of inferences from those that people would.
Interfaces so the system can communicate with people, data bases, spreadsheets, websites, etc.

What Needs to be Shared? (Sem. Web)
- bits/bytes/streams/network…
- alphabet, special characters,…
- words, morphological variants,…
- syntactic meta-level markups (HTML)
- semantic meta-level markups (SGML, XML)
- content (logical representation of doc/page/...)
- context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)

How formalized knowledge helps search: find information by inference (+KB)
Query: “Someone smiling”
Caption: “A man helping his daughter take her first step”
- When you become happy, you smile.
- You become happy when someone you love accomplishes a milestone.
- Taking one’s first step is a milestone.
- Parents love their children.
(ForAll ?P (ForAll ?C (implies (and (isa ?P Person) (children ?P ?C)) (loves ?P ?C))))
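
The four common-sense statements on this slide are enough to connect the caption to the query. A minimal forward-chaining sketch in Python follows. The triple encoding, the constant names (`Man1`, `Daughter1`, `FirstStep`), and the relation names other than `isa`, `children`, and `loves` are hypothetical; a real Cyc inference would run over CycL assertions instead.

```python
# Hypothetical encoding of the caption
# "A man helping his daughter take her first step".
caption_facts = {
    ("Man1", "isa", "Person"),
    ("Man1", "children", "Daughter1"),
    ("Daughter1", "accomplishes", "FirstStep"),
    ("FirstStep", "isa", "Milestone"),  # taking one's first step is a milestone
}


def forward_chain(facts):
    """Apply the slide's three remaining rules until nothing new is derived."""
    facts = set(facts)
    while True:
        new = set()
        for (subj, rel, obj) in facts:
            # Parents love their children.
            if rel == "children" and (subj, "isa", "Person") in facts:
                new.add((subj, "loves", obj))
            # You become happy when someone you love accomplishes a milestone.
            if rel == "loves":
                for (who, rel2, what) in facts:
                    if (who == obj and rel2 == "accomplishes"
                            and (what, "isa", "Milestone") in facts):
                        new.add((subj, "feels", "Happy"))
            # When you become happy, you smile.
            if rel == "feels" and obj == "Happy":
                new.add((subj, "does", "Smiling"))
        if new.issubset(facts):
            return facts
        facts |= new


def matches_someone_smiling(facts):
    """True if the query "Someone smiling" is entailed by the facts."""
    return any(rel == "does" and obj == "Smiling"
               for (_, rel, obj) in forward_chain(facts))


print(matches_someone_smiling(caption_facts))  # True
```

Keyword matching finds nothing in common between "smiling" and the caption; the match appears only after three inference steps.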

Query: “Show me pictures of strong and adventurous people” Caption: “A man climbing a rock face” find information by inference (+KB) How formalized knowledge helps search

How formalized knowledge helps search: find information by inference (+KB)
Text document
Query: “Government buildings damaged in terrorist events in Beirut between 1990 and 2001”
Document: “1993 pipe bombing of France’s embassy in Lebanon.”

How can our programs be intelligent, not merely have the veneer of it?
ANSWER: By having a large corpus of knowledge, spanning the gamut from specific domain-dependent all the way up to general common sense.
The computer needs to be able to apply the knowledge, not just store some English gloss
–Represent it formally (predicate calculus), and apply logic
–Represent it numerically, and apply mathematics/statistics
And after all that: Be compelling to the human deciding

And after all that: Be compelling to the human deciding
Magic tricks
–“How do they do that?!” → “How was I ever fooled by that?!”
Efficacy of punishment vs reward
–“Punishment is more effective, and the statistics back me up”
Clinical decision-making (by doctors and by patients)
–“Because 0.814” versus “Because ”
Organ donation in European countries:
–Why is it so often 15%/85% or 85%/15%? [Answer: Because when you apply for a driver’s license in some countries, you have to check a box to “opt in”; in others, you have to check a box to “opt out”; and in the U.S. and most European countries at least, 85% of the people don’t know what they should do, even though it’s an emotional, serious choice, and end up just leaving it unchecked.]
One Good Explanation is worth 20 points of IQ

Reflection Framing Effect
Philadelphia is preparing for a Legionnaire’s Disease outbreak expected to kill 600 people today. Two alternative programs to combat the disease have been proposed. The consequences of each program are as follows:
- If Program A is adopted, 200 people will be saved. (72%)
- If Program B is adopted, there is a 1/3 chance that all 600 will be saved, and a 2/3 chance that no lives will be saved. (28%)
- If Program A’ is adopted, 400 people will die. (22%)
- If Program B’ is adopted, there is a 2/3 chance that 600 will die, and a 1/3 chance that no one will die. (78%)
For more information, see: Kahneman, D. and Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39,
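
A short check, using only the numbers on the slide, that all four programs describe the same expected outcome, so the swing in preferences comes from the framing alone. The variable names are illustrative.

```python
from fractions import Fraction

AT_RISK = 600

# "Saved" framing
saved_a = Fraction(200)
saved_b = Fraction(1, 3) * AT_RISK + Fraction(2, 3) * 0

# "Die" framing, converted to expected survivors for comparison
deaths_a_prime = Fraction(400)
deaths_b_prime = Fraction(2, 3) * AT_RISK + Fraction(1, 3) * 0

print(saved_a, saved_b)                                    # 200 200
print(AT_RISK - deaths_a_prime, AT_RISK - deaths_b_prime)  # 200 200
```

A and A’ are the same certain outcome, and B and B’ are the same gamble with the same expected value.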

Conjunction Fallacy
A health survey was conducted in a representative sample of adult males in Chicago of all ages and occupations. Mr. F was included in the sample. He was selected by random chance from the list of participants. Please rank the following statements in terms of which is most likely to be true of Mr. F. (1=most likely to be true, 6=least likely)
____ Mr. F smokes more than 1 cigarette per day on average.
____ Mr. F has had one or more heart attacks. [A]
____ Mr. F had a flu shot this year.
____ Mr. F eats red meat at least once per week.
____ Mr. F has had one or more heart attacks and he is over 55 years old. [A and B]
____ Mr. F never flosses his teeth.
% rated “A and B” more likely than A
For more information, see: Tversky, A. and Kahneman, D. (1983). Extensional vs. intuitive reasoning: The conjunction fallacy in probability judgment. Psych. Rev. 90,
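
Whatever the other rankings, probability theory fixes the order of the two heart-attack statements. Writing A for “Mr. F has had one or more heart attacks” and B for “he is over 55 years old”, the product rule gives:

```latex
P(A \wedge B) \;=\; P(A)\,P(B \mid A) \;\le\; P(A),
\qquad \text{since } 0 \le P(B \mid A) \le 1 .
```

So rating “A and B” above A is an error regardless of the actual frequencies in the survey.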

Why there is a need for meta-logical elements (rationale and POV) to convince decision-makers Early hominids: pre-rational decision-makers Later hominids: usually rational Even later hominids: almost always rational

A 67-year-old woman suffering from ICM with elevated bilirubin, history of diabetes, body mass index of 39.5, NYHA function class III, mitral valve regurgitation grade (MVRG) of 2+, and no aortic valve regurgitation (AVR) is assigned to CABG surgery. RF+Cyc is consulted and the RF (random forest statistical reasoning) component, having been trained on a large database, identifies CABG alone as the most likely treatment option, citing an odds ratio of 2.6 over the next most favorable treatment, CABG+MVA. As rationale, the Cyc (AI) component observes that the low MVRG is atypical of MVA, which is a surgical procedure typically reserved for patients with severe mitral regurgitation, and thus the simpler CABG procedure is preferred.
However, an intraoperative transesophageal echocardiogram (TEE) suggests MVRG is 3+. Based on this, the surgical team overrides the initial diagnosis without consultation, opting instead for CABG+MVA. The patient dies 3 days later from complications due to surgery.
In this setting, RF+Cyc, if consulted, could have alerted the heart team to additional data that might have swayed their decision, thus potentially saving a life. RF+Cyc would have noted that while an MVRG of 3+ is consistent with CABG+MVA, the odds favoring CABG only marginally decrease from 2.6:1 to 1.7:1 when MVRG is upstaged for this patient from 2+ to 3+, and that surgery under CABG alone offers a 20% increase in median survival compared to CABG+MVA. RF+Cyc could further argue that intraoperative MVRG can falsely appear to be upstaged due to altered hemodynamics in anesthetized patients. A Cyc-assisted semantic search of the recent literature reveals that transthoracic echocardiograms (TTE) more reliably reflect the degree of mitral regurgitation than TEE. That (+co-morbidities) argues for just CABG.
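
To see why the drop from 2.6:1 to 1.7:1 still favors CABG alone, the odds can be converted to probabilities. This sketch assumes the odds compare just two options, CABG alone versus CABG+MVA; that reading is an assumption, since the slide does not say how the RF component normalizes across treatments.

```python
def odds_to_probability(odds_for, odds_against=1.0):
    """Convert odds of odds_for:odds_against into a probability."""
    return odds_for / (odds_for + odds_against)


before = odds_to_probability(2.6)  # MVRG 2+
after = odds_to_probability(1.7)   # MVRG upstaged to 3+

print(f"{before:.1%}")  # 72.2%
print(f"{after:.1%}")   # 63.0%
```

Under that assumption, CABG alone remains the more likely choice even after the intraoperative finding.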

Pitfalls of Semantic Technology
Ignorance-based:
- A small theory size (#terms, instances, rules)
- Static KB (massively tuned, optimized, cached ahead of time)
- Simple assertions (SAT constraints; propositional calculus; Horn clause logic; Description Logic; first order logic)
- 1 global context (no contradictions, tiny domain, simplified world)

Applying Cyc
Cyc is a power source, not a single application. Like oil, electricity, telephony, computers,…
Cyc can spawn and sustain a knowledge utility industry. It can cost-effectively underlie almost all apps. (Provide a common-sense layer to reduce brittleness when faced with unexpected inputs/situations)
To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules

"What sequences of events could lead to the destruction of Hoover Dam?"
"Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?"
[Architecture diagram labels: Cycorp Tools For: Ontology-Building, -Browsing, -Editing, & Fact/Rule Entry; Domain Experts; Scenario Generation / Scenario Generator; Explanation Generation / Explanation Generator; Query Formulation / Query Formulator; Others’/GOTS Analysis and Collaboration Components; AKB, The Analyst’s Knowledge Base; Relational DB “projection” of the AKB; CT Analyst; Terrorism Knowledge; General Knowledge; OWL &]

A more recent example
“What major US cities are particularly vulnerable to an anthrax attack?”
The answer is logically implied by data dispersed through several sources: USGS GNIS DB; AMVA KB; RAND R; UN FAO DB; DTRA CATS DB

“What major US cities are particularly vulnerable to an anthrax attack?”
“major US city” → ?C is a U.S. City with >1M population
“particularly vulnerable to an anthrax attack” →
–the current ambient temperature at ?C is above freezing, and
–?C has more than 100 people for each hospital bed, and
–the number of anthrax host animals near ?C exceeds 100k
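
Once the English phrases are unpacked this way, the query becomes a conjunction of ordinary tests over data that would come from the different sources. A minimal Python sketch follows; the `CityRecord` type, its field names, and the two sample cities are hypothetical, and the thresholds are the ones on the slide.

```python
from dataclasses import dataclass


@dataclass
class CityRecord:
    # Hypothetical merged view of facts that live in separate databases.
    name: str
    population: int
    temperature_c: float
    hospital_beds: int
    anthrax_host_animals_nearby: int


def is_major_us_city(c):
    return c.population > 1_000_000


def is_particularly_vulnerable(c):
    return (c.temperature_c > 0                           # above freezing
            and c.population > 100 * c.hospital_beds      # >100 people per bed
            and c.anthrax_host_animals_nearby > 100_000)  # host animals nearby


def answer(cities):
    return [c.name for c in cities
            if is_major_us_city(c) and is_particularly_vulnerable(c)]


# Made-up records, for illustration only.
cities = [
    CityRecord("CityA", 1_500_000, 12.0, 9_000, 250_000),
    CityRecord("CityB", 2_000_000, -5.0, 9_000, 250_000),
]
print(answer(cities))  # ['CityA']
```

The hard part, which the following slides address, is telling the system where each of those fields lives and what it means.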

USGS GNIS DB: The Geographic Names Information System (GNIS) DB maintained by the US Geological Survey (USGS).
state | name | type | county | state_fips | primary_lat | primary_long | elevation | population | status
TX | Dallas | ppl | Dallas | 48 | | | 463 | | BGN
MN | Hennepin County | civil | Hennepin | 27 | | | 0 | |
CA | Sacramento County | civil | Sacramento | 6 | | | 0 | |
AZ | Phoenix | ppl | Maricopa | 4 | | | 1072 | | BGN

USGS GNIS DB
So how do we explain to our system that:
- row 1 of that table is “about” the city of Dallas, TX
- the population field of that table contains the number of inhabitants of the city that that row is “about”
- here is exactly how to access tuples of that database
- that access will be fast, accurate, recent, complete

USGS GNIS DB: the population field of that table contains the number of inhabitants of the city that that row is “about”
We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.:
(fieldDecoding Usgs-Gnis-LS ?x (TheFieldCalled "population") (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))

USGS GNIS DB: how to access tuples of that database
We provide all the information needed for a JDBC connection script. We assert, in the context (MappingMtFn Usgs-KS), all of these:
(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS Usgs-KS "sksi")
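
These assertions carry what a JDBC connection needs. The sketch below assembles a standard PostgreSQL JDBC URL (`jdbc:postgresql://host:port/database`) from them. Using the `structuredKnowledgeSourceName` value as the database name is an assumption; the slide does not say which assertion supplies it, and the dictionary and function names are illustrative.

```python
# Values copied from the slide's assertions about Usgs-KS.
sks_assertions = {
    "passwordForSKS": "geografy",
    "portNumberForSKS": 4032,
    "serverOfSKS": "sksi.cyc.com",
    "structuredKnowledgeSourceName": "usgs",  # assumed to be the database name
    "subProtocolForSKS": "postgresql",
    "userNameForSKS": "sksi",
}


def jdbc_url(a):
    """Build jdbc:<subprotocol>://<server>:<port>/<database>."""
    return (f"jdbc:{a['subProtocolForSKS']}://{a['serverOfSKS']}"
            f":{a['portNumberForSKS']}/{a['structuredKnowledgeSourceName']}")


def jdbc_credentials(a):
    """User name and password are passed separately from the URL."""
    return {"user": a["userNameForSKS"], "password": a["passwordForSKS"]}


print(jdbc_url(sks_assertions))  # jdbc:postgresql://sksi.cyc.com:4032/usgs
```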

USGS GNIS DB: that access will be fast, accurate, recent, complete
We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc. We assert, in the context (MappingMtFn Usgs-KS):
(schemaCompleteExtentKnownForValueTypeInArg Usgs-Gnis-LS USCity numberOfInhabitants 1)

USGS GNIS DB: that access will be fast, accurate, recent, complete
We assert, in the context (MappingMtFn Usgs-KS):
(resultSetCardinality Usgs-Gnis-PS (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state")) TheEmptySet 60.0)
(resultSetCardinality Usgs-Gnis-PS (TheSet (PhysicalFieldFn Usgs-Gnis-PS "primary_long") (PhysicalFieldFn Usgs-Gnis-PS "primary_lat") (PhysicalFieldFn Usgs-Gnis-PS "name")) (TheSet (PhysicalFieldFn Usgs-Gnis-PS "county") (PhysicalFieldFn Usgs-Gnis-PS "state")) )

“What major US cities are particularly vulnerable to an anthrax attack?”
“major US city” → U.S. City with >1M population
“particularly vulnerable to an anthrax attack” →
–the current ambient temperature at ?C is above freezing, and
–?C has more than 100 people for each hospital bed, and
–the number of anthrax host animals near ?C exceeds 100k
Cyc knows that pullets are chickens, so don’t add those two numbers together!

“In what countries bordering Pakistan are there members of the ANVC?”
Even simple queries often require 1-4 reasoning steps
Each answer that CAE finds for this generally involves a 1-4-step (not 0-step) argument (reasoning chain). E.g., for the answer “India”, the justification is:
- According to the web site ‘Inside Terrorism’, the ANVC’s headquarters has been in Garo Hills, India from the beginning of January, 1996 through today.
- If an organization’s HQ is in place x, then there are members of that organization in place x.
- If someone is in place x, they are in every super-region of x.
- India borders Pakistan.
Don’t include Prior & Tacit Knowledge
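
The justification for “India” can be traced as a short chain: one sourced fact, two general rules, and one geographic fact. A minimal Python sketch follows; the dictionaries and function names are illustrative stand-ins for KB assertions, not the CAE implementation.

```python
# The one sourced fact and the geographic background facts from the slide.
HEADQUARTERS = {"ANVC": "Garo Hills"}
SUPER_REGION = {"Garo Hills": "India"}  # direct super-region links
BORDERS = {("India", "Pakistan")}


def places_with_members(org):
    """HQ in place x implies members in x, and in every super-region of x."""
    places = []
    place = HEADQUARTERS.get(org)
    while place is not None:
        places.append(place)
        place = SUPER_REGION.get(place)
    return places


def bordering_places_with_members(org, country):
    """Places with members of org that border the given country."""
    return [p for p in places_with_members(org)
            if (p, country) in BORDERS or (country, p) in BORDERS]


print(places_with_members("ANVC"))                        # ['Garo Hills', 'India']
print(bordering_places_with_members("ANVC", "Pakistan"))  # ['India']
```

No single stored fact says “the ANVC has members in a country bordering Pakistan”; the answer exists only as the end of the chain.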

The Cyc Knowledge Base
[Diagram: map of the Cyc ontology. Upper levels: Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing. Topic areas: Paths; Sets, Relations; Logic, Math; Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion, Perception, Belief; Human Behavior & Actions; Products, Devices; Conceptual Works; Vehicles, Buildings, Weapons; Mechanical & Electrical Devices; Software, Literature, Works of Art; Language; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations, Governments, Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics, Warfare; Professions, Occupations; Purchasing, Shopping; Travel, Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports, Recreation, Entertainment; Artifacts; Movement; State Change, Dynamics; Materials, Parts, Statics; Physical Agents; Borders, Geometry; Events, Scripts; Spatial Paths; Actors, Actions; Plans, Goals; Time; Agents; Space; Physical Objects; Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology; Natural Geography; Earth & Solar System; Political Geography; Weather. Lower levels: General Knowledge about Various Domains; Specific data, facts, and observations.]
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
Represented in: First Order Logic; Higher Order Logic; Context Logic; Micro-theories
These numbers are not a good way to really get a handle on the Cyc KB

The Cyc Knowledge Base
“Is any seagull also a moose?” If Cyc knows 10,000 kinds of animals, it should be able to answer 100,000,000 queries like that.
Option 1: Add those 100M assertions to the KB
Option 2: Add 50M disjointWith assertions instead
Option 3: Add about 10k Linnaean taxonomy assertions to the KB, plus one extra assertion: (isa BiologicalTaxon SiblingDisjointCollectionType)
If taxons A and B are not explicitly known (via those 10k assertions) to be in a subset/superset relationship, then assume that they are disjoint.
A few hundred such SiblingDisjoint assertions take the place of over 6 billion disjointness ones… which in turn take the place of 100 trillion ones like this: (not (isa Cher Moose))
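
The counts for Options 1 and 2 follow from simple combinatorics, and Option 3's rule is easy to state as code. The sketch below checks the arithmetic and applies the sibling-disjointness default to a toy taxonomy; the four-entry `PARENT` map is hypothetical and stands in for the roughly 10k Linnaean assertions.

```python
from math import comb

kinds = 10_000
ordered_questions = kinds * kinds  # Option 1: one answer per (A, B) pair
unordered_pairs = comb(kinds, 2)   # Option 2: one disjointWith per pair
print(ordered_questions)           # 100000000
print(unordered_pairs)             # 49995000, i.e. about 50M

# Option 3: a toy taxonomy plus the sibling-disjoint default rule.
PARENT = {"Seagull": "Bird", "Moose": "Mammal",
          "Bird": "Animal", "Mammal": "Animal"}


def ancestors(taxon):
    """All taxa above the given one, following explicit taxonomy links."""
    found = set()
    while taxon in PARENT:
        taxon = PARENT[taxon]
        found.add(taxon)
    return found


def disjoint(a, b):
    """Assume disjoint unless a subset/superset link is explicitly known."""
    return a != b and a not in ancestors(b) and b not in ancestors(a)


print(disjoint("Seagull", "Moose"))  # True: no seagull is a moose
print(disjoint("Seagull", "Bird"))   # False: every seagull is a bird
```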

There is no one correct monolithic ontology.
E.g., Cyc’s 5M axioms are divided into thousands of contexts by: granularity, topic, culture, geospatial place, time,...
There is a correct monolithic reasoning mechanism, but it is so deadly slow that we never call on it unless we have to
E.g., the Cyc inference engine is a community of 1000 “agents” that attack every problem and, recursively, every subproblem (subgoal). One of these 1000 is a general theorem prover; the others have special-purpose data structures/algorithms to handle the most important, most common cases, very fast.
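
The “community of agents” design can be pictured as a dispatcher that offers each problem to fast special-purpose modules first and falls back to the general prover only when none applies. The sketch below is a toy illustration of that control structure under assumed names (`subclass_module`, `general_prover`, `ask`); it is not how the Cyc engine is implemented.

```python
def subclass_module(query, kb):
    """Specialized module: answers ("subclass", a, b) by walking parent links."""
    if query[0] != "subclass":
        return None  # not this module's kind of problem
    _, a, b = query
    while a is not None:
        if a == b:
            return True
        a = kb["parent"].get(a)
    return False


def general_prover(query, kb):
    """Stand-in for the slow general theorem prover: a plain fact lookup."""
    return query in kb["facts"]


FAST_MODULES = [subclass_module]


def ask(query, kb):
    """Try the fast modules first; use the general prover as a last resort."""
    for module in FAST_MODULES:
        result = module(query, kb)
        if result is not None:
            return result, module.__name__
    return general_prover(query, kb), "general_prover"


kb = {"parent": {"Seagull": "Bird", "Bird": "Animal"},
      "facts": {("borders", "India", "Pakistan")}}
print(ask(("subclass", "Seagull", "Animal"), kb))  # (True, 'subclass_module')
print(ask(("borders", "India", "Pakistan"), kb))   # (True, 'general_prover')
```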

What factors argue the conclusion that ?
For:
- ETA often executes attacks near national elections
- ETA has performed multi-target coordinated attacks
- Over the past 30 years, ETA performed 75% of all terrorist attacks in Spain
- Over the past 30 years, 98% of all terrorist attacks in Spain were performed by Spain-based groups, and ETA is a Spain-based group.
Against:
- ETA warns (a few minutes ahead of time) of attacks that would result in a high number of civilian casualties, to prevent them. There was no such warning prior to this attack.
- ETA generally takes responsibility for its attacks, and it did not do so this time.
- ETA has never been known to falsely deny responsibility for an attack, and it did deny responsibility for this attack.

Building Cyc qua Engineering Task
[Figure labels: amount known; rate of learning; learning by discovery; learning via natural language; codify & enter each piece of knowledge, by hand; Frontier of human knowledge; today]
CYC: 900 person-years; 23 realtime years; $90 million

Temporal Relations
37 Relations Between Temporal Things: temporalBoundsIntersect, temporallyIntersects, startsAfterStartingOf, endsAfterEndingOf, startingDate, temporallyContains, temporallyCooriginating, temporalBoundsContain, temporalBoundsIdentical, startsDuring, overlapsStart, startingPoint, simultaneousWith, after

Temporal Relations
“Ariel Sharon was in Jerusalem during 2005 with granularity calendar-week”
“Condoleezza Rice made a ten-day trip to Jerusalem in February of 2005”
Both of them were in Jerusalem during February 2005

Lessons Learned
- Rather than struggling to reason in natural language sentences, use logic as the representation language.
- Most knowledge is default; reason by argumentation
- Rather than striving in vain for a single fast inference engine, use a suite of heuristic modules that each handles a class of commonly-occurring problems very fast. [EL → HL split]
- Some of these HL modules act as tacticians (meta-reasoners) to guide the reasoning; a few are strategists (meta-meta-reasoners)
- Bridging the knowledge gap: do the “intermediate theories.”
- Probabilities / certainty factors are useful (risk: overdependence)
- Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally-consistent contexts

Each assertion should be situated in a context: in a region of context-space
- We identified 12 dimensions of mt-space
- We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions; and
- We have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts
–E.g., if P is true in C1, and P=>Q is true in C2, in what context C3 can Q be validly concluded?
The 12 dimensions: Anthropacity; Time; GeoLocation; TypeOfPlace; TypeOfTime; Culture; Sophistication/Security; Topic; Granularity; Modality/Disposition/Epistemology; Argument-Preference; Justification

Mathematical Factoring of Context-space Dimensions
UnitedStatesIn1985Context: Ronald Reagan is president.
PennsylvaniaIn1985Context: Dick Thornburgh is governor.
LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president.
This inference depends on the time, space, and respective granularities of the contexts.
There are at least 900,000 doctors.
Dick Thornburgh is governor and there are at least 900,000 doctors.

Time Indices and Granularities
But should remain noncommittal about:
Doug is talking, at 14:42:09, on 4 May
Doug is talking, at , on 4 May
Doug is talking, at 14:42-14:47, on 4 May
Therefore Cyc should infer (as a default):

Time Indices and Granularities
P = Doug is talking.
Doug is talking, at 14:00 to 15:00, on 4 May 2009 with temporal granularity 1 calendar minute
[Timeline figure: Past to Future, ticked in Calendar Minutes; t = that two-hour interval; t’ = a continuous 15-min. sub-interval]
So: Talking during each 15-minute interval? Yes
Talking during each 2-second interval: Unknown
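
One way to read “P holds over an interval with temporal granularity 1 calendar minute” is that P is true at some moment within every whole calendar minute of that interval. That reading is an assumption of this sketch, not a definition given on the slide. Under it, P can be inferred for a sub-interval only if the sub-interval is guaranteed to contain at least one whole calendar minute. Times are minutes since midnight, and the function names are illustrative.

```python
import math


def contains_full_unit(start, end, unit=1.0):
    """True if [start, end] contains at least one whole calendar unit."""
    first_boundary = math.ceil(start / unit) * unit
    return end >= first_boundary + unit


def holds_during(sub_start, sub_end, start, end, unit=1.0):
    """'Yes' if P can be inferred for the sub-interval, else 'Unknown'."""
    inside = sub_start >= start and end >= sub_end
    if inside and contains_full_unit(sub_start, sub_end, unit):
        return "Yes"
    return "Unknown"


T_START, T_END = 14 * 60, 15 * 60  # 14:00 to 15:00

# A 15-minute sub-interval always spans several whole calendar minutes.
print(holds_during(850.5, 865.5, T_START, T_END))          # Yes
# A 2-second sub-interval never spans a whole calendar minute.
print(holds_during(850.1, 850.1 + 2 / 60, T_START, T_END))  # Unknown
```

The answer for the short interval is “Unknown” and not “No”: Doug may well be talking in those two seconds, but the assertion does not commit to it.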

Relations Between an Event and its Participants
performedBy, causes-EventEvent, objectPlaced, objectOfStateChange, outputsCreated, inputsDestroyed, assistingAgent, beneficiary, fromLocation, toLocation, deviceUsed, driverActor, damages, vehicle, providerOfMotiveForce, transportees
Over 400 more.

“In” in Our Geospatial Ontology
We started in 1984 with just one binary predicate, “in”. in(X,Y) means the inner object X is spatially located in the region defined by the outer object Y.
If I just tell you in(X,Y), and you aren’t told what X and Y are, then you (and Cyc) can’t answer questions like these:
–From the outside of Y, can I see any part of X?
–If I turn Y over and shake it, will X fall out?
–Is there room to put more things in Y?
–Is X actually a part of Y?
Such failures led to our introducing new, more precise, more specialized versions of “in”. By now there are over 75 such predicates, organized in a graphical taxonomy.

Propositional Attitudes: Relations Between Agents and Propositions
goals, intends, desires, hopes, expects, believes, opinesThat, knowsThat, remembersThat, perceivesThat, seesThat, fearsThat
Most of these are modal; assertions using them go beyond first-order logic.

Handcrafted Cyc KB
Represented in: First Order Logic, Higher Order Logic, Context Logic, Microtheories
[Figure: layered diagram of the Cyc KB. Labels, from the most general layers to the most specific:]
Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing
Paths; Sets, Relations; Logic, Math; Time; Space; Agents; Physical Objects; Events, Scripts; Spatial Paths; Borders, Geometry; Actors, Actions; Plans, Goals; Physical Agents; Movement; State Change, Dynamics; Materials, Parts, Statics; Artifacts
Living Things; Life Forms; Animals; Plants; Ecology; Human Beings; Human Anatomy & Physiology; Emotion, Perception, Belief; Human Behavior & Actions; Human Activities; Social Behavior; Social Relations, Culture; Language
Organization; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations, Governments, Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics, Warfare; Professions, Occupations
Human Artifacts; Products, Devices; Conceptual Works; Vehicles, Buildings, Weapons; Mechanical & Electrical Devices; Software, Literature, Works of Art
Purchasing, Shopping; Travel, Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports, Recreation, Entertainment
Natural Geography; Earth & Solar System; Political Geography; Weather
Real World Domain Knowledge; Specific cases, facts, details,…
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
The pump has been primed. Use it as an inductive bias to power more automatic knowledge acquisition.

Automated Knowledge Acquisition: AKA by Shallow Fishing
(foundingDate AbuSayyaf ?X)
Search Strings:
Abu Sayyaf was founded in ___
Al Harakat Islamiya, established in ___
ASG was established in ___
Abu Sayyaf was founded in the early 1990s
Parse: (foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))
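The parse step of this slide can be sketched in a few lines of Python. This is an illustration only: the regular expression and the function name `parse_founding_date` are assumptions, not the parser Cyc uses, and the sketch handles only the "early <decade>" and plain "<decade>" forms because those are the only CycL functions (`EarlyPartFn`, `DecadeFn`) shown on the slide.

```python
import re

# Matches the phrasings used in the search strings on the slide,
# followed by a decade such as "the early 1990s" or "the 1990s".
PATTERN = re.compile(
    r"(?:was founded|was established|established) in (?:the )?(early )?(\d{3})0s"
)

def parse_founding_date(term, sentence):
    """Turn a retrieved sentence into a CycL-style assertion string,
    or return None if no decade expression is found."""
    match = PATTERN.search(sentence)
    if match is None:
        return None
    decade = f"(DecadeFn {match.group(2)})"
    date = f"(EarlyPartFn {decade})" if match.group(1) else decade
    return f"(foundingDate {term} {date})"

print(parse_founding_date("AbuSayyaf", "Abu Sayyaf was founded in the early 1990s"))
```

The term to use (here `AbuSayyaf`) is passed in by the caller, since the search strings are generated from a query that already names it.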

Automated Knowledge Acquisition: AKA by Shallow Fishing
(height EiffelTower ?x)
Search Strings:
The height of the Eiffel Tower is ___
The Eiffel Tower is ___ tall
The height of the Eiffel Tower is 36 feet
Parse: (height EiffelTower (Foot 36))
The height of the Eiffel Tower is 984 feet
Parse: (height EiffelTower (Foot 984))
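As the two results show, shallow fishing can return conflicting answers, and each one parses into a well-formed candidate assertion. The Python sketch below only collects the distinct candidates; how Cyc vets them is not described on the slide, so no selection step is attempted. The patterns and the function name `candidate_heights` are hypothetical.

```python
import re

# One pattern per search-string shape on the slide; the blank becomes a number.
PATTERNS = [
    re.compile(r"The height of the Eiffel Tower is (\d+) feet"),
    re.compile(r"The Eiffel Tower is (\d+) feet tall"),
]

def candidate_heights(snippets):
    """Return the distinct CycL-style candidate assertions found in the
    snippets, ordered by the numeric value that was extracted."""
    values = set()
    for snippet in snippets:
        for pattern in PATTERNS:
            for number in pattern.findall(snippet):
                values.add(int(number))
    return [f"(height EiffelTower (Foot {value}))" for value in sorted(values)]

SNIPPETS = [
    "The height of the Eiffel Tower is 36 feet",
    "The height of the Eiffel Tower is 984 feet",
    "The Eiffel Tower is 984 feet tall",
]
print(candidate_heights(SNIPPETS))
```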


Recent/Future AKB Directions
Make it comprehensive (from 13% to 100%); apply it to other domains
Make it easier for SMEs to enter/vet/modify info.
Improve the automatic acquisition (parsing / fishing from unstructured texts; SKSI to structured sources, incl. SPARQL)
Make it easier for end users to pose questions:
–Automatically select (a small superset of) the relevant fragments
–Use semantic constraints (argIsa, disjointness, domain knowledge…) to combine the relevant fragments into a meaningful logical query
Make justifications more terse and more compelling
Speed up inference (in general; and for AKB entry and AKB query-answering)
Graceful degradation [halfway between QA and Google], falling back on Semantic Search of automatically tagged documents (tagged with Cyc terms)

Developing a Cyc App.
Extend Cyc’s KB
–Augment its ontology
–New assertions involving those new terms
New Heuristic Level modules
–Identify the need(s) for them
–Design, build, and debug them
New interface modules
–For manual entry; for SKSI mapping; for end users
–Domain-specific interfaces (e.g., sketching military unit movements; drawing chemical formulae; etc.)

OpenCyc: Open Source release of [most of] the Cyc Ontology + Simple Relns. + Inference Engine
ResearchCyc: Almost All of Cyc (for free for R&D purposes)

The Ontology: Pre-existing general medical knowledge framework
Prior to the CCF project, Cyc’s KB had 184 specializations of MedicalCareEvent:
MedicalCareEvent Ablation Ligation CoronaryArteryBypassGraft Biopsy-SurgicalProcedure TrephiningSomeone Prostatectomy RoboticSurgery OutpatientSurgery InpatientSurgery LiposuctionSurgery RemovalOfUniqueBodyPart Appendectomy … Tonsillectomy GumSurgery SurgicalTreatment TransplantSurgery HeartTransplantSurgery GeneralSurgery MajorSurgery OpenHeartSurgery RootCanalSurgery VaccinationEvent BoosterVaccinationEvent AnthraxMilitaryVaccinationScript MedicalTesting …

The Ontology: Pre-existing general medical knowledge framework
Prior to the CCF project, Cyc’s KB had 350+ specializations of AilmentCondition:
AttentionDeficitDisorder Glaucoma SpinalStenosis SleepDeprivation Ache-AilmentCondition Migraine Hemorrhaging-TheCondition Jaundice ParasiticAilment BacillaryAngiomatosis Cryptosporidiosis Rickettsiosis EpidemicTyphus-NAmerica ArthropodInfestation ExternalArthropodInfestation InternalArthropodInfestation Trichinosis Schistosomiasis Ascariasis BladderFlukeInfestation … Atherosclerosis MultiplePersonalityDisorder Adenomyosis Scabies AmyotrophicLateralSclerosis Scoliosis Hypoglycemia TemproMandibularJointSyndrome AcetylcholinePoisoning CadmiumPoisoning CarbonMonoxidePoisoning FoodborneBotulism InhalationalBotulism WoundBotulism InfantBotulism Endometriosis Neuralgia Sciatica Diverticulitis Gout MacularDegeneration …

The Ontology Pre-existing general medical knowledge framework Prior to the CCF project, Cyc’s KB had 200+ specializations of Bacterium: StreptococcusPneumoniae StreptococcusPyogenes Bacillaceae-Family Bacillus-Genus BacillusCereus-Species Monotrichous Bacterium-Monotrichous Peritrichous Bacterium-Peritrichous Amphitrichous Bacterium-Amphitrichous Tenericutes-Division Mollicutes-Class Anaeroplasmataceae-Family … Asteroplasma-Genus Acholeplasmatales-Order Acholeplasmataceae-Family Acholeplasma-Genus Phytoplasma-Genus Eperythrozoon-Genus Mycoplasmatales-Order Mycoplasmataceae-Family Mycoplasma-Genus MycoplasmaPneumoniae-Species Spirillales-Order Vibrionaceae-Family Vibrio-Genus VibrioCholerae-Species …

The Ontology Hundreds of pre-existing relevant relationships General Role Predicates: objectActedOn eventOccursAt dateOfEvent objectPlaced objectRemoved deviceUsed … Medical domain specific relations: infectionCausedByOrganism infectingPathogen patientTreated deviceTypeTreatsConditionType causeOfDeathTypeOfType formOfDisease ailmentTypeAffects ailmentEpidemicType ailmentAcquiredBy ailmentTypicallyAcquiredBy indicatedDrug mortalityRiskForCondition survivalRate riskOfInfectionFromTypeToType …

The Ontology: Methodology
Establish bridging (translation) rules
Define rules that allow users to associate patients, dates, locations, etc. with the various events – e.g. define patientTreated as a relationship between a medical event and a patient.
Define rules that allow users to easily express complicated logical conditions – e.g. the defining rules for PrimarySurgery, isolatedProcedureOfType, concomitantProcedures, etc.
Define concise vocabulary for constructions that are complicated or difficult to express – e.g. “aortic valve replacement” is represented as a single non-atomic term. This allows the user to specify this very common procedure with a single fragment instead of the three distinct fragments needed in the CCF ontology (which were needed because the CCF representation has no explicit construct for composing functional terms).

Typical Query for outcomes study The examples in this presentation were short, simple, “Medical English” queries; the ones being focused on while building the application, and now that it is actually being used at CCF, are much larger ones, e.g.: IDENTIFY PATIENT POPULATION: FIND all native aortic valve replacements performed at CCF between January 1, 2000 and December 31, 2004 with a pre-operative diagnosis, as determined by echocardiogram, of moderately severe or severe aortic stenosis and moderate to severe left ventricular impairment. INCLUDE operations in which concomitant primary CABG or concomitant mitral or tricuspid valve repair was performed. EXCLUDE all patients with any prior valve repair or replacement; or with concomitant pulmonary valve repair; or with concomitant mitral, tricuspid, or pulmonary valve replacement; or with aortic regurgitation greater than moderate degree.

Researchers and clinicians sometimes ask the same queries:
“Are there cases in the last decade where patients had pericardial aortic valves inserted in the reverse position, to serve as mitral valve replacements, and how often in such cases did endocarditis or tricuspid valve infection develop, and how long after the procedure?”

Get a large set of use-cases (CCF task: the last 900 queries)
Arrange them into maximally mutually-dissimilar classes
Manually represent a couple from each of those buckets
–Reveals most of the necessary new predicates (+ interfaces)
Now go through each of the use-cases, trolling for new domain-specific terms to add to the ontology
–Can be done manually, but we are beginning to rely more on semi-automatic methods where the system itself helps with that process
–As appropriate, lexify the terms and/or align them to existing standards
Run exemplars from each bucket (i.e., to completion)
–tracer bullets to reveal necessary new rules, reasoning modules (+ interfaces)
Replace the largest bucket by 2-4 specializations, and recur (i.e., repeat the preceding 3 steps, and this one, again) until there is no new gain

Test the system on previously-unseen use-cases (or at least ones which were not among those previously selected from their bucket)
Have users try to use the system, and watch them (their results, of course, but also to the extent possible their time-feature trajectory)
–Which features did they rarely or never use (to good effect)?
–Which features did they make heavy use of?
–Independent of this, ask them for their feedback and suggestions
–Try to identify classes of users which will translate into classes of documentation and training materials/regimes/interface specifics
All along, identify what elements of the ontology (if any) are proprietary, and assimilate everything else into future versions of OpenCyc and ResearchCyc


(implies
  (and (cCFhasLeftAtriumDiameter ?EVT ?D)
       (greaterThan ?D ((Centi Meter) 3.8))
       (patientTreated ?EVT ?PAT)
       (patientSex ?PAT FemaleHuman)
       (rdf-type ?EVT ?TYPE)
       (genls ?TYPE CCF-Evaluation))
  (isa ?EVT EvaluationThatIndicates-LeftAtrialEnlargement))
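Read procedurally, the rule says: an evaluation event whose measured left atrium diameter exceeds 3.8 cm, for a female patient, and whose type is a specialization of CCF-Evaluation, is an instance of EvaluationThatIndicates-LeftAtrialEnlargement. The Python sketch below restates that check over a plain record. The record layout, the `GENLS` table, and the type name `CCF-Echocardiogram` are hypothetical; in Cyc the conclusion is drawn by the inference engine, not by application code like this.

```python
from dataclasses import dataclass

# Hypothetical specialization links (type -> more general type).
GENLS = {"CCF-Echocardiogram": "CCF-Evaluation"}

@dataclass
class EvaluationEvent:
    left_atrium_diameter_cm: float
    patient_sex: str   # e.g. "FemaleHuman", as in the rule
    event_type: str

def is_specialization_of(event_type, ancestor):
    """Follow GENLS links upward; a type counts as a specialization of itself."""
    while event_type is not None:
        if event_type == ancestor:
            return True
        event_type = GENLS.get(event_type)
    return False

def indicates_left_atrial_enlargement(evt):
    """Mirror the three conditions in the rule's antecedent."""
    return (evt.left_atrium_diameter_cm > 3.8
            and evt.patient_sex == "FemaleHuman"
            and is_specialization_of(evt.event_type, "CCF-Evaluation"))

example = EvaluationEvent(4.1, "FemaleHuman", "CCF-Echocardiogram")
print(indicates_left_atrial_enlargement(example))
```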

1784 pieces of pre-existing (prior to this project) Cyc KB knowledge used while handling a typical query. E.g.:
Inferred Disjointness constraints:
(disjointWith PericardialWindow-SurgicalProcedure MedicalPatient)
Justification: [we are “counting” each of these assertions in the total:]
(genls PericardialWindow-SurgicalProcedure PericardialProcedure-Surgical) in UniversalVocabularyMt
(genls PericardialProcedure-Surgical CardiacProcedure-Surgical) in UniversalVocabularyMt
(genls CardiacProcedure-Surgical SurgicalProcedure) in UniversalVocabularyMt
(genls SurgicalProcedure MedicalCareEvent) in BaseKB
(genls MedicalCareEvent PhysicalSituation) in BaseKB
(genls PhysicalSituation Situation-Localized) in UniversalVocabularyMt
(genls Situation-Localized Situation) in UniversalVocabularyMt
(disjointWith SpatialThing-NonSituational Situation) in BaseKB
(genls EnduringThing-Localized SpatialThing-NonSituational) in UniversalVocabularyMt
(genls Agent-NonGeographical EnduringThing-Localized) in UniversalVocabularyMt
(genls EmbodiedAgent Agent-NonGeographical) in UniversalVocabularyMt
(genls PerceptualAgent-Embodied EmbodiedAgent) in UniversalVocabularyMt
(genls Animal PerceptualAgent-Embodied) in UniversalVocabularyMt
(genls MedicalPatient Animal) in UniversalVocabularyMt
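The justification above is a pair of genls chains that meet at a single asserted disjointWith. A minimal Python sketch of that reasoning follows: two collections are treated as disjoint if some generalization of one is asserted disjoint with some generalization of the other. The assertions are copied from the slide; the function names are hypothetical, microtheories are ignored, and this is not how Cyc's inference modules are implemented.

```python
# (specialization, generalization) pairs, copied from the justification.
GENLS = [
    ("PericardialWindow-SurgicalProcedure", "PericardialProcedure-Surgical"),
    ("PericardialProcedure-Surgical", "CardiacProcedure-Surgical"),
    ("CardiacProcedure-Surgical", "SurgicalProcedure"),
    ("SurgicalProcedure", "MedicalCareEvent"),
    ("MedicalCareEvent", "PhysicalSituation"),
    ("PhysicalSituation", "Situation-Localized"),
    ("Situation-Localized", "Situation"),
    ("EnduringThing-Localized", "SpatialThing-NonSituational"),
    ("Agent-NonGeographical", "EnduringThing-Localized"),
    ("EmbodiedAgent", "Agent-NonGeographical"),
    ("PerceptualAgent-Embodied", "EmbodiedAgent"),
    ("Animal", "PerceptualAgent-Embodied"),
    ("MedicalPatient", "Animal"),
]
DISJOINT = [("SpatialThing-NonSituational", "Situation")]

def generalizations(term):
    """The term itself plus everything reachable by following genls upward."""
    seen = {term}
    stack = [term]
    while stack:
        current = stack.pop()
        for spec, genl in GENLS:
            if spec == current and genl not in seen:
                seen.add(genl)
                stack.append(genl)
    return seen

def inferred_disjoint(a, b):
    """True if an asserted disjointWith separates some generalization of
    `a` from some generalization of `b` (in either order)."""
    up_a, up_b = generalizations(a), generalizations(b)
    return any((x in up_a and y in up_b) or (x in up_b and y in up_a)
               for x, y in DISJOINT)

print(inferred_disjoint("PericardialWindow-SurgicalProcedure", "MedicalPatient"))
```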

Ideas for NLM Grand Challenges
Comprehensive Ontology of Medicine
–Ties to terminological standards (Snomed, ICD…), lexical ones (WordNet), conceptual ones (Cyc)
–Knowledge about/involving the concepts
Contextualized for time, source, level of detail,…
Sample sub-project: multicultural English-to-English “translation”
–Using the above ontology of medicine, and models of discourse, models of classes of users (by age, occupation, etc.), models of individual users (built up over time and stored HIPAA-securely)
–Translate articles, web pages, medicine bottle labels, etc. into comprehensible form for that user
In some cases this means literally writing more text (expanding its length), or paring it down (eliminating prior knowledge)
In less clear cases (where the user might or might not already know some piece of information), the best way to expand the original text might be to add footnotes containing the borderline information, and to pare down the original text by relegating borderline material to footnote form
–The translations needn’t just be static; they can sync with the user’s calendars, cell phones, computers, etc., to provide reminders, proactively send them relevant news articles or new warnings, and so on
Automated Clinical/Biomedical Discovery
–Hypothesis formation, Experiment design, Data gathering, Analysis, New terms & hypotheses