2 Computers versus Common Sense
CYC: Software that partially understands
The impact of that on the AIM dream
Dr. Douglas B. Lenat, 7718 Wood Hollow Drive, Austin, TX 78731
Email: Lenat@cyc.com Phone: 512-342-4001
May 4, 2009

3 ELIZA (DOCTOR), 1965, Joe Weizenbaum, MIT
- Carl Rogers advocated reflection.
  Patient: “I swear a lot.” ELIZA: “How do you feel about the fact that you swear a lot?”
- Counted a lot on anthropomorphism, and on patient-to-doctor respect/awe/…
  Patient: “My dog’s mother died recently.” ELIZA: “Tell me more about your mother.”
- Veneer of intelligence, not intelligence.

4 ALICE (40 years post-ELIZA)
JUDGE11 [9:26:0] I WILL ASK YOU SOME SIMPLE QUESTIONS AND GIVE SIMPLE COMMANDS
PROGRAM [9:26:0] Do you think your plan will succeed?
JUDGE11 [9:26:16] WHAT IS THE COLOR OF A BLUE CAR
PROGRAM [9:26:16] WHAT COLOR IS a blue car
JUDGE11 [9:26:35] WHERE IS SUE’S NOSE WHEN SUE IS IN HER HOUSE
PROGRAM [9:26:35] Where it belongs. Try searching the Web.

7 2009: Software is still Brittle “How old was Martin van Buren when he was elected President of the U.S.?” “Is the Space Needle taller than the Eiffel Tower?”

8 Natural Language Understanding requires having lots of knowledge
1. The pen is in the box. / The box is in the pen.
2. The police watched the demonstrators because they feared violence. / The police watched the demonstrators because they advocated violence.
3. Mary and Sue are sisters. / Mary and Sue are mothers.
4. Every American has a mother. / Every American has a president.
5. John saw his brother skiing on TV. The fool didn’t have a coat on! / John saw his brother skiing on TV. The fool didn’t recognize him!

9
7. “…include all the re-do CABG procedures utilizing ITA and SVG in 1991”. “And” usually does mean “and”, but in this query “and” really must mean “or”. Medical knowledge, not grammar, disambiguates this: a single CABG will not have both an ITA and an SVG.
8. “…that the tumor cells are stopping dividing or dying…” Do they mean “stopping dividing or stopping dying”? Of course not, but in 16 of 30 randomly selected syntactically similar constructions from www.clinicaltrials.gov, the coordination (i.e., the wider scope of the modifier, in this case the word “stopping”) was the intended meaning. In each case, only one choice “makes sense” (is consistent with medical knowledge and common sense).
9. “Adult patients who underwent MAZE III with or without Mitral Valve Repair or Replacements.” Is the second half of that query just a waste of space? Discourse pragmatics says no: the physician must have had some reason for saying that. Medical knowledge provides a plausible interpretation: “Adult patients who underwent MAZE III with no concomitant procedures other than Mitral Valve Repair or Replacements”.

10 2 July 2005
The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
Okay, so let’s tell the computer the same sorts of things that human beings know about cars, colors, heights, movies, time, driving to a place, etc.: all the other stuff that everybody knows.

11 The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
MicrowaveOven is a type of Kitchen-Appliance
Dishwasher is a type of Kitchen-Appliance

12
Rthagide-disjaks is a type of Kitchen-Appliance
Gracinimumples is a type of Kitchen-Appliance
Rthagide-disjaks alorxes Vorawnistz.
Gracinimumples alorxes Vorawnistz and Buzqa.
Buzqa is a Thwarn and supplied through Epluns.
You can’t use X if it alorxes Y but lacks any Y

13 The basic idea: Get the computer to understand, not just store, information. Then it can reason to answer your queries.
Eventually, after writing millions of these rules, the system knows as much about pipes, liquids, water, electricity, microwave ovens, dishwashers, cars, colors, movies, heights, etc. (all the other stuff that everybody knows) as you and I do.
Ultimately, there is just 1 interpretation of that model, and it corresponds to the real world.
Long before that, incrementally, the system gains competence and trustworthiness.

14 Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
- The typical bird has 1 beak, 1 heart, lots of feathers, …
- Hearts are internal organs; feathers are external protrusions
- Most vehicles are steered by an awake, sane, adult, … human
- Tangible objects can’t be in 2 (disjoint) places at once
- Badly injuring a child is much worse than killing a dog
- Causes temporally precede (i.e., start before) their effects
- A stabbing requires 2 cotemporal and proximate actors
- etc.

15 Cyc is… Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms
- Language-independent
[Diagram: word terms (EnglishWord-Pen, EnglishWord-Plume, FrenchWord-Plume, ChineseWordForWritingPen, …) linked to language-independent concepts (WritingPen, BirdFeather, Penitentiary, Authoring)]

16 Cyc is…
Millions of facts, rules of thumb, etc. that capture human common sense about our everyday world
- Each of these represented in formal logic
- Info. about a set of hundreds of thousands of terms
An inference engine that draws from those the same sorts of inferences that people would.
Interfaces so the system can communicate with people, databases, spreadsheets, websites, etc.

17 What Needs to be Shared?
- bits/bytes/streams/network…
- alphabet, special characters, …
- words, morphological variants, …
- syntactic meta-level markups (HTML)
- semantic meta-level markups (SGML, XML)
- content (logical representation of doc/page/...)
- context (common sense, recent utterances, and n dimensions of metadata: time, space, level of granularity, the source’s purpose, etc.)
(Diagram label: Sem. Web)

18 How formalized knowledge helps search: find information by inference (+KB)
Query: “Someone smiling”
Caption: “A man helping his daughter take her first step”
- When you become happy, you smile.
- You become happy when someone you love accomplishes a milestone.
- Taking one’s first step is a milestone.
- Parents love their children.
(ForAll ?P (ForAll ?C (implies (and (isa ?P Person) (children ?P ?C)) (loves ?P ?C))))
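The chain of rules on this slide can be sketched as a tiny forward-chaining loop. This is only an illustrative sketch, not Cyc's inference engine: the tuple encoding, the predicate names `accomplishes`, `happy`, and `smiling`, and the individuals `Man1`, `Daughter1`, and `FirstStep` are assumptions made up for the example.

```python
# Hypothetical toy encoding of the caption and the four rules on the slide.
# Facts are tuples; none of these names come from the real Cyc KB.
facts = {
    ("isa", "Man1", "Person"),
    ("children", "Man1", "Daughter1"),           # Man1 has child Daughter1
    ("accomplishes", "Daughter1", "FirstStep"),  # from the caption
    ("isa", "FirstStep", "Milestone"),           # a first step is a milestone
}

def step(facts):
    """Apply each rule once and return only the newly derived facts."""
    new = set()
    for f in facts:
        # Parents love their children.
        if f[0] == "children" and ("isa", f[1], "Person") in facts:
            new.add(("loves", f[1], f[2]))
        # You become happy when someone you love accomplishes a milestone.
        if f[0] == "loves":
            for g in facts:
                if (g[0] == "accomplishes" and g[1] == f[2]
                        and ("isa", g[2], "Milestone") in facts):
                    new.add(("happy", f[1]))
        # When you become happy, you smile.
        if f[0] == "happy":
            new.add(("smiling", f[1]))
    return new - facts

def forward_chain(facts):
    """Repeat rule application until no new facts appear (a fixed point)."""
    facts = set(facts)
    while True:
        new = step(facts)
        if not new:
            return facts
        facts |= new

closure = forward_chain(facts)
matches = sorted(f[1] for f in closure if f[0] == "smiling")
print(matches)
```

The query “Someone smiling” matches the caption only after three derivation steps (loves, then happy, then smiling), none of which is stated in the caption itself.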

19 How formalized knowledge helps search: find information by inference (+KB)
Query: “Show me pictures of strong and adventurous people”
Caption: “A man climbing a rock face”

20 How formalized knowledge helps search: find information by inference (+KB)
Text Document Query: “Government buildings damaged in terrorist events in Beirut between 1990 and 2001”
Document: “1993 pipe bombing of France’s embassy in Lebanon.”

21 How can our programs be intelligent, not merely have the veneer of it?
ANSWER: By having a large corpus of knowledge, spanning the gamut from specific domain-dependent all the way up to general common sense.
The computer needs to be able to apply the knowledge, not just store some English gloss:
- Represent it formally (predicate calculus), and apply logic
- Represent it numerically, and apply mathematics/statistics
And after all that: Be compelling to the human deciding

22 And after all that: Be compelling to the human deciding
- Magic tricks: “How do they do that?!” → “How was I ever fooled by that?!”
- Efficacy of punishment vs reward: “Punishment is more effective, and the statistics back me up”
- Clinical decision-making (by doctors and by patients): “Because 0.814” versus “Because ”
- Organ donation in European countries: Why is it so often 15%/85% or 85%/15%? [Answer: Because when you apply for a driver’s license in some countries, you have to check a box to “opt in”; in others, you have to check a box to “opt out”; and in the U.S. and most European countries at least, 85% of the people don’t know what they should do, even though it’s an emotional, serious choice, and end up just leaving it unchecked.]
One Good Explanation is worth 20 points of IQ

23 Reflection Framing Effect
Philadelphia is preparing for a Legionnaires’ disease outbreak expected to kill 600 people today. Two alternative programs to combat the disease have been proposed. The consequences of each program are as follows:
- If Program A is adopted, 200 people will be saved. (72%)
- If Program B is adopted, there is a 1/3 chance that all 600 will be saved, and a 2/3 chance that no lives will be saved. (28%)
- If Program A’ is adopted, 400 people will die. (22%)
- If Program B’ is adopted, there is a 2/3 chance that 600 will die, and a 1/3 chance that no one will die. (78%)
For more information, see: Kahneman, D. and Tversky, A. (1984). Choices, values, and frames. American Psychologist, 39, 341-350.
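A quick check of the arithmetic shows why this counts as a framing effect: all four programs have the same expected outcome. The sketch below just restates the slide's numbers as (probability, lives saved) pairs; the percentages in parentheses on the slide appear to be the share of respondents choosing each option and are not used here.

```python
from fractions import Fraction

AT_RISK = 600  # people expected to die if nothing is done

# Each program is a list of (probability, lives_saved) outcomes.
programs = {
    "A":  [(Fraction(1), 200)],
    "B":  [(Fraction(1, 3), 600), (Fraction(2, 3), 0)],
    # A' and B' are stated as deaths on the slide; convert to lives saved.
    "A'": [(Fraction(1), AT_RISK - 400)],
    "B'": [(Fraction(2, 3), AT_RISK - 600), (Fraction(1, 3), AT_RISK - 0)],
}

expected_saved = {
    name: sum(p * saved for p, saved in outcomes)
    for name, outcomes in programs.items()
}
print(expected_saved)
```

A and A’ describe the same sure outcome, and B and B’ the same gamble; only the wording (lives saved versus deaths) differs.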

24 Conjunction Fallacy
A health survey was conducted in a representative sample of adult males in Chicago of all ages and occupations. Mr. F was included in the sample. He was selected by random chance from the list of participants. Please rank the following statements in terms of which is most likely to be true of Mr. F. (1 = most likely to be true, 6 = least likely)
____ Mr. F smokes more than 1 cigarette per day on average.
____ Mr. F has had one or more heart attacks. [A]
____ Mr. F had a flu shot this year.
____ Mr. F eats red meat at least once per week.
____ Mr. F has had one or more heart attacks and he is over 55 years old. [A and B]
____ Mr. F never flosses his teeth.
58% rated “A and B” more likely than A.
For more information, see: Tversky, A. and Kahneman, D. (1983). Extensional vs. intuitive reasoning: The conjunction fallacy in probability judgment. Psych. Rev. 90, 293-315.

25 Why there is a need for meta-logical elements (rationale and POV) to convince decision-makers Early hominids: pre-rational decision-makers Later hominids: usually rational Even later hominids: almost always rational

26 A 67-year-old woman suffering from ICM with elevated bilirubin, a history of diabetes, a body mass index of 39.5, NYHA functional class III, a mitral valve regurgitation grade (MVRG) of 2+, and no aortic valve regurgitation (AVR) is assigned to CABG surgery. RF+Cyc is consulted, and the RF (random forest statistical reasoning) component, having been trained on a large database, identifies CABG alone as the most likely treatment option, citing an odds ratio of 2.6 over the next most favorable treatment, CABG+MVA. As rationale, the Cyc (AI) component observes that the low MVRG is atypical of MVA, a surgical procedure typically reserved for patients with severe mitral regurgitation, and thus the simpler CABG procedure is preferred.
However, an intraoperative transesophageal echocardiogram (TEE) suggests MVRG is 3+. Based on this, the surgical team overrides the initial diagnosis without consultation, opting instead for CABG+MVA. The patient dies 3 days later from complications due to surgery.
In this setting, RF+Cyc, if consulted, could have alerted the heart team to additional data that might have swayed their decision, thus potentially saving a life. RF+Cyc would have noted that while an MVRG of 3+ is consistent with CABG+MVA, the odds favoring CABG decrease only marginally, from 2.6:1 to 1.7:1, when MVRG is upstaged for this patient from 2+ to 3+, and that surgery under CABG alone offers a 20% increase in median survival compared to CABG+MVA. RF+Cyc could further argue that intraoperative MVRG can falsely appear to be upstaged due to altered hemodynamics in anesthetized patients. A Cyc-assisted semantic search of the recent literature reveals that transthoracic echocardiograms (TTE) more reliably reflect the degree of mitral regurgitation than TEE. That (plus co-morbidities) argues for just CABG.

27 4 Pitfalls of Semantic Technology
Ignorance-based:
- A small theory size (#terms, instances, rules)
- Static KB (massively tuned, optimized, cached ahead of time)
- Simple assertions (SAT constraints; propositional calculus; Horn clause logic; Description Logic; first order logic)
- 1 global context (no contradictions, tiny domain, simplified world)

28 Applying Cyc
- Cyc is a power source, not a single application. Like oil, electricity, telephony, computers, …
- Cyc can spawn and sustain a knowledge utility industry.
- It can cost-effectively underlie almost all apps (providing a common-sense layer to reduce brittleness when faced with unexpected inputs/situations).
- To apply Cyc, we extend its ontology, its KB, and possibly its suite of specialized reasoning modules.

29 Example queries:
- “What sequences of events could lead to the destruction of Hoover Dam?”
- “Were there any attacks on targets of symbolic value to Muslims since 1987 on a Christian holy day?”
[Architecture diagram. Labels: Cycorp Tools For: Ontology-Building, -Browsing, -Editing, & Fact/Rule Entry; Domain Experts; Scenario Generation, Explanation Generation, Query Formulation; Scenario Generator, Explanation Generator, Query Formulator; Others’/GOTS Analysis and Collaboration Components; AKB: The Analyst’s Knowledge Base; Relational DB “projection” of the AKB; CT Analyst; Terrorism Knowledge; General Knowledge; OWL &]

30 A more recent example
“What major US cities are particularly vulnerable to an anthrax attack?”
The answer is logically implied by data dispersed through several sources: USGS GNIS DB; AMVA KB; RAND R; UN FAO DB; DTRA CATS DB

31 “What major US cities are particularly vulnerable to an anthrax attack?”
“major US city” → ?C is a U.S. City with >1M population
“particularly vulnerable to an anthrax attack” →
- the current ambient temperature at ?C is above freezing, and
- ?C has more than 100 people for each hospital bed, and
- the number of anthrax host animals near ?C exceeds 100k
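Once the English phrases are unpacked this way, the query becomes a conjunction of simple tests. The sketch below shows that conjunction over a flat list of records. The `City` class, its field names, and the four sample cities are invented for illustration; in the talk the underlying data is spread across several external sources rather than sitting in one table.

```python
from dataclasses import dataclass

@dataclass
class City:
    name: str
    population: int
    temperature_c: float      # current ambient temperature
    hospital_beds: int
    host_animals_nearby: int  # anthrax host animals near the city

def is_major(c):
    # "major US city": more than 1M inhabitants
    return c.population > 1_000_000

def is_vulnerable(c):
    # The three conditions from the slide, combined with "and".
    return (c.temperature_c > 0
            and c.population / c.hospital_beds > 100
            and c.host_animals_nearby > 100_000)

# Made-up records, purely to exercise the predicates.
cities = [
    City("Alphaville", 1_500_000, 12.0, 10_000, 250_000),
    City("Betatown", 2_000_000, -5.0, 15_000, 400_000),
    City("Gammaburg", 800_000, 20.0, 5_000, 300_000),
    City("Deltaport", 1_200_000, 18.0, 20_000, 150_000),
]
answers = [c.name for c in cities if is_major(c) and is_vulnerable(c)]
print(answers)
```

Only the first record passes every test: the second is below freezing, the third is too small to be "major", and the fourth has only 60 people per hospital bed.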

32 The Geographic Names Information System (GNIS) DB maintained by the US Geological Survey (USGS).
USGS GNIS DB

 state | name              | type  | county     | state_fips
-------+-------------------+-------+------------+------------
 TX    | Dallas            | ppl   | Dallas     | 48
 MN    | Hennepin County   | civil | Hennepin   | 27
 CA    | Sacramento County | civil | Sacramento | 6
 AZ    | Phoenix           | ppl   | Maricopa   | 4

(same four rows, remaining columns)

 primary_lat | primary_long | elevation | population | status
-------------+--------------+-----------+------------+--------------------
 32.78333    | -96.8        | 463       | 1022830    | BGN 1978 1959
 45.01667    | -93.45       | 0         | 1032431    |
 38.46667    | -121.31667   | 0         | 1041219    |
 33.44833    | -112.07333   | 1072      | 1048949    | BGN 1931 1900 1897

33 USGS GNIS DB
So how do we explain to our system that:
- row 1 of that table is “about” the city of Dallas, TX
- the population field of that table contains the number of inhabitants of the city that that row is “about”
- here is exactly how to access tuples of that database
- that access will be fast, accurate, recent, complete

34 USGS GNIS DB: the population field of that table contains the number of inhabitants of the city that that row is “about”
We provide the field encodings and decodings, some of which correspond to explicit fields like population, two-letter state codes, etc.:
(fieldDecoding Usgs-Gnis-LS ?x (TheFieldCalled "population") (numberOfInhabitants (TheReferentOfTheRow Usgs-Gnis) ?x))
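The idea behind a field decoding can be sketched as a lookup from a physical column name to the predicate that column expresses about the row's referent. This is a loose analogy in Python, not how SKSI works internally: the dictionary layout, the `referent_of_row` naming scheme, and the `CityNamedFn-…` label are all hypothetical; only the predicate name `numberOfInhabitants` and the column name `population` come from the slide.

```python
# Stand-in for one row of the GNIS table (a subset of its columns).
row = {"state": "TX", "name": "Dallas", "type": "ppl", "population": 1022830}

# Each entry maps a physical field name to the predicate it expresses
# about the row's referent. The dictionary layout is an assumption.
FIELD_DECODINGS = {
    "population": "numberOfInhabitants",
}

def referent_of_row(row):
    """Name the thing the row is "about" (a made-up naming scheme)."""
    return f"CityNamedFn-{row['name']}-{row['state']}"

def decode_row(row):
    """Turn decodable fields of a row into (predicate, subject, value) triples."""
    subject = referent_of_row(row)
    return [(pred, subject, row[field])
            for field, pred in FIELD_DECODINGS.items()
            if field in row]

triples = decode_row(row)
print(triples)
```

Fields with no registered decoding (here `type`) are simply not turned into assertions, which mirrors the point that the system only knows what each mapped column means.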

35 USGS GNIS DB: how to access tuples of that database
We provide all the information needed for a JDBC connection script. We assert, in the context (MappingMtFn Usgs-KS), all of these:
(passwordForSKS Usgs-KS "geografy")
(portNumberForSKS Usgs-KS 4032)
(serverOfSKS Usgs-KS "sksi.cyc.com")
(sqlProgramForSKS Usgs-KS PostgreSQL)
(structuredKnowledgeSourceName Usgs-KS "usgs")
(subProtocolForSKS Usgs-KS "postgresql")
(userNameForSKS Usgs-KS "sksi")

36 USGS GNIS DB: that access will be fast, accurate, recent, complete
We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc. We assert, in the context (MappingMtFn Usgs-KS):
(schemaCompleteExtentKnownForValueTypeInArg Usgs-Gnis-LS USCity numberOfInhabitants 1)

37 USGS GNIS DB: that access will be fast, accurate, recent, complete
We provide meta-level assertions about the database, about each table of the database, about the completeness etc. of various kinds of data in the DB, etc. We assert, in the context (MappingMtFn Usgs-KS):
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "state"))
  TheEmptySet 60.0)
(resultSetCardinality Usgs-Gnis-PS
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "primary_long")
          (PhysicalFieldFn Usgs-Gnis-PS "primary_lat")
          (PhysicalFieldFn Usgs-Gnis-PS "name"))
  (TheSet (PhysicalFieldFn Usgs-Gnis-PS "county")
          (PhysicalFieldFn Usgs-Gnis-PS "state"))
  530.36)

38 “What major US cities are particularly vulnerable to an anthrax attack?”
“major US city” → U.S. City with >1M population
“particularly vulnerable to an anthrax attack” →
- the current ambient temperature at ?C is above freezing, and
- ?C has more than 100 people for each hospital bed, and
- the number of anthrax host animals near ?C exceeds 100k
Cyc knows that pullets are chickens, so don’t add those two numbers together!

44 Even simple queries often require 1-4 reasoning steps
“In what countries bordering Pakistan are there members of the ANVC?”
Each answer that CAE finds for this generally involves a 1-4-step (not 0-step) argument (reasoning chain). E.g., for the answer “India”, the justification is:
- According to the web site ‘Inside Terrorism’, the ANVC’s headquarters has been in Garo Hills, India from the beginning of January, 1996 through today.
- If an organization’s HQ is in place x, then there are members of that organization in place x.
- If someone is in place x, they are in every super-region of x.
- India borders Pakistan.
Don’t include Prior & Tacit Knowledge

45 The Cyc Knowledge Base
[Diagram: the Cyc ontology drawn as a pyramid, from Thing at the top, through general knowledge about various domains, down to specific data, facts, and observations. Topic labels: Thing; Intangible Thing; Individual; Temporal Thing; Spatial Thing; Partially Tangible Thing; Paths; Sets, Relations; Logic, Math; Human Artifacts; Social Relations, Culture; Human Anatomy & Physiology; Emotion, Perception, Belief; Human Behavior & Actions; Products, Devices; Conceptual Works; Vehicles, Buildings, Weapons; Mechanical & Electrical Devices; Software, Literature, Works of Art; Language; Agent Organizations; Organizational Actions; Organizational Plans; Types of Organizations; Human Organizations; Nations, Governments, Geo-Politics; Business, Military Organizations; Law; Business & Commerce; Politics, Warfare; Professions, Occupations; Purchasing, Shopping; Travel, Communication; Transportation & Logistics; Social Activities; Everyday Living; Sports, Recreation, Entertainment; Artifacts; Movement; State Change, Dynamics; Materials, Parts, Statics; Physical Agents; Borders, Geometry; Events, Scripts; Spatial Paths; Actors, Actions; Plans, Goals; Time; Agents; Space; Physical Objects; Human Beings; Organization; Human Activities; Living Things; Social Behavior; Life Forms; Animals; Plants; Ecology; Natural Geography; Earth & Solar System; Political Geography; Weather]
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
Represented in: First Order Logic; Higher Order Logic; Context Logic; Micro-theories
These numbers are not a good way to really get a handle on the Cyc KB

46 The Cyc Knowledge Base
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
These numbers are not a good way to really get a handle on the Cyc KB
“Is any seagull also a moose?” If Cyc knows 10,000 kinds of animals, it should be able to answer 100,000,000 queries like that.
- Option 1: Add those 100M assertions to the KB
- Option 2: Add 50M disjointWith assertions instead
- Option 3: Add about 10k Linnaean taxonomy assertions to the KB, plus one extra assertion: (isa BiologicalTaxon SiblingDisjointCollectionType)
If taxons A and B are not explicitly known (via those 10k assertions) to be in a subset/superset relationship, then assume that they are disjoint.
A few hundred such SiblingDisjoint assertions take the place of over 6 billion disjointness ones… which in turn take the place of 100 trillion ones like this: (not (isa Cher Moose))
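Option 3 can be illustrated with a short sketch: store only the subset links, and treat any two taxa as disjoint by default unless one is known to subsume the other. The five-entry taxonomy below is a made-up fragment and the function names are hypothetical; the last line just reproduces the slide's counts for 10,000 kinds of animals.

```python
# Tiny made-up fragment of a taxonomy: child -> parent.
PARENT = {
    "Seagull": "Bird",
    "Moose": "Mammal",
    "Dog": "Mammal",
    "Bird": "Animal",
    "Mammal": "Animal",
}

def ancestors(taxon):
    """Return the taxon together with all of its known supersets."""
    seen = {taxon}
    while taxon in PARENT:
        taxon = PARENT[taxon]
        seen.add(taxon)
    return seen

def disjoint(a, b):
    """Default rule: disjoint unless one taxon is known to subsume the other."""
    return a not in ancestors(b) and b not in ancestors(a)

print(disjoint("Seagull", "Moose"))  # no subset link is known
print(disjoint("Moose", "Mammal"))   # Moose is a kind of Mammal

n = 10_000
print(n * n, n * (n - 1) // 2)       # ordered queries vs. unordered pairs
```

Five stored links are enough to answer every pairwise question in this fragment, which is the same trade the slide describes at scale: roughly 10k taxonomy assertions instead of about 50M explicit disjointWith assertions.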

47
- There is no one correct monolithic ontology. E.g., Cyc’s 5M axioms are divided into thousands of contexts by granularity, topic, culture, geospatial place, time, ...
- There is a correct monolithic reasoning mechanism, but it is so deadly slow that we never call on it unless we have to. E.g., the Cyc inference engine is a community of 1000 “agents” that attack every problem and, recursively, every subproblem (subgoal). One of these 1000 is a general theorem prover; the others have special-purpose data structures and algorithms to handle the most important, most common cases very fast.

48 What factors argue the conclusion that ?
For:
- ETA often executes attacks near national elections
- ETA has performed multi-target coordinated attacks
- Over the past 30 years, ETA performed 75% of all terrorist attacks in Spain
- Over the past 30 years, 98% of all terrorist attacks in Spain were performed by Spain-based groups, and ETA is a Spain-based group.
Against:
- ETA warns (a few minutes ahead of time) of attacks that would result in a high number of civilian casualties, to prevent them. There was no such warning prior to this attack.
- ETA generally takes responsibility for its attacks, and it did not do so this time.
- ETA has never been known to falsely deny responsibility for an attack, and it did deny responsibility for this attack.

49 Building Cyc qua Engineering Task
CYC: 900 person-years, 23 realtime years, $90 million
[Chart labels: amount known; rate of learning; codify & enter each piece of knowledge, by hand; learning via natural language; learning by discovery; Frontier of human knowledge; time markers 1984, 2004, today]

51 Temporal Relations: 37 Relations Between Temporal Things
temporalBoundsIntersect, temporallyIntersects, startsAfterStartingOf, endsAfterEndingOf, startingDate, temporallyContains, temporallyCooriginating, temporalBoundsContain, temporalBoundsIdentical, startsDuring, overlapsStart, startingPoint, simultaneousWith, after

52 Temporal Relations
“Ariel Sharon was in Jerusalem during 2005 with granularity calendar-week”
“Condoleezza Rice made a ten-day trip to Jerusalem in February of 2005”
Both of them were in Jerusalem during February 2005

53 Lessons Learned
- Rather than struggling to reason in natural language sentences, use logic as the representation language.
- Most knowledge is default; reason by argumentation.
- Rather than striving in vain for a single fast inference engine, use a suite of 1000+ heuristic modules that each handles a class of commonly-occurring problems very fast. [EL/HL split]
- Some of these HL modules act as tacticians (meta-reasoners) to guide the reasoning; a few are strategists (meta-meta-reasoners).
- Bridging the knowledge gap: do the “intermediate theories.”
- Probabilities / certainty factors are useful (risk: overdependence).
- Rather than striving in vain for a monolithic consistent KB, divide the KB up into many locally-consistent contexts.

54 Each assertion should be situated in a context: in a region of context-space
- We identified 12 dimensions of mt-space.
- We developed a vocabulary of predicates and terms to describe points and regions along each of those 12 dimensions.
- We have been situating assertions more and more precisely, and we have been working out calculi for inferring contexts. E.g., if P is true in C1, and P=>Q is true in C2, in what context C3 can Q be validly concluded?
The 12 dimensions: Anthropacity; Time; GeoLocation; TypeOfPlace; TypeOfTime; Culture; Sophistication/Security; Topic; Granularity; Modality/Disposition/Epistemology; Argument-Preference; Justification

55 Mathematical Factoring of Context-space Dimensions
UnitedStatesIn1985Context: Ronald Reagan is president.
PennsylvaniaIn1985Context: Dick Thornburgh is governor.
LehighCountyInFebruary1985Context: Dick Thornburgh is governor and Ronald Reagan is president.
This inference depends on the time, space, and respective granularities of the contexts.
There are at least 900,000 doctors.
Dick Thornburgh is governor and there are at least 900,000 doctors.

56 Time Indices and Granularities
Given: Doug is talking, at 14:00-15:00, on 4 May 2009.
Therefore Cyc should infer (as a default): Doug is talking, at 14:42-14:47, on 4 May 2009.
But should remain noncommittal about: Doug is talking, at 14:42:09, on 4 May 2009.

57 Time Indices and Granularities
P = Doug is talking.
Assertion: Doug is talking, at 14:00 to 15:00, on 4 May 2009, with temporal granularity 1 calendar minute.
[Timeline figure: an axis from Past to Future marked in calendar minutes, showing the interval t and a sub-interval t’]
t = that one-hour interval
t’ = a continuous 15-minute sub-interval
So:
- Talking during each 15-minute interval? Yes
- Talking during each 2-second interval? Unknown
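The granularity rule can be sketched as a three-valued check: a queried sub-interval gets a default "yes" only if it is at least as long as the stated granularity, and "unknown" otherwise. This is a deliberately simplified reading. It compares durations only and ignores alignment to calendar-minute boundaries, and the function names and the use of `None` for "unknown" are assumptions, not Cyc's actual calculus.

```python
from datetime import datetime, timedelta

# The assertion from the slide: P holds over [START, END] at a stated granularity.
START = datetime(2009, 5, 4, 14, 0)
END = datetime(2009, 5, 4, 15, 0)
GRANULARITY = timedelta(minutes=1)

def holds_during(sub_start, sub_end):
    """Return True, or None for "unknown", for a queried sub-interval."""
    if sub_start < START or sub_end > END:
        return None  # outside the asserted interval: nothing is claimed
    if sub_end - sub_start >= GRANULARITY:
        return True  # default inference at or above the granularity
    return None      # finer than the granularity: remain noncommittal

def at(hour, minute, second=0):
    """Convenience constructor for times on 4 May 2009."""
    return datetime(2009, 5, 4, hour, minute, second)

fifteen_minutes = holds_during(at(14, 15), at(14, 30))
two_seconds = holds_during(at(14, 42, 9), at(14, 42, 11))
print(fifteen_minutes, two_seconds)
```

The function never returns `False`: the assertion gives no grounds for concluding that Doug was silent during any particular two seconds, only that the question is left open.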

58 Relations Between an Event and its Participants
performedBy, causes-EventEvent, objectPlaced, objectOfStateChange, outputsCreated, inputsDestroyed, assistingAgent, beneficiary, fromLocation, toLocation, deviceUsed, driverActor, damages, vehicle, providerOfMotiveForce, transportees
Over 400 more.

59 “In” in Our Geospatial Ontology
We started in 1984 with just one binary predicate, “in”. in(X,Y) means the inner object X is spatially located in the region defined by the outer object Y. If I just tell you in(X,Y), and you aren’t told what X and Y are, then you (and Cyc) can’t answer questions like these:
- From the outside of Y, can I see any part of X?
- If I turn Y over and shake it, will X fall out?
- Is there room to put more things in Y?
- Is X actually a part of Y?
Such failures led to our introducing new, more precise, more specialized versions of “in”. By now there are over 75 such predicates, organized in a graphical taxonomy.

60 Propositional Attitudes: Relations Between Agents and Propositions
goals, intends, desires, hopes, expects, believes, opinesThat, knowsThat, remembersThat, perceivesThat, seesThat, fearsThat
Most of these are modal; assertions using them go beyond 1st-order logic

61 Handcrafted Cyc KB
[Diagram: the Cyc ontology pyramid from slide 45, from Thing at the top, through Real World Domain Knowledge, down to specific cases, facts, details, …]
Cyc contains: 15,000 Predicates; 500,000 Concepts; 5,200,000 Assertions
Represented in: First Order Logic; Higher Order Logic; Context Logic; Microtheories
The pump has been primed. Use it as an inductive bias to power more automatic knowledge acquisition.

62 AKA by Shallow Fishing (Automated Knowledge Acquisition)
Target: (foundingDate AbuSayyaf ?X)
Search strings:
- Abu Sayyaf was founded in ___
- Al Harakat Islamiya, established in ___
- ASG was established in ___
Hit: Abu Sayyaf was founded in the early 1990s
Parse: (foundingDate AbuSayyaf (EarlyPartFn (DecadeFn 199)))

63 AKA by Shallow Fishing (Automated Knowledge Acquisition)
Target: (height EiffelTower ?x)
Search strings:
- The height of the Eiffel Tower is ___
- The Eiffel Tower is ___ tall
Hits:
- The height of the Eiffel Tower is 36 feet
- The height of the Eiffel Tower is 984 feet
Parse:
(height EiffelTower (Foot 36))
(height EiffelTower (Foot 984))
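The "fill in the blank, then parse" step can be sketched with a regular expression: the search string becomes a pattern whose blank is a capture group, and each hit is rewritten as a CycL-style assertion. This is a toy illustration under stated assumptions. The pattern, the `to_term` CamelCase heuristic, and the `parse` function are hypothetical and are not Cycorp's parser.

```python
import re

# One of the slide's search strings, with the blank turned into a capture group.
PATTERN = re.compile(r"The height of the (?P<thing>.+?) is (?P<num>\d+) feet")

def to_term(name):
    """Make a CamelCase constant name from an English phrase (toy heuristic)."""
    return "".join(word.capitalize() for word in name.split())

def parse(sentence):
    """Return a CycL-style assertion string, or None if the pattern fails."""
    m = PATTERN.search(sentence)
    if m is None:
        return None
    return f"(height {to_term(m['thing'])} (Foot {m['num']}))"

hits = [
    "The height of the Eiffel Tower is 36 feet",
    "The height of the Eiffel Tower is 984 feet",
]
parsed = [parse(hit) for hit in hits]
for assertion in parsed:
    print(assertion)
```

Both hits parse cleanly even though they disagree, which is why the slide shows two candidate assertions: deciding which value to keep presumably has to happen in a later vetting step, not in the parse itself.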

64 WWW.CYC.COM

68 Recent/Future AKB Directions
- Make it comprehensive (13% → 100%); apply it to other domains.
- Make it easier for SMEs to enter/vet/modify info.
- Improve the automatic acquisition (parsing / fishing from unstructured texts; SKSI to structured sources, incl. SPARQL).
- Make it easier for end users to pose questions:
  - Automatically select (a small superset of) the relevant fragments.
  - Use semantic constraints (argIsa, disjointness, domain knowledge, …) to combine the relevant fragments into a meaningful logical query.
- Make justifications more terse and more compelling.
- Speed up inference (in general; and for AKB entry and AKB query-answering).
- Graceful degradation [halfway between QA and Google]: fall back on Semantic Search of automatically tagged documents (tagged with Cyc terms).

69 May 2009 Developing a Cyc App. Extend Cyc’s KB –Augment its ontology –New assertions involving those new terms New Heuristic Level modules –Identify the need(s) for them –Design, build, and debug them New interface modules –For manual entry; for SKSI mapping; for end users –Domain-specific interfaces (e.g., sketching military unit movements; drawing chemical formulae; etc.)

70 May 2009 OpenCyc Open Source release of: [most of] the Cyc Ontology + Simple Relns. + Inference Engine ResearchCyc Almost All of Cyc (for free for R&D purposes)

71 The Ontology Pre-existing general medical knowledge framework Prior to the CCF project, Cyc’s KB had 184 specializations of MedicalCareEvent: MedicalCareEvent Ablation Ligation CoronaryArteryBypassGraft Biopsy-SurgicalProcedure TrephiningSomeone Prostatectomy RoboticSurgery OutpatientSurgery InpatientSurgery LiposuctionSurgery RemovalOfUniqueBodyPart Appendectomy … Tonsillectomy GumSurgery SurgicalTreatment TransplantSurgery HeartTransplantSurgery GeneralSurgery MajorSurgery OpenHeartSurgery RootCanalSurgery VaccinationEvent BoosterVaccinationEvent AnthraxMilitaryVaccinationScript MedicalTesting …

72 The Ontology Pre-existing general medical knowledge framework Prior to the CCF project, Cyc’s KB had 350+ specializations of AilmentCondition: AttentionDeficitDisorder Glaucoma SpinalStenosis SleepDeprivation Ache-AilmentCondition Migraine Hemorrhaging-TheCondition Jaundice ParasiticAilment BacillaryAngiomatosis Cryptosporidiosis Rickettsiosis EpidemicTyphus-NAmerica ArthropodInfestation ExternalArthropodInfestation InternalArthropodInfestation Trichinosis Schistosomiasis Ascariasis BladderFlukeInfestation … Atherosclerosis MultiplePersonalityDisorder Adenomyosis Scabies AmyotrophicLateralSclerosis Scoliosis Hypoglycemia TemproMandibularJointSyndrome AcetylcholinePoisoning CadmiumPoisoning CarbonMonoxidePoisoning FoodborneBotulism InhalationalBotulism WoundBotulism InfantBotulism Endometriosis Neuralgia Sciatica Diverticulitis Gout MacularDegeneration …

73 The Ontology Pre-existing general medical knowledge framework Prior to the CCF project, Cyc’s KB had 200+ specializations of Bacterium: StreptococcusPneumoniae StreptococcusPyogenes Bacillaceae-Family Bacillus-Genus BacillusCereus-Species Monotrichous Bacterium-Monotrichous Peritrichous Bacterium-Peritrichous Amphitrichous Bacterium-Amphitrichous Tenericutes-Division Mollicutes-Class Anaeroplasmataceae-Family … Asteroplasma-Genus Acholeplasmatales-Order Acholeplasmataceae-Family Acholeplasma-Genus Phytoplasma-Genus Eperythrozoon-Genus Mycoplasmatales-Order Mycoplasmataceae-Family Mycoplasma-Genus MycoplasmaPneumoniae-Species Spirillales-Order Vibrionaceae-Family Vibrio-Genus VibrioCholerae-Species …

74 The Ontology Hundreds of pre-existing relevant relationships General Role Predicates: objectActedOn eventOccursAt dateOfEvent objectPlaced objectRemoved deviceUsed … Medical domain specific relations: infectionCausedByOrganism infectingPathogen patientTreated deviceTypeTreatsConditionType causeOfDeathTypeOfType formOfDisease ailmentTypeAffects ailmentEpidemicType ailmentAcquiredBy ailmentTypicallyAcquiredBy indicatedDrug mortalityRiskForCondition survivalRate riskOfInfectionFromTypeToType …

75 The Ontology Methodology Establish bridging (translation) rules Define rules that allow users to associate patients, dates, locations, etc. with the various events – e.g. define patientTreated as a relationship between a medical event and a patient. Define rules that allow users to easily express complicated logical conditions – e.g. the defining rules for PrimarySurgery, isolatedProcedureOfType, concomitantProcedures, etc. Define concise vocabulary for constructions that are complicated or difficult to express – e.g. “aortic valve replacement” is represented as a single non-atomic term. This allows the user to specify this very common procedure with a single fragment instead of three distinct fragments in the CCF ontology (which in turn came about due to there not being an explicit functional term composition construct in the CCF representation).

76 Typical Query for outcomes study The examples in this presentation were short, simple, “Medical English” queries; the ones being focused on while building the application, and now that it is actually being used at CCF, are much larger ones, e.g.: IDENTIFY PATIENT POPULATION: FIND all native aortic valve replacements performed at CCF between January 1, 2000 and December 31, 2004 with a pre-operative diagnosis, as determined by echocardiogram, of moderately severe or severe aortic stenosis and moderate to severe left ventricular impairment. INCLUDE operations in which concomitant primary CABG or concomitant mitral or tricuspid valve repair was performed. EXCLUDE all patients with any prior valve repair or replacement; or with concomitant pulmonary valve repair; or with concomitant mitral, tricuspid, or pulmonary valve replacement; or with aortic regurgitation greater than moderate degree.

77 Researchers and clinicians sometimes ask the same queries “Are there cases in the last decade where patients had pericardial aortic valves inserted in the reverse position, to serve as mitral valve replacements, and how often in such cases did endocarditis or tricuspid valve infection develop, and how long after the procedure?” May 2009

78 Get a large set of use-cases (CCF task: the last 900 queries) Arrange them into maximally mutually-dissimilar classes Manually represent a couple from each of those buckets –Reveals most of the necessary new predicates (+ interfaces) Now go through each of the use-cases, trolling for new domain-specific terms to add to the ontology –Can be done manually, but we are beginning to rely more on semi-automatic methods where the system itself helps with that process –As appropriate, lexify the terms and/or align them to existing standards Run exemplars from each bucket (i.e., to completion) –tracer bullets to reveal necessary new rules, reasoning modules (+ interfaces) Replace the largest bucket by 2-4 specializations, recur (i.e., repeat the preceding 3 steps, and this one, again) until there is no new gain

79 Test the system on previously-unseen use-cases (or at least ones which were not among those previously-selected from their bucket) Have users try to use the system, and watch them (their results, of course, but also to the extent possible their time-feature trajectory) –Which features did they rarely or never use (to good effect)? –Which features did they make heavy use of? –Independent of this, ask them for their feedback and suggestions –Try to identify classes of users which will translate into classes of documentation and training materials/regimes/interface specifics All along, identify what elements of the ontology (if any) are proprietary, and assimilate everything else into future versions of OpenCyc and ResearchCyc

80 May 2009

81 (implies (and (cCFhasLeftAtriumDiameter ?EVT ?D) (greaterThan ?D ((Centi Meter) 3.8)) (patientTreated ?EVT ?PAT) (patientSex ?PAT FemaleHuman) (rdf-type ?EVT ?TYPE) (genls ?TYPE CCF-Evaluation)) (isa ?EVT EvaluationThatIndicates-LeftAtrialEnlargement))
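Read procedurally, the rule on this slide classifies an evaluation event as indicating left atrial enlargement when the measured diameter exceeds 3.8 cm, the patient is female, and the event's type is a specialization of CCF-Evaluation. A minimal Python sketch of that reading follows. The `Evaluation` record, the function name and the `CCF-EchoEvaluation` subtype are hypothetical stand-ins, and the set lookup only approximates the `genls` test that Cyc would answer by inference.

```python
from dataclasses import dataclass

@dataclass
class Evaluation:
    left_atrium_diameter_cm: float   # stands in for cCFhasLeftAtriumDiameter
    patient_sex: str                 # stands in for patientTreated + patientSex
    rdf_type: str                    # stands in for rdf-type

# Hypothetical lookup replacing (genls ?TYPE CCF-Evaluation);
# "CCF-EchoEvaluation" is an invented subtype used only for illustration.
CCF_EVALUATION_TYPES = {"CCF-Evaluation", "CCF-EchoEvaluation"}

def indicates_left_atrial_enlargement(evt):
    """True when every antecedent of the rule holds for this event."""
    return (
        evt.left_atrium_diameter_cm > 3.8        # strictly greater, as in greaterThan
        and evt.patient_sex == "FemaleHuman"
        and evt.rdf_type in CCF_EVALUATION_TYPES
    )

print(indicates_left_atrial_enlargement(Evaluation(4.1, "FemaleHuman", "CCF-EchoEvaluation")))
```

Unlike this hard-coded check, the CycL rule is declarative: the inference engine can use it in either direction and can justify each conclusion from the assertions that supported it.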

82 1784 pieces of pre-existing (prior to this project) Cyc KB knowledge used while handling a typical query. E.g.: Inferred Disjointness constraints: (disjointWith PericardialWindow-SurgicalProcedure MedicalPatient) Justification: [we are “counting” each of these assertions, in the total:] (genls PericardialWindow-SurgicalProcedure PericardialProcedure-Surgical) in UniversalVocabularyMt (genls PericardialProcedure-Surgical CardiacProcedure-Surgical) in UniversalVocabularyMt (genls CardiacProcedure-Surgical SurgicalProcedure) in UniversalVocabularyMt (genls SurgicalProcedure MedicalCareEvent) in BaseKB (genls MedicalCareEvent PhysicalSituation) in BaseKB (genls PhysicalSituation Situation-Localized) in UniversalVocabularyMt (genls Situation-Localized Situation) in UniversalVocabularyMt (disjointWith SpatialThing-NonSituational Situation) in BaseKB (genls EnduringThing-Localized SpatialThing-NonSituational) in UniversalVocabularyMt (genls Agent-NonGeographical EnduringThing-Localized) in UniversalVocabularyMt (genls EmbodiedAgent Agent-NonGeographical) in UniversalVocabularyMt (genls PerceptualAgent-Embodied EmbodiedAgent) in UniversalVocabularyMt (genls Animal PerceptualAgent-Embodied) in UniversalVocabularyMt (genls MedicalPatient Animal) in UniversalVocabularyMt
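The justification on this slide amounts to walking two genls chains upward until they reach a pair of collections asserted to be disjoint. A minimal Python sketch of that inference follows, using only the assertions listed on the slide. The function names and the dictionary representation are assumptions made for this sketch, and microtheories (BaseKB, UniversalVocabularyMt) are ignored for simplicity.

```python
# genls links copied from the slide: collection -> its direct generalizations.
GENLS = {
    "PericardialWindow-SurgicalProcedure": ["PericardialProcedure-Surgical"],
    "PericardialProcedure-Surgical": ["CardiacProcedure-Surgical"],
    "CardiacProcedure-Surgical": ["SurgicalProcedure"],
    "SurgicalProcedure": ["MedicalCareEvent"],
    "MedicalCareEvent": ["PhysicalSituation"],
    "PhysicalSituation": ["Situation-Localized"],
    "Situation-Localized": ["Situation"],
    "MedicalPatient": ["Animal"],
    "Animal": ["PerceptualAgent-Embodied"],
    "PerceptualAgent-Embodied": ["EmbodiedAgent"],
    "EmbodiedAgent": ["Agent-NonGeographical"],
    "Agent-NonGeographical": ["EnduringThing-Localized"],
    "EnduringThing-Localized": ["SpatialThing-NonSituational"],
}

# The single explicit disjointWith assertion from the slide.
DISJOINT = {frozenset({"SpatialThing-NonSituational", "Situation"})}

def ancestors(collection):
    """Return the collection plus everything reachable by following genls upward."""
    seen = {collection}
    stack = [collection]
    while stack:
        for parent in GENLS.get(stack.pop(), []):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def inferred_disjoint(a, b):
    """Two collections are disjoint if some pair of their generalizations is asserted disjoint."""
    return any(
        frozenset({x, y}) in DISJOINT
        for x in ancestors(a)
        for y in ancestors(b)
    )

print(inferred_disjoint("PericardialWindow-SurgicalProcedure", "MedicalPatient"))
```

This is why a fragment that puts a patient where a procedure is expected can be rejected without anyone having asserted that specific disjointness by hand.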

83 Ideas for NLM Grand Challenges Comprehensive Ontology of Medicine –Ties to terminological standards (Snomed, ICD…), lexical ones (WordNet), conceptual ones (Cyc) –Knowledge about/involving the concepts Contextualized for time, source, level of detail,… Sample sub-project: multicultural English-to-English translation English-to-English “translation” –Using the above ontology of medicine, and models of discourse, models of classes of users (by age, occupation, etc.), models of individual users (built up over time and stored HIPAA-securely) –Translate articles, web pages, medicine bottle labels, etc. into comprehensible form for that user In some cases this means literally writing more text, expanding its length, or paring it down (eliminating prior knowledge) In less clear cases (where the user might or might not already know some piece of information), the best way to expand the original text might be to add footnotes containing the borderline information, and to pare down the original text by relegating borderline material to footnote form –The translations needn’t just be static; they can sync with the user’s calendars, cell phones, computers, etc., to provide reminders, proactively send them relevant news articles or new warnings, and so on Automated Clinical/Biomedical Discovery –Hypothesis formation, Experiment design, Data gathering, Analysis, New terms & hypotheses May 2009

