1
Building knowledge bases
The fundamental problem of understanding intelligence is not the identification of a few powerful techniques, but rather the question of how to represent large amounts of knowledge in a fashion that permits their effective use and interaction. To do this, we must first ask how this knowledge is acquired. Think of knowledge as codified experience. We can then view knowledge as a transportable substance, manipulated by a process called knowledge acquisition (KA). KA consists of two interrelated activities:
1. Eliciting knowledge from some knowledge source (domain expert, databases, textbooks, etc.). This activity is called knowledge engineering.
2. Representing the elicited knowledge in some formal language, testing its validity, and subsequently refining it. This activity is called ontological engineering.
2
An overview of the KA process
The knowledge engineer is the main player in the whole process. To do her job, she must (1) learn enough about the problem domain to be able to recognize important objects and relations, (2) know the KR language well enough to encode knowledge correctly, and (3) know enough about the inference procedure to keep track of the efficiency of knowledge processing.
The process has five stages:
1. Identification: identify participants, problem characteristics and goals.
2. Conceptualization: find key concepts and relations already mentioned during the identification stage.
3. Formalization: map key concepts into a more formal representation.
4. Implementation: formulate rules to embody knowledge.
5. Testing: validate the rules representing knowledge.
Testing may feed back into earlier stages through reformulation, redesign, or refinement.
3
Modes of knowledge acquisition
“Expert -- knowledge engineer” mode: Domain expert -> Knowledge engineer -> Knowledge base + Inference engine
“Expert -- Intelligent Editing Program” mode: Domain expert -> Intelligent Editing Program -> Knowledge base + Inference engine
4
Modes of knowledge acquisition (cont.)
“Data -- Induction Program” mode: Data bases -> Induction Program -> Knowledge base + Inference engine
“Textbooks -- Text Understanding Program” mode: Textbooks -> Text Understanding Program -> Knowledge base + Inference engine
5
Main problems in accessing expert knowledge
1. Expertise may not be expressible in language (D. Michie’s “cheese diagnosis” example, 1982).
2. Expertise may not be understandable (by the KE), even when it can be expressed in language.
3. Even if the expertise can be verbally expressed and understood by the KE, it might be impossible to convert a verbal comprehension of a skill into a skilled performance.
4. Expertise communicated by the expert may be irrelevant, incomplete or even incorrect.
KA is the bottleneck in KBS design. It may take years to build even a moderately large knowledge base.
6
Types of knowledge: shallow knowledge
Shallow knowledge is represented in terms of heuristic rules, which map data abstractions (such as symptoms in diagnostic domains) to solution abstractions (such as diagnoses). In many domains, propositional logic (PL) is the language of choice for representing shallow knowledge.
Example. This is one of MYCIN's top-level goal rules:
If there is an organism which requires therapy, and consideration has been given to any other organisms requiring therapy
Then compile a list of possible therapies, and determine the best one on the list
Shallow knowledge does not reflect the causal mechanisms underlying the relationship between symptoms and diagnoses; MYCIN-like rules typically reflect empirical associations derived from experience.
7
Acquiring and implementing shallow knowledge
Consider the following example (adapted from Gonzalez and Dankel, “The Engineering of KBS”). We want to build an expert system to advise a motorist who is not mechanically inclined about a malfunction in his car’s cooling system. Assume that the KE also serves as the domain expert. The first task for the KE is to compile a list of all possible problems with the cooling system. The possible outputs expected in this domain are:
radiator leaks
broken fan belt
defective water pump
broken water hose
frozen coolant
8
Example (cont.)
Next, the KE must identify possible inputs to discover these problems, namely:
temperature indicator on the dashboard
weather conditions
spots of coolant underneath the engine compartment
steam coming out of the hood (the presence of a hissing sound)
Finally, the KE must determine the relationships between the inputs and the outputs, which may require some intermediate states. Here, these relationships are translated into the following heuristic rules:
Rule 1: The presence of a “hot” reading on the dashboard implies that at least one problem exists.
Rule 2: The absence of a “hot” reading on the dashboard does not necessarily imply absence of a problem.
Rule 3: A large pool of coolant under the engine compartment can indicate radiator leaks, broken hoses, and/or a defective water pump.
Rule 4: A relatively small pool of coolant under the engine compartment usually implies a defective water pump.
9
Example (cont.)
Rule 5: Absence of a pool of coolant under the engine compartment, together with a “hot” reading on the dashboard, implies a broken fan belt.
Rule 6: An ambient temperature below 10 degrees Fahrenheit implies that the coolant is frozen.
Rule 7: The presence of a hissing sound accompanied by a small pool of coolant under the engine compartment indicates a radiator and/or hose leak.
In PL, to represent this domain we need the following vocabulary:
A: “hot” reading on the dashboard
B: at least one problem with the cooling system exists
C: there is a large pool of coolant under the engine compartment
D: radiator leaks
E: broken water hose
F: defective water pump
H: there is a relatively small pool of coolant under the engine compartment
J: broken fan belt
I: the ambient temperature is below 10 degrees Fahrenheit
G: frozen coolant
K: a hissing sound is present
10
Example (cont.) Rules 1 to 7 can now be represented as follows:
Rule 1: A => B
Rule 2: not B => not A
Rule 3: C => D v E v F
Rule 4: H => F
Rule 5: not (C v H) & A => J
Rule 6: I => G
Rule 7: K & H => D v E
Further refinements of this set of rules may be required to improve the performance adequacy of the KBS.
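To see how such a rule set behaves, the rules can be run against a set of observed symptoms. The following is a minimal Python sketch, not taken from the original slides. The function name `diagnose`, the rule table layout, and the closed-world treatment of unobserved symptoms (anything not observed counts as false) are assumptions made for illustration. Rule 2 is omitted because it is the contrapositive of Rule 1 and adds nothing when reasoning forward from observations. Disjunctive conclusions such as D v E v F are reported as sets of candidate faults rather than resolved to a single fault.

```python
# Propositional symbols follow the vocabulary of the example:
# A hot reading, B some problem exists, C large pool, H small pool,
# I ambient temperature below 10 F, K hissing sound,
# D radiator leak, E broken hose, F defective pump, J broken fan belt, G frozen coolant.

# Each entry: (rule name, premise test over the observed facts, candidate conclusions).
RULES = [
    ("Rule 1", lambda f: "A" in f, {"B"}),
    ("Rule 3", lambda f: "C" in f, {"D", "E", "F"}),
    ("Rule 4", lambda f: "H" in f, {"F"}),
    ("Rule 5", lambda f: "A" in f and "C" not in f and "H" not in f, {"J"}),
    ("Rule 6", lambda f: "I" in f, {"G"}),
    ("Rule 7", lambda f: "K" in f and "H" in f, {"D", "E"}),
]


def diagnose(observations):
    """Return {rule name: candidate conclusions} for every rule whose premise holds.

    Assumption: symbols missing from `observations` are treated as false.
    """
    facts = set(observations)
    return {name: candidates for name, premise, candidates in RULES if premise(facts)}


if __name__ == "__main__":
    # Hot reading, no coolant on the ground.
    print(diagnose({"A"}))
    # Small pool of coolant plus a hissing sound.
    print(diagnose({"H", "K"}))
```

With only a hot reading observed, Rules 1 and 5 fire, giving B and J. With a small pool and a hissing sound, Rules 4 and 7 fire, giving F from one rule and the candidates D or E from the other, which shows why the slide notes that further refinement of the rules may be needed.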
11
Deep knowledge
Deep knowledge reflects the causal mechanisms underlying the relationships between the objects in the domain. To represent such knowledge, we need at least first-order logic (FOL).
Example: The electronic circuits domain (AIMA, Chapter 12). Build a KB which can answer queries about digital circuits, such as:
What combinations of inputs would cause the first output to be off, and the second output to be on?
What are the possible sets of values of all the terminals?
Note that we only want to analyze the circuit (i.e. to verify that it complies with the design specifications), not to diagnose faults. This is why we can limit our ontology to include gates only, and ignore wires.
12
Designing the electronic circuits KB: Problem identification stage
1. What problems is the KBS intended to solve?
Answer: Verification of circuits to see if they match their specifications.
2. What data will be used?
Answer: Descriptions of specific instances of circuits.
3. What are the important terms and relations?
Answer: Circuits, gates, terminals, signals, gate types (and, or, xor, not).
4. What does a solution look like?
Answer: Combinations of signals on designated terminals, including a complete input / output table of signals for the circuit.
5. What is the nature of the knowledge underlying the solution?
Answer: General knowledge about the flow of signals, connectivity of circuit components, and the behavior of gates.
13
Designing the electronic circuits KB: Conceptualization stage
1. What types of data are to be considered?
Answer: Objects (such as circuits, terminals, gates, signal values), functions (such as types of gates), instances (such as gate1, gate2, gate1input1), and predicates (such as connected, which takes two terminals as arguments).
2. What are the general dependencies in the circuit domain?
Answer: An example dependency is the following: if two terminals are connected, then they have the same signal.
14
Designing the electronic circuits KB: Formalization stage
Mapping the identified domain entities into FOL constants, functions and predicates. Examples:
Gates are named with constants x1, x2, …
Terminals are represented by means of the IN and OUT functions, for example OUT(1, x1), IN(1, x1), IN(2, x1), …
Types of gates are represented by the function TYPE, for example TYPE(x1), TYPE(x2).
Signal values are represented by the objects On and Off, and by the function SIGNAL, which takes a terminal as argument and denotes a signal value.
15
Designing the electronic circuits KB: Implementation stage
1. Encoding dependencies into rules. Example rules are the following:
“If two terminals are connected, then they have the same signal.”
∀ t1, t2 Connected(t1, t2) => (Signal(t1) = Signal(t2))
"An AND gate's output is Off if and only if (iff) any of its inputs is Off."
∀ g [ (Type(g) = AND) => ( Signal(Out(1, g)) = Off <=> ∃ n Signal(In(n, g)) = Off ) ]
"A NOT gate's output is different from its input."
∀ g [ (Type(g) = NOT) => (Signal(Out(1, g)) != Signal(In(1, g))) ]
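The three dependencies above can be checked mechanically against a concrete assignment of signals. The following is a minimal Python sketch, not part of the original slides. The tuple encoding of terms, the helper names `out`, `inp` and `violations`, the `arity` table, and the gates g1 and n1 are all hypothetical, chosen only to mirror Out(n, g), In(n, g), Type(g) and Signal(t).

```python
ON, OFF = "On", "Off"


def out(n, gate):
    """Term for the n-th output terminal of a gate, mirroring Out(n, g)."""
    return ("Out", n, gate)


def inp(n, gate):
    """Term for the n-th input terminal of a gate, mirroring In(n, g)."""
    return ("In", n, gate)


def violations(types, arity, connected, signal):
    """List every instance of the three rules that the signal assignment breaks."""
    broken = []
    # Connected(t1, t2) => Signal(t1) = Signal(t2)
    for t1, t2 in connected:
        if signal[t1] != signal[t2]:
            broken.append(f"connected terminals {t1} and {t2} carry different signals")
    for gate, gate_type in types.items():
        inputs = [signal[inp(n, gate)] for n in range(1, arity[gate] + 1)]
        output = signal[out(1, gate)]
        # AND: output is Off iff some input is Off.
        if gate_type == "AND" and (output == OFF) != (OFF in inputs):
            broken.append(f"AND gate {gate} violates the Off-iff-some-input-Off rule")
        # NOT: output differs from the single input.
        if gate_type == "NOT" and output == inputs[0]:
            broken.append(f"NOT gate {gate} has output equal to its input")
    return broken


# Hypothetical two-gate circuit: AND gate g1 feeding NOT gate n1.
types = {"g1": "AND", "n1": "NOT"}
arity = {"g1": 2, "n1": 1}
connected = [(out(1, "g1"), inp(1, "n1"))]
consistent = {
    inp(1, "g1"): ON, inp(2, "g1"): OFF, out(1, "g1"): OFF,
    inp(1, "n1"): OFF, out(1, "n1"): ON,
}
inconsistent = dict(consistent)
inconsistent[out(1, "n1")] = OFF  # NOT gate output now equals its input

if __name__ == "__main__":
    print(violations(types, arity, connected, consistent))
    print(violations(types, arity, connected, inconsistent))
```

This only checks a given assignment against the rules. It does not perform the general logical inference that a FOL theorem prover would.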
16
Designing the electronic circuits KB: Implementation stage (contd.)
2. Encoding specific instances. Examples:
Type(x1) = XOR
Type(a1) = AND
Connected(Out(1, x1), In(1, x2))
Note that the ontology of the electronic circuits domain is a very simple special-purpose ontology. A general-purpose ontology must represent a large variety of knowledge, such as structured objects, time, space, beliefs and processes, which is a very difficult task. Building a general ontology was one of the goals of the CYC project (if interested, see D. Lenat and R. Guha, “Building Large Knowledge-Based Systems: Representation and Inference in the Cyc Project”, Addison-Wesley, 1991, available in the library).
17
Cyc on the Semantic Web -- see [http://www.opencyc.org/]
Cyc is a heavyweight upper ontology whose development started in [...]. This is the world’s largest and most complete general knowledge base and commonsense reasoning engine. It includes:
~239,000 terms (up from ~177,000 terms in the previous release)
~2,093,000 triples (up from ~1,500,000 in the previous release)
Select class (direct and indirect) instance counts: ‘place’: ~19,000; ‘organization’: ~26,000; ‘predicate’: ~22,000; ‘business related thing’: ~28,000; ‘person’: ~12,700
~69,000 owl:sameAs links to external (non-Cyc) semantic data namespaces:
DBpedia: ~47,000 links, including 696 links to the DBpedia ontology
UMBEL: ~21,000 links
WordNet: ~11,000 links
Wikicompany: 1028 links
CIA World Factbook: 172 links
RDFAbout SEC company identifiers: 661 links
RDFAbout states and counties: 71 links
FOAF: 44 links
You can download OpenCyc from [...]
18
Designing the electronic circuits KB: An example query
A possible query is the following one:
"What combination of inputs would cause the first output of C1 to be Off, and the second to be On?"
∃ i1, i2, i3 (Signal(In(1, C1)) = i1) & (Signal(In(2, C1)) = i2) & (Signal(In(3, C1)) = i3) & (Signal(Out(1, C1)) = Off) & (Signal(Out(2, C1)) = On)
The expected answer is:
(i1 = On & i2 = On & i3 = Off) v (i1 = On & i2 = Off & i3 = On) v (i1 = Off & i2 = On & i3 = On)
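The expected answer can be reproduced by brute force over the eight possible input combinations. The following is a minimal Python sketch, not part of the original slides. It assumes that C1 is the one-bit full adder used in the AIMA circuit example, built from two XOR gates (x1, x2), two AND gates (a1, a2) and one OR gate (o1). Only the connection from Out(1, x1) to In(1, x2) is stated on the slides, so the rest of the wiring, the OR and XOR gate behavior, and the function names are assumptions.

```python
from itertools import product

ON, OFF = "On", "Off"


def xor_gate(a, b):
    return ON if a != b else OFF


def and_gate(a, b):
    return ON if a == ON and b == ON else OFF


def or_gate(a, b):
    return ON if a == ON or b == ON else OFF


def c1(i1, i2, i3):
    """Signals on (Out(1, C1), Out(2, C1)) for the assumed full-adder wiring."""
    x1 = xor_gate(i1, i2)   # first XOR gate
    x2 = xor_gate(x1, i3)   # second XOR gate, drives the first output (sum)
    a1 = and_gate(i1, i2)   # first AND gate
    a2 = and_gate(i3, x1)   # second AND gate
    o1 = or_gate(a2, a1)    # OR gate, drives the second output (carry)
    return x2, o1


def ask(out1, out2):
    """All input triples (i1, i2, i3) that produce the requested output signals."""
    return [ins for ins in product((ON, OFF), repeat=3) if c1(*ins) == (out1, out2)]


answers = ask(OFF, ON)

if __name__ == "__main__":
    for i1, i2, i3 in answers:
        print(f"i1 = {i1} & i2 = {i2} & i3 = {i3}")
```

Under these assumptions the three bindings found are exactly those listed in the expected answer: the first output is Off and the second is On precisely when two of the three inputs are On.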