Reading to Learn Q1 review (7/6/05). Peter Clark, Michael Glass, Phil Harrison, Tom Jenkins, John Thompson, Rick Wojcik. Boeing Phantom Works.


1 Reading to Learn Q1 review (7/6/05) Peter Clark, Michael Glass, Phil Harrison, Tom Jenkins, John Thompson, Rick Wojcik. Boeing Phantom Works

2 Agenda Introduction 1. Textual knowledge to CPL 2. CPL to logic: How CPL is interpreted Processing some CPL (demo) 3. Knowledge Integration Extracting knowledge from text

3 LbR Framework (diagram labels): Corpus; Using what you know to get more; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; Knowledge Integration

4 LbR Framework (diagram labels): Corpus; Using what you know to get more; Knowledge Integration; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; CPL or Logic

5 Overview What does a person do? a) Start already knowing something about a domain + general knowledge b) Read; existing knowledge helps him/her understand the new material c) Integrate the new knowledge into pre-existing knowledge d) Can now perform new tasks – In its full and unrestricted form, this is too difficult to be implemented – BUT: significant, partial approaches are feasible

6 Reduced Version Select a domain where a KB exists (chemistry) Manually reformulate a section of text into controlled English (CPL) Automatically process that controlled English to generate new knowledge Integrate the new knowledge into the KB Rationale: Two key problems in reading to learn: (1) full natural language processing (2) knowledge integration This approach separates them and can focus on (2)

7 Step 1: Unrestricted NL to CPL (LbR Framework diagram labels): Corpus; Using what you know to get more; Knowledge Integration; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; CPL or Logic

8 Step 1: Possible methods Texts (unrestricted English) Text in restricted English Manual reformulation Machine “translation” Locate subset in simple English Knowledge extraction from corpora

9 Step 1: Possible methods Texts (unrestricted English) Text in restricted English Manual reformulation Machine “translation” Locate subset in simple English Knowledge extraction from corpora

10 Reformulating Chemistry Text into CPL John A. Thompson

11 Selecting the important text Textbook: “Acids have a sour taste (for example, citric acid in lemon juice) and cause certain dyes to change color (for example, litmus turns red on contact with acids). Indeed, the word acid comes from the Latin word acidus, meaning sour or tart.” Selected and reworded as CPL: “Acids have a sour taste.” “Acids cause some dyes to change color.” Judgment calls regarding “what might be on the test” Simplified rewordings – no pronouns or complex sentences, see CPL User Guide

12 Example 2: Text to CPL Textbook: “Sodium hydroxide is an Arrhenius base. Because NaOH is an ionic compound, it dissociates into Na+ and OH- ions when it dissolves in water, thereby releasing OH- ions into the solution.” CPL: “Sodium hydroxide is an Arrhenius base.” “NaOH is sodium hydroxide.” [inserted background knowledge] “NaOH is an ionic compound.” “NaOH dissolves in water.” “NaOH dissociates in water.” “The dissociating produces Na-plus ions and OH-minus ions.”

13 Example 3: Text to CPL Textbook: “Some substances can act as an acid in one reaction and as a base in another. For example, H2O is a Bronsted-Lowry base in its reaction with HCl and a Bronsted-Lowry acid in its reaction with NH3. A substance that is capable of acting as either an acid or a base is called amphoteric.” CPL: “Some substances sometimes act as a Bronsted-Lowry acid and sometimes act as a Bronsted-Lowry base.” [required “and”] “These substances are called amphoteric substances.” [vocab] “H2O acts as a Bronsted-Lowry base in a reaction with HCl.” “H2O acts as a Bronsted-Lowry acid in a reaction with NH3.” “Therefore, H2O is an amphoteric substance.” [deduced]

14 Rewording generics “Acids have a sour taste” is a generic sentence about a class of things – the textbook is full of generics! One interpretation: “Every instance of an acid has a sour taste” But is it “every” instance, or just “typically”? Another interpretation: “If a person is tasting an acid, then the person is experiencing a sour taste” We are planning to do some automatic interpretation of generic sentences in the future Our short-term strategy is to reword every generic sentence to another form, especially to an if-then rule

15 Rewording generics - 2 Another example: “Acids cause some dyes to change color” One interpretation: “There are some dyes that change color when in contact with any acid” Another interpretation: “For each acid there is some dye that changes color when in contact with the acid” Possible CPL rewording as an if-then rule: “If an acid-sensitive dye is in contact with an acid, then the acid is causing the dye to change color” Bottom line: Generics are a major issue with textbook knowledge
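The two interpretations on this slide differ only in quantifier scope. A sketch in first-order logic makes that explicit; the predicate names (Dye, Acid, InContact, ChangesColor) are illustrative assumptions, not CPL or KB vocabulary from the presentation.

```latex
\begin{align*}
\text{(1)}\quad & \exists d\,\bigl[\mathrm{Dye}(d) \land \forall a\,\bigl(\mathrm{Acid}(a) \land \mathrm{InContact}(d,a) \rightarrow \mathrm{ChangesColor}(d)\bigr)\bigr] \\
\text{(2)}\quad & \forall a\,\bigl[\mathrm{Acid}(a) \rightarrow \exists d\,\bigl(\mathrm{Dye}(d) \land (\mathrm{InContact}(d,a) \rightarrow \mathrm{ChangesColor}(d))\bigr)\bigr]
\end{align*}
```

Under this encoding, reading (1) is the stronger one: it entails (2), but (2) does not entail (1), because (2) lets the dye vary from acid to acid.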

16 CPL now includes if-then rules Rewriting generic sentences required if-then rules to be added to CPL Examples of if-then rules in CPL: “If HCl is immersed in water, then the HCl is dissolving in the water.” “If HCl is dissolving in water, then each molecule of the HCl is reacting with a molecule of the water.” “If an HCl molecule is reacting with a water molecule, then an H-plus ion is transferring from the HCl molecule to the water molecule.”

17 Connecting verb and noun forms Example: reacts vs. reaction “Hydrogen chloride gas in water reacts with the water” “The reaction produces H-plus ions and Cl-minus ions” The CPL writer assumes that the interpreter will make the connection from the verb to the related noun in the next sentence We are building a large set of verb-noun relations to handle these cases
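One way to picture the verb-noun connection is a lookup from a noun in the current sentence back to the most recent verb it could nominalize. The sketch below is a minimal illustration under that assumption; the table `NOMINALIZATIONS` and the function `resolve_event_noun` are hypothetical names, not part of the CPL interpreter.

```python
# Hypothetical verb -> noun table; the real set of verb-noun relations
# described on the slide is far larger and is not reproduced here.
NOMINALIZATIONS = {
    "react": "reaction",
    "dissociate": "dissociation",
    "dissolve": "dissolution",
}


def resolve_event_noun(noun, prior_verbs):
    """Return the most recent earlier verb whose noun form is `noun`, or None."""
    for verb in reversed(prior_verbs):
        if NOMINALIZATIONS.get(verb) == noun:
            return verb
    return None


# "...reacts with the water" followed by "The reaction produces ..."
antecedent = resolve_event_noun("reaction", ["dissolve", "react"])
```

Here `antecedent` is `"react"`, so "the reaction" can be tied to the event introduced by the previous sentence. A noun with no matching earlier verb returns `None` and would need some other treatment.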

18 Gross vs. molecular events Chemistry text switches between: gross-level events involving substances reactions between two molecules Example: “When HCl dissolves in water, we find that the HCl molecule transfers an H+ ion (a proton) to a water molecule” In CPL, we interpret “HCl” as the gross-level substance, and “HCl molecule” as the molecular-level individual If-then rules carry the logic from the gross level to the molecular level: “If HCl is dissolving in water, then each molecule of the HCl is reacting with a molecule of the water.”

19 Issues with fuzzy information Examples: “An H-plus ion sometimes reacts with an H2O molecule” “A molecule of a Bronsted-Lowry acid can donate a proton to another substance” “The NH4 is mostly solid particles” “Some substances containing hydrogen are not acids” “An ion of a Bronsted-Lowry acid must have a hydrogen atom.” These are easy to state in CPL, but what logical representation should be produced? This is a universal problem in any kind of language interpretation We are developing solutions as we progress

20 The textbook teaches by example Example: “A molecule of a Bronsted-Lowry acid can donate a proton to another substance” “An HCl molecule in water donates a proton to an H2O molecule” “Therefore, the HCl molecule acts as a Bronsted-Lowry acid” Text is using an example to teach the student how to make similar deductions Unclear how to capture this in CPL – how to specify where the deductive chain begins? Perhaps we should skip the examples and only enter the general principles being taught?

21 Teaching by hypothetical situations Example: “[Assume that] H2O is a stronger base than X-minus in Equation 16.9” “[Assume that] X-minus is the conjugate base of HX in Equation 16.9” “H2O extracts the proton from HX in the reaction in Equation 16.9” “The reaction produces H3O-plus and X-minus” “Therefore, the equilibrium is on the right side of Equation 16.9” How can CPL make it clear that this is a hypothetical? Note that X stands for a variety of atoms So the hypothetical teaches the student a useful pattern How should we represent and reason about this?

22 Real world vs. written equations The textbook switches between the real world and the syntax of written chemical equations: “If H2O (the base in the forward reaction) is a stronger base than X- (the conjugate base of HX), then H2O will abstract the proton from HX to produce H3O+ and X-. As a result, the equilibrium will lie to the right.” The CPL author must be careful to mention the equation number when referring to its syntax: “In Equation 16.9 H2O is the base in the forward reaction.” “The equilibrium is on the right side of Equation 16.9”

23 Textbook figures and tables A textbook is not just text! Some figures and diagrams can be re-expressed in text, but others cannot be Tables (such as 16.4, a listing of the relative strengths of some conjugate acid-base pairs) can be converted to many lines of CPL text We are omitting sample & practice exercises from the CPL

24 Textbook to CPL: Conclusions Authors can learn to restate the essential parts of a textbook in simple CPL sentences Most textbook sentences contain ambiguous “generics” about classes of things For now, we can disambiguate generics by rewriting them as if-then rules in CPL For most CPL sentences, we can process them to get an adequate logical representation (with some user assistance) Special problems involving fuzzy statements and hypothetical examples require further work

25 Step 2: CPL Interpretation (LbR Framework diagram labels): Corpus; Using what you know to get more; Knowledge Integration; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; CPL or Logic

26 Step 2: CPL Interpretation (overview) Input: “HCl is immersed in water” Pipeline (drawing on World Knowledge and Linguistic Knowledge): Parser & LF Generator; Word sense disambiguator; Relational disambiguator; Coreference identifier; Structural reorganizer Intermediate graph: Immerse, with object HCl and is-inside-of Water Output: (_HCl13320 instance_of HCl-Substance) (_Water13321 instance_of Water) (_Immerse13319 instance_of Move-Into) (_Immerse13319 object _HCl13320) (_Immerse13319 is-inside-of _Water13321)
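The five assertions on this slide are plain (instance, slot, value) triples, so they are easy to hold and query outside KM. The sketch below copies the triples from the slide; the helper `index_by_frame` and the dictionary layout are assumptions made for illustration, not how the CPL interpreter stores its output.

```python
from collections import defaultdict

# Triples copied from the slide's output for "HCl is immersed in water".
triples = [
    ("_HCl13320", "instance_of", "HCl-Substance"),
    ("_Water13321", "instance_of", "Water"),
    ("_Immerse13319", "instance_of", "Move-Into"),
    ("_Immerse13319", "object", "_HCl13320"),
    ("_Immerse13319", "is-inside-of", "_Water13321"),
]


def index_by_frame(triples):
    """Group triples into {instance: {slot: [values, ...]}}."""
    frames = defaultdict(dict)
    for instance, slot, value in triples:
        frames[instance].setdefault(slot, []).append(value)
    return dict(frames)


frames = index_by_frame(triples)
immersed_thing = frames["_Immerse13319"]["object"][0]
immersed_class = frames[immersed_thing]["instance_of"][0]
```

Following the `object` slot from the event instance and then `instance_of` gives `HCl-Substance`, which is the gross-level reading of “HCl” rather than a molecule.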

27 CPL Processing Spots coreferences across sentences Handles nominalizations (“reacts”, “the reaction”) Use of WordNet to coerce text to KB’s ontology Set of heuristics for identifying semantic relations Interprets rules, as well as ground facts Rules are in KM (inference-capable) Can be used in interactive or automatic mode

28 Integrating Knowledge from Reading into a Knowledge Base Michael Glass Boeing Phantom Works

29 Step 3: Knowledge Integration (LbR Framework diagram labels): Corpus; Using what you know to get more; Knowledge Integration; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; CPL or Logic

30 The Task After knowledge is input through CPL it must be integrated with existing knowledge in the KB. This task carries with it three main problems. Missing Concepts: The new knowledge may contain concepts the KB does not have. Conflicting Concepts: The new knowledge may use concepts in the KB in an incompatible way. Elaboration Requires Updates: The knowledge base may be resistant to new knowledge.

31 1. Missing Concepts Consider: If HCl is immersed in water, then the HCl is dissolving in the water. The knowledge base does not know the concept “immerse” or “dissolve”. Even if they were added in a superficial way, the knowledge base would have no axioms allowing it to reason about immersion or dissolving. This is the simplest type of mismatch. It is relatively easy to detect. It simply requires adding the concepts to the KB or the user may restrict himself to concepts the KB already has.

32 1. Missing Concepts: Solutions Add the concepts to the KB. The concepts could be added selectively as needed and (partially) axiomatized. The concepts could be added in bulk in a vacuous way. Defining new concepts through CPL. Restrict the user to the set of concepts in the KB. In many cases a similar concept in the knowledge base may serve just as well.

33 2. Conflicting Concepts Two significant classes of conflicting concepts: Genuine Conflicting Concepts: A concept in the KB may simply not match what is intended. Naïve Encodings: The KB may expect knowledge structured in a specific way.

34 2. Conflicting Concepts Consider: If HCl is dissolving water, then each molecule of the HCl is reacting with a molecule of the water. The knowledge base has a concept of a Reaction, but the inputs are required to be Chemicals (aggregates of molecules). NOT Molecules. The reaction concept is well axiomatized, but some of these axioms are not applicable to molecules.

35 2. Conflicting Concepts This might be considered another case of missing concepts. The KB has concepts for reasoning about reactions at the macroscopic level, but not at the molecular level.

36 2. Conflicting Concepts Consider: If an HCl molecule is reacting with a water molecule, then an H-plus ion is transferring from the HCl molecule to the water molecule. Transferring is in the KB, but it is axiomatized in a way consistent with a person transferring an object to another person. The object transferred must be in the “possession” of the donor.

37 2. Naïve Encodings, A Special Case of Conflicting Concepts There can be a mismatch between the form in which a new piece of knowledge is stated and the form in which it is expected. A chemistry textbook might refer to the equilibrium constant of a reaction rather than the equilibrium constant of an equilibrium reaction. The Chemistry knowledge base expects only equilibrium reactions to have equilibrium constants.

38 2. Naïve Encodings Also the CPL parser may produce KM that is not quite right: Produced by CPL: (a Chemical with (color (*red))) Expected by KM: (a Chemical with (color ((a Color-Value with (value (*red)))))) The difference is purely structural.
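Because the mismatch here is purely structural, it can be repaired by wrapping the bare filler in the value class the KB expects. The sketch below shows that idea on dictionary stand-ins for the two KM expressions. The table `EXPECTED_WRAPPERS`, the functions, and the dictionary layout are all hypothetical, and this is not a description of how Loosespeak works.

```python
# Hypothetical table: slots whose fillers the KB expects to be wrapped
# in a value class rather than given as a bare constant.
EXPECTED_WRAPPERS = {"color": "Color-Value"}


def coerce_slot(slot, filler):
    """Wrap a bare filler in the expected value class, if one is declared."""
    wrapper = EXPECTED_WRAPPERS.get(slot)
    if wrapper is None or isinstance(filler, dict):
        return filler  # no wrapper expected, or already structured
    return {"class": wrapper, "value": filler}


def coerce_frame(frame):
    """Return a copy of the frame with every slot filler coerced."""
    return {
        "class": frame["class"],
        "slots": {s: coerce_slot(s, f) for s, f in frame["slots"].items()},
    }


# Stand-in for: (a Chemical with (color (*red)))
naive = {"class": "Chemical", "slots": {"color": "*red"}}
coerced = coerce_frame(naive)
```

`coerced` now mirrors the expected form, with `color` filled by a `Color-Value` whose `value` is `*red`. Applying `coerce_frame` a second time leaves it unchanged.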

39 2. Conflicting Concepts: Solutions First the conflict must be detected Constraint violations: Certainly if a reaction requires Chemical raw-materials and it is given Molecules instead, there must be a conflict. Resemblance check: If the knowledge looks totally novel, such as a Transfer from one Molecule to another, it may be a conceptual conflict. To fix a naïve encoding an automatic solution may be used to coerce the concepts into an acceptable form. (James Fan’s Loosespeak)
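The constraint-violation check can be sketched as a comparison between each slot filler's class and the class the slot requires. Everything named below (the `ISA` hierarchy, `SLOT_CONSTRAINTS`, the class names other than Chemical, Molecule and Reaction) is an assumption for illustration and is not taken from the actual chemistry KB.

```python
# Hypothetical class hierarchy (child -> parent).
ISA = {
    "HCl-Substance": "Chemical",
    "Water": "Chemical",
    "HCl-Molecule": "Molecule",
    "Water-Molecule": "Molecule",
}

# Hypothetical constraint: a Reaction's raw materials must be Chemicals.
SLOT_CONSTRAINTS = {("Reaction", "raw-material"): "Chemical"}


def is_a(cls, ancestor):
    """True if `cls` equals `ancestor` or inherits from it via ISA."""
    while cls is not None:
        if cls == ancestor:
            return True
        cls = ISA.get(cls)
    return False


def violations(event_class, slots):
    """List (slot, filler_class, required_class) for each bad filler."""
    found = []
    for slot, filler_classes in slots.items():
        required = SLOT_CONSTRAINTS.get((event_class, slot))
        if required is None:
            continue
        for filler_class in filler_classes:
            if not is_a(filler_class, required):
                found.append((slot, filler_class, required))
    return found


molecular = violations(
    "Reaction", {"raw-material": ["HCl-Molecule", "Water-Molecule"]}
)
gross = violations("Reaction", {"raw-material": ["HCl-Substance", "Water"]})
```

The molecular-level input produces two violations while the gross-level input produces none, which is the signal that a conceptual conflict needs attention.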

40 3. Elaboration Requiring Updates Two significant ways a KB may require updates to add knowledge: Closed World Assumption: The KB may implicitly or explicitly assume it already knows everything about a given topic. Unstated Assumptions: The KB designers may have built the KB with some assumptions about the circumstances it will reason about.

41 3. Updates and the Closed World Assumption Some rules in the KB close off the KB to further elaboration. The else part of an if…then…else may give a default value that prevents other components from concluding anything else.
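A small contrast shows why a default in the else branch closes the KB to elaboration. Both functions and the example substances below are hypothetical and written only to illustrate the point. The first commits to "not an acid" for anything unlisted; the second leaves the question open so that later knowledge can still answer it.

```python
def is_acid_closed(substance, known_acids):
    """Closed-world rule: the else branch supplies a default answer."""
    if substance in known_acids:
        return True
    else:
        return False  # default: anything unlisted is concluded not an acid


def is_acid_open(substance, known_acids, known_non_acids):
    """Open-world rule: returns None when the KB simply does not know."""
    if substance in known_acids:
        return True
    if substance in known_non_acids:
        return False
    return None  # unknown, so new knowledge can still settle it


closed_answer = is_acid_closed("HX", {"HCl"})
open_answer = is_acid_open("HX", {"HCl"}, {"NaOH"})
```

With the closed rule, a later statement that HX is an acid contradicts a conclusion the KB has already drawn. With the open rule it merely fills a gap.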

42 3. Unstated Assumptions Often a knowledge base will be built with a set of unstated assumptions about how it will be used. These assumptions may lead to assertions in the knowledge base stated as always true, but really true only under certain circumstances. This can result in knowledge that is difficult to extend.

43 3. Unstated Assumptions: Example Consider a knowledge base meant to deal with chemical reactions at room temperature and standard pressure. This knowledge base might include the assertion that water is a liquid. (every Water has (state (*liquid))) However, if the scope of the knowledge base is extended to consider a range of temperatures, the knowledge base is not simply incomplete, but wrong.
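The water example can be made concrete by comparing an unconditional assertion with one that states its assumption as a parameter. The function names are hypothetical. The thresholds are the familiar 0 °C and 100 °C at standard pressure, and the boundary cases are deliberately simplified.

```python
def water_state_unconditional():
    """Mirrors (every Water has (state (*liquid))): no conditions stated."""
    return "liquid"


def water_state(temp_c=25.0):
    """State of water at standard pressure, with temperature made explicit.

    Simplification: exactly 0 and 100 degrees C are treated as liquid.
    """
    if temp_c < 0.0:
        return "solid"
    if temp_c > 100.0:
        return "gas"
    return "liquid"


room = water_state()
frozen = water_state(-10.0)
steam = water_state(120.0)
```

At the default room temperature the two versions agree. Outside that range the unconditional version is wrong rather than merely incomplete.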

44 3. Elaboration Requiring Updates: Solutions First the conflict must be detected. The knowledge base could forward chain for a while on concepts related to the new knowledge. Resolving the conflict is difficult in general. It may entail a case of the attribution problem.
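Forward chaining "for a while" can be sketched as bounded rule application followed by a check for mutually exclusive conclusions. The propositional facts, rules and function names below are hypothetical stand-ins; KM's own inference works over frames rather than flat strings.

```python
def forward_chain(facts, rules, max_rounds=10):
    """Apply rules until nothing new is derived or max_rounds is reached."""
    derived = set(facts)
    for _ in range(max_rounds):
        new = {
            conclusion
            for premises, conclusion in rules
            if premises <= derived and conclusion not in derived
        }
        if not new:
            break
        derived |= new
    return derived


def find_conflicts(facts, exclusive_pairs):
    """Return every exclusive pair whose members were both derived."""
    return [(a, b) for a, b in exclusive_pairs if a in facts and b in facts]


# Existing KB rule (unconditional) and a newly read rule (conditional).
existing_rules = [(frozenset({"is-water"}), "state-liquid")]
new_rules = [(frozenset({"is-water", "below-freezing"}), "state-solid")]
exclusive = [("state-liquid", "state-solid")]

derived = forward_chain(
    {"is-water", "below-freezing"}, existing_rules + new_rules
)
conflicts = find_conflicts(derived, exclusive)
```

Both states are derived for the same water, so the check reports the conflict. It does not say which rule is at fault, which is the hard part the slide points to.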

45 Conclusion and Future Directions Missing Concepts Allow additions to ontology through CPL Naïve Encodings Loosespeak interpreter Genuine Conflicting Concepts and Elaboration Requiring Updates Detection through constraint checks, resemblance check and forward chaining. Correction assisted by feedback.

46 Acquiring Knowledge from Reading Phil Harrison Boeing Phantom Works

47 Step 1: Unrestricted NL to CPL - an additional approach (LbR Framework diagram labels): Corpus; Using what you know to get more; Knowledge Integration; Knowledge Repository “Worldview”; Knowledge Acquisition Loop; Robust Reasoning Tasks; Introspection; Solidification Loop; CPL or Logic

48 The Problem Uncontrolled (non CPL) text is difficult for computers to analyze. Advances are needed in all areas of NLP: grammar, WSD, semantic representation, and discourse processing. A method is needed for extracting as much knowledge as possible from text.

49 Tuple extraction Parse trees are a source of “head-complement” or “head-modifier” relations: From “The heavy man bought an expensive book” → (S “man” “buy” “book”) (AN “heavy” “man”) (AN “expensive” “book”) “Books can be bought” “Men can be heavy” “Books can be expensive” Even incorrect parses can generate some valid tuples.
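The step from tuples to generic sentences can be sketched with simple templates. The code below reproduces only the three examples on this slide. The tiny irregular-form tables and the function names are assumptions, and a real system would need proper morphology.

```python
# Tiny hypothetical morphology tables, just enough for the slide's example.
IRREGULAR_PLURALS = {"man": "men"}
IRREGULAR_PARTICIPLES = {"buy": "bought"}


def pluralize(noun):
    return IRREGULAR_PLURALS.get(noun, noun + "s")


def past_participle(verb):
    return IRREGULAR_PARTICIPLES.get(verb, verb + "ed")


def tuple_to_generic(tup):
    """Turn an (AN adj noun) or (S subj verb obj) tuple into a generic."""
    kind = tup[0]
    if kind == "AN":
        _, adjective, noun = tup
        return f"{pluralize(noun).capitalize()} can be {adjective}"
    if kind == "S":
        _, _subject, verb, obj = tup
        return f"{pluralize(obj).capitalize()} can be {past_participle(verb)}"
    return None  # unknown tuple type


tuples = [
    ("S", "man", "buy", "book"),
    ("AN", "heavy", "man"),
    ("AN", "expensive", "book"),
]
generics = [tuple_to_generic(t) for t in tuples]
```

`generics` holds the three sentences from the slide. Note that the subject of the S tuple is dropped, matching the passive form “Books can be bought”.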

50 Examples from chemistry Acids can be strong or weak. Acids can be vitamins. Acids can release ions. Bases can turn litmus. Carbonates can form CO2. Chlorides can become ions. Concentrations can be measured.

51 Uses of tuples Extracted tuples can guide parsing. Human review of tuples is necessary. Iteration over a corpus improves accuracy and allows more knowledge extraction. The generic sentences derived from tuples can be integrated into a knowledge base.

52 Interpretation of generic sentences Generics are characterized by the use of bare plurals, mass terms, or adverbs of quantification. Examples: “Birds fly” “Ice is cold” “John usually drinks a beer” Interpretation of generics is problematic, but a substantial amount of work has been done.

53 Summary This is not a method for full and general reading. But it can potentially acquire certain types of knowledge very quickly Illustrated in the domain of chemistry

54 Overall Future Directions CPL extensions CPL processing of generics Appropriate structure of KB for effective knowledge integration Degree of automation possible now, 5 years from now Evaluation

