Reading to Learn Q4 Review Peter Clark John Thompson Phil Harrison Bill Murray.

Presentation transcript:

Reading to Learn Q4 Review Peter Clark John Thompson Phil Harrison Bill Murray

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

SRI-Boeing’s Reading to Learn Seedling Goal: –study issues in learning through reading by working with a reduced version of the problem, namely working with controlled, rather than unrestricted, natural language. The NLP task is factored into two steps: full NL → CL, and CL → logic (this project). Rationale: –by sidestepping some of the shallow linguistic issues of full NLP, we can focus on deeper issues –methods for full NL → CL can be studied separately

SRI-Boeing’s Reading to Learn Seedling Approach: –Rewrite 5 pages of chemistry text into our controlled language, CPL –Extend and use our CPL interpreter to generate logic –Integrate this new knowledge with an existing chemistry knowledge base (from the Halo Pilot), which has the new knowledge surgically deleted from it –Report on the problems encountered and solutions developed

This Seedling in Mobius Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

Recap: October 2005 Tutorial on the 5 pages of chemistry text –Acid-base reactions, proton transfer Where is that knowledge in the text? –Wanted: Clear, declarative statements –Got: obscure/missing/complex/indirect Where is that knowledge in the Halo KB? –Wanted: Modular, constructed from general pieces –Got: buried in procedures and code Very hard to ablate or extend –Suggestions for a better KB structure

(every Compute-Conjugate-Acid has
  (input ((a Chemical with (plays ((a Base-Role))))))
  (parent_formula ((the term of (the nested-atomic-chemical-formula of (the has-basic-structural-unit of (the input of Self))))))
  (target-unit ((if (the parent_formula of Self) then (:set (#'(LAMBDA () (GET-CONJUGATE-ACID-ATOMIC-FORMULA-BACK (KM0 '(|the| |parent_formula| |of| |Self|)))))))))
  (output ((if (oneof (the input of Self) where (It isa H2O-Substance))
            then (a H3O-Plus-Substance)
            else ((forall (allof2 (the target-unit of Self) where ((not (It2 = (the parent_formula of Self)))))
                    (the output of (a Identify-Chemical with (input ((a Chemical with (has-basic-structural-unit ((the output of (a Identify-Chemical-Entity with (input ((a Chemical-Entity with (nested-atomic-chemical-formula ((a Chemical-Formula with (term (It)))))))))))))))))))))))
? “An acid = a base + a proton”

(every Acid-Role has
  (intensity (
    (a Intensity-Value with
      (value (
        (:pair
          ;; Case statement for Acids.
          (if ((the played-by of Self) isa Ionic-Compound-Substance)
           then (if (((the played-by of Self) isa HCl-Substance) or
                     ((the played-by of Self) isa HBr-Substance) or
                     ((the played-by of Self) isa HI-Substance) or
                     ((the played-by of Self) isa HClO3-Substance) or
                     ((the played-by of Self) isa HClO4-Substance) or
                     ((the played-by of Self) isa H2SO4-Substance) or
                     ((the played-by of Self) isa HNO3-Substance))
                 then *strong
                 else (if (((the played-by of Self) isa H3PO4-Substance) or
                           ((the played-by of Self) isa HF-Substance) or
                           ((the played-by of Self) isa HC2H3O2-Substance) or
                           ((the played-by of Self) isa H2CO3-Substance) or
Relative strengths of different acids

Recap: March 2006 Two CPL versions: (i) close to text (ii) close to inference –Predictable performance Discussion of “bridging the gap” Manually bridging the “gap”: IF there is an equation of a reaction AND a first chemical entity has a chemical formula AND a second chemical entity has a second chemical formula AND the first chemical formula is part of the left side of the equation ….. THEN the direction of the reaction is right AND the equilibrium side of the reaction is right.

Inference-Supporting CPL: Predictable Performance
Task: Conjugate pairs. Halo KB: Giant KM procedure for formula manipulation. CPL: Lookup table.
Task: Relative strengths. Halo KB: Qualitative absolute strengths (strong/weak/negligible) + qualitative comparison. CPL: Relative strength assertions.
Task: Labelling acid/bases in a reaction. Halo KB: Giant KM procedure for reaction manipulation. CPL: if-then rule using conjugate pairs.
Task: Computing direction of the reaction. Halo KB: KM rule. CPL: if-then rule.
Annotations on the CPL column: “More general”; “≈ (equivalent)”.

Questions and Tasks from Last Time Analysis of “the gap” –What is the nature of the gap? –Can we characterize it? –Can we quantify it? AP chemistry vs. grade-school biology –How does the gap look in different texts? Domains? –What are the fundamental problems? –How severe are they? –How might they be overcome? Case Studies –Given text/naïve CPL formulation A → inference-capable target B –What knowledge is needed to get from A to B? –How much can be pump-primed, how much bootstrapped?

I: Understanding Language Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

Natural and Controlled Languages Where is Reading to Learn/Mobius’s Achilles’ heel? –Schubert: “Dealing with real natural language” –Not (just) the grammatical complexity –It is the imprecision, messiness, incompleteness, and erroneous nature of real language Two styles of CPL usage: (i) As a declarative rule language (ii) As grammatically simpler real language Worked with both within this Seedling (i) does inference, but is far from original text (ii) is close to the text, but barely supports inference

(i) CPL as a declarative rule language “IF a first chemical is stronger than a second chemical AND the second chemical is stronger than a third chemical THEN the first chemical is stronger than the third chemical.” “IF there is an equation of a reaction AND a first chemical entity has a chemical formula AND a second chemical entity has a second chemical formula AND the first chemical formula is part of the left side of the equation AND the second chemical formula is part of the right side of the equation AND the first chemical entity is playing a base role AND the second chemical entity is playing a base role AND the first chemical entity is stronger than the second chemical entity THEN the direction of the reaction is right AND the equilibrium side of the reaction is right.”
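The two rules quoted on this slide can be sketched procedurally. The following is a minimal Python illustration, not the project's CPL interpreter or the Halo KB. The function names `stronger_closure` and `reaction_direction`, the representation of strength assertions as pairs, and the example facts are all assumptions made for this sketch.

```python
from itertools import product

def stronger_closure(pairs):
    """Rule 1: transitive closure of 'X is stronger than Y' assertions."""
    closure = set(pairs)
    changed = True
    while changed:
        changed = False
        # Iterate over a snapshot so the set can grow inside the loop.
        for (a, b), (c, d) in product(list(closure), repeat=2):
            if b == c and (a, d) not in closure:
                closure.add((a, d))
                changed = True
    return closure

def reaction_direction(left_base, right_base, stronger):
    """Rule 2: if the base on the left side of the equation is stronger than
    the base on the right side, the direction (and equilibrium side) is
    'right'. Returns None when the rule does not fire."""
    if (left_base, right_base) in stronger:
        return "right"
    return None

# Hypothetical strength assertions between bases, for illustration only.
facts = {("OH-", "NH3"), ("NH3", "Cl-")}
stronger = stronger_closure(facts)

# HCl + NH3 -> NH4+ + Cl- : NH3 is the base on the left, Cl- on the right.
direction = reaction_direction("NH3", "Cl-", stronger)
```

Keeping the strength facts as data and the two rules as small generic functions mirrors the declarative style of the CPL rules: new assertions extend the behaviour without touching the rules.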

(ii) CPL as grammatically simpler real language Acids have a sour taste. Acids cause some dyes to change color. Bases have a bitter taste. Bases have a slippery feel. All acids contain hydrogen. 37 percent of the mass of concentrated hydrochloric acid is HCl. The concentration of HCl in concentrated hydrochloric acid is 12 M. HCl reacts with NH3 without an aqueous solution. The reaction transfers a proton from an HCl molecule to an NH3 molecule. The "HX" in Equation 16.6 donates a proton. The donating leaves behind an X-minus ion. The X-minus ion plays a Bronsted-Lowry base in the reverse reaction. The H2O molecule in Equation 16.6 accepts a proton. The accepting produces an H3O-plus ion.

Two Paths from Language to Logic… [Diagram labels: Real Text; Real(istic) CPL Text; Literal/messy logic representation; Declarative CPL rules; Inference-supporting Representation; “The Knowledge Gap”]

“Israel’s Problem” Assume a perfect algorithm for English to (literal-like) logic. Are you done? [Diagram labels: Real Text; Real(istic) CPL Text; Literal/messy logic representation; Declarative CPL rules; Inference-supporting Representation; “The Knowledge Gap”]

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

An Analysis of the Gap What is the nature of the gap? Can we characterize it? Can we quantify it? How does the gap look in different texts? Domains? What are the fundamental problems? How severe are they? How might they be overcome?

Analysis Looked at these phenomena in two sets of text –5 target pages of AP chemistry –5 pages of grade-school level biology from the Web, about the heart and its function Categorization of main causes Loose quantification of their frequency

9 Fundamental Causes of the Gap 1.Many idiomatic words/phrases, each requiring a theory 2.Some knowledge is taught by example 3.Much important knowledge is conveyed by diagrams and tables 4.Generic sentences are ubiquitous 5.Some text teaches problem-solving knowledge 6.Discourse context is important (need sentence context) 7.Many sentences pose major representational challenges 8.Math/Algebraic models are extremely challenging 9.Text is full of ambiguity, metaphor and metonymy/loosespeak

1. Idiomatic/special-purpose words/phrases Many words/phrases require special interpretation –Breadth requirement is very challenging! 70% in chem, 40% in bio –Chemistry “The reaction favors transfer of…” “From the earliest days of experimental chemistry…” “The ion, however, more closely represents reality” “When we closely examine the reaction…” “According to their definition…” –Biology “This is important for the cells to do their work.” “On its way back to the heart…” “The right-side pumps stale blood…” “to smaller and smaller branched tubes…”

2. Examples Examples play a key role in human teaching How important are these for a machine? –Consolidation, verification, disambiguation? 35% chem, <5% bio

3. Diagrams and Tables “Teaching” how to compute conjugate acid/base pairs Relative strengths of acids 10% in chemistry but key ones!!! Incidental in bio. Show-stopper for some needed knowledge

4. Generics Reference to a collection rather than individual object Ubiquitous! 90% chemistry, 95% biology –Chemistry “Acids cause certain dyes to change color” “Acids have a sour taste” “A substance that is …. is called amphoteric” –Biology “The blood leaving the aorta is full of oxygen” “Veins have thin walls” “The heart pumps blood to your lungs”

Why are generics hard? Quantification “Acids contain hydrogen.” Fuzzy quantifiers “An H3O+ ion sometimes reacts with three H2O molecules” Presuppositions “HCl dissolves in water.” “Acids cause some dyes to change color” “Acid irritates the skin” Need background knowledge! “IF an acid touches some skin THEN that skin is irritated” or more generally “IF acid + skin are related in a way where irritation may plausibly occur… THEN it will occur.”

5. Needing/Teaching Problem-Solving Knowledge Problem-solving knowledge –Chemistry (20%) biology (<5%) Worse, it is often not even explicit in the text, e.g.:

6. Discourse Context Can we take sentences in isolation? (“bag of lines”) Obstacles: –Pronoun resolution (30% chem, 50% bio) –Context: unqualified compound nouns (most) “Every [Bronsted-Lowry] acid has a conjugate [BL] base” “The [human] heart…The [human] arteries…” –Other dependencies (15% chem, <5% bio) “Therefore, HX is the Bronsted-Lowry acid” “The other conjugate acids are HS-, PH3 and CO3^2-”

Discourse Context (Biology) Sentences stand on their own more often, e.g.,

7. Major Representational Challenges Hard to quantify: ~70% chem, ~40% bio Potentiality: –“an acid is a substance (molecule or ion) that can donate a proton to another substance. Likewise, a base is a substance that can accept a proton.” Conveying a proof: Imprecision and comparatives: –“About 37% by mass” –“Interacts strongly” –“The aorta is the largest artery in the body”

8. Math/Algebraic models ~65% of chemistry sentences use or manipulate formulae “NaOH dissociates into Na+ and OH- ions.” “An H+ ion is simply a proton with no electrons” “HX and X- differ only in the presence of a proton” Challenges –Relating the symbol system to the real world –Defining and applying operations on the symbol system –Relating those operations to the real world

Math/Algebraic models (cont) Minimal in grade-school biology –nearest is rates and measures “the heart contracts 70 times a minute” “The plasma is 95% water and the other 5% of dissolved substances” “In an adult’s body there is 10.6 pints of blood”

9. Loosespeak (metaphor, metonymy, etc) –Where a “literal interpretation” is incorrect Ignoring overgenerality –In these texts, 30% chem, 10% bio Probably both higher in general –Chemistry The molecule, substance, symbol distinction – Huge! Accounts for ~50% of the complexity of Halo KB. In other texts (not this one) metaphor is also used often [Diagram labels: basic-unit; “HC2H3O2(aq)+…C2H3O2-”; formula]

Loosespeak (metaphor, metonymy, etc) Biology: metaphor more common “Your heart’s job is to pump blood” “Blood delivers oxygen…On the return trip, the blood picks up waste products”

Loosespeak (metaphor, metonymy, etc) Analysis by Univ Texas at Austin (chemistry) –Loosespeak is everywhere!

Relative Frequency of Phenomena

Relative frequency of phenomena [Charts: AP Chemistry (5 pages); Grade-school biology (5 pages)]

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

Case Study 1 of the Gap AP Level Chemistry

CPL (like) rendering of the Original English: Some acids are better proton donors than other acids. Some bases are better proton acceptors than other bases. The conjugate base of a strong proton donor is a weak proton acceptor. The conjugate acid of a strong proton acceptor is a weak proton donor. A stronger acid has a weaker conjugate base. A stronger base has a weaker conjugate acid. A stronger acid is a better proton donor. A stronger base is a better proton acceptor. How do we bridge the gap?

From Original English to CPL - 1 Resolve “others” to mean “other acids/bases” Use “likewise” to guide a parallel construction Need to represent “some,” “other,” “better” Assumes a scale of ability to donate/accept

From Original English to CPL – 2a Need to interpret “If we do X, we find that Y” as a mental exercise that draws a conclusion Need to have a concept of an ordering based on some ability (to donate a proton) Resolve “their ability” back to types of acids Resolve quantification – one proton per instance of acid molecule

From Original English to CPL – 2b Here “a substance” means an acid molecule Need to handle jumps between substance-level and molecule-level references Need to interpret “the more readily an A does B, the less readily a C does D” Need a model of two qualitative scales of ability, with an inverse relationship Resolve “its conjugate base” back to the acid

From Original English to CPL – 3 “Similarly” is a cue for a parallel construction Other issues are the same as in the previous sentence (inverse qualitative scales)

From Original English to CPL – 4a “In other words” is a cue for another view of the same knowledge in the previous sentence “the more readily an acid gives up a proton” = “the stronger an acid” Related qualitative scales again “the stronger an acid” is special syntax

From Original English to CPL – 4b Semicolon here denotes parallel constructions This is also another view of the same knowledge in the previous sentence “the more readily a base accepts a proton” = “the stronger a base” Inverse qualitative scales again

Overall Interpretation (sketch) [Diagram: “Acid readily gives up a proton” is parallel to “Acid strength” (“In other words”); “Conjugate base readily accepts a proton” is parallel to “Conjugate base strength”; the acid scales and the conjugate-base scales are inverse.] “Similarly”: replace acid with base, replace conjugate base with conjugate acid

From Original English to Inference-Supporting Logic: Knowledge Requirements Discourse Knowledge: –Pragmatic knowledge for pronoun resolution –Ability to recognize and match parallel constructions E.g., with cue words Both within and across sentences –Ability to recognize a mental exercise (“if we do …”) Domain Knowledge: –Models of qualitative scales and relationships between two scales –Knowledge to handle substance/molecule metonymy –Models of abilities & give/receive –World knowledge to help resolve quantification e.g., one proton per molecule makes most sense
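The “two qualitative scales with an inverse relationship” requirement can be illustrated in a few lines. This is only a sketch under stated assumptions: the function name `conjugate_base_ranking`, the list-based ranking, and the three example acids are hypothetical and do not come from the project's knowledge base.

```python
def conjugate_base_ranking(acids_strongest_first, conjugate_base_of):
    """A stronger acid has a weaker conjugate base, so the conjugate bases,
    ranked strongest first, are the acid ranking read in reverse."""
    return [conjugate_base_of[acid] for acid in reversed(acids_strongest_first)]

# Hypothetical example data: a qualitative acid scale and a conjugate lookup.
acids = ["HCl", "HF", "H2O"]  # strongest acid first
conjugates = {"HCl": "Cl-", "HF": "F-", "H2O": "OH-"}

bases_strongest_first = conjugate_base_ranking(acids, conjugates)
```

Only the ordering is represented, not numeric strengths, which matches the qualitative reading of “the more readily an A does B, the less readily a C does D”.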

Case Study 2 of the Gap Grade-School Biology

Grade-school Biology Searched the Web, found 4 simple texts about the human heart and its function They are much simpler than our college chemistry text, but still exhibit lots of interpretation issues Only a few sentences from each text happened to be in pure CPL syntax By the time science is taught in school, the students are beyond the Dick & Jane reading level

Grade-school Biology Syntax - 1 Pronouns are everywhere –“Your heart is divided into two sides.” [anyone’s heart] Dependent clauses are common –“As blood begins to circulate, it leaves the heart …” –“… fresh oxygen that we have inhaled …” Conjunctions appear between various expressions –“… the vessels and the muscles that help and control …” –“Lizards don’t have hair or feathers … and can’t sweat …” Comparatives are common –“The tubes that more gently drain back to the heart …” Approximations are common –“… some 70 or so times a minute at rest …”

Grade-school Biology Syntax - 2 Negatives are sometimes used –“They do not work on their own, but together as a team.” Phrases often modify other terms –“The blood leaving the aorta is full of oxygen.” –“On its way back to the heart, the blood travels …” Infinitives are sometimes used –“This is important for the cells … to do their work.” Parenthetical expressions are sometimes inserted –“… the carbon dioxide (a waste product) is removed...” –“… times a minute – more if you are exercising – and …”

Grade-school Biology Syntax - 3 Rhetorical questions to the reader –“Did you know that your heart is the strongest muscle?” Modals are sometimes used –“… so that your body can get rid of them.” –“… your blood vessels could circle the globe 2 ½ times!” Phrases about what something is called –“… a colorless liquid called plasma.” Omitted words –“… the other two [cavities] are called ventricles …” Adverbs, complex phrases, and other minor issues

Grade-school Biology Semantics Analyzed sample grade-school biology texts about the heart and circulation What commonsense knowledge is needed to correctly understand the text? –What pump-primed models would be needed? –What underlying knowledge could come from bootstrapping? As from tuple extraction from general texts

Rhetorical question – skip “Did you know that” “your heart” = a person’s heart (anatomy context) “strongest muscle” [in same body] (anatomy context) Build in pragmatics of reading for an anatomy context Knowledge: basic anatomy (bootstrapped)

“divided into” = partitioned (word sense for anatomy) “two sides” = two compartments (anatomy/container) Knowledge: Container/compartments model (pump-primed)

“right side” = [of the heart] (model of left/right parts) “pumps blood” = continuous process (anatomy) “to your lungs” could mean it fills up the lungs! what is “it”? – right side, or blood, or lungs? “picks up” = metaphor for absorbs (anatomy context) Knowledge: Containers, pumps, liquids (pump-primed)

“left side” = [of the heart] “oxygen-soaked blood” – but a liquid is already wet – Would like a model of blood cells, soaked in oxygen (fluid) – Not provided here, so just assume blood absorbed oxygen – Resolves previous sentence: pronoun “it” = blood Knowledge: model of left/right parts (pump-primed) “out” - liquid flow in & out of containers (pump-primed)

“They” = the two sides of the heart (difficult) Rely on discourse pragmatics Knowledge: “work on their own” vs. “together as a team” Doing something alone vs. cooperating in an effort

“The body’s blood” = all its blood as a single blob Knowledge: “circulated through” - model of closed fluid circulation “1,000 times per day”- model of repeated events per time period

“five and six thousand” = 5 ≤ x ≤ 6,000? Use pragmatics to get: 5,000 ≤ x ≤ 6,000 “pumped each day” -- by which side? Or both sides? Could pose question: How much blood does a body contain? – 5 to 6 quarts (inference needed) Knowledge: Fluid flow, iteration, time periods

“your fist” -- interesting object, involves a pose Knowledge: “about the same size as” – model of comparative sizes

Summary of Biology Semantics Pragmatics for an anatomy context Pump-primed models: –Container & compartments & left/right parts –Continuously repeated biological events –Pumps & liquids & closed circulation –Working together vs. alone –Body parts in poses & comparative sizes Bootstrapped models: –Basic anatomy Some difficult pronoun resolutions

Grade-School Biology Conclusions Lots of pump-primed knowledge needed Bootstrapped knowledge can help Even grade-school texts have significant challenges Pragmatics need to be built in to NLP engine Is still substantially easier than AP chemistry!

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

Dimensions of Difficulty [Chart: axes “Complexity of Knowledge” and “Educational Level of Text” (Elementary, Grade-school, College); plotted: Grade-school biology, AP Chemistry]

Two Dimensions of Difficulty Dimension 1: Domain Chemistry (hardest) –Algebraic manipulation, chaining, procedures –Not so much “common sense” Physics –Map situations onto a few equations Biology (easiest) –Memorize and compare structures and functions

Two Dimensions of Difficulty Dimension 2: Educational Level College level (hardest): –Sophisticated writing styles –Often includes mathematical abstractions –Attempts to challenge the student –Problem-solving Grade-school level (easier): –Simpler sentence structures –Teaches common world knowledge –No/little mathematics –Learning basic facts

Agenda Introduction Recap – The Story So Far The “Knowledge Gap” –Overview –Characterization and analysis –Quantification Two Case Studies –AP chemistry –Grade-school biology Dimensions of Difficulty Principles for an Extensible KB Knowledge Mining Summary: Findings, Products, and Recommendations

II: Integrating Knowledge Knowledge Integration Introspection Natural Language Processing Test Generation This seedling

Knowledge Integration: Principles for an Extensible KB The Halo KB was not easily extensible What should it have looked like?

Five Principles for an Extensible KB 1. Need Metonymy-Tolerant Representations The precision that logic requires of our written representations is a fundamental barrier to robustness IF “the acid on the left” is stronger than “the acid on the right” THEN the reaction direction is “to the right” Here “the acid on the left” really means “the acid denoted by the formula on the left side of the equation of the reaction” Alternative: –Preserve metonymy in the KB –Have it resolved at reasoning time

1. Metonymy-Tolerant Representations (cont)
(every Compare-Relative-Strengths-of-Acids has
  (output ((if (((the1 of (the value of (the intensity of (the Acid-Role plays of (the first of (the input of Self)))))) = *strong)
                and
                ((the1 of (the value of (the intensity of (the Acid-Role plays of (the second of (the input of Self)))))) /= *strong))
            then (the first of (the input of Self)))))
If we had a metonymy-tolerant reasoner, we could instead write…
(every Compare-Relative-Strengths-of-Acids has
  (output ((if ((the intensity of (the first of (the Chemicals)) = *strong)
                and
                ((the intensity of (the second of (the Chemicals)) /= *strong)
            then (the strongest of (the Chemicals)) = (the first of (the Chemicals)))))

1. Metonymy-tolerance: Need Background Knowledge! Mixing chemical, molecular, and formula views Need background knowledge to untangle the mess [Diagram labels: basic-unit; “HC2H3O2(aq)+…C2H3O2-”; formula] Note the fluidity of reference in written English!!!
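One way to picture “preserve metonymy in the KB, resolve it at reasoning time” is a small coercion step that picks the chemical, molecular, or formula view a predicate needs. The sketch below is purely illustrative: the `Chemical` record, the `EXPECTED_VIEW` table, and the predicate names are hypothetical and are not part of KM or the Halo KB.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Chemical:
    """Hypothetical record tying together the three views of one chemical."""
    substance: str  # bulk chemical, e.g. "HCl-Substance"
    molecule: str   # basic structural unit
    formula: str    # symbolic formula as written in an equation

# Assumed background knowledge: which view each predicate is really about.
EXPECTED_VIEW = {
    "dissolves_in_water": "substance",
    "donates_proton": "molecule",
    "appears_on_left_side": "formula",
}

def resolve(predicate, chemical):
    """Resolve a loose reference to `chemical` into the view `predicate` needs."""
    return getattr(chemical, EXPECTED_VIEW[predicate])

hcl = Chemical(substance="HCl-Substance", molecule="HCl-Molecule", formula="HCl")
```

A rule author could then write “HCl donates a proton” or “HCl is on the left side” against the same `hcl` object and leave the choice of view to the reasoner.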

2. Need to Separate Declarative and Procedural Knowledge Procedural (Conjugate-Acid calculation): input: a Base-Chemical; output: convert Chemical → Molecule → Formula, append “H”, then → Molecule’ → Acid-Chemical Declarative: Acid-Chemical = Base-Chemical + H+, plus a constraint reasoner to solve constraints

2. Need to Separate Declarative and Procedural Knowledge (cont) “Every acid has a conjugate base, formed by removing a proton from the acid.... Similarly, every base has associated with it a conjugate acid, formed by adding a proton to the base.” Acid-Chemical = Base-Chemical + H The English text often doesn’t help…
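To make the contrast concrete, here is a minimal sketch of the declarative reading, in which the single constraint Acid = Base + H+ is solved in whichever direction is needed. It is an illustration only: the `Species` tuple, its fields, and `conjugate_pair` are assumptions for this example, and a real constraint reasoner would be far more general.

```python
from typing import NamedTuple, Optional

class Species(NamedTuple):
    """Hypothetical, simplified view of a chemical formula."""
    rest: str       # everything except the ionizable hydrogens, e.g. "O" for H2O
    hydrogens: int  # number of ionizable hydrogens
    charge: int     # net charge

def conjugate_pair(acid: Optional[Species] = None,
                   base: Optional[Species] = None):
    """Solve acid = base + H+ for whichever side is missing."""
    if acid is None and base is None:
        raise ValueError("need at least one side of the constraint")
    if acid is None:
        # Adding a proton: one more hydrogen, one more unit of charge.
        acid = Species(base.rest, base.hydrogens + 1, base.charge + 1)
    if base is None:
        if acid.hydrogens < 1:
            raise ValueError("no proton to remove")
        # Removing a proton: one fewer hydrogen, one fewer unit of charge.
        base = Species(acid.rest, acid.hydrogens - 1, acid.charge - 1)
    # With both sides known, check that the constraint actually holds.
    if (acid.rest != base.rest
            or acid.hydrogens != base.hydrogens + 1
            or acid.charge != base.charge + 1):
        raise ValueError("acid and base are not a conjugate pair")
    return acid, base

water = Species(rest="O", hydrogens=2, charge=0)
hydronium, _ = conjugate_pair(base=water)  # water acting as a base
_, hydroxide = conjugate_pair(acid=water)  # water acting as an acid
```

The same relation answers both “conjugate acid of” and “conjugate base of” questions, which a one-directional procedure cannot do without being duplicated.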

3. Syntactic Organization Matters! Elaboration tolerance: –Add/modify knowledge (semantics) by (only) adding formulae (syntactics)
(every Acid-Role has
  (intensity (
    (a Intensity-Value with
      (value (
        (:pair
          ;; Case statement for Acids.
          (if ((the played-by of Self) isa Ionic-Compound-Substance)
           then (if (((the played-by of Self) isa HCl-Substance) or
                     ((the played-by of Self) isa HBr-Substance) or
                     ((the played-by of Self) isa HI-Substance) or
                     ((the played-by of Self) isa HClO3-Substance) or
                     ((the played-by of Self) isa HClO4-Substance) or
                     ((the played-by of Self) isa H2SO4-Substance) or
                     ((the played-by of Self) isa HNO3-Substance))
                 then *strong
                 else
Not elaboration-tolerant

3. Syntactic Organization Matters! Better…. intensity(HCl-Substance, *strong) intensity(HBr-Substance, *strong) intensity(HI-Substance, *strong) intensity(HClO3-Substance, *strong) intensity(HClO4-Substance, *strong) intensity(H2SO4-Substance, *strong) intensity(HNO3-Substance, *strong) … intensity(HF-Substance, *weak) intensity(HC2H3O2-Substance, *weak) intensity(H2CO3-Substance, *weak) … Elaboration-tolerant
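The same idea can be shown outside KM. In this hedged Python sketch the `INTENSITY` table and the `intensity` function are hypothetical stand-ins for the assertions on the slide. The point is only that new knowledge is added as one more entry, with no edit to existing logic.

```python
# One assertion per substance, mirroring intensity(X-Substance, *strong/*weak).
INTENSITY = {
    "HCl": "strong", "HBr": "strong", "HI": "strong", "HClO3": "strong",
    "HClO4": "strong", "H2SO4": "strong", "HNO3": "strong",
    "HF": "weak", "HC2H3O2": "weak", "H2CO3": "weak",
}

def intensity(substance):
    """Look up the qualitative strength; None means 'not asserted'."""
    return INTENSITY.get(substance)

# Elaboration: a new fact is a new entry. The value given to H3PO4 here is an
# assumption for illustration (it is a weak acid in standard chemistry).
INTENSITY["H3PO4"] = "weak"
```

By contrast, extending the nested case statement would require locating the right branch and editing the rule itself.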

4. Use a linguistically motivated ontology
Key: mapping from English words/phrases to knowledge-base concepts
Good: Words and concepts match easily:
–HCl-Substance ↔ “HCl”
Less good: Linguistic concepts are missing
–*strong/*weak/*negligible ↔ “HCl is stronger than H2O”
Even worse: Different conceptual view in the KB
–Direction of equilibrium: Attached to reaction, not eqn, in KB

5. Need Error-Tolerant Reasoning
KM can go belly-up with a contradiction
Rather, need to detect and correct contradictions
–Detect: explore (ruminate), not just myopic backchaining; richer background knowledge
–Correct: reasoner supports suspension of assumptions/rules (TMS?); search mechanism to control this
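The detect-then-correct loop can be sketched in a few lines of Python. This is a deliberately simplified illustration, not a truth maintenance system and not how KM works: it assumes every slot is single-valued, and the `Assertion` class, its `priority` field, and the function names are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Assertion:
    subject: str
    slot: str
    value: str
    source: str    # label of the rule or assumption that produced the value
    priority: int  # hypothetical trust ranking: higher means more trusted

def find_contradictions(assertions):
    """Detect: return pairs giving different values to the same (subject, slot)."""
    seen = {}
    conflicts = []
    for a in assertions:
        key = (a.subject, a.slot)
        for b in seen.get(key, []):
            if b.value != a.value:
                conflicts.append((b, a))
        seen.setdefault(key, []).append(a)
    return conflicts

def suspend_until_consistent(assertions):
    """Correct: suspend the less trusted side of each conflict until none remain."""
    active = list(assertions)
    suspended = []
    while True:
        conflicts = find_contradictions(active)
        if not conflicts:
            return active, suspended
        earlier, later = conflicts[0]
        # On a tie in priority, the later assertion is the one suspended.
        loser = earlier if earlier.priority < later.priority else later
        active.remove(loser)
        suspended.append(loser)

kb = [
    Assertion("HF-Substance", "intensity", "*weak", "fact-table", 2),
    Assertion("HF-Substance", "intensity", "*strong", "default-rule", 1),
    Assertion("HCl-Substance", "intensity", "*strong", "fact-table", 2),
]
active, suspended = suspend_until_consistent(kb)
```

Instead of halting on the clash over HF, the loop keeps the fact-table value and sets the default-rule value aside, so reasoning can continue. A real reasoner would also need the search control the slide mentions to decide which assumption to suspend.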

Agenda
Introduction
Recap – The Story So Far
The “Knowledge Gap”
–Overview
–Characterization and analysis
–Quantification
Two Case Studies
–AP chemistry
–Grade-school biology
Dimensions of Difficulty
Principles for an Extensible KB
Knowledge Mining
Summary: Findings, Products, and Recommendations

Knowledge Mining
Schubert’s Conjecture: There is a largely untapped source of general knowledge in texts, lying at a level beneath the explicit assertional content, and which can be harnessed.
“The camouflaged helicopter landed near the embassy.”
→ helicopters can land
→ helicopters can be camouflaged
Our attempt: “lightweight” LFs generated from Reuters
LF forms:
(S subject verb object (prep noun) (prep noun) …)
(NN noun … noun)
(AN adj noun)
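As an illustration of how such LF tuples could yield the general statements in the helicopter example, here is a small Python sketch. It assumes the LFs have already been produced (they are written by hand below, with no parser involved), and `pluralize` and `propositions` are hypothetical helpers with deliberately naive English generation.

```python
def pluralize(noun):
    """Naive pluralization, adequate only for regular nouns."""
    return noun + "s"

def propositions(lf):
    """Turn one lightweight LF tuple into general 'can' statements."""
    kind = lf[0]
    if kind == "S":
        # (S subject verb object (prep noun) ...): keep only subject and verb.
        subject, verb = lf[1], lf[2]
        return [f"{pluralize(subject)} can {verb}"]
    if kind == "AN":
        # (AN adj noun): the adjective can apply to the noun.
        adjective, noun = lf[1], lf[2]
        return [f"{pluralize(noun)} can be {adjective}"]
    return []  # other LF forms, such as NN, are ignored in this sketch

# Hand-written LFs for "The camouflaged helicopter landed near the embassy."
lfs = [
    ("S", "helicopter", "land", ("near", "embassy")),
    ("AN", "camouflaged", "helicopter"),
]
mined = [p for lf in lfs for p in propositions(lf)]
# mined == ["helicopters can land", "helicopters can be camouflaged"]
```

The prepositional phrase is dropped here for simplicity. Whether and how the project used such phrases when generalizing is not stated on the slide.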

Knowledge Mining
Newswire Article:
HUTCHINSON SEES HIGHER PAYOUT. HONG KONG. Mar 2. Li said Hong Kong’s property market remains strong while its economy is performing better than forecast. Hong Kong Electric reorganized and will spin off its non-electricity related activities. Hongkong Electric shareholders will receive one share in the new subsidiary for every owned share in the sold company. Li said the decision to spin off …
Implicit, tacit knowledge:
Shareholders may receive shares.
Companies may be sold.
Shares may be owned.

Knowledge Mining – our attempt
Fragment of the raw data (Brown & Lemay):
;; Atoms can combine
(S "atom" "combine")
;; For example, combustion reactions are redox reactions because elemental oxygen is converted to compounds of oxygen (Section 3.2).
(S "reaction" "be" "reaction")
(S-ADJ "oxygen" "converted" ("to" "compound"))
(AN "elemental" "oxygen")
;; Plan: Metals react with acids to form salts and gas.
(S "metal" "react" (PP "with" "acid"))
;; Extensive oxidation can lead to the failure of metal machinery parts or the deterioration of metal structures.
(S "oxidation" "lead" (PP "to" "failure"))
(S "oxidation" "lead" (PP "to" "deterioration"))
(AN "extensive" "oxidation")

Summary: Overall Findings and Products
CPL: two formulations
–"naive CPL": 275 sentences
–rule-language CPL: ~15 complex rules
CPL language interpretation algorithm
Understanding Language
–Characterization and quantification of the main challenges
–Detailed case studies on the five pages
Integrating Knowledge
–Characterization of the main challenges
–Set of principles for overcoming them
–Study and algorithms for some of them
Bridging the Gap: Useful conceptual framework
Text Mining
–2 tuple databases: 15k chemistry, 25k biology

Summary: Recommendations for Mobius
Significant work needed on
–math/symbol manipulation
–handling generics
–idiomatic words/phrases
–Loosespeak
Cycle, not just bottom-up/top-down!
Discourse structure needs to be taken seriously
–Not just individual sentences
Need some radical KB changes
–extensible units of knowledge, not intertwined structures
–Error-tolerant/Robust reasoning