Reasoning From (Not Quite) Text
  Henry Lieberman (with Catherine Havasi, Rob Speer, Ken Arnold, Dustin Smith)
  MIT Media Lab & Mind-Machine Project
  Cambridge, Mass. USA
Big success for Reasoning with Text this week!
For the play-by-play…
We need better mechanisms for reasoning!
  Clue: It was this anatomical oddity of US gymnast George Eyser....
  Ken Jennings' answer: Missing a hand (wrong)
  Watson's answer: leg (wrong)
  Correct answer: Missing a leg
Turing’s Dream & Knowledge Challenge - Schubert
  Natural language is a pretty damn good knowledge representation language
  Has capabilities that formal KR doesn’t
  Resist the urge to “simplify so the computer can understand it”
  Don’t be so afraid of the Ambiguity bogeyman
Capabilities of Natural Language Representations
  Generalized Quantifiers
  Event/Situation Reference
  Modification (of Predicates & Sentences)
  Reification (of Predicates & Sentences)
  Metric/Comparative Attributes
  Uncertainty & Genericity
  Metalinguistic Capabilities
Textual Entailment
Logical Reasoning: Classic example
  Birds can fly.
  Tweety is a bird.
  Therefore… Tweety can fly.
Logical Reasoning: Not-so-classic example
  Cheap apartments are rare.
  Rare things are expensive.
  Therefore… Cheap apartments are expensive.
  So, exactly what was wrong with that??
Yeah, what's wrong with that?
  Logicians say: Not the same sense of "rare", "expensive", etc.
  I say: Maybe, but that punts on the problem of translating language/Commonsense to logic
  Logic is about possible inference; Common Sense is about plausible inference
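A minimal sketch of how the apartment syllogism goes wrong when each sentence is translated word for word into a rule. The `RULES` list and the `chain` function are hypothetical illustrations, not part of any system named in the talk; the point is only that naive chaining treats "rare" and "expensive" as if they had one fixed sense.

```python
# Each generic sentence "X is/are Y" is stored as a pair (X, Y).
# Treating the words as fixed logical predicates is the assumption under test.
RULES = [
    ("Tweety", "bird"),            # Tweety is a bird.
    ("bird", "can fly"),           # Birds can fly.
    ("cheap apartment", "rare"),   # Cheap apartments are rare.
    ("rare", "expensive"),         # Rare things are expensive.
]


def chain(start, rules):
    """Return every property reachable from `start` by naive transitive chaining."""
    derived = []
    frontier = [start]
    while frontier:
        current = frontier.pop()
        for subject, prop in rules:
            if subject == current and prop not in derived:
                derived.append(prop)
                frontier.append(prop)
    return derived


tweety = chain("Tweety", RULES)
apartment = chain("cheap apartment", RULES)
print(tweety)      # ['bird', 'can fly']
print(apartment)   # ['rare', 'expensive']
```

Both chains are equally valid to the rule engine. Nothing in the symbols tells it that the second conclusion contradicts "cheap", which is the translation problem the slide points at.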
Not so interested in absolute truth as we are in…
  Plausibility (not necessarily Probability)
  Similarity
  Analogy
  Relevance
  Computing "intangible" qualities (affect, point of view, connotation, overall "sense")
Logical vs. Commonsense knowledge
  Logical | Commonsense
  Precise | Vague
  Formal | Natural language
  Experts | General public
  Explicit | Implicit
  Consistent | Possibly contradictory
  Up-front organization | Back-end organization
Logical vs. Statistical Reasoning
  Big debate, much hot air
  We need to fill in the gap between them
  Word occurrences are weak evidence
  Symbolic expressions much stronger
  But how do you combine lots of them?
Open Mind Common Sense http://openmind.media.mit.edu
Open Mind Common Sense
  “The Wikipedia version of Cyc” since 2000
  1 Million English statements, other languages
  How much Commonsense does an average person know?
    1 human lifetime = 3 billion seconds
    Less than a billion - Maybe 100 million
  How much domain knowledge does a single expert know?
    Less than a million - Maybe 10-100 thousand
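The back-of-the-envelope estimate above can be checked with a few lines of arithmetic. The figures for lifetime seconds, the 100 million guess, and the 1 million collected statements come from the slide; the variable names and the 365.25-day year are illustrative assumptions.

```python
SECONDS_PER_YEAR = 365.25 * 24 * 60 * 60      # 31,557,600 seconds

lifetime_seconds = 3_000_000_000              # "1 human lifetime = 3 billion seconds"
lifetime_years = lifetime_seconds / SECONDS_PER_YEAR

facts_estimate = 100_000_000                  # "Maybe 100 million"
seconds_per_fact = lifetime_seconds / facts_estimate

collected = 1_000_000                         # "1 Million English statements"
coverage = collected / facts_estimate

print(f"lifetime is about {lifetime_years:.0f} years")
print(f"one fact every {seconds_per_fact:.0f} seconds")
print(f"collected so far: {coverage:.0%} of the estimate")
```

Under these assumptions, 3 billion seconds is roughly 95 years, 100 million facts works out to one fact learned every 30 seconds of a lifetime, and 1 million collected statements is about 1% of that estimate.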
Open Mind Commons - Speer
Granularity
  How much parsing should you do?
  Stemming, Lemmatizing, Chunking, Tagging, …
  Something’s lost and something’s gained
  Adjustable granularity
Effect of the parser
ConceptNet relations
ConceptNet - Liu, Singh, Eslick
AnalogySpace – Speer, Havasi
What AnalogySpace can do
  It can generalize from sparsely-collected knowledge
  It can identify the most important dimensions in a knowledge space
  It can classify concepts along those dimensions
  It can create ad-hoc categories (and classify accordingly)
  It can confirm or question existing knowledge
AnalogySpace matrix
Dimensionality Reduction
Singular Value Decomposition
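A minimal sketch of the idea behind these three slides: put assertions in a concept-by-feature matrix, keep only the strongest singular directions, and read the smoothed matrix as plausibility scores for assertions nobody entered. The toy data, the function names, and the pure-Python power-iteration SVD are assumptions for illustration; a real system would use a sparse linear-algebra library and far more data.

```python
import math

CONCEPTS = ["dog", "cat", "car", "truck", "bus"]
FEATURES = ["IsA pet", "HasA fur", "HasA wheels", "UsedFor driving"]

# 1.0 = asserted by someone, 0.0 = unknown (not "false").
MATRIX = [
    [1.0, 1.0, 0.0, 0.0],  # dog
    [1.0, 0.0, 0.0, 0.0],  # cat: nobody has said that cats have fur
    [0.0, 0.0, 1.0, 1.0],  # car
    [0.0, 0.0, 1.0, 1.0],  # truck
    [0.0, 0.0, 1.0, 0.0],  # bus: nobody has said that buses are for driving
]


def matvec(m, v):
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def transpose(m):
    return [list(col) for col in zip(*m)]


def normalize(v):
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def truncated_svd(matrix, k, iterations=200):
    """Top-k singular triples (sigma, u, v) via power iteration with deflation."""
    residual = [row[:] for row in matrix]
    triples = []
    for _ in range(k):
        v = normalize([1.0] * len(residual[0]))
        for _step in range(iterations):
            v = normalize(matvec(transpose(residual), matvec(residual, v)))
        rv = matvec(residual, v)
        sigma = math.sqrt(sum(x * x for x in rv))
        u = [x / sigma for x in rv]
        triples.append((sigma, u, v))
        # Deflate: subtract the rank-1 component that was just found.
        residual = [
            [residual[i][j] - sigma * u[i] * v[j] for j in range(len(v))]
            for i in range(len(u))
        ]
    return triples


def reconstruct(triples, rows, cols):
    """Rank-k approximation: the sum of sigma * u * v^T over the kept triples."""
    return [
        [sum(s * u[i] * v[j] for s, u, v in triples) for j in range(cols)]
        for i in range(rows)
    ]


def score(approx, concept, feature):
    return approx[CONCEPTS.index(concept)][FEATURES.index(feature)]


triples = truncated_svd(MATRIX, k=2)
approx = reconstruct(triples, len(CONCEPTS), len(FEATURES))
print(round(score(approx, "cat", "HasA fur"), 2))         # 0.45
print(round(score(approx, "bus", "UsedFor driving"), 2))  # 0.49
```

With two dimensions kept, "cat HasA fur" and "bus UsedFor driving" get positive scores even though neither was asserted, because cats resemble dogs and buses resemble cars along the retained axes. "cat HasA wheels" stays at essentially zero.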
Traditional Logical Inference
  Inference goes from True assertion -> True assertion via Inference Rules
  Good news: Very precise and reliable
  Bad news: Proof search blows up exponentially
    Requires precise definitions and assertions
    GIGO
AnalogySpace Inference
  All possible assertions put in a (big, sparse) box
  You can rearrange the box along semantic axes
  Good news: Computationally efficient
    Tolerant of imprecision, contradiction, disagreement…
    Stronger than statistical inference
  Bad news: Can’t be guaranteed to be very precise
Not-so-Common Sense
  Use Common Sense tools & methodology, but with knowledge common only to a small group
  Collect knowledge from natural language sources
  Collect knowledge from games
  Collect knowledge from existing databases, ontologies, …
  "Blend" with general Commonsense knowledge -> AnalogySpace for specific domain
Blending - Havasi
  Inference combining two AnalogySpaces
  Specialized and generalized knowledge bases
  Blending factor
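One way to picture a blending factor is as a weight in a sum of two sparse concept-feature matrices, formed before any dimensionality reduction. The sketch below assumes exactly that; the `blend` function, the dictionary representation, and the toy entries are hypothetical, and the actual Blending technique may scale or align the matrices differently.

```python
def blend(general, specific, factor):
    """Weighted sum of two sparse matrices stored as {(concept, feature): value}.

    factor = 0.0 keeps only the general knowledge base,
    factor = 1.0 keeps only the specialized one.
    """
    if not 0.0 <= factor <= 1.0:
        raise ValueError("factor must be between 0 and 1")
    blended = {}
    for key, value in general.items():
        blended[key] = blended.get(key, 0.0) + (1.0 - factor) * value
    for key, value in specific.items():
        blended[key] = blended.get(key, 0.0) + factor * value
    return blended


# Toy data, made up for illustration.
general = {
    ("oil", "IsA liquid"): 1.0,
    ("well", "AtLocation ground"): 1.0,
}
specific = {
    ("well", "AtLocation ground"): 1.0,
    ("wildcat", "IsA exploration well"): 1.0,
}

blended = blend(general, specific, factor=0.25)
for key, value in sorted(blended.items()):
    print(key, value)
```

Entries that appear in both sources ("well AtLocation ground") keep full weight, while entries known to only one source are scaled by that source's share. The overlap is what lets inference in the blended space carry general knowledge over to domain concepts.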
CrossBridge - Krishnamurthy
  AnalogySpace-based technique for Structure Mapping analogy
  Indexes small networks of concepts & assertions
  Can do Case-Based Reasoning
  Electricity flows through Wires -> Water flows through Pipes, or Light flows through Fiber-Optic Cables?
Applications in Interface Agents
  Predictive typing, Speech recognition
  Storytelling with Media Libraries
  Detection and mitigation of online bullying
  Opinion Analysis
  Goal-oriented interfaces for Consumer Electronics
  Mobile to-do lists, location-aware context-sensitive maps
  Translation, language learning & multi-lingual communication
  Help and customer service
  Recommendation systems, scenario-based recommendation
  Programming and code sharing in natural language
  … and more
Example: Earth Sciences Knowledge
  Collaboration with Schlumberger
  Collect Earth Sciences Knowledge for intelligent search & browsing
  ~ 2000 assertions = 300 manual + 1600 from game
  Game = 2 one-hour sessions x 10 people
  350 concepts, read glossary document
Geology sentences
  Petroleum is a mixture of hydrocarbons. [IsA]
  Air gun array is used for seismic surveying offshore. [UsedFor]
  A seismic survey is a measure of seismic-wave travel. [Measures]
  A wildcat is an exploration well drilled in an unproven area. [IsA]
  You would drill an exploration well because you want to determine whether hydrocarbons are present. [MotivatedByGoal]
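The five tagged sentences can be sketched as (concept, relation, concept) triples, the general shape of a ConceptNet-style assertion. The `Assertion` type, the way each sentence is split into two concepts, and the `to_features` helper are assumptions made for illustration; the relation names are the ones shown in brackets on the slide.

```python
from collections import defaultdict
from typing import NamedTuple


class Assertion(NamedTuple):
    left: str
    relation: str
    right: str


ASSERTIONS = [
    Assertion("petroleum", "IsA", "mixture of hydrocarbons"),
    Assertion("air gun array", "UsedFor", "seismic surveying offshore"),
    Assertion("seismic survey", "Measures", "seismic-wave travel"),
    Assertion("wildcat", "IsA", "exploration well drilled in an unproven area"),
    Assertion("drill an exploration well", "MotivatedByGoal",
              "determine whether hydrocarbons are present"),
]


def by_relation(assertions):
    """Group assertions under their relation name."""
    index = defaultdict(list)
    for assertion in assertions:
        index[assertion.relation].append(assertion)
    return dict(index)


def to_features(assertion):
    """Split one assertion into two (concept, feature) matrix entries.

    Assumed layout: each concept is described by the rest of the assertion.
    """
    return [
        (assertion.left, (assertion.relation, assertion.right)),
        (assertion.right, (assertion.left, assertion.relation)),
    ]


index = by_relation(ASSERTIONS)
for relation, items in index.items():
    print(relation, [a.left for a in items])
print(to_features(ASSERTIONS[0]))
```

Entries like those returned by `to_features` are what a concept-by-feature matrix would be filled with, one row per concept and one column per feature.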
Knowledge collection: Common Consensus - Smith
Geology knowledge space
  You can find oil where there are lizards
Luminoso – Speer, Havasi
  Turnkey Opinion Analysis & Visualization platform
  Constructs AnalogySpace from sets of text files
Opinions of software
  When people talk about the mechanics of using software, that means they don't like it
  When people talk about what they want to do with software, that means they like it
You can use our stuff! http://csc.media.mit.edu
Event Networks – Dustin Smith Tomorrow at 4!
ToDoGo – Dustin Smith
Conclusion
  There’s been a controversy between logical and statistical reasoning
  We need to fill in the gap
  Symbolic representations as source
  “Do the math” to combine large numbers of them
  New thinking about Commonsense reasoning
Thanks! Henry Lieberman MIT Media Lab