Presentation on theme: "More precise fuzziness, more fuzzy precision" — Presentation transcript:

1 More precise fuzziness, more fuzzy precision
Katrin Erk University of Texas at Austin COMPOSES workshop August 14, 2016 Joint work with: Ray Mooney, Islam Beltagy, Gemma Boleda, Pengxiang Cheng, Chau Cuong, Dan Garrette, Gerhard Kremer, Sebastian Padó, Stephen Roller, Stefan Thater

2 Where do we need both fuzziness and precision?
In-depth text understanding:
Precision for understanding structure
Fuzziness for the lexical/phrasal level
Inference needs to integrate both
The same question arises in two areas:
Computational: single-document understanding
Linguistic: how can people learn word meanings and reason over what they have learned?

3 Single-document understanding
In-depth analysis of a single document:
Reading comprehension / single-document QA: CNN/Daily Mail (Hermann et al. 2015)
Evidence-based medicine: reading individual journal papers
(Textual entailment)
Redundancy is our friend – but what if we don’t have it?
Have to deal with many language phenomena
Have to deal with rare events

4 Alignment and inference: RTE, STS, QA (Beltagy 2016)
The Text is represented as Formula 1, the Query as Formula 2
Align predicates and discourse entities between Text and Query; formulate the alignment as weighted rules
Probabilistic inference over the weighted formulas

5 “Montague meets Markov”: Deep sentence structure, flexible word meaning (Beltagy et al 2013; 2016)
Sentence representation: first-order logic, e.g. for "A man walks into a bar"
Distributional ratings of lexical entailment translated into weighted inference rules (a sketch follows below)
Probabilistic inference: Markov Logic Networks (MLNs)
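The logical form and the weighted rule were shown as images on the original slide. As a rough illustration only (the actual system of Beltagy et al. parses sentences with a CCG parser and Boxer, and rule weights come from a trained entailment model; the weight below is made up), they might look like this:

    A man walks into a bar:   ∃x ∃y. man(x) ∧ bar(y) ∧ walk_into(x, y)
    Weighted rule (weight from a distributional entailment rating, value hypothetical):
        ∀x ∀y. walk_into(x, y) → enter(x, y)    | weight 1.7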

6 Probabilistic semantics
Uncertainty about the world we are in: a probability distribution over worlds
Propositional logic: worlds = truth assignments (Nilsson 1986)
The probability that a sentence φ is true depends on the worlds w in which it is true (the formula is given below)
Markov Logic Networks build on this idea, with a particular way of determining the probability of worlds: from weighted formulas
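The formulas referred to on this slide appeared as images in the original; the standard definitions, from Nilsson (1986) and from the Markov Logic Network literature (Richardson & Domingos 2006), are:

    P(φ) = Σ_{w ⊨ φ} P(w)                     (a sentence is as probable as the worlds that make it true)

    P(w) = (1/Z) · exp( Σ_i w_i · n_i(w) )    (MLNs: w_i = weight of formula i, n_i(w) = number of true
                                               groundings of formula i in world w, Z = normalization constant)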

7 Undirected graphical models: interaction between random variables
[Figure: a small undirected graphical model with example potential values]
Nodes are random variables
The probability of an assignment x of values to all random variables depends on the values on all cliques in the graph (the factorization is given below)
φ (phi): potential functions
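The factorization itself was only shown graphically on the slide; the standard form for an undirected graphical model is:

    P(x) = (1/Z) · ∏_C φ_C(x_C)       (product over the cliques C of the graph; φ_C is the potential function
                                       of clique C, x_C the values x assigns to its variables,
                                       Z the normalization constant)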

8 Markov Logic Networks: computing with weighted clauses
Example: "All cats chase all dogs", with two constants a and b
[Figure: the ground network over the ground atoms cat(a), cat(b), dog(a), dog(b), chase(a,a), chase(a,b), chase(b,a), chase(b,b), annotated with an example truth assignment]
Node = random variable = ground literal
Clique = ground literals that form a ground clause
Assignment x of truth values = possible world
The value x_C on a clique depends on whether the ground clause is true in x
(a sketch of the grounding step follows below)
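To make the grounding step concrete, here is a minimal Python sketch (not the actual MLN implementation used in this work; the clause weight and the truth assignment are made up). It grounds the clause cat(x) ∧ dog(y) → chase(x, y) over the constants a and b and computes the unnormalized MLN score of one possible world.

    # Minimal sketch: ground "all cats chase all dogs" over {a, b} and score one world.
    from itertools import product
    from math import exp

    constants = ["a", "b"]
    weight = 1.5  # hypothetical clause weight

    # One possible world: a truth assignment to every ground atom.
    world = {
        ("cat", "a"): True,  ("cat", "b"): False,
        ("dog", "a"): False, ("dog", "b"): True,
        ("chase", "a", "a"): False, ("chase", "a", "b"): True,
        ("chase", "b", "a"): False, ("chase", "b", "b"): True,
    }

    def ground_clause_true(world, x, y):
        # cat(x) & dog(y) -> chase(x, y), evaluated as a ground clause
        return (not world[("cat", x)]) or (not world[("dog", y)]) or world[("chase", x, y)]

    # Number of true groundings of the clause in this world
    n_true = sum(ground_clause_true(world, x, y) for x, y in product(constants, constants))

    # Unnormalized MLN score of the world: exp(weight * number of true groundings)
    score = exp(weight * n_true)
    print(n_true, score)  # 4 true groundings here -> exp(1.5 * 4)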

9 Alignment rules are paraphrasing rules
Many existing paraphrase collections, but all too small (Lin and Pantel 2001; Szpektor et al. 2008; Berant et al. 2010, 2011; Ganitkevitch et al. 2013)
Our system generates paraphrase rules on the fly
Use alignment to find rules: phrases that, if they were paraphrases, would make the Query entailed
When gold labels at the passage level are available (entails / does not entail): train a classifier on lexical/phrasal entailment
Features: distributional information, WordNet; average and max over word features for phrasal entailment (a sketch follows below)
Compositional distributional phrase similarity (Paperno et al. 2014) did not help
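A minimal sketch of the average/max pooling over word-level distributional similarities, assuming a word-to-vector lookup. This is an illustration only, not the actual feature extractor (which also uses WordNet and other signals), and the toy vectors are made up.

    # Minimal sketch of lexical/phrasal entailment features for a candidate rule lhs -> rhs.
    import numpy as np

    def cosine(u, v):
        return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-9))

    def phrase_features(lhs_words, rhs_words, vectors):
        """Average and max over word-pair similarities for a candidate rule lhs -> rhs."""
        sims = [cosine(vectors[w1], vectors[w2])
                for w1 in lhs_words for w2 in rhs_words
                if w1 in vectors and w2 in vectors]
        if not sims:
            return {"avg_sim": 0.0, "max_sim": 0.0}
        return {"avg_sim": float(np.mean(sims)), "max_sim": float(max(sims))}

    # Toy 3-dimensional "distributional" vectors, purely for illustration.
    vectors = {
        "man": np.array([0.9, 0.1, 0.2]),
        "person": np.array([0.8, 0.2, 0.3]),
        "walk": np.array([0.1, 0.9, 0.1]),
        "enter": np.array([0.2, 0.8, 0.2]),
    }
    print(phrase_features(["man", "walk"], ["person", "enter"], vectors))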

10 More precise fuzzy representations for phrasal entailment?
Can compositional distributional models be trained to produce good phrasal entailment? (Kruszewski et al. 2015, Henderson et al. ACL 2016)
End-to-end deep learning systems seem to be good at paraphrases, as qualitative analyses of attention show:
Cheng, Dong & Lapata 2016: textual entailment on SNLI
Rocktäschel et al. 2015: textual entailment on SNLI
Hermann et al. 2015: text comprehension (single-document QA)
Is there any way to transfer that?

11 Inference: What is the right mix of fuzziness and precision?
For in-depth text understanding, we will need to handle negation and other scope phenomena in a principled fashion
We have used Markov Logic Networks (MLNs), Probabilistic Soft Logic (PSL), and custom graphical models
The Achilles heel of MLNs is network size: the problem is generating all groundings of a given set of formulas (not just in MLNs, in PSL too)
Our solution (a sketch follows below):
Pre-pruning of the network based on information in the Text
Only adding inference rules likely to be useful (based on alignment)
No rule chaining
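A minimal sketch of the pre-pruning idea, under the simplifying assumption that a rule grounding is only kept if its body atom already occurs in the evidence extracted from the Text. The predicates, constants, and evidence set are hypothetical, and the actual system prunes more carefully.

    # Minimal sketch: prune groundings of body_pred(x) -> head_pred(x) using Text evidence.
    from itertools import product

    constants = ["a", "b", "c", "d"]
    # Ground atoms derived from the Text (hypothetical evidence set).
    evidence = {("man", "a"), ("bar", "b"), ("walk_into", "a", "b")}

    def ground_rule_pruned(body_pred, head_pred, arity=1):
        """Ground the rule, keeping only groundings whose body atom appears in the evidence."""
        groundings = []
        for args in product(constants, repeat=arity):
            if (body_pred, *args) in evidence:   # prune: body atom never appears in the Text
                groundings.append(((body_pred, *args), (head_pred, *args)))
        return groundings

    # Full grounding would create len(constants) ground rules; pruning keeps only one here.
    print(ground_rule_pruned("man", "person"))  # [(('man', 'a'), ('person', 'a'))]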

12 Other probabilistic inference options
Encoding logical rules into the architecture of a neural network (Towell et al. 1990, Eliassi-Rad 2001): also restricted to propositional rules
Training a distributed representation to mostly obey particular logical rules (Rocktäschel et al. 2015, Hu et al.): so far, relatively simple rules
Generative models via probabilistic programming (Goodman et al.): need to restrict model size or the search space
Needed: ideas for scalable weighted reasoning

13 Can humans learn word meaning from textual context?
How to learn a word:
Getting an explanation (for example in school, or from an encyclopedia): "A samovar […] is a heated metal container traditionally used to heat and boil water in and around Russia […]" (Wikipedia, July 29, 2016)
Perceiving a category member: "Look, a samovar!"
Learning from text
A model of word meaning should allow for all of these modes of word learning
[Image credit: Benito bonito, CC BY-SA 3.0]

14 Three modes of word learning: How to model?
Learning from definitions: model as an update to an information state (Veltman 1996), the set of worlds that, as far as the agent knows, could be the actual one
Learning from distributional evidence (Erk 2016): update a probabilistic information state (van Benthem et al., Zeevat 2013), a probability distribution over the worlds in the information state
Nilsson 1986, probabilistic logic (also used in MLNs): see the formula below
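The formula that the transcript drops at this point is presumably the same probabilistic-logic definition as on slide 6; together with a generic Bayesian update of the distribution over worlds (given here only as an illustration of how a probabilistic information state can change, not necessarily the exact formula on the slide), it reads:

    P(φ) = Σ_{w ⊨ φ} P(w)

    P(w | E) = P(w) · P(E | w) / P(E)      (conditioning the distribution over worlds on new evidence E)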

15 Three modes of word learning: How to model?
Learning from grounded situations: a generative model for the probabilistic information state (Goodman and Lassiter 2013)
Word meaning representation in a conceptual space -- think probabilistic Gärdenfors
For experimental evaluation: feature norms (McRae et al., Vinson and Vigliocco)

16 Three modes of word learning: How to model?
Upon observing a word, probabilistically generate an entity using its concept representation and the textual context
Fixed features: those that are observed
Update the concept with the generated entity
Learning = disambiguation
(a sketch of this generate-and-update loop follows below)
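A minimal sketch of such a generate-and-update loop, with a deliberately simple stand-in for the concept representation: independent Beta-Bernoulli distributions over binary feature-norm features. The feature names are hypothetical, and this is not the actual model.

    # Minimal sketch: generate an entity from a concept, clamp observed features, update the concept.
    import random

    class Concept:
        def __init__(self, feature_names, prior=(1.0, 1.0)):
            # Beta(alpha, beta) pseudo-counts per feature: belief that the feature is present.
            self.counts = {f: list(prior) for f in feature_names}

        def prob(self, f):
            a, b = self.counts[f]
            return a / (a + b)

        def generate_entity(self, observed=None):
            """Sample an entity; features fixed by the observed context stay as observed."""
            observed = observed or {}
            return {f: observed[f] if f in observed else (random.random() < self.prob(f))
                    for f in self.counts}

        def update(self, entity):
            """Update the concept with the generated entity (add to the pseudo-counts)."""
            for f, present in entity.items():
                self.counts[f][0 if present else 1] += 1.0

    # Hypothetical feature-norm features; the textual context fixes "made_of_metal".
    samovar = Concept(["made_of_metal", "used_for_boiling_water", "is_animal"])
    entity = samovar.generate_entity(observed={"made_of_metal": True})
    samovar.update(entity)
    print(samovar.prob("made_of_metal"))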

17 Again: how to do inference?
Humans can do one-shot learning: "We found a cute, hairy wampimuk sleeping behind the tree."
It doesn’t always work this neatly: "A study showed that a tiny 50 gramme (1.76 oz) wampimuk heated up 1 degree every minute and a half from the sun."
How do humans do that? "Overhypotheses" (Goodman 1955, Kemp et al. 2007)
For example: names of solid objects generalize by shape, names of nonsolids by material (Colunga & Smith 2005)

18 Again: what are the right fuzzy representations for context?
Inferring the meaning of a word in context: we had humans do this task as lexical substitution (Kremer et al. 2014)
"My fear is that she would live, and I would learn that I had lost her long before Emil Malaquez translated her into a thing that can be kept, admired, and loved."
Substitutes (for "kept") include: retain, store, own, possess, enshrine, stage
The last four are not WordNet-related to the target!

19 What is this wider context?
Some sort of abstract situation type, e.g. "pulp romance: woman as precious object"?
U-semantics frames? (Fillmore 1985)
Events that influence human sentence understanding? (McRae & Matsuki 2009)
Some kind of narrative schema (which we could model)? (Chambers & Jurafsky 2009, Pichotta & Mooney 2016)
How many such abstract situations are there? Too many to cluster

20 Summary of questions
More precise fuzzy representations:
Better phrase representations for phrasal entailment
Representing abstract situations?
The right mix of fuzziness and precision for inference:
Scaling up while retaining expressivity?
Best way of integrating alignment and inference?
Learning the right overhypotheses for word meaning?

