Why Be Vague?
Kees van Deemter
Computing Science, University of Aberdeen
(Guangzhou WS, Dec '10)
An expression is vague (V) iff it has borderline cases or degrees, e.g. large, small, fast, slow, many, few, ...
V expressions are common in all languages:
– 8 out of the top 10 adjectives in the BNC
– Dominant among the first words we learn
Two “big” problems with vagueness
1. The semantic problem: how to model the meaning of V expressions?
– Classical models: 2-valued
– Partial models: 3-valued
– Degree models: many-valued (e.g. Fuzzy Logic, Probabilistic Logic)
No agreement on how to answer this question (e.g., Keefe & Smith 1997)
2. The pragmatic problem: why is language vague?
Vague expressions seem a bit unclear. Is it ever a good idea to be V?
Suppose you built an electronic information provider; would you ever want it to offer you V information?
Barton Lipman:
– chapter in A. Rubinstein, “Economics and Language” (2000)
– working paper “Why is Language Vague?” (2006)
Why have we tolerated an apparent “worldwide several-thousand year efficiency loss”?
That’s today’s topic.
The scenario of Lipman (2000, 2006)
Airport scenario: I describe Mr X to you, so that you can pick him up from the airport. All I know is X’s height; heights are uniformly distributed on [0,1]. If you identify X right away, you get payoff 1; if you don’t, you get payoff −1.
What description would work best?
State X’s height “precisely”: if each of us knows X’s exact height, then the probability of confusion is close to 0.
If only one property is allowed: say “the tall person” if height(X) > 1/2, else say “the short person”. No boundary cases, so this is not vague!
Theorem: under standard game-theoretic assumptions (Crawford/Sobel), vague communication can never be optimal.
One type of answer to Lipman: conflict between S and H
Aragones and Neeman (2000): ambiguity can add to the speaker’s utility U_S (the politician’s example)
– vD (2010): the same applies to V
But what if language is used “honestly” (i.e., U_S = U_H)?
An illustrative application: Natural Language Generation.
Natural Language Generation (NLG) is an area of AI with many practical applications (e.g. Reiter and Dale 2000). An NLG program “translates” input data into linguistic output: choosing the best linguistic Form for a given Content. The choice can be related to utility.
Example: Roadgritting (e.g., Turner et al. 2009)
Compare:
1. “Roads above 500m are icy”
2. “Roads in the Highlands are icy”
Decision-theoretic perspective:
1. 100 false positives, 2 false negatives
2. 10 false positives, 10 false negatives
Suppose each false positive has utility −0.1 and each false negative has utility −2.
Example: Roadgritting (e.g., Turner et al. 2009)
Suppose each false positive has utility −0.1 and each false negative has utility −2. Then:
1: 100 false positives, 2 false negatives = 100·(−0.1) + 2·(−2) = −14
2: 10 false positives, 10 false negatives = 10·(−0.1) + 10·(−2) = −21
So summary 1 is preferred over summary 2.
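The trade-off above can be sketched as a short utility calculation (a minimal sketch; the error counts and per-error utilities are those given on the slide):

```python
# Utility of a road-ice summary = sum of penalties for its classification errors
# (false positive: gritting a road that is not icy; false negative: missing an icy road).
U_FALSE_POS = -0.1
U_FALSE_NEG = -2.0

def summary_utility(false_pos, false_neg):
    return false_pos * U_FALSE_POS + false_neg * U_FALSE_NEG

u1 = summary_utility(100, 2)   # "Roads above 500m are icy"
u2 = summary_utility(10, 10)   # "Roads in the Highlands are icy"
print(u1, u2)  # -14.0 -21.0, so summary 1 is preferred
```

Note that which summary wins depends entirely on the utilities: if false positives were costlier than false negatives, summary 2 would come out ahead.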
Our question: “When (if ever) is vague communication more useful than crisp communication?”
The question is not: “Can vague communication be of some use?”
1. Vicissitudes of measurement
Example: one house of 11m height and one house of 12m height.
1. “the house that’s 12m tall needs to be demolished”
2. “the tall house needs to be demolished”
Comparison is easier and more reliable than measurement, so prefer utterance 2.
[But arguably, this is not vague]
2. Production/interpretation effort
Effort needs to be commensurate with utility. In many cases, more precision adds little benefit. E.g., the feasibility of an outing does not depend on whether it’s 25°C or 26.573°C.
[But why does this require V? All we need is rounding!]
3. Evaluation payoff
Example: the doctor says
– Utterance 1: “Your blood pressure is 153/92.”
– Utterance 2: “Your blood pressure is high.”
U2 offers less detail than U1, but U2 also offers an evaluation of your condition (cf. Veltman 2000)
– Especially useful if the metric is “difficult”
– Measurable as the likelihood of incorrect action
Empirical evidence
B. Zikmund-Fisher et al. (2007), E. Peters et al. (2009): experiments showing that evaluative categories (i.e., “labels”) affect readers’ decisions.
Empirical evidence
E. Peters et al. (2009): hospital ratings based on numerical factors: (1) survival percentages, (2) percentages of recommended treatment, and (3) patient satisfaction.
Evaluation judgment: “How attractive is this hospital to you?”
E. Peters et al. (2009)
When numerical information was accompanied by labels (“fair”, “good”, “excellent”), a greater proportion of the variance in evaluation judgments could be explained by the numeric factors.
– Without labels, the most important information (i.e., factor 1) was not used at all
– Without labels, less numerate subjects were influenced by mood (“I feel good/bad/happy/upset”)
But … why does English not have a (brief) expression that says “Your blood pressure is 150/90 and too high”?
Similar: “survival percentage is x and this is good”.
Compare: arguably, “You are obese” means “Your BMI is above 30 and this is dangerous”.
4. Future contingencies
The Indecent Displays (Control) Act (1981) forbids public display of indecent matter
– what counts as “indecent” is settled at the time the law is applied: the law has been parameterised (Waismann 1968, Hart 1994, Lipman 2006)
Obama/Volcker: “too much risk” should not be concentrated in one bank (Jan. 2010)
– an opening bid in a policy war
5. Lack of a good metric
– Mathematics: how difficult is a proof? (“As the reader may easily verify”)
– Multi-dimensional measurements: what’s the size of a house?
– Aesthetics: how beautiful is a sunset?
So far: previous answers to the question why language is vague. Valuable ideas, but do we have any knock-down arguments?
Remainder of this talk
Explore a tentative new answer: V can “oil the wheels” of communication.
Starting point: it’s almost inconceivable that all speakers arrive at exactly the same concepts.
Causes of semantic mismatches
Perception varies per individual
– Hilbert 1987 on colour terms: density of pigment on lens and retina; sensitivity of photoreceptors
Cultural issues
– Reiter et al. 2005 on temporal expressions. Example: “evening”: are the times of dinner and sunset relevant?
R. Parikh (1994) recognised that mismatches exist …
Parikh proposed a utility-oriented perspective on meaning
– utility as reduction in search effort
He showed that communication doesn’t always break down when words are understood (slightly) differently by different speakers.
Consider the expression “blue book”.
[Figure: Venn diagram of Ann’s and Bob’s extensions of “blue”; the diagram gives region sizes of 25, 75, 225 and 675 books.]
Ann: “Bring the blue book on topology”
Bob: search [[blue]]_Bob first, then, if necessary, all other books (only 10% probability!)
Parikh showed that Ann’s utterance is useful, despite their difference. But Parikh did not show the utility of V
– Ann and Bob used crisp concepts!
This talk: “tall” instead of “blue”
– 1-dimensional
– [[tall_1]] ⊆ [[tall_2]] or [[tall_2]] ⊆ [[tall_1]]
– Focus on V
The stolen diamond
“A diamond has been stolen from the Emperor and (…) the thief must have been one of the Emperor’s 1000 eunuchs. A witness sees a suspicious character sneaking away. He tries to catch him but fails, getting fatally injured (...). The scoundrel escapes. (…) The witness reports ‘The thief is tall’, then gives up the ghost. How can the Emperor capitalize on these momentous last words?” (van Deemter 2010)
The problem with dichotomies
Suppose the Emperor uses a dichotomy, e.g. model A: [[tall]]_Emperor = [[>180cm]] (say, 500 people). What if [[tall]]_Witness = [[>175cm]]?
If thief ∈ [[tall]]_Witness − [[tall]]_Emperor then
– predicted search effort: 500 + ½(500) = 750
– without the witness’ utterance: ½(1000) = 500
The witness’ utterance “misled” the Emperor.
[Figure: the thief’s height lies between 175cm and 180cm.]
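The slide’s search-effort figures can be reproduced with a small sketch (assuming, as on the slide, 1000 eunuchs of whom 500 count as tall for the Emperor, and the slide’s convention that searching a region of size k costs ½·k on average):

```python
# Expected number of eunuchs searched when the witness's "tall" threshold
# (175cm) is lower than the Emperor's (180cm), so the thief is NOT in the
# Emperor's [[tall]] set: the Emperor searches all 500 "tall" eunuchs in
# vain, then on average half of the remaining 500.
def expected_search_mismatch(n_total, n_tall_emperor):
    n_rest = n_total - n_tall_emperor
    return n_tall_emperor + n_rest / 2

# Baseline: with no utterance at all, search half the population on average.
def expected_search_no_utterance(n_total):
    return n_total / 2

print(expected_search_mismatch(1000, 500))   # 750.0
print(expected_search_no_utterance(1000))    # 500.0
```

So the crisp utterance makes the search strictly worse than saying nothing, which is the sense in which it “misled” the Emperor.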
Model A uses a crisp dichotomy between [[tall]]_A and [[¬tall]]_A. Contrast this with a partial model B, which has a truth-value gap [[?tall?]]_B.
[Figure: Model A (2-valued) splits the domain into [[tall]]_A and [[¬tall]]_A at 180cm; model B (3-valued) splits it into [[tall]]_B, [[?tall?]]_B and [[¬tall]]_B at 180cm and 165cm.]
How does the Emperor classify the thief?
s(X) =def expected search time given model X
Three types of situations.
Type 1: thief ∈ [[tall]]_A = [[tall]]_B. In this case, s(A) = s(B).
How does the Emperor classify the thief?
Type 2: thief ∈ [[?tall?]]_B
– model A: search all of [[tall]]_A in vain, then (on average) half of [[¬tall]]_A
– model B: search all of [[tall]]_B in vain, then (on average) half of [[?tall?]]_B
B searches ½·card([[¬tall]]_B) less! So s(B) < s(A).
How does the Emperor classify the thief?
Type 3: thief ∈ [[¬tall]]_B
– model A: search all of [[tall]]_A in vain, then (on average) half of [[¬tall]]_A
– model B: search all of [[tall]]_B in vain, then all of [[?tall?]]_B, then (on average) half of [[¬tall]]_B
Now A searches ½·card([[?tall?]]_B) less. So s(A) < s(B).
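The three cases can be checked with a short sketch. The region sizes below (200 tall, 300 borderline, 500 not tall) are illustrative assumptions, not from the slides; the ½·(region size) expected-effort convention follows the slides:

```python
# Expected search effort under the 2-valued model A and the 3-valued model B,
# as a function of which region the thief is in. Searching a region of size k
# costs k/2 on average if the thief is in it, k if it is searched in vain.
def s_A(thief_region, n_tall, n_gap, n_rest):
    # Model A: search [[tall]]_A, then everything else (gap + rest) unordered.
    if thief_region == "tall":
        return n_tall / 2
    return n_tall + (n_gap + n_rest) / 2

def s_B(thief_region, n_tall, n_gap, n_rest):
    # Model B: search [[tall]]_B, then [[?tall?]]_B, then [[not-tall]]_B.
    if thief_region == "tall":
        return n_tall / 2
    if thief_region == "gap":
        return n_tall + n_gap / 2
    return n_tall + n_gap + n_rest / 2

sizes = dict(n_tall=200, n_gap=300, n_rest=500)
for region in ("tall", "gap", "rest"):
    print(region, s_A(region, **sizes), s_B(region, **sizes))
# tall: 100 vs 100 (Type 1: equal), gap: 600 vs 350 (Type 2: B wins),
# rest: 600 vs 750 (Type 3: A wins)
```

Which model wins overall therefore depends on how likely Type 2 is relative to Type 3, which is exactly what the next slide’s plausibility assumption S addresses.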
Normally model B wins
“Type 2 is more probable than Type 3”:
Let S =def ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(“tall(x)”) > p(“tall(y)”))
S is highly plausible! (Taller individuals are more likely to be called tall.)
Normally model B wins
S: ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(“tall(x)”) > p(“tall(y)”))
S implies ∀x∀y ((x ∈ [[?tall?]]_B & y ∈ [[¬tall]]_B) → p(thief(x)) > p(thief(y)))
It pays to search [[?tall?]]_B before [[¬tall]]_B. A priori, s(B) < s(A).
Why does model B win?
3-valued models beat 2-valued models because they distinguish more finely. Therefore, many-valued models should be even better!
– All models that allow degrees or ranking (including e.g. Kennedy 2001)
Consider degree model C
v(tall(x)) ∈ [0,1] (e.g., Fuzzy or Probabilistic Logic)
∀x∀y ((v(tall(x)) > v(tall(y))) → p(“tall(x)”) > p(“tall(y)”))
[Graph: the probability of x being called “tall” as an increasing function of height(x).]
Consider degree model C
Analogously to the partial model, C allows the Emperor to
– rank the eunuchs, and
– start searching the tallest ones, etc.
Assuming more than 3 different heights (i.e., more than 3 different values of v(tall(x))), this is even quicker than B: s(C) < s(B).
An example of model C
v(tall(a1)) = v(tall(a2)) = 0.9
v(tall(b1)) = v(tall(b2)) = 0.7
v(tall(c1)) = v(tall(c2)) = 0.5
v(tall(d1)) = v(tall(d2)) = 0.3
v(tall(e1)) = v(tall(e2)) = 0.1
Search {a1,a2} first, then {b1,b2}, etc. (5 levels). This is quicker than a partial model (3 levels), which is quicker than a classical model (2 levels).
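A minimal sketch of the degree model’s search strategy, using the slide’s ten eunuchs a1–e2 and their v(tall) values:

```python
# Degree model C: each eunuch has a degree of tallness v(tall(x)) in [0,1].
v_tall = {
    "a1": 0.9, "a2": 0.9,
    "b1": 0.7, "b2": 0.7,
    "c1": 0.5, "c2": 0.5,
    "d1": 0.3, "d2": 0.3,
    "e1": 0.1, "e2": 0.1,
}

# The Emperor searches in decreasing order of degree: the more likely an
# individual is to be called "tall", the earlier he is checked.
search_order = sorted(v_tall, key=v_tall.get, reverse=True)
print(search_order[:2])  # ['a1', 'a2'] -- the 0.9-level pair comes first
```

With 5 distinct degree levels the ordering is strictly finer than the partial model’s 3 regions, which is what makes s(C) < s(B) possible.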
This analysis suggests …
… that it helps the Emperor to understand “tall” as having borderline cases or degrees. But borderline cases and degrees are the hallmark of V. It appears to follow that V has benefits for search.
This analysis also suggests …
… that degree models offer a better understanding of V than partial models
– Compare the first “big” problem with V
Radical interpretation: V concepts don’t serve to narrow down the search space, but to suggest an ordering of it.
Objections …
Objection 1: “Smart search would have been equally possible based on a dichotomous (i.e., classical) model.”
The idea: you can use a classical model yet understand that other speakers use other classical models; start searching individuals who are tall on most models.
Response: reasoning about different classical models is not classical logic but a partial logic with supervaluations. It presupposes that “tall” is understood as vague.
Objections …
Objection 2: “A truth-value gap (or degrees) does not imply vagueness.”
The idea: truth-value gaps that are crisp fail to model V (similarly for degrees); higher-order V needs to be modelled.
Response: few formal accounts of V pass this test.
Objections …
Objection 3: “Why was the witness not more precise?”
The idea: the witness should have said “the thief was 185cm tall”, giving an estimate.
Response: (1) Why are such estimates vague (“approximately 185cm”) rather than crisp (“185cm ± 0.5cm”)? (2) This can be answered along the lines proposed.
Objections …
Objection 4: “This contradicts Lipman’s theorem.”
The idea: Lipman (2006) proved that every V predicate can be replaced by a crisp one whose utility is at least as high.
Response: Lipman’s assumptions don’t apply here: (1) the theorem models V through a probability distribution; (2) the theorem assumes that the hearer knows which crisp model the speaker uses (e.g. “>185cm”).
Lipman’s theorem assumes that the hearer knows which crisp model the speaker uses. Our starting point: in “continuous” domains, perfect alignment between speakers and hearers would be a miracle (pace epistemicist approaches to V!).
Summing up
Why information reduction is useful is well understood. Why this should involve borderline cases is less clear (Lipman’s question). Several tentative answers have been published. A new answer, based on mismatches in perception and benefits for search, complements the earlier answers.
“Utility and Language Generation: The Case of Vagueness”. Journal of Philosophical Logic 38/6.
“Vagueness Facilitates Search”. Proceedings of the 2010 Amsterdam Colloquium, Springer Lecture Notes.
“Not Exactly: in Praise of Vagueness”. Oxford University Press, Jan. 2010.
Part 1: Vagueness in science and daily life.
Part 2: Linguistic and logical models of vagueness.
Part 3: Working models of V in Artificial Intelligence.
Background on the book: http://www.csd.abdn.ac.uk/~kvdeemte/NotExactly
Acknowledgments relating to this talk: Ehud Reiter, Advaith Siddharthan