CS4018 Formal Models of Computation weeks Computability and Complexity Kees van Deemter (partly based on lecture notes by Dirk Nikodem)
Fourth set of slides: Generating Referring Expressions
The GRE game; GRE as part of NLG
Grice's maxims
(Complexity of) Full Brevity
(Complexity of) the Incremental Algorithm
Complexity can be measured in different ways
N.B. This topic is not covered in the Lecture Notes
Let's play a game
Desks: {b,c}, Chairs: {a,e}, Sofas: {f}
Leather: {b}, Wood: {a,f}
Blue: {c,d}, Red: {a,e}
Please write down how a speaker of English might describe each of a, b, …, f (six Noun Phrases).
For example
Desks: {b,c}, Chairs: {a,e}, Sofas: {f}
Leather: {b}, Wood: {a,f}
Blue: {c,d}, Red: {a,e}
a = the wooden chair, the red chair, the red wooden thing
b = the leather desk, the leather (?), the leather object
c = the blue desk
d = ? the blue thing that's not a desk
e = the red chair
f = the sofa
The game is called Generation of Referring Expressions
Referring Expression: a string of words that identifies an object uniquely
Also called a distinguishing description of the target referent r
The other elements of the domain are distractors
Formally, this is a set of properties $\{P_1, \dots, P_n\}$ such that $P_1 \cap \dots \cap P_n = \{r\}$
Assumption: speaker and hearer share the same facts (a shared knowledge base)
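The formal definition can be tried out on the furniture game. The following is a minimal sketch, assuming each property is stored as the set of objects it is true of; the names `PROPERTIES`, `DOMAIN`, `extension` and `is_distinguishing` are illustrative and do not come from the slides.

```python
# Furniture domain from the game; each property maps to the set of objects it holds of.
# (Representation and names are assumptions made for illustration.)
PROPERTIES = {
    "desk": {"b", "c"},
    "chair": {"a", "e"},
    "sofa": {"f"},
    "leather": {"b"},
    "wood": {"a", "f"},
    "blue": {"c", "d"},
    "red": {"a", "e"},
}
DOMAIN = {"a", "b", "c", "d", "e", "f"}


def extension(description):
    """Intersect the extensions of all properties in the description."""
    described = set(DOMAIN)
    for name in description:
        described &= PROPERTIES[name]
    return described


def is_distinguishing(description, target):
    """A description is distinguishing if its extension is exactly {target}."""
    return extension(description) == {target}


print(is_distinguishing({"wood", "chair"}, "a"))  # wood and chair together
print(is_distinguishing({"red", "chair"}, "a"))   # e is also a red chair
```

Under these facts, "the red chair" does not single out a, because e is a red chair too, whereas "the wooden chair" does.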
Generation of Referring Expressions
Part of a larger area of research and applications, Natural Language Generation (NLG)
Natural Language = ordinary language (e.g., English, Dutch, ...)
We want NLG to produce natural English, that is, the kind of English that a native speaker would use
What's the most natural referring expression?
We've seen that this is not always easy
An important linguist: Paul Grice. He formulated principles underlying conversation, called the Gricean maxims. E.g.,
–Don't use more words than necessary
For this and what follows, read the early sections of Dale & Reiter
Grice's work was informal, and can be understood/applied in different ways
Most literal interpretation of Grice: Full Brevity algorithm: Use the shortest description of r that's still a distinguishing description of r
NB This is a slight simplification, since we'll be counting properties, not words
Most literal interpretation of Grice: Full Brevity algorithm: Use the shortest description
Search space = all sets of properties
If the language has properties $\{P_1, \dots, P_m\}$ then the search space = $\mathrm{Powerset}(\{P_1, \dots, P_m\})$
This search space grows exponentially in m, since $\|\mathrm{Powerset}(X)\| = 2^{\|X\|}$.
In this case, $\|\mathrm{Powerset}(\{P_1, \dots, P_m\})\| = 2^m$
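The growth of the search space is easy to observe by enumerating it. This is a small sketch using the standard `itertools` recipe for a powerset; the function name `powerset` is chosen here for illustration.

```python
from itertools import chain, combinations


def powerset(items):
    """Return all subsets of items (as tuples), from the empty set to the full set."""
    items = list(items)
    return list(
        chain.from_iterable(combinations(items, k) for k in range(len(items) + 1))
    )


# The number of candidate descriptions doubles with every extra property.
for m in (1, 2, 3, 7, 10):
    print(m, len(powerset(range(m))))
```

With the seven properties of the furniture game there are already 128 candidate sets.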
Full Brevity
Any algorithm meeting Full Brevity has to find the solution in this exponentially growing search space
It does not automatically follow that such an algorithm must have exponential time complexity (in the worst case)
Maybe there exist smart algorithms that can skip some parts of the search space
Full Brevity
First published algorithm (by R. Dale):
List all properties $P_1, P_2, \dots, P_m$
    Go through the list until a distinguishing description is found {shortest has been found!}
    or until the end of the list is reached
List all sets of two properties $\{P_i, P_j\}$
    Go through the list until a distinguishing description is found {shortest has been found!}
    or until the end of the list is reached
… and so on …
List all sets containing all & only $P_1, P_2, \dots, P_m$
    Go through the list until a distinguishing description is found {shortest has been found!}
    or until the end of the list is reached {no distinguishing description exists}
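The size-by-size search can be sketched in Python as follows. This is an illustrative reading of the steps listed here, not Dale's published code; the function name `full_brevity`, the dictionary representation of properties, and the alphabetical order in which sets of equal size are tried are all assumptions.

```python
from itertools import combinations


def full_brevity(target, properties, domain):
    """Try all property sets of size 1, then size 2, and so on.

    Returns the first (hence shortest) distinguishing description found,
    or None if even the full set of properties does not single out target.
    """
    names = sorted(properties)  # fixed order so results are reproducible
    for size in range(1, len(names) + 1):
        for combo in combinations(names, size):
            described = set(domain)
            for name in combo:
                described &= properties[name]
            if described == {target}:
                return set(combo)
    return None


# Furniture domain from the game (representation is an assumption).
PROPERTIES = {
    "desk": {"b", "c"}, "chair": {"a", "e"}, "sofa": {"f"},
    "leather": {"b"}, "wood": {"a", "f"},
    "blue": {"c", "d"}, "red": {"a", "e"},
}
DOMAIN = {"a", "b", "c", "d", "e", "f"}

print(full_brevity("f", PROPERTIES, DOMAIN))
print(full_brevity("a", PROPERTIES, DOMAIN))
print(full_brevity("d", PROPERTIES, DOMAIN))
```

For d the search runs through every size without success, because the only property that holds of d (blue) also holds of c.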
What do you think of this algorithm?
It seems smart, only trying larger descriptions if shorter descriptions don't lead to a distinguishing description.
Yet the algorithm is exponential: smartness does not affect the worst case.
In fact, this is very easy to see: the worst case arises if $\{P_1, P_2, \dots, P_m\}$ is the only description that distinguishes r
In this case, all sets of properties are visited, hence we have our $\|\mathrm{Powerset}(\{P_1, \dots, P_m\})\| = 2^m$ again
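The worst-case count can also be read off size class by size class. As a short worked identity (standard combinatorics, with $\binom{m}{x}$ denoting the number of property sets of size x):

```latex
\sum_{x=0}^{m} \binom{m}{x} = 2^{m}
\qquad\text{and therefore}\qquad
\sum_{x=1}^{m} \binom{m}{x} = 2^{m} - 1 .
```

Strictly speaking the algorithm never tests the empty set, so it inspects $2^m - 1$ sets in the worst case, which is still exponential in m.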
Let's look at the complexity assessment in Dale & Reiter
Choosing x out of m is a familiar problem in combinatorics:
$\binom{m}{x} = \dfrac{m!}{x!\,(m-x)!}$
Divided by $(m-x)!$ because you're interested in only the first x factors
Divided by $x!$ because otherwise you're counting all permutations of the set of properties as distinct
Time-complexity of entire algorithm
Dale & Reiter do not directly calculate the worst-case complexity, but the complexity when the shortest distinguishing description contains x properties:
$\sum_{k=1}^{x} \binom{m}{k}$
If $m \gg x$ then this is of the order $m^x$, so this is still exponential (in x)
For example,
If x=3 and m=10 then check 175 combinations
If x=4 and m=20 then check about 6000 combinations
If x=5 and m=50 then check combinations
(By assuming that x<<m, D&R assume that the worst case never arises)
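These figures can be reproduced by summing binomial coefficients over all description sizes up to x. The helper name `sets_checked` is hypothetical; `math.comb` requires Python 3.8 or later.

```python
from math import comb


def sets_checked(m, x):
    """Number of property sets of size 1..x drawn from m properties."""
    return sum(comb(m, k) for k in range(1, x + 1))


for m, x in ((10, 3), (20, 4), (50, 5)):
    print(f"m={m}, x={x}: {sets_checked(m, x)} sets")
```

The first case gives exactly 175, matching the slide; the second gives 6195, which the slide rounds to 6000.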
This was Dale & Reiter's first finding. What would you conclude?
Some options
1. Find a faster algorithm. This is probably impossible: a proof by reduction shows that the problem is NP-Complete. [Please take this on faith]
2. Give up and make do with this algorithm. After all, if no really faster algorithm exists, why exert yourself looking for one?
3. (Any ideas?)
Dale and Reiter's response
Experimental literature had shown that human speakers do not adhere to Full Brevity
–One example: 'The leather …' *
–Some properties are so striking that they are always tried first
–Once a property has been recognized as useful (because it removes some distractors), it is kept and not retracted later
–Putting it simply: people talk before they are finished thinking.
Their algorithm makes use of these insights
They don't say "Grice was wrong", but "Let's understand Grice differently"
The idea is to approximate brevity, without always achieving it
Approximation is always possible, but in this case the facts about natural language seem to say that an approximation is the real thing!
Sketch of the Incremental Algorithm
Let Prop be a list of properties, going from most striking to least striking
You go through the list, asking of each property: "Does it remove any distractors?"
If a property P removes distractors then include it in the description set
When adding a property to the description set, keep track of the set of referents you're describing. (This set gets smaller)
If, at any stage, the set of described referents = {r} then success.
If the end of Prop is reached then fail.
r = target referent
L = description set
C = set of described referents
Prop = ordered list of properties
D = Domain

L := ∅; C := D;
For each P ∈ Prop do
    If r ∈ P and not(C ⊆ P)      {P is useful!}
    Then L := L ∪ {P};            {Add P to the description}
         C := C ∩ P;              {Reduce set of described referents}
         If C = {r} then return L;
return failure
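The pseudocode translates almost line by line into Python. This is a sketch under stated assumptions: properties are stored as sets of objects, and the preference order `PREFERENCE` (colours first, then materials, then object types) is invented for illustration, since the slides do not fix one.

```python
def incremental_algorithm(target, preferred, properties, domain):
    """Add each useful property in preference order; never backtrack."""
    description = []         # L in the pseudocode
    described = set(domain)  # C in the pseudocode
    for name in preferred:   # Prop, most striking first
        ext = properties[name]
        # Useful: true of the target and removes at least one distractor.
        if target in ext and not described <= ext:
            description.append(name)
            described &= ext
            if described == {target}:
                return description
    return None              # failure: end of Prop reached


# Furniture domain from the game (representation is an assumption).
PROPERTIES = {
    "desk": {"b", "c"}, "chair": {"a", "e"}, "sofa": {"f"},
    "leather": {"b"}, "wood": {"a", "f"},
    "blue": {"c", "d"}, "red": {"a", "e"},
}
DOMAIN = {"a", "b", "c", "d", "e", "f"}
# Hypothetical preference order, not taken from the slides.
PREFERENCE = ["red", "blue", "wood", "leather", "chair", "desk", "sofa"]

print(incremental_algorithm("f", PREFERENCE, PROPERTIES, DOMAIN))
print(incremental_algorithm("a", PREFERENCE, PROPERTIES, DOMAIN))
```

With this order the description for f is "the wooden sofa", although "the sofa" alone would do: "wood" was useful when it was tried and is never removed afterwards, which is why the result is not always minimal.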
Properties of the algorithm
Hill-climbing algorithms are well known in AI.
No more properties are included than needed, but there is no backtracking, so descriptions are not always minimal
Can you analyse the algorithm in terms of time-complexity?
The key operation is the usefulness check
Worst-case time-complexity is good!
Worst case, every property in Prop has to be checked (that's m times)
For every property, you have to check its behaviour with respect to every element of C (i.e., r and all remaining distractors)
Remaining distractors get fewer and fewer. Worst case, you remove one distractor at a time, so:
Complexity of Incremental Algorithm
$m \cdot d + m \cdot (d-1) + m \cdot (d-2) + \dots + m \cdot 1 = m \cdot \dfrac{d(d+1)}{2}$
The number of checks is therefore roughly $\tfrac{1}{2} m d^2$, which is clearly polynomial ($O(m d^2)$)
Dale & Reiter arrive at a slightly different figure. Reason: they want to steer away from worst-case complexity. Instead, they calculate expected complexity.
–The basic conclusion is the same: the algorithm is polynomial
Alternative analyses
This analysis assumes that the time needed for checking whether $x \in P$ is constant.
If one can check in constant time whether (not) $C \subseteq P$ then an even simpler analysis is possible: O(m).
Which analysis is best is often a difficult question (but note that one is a refinement of the other, in this case)
End of NLG example The Incremental Algorithm has proven to be quite seminal: harder problems have been attacked along similar lines Assessments of the time-complexity of algorithms are not only provided, but they have played a key role for researchers in preferring one algorithm over the other Do have a read!
Issues An algorithm is made for a purpose. Whether an approximation is acceptable depends on this purpose. There is not necessarily always one correct way of measuring complexity. –Worst-case, best-case, average, expected complexity –Which factors deserve to be modelled as variables? For example, …
Issues
Suppose human speakers never utter descriptions containing more than 4 properties.
Parameter x in $m^x$ is replaced by the constant 4. The problem becomes polynomial! The same holds for x=100 …
So, is the algorithm polynomial or exponential? It depends on what exactly you try to model.
If you want to cover descriptions of any length then length is a variable
Never take complexity assessments at face value!
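The effect of treating x as a constant or as a variable can be made concrete by counting candidate sets both ways. This is an illustrative sketch; `sets_checked` is a hypothetical helper, and the bound of 4 properties is the supposition made on this slide.

```python
from math import comb


def sets_checked(m, x):
    """Number of property sets of size 1..x drawn from m properties."""
    return sum(comb(m, k) for k in range(1, min(x, m) + 1))


# Fixed bound x=4: growth is polynomial in m (at most a constant times m**4).
# Unbounded length x=m: all non-empty subsets, i.e. 2**m - 1.
for m in (10, 20, 40):
    print(m, sets_checked(m, 4), sets_checked(m, m))
```

Doubling m multiplies the bounded count by roughly 2^4 = 16, whereas the unbounded count is squared.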
Complexity
Back to the cartoons in Garey & Johnson (1979):
1. I can't find an efficient solution. I guess I'm too dumb.
2. I can't find an efficient solution. No efficient solution exists.
3. I can't find an efficient solution, but neither can all these famous people.
Closing observations:
–(2) is beyond the present state of the art in computer science. (3) is only a substitute
–Cartoons are easily adapted to illustrate computability too. For computability, (2) is often possible!