
1 Combining KR and search: Crossword puzzles
Next: Logic representations
Reading: C. 7.4-7.8

2 Changes in Homework
- Mar 4th: Hand in written design and planned code for all modules
- Mar 9th: Midterm
- Mar 25th: Fully running system due
- Mar 30th: Tournament begins

3 Changes in Homework
- Dictionary
  - Use the dictionary provided; do not use your own
  - Start with 300 words only
  - Switch to the larger set by the time of the tournament
- Representation of the dictionary is important for reducing search time
- Using knowledge to generate word candidates could also help

4 Midterm Survey
- Start after 9 AM Friday and finish by Thursday, Mar. 4th
- Your answers are important: they will affect the remaining class structure

5 Crossword Puzzle Solver
- Proverb: Michael Littman, Duke University
- Developed by his AI class
- Combines knowledge from multiple sources to solve clues (clue/target)
- Uses constraint propagation in combination with probabilities to select the best target

6 Algorithm Overview
- Independent programs specialize in different types of clues: knowledge experts
  - Information retrieval, database search, machine learning
- Each expert module generates a candidate list (with probabilities)
- Centralized solver
  - Merges the candidate lists for each clue
  - Places candidates on the puzzle grid
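
A minimal sketch (not Proverb's actual code) of how a central solver might merge the experts' weighted candidate lists for one clue before placing words on the grid; the module names, weights, and probabilities are illustrative assumptions:

    from collections import defaultdict

    def merge_candidates(expert_lists, expert_weights):
        # expert_lists: one {word: probability} dict per expert module
        # expert_weights: per-module confidence used to scale its votes
        merged = defaultdict(float)
        for candidates, weight in zip(expert_lists, expert_weights):
            for word, prob in candidates.items():
                merged[word] += weight * prob
        total = sum(merged.values())
        return {word: p / total for word, p in merged.items()}  # renormalize to 1

    # Illustrative merge for a 3-letter clue
    movie_module = {"mia": 0.91, "tom": 0.01, "kip": 0.01}
    wordlist_module = {"mia": 0.001, "tom": 0.001, "ala": 0.001}
    print(merge_candidates([movie_module, wordlist_module], [0.8, 0.2]))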

7 Performance
- Averages 95.3% words correct and 98.1% letters correct
- Under 15 minutes per puzzle
- Tested on a sample of 370 NYT puzzles
- Misses roughly 3 words or 4 letters on a daily 15x15 puzzle

8 Questions
- Is this approach any more intelligent than the chess-playing programs?
- Does the use of knowledge correspond to intelligence?
- Do any of the techniques for generating words apply to Scrabble?

9 (image-only slide)

10 To begin: research style
- Study of existing puzzles
  - How hard?
  - What are the clues like?
  - What sources of knowledge might be helpful?
- Crossword Puzzle Database (CWDB)
  - 350,000 clue-target pairs
  - >250,000 unique pairs
  - Roughly what a solver would see over 14 years at a rate of one puzzle per day

11 How novel are crossword puzzles?
- Given the complete database and a new puzzle, expect to have seen:
  - 91% of targets
  - 50% of clues
  - 34% of clue-target pairs
  - 96% of individual words in clues

12 (image-only slide)

13 Categories of clues
- Fill in the blank
  - 28D Nothing ____: less
- Trailing question mark
  - 4D The end of Plato?:
- Abbreviations
  - 55D Key abbr.: maj
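
A hedged sketch of how these surface categories could be recognized with simple pattern tests; the category labels and regular expressions are mine, not Proverb's internals:

    import re

    def clue_category(clue):
        # Order matters: test the most specific surface patterns first.
        if "_" in clue:
            return "fill-in-the-blank"
        if clue.rstrip().endswith("?"):
            return "wordplay (trailing question mark)"
        if re.search(r"\babbr\b\.?", clue, re.IGNORECASE):
            return "abbreviation"
        return "other"

    print(clue_category("Nothing ____"))       # fill-in-the-blank
    print(clue_category("The end of Plato?"))  # wordplay (trailing question mark)
    print(clue_category("Key abbr."))          # abbreviation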

14 Expert Categories
- Synonyms
  - 40D Meadowsweet: spiraea
- Kind-of
  - 27D Kind of coal or coat: pea
  - "pea coal" and "pea coat" are standard phrases
- Movies
  - 50D Princess in Woolf's "Orlando": sasha
- Geography
  - 59A North Sea port: aberdeen
- Music
  - 2D "Hold Me" country Grammy winner, 1988: oslin
- Literature
  - 53A Playwright/novelist Capek: karel
- Information retrieval
  - 6D Mountain known locally as Chomolungma: everest

15 (image-only slide)

16 (image-only slide)

17 (image-only slide)

18 Candidate generator
- Farrow of "Peyton Place": mia
- The movie module returns:
  - 0.909091 mia
  - 0.010101 tom
  - 0.010101 kip
  - 0.010101 ben
  - 0.010101 peg
  - 0.010101 ray
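
The 0.909/0.010 split suggests that most of the probability mass goes to the database hit, with the small remainder spread over other names of the right length. A sketch under that assumption (the 0.9 confidence value is a guess, not the module's real parameter):

    def weight_candidates(matches, fillers, confidence=0.9):
        # matches: words the movie database actually returned for the clue
        # fillers: other words of the right length, sharing the leftover mass
        probs = {word: confidence / len(matches) for word in matches}
        leftover = 1.0 - confidence
        for word in fillers:
            probs[word] = leftover / len(fillers)
        return probs

    print(weight_candidates(["mia"], ["tom", "kip", "ben", "peg", "ray"]))
    # {'mia': 0.9, 'tom': 0.02, 'kip': 0.02, ...}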

19 (image-only slide)

20 (image-only slide)

21 Ablation tests
- Removed each module one at a time, rerunning all training puzzles
- No single module changed the overall percent correct by more than 1%
- Removing all modules that relied on the CWDB: 94.8% dropped to 27.1% correct
- Using only the modules that relied exclusively on the CWDB: 87.6% correct

22 Word list modules
- WordList, WordListBig
  - Ignore their clues and return all words of the correct length
  - WordList: 655,000 terms
  - WordListBig: WordList plus constructed terms
    - First and last names, adjacent words from clues
    - 2.1 million terms, all weighted equally
  - 5D 10,000 words, perhaps: novelette
- WordList-CWDB
  - 58,000 unique targets
  - Returns all targets of the appropriate length
  - Weights them with estimates of their "prior" probabilities as targets of arbitrary clues
    - Examines frequency in crossword puzzles and normalizes to account for bias caused by letters intersecting across and down terms
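
A minimal sketch of a WordList-style module: index the dictionary by length, and (for the CWDB variant) weight each word by an estimated prior from how often it appears as a target. The class name, add-one smoothing, and toy counts are illustrative assumptions:

    from collections import Counter, defaultdict

    class WordListModule:
        def __init__(self, words, target_counts=None):
            # Index words by length so an n-letter slot is a single lookup.
            self.by_length = defaultdict(list)
            for word in words:
                self.by_length[len(word)].append(word)
            self.counts = Counter(target_counts or {})

        def candidates(self, clue, length):
            # The plain word-list modules ignore the clue text entirely.
            words = self.by_length.get(length, [])
            if not words:
                return {}
            if self.counts:
                total = sum(self.counts[w] + 1 for w in words)  # add-one smoothing
                return {w: (self.counts[w] + 1) / total for w in words}
            return {w: 1.0 / len(words) for w in words}  # uniform weights

    module = WordListModule(["mia", "tom", "area", "aria"], {"area": 120, "aria": 80})
    print(module.candidates("10,000 words, perhaps", 4))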

23 CWDB-specific modules
- Exact match
  - Returns all targets of the correct length associated with the clue
  - Example error: it returns eeyore for 19A Pal of Pooh: tigger
- Transformations
  - Learns transformations of clue-target pairs
  - Single-word substitution, remove one phrase from the beginning or end and add another, depluralize a word in the clue, pluralize a word in the target
  - Nice X <-> X in France
  - X, for short <-> X abbr.
  - X start <-> Prefix with X
  - X city <-> X capital
  - 51D Bugs chaser: elmer, solved by Bugs pursuer: elmer and the transformation rule X pursuer <-> X chaser
- http://www.oneacross.com
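
A sketch of applying one class of learned transformation, single-word substitution, to map a new clue onto a clue already stored in the CWDB; the rule list and lookup table are made up for illustration:

    def apply_substitutions(clue, rules, cwdb):
        # rules: (old_word, new_word) substitutions learned from paired clues
        # cwdb: dict mapping known clue strings to their targets
        candidates = set()
        for old, new in rules:
            rewritten = clue.replace(old, new)
            if rewritten != clue and rewritten in cwdb:
                candidates.add(cwdb[rewritten])
        return candidates

    rules = [("chaser", "pursuer"), ("pursuer", "chaser")]
    cwdb = {"Bugs pursuer": "elmer"}
    print(apply_substitutions("Bugs chaser", rules, cwdb))  # {'elmer'}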

24 Information retrieval modules
- Encyclopedia
  - For each query term, compute a distribution of terms "close" to the query
    - A term is counted 10-k times every time it appears at a distance of k < 10 from the query term
    - Extremely common terms (as, and) are ignored
- Partial match
  - For a clue c, find all clues in the CWDB that share words with it
  - For each such clue, give its target a weight
- LSI-Ency, LSI-CWDB
  - Latent semantic indexing (LSI) identifies correlations between words: synonyms
  - Returns the closest words for each word in the clue
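
A sketch of the 10-k counting idea for the encyclopedia module: every word within distance k < 10 of an occurrence of the query term adds 10-k to its score. The tokenization and stopword list are simplified assumptions:

    from collections import defaultdict

    STOPWORDS = {"as", "and", "the", "of", "a"}  # extremely common terms ignored

    def nearby_term_scores(text, query):
        tokens = [t.lower() for t in text.split()]
        scores = defaultdict(float)
        for i, token in enumerate(tokens):
            if token != query:
                continue
            for j, other in enumerate(tokens):
                k = abs(i - j)
                if 0 < k < 10 and other not in STOPWORDS:
                    scores[other] += 10 - k  # closer words count more
        return dict(scores)

    sample = "everest is the highest mountain known locally as chomolungma"
    print(nearby_term_scores(sample, "mountain"))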

25 Database modules
- Movie
  - www.imdb.com
  - Looks for patterns in the clue and formulates a query to the database
    - Quoted titles: 56D "The Thief of Baghdad" role: abu
    - Boolean operations: Cary or Lee: grant
- Music, literary, geography
  - Simple pattern matching of the clue (keywords "city", "author", "band", etc.) to formulate a query
  - 15A "Foundation Trilogy" author: asimov
  - Geography database: Getty Information Institute
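
A sketch of the quoted-title pattern for the movie module: pull the title out of the clue and use it to key into a title-to-cast table; the toy dictionary stands in for the real IMDb query:

    import re

    def movie_candidates(clue, title_to_cast):
        match = re.search(r'"([^"]+)"', clue)  # quoted title in the clue
        if not match:
            return []
        return title_to_cast.get(match.group(1), [])

    toy_db = {"The Thief of Baghdad": ["abu", "ahmad", "jaffar"]}
    print(movie_candidates('56D "The Thief of Baghdad" role', toy_db))  # ['abu', ...]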

26 Synonyms
- WordNet
  - Look for root forms of words in the clue
  - Then find a variety of related words
    - 49D Chop-chop: apace
  - Synonyms of synonyms
  - Forms of related words are converted to the form of the clue word (number, tense)
    - 18A Stymied: thwarted
    - Is this relevant to Scrabble?
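
A hedged sketch of the WordNet lookup using NLTK's interface (assuming nltk and its WordNet corpus are installed; Proverb's own WordNet code is independent of NLTK): get the root form of the clue word, then collect lemmas of its synsets and of those lemmas' synsets for "synonyms of synonyms". The re-inflection step (matching number and tense) is omitted here:

    from nltk.corpus import wordnet as wn

    def wordnet_candidates(word):
        root = wn.morphy(word) or word              # root form of the clue word
        related = set()
        for synset in wn.synsets(root):
            for lemma in synset.lemma_names():
                related.add(lemma.lower())
                for synset2 in wn.synsets(lemma):   # synonyms of synonyms
                    related.update(l.lower() for l in synset2.lemma_names())
        return {w for w in related if "_" not in w}  # drop multi-word lemmas

    print(wordnet_candidates("stymied"))  # related roots such as 'thwart', which
                                          # would then be re-inflected to 'thwarted'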

27 Syntactic Modules
- Fill-in-the-blanks
  - >5% of clues
  - Search the databases (music, geography, literary, and quotes) for the clue pattern
  - 36A Yerby's "A Rose for _ _ _ Maria": ana
    - Pattern: for _ _ _ Maria
    - Allow any 3 characters to fill the blanks
- Kind-of
  - Pattern matching over short phrases
  - 50 clues of this type
    - "type of" (A type of jacket: nehru)
    - "starter for" (Starter for saxon: anglo)
    - "suffix with" (Suffix with switch or sock: eroo)
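
A sketch of turning a fill-in-the-blank fragment into a search pattern: a run of blanks becomes a wildcard of the same length, and the surrounding words anchor the match against a text source. The quote string is only for illustration, and the fragment is assumed to contain no regex metacharacters:

    import re

    def blank_regex(clue_fragment):
        # Collapse a run of blanks ("_ _ _") into that many single-character
        # wildcards, so "for _ _ _ Maria" becomes the pattern "for \w\w\w Maria".
        def repl(match):
            return r"\w" * match.group(0).count("_")
        return re.compile(re.sub(r"_(?:\s_)*", repl, clue_fragment), re.IGNORECASE)

    pattern = blank_regex("for _ _ _ Maria")
    print(pattern.search('A Rose for Ana Maria').group(0))  # 'for Ana Maria'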

28 Implicit Distribution Modules
- Some targets are not included in any database but are still more probable than random strings
  - Schaeffer vs. srhffeeca
- Bigram module
  - Generates all possible letter sequences of the given length by returning a letter-bigram distribution over all possible strings, learned from the CWDB
- Lowest-probability clue-target pair, but still higher probability than a random sequence of letters
  - Honolulu wear: hawaiianmuumuu
- How could this be used for Scrabble?
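
A sketch of the bigram idea: learn letter-bigram frequencies from CWDB targets, then score any string of the right length by the product of its bigram probabilities, so plausible letter sequences beat random ones. The tiny training list stands in for the real CWDB:

    from collections import defaultdict

    def train_bigrams(targets):
        counts = defaultdict(lambda: defaultdict(int))
        for word in targets:
            padded = "^" + word + "$"               # start/end markers
            for a, b in zip(padded, padded[1:]):
                counts[a][b] += 1
        # Convert raw counts to conditional probabilities P(next | current).
        return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
                for a, nxt in counts.items()}

    def bigram_score(word, model, floor=1e-6):
        prob = 1.0
        padded = "^" + word + "$"
        for a, b in zip(padded, padded[1:]):
            prob *= model.get(a, {}).get(b, floor)  # unseen bigrams get a tiny floor
        return prob

    model = train_bigrams(["schaeffer", "hawaiianmuumuu", "aloha", "area"])
    print(bigram_score("schaeffer", model) > bigram_score("srhffeeca", model))  # True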

29 Questions
- Is this approach any more intelligent than the chess-playing programs?
- Does the use of knowledge correspond to intelligence?
- Do any of the techniques for generating words apply to Scrabble?

