Artificial Intelligence
CIS 342
The College of Saint Rose
David Goldschmidt, Ph.D.
March 6, 2009
Crossword Puzzle Construction
Given:
– Dictionary of valid words and phrases
– Empty crossword grid
Problem:
– Fill the crossword grid such that all words, both across and down, are valid
– Assign clues
Crossword Puzzle Construction
Depth-First Search (DFS)
– Fill in words until a solution is found or a dead-end is encountered
– Backtrack from dead-ends
– Questions:
  Where do we start?
  What word do we fill in next?
  What backtracking strategies do we use?
  How do we avoid repetition (boring puzzles)?
Crossword Puzzle Construction
Optimize the DFS:
– Add longer (most constrained) words first
– Associate weights with words in the dictionary based on the frequency of their letters
  Friendly crossword puzzle words include the letters S, R, E, T, D, A, I, L
  Unfriendly crossword puzzle words include the letters J, Q, X, Z, F, V, W
  (e.g. quiz, fix, jazz, quaff, xylophone, wax)
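The two ordering heuristics above can be sketched as a sort key. This is a minimal illustration, not the course's implementation: the letter sets come from the slide, but the +1/-1 scoring in the hypothetical `friendliness` function and the tie-breaking in `order_candidates` are assumptions.

```python
# Letter sets copied from the slide; a real system would derive weights from
# letter frequencies in its own dictionary (the +1/-1 scoring is hypothetical).
FRIENDLY = set("SRETDAIL")
UNFRIENDLY = set("JQXZFVW")


def friendliness(word: str) -> int:
    """Score a word: +1 per friendly letter, -1 per unfriendly letter."""
    letters = word.upper()
    return sum(c in FRIENDLY for c in letters) - sum(c in UNFRIENDLY for c in letters)


def order_candidates(words: list[str]) -> list[str]:
    """Longest (most constrained) words first; ties go to friendlier words,
    then to alphabetical order so the result is deterministic."""
    return sorted(words, key=lambda w: (-len(w), -friendliness(w), w))


if __name__ == "__main__":
    print(order_candidates(["jazz", "tread", "quiz", "rails", "wax", "xylophone"]))
```

This prints `['xylophone', 'rails', 'tread', 'quiz', 'jazz', 'wax']`: length dominates, so "xylophone" still goes first despite its unfriendly letters, while "quiz" (score -1) beats "jazz" (score -2) among the four-letter words.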
Crossword Puzzle Construction
Genetic Algorithm (GA)
– Evolve a solution by crossovers and mutations through many generations
– Initial population of crossword grids:
  Random letters?
  Random letters based on Scrabble® frequencies?
  Random words from the dictionary?
– Fitness of each grid is the number of valid words
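The GA ingredients named above (random initial grids, crossover, mutation, and "fitness = number of valid words") can be sketched as follows. This is an illustrative sketch under stated assumptions: grids are lists of rows, `#` marks a black square, an entry is any run of two or more letters, and the row-based one-point crossover and single-square mutation are hypothetical choices, not the ones from the lecture.

```python
import random
import string

BLACK = "#"  # hypothetical marker for a black square


def entries(line: str) -> list[str]:
    """Split a row or column on black squares and keep runs of 2+ letters."""
    return [run for run in line.split(BLACK) if len(run) >= 2]


def fitness(grid: list[list[str]], dictionary: set[str]) -> int:
    """Fitness = number of across and down entries that are valid words."""
    rows = ["".join(row) for row in grid]
    cols = ["".join(col) for col in zip(*grid)]
    return sum(word in dictionary for line in rows + cols for word in entries(line))


def random_grid(template: list[str], rng: random.Random) -> list[list[str]]:
    """One initial-population option from the slide: uniformly random letters."""
    return [[ch if ch == BLACK else rng.choice(string.ascii_uppercase) for ch in row]
            for row in template]


def crossover(a: list[list[str]], b: list[list[str]], rng: random.Random) -> list[list[str]]:
    """One-point crossover on rows: top rows from parent a, the rest from parent b."""
    cut = rng.randrange(1, len(a))
    return [row[:] for row in a[:cut]] + [row[:] for row in b[cut:]]


def mutate(grid: list[list[str]], rng: random.Random) -> list[list[str]]:
    """Replace the letter in one randomly chosen white square."""
    child = [row[:] for row in grid]
    white = [(r, c) for r, row in enumerate(child) for c, ch in enumerate(row) if ch != BLACK]
    r, c = rng.choice(white)
    child[r][c] = rng.choice(string.ascii_uppercase)
    return child
```

For example, the 3x3 grid CAT / ARE / TEN reads the same across and down, so with a dictionary containing those three words its fitness is 6.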
Solving Crossword Puzzles
Given:
– Crossword grid
– Clues
Problem:
– Fill the grid such that all words correctly answer the given clues
Solving Crossword Puzzles
– Obtain candidate answers for each clue
  Assign a confidence value to each candidate
  Are we guaranteed to have the correct answer?
– Place candidate answers in the grid until a solution is found or a dead-end occurs
  Which backtracking strategies should we use?
Solving Crossword Puzzles
PROVERB (Duke University, 1999)
– Modules provide candidate answers from dictionaries, encyclopedias, movie databases, etc.
– One module draws on a Crossword Puzzle Database of 5,142 previously solved puzzles
  Pivotal in PROVERB’s success
– Another module generates all combinations of letters (ouch!)
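To see why generating all combinations of letters earns an "ouch!", here is a minimal brute-force generator and its count. This is an illustration only, not PROVERB's module; the function names are hypothetical.

```python
import itertools
import string


def all_letter_strings(length: int):
    """Brute-force candidate generator: every string of `length` letters A-Z."""
    for letters in itertools.product(string.ascii_uppercase, repeat=length):
        yield "".join(letters)


def brute_force_count(length: int) -> int:
    """How many strings such a generator yields for one slot: 26 ** length."""
    return len(string.ascii_uppercase) ** length


if __name__ == "__main__":
    for n in (3, 5, 7):
        print(f"{n}-letter slot: {brute_force_count(n):,} candidates")
```

Even a single 5-letter slot yields 26^5 = 11,881,376 candidates, and the count grows by a factor of 26 with every additional letter.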
Solving Crossword Puzzles
Google CruciVerbalist (GCV)
Solving Crossword Puzzles
GCV solved a 13x13 puzzle with 68 clues
– Many clues are fill-in-the-blank or pop-culture clues
– Candidate answers obtained from the Google results page (top 50)
– Solved using 559 Google queries
– Queries yielded 68 correct answers
  44 of these correct answers had the highest confidence value
Clue Preprocessing
Categorize clues based on their text and type:
– Fill-in-the-blank clues
– Synonyms/Antonyms
– “Type of” (or “Kind of”) clues
– Abbreviations
– Clues with “and” or “or”
– Singular or plural
– Number of words in answer
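A few of the categories listed above can be detected from surface cues in the clue text. The sketch below is a hypothetical rule set, not GCV's classifier: the regular expressions, the label names, and the fallback to "synonym/antonym" are all assumptions, and the singular/plural category is not attempted.

```python
import re


def categorize(clue: str) -> list[str]:
    """Assign (hypothetical) category labels to a clue using surface cues."""
    text = clue.strip()
    low = text.lower()
    labels = []
    if "___" in text:
        labels.append("fill-in-the-blank")
    if re.match(r"(type|kind) of\b", low):
        labels.append("type-of")
    if "(abbr.)" in low:
        labels.append("abbreviation")
    if re.search(r"\b(and|or)\b", low):
        labels.append("and/or")
    if not labels:
        # Assumption: anything else is treated as a synonym/antonym clue.
        labels.append("synonym/antonym")
    return labels


def answer_word_count(clue: str) -> int:
    """Read an annotation such as '(2 words)'; assume a one-word answer otherwise."""
    m = re.search(r"\((\d+) words?\)", clue)
    return int(m.group(1)) if m else 1
```

For instance, `categorize("Knight or Danson")` returns `["and/or"]`, and `answer_word_count("Mary ___ little lamb (2 words)")` returns `2`.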
Clue Preprocessing
Translate clues to Google-friendly forms
– “To ___ is human” → “To * is human”, “To * * is human”
– “Mary ___ little lamb” (2 words) → “Mary * * little lamb”
– “___ to Joy” by Beethoven → “* to Joy” by Beethoven, “* * to Joy” by Beethoven
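The blank-to-wildcard rewrite above is a simple string substitution. In this minimal sketch, the function name `blank_queries` and the default of trying one or two wildcards when the word count is unknown (`max_words=2`) are assumptions chosen to match the slide's examples.

```python
from typing import Optional


def blank_queries(clue: str, num_words: Optional[int] = None, max_words: int = 2) -> list[str]:
    """Turn a fill-in-the-blank clue into wildcard queries.

    If the number of words in the answer is known, emit one query with that many
    '*' wildcards; otherwise try 1..max_words wildcards (max_words is an assumption).
    """
    counts = [num_words] if num_words else range(1, max_words + 1)
    return [clue.replace("___", " ".join(["*"] * n)) for n in counts]
```

With no word count, `blank_queries("To ___ is human")` yields both `"To * is human"` and `"To * * is human"`; passing `num_words=2` for the Mary clue yields only `"Mary * * little lamb"`.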
Clue Preprocessing
Translate clues to Google-friendly forms
– Diplomacy → synonyms of Diplomacy
– Not dry → opposite of dry, antonyms of dry
– Joy → synonyms of Joy
Clue Preprocessing
Translate clues to Google-friendly forms
– Type of dancing [or Kind of dancing] → * dancing
– Second sight (abbr.) → Second sight, abbreviations of Second sight
– Superman’s admirer → admirer of Superman
Clue Preprocessing
Translate clues to Google-friendly forms
– Couldn’t move → Could not move, Could opposite of move, Could antonyms of move
– Knight or Danson → Knight Danson
Clue Preprocessing
Translate clues to Google-friendly forms
– Bosley and Arnold → Bosley Arnold; append an ‘s’ to each candidate answer
– Henson, and others [or Henson, and namesakes] → Henson; append an ‘s’ to each candidate answer
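The rewrite patterns on the preceding Clue Preprocessing slides (synonyms, negation, "type of", abbreviations, possessives, and "and"/"or") can be collected into one rule-based translator. This is a sketch under assumptions: the function `translate`, its rule order, and its regular expressions are hypothetical, and the slides' "Append an ‘s’" step is modeled as a boolean plural flag (our reading: the answer is plural, e.g. a first name shared by both people).

```python
import re


def translate(clue: str) -> tuple[list[str], bool]:
    """Return (queries, plural_answer) for a clue, mimicking the slides' rewrites."""
    text = clue.strip()
    # "Type of dancing" / "Kind of dancing" -> "* dancing"
    m = re.fullmatch(r"(?:type|kind) of (.+)", text, re.IGNORECASE)
    if m:
        return [f"* {m.group(1)}"], False
    # "Second sight (abbr.)" -> "Second sight", "abbreviations of Second sight"
    m = re.fullmatch(r"(.+?)\s*\(abbr\.\)", text, re.IGNORECASE)
    if m:
        return [m.group(1), f"abbreviations of {m.group(1)}"], False
    # "Henson, and others" / "Henson, and namesakes" -> "Henson", plural answer
    m = re.fullmatch(r"(.+?),\s*and (?:others|namesakes)", text, re.IGNORECASE)
    if m:
        return [m.group(1)], True
    # "Bosley and Arnold" -> "Bosley Arnold" (plural); "Knight or Danson" -> "Knight Danson"
    m = re.fullmatch(r"(.+?)\s+(and|or)\s+(.+)", text, re.IGNORECASE)
    if m:
        return [f"{m.group(1)} {m.group(3)}"], m.group(2).lower() == "and"
    # "Superman’s admirer" -> "admirer of Superman" (curly or straight apostrophe)
    m = re.fullmatch(r"(\S+?)[’']s (.+)", text)
    if m:
        return [f"{m.group(2)} of {m.group(1)}"], False
    # Expand "n't": "Couldn't move" -> "Could not move"
    expanded = re.sub(r"n[’']t\b", " not", text)
    # "not X" -> antonym queries, keeping any prefix such as "Could "
    m = re.fullmatch(r"(.*?)\bnot\s+(.+)", expanded, re.IGNORECASE)
    if m:
        prefix, rest = m.group(1), m.group(2)
        queries = [expanded] if expanded != text else []
        queries += [f"{prefix}opposite of {rest}", f"{prefix}antonyms of {rest}"]
        return queries, False
    # Single word: "Diplomacy" -> "synonyms of Diplomacy"
    if " " not in text:
        return [f"synonyms of {text}"], False
    return [text], False  # assumption: otherwise query the clue unchanged
```

Rule order matters here: the "and others" rule must run before the generic "and"/"or" rule, and contractions are expanded before the negation rule looks for "not".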
Results of Google-Querying
– GCV excels at solving fill-in-the-blank and pop-culture clues
  Why?
– Though the results are encouraging, keyword-based search is limited
  Why?
Populating the Crossword Grid
Use a Depth-First Search (DFS) algorithm:
– Fill in the crossword grid based on the confidence values of candidate words
– At each iteration:
  Select the candidate word with the highest confidence value among clues not yet placed
  Attempt to fit the candidate word into the grid
– Halt when a solution is found or a dead-end occurs
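The forward step of this search can be sketched as below. The data layout is an assumption: each clue id maps to a list of grid cells, and each clue has `(word, confidence)` candidates. The names `fits` and `fill_by_confidence` and the clue ids are hypothetical. As on the slide, the sketch halts at the first dead-end rather than recovering from it.

```python
Cell = tuple[int, int]


def fits(word: str, cells: list[Cell], grid: dict[Cell, str]) -> bool:
    """A word fits if it has the slot's length and agrees with letters already placed."""
    return len(word) == len(cells) and all(
        grid.get(cell) in (None, ch) for cell, ch in zip(cells, word))


def fill_by_confidence(slots: dict[str, list[Cell]],
                       candidates: dict[str, list[tuple[str, int]]]):
    """Place words in descending confidence order until done or stuck.

    slots:      clue id -> list of (row, col) cells
    candidates: clue id -> list of (word, confidence)
    Returns (placed, grid, dead_end).
    """
    grid: dict[Cell, str] = {}
    placed: dict[str, str] = {}
    while len(placed) < len(slots):
        # All candidates of clues not yet placed, highest confidence first.
        options = sorted(
            ((conf, clue, word)
             for clue, cands in candidates.items() if clue not in placed
             for word, conf in cands),
            reverse=True)
        for conf, clue, word in options:
            if fits(word, slots[clue], grid):
                placed[clue] = word
                grid.update(zip(slots[clue], word))
                break
        else:
            return placed, grid, True  # dead-end: no remaining candidate fits
    return placed, grid, False
```

As an example, place TWAIN at 5-Across, and suppose a hypothetical down slot "D1" starts at TWAIN's N. With down candidates SOW (7) and NOW (6), SOW is tried first but clashes at the shared square, so NOW is placed. If SOW were the only down candidate, the function would report a dead-end.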
Populating the Crossword Grid
When a dead-end occurs, what do we do?
– Backtrack: remove the last word placed in the grid
  Disadvantages?
– Backjump: identify the culprit and remove all words back to the culprit word
  Disadvantages?
Populating the Crossword Grid
When a dead-end occurs, what do we do?
– Extricating Backjump: identify and remove only the culprit
  Disadvantages?
– How do we identify the culprit?
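The three recovery strategies differ only in which placed words they undo. The sketch below treats the placement history as a stack of words in placement order (assumed to be distinct, e.g. one entry per clue). The function names are hypothetical, and the culprit is passed in rather than identified.

```python
def backtrack(stack: list[str]) -> list[str]:
    """Backtrack: undo only the most recently placed word."""
    return stack[:-1]


def backjump(stack: list[str], culprit: str) -> list[str]:
    """Backjump: undo the culprit and every word placed after it."""
    return stack[:stack.index(culprit)]


def extricating_backjump(stack: list[str], culprit: str) -> list[str]:
    """Extricating backjump: undo only the culprit; later placements stay."""
    return [word for word in stack if word != culprit]
```

Take the history TWAIN, NOW, ERA, OWL, with NOW as the culprit. Backtracking leaves TWAIN, NOW, ERA. Backjumping leaves only TWAIN. Extricating backjumping leaves TWAIN, ERA, OWL.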
Extricating Backjumping
Assign weights to the squares of the grid
– Square weights correspond to the confidence values of the candidate words placed
– e.g. place TWAIN with a confidence value of 10 at 5-Across
Extricating Backjumping
Weights of interlocking words are multiplied: a square shared by two words gets the product of their confidence values
Extricating Backjumping
Define the grid weight of a word as the sum of its individual square weights
– e.g. TWAIN = 100, NOW = 72
Extricating Backjumping
When a dead-end occurs, the culprit is the word with the lowest grid weight
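The slides give TWAIN = 100 and NOW = 72, but they give neither NOW's confidence nor where it crosses TWAIN. One consistent reading, assumed here, is that NOW has confidence 6 and runs down from TWAIN's final N. That shared square then weighs 10 × 6 = 60, giving TWAIN = 10 + 10 + 10 + 10 + 60 = 100 and NOW = 60 + 6 + 6 = 72. The sketch starts every square at weight 1 (an assumption), multiplies in the confidence of each word covering it, and picks the culprit; the function names are hypothetical.

```python
from collections import defaultdict

Cell = tuple[int, int]


def square_weights(placements: list[tuple[str, list[Cell], int]]) -> dict[Cell, int]:
    """Weight of each square = product of the confidences of all words covering it.

    placements: list of (word, cells, confidence)
    """
    weights: defaultdict[Cell, int] = defaultdict(lambda: 1)
    for _word, cells, confidence in placements:
        for cell in cells:
            weights[cell] *= confidence
    return dict(weights)


def grid_weight(cells: list[Cell], weights: dict[Cell, int]) -> int:
    """Grid weight of a word = sum of the weights of its squares."""
    return sum(weights[cell] for cell in cells)


def find_culprit(placements: list[tuple[str, list[Cell], int]]) -> str:
    """At a dead-end, the culprit is the placed word with the lowest grid weight."""
    weights = square_weights(placements)
    return min(placements, key=lambda p: grid_weight(p[1], weights))[0]


if __name__ == "__main__":
    twain = ("TWAIN", [(0, c) for c in range(5)], 10)  # 5-Across, confidence 10
    now = ("NOW", [(0, 4), (1, 4), (2, 4)], 6)         # assumed: down from TWAIN's N
    w = square_weights([twain, now])
    print(grid_weight(twain[1], w), grid_weight(now[1], w), find_culprit([twain, now]))
```

Under these assumptions the script prints `100 72 NOW`, so NOW would be the word removed at a dead-end.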
A Sampling of Crossword Puzzles
– New York Times
– TV Guide #42
– TV Guide #63
– Mensa Kids Puzzle #3
Results of Grid Solving
Limitations of Keyword-Based Search
Google and GCV use keyword-based tricks to artificially improve result sets
– Word frequency and proximity to other words
– Additional keywords to help direct queries to good candidate answers (e.g. “synonyms of”)
– Grammatical and structural rearrangements
Limitations of Keyword-Based Search
Lack of precision in keyword-based search
– Irrelevant results in candidate answer lists
– Confidence values based on word frequency produce many false positives
– The correct answer is often buried among other mediocre (and incorrect!) candidates
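To show how frequency-based confidence produces false positives, this sketch ranks every word of the required length by how often it appears in result text. The snippets are invented toy data, not real Google results, and `frequency_candidates` is a hypothetical name rather than GCV's code.

```python
import re
from collections import Counter


def frequency_candidates(snippets: list[str], length: int) -> list[tuple[str, int]]:
    """Rank words of the required length by how often they occur in result text."""
    counts: Counter[str] = Counter()
    for text in snippets:
        for token in re.findall(r"[A-Za-z]+", text):
            word = token.upper()
            if len(word) == length:
                counts[word] += 1
    return counts.most_common()
```

Suppose the clue "To ___ is human" returns the toy snippets "To err is human, to forgive divine", "Alexander Pope wrote that to err is human" and "The human way: err and the rest". ERR ranks first with 3, but THE is close behind with 2, and WAY and AND also enter the candidate list. The correct answer is surrounded by frequent but meaningless words.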
In Conclusion...
Other uses of the Web as an automated information source?
– Keyword-based search is insufficient
– It lacks the means for machine-interpretable information
– Semantic Web