1 Combining KR and search: Crossword puzzles
Next: Logic representations
Reading: C. 7.4-7.8.

2 Changes in Homework
• Mar 4th: Hand in written design, planned code for all modules
• Mar 9th: Midterm
• Mar 25th: Fully running system due
• Mar 30th: Tournament begins

3 Changes in Homework
• Dictionary
  – Use the dictionary provided; do not use your own
  – Start with 300 words only
  – Switch to the larger set by the time of the tournament
• The representation of the dictionary is important to reducing search time (see the sketch below)
• Using knowledge to generate word candidates could also help
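Why representation matters for search time: a naive scan tests every dictionary word against a partially filled slot, while an index keyed on (length, position, letter) answers pattern queries by set intersection. A minimal Python sketch of one such index (my own illustration, not the assignment's required design):

```python
# Index a word list so pattern queries like "c__t" avoid a full scan.
from collections import defaultdict

class PatternIndex:
    def __init__(self, words):
        # Bucket words by length, and by (length, position, letter).
        self.by_length = defaultdict(set)
        self.by_slot = defaultdict(set)
        for w in words:
            self.by_length[len(w)].add(w)
            for i, ch in enumerate(w):
                self.by_slot[(len(w), i, ch)].add(w)

    def candidates(self, pattern):
        # pattern uses '_' for unknown letters, e.g. "c__t"
        fixed = [(i, ch) for i, ch in enumerate(pattern) if ch != '_']
        if not fixed:
            return set(self.by_length[len(pattern)])
        # Intersect the bucket for each known letter.
        return set.intersection(
            *(self.by_slot[(len(pattern), i, ch)] for i, ch in fixed))

index = PatternIndex(["cart", "cost", "coat", "dart", "peat"])
print(index.candidates("c__t"))   # {'cart', 'cost', 'coat'}
```

For the 300-word starter dictionary either approach is instant; an index like this pays off at tournament scale.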

4 Midterm Survey
• Start after 9 AM Friday and finish by Thursday, Mar. 4th
• Your answers are important: they will affect the remaining class structure

5 Crossword Puzzle Solver
• Proverb: Michael Littman, Duke University
• Developed by his AI class
• Combines knowledge from multiple sources to solve clues (clue/target)
• Uses constraint propagation in combination with probabilities to select the best target

6 Algorithm Overview
• Independent programs specialize in different types of clues: knowledge experts
  – Information retrieval, database search, machine learning
• Each expert module generates a candidate list (with probabilities)
• Centralized solver
  – Merges the candidate lists for each clue (a merging sketch follows below)
  – Places candidates on the puzzle grid
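As a rough illustration of the merging step, each expert can be viewed as returning a probability-weighted candidate list for a clue, and the solver combines them before placing words on the grid. The linear mixture and the equal weights below are assumptions for the sketch, not Proverb's published merging scheme:

```python
# Combine per-clue candidate lists from several expert modules into one
# normalized distribution (illustrative weighted mixture).
def merge_candidates(module_lists, module_weights):
    """module_lists: list of dicts mapping candidate word -> probability."""
    merged = {}
    for weight, candidates in zip(module_weights, module_lists):
        for word, p in candidates.items():
            merged[word] = merged.get(word, 0.0) + weight * p
    total = sum(merged.values())
    return {w: p / total for w, p in merged.items()}

movie = {"mia": 0.6, "tom": 0.2, "kip": 0.2}   # movie module's list
exact = {"mia": 1.0}                           # exact-match module's list
print(merge_candidates([movie, exact], [0.5, 0.5]))
# {'mia': 0.8, 'tom': 0.1, 'kip': 0.1}
```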

7 Performance
• Averages 95.3% words correct and 98.1% letters correct
• Under 15 minutes per puzzle
• Tested on a sample of 370 NYT puzzles
• Misses roughly 3 words or 4 letters on a daily 15×15 puzzle

8 Questions
• Is this approach any more intelligent than the chess-playing programs?
• Does the use of knowledge correspond to intelligence?
• Do any of the techniques for generating words apply to Scrabble?

10 To begin: research style
• Study of existing puzzles
  – How hard?
  – What are the clues like?
  – What sources of knowledge might be helpful?
• Crossword Puzzle Database (CWDB)
  – 350,000 clue-target pairs
  – >250,000 unique pairs
  – Roughly the number of clues seen over 14 years of solving one puzzle per day

11 How novel are crossword puzzles?
Given the complete database and a new puzzle, expect to have seen:
• 91% of targets
• 50% of clues
• 34% of clue-target pairs
• 96% of individual words in clues

13 Categories of clues
• Fill in the blank
  – 28D: Nothing ____: less
• Trailing question mark
  – 4D: The end of Plato?:
• Abbreviations
  – 55D: Key abbr.: maj

14 Expert Categories
• Synonyms: 40D Meadowsweet: spiraea
• Kind-of: 27D Kind of coal or coat: pea ("pea coal" and "pea coat" are standard phrases)
• Movies: 50D Princess in Woolf's "Orlando": sasha
• Geography: 59A North Sea port: aberdeen
• Music: 2D "Hold Me" country Grammy winner, 1988: oslin
• Literature: 53A Playwright/novelist Capek: karel
• Information retrieval: 6D Mountain known locally as Chomolungma: everest

18 Candidate generator
• Clue: Farrow of "Peyton Place": mia
• The movie module returns: mia, tom, kip, ben, peg, ray

21 Ablation tests
• Removed each module one at a time, rerunning all training puzzles (the protocol is sketched below)
• No single module changed the overall percent correct by more than 1%
• Removing all modules that relied on the CWDB: accuracy drops from 94.8% to 27.1% correct
• Using only the modules that relied exclusively on the CWDB: 87.6% correct
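A schematic version of the ablation protocol, with a hypothetical solve_and_score interface standing in for the real solver:

```python
# Drop one module at a time, re-run the training puzzles, and record the
# change in accuracy relative to the full system.
def ablation_study(modules, puzzles, solve_and_score):
    """modules: list of (name, module) pairs.
    solve_and_score(modules, puzzle) -> fraction of words correct."""
    def average(mods):
        return sum(solve_and_score(mods, p) for p in puzzles) / len(puzzles)

    baseline = average(modules)
    deltas = {}
    for name, _ in modules:
        reduced = [(n, m) for n, m in modules if n != name]
        deltas[name] = average(reduced) - baseline   # e.g. -0.01 for a 1% drop
    return baseline, deltas
```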

22 Word list modules
• WordList, WordListBig
  – Ignore their clues and return all words of the correct length
  – WordList: 655,000 terms
  – WordListBig: WordList plus constructed terms (first and last names, adjacent words from clues); 2.1 million terms, all weighted equally
• Example: 5D 10,000 words, perhaps: novelette
• WordList-CWDB
  – 58,000 unique targets
  – Returns all targets of the appropriate length
  – Weights them with estimates of their "prior" probabilities as targets of arbitrary clues: examine frequency in crossword puzzles and normalize to account for the bias caused by letters intersecting across and down terms (a weighting sketch follows below)
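A small sketch of the WordList-CWDB idea: ignore the clue, return every known target of the right length, and weight each by its frequency as a target. The normalization for across/down intersection bias is omitted here, and the toy word list is illustrative:

```python
# Clue-blind word-list module weighted by target frequency.
from collections import Counter

def make_wordlist_module(target_occurrences):
    counts = Counter(target_occurrences)          # target -> frequency
    def candidates(clue, length):
        matches = {w: c for w, c in counts.items() if len(w) == length}
        total = sum(matches.values())
        return {w: c / total for w, c in matches.items()}
    return candidates

module = make_wordlist_module(["area", "area", "erie", "oreo", "spiraea"])
print(module("any clue at all", 4))
# {'area': 0.5, 'erie': 0.25, 'oreo': 0.25}
```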

23 CWDB-specific modules
• Exact Match
  – Returns all targets of the correct length associated with the clue
  – Example error: it returns eeyore for 19A Pal of Pooh: tigger
• Transformations (a rule-application sketch follows below)
  – Learns transformations of clue-target pairs: single-word substitution, removing one phrase from the beginning or end and adding another, depluralizing a word in the clue, pluralizing a word in the target
  – Example rules:
    Nice X ↔ X in France
    X for short ↔ X abbr.
    X start ↔ Prefix with X
    X city ↔ X capital
  – 51D: Bugs chaser: elmer is solved by the stored pair Bugs pursuer: elmer and the transformation rule X pursuer ↔ X chaser
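The transformation module can be pictured as a set of rewrite rules that map a new clue onto clue forms already stored in the CWDB. The rules come from the slide, but the regex encoding is my own sketch, and it applies each rule in one direction only:

```python
# Rewrite a new clue into stored-clue variants using learned rules.
import re

RULES = [
    (r"^(.*) chaser$", r"\1 pursuer"),   # X pursuer <-> X chaser
    (r"^Nice (.*)$", r"\1 in France"),   # Nice X <-> X in France
    (r"^(.*) for short$", r"\1 abbr."),  # X for short <-> X abbr.
]

def rewrite(clue):
    """Yield variants of a new clue under each matching rule."""
    for pattern, replacement in RULES:
        if re.match(pattern, clue):
            yield re.sub(pattern, replacement, clue)

print(list(rewrite("Bugs chaser")))   # ['Bugs pursuer'] -> lookup hits elmer
```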

24 Information retrieval modules
• Encyclopedia (sketched below)
  – For each query term, compute a distribution of terms "close" to the query
  – A term is counted (10 - k) times for every time it appears at a distance of k < 10 from the query term
  – Extremely common terms ("as", "and") are ignored
• Partial match
  – For a clue c, find all clues in the CWDB that share words with it
  – For each such clue, give its target a weight
• LSI-Ency, LSI-CWDB
  – Latent semantic indexing (LSI) identifies correlations between words: synonyms
  – Return the closest words for each word in the clue
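The (10 - k) proximity weighting is concrete enough to sketch directly: every time a term appears within distance k < 10 of the query term, it earns 10 - k counts, and very common terms are skipped. A minimal Python version, where the tokenization and the stopword list are simplifications:

```python
# Distance-weighted co-occurrence counts around a query term.
from collections import defaultdict

STOPWORDS = {"as", "and", "the", "of", "a"}   # "extremely common terms"

def proximity_scores(tokens, query):
    scores = defaultdict(float)
    positions = [i for i, t in enumerate(tokens) if t == query]
    for qpos in positions:
        for i, term in enumerate(tokens):
            k = abs(i - qpos)
            if 0 < k < 10 and term not in STOPWORDS:
                scores[term] += 10 - k    # counted (10 - k) times
    return dict(scores)

text = "everest is the highest mountain known locally as chomolungma".split()
print(proximity_scores(text, "mountain"))   # 'everest' and 'chomolungma' score
```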

25 Database Modules
• Movie (a toy version follows below)
  – Looks for patterns in the clue and formulates a query to the database
  – Quoted titles: 56D "The Thief of Baghdad" role: abu
  – Boolean operations: Cary or Lee: grant
• Music, literary, geography
  – Simple pattern matching of the clue (keywords "city", "author", "band", etc.) to formulate a query
  – 15A "Foundation Trilogy" author: asimov
  – Geography database: Getty Information Institute
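A toy version of the movie module's pattern matching: extract a quoted title from the clue and look it up, filtering roles by the required length. The movie_db dict is a stand-in for the real movie database:

```python
# Extract a quoted title and turn it into a database lookup.
import re

movie_db = {"the thief of baghdad": ["abu", "ahmad", "jaffar"]}

def movie_candidates(clue, length):
    m = re.search(r'"([^"]+)"', clue)
    if not m:
        return []
    roles = movie_db.get(m.group(1).lower(), [])
    return [r for r in roles if len(r) == length]

print(movie_candidates('"The Thief of Baghdad" role', 3))   # ['abu']
```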

26 Synonyms
• WordNet (see the sketch below)
  – Look for root forms of words in the clue
  – Then find a variety of related words: 49D Chop-chop: apace
  – Synonyms of synonyms
  – Forms of related words are converted to the form of the clue word (number, tense): 18A Stymied: thwarted
  – Is this relevant to Scrabble?
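For flavor, here is how synonyms-of-synonyms can be gathered with NLTK's WordNet interface. Proverb used WordNet, but this API is a modern stand-in, and it needs nltk.download("wordnet") run once:

```python
# Breadth-first expansion over WordNet lemmas: synonyms, then
# synonyms of synonyms (depth=2).
from nltk.corpus import wordnet as wn   # requires nltk.download("wordnet")

def related_words(word, depth=2):
    frontier, seen = {word}, {word}
    for _ in range(depth):
        nxt = set()
        for w in frontier:
            for synset in wn.synsets(w):
                for lemma in synset.lemmas():
                    nxt.add(lemma.name().lower())
        frontier = nxt - seen
        seen |= frontier
    return seen - {word}

# e.g. related_words("chop-chop") should reach words like "apace"
```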

27 Syntactic Modules
• Fill-in-the-blanks (a regex sketch follows below)
  – >5% of clues
  – Search databases (music, geography, literary, and quotes) to find clue patterns
  – 36A Yerby's "A Rose for _ _ _ Maria": ana
    Pattern: for _ _ _ Maria; allow any 3 characters to fill the blanks
• Kind-of
  – Pattern matching over short phrases
  – 50 clues of this type
  – "type of" (A type of jacket: nehru)
  – "starter for" (Starter for saxon: anglo)
  – "suffix with" (Suffix with switch or sock: eroo)
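The fill-in-the-blank trick translates naturally into a regular expression: the run of blanks becomes a capture group of the right width, matched against text from the databases. A sketch of my own encoding, handling a single run of blanks:

```python
# Turn "for _ _ _ Maria" into the regex "for (...) Maria" and search.
import re

def fill_blank(clue_phrase, source_text):
    """clue_phrase like 'for _ _ _ Maria'; blanks become one capture group."""
    n = clue_phrase.count("_")
    pattern = re.sub(r"_(\s*_)*", "(" + "." * n + ")", clue_phrase)
    m = re.search(pattern, source_text)
    return m.group(1) if m else None

quote = "A Rose for Ana Maria"
print(fill_blank("for _ _ _ Maria", quote))   # 'Ana'
```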

28 Implicit Distribution Modules
• Some targets are not included in any database, but are still more probable than random strings
  – schaeffer vs. srhffeeca
• Bigram module (a sketch follows below)
  – Generates all possible letter sequences of the given length by returning a letter-bigram distribution over all possible strings, learned from the CWDB
  – Produces the lowest-probability clue-targets, but still higher probability than a random sequence of letters
  – Honolulu wear: hawaiianmuumuu
• How could this be used for Scrabble?
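A letter-bigram model of the kind described is easy to sketch: learn bigram frequencies from known targets, then score candidate strings so that word-like sequences beat scrambles. The tiny training set and the smoothing floor below are illustrative, and itertools.pairwise requires Python 3.10+:

```python
# Letter-bigram model: train on known targets, score arbitrary strings.
from collections import Counter
from itertools import pairwise   # Python 3.10+

def train_bigrams(targets):
    counts = Counter()
    for w in targets:
        counts.update(pairwise("^" + w + "$"))   # ^/$ mark word boundaries
    total = sum(counts.values())
    return {bg: c / total for bg, c in counts.items()}

def score(word, model, floor=1e-6):
    p = 1.0
    for bg in pairwise("^" + word + "$"):
        p *= model.get(bg, floor)   # unseen bigrams get a small floor
    return p

model = train_bigrams(["schaeffer", "shaffer", "sheaffer", "scheffer"])
print(score("schaeffer", model) > score("srhffeeca", model))   # True
```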

29 Questions
• Is this approach any more intelligent than the chess-playing programs?
• Does the use of knowledge correspond to intelligence?
• Do any of the techniques for generating words apply to Scrabble?