CROSSWORD PUZZLE – TEAM 2
Members: Derek van Assche, Cody Hansen, Jonathan Juett, Seungbum Park, Anthony Vito
Date: 4/22/2014
Agenda
  Tasks
  Resources
  .puz files
  Components

Tasks
  Create components to handle patterns
  Extend current list of clue patterns
  Write regular expressions for clue patterns
  Design and implement a GUI
  Download a larger set of .puz files
Resources
  Vehicle make and model database [1]
    7,352 vehicle entries
    Model years from 1909 to 2013

Resources
  Notable Names Database [2]
    Contains information on noteworthy people

Resources
  List of rock bands and singers [3]
    674 entries

Resources
  Dictionary [4]
    Contains words found at dictionary.com
    Large list of words and word-like tokens

Resources
  BabelNet [5]
    Integration of WordNet, Open Multilingual WordNet, Wikipedia, and OmegaWiki
Resources
  WordNet [6]
    Large lexical database
    Nouns, verbs, adjectives, and adverbs grouped in synsets
  Google Ngram [7]
    Corpus collected from online text by Google
    Information about n-grams of various lengths and their frequencies
  Natural Language Toolkit (NLTK) [8]
    Provides an interface to WordNet
    Text processing for classification, tokenization, stemming, tagging, parsing, and semantic reasoning
.puz files
  Sources:
    Number of .puz: 192
    Number of .puz: 18
    index.html
    Number of .puz: 86
    Number of .puz: 96
  Puzzles located at /data0/projects/cross/more_puz
Component
  Input:
    1A  Capital of Canada  6
    1D  Jaguar, e.g.       6
  Output: [ ]
    1A  OTTAWA  5
    1A  QUEBEC  2
    1D  FELINE  3
    1D  CANINE  1
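The input and output rows on this slide suggest a simple contract: each clue carries a grid position, the clue text, and the answer length, and each candidate carries a grid position, an answer, and a score. The sketch below is one possible reading of that contract, not the team's actual code. The names Clue, Candidate, parse_clue, run_component, and the TOY_ANSWERS table are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List, Tuple


@dataclass(frozen=True)
class Clue:
    clue_id: str  # grid position such as "1A" or "1D"
    text: str     # clue text as printed
    length: int   # number of letters in the answer


@dataclass(frozen=True)
class Candidate:
    clue_id: str
    answer: str
    score: int


def parse_clue(line: str) -> Clue:
    """Split '1A Capital of Canada 6' into id, text, and answer length."""
    clue_id, rest = line.split(" ", 1)
    text, length = rest.rsplit(" ", 1)
    return Clue(clue_id, text, int(length))


# Hypothetical lookup table standing in for a real resource (assumption).
TOY_ANSWERS = {
    "Capital of Canada": [("OTTAWA", 5), ("QUEBEC", 2), ("TORONTO", 1)],
    "Jaguar, e.g.": [("FELINE", 3), ("CANINE", 1)],
}


def toy_lookup(clue: Clue) -> Iterable[Tuple[str, int]]:
    return TOY_ANSWERS.get(clue.text, [])


def run_component(
    clues: Iterable[Clue],
    lookup: Callable[[Clue], Iterable[Tuple[str, int]]],
) -> List[Candidate]:
    """Collect scored candidates whose length fits the grid slot."""
    results = []
    for clue in clues:
        for answer, score in lookup(clue):
            if len(answer) == clue.length:
                results.append(Candidate(clue.clue_id, answer, score))
    return results
```

Filtering on answer length inside run_component keeps each lookup free to over-generate; TORONTO is dropped here only because it does not fit a six-letter slot.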
Component
Antonyms pattern
  Example:
    ZENITH  NYT  Nadir's opposite
    ABHOR   unk  Antonym for "adore"
  Regular expressions:
    ^([A-Za-z]+)(([\'][s]){0,1}|([s][\']){0,1})\s(opposite|antonym)
    ^([Oo]pposite|[Aa]ntonym)\s(of|for)\s(\'|\"){0,1}([\w]+)(\'|\"){0,1}$
  Resources used:
    NLTK for access to WordNet
  Evaluation:
    MRAR score of 1.0
    One correct answer out of one attempt
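As a rough illustration of how the two antonym expressions could be applied, the sketch below extracts the word whose opposite is wanted and looks it up. The regular expressions are the ones on the slide. The TOY_ANTONYMS table is a made-up stand-in for the WordNet lookup, and the function names are hypothetical. With NLTK, the lookup would presumably go through the antonyms() method on WordNet lemmas, which is left out here so the example runs without the corpus.

```python
import re

# The two expressions from the slide, unchanged.
POSSESSIVE_RE = re.compile(
    r"""^([A-Za-z]+)(([\'][s]){0,1}|([s][\']){0,1})\s(opposite|antonym)"""
)
EXPLICIT_RE = re.compile(
    r"""^([Oo]pposite|[Aa]ntonym)\s(of|for)\s(\'|\"){0,1}([\w]+)(\'|\"){0,1}$"""
)

# Made-up antonym table standing in for WordNet (assumption).
TOY_ANTONYMS = {"nadir": "zenith", "adore": "abhor"}


def extract_target(clue):
    """Return the word whose antonym the clue asks for, or None."""
    m = POSSESSIVE_RE.search(clue)
    if m:
        return m.group(1)  # "Nadir" in "Nadir's opposite"
    m = EXPLICIT_RE.search(clue)
    if m:
        return m.group(4)  # "adore" in 'Antonym for "adore"'
    return None


def solve_antonym(clue):
    """Return an upper-case answer, or None if nothing is found."""
    target = extract_target(clue)
    if target is None:
        return None
    answer = TOY_ANTONYMS.get(target.lower())
    return answer.upper() if answer else None
```

Both slide examples resolve with this table: "Nadir's opposite" yields ZENITH and 'Antonym for "adore"' yields ABHOR.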
Component
E.g. clue pattern
  Example:
    HORSE  CSy  Chestnut, e.g.
  Regular expression:
    ,[\s][Ee]\.g\.$
  Resources used:
    NLTK for access to WordNet hypernyms
  Evaluation:
    MRAR score of 0.5
    Two correct answers out of four attempts
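A minimal sketch of the e.g. pattern, assuming the component strips the ", e.g." suffix and then looks for hypernyms of what remains. The regular expression is the one on the slide. TOY_HYPERNYMS is an invented stand-in for WordNet, and the function names are hypothetical. In NLTK the equivalent lookup would presumably use the hypernyms() method on synsets.

```python
import re

# Expression from the slide: matches a trailing ", e.g." or ", E.g.".
EG_RE = re.compile(r",[\s][Ee]\.g\.$")

# Invented hypernym table standing in for WordNet (assumption).
TOY_HYPERNYMS = {
    "chestnut": ["tree", "nut", "horse", "color"],
    "jaguar": ["feline", "cat", "car"],
}


def strip_eg(clue):
    """Return the example term before ', e.g.', or None if no match."""
    m = EG_RE.search(clue)
    if m is None:
        return None
    return clue[: m.start()]


def solve_eg(clue, length):
    """Return hypernyms of the example term that fit the answer length."""
    example = strip_eg(clue)
    if example is None:
        return []
    hypernyms = TOY_HYPERNYMS.get(example.lower(), [])
    return [h.upper() for h in hypernyms if len(h) == length]
```

With this toy table, "Chestnut, e.g." at length 5 returns both HORSE and COLOR, which shows why some form of scoring is needed to rank candidates that fit equally well.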
Component
Say pattern
  Example:
    MISS  NYT  Overshoot, say
  Regular expression:
    , [Ss]ay$
  Resources used:
    NLTK for access to WordNet hypernyms
  Evaluation:
    MRAR score of 0
    Zero correct answers out of five attempts

Component
In Brief pattern
  Example:
    ETS  NYT  Some "Stargate SG-1" characters, in brief
  Regular expression:
    , [Ii]n brief
  Resources used:
    NLTK for access to WordNet synonyms
  Evaluation:
    Matched zero clues out of thirty

Component
Kind of pattern
  Example:
    SEAT  unk  Kind of belt
  Regular expression:
    [Kk]ind of
  Resources used:
    NLTK for access to WordNet synonyms
  Evaluation:
    Matched zero clues out of thirty
Component
Antonym, E.g., Say, In Brief, Kind of patterns
  Ways to improve:
    Incorporate a scoring system
    Increase performance
      Accessing WordNet can be slow
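One common way to soften slow lookups is to memoize them, since the same words recur across puzzles. The sketch below is a suggestion under that assumption rather than something the slides describe. slow_lookup is a hypothetical placeholder for whatever WordNet call a component makes, and the counter exists only to make the caching visible.

```python
from functools import lru_cache

# Made-up data standing in for a WordNet query result (assumption).
TOY_SYNONYMS = {"desert": ("abandon", "forsake")}

# Counts how often the slow path actually runs.
lookup_calls = {"count": 0}


def slow_lookup(word):
    """Placeholder for an expensive lookup such as a WordNet query."""
    lookup_calls["count"] += 1
    return TOY_SYNONYMS.get(word, ())


@lru_cache(maxsize=4096)
def cached_lookup(word):
    """Same result as slow_lookup, but repeated words hit the cache."""
    return slow_lookup(word)
```

Results are returned as tuples because lru_cache hands back the same object on every hit, and a cached list could be mutated by one caller and then seen changed by the next.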
Component
Rock Band pattern
  Example:
    SID  CSy  Rocker Vicious
  Regular expression:
    \"[\w\s]+\"\s(rock\s)?band|[Rr]ocker[\s]+.*[A-Z]|\".+\"[\s]+.*[Rr]ocker|\'.+\'[\s]+.*[Rr]ocker
  Resources used:
    Rock band database
  Evaluation:
    MRAR score of
    Two correct answers out of two attempts
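The following sketch shows one way the rock band expression could gate a database lookup: once a clue matches, any name part mentioned in the clue selects an artist, and the remaining name parts become candidates. The regular expression is the one on the slide with the stray space in [A-Z] removed. TOY_ARTISTS is a tiny invented sample, not the 674-entry list, and the matching strategy is an assumption.

```python
import re

# Expression from the slide (four alternatives joined by "|").
ROCK_RE = re.compile(
    r"""\"[\w\s]+\"\s(rock\s)?band|[Rr]ocker[\s]+.*[A-Z]|\".+\"[\s]+.*[Rr]ocker|\'.+\'[\s]+.*[Rr]ocker"""
)

# Tiny invented sample standing in for the rock band database (assumption).
TOY_ARTISTS = ["Sid Vicious", "Ozzy Osbourne", "Styx"]


def solve_rock(clue, length):
    """Return name parts of matching artists that fit the answer length."""
    if not ROCK_RE.search(clue):
        return []
    clue_words = {w.lower() for w in re.findall(r"[A-Za-z]+", clue)}
    clue_words.discard("rocker")
    answers = []
    for name in TOY_ARTISTS:
        parts = name.split()
        # An artist is relevant if the clue mentions part of the name.
        if any(p.lower() in clue_words for p in parts):
            for p in parts:
                if p.lower() not in clue_words and len(p) == length:
                    answers.append(p.upper())
    return answers
```

For "Rocker Vicious" with a three-letter slot this returns SID, matching the slide's example.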
Component
Rock Band pattern
  Ways to improve:
    Create a more complete database
      Include well-known songs
    Expand list of current patterns
      Include songs: "Come Sail Away" rockers => Styx
Component
Vehicle pattern
  Example:
    ACCORD  unk  Honda model
  Regular expression:
    [Mm]odels?|[Vv]ehicles?
  Resources used:
    Vehicle make and model database
  Evaluation:
    MRAR score of
    Precision of

Component
Vehicle pattern
  Ways to improve:
    Expand list of current patterns
      '70s Pontiac => Pontiac GTO
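A small sketch of how the vehicle expression might drive a make-to-model lookup. The regular expression is the one on the slide. TOY_VEHICLES is an invented sample rather than the 7,352-entry database, and solve_vehicle is a hypothetical name.

```python
import re

# Expression from the slide: the clue must mention "model(s)" or "vehicle(s)".
VEHICLE_RE = re.compile(r"[Mm]odels?|[Vv]ehicles?")

# Invented sample standing in for the make and model database (assumption).
TOY_VEHICLES = {
    "Honda": ["Accord", "Civic", "Fit"],
    "Pontiac": ["GTO", "Firebird"],
}


def solve_vehicle(clue, length):
    """Return models of any make named in the clue that fit the length."""
    if not VEHICLE_RE.search(clue):
        return []
    answers = []
    for make, models in TOY_VEHICLES.items():
        # Whole-word match so that a make is not found inside another word.
        if re.search(r"\b" + re.escape(make) + r"\b", clue):
            answers.extend(m.upper() for m in models if len(m) == length)
    return answers
```

"Honda model" with six letters returns ACCORD. A clue such as "'70s Pontiac" returns nothing because it contains neither trigger word, which is the kind of clue a broader pattern list would need to cover.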
Component
And/Or pattern
  Example:
    LEVIS  WaP  Strauss and Stubbs
    ABE    unk  Lincoln or Burrows
  Regular expression:
    [A-Z][a-z]+[\s](and|or)[\s][A-Z][a-z]+$
  Resources used:
    NNDB (Notable Names Database)
  Evaluation:
    MRAR score of 0.57
    Precision of

Component
And/Or pattern
  Ways to improve:
    Integrate with Wikipedia or BabelNet
      Saturn and Mars => Planets
    Extend list of current patterns
      The third son of Adam and Eve
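The two examples on this slide suggest that the answer is a first name shared by people with both surnames, pluralized for "and" (LEVIS) and left singular for "or" (ABE). The sketch below encodes that reading, which is an inference from the examples rather than a documented rule. The expression is the slide's with capture groups added around the surnames, and TOY_PEOPLE is an invented stand-in for NNDB.

```python
import re

# Slide expression with capture groups added around both surnames.
AND_OR_RE = re.compile(r"([A-Z][a-z]+)[\s](and|or)[\s]([A-Z][a-z]+)$")

# Invented sample standing in for the Notable Names Database (assumption).
TOY_PEOPLE = [
    "Levi Strauss",
    "Levi Stubbs",
    "Johann Strauss",
    "Abe Lincoln",
    "Abe Burrows",
]


def first_names(surname):
    """Return the first names of all people with the given surname."""
    return {p.split()[0] for p in TOY_PEOPLE if p.split()[-1] == surname}


def solve_and_or(clue):
    """Return first names shared by both surnames in the clue."""
    m = AND_OR_RE.search(clue)
    if m is None:
        return []
    left, conjunction, right = m.groups()
    shared = first_names(left) & first_names(right)
    # "and" asks for both people at once, so the answer is pluralized.
    suffix = "S" if conjunction == "and" else ""
    return sorted(name.upper() + suffix for name in shared)
```

Intersecting the two name sets is what rules out Johann here: he shares a surname with one side of "Strauss and Stubbs" but not the other.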
Component
Single Word pattern
  Example:
    ABANDON  USA  Desert
  Regular expression:
    [A-Z0-9][a-z0-9]+$
  Resources used:
    BabelNet for synonyms, hyponyms, hypernyms
  Evaluation:
    Undetermined

Component
Single Word pattern
  Ways to improve:
    Implement BabelNet API
      Accessing HTML is slow
      Eliminates timeout issue
    Implement stemming
      Helps solve conjugated clues
      Challenged => Dared
      Use NLTK
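One detail worth checking with this expression is how it is applied. It has no start anchor, so a search would also match the last word of a longer clue. The sketch below assumes the intent is to accept only clues that consist of a single word and uses a full match for that. TOY_RELATED is an invented stand-in for BabelNet, and the function names are hypothetical.

```python
import re

# Expression from the slide; note that it is anchored only at the end.
SINGLE_WORD_RE = re.compile(r"[A-Z0-9][a-z0-9]+$")

# Invented related-word table standing in for BabelNet (assumption).
TOY_RELATED = {"desert": ["abandon", "leave", "wasteland"]}


def is_single_word(clue):
    """True only if the whole clue is one capitalized token."""
    return SINGLE_WORD_RE.fullmatch(clue) is not None


def solve_single_word(clue, length):
    """Return related words of a one-word clue that fit the length."""
    if not is_single_word(clue):
        return []
    related = TOY_RELATED.get(clue.lower(), [])
    return [w.upper() for w in related if len(w) == length]
```

Using search instead of fullmatch on "Capital of Canada" would match "Canada" and wrongly route the clue to this component.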
Component
Prefix pattern
  Example:
    STETHO  NYT  Prefix with scope
  Regular expression:
    [Pp]refix
  Resources used:
    dictionary.com all_words.text file on Morana
  Evaluation:
    MRAR score of 0.33
    Precision of 0.665
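A sketch of how a word list could answer prefix clues: find words that end with the stem named in the clue and return whatever precedes it. The slide's expression only detects the word "prefix", so the extended expression below, which also captures the stem after "with" or "for", is an assumption. TOY_WORDS is a small invented sample rather than the all_words.text file.

```python
import re

# Extended from the slide's [Pp]refix so that the stem is captured (assumption).
PREFIX_RE = re.compile(r"[Pp]refix\s+(?:with|for)\s+(\w+)")

# Small invented sample standing in for the dictionary word list (assumption).
TOY_WORDS = ["stethoscope", "telescope", "microscope", "periscope", "scope"]


def solve_prefix(clue, length, words=TOY_WORDS):
    """Return prefixes of the given length that combine with the stem."""
    m = PREFIX_RE.search(clue)
    if m is None:
        return []
    stem = m.group(1).lower()
    prefixes = {
        w[: -len(stem)].upper()
        for w in words
        if w.endswith(stem) and len(w) - len(stem) == length
    }
    return sorted(prefixes)
```

"Prefix with scope" at six letters yields STETHO, while four letters yields PERI and TELE, so the length constraint does most of the disambiguation.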
Component
Preceder pattern
  Example:
    SEMI  CSy  Final preceder
  Regular expression:
    [Pp]receder
  Resources used:
    Google n-grams
  Evaluation:
    Undetermined

Component
Preceder pattern
  Ways to improve:
    Implement downloaded corpus
      Eliminates timeout issue
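For preceder clues, bigram frequencies give a natural ranking: collect every word seen immediately before the target and order by count. The sketch below assumes that approach. The slide's expression only detects the word "preceder", so the version here, which also captures the target word, is an assumption, and the counts in TOY_BIGRAMS are invented rather than taken from the Google corpus.

```python
import re

# Extended from the slide's [Pp]receder so that the target is captured (assumption).
PRECEDER_RE = re.compile(r"^(\w+)\s+[Pp]receder$")

# Invented bigram counts standing in for Google n-gram data (assumption).
TOY_BIGRAMS = {
    ("semi", "final"): 120,
    ("quarter", "final"): 80,
    ("grand", "final"): 40,
    ("last", "final"): 5,
    ("semi", "circle"): 60,
}


def solve_preceder(clue, length):
    """Return words that precede the target, most frequent first."""
    m = PRECEDER_RE.search(clue)
    if m is None:
        return []
    target = m.group(1).lower()
    hits = [
        (count, first)
        for (first, second), count in TOY_BIGRAMS.items()
        if second == target and len(first) == length
    ]
    hits.sort(reverse=True)  # highest count first
    return [first.upper() for count, first in hits]
```

A local table like this also reflects the improvement listed for this component: with the corpus downloaded, lookups no longer depend on a remote service that can time out.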
Thanks for listening
Are there any questions?
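Sources
  [1] Vehicle make and model database
  [2] Notable Names Database (NNDB)
  [3] List of rock bands and singers
  [4] Dictionary (dictionary.com word list)
  [5] BabelNet
  [6] WordNet
  [7] Google Ngram: blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
  [8] Natural Language Toolkit (NLTK)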