CROSSWORD PUZZLE – TEAM 2
Members: Derek van Assche, Cody Hansen, Jonathan Juett, Seungbum Park, Anthony Vito
Date: 4/22/2014

1 CROSSWORD PUZZLE – TEAM 2
Members: Derek van Assche, Cody Hansen, Jonathan Juett, Seungbum Park, Anthony Vito
Date: 4/22/2014

2 Agenda
- Tasks
- Resources
- .puz files
- Components

3 Tasks
- Create components to handle patterns
- Extend current list of clue patterns
- Write regular expressions for clue patterns
- Design and implement a GUI
- Download a larger set of .puz files

4 Resources
- Vehicle make and model database [1]
  - 7,352 vehicle entries
  - Model years from 1909 to 2013

5 Resources
- Notable Names Database [2]
  - Contains information on noteworthy people

6 Resources
- List of rock bands and singers [3]
  - 674 entries

7 Resources
- Dictionary [4]
  - Contains words found at dictionary.com
  - Large list of words and word-like tokens

8 Resources
- BabelNet [5]
  - Integration of WordNet, Open Multilingual WordNet, Wikipedia, and OmegaWiki

9 Resources
- WordNet [6]
  - Large lexical database
  - Nouns, verbs, adjectives, and adverbs grouped in synsets
- Google Ngram [7]
  - Corpus collected from online text by Google
  - Information about n-grams of various lengths and their frequencies
- Natural Language Toolkit (NLTK) [8]
  - Provides an interface to WordNet
  - Text processing for classification, tokenization, stemming, tagging, parsing, and semantic reasoning

10 .puz files
Sources:
- http://chronicle.com/section/Crosswords/43 (192 .puz files)
- http://puzzle.about.com (18 .puz files)
- http://bobklahn.home.comcast.net/~bobklahn/CrosSynergy/index.html (86 .puz files)
- http://www.fleetingimage.com/wij/xyzzy/12-dr.html (96 .puz files)
Puzzles are located at /data0/projects/cross/more_puz

11 Component
Input:
- 1A Capital of Canada 6
- 1D Jaguar, e.g. 6
Output:
- [ ]
- 1A OTTAWA 5
- 1A QUEBEC 2
- 1D FELINE 3
- 1D CANINE 1
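The input and output lines on this slide can be read as a small text interface. The sketch below is one possible reading, not the team's code: it assumes each input line is "slot, clue text, answer length" and each output line is "slot, candidate answer, score". The names Clue, Candidate, parse_clue, run_component and TOY_LOOKUP are hypothetical, and the lookup table simply reproduces the slide's sample output.

```python
import re
from typing import NamedTuple


class Clue(NamedTuple):
    slot: str      # e.g. "1A" or "1D"
    text: str      # clue text, e.g. "Capital of Canada"
    length: int    # assumed to be the answer length


class Candidate(NamedTuple):
    slot: str
    answer: str
    score: int     # assumed to be a score or confidence value


# Slot, then clue text, then a trailing integer.
LINE_RE = re.compile(r"^(\d+[AD])\s+(.+)\s+(\d+)$")

# Hypothetical stand-in for a real component's knowledge source.
TOY_LOOKUP = {
    "Capital of Canada": [("OTTAWA", 5), ("QUEBEC", 2)],
    "Jaguar, e.g.": [("FELINE", 3), ("CANINE", 1)],
}


def parse_clue(line):
    """Parse one input line such as '1A Capital of Canada 6'."""
    match = LINE_RE.match(line.strip())
    if match is None:
        raise ValueError(f"unrecognized clue line: {line!r}")
    return Clue(match.group(1), match.group(2), int(match.group(3)))


def run_component(clues, lookup=TOY_LOOKUP):
    """Return candidates whose length fits the slot, in lookup order."""
    found = []
    for clue in clues:
        for answer, score in lookup.get(clue.text, []):
            if len(answer) == clue.length:
                found.append(Candidate(clue.slot, answer, score))
    return found


def format_candidate(candidate):
    """Render a candidate in the slide's output layout."""
    return f"{candidate.slot} {candidate.answer} {candidate.score}"


clues = [parse_clue("1A Capital of Canada 6"), parse_clue("1D Jaguar, e.g. 6")]
lines = [format_candidate(c) for c in run_component(clues)]
```

Keeping parsing, lookup and formatting separate means each pattern component only has to supply its own lookup step.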

12 Component
Antonyms pattern:
- Examples:
    ZENITH 1 2010 NYT Nadir's opposite
    ABHOR 2 2010 unk Antonym for "adore"
- Regular expressions:
    ^([A-Za-z]+)(([\'][s]){0,1}|([s][\']){0,1})\s(opposite|antonym)
    ^([Oo]pposite|[Aa]ntonym)\s(of|for)\s(\'|\"){0,1}([\w]+)(\'|\"){0,1}$
- Resources used: NLTK for access to WordNet
- Evaluation: MRAR score of 1.0 (one correct answer out of one attempt)
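A minimal sketch of how the two antonym expressions could be applied to the example clues. The slide says the real lookup went through NLTK and WordNet; here a small hypothetical dictionary (TOY_ANTONYMS) stands in for it so the example runs without any corpus download. The function names are also illustrative.

```python
import re

# The slide's two expressions: "<word>'s opposite" and "Antonym for <word>".
POSSESSIVE_RE = re.compile(
    r"^([A-Za-z]+)(([\'][s]){0,1}|([s][\']){0,1})\s(opposite|antonym)"
)
PREFIXED_RE = re.compile(
    r"^([Oo]pposite|[Aa]ntonym)\s(of|for)\s(\'|\"){0,1}([\w]+)(\'|\"){0,1}$"
)

# Hypothetical stand-in for the WordNet antonym lookup.
TOY_ANTONYMS = {"nadir": ["zenith"], "adore": ["abhor", "hate"]}


def antonym_target(clue):
    """Return the word whose antonym the clue asks for, or None."""
    match = POSSESSIVE_RE.match(clue)
    if match:
        return match.group(1).lower()
    match = PREFIXED_RE.match(clue)
    if match:
        return match.group(4).lower()
    return None


def antonym_candidates(clue, length):
    """Return upper-cased antonyms that fit the answer length."""
    target = antonym_target(clue)
    if target is None:
        return []
    return [w.upper() for w in TOY_ANTONYMS.get(target, []) if len(w) == length]
```

With the toy data, "Nadir's opposite" with length 6 yields ZENITH and 'Antonym for "adore"' with length 5 yields ABHOR, matching the slide's examples.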

13 Component
E.g. clue pattern:
- Example: HORSE 5 2002 CSy Chestnut, e.g.
- Regular expression: ,[\s][Ee]\.g\.$
- Resources used: NLTK for access to WordNet hypernyms
- Evaluation: MRAR score of 0.5 (two correct answers out of four attempts)

14 Component
Say pattern:
- Example: MISS 4 1999 NYT Overshoot, say
- Regular expression: , [Ss]ay$
- Resources used: NLTK for access to WordNet hypernyms
- Evaluation: MRAR score of 0 (zero correct answers out of five attempts)
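The "e.g." and "say" patterns both mark the clue word as an example of the answer, so one sketch can cover them: strip the suffix, then look for more general terms of the right length. The hypernym table below is a hypothetical stand-in for the WordNet lookup, and the function name is illustrative.

```python
import re

# Suffix expressions from the two slides.
EG_RE = re.compile(r",[\s][Ee]\.g\.$")
SAY_RE = re.compile(r", [Ss]ay$")

# Hypothetical stand-in for WordNet hypernyms (more general terms).
TOY_HYPERNYMS = {
    "chestnut": ["horse", "tree", "nut"],
    "jaguar": ["feline", "cat", "car"],
    "overshoot": ["miss", "err"],
}


def example_of_candidates(clue, length):
    """Strip an 'e.g.' or 'say' suffix and return fitting hypernyms."""
    for pattern in (EG_RE, SAY_RE):
        match = pattern.search(clue)
        if match:
            term = clue[:match.start()].strip().lower()
            return [
                h.upper()
                for h in TOY_HYPERNYMS.get(term, [])
                if len(h) == length
            ]
    return []
```

Filtering by answer length is what narrows "Chestnut, e.g." from three toy hypernyms down to HORSE.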

15 Component
In Brief pattern:
- Example: ETS 2 2008 NYT Some "Stargate SG-1" characters, in brief
- Regular expression: , [Ii]n brief
- Resources used: NLTK for access to WordNet synonyms
- Evaluation: matched zero clues out of thirty

16 Component
Kind of pattern:
- Example: SEAT 2 2000 unk Kind of belt
- Regular expression: [Kk]ind of
- Resources used: NLTK for access to WordNet synonyms
- Evaluation: matched zero clues out of thirty

17 Component
Antonym, E.g., Say, In Brief, Kind of patterns:
- Ways to improve:
  - Incorporate a scoring system
  - Increase performance (accessing WordNet can be slow)

18 Component
Rock Band pattern:
- Example: SID 5 2001 CSy Rocker Vicious
- Regular expression: \"[\w\s]+\"\s(rock\s)?band|[Rr]ocker[\s]+.*[A-Z]|\".+\"[\s]+.*[Rr]ocker|\'.+\'[\s]+.*[Rr]ocker
- Resources used: Rock Band Database
- Evaluation: MRAR score of 0.5139 (two correct answers out of two attempts)
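A rough sketch of the rock band component, assuming the database is a flat list of artist names. The four alternatives of the slide's expression are written separately (with redundant quote escapes dropped) and joined with "|". The ARTISTS list and the matching rule (return the part of the name the clue leaves out) are hypothetical simplifications.

```python
import re

# The slide's alternatives, one per line for readability.
ROCK_ALTERNATIVES = [
    r'"[\w\s]+"\s(rock\s)?band',   # "Title" band, "Title" rock band
    r'[Rr]ocker[\s]+.*[A-Z]',      # Rocker Vicious
    r'".+"[\s]+.*[Rr]ocker',       # "Come Sail Away" rockers
    r"'.+'[\s]+.*[Rr]ocker",       # same, with single quotes
]
ROCK_RE = re.compile("|".join(ROCK_ALTERNATIVES))

# Hypothetical stand-in for the rock band database.
ARTISTS = ["Sid Vicious", "Johnny Rotten", "Styx"]


def rocker_candidates(clue, length):
    """Return the missing part of any artist name mentioned in the clue."""
    if not ROCK_RE.search(clue):
        return []
    known = {w.lower() for w in re.findall(r"[A-Za-z]+", clue)}
    known -= {"rocker", "rockers"}
    results = []
    for artist in ARTISTS:
        parts = artist.split()
        rest = [p for p in parts if p.lower() not in known]
        if len(rest) < len(parts):  # some part of the name is in the clue
            answer = "".join(rest).upper()
            if len(answer) == length:
                results.append(answer)
    return results
```

A song-title clue such as "Come Sail Away" rockers matches the expression but returns nothing here, because the toy list has no song data.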

19 Component
Rock Band pattern:
- Ways to improve:
  - Create a more complete database
    - Include well-known songs
  - Expand list of current patterns
    - Include songs: "Come Sail Away" rockers => Styx

20 Component
Vehicle pattern:
- Example: ACCORD 2 2004 unk Honda model
- Regular expression: [Mm]odels?|[Vv]ehicles?
- Resources used: Vehicle make and model database
- Evaluation: MRAR score of 0.7246, precision of 0.9891
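One way the vehicle component might work, sketched under the assumption that the database can be indexed by make. The MODELS_BY_MAKE table is a hypothetical excerpt, not the contents of the real database.

```python
import re

# Trigger expression from the slide.
VEHICLE_RE = re.compile(r"[Mm]odels?|[Vv]ehicles?")

# Hypothetical excerpt of a make -> models index.
MODELS_BY_MAKE = {
    "honda": ["Accord", "Civic", "Odyssey"],
    "pontiac": ["GTO", "Firebird"],
}


def vehicle_candidates(clue, length):
    """Return models of any make named in the clue that fit the length."""
    if not VEHICLE_RE.search(clue):
        return []
    found = []
    for word in re.findall(r"[A-Za-z]+", clue):
        for model in MODELS_BY_MAKE.get(word.lower(), []):
            answer = model.upper()
            if len(answer) == length:
                found.append(answer)
    return found
```

For "Honda model" with a six-letter slot, only ACCORD survives the length filter.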

21 Component
Vehicle pattern:
- Ways to improve:
  - Expand list of current patterns
    - ‘70s Pontiac => Pontiac GTO

22 Component
And/Or pattern:
- Examples:
    LEVIS 4 2001 WaP Strauss and Stubbs
    ABE 2 1997 unk Lincoln or Burrows
- Regular expression: [A-Z][a-z]+[\s](and|or)[\s][A-Z][a-z]+$
- Resources used: NNDB (Notable Names Database)
- Evaluation: MRAR score of 0.57, precision of 0.6355
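The two examples suggest that the answer is a first name shared by people with the two surnames, pluralized for "and" (LEVIS) and left singular for "or" (ABE). That rule is inferred from the examples, not stated on the slide. The sketch below adds capture groups to the slide's expression and uses a hypothetical PEOPLE list in place of NNDB.

```python
import re

# The slide's expression with capture groups added around each part.
AND_OR_RE = re.compile(r"([A-Z][a-z]+)[\s](and|or)[\s]([A-Z][a-z]+)$")

# Hypothetical stand-in for NNDB: (first name, surname) pairs.
PEOPLE = [
    ("Levi", "Strauss"),
    ("Johann", "Strauss"),
    ("Levi", "Stubbs"),
    ("Abe", "Lincoln"),
    ("Abe", "Burrows"),
]


def first_names(surname):
    """All first names recorded for a surname."""
    return {first for first, last in PEOPLE if last == surname}


def and_or_candidates(clue, length):
    """Return shared first names, pluralized when the clue uses 'and'."""
    match = AND_OR_RE.search(clue)
    if not match:
        return []
    left, conjunction, right = match.groups()
    results = []
    for name in sorted(first_names(left) & first_names(right)):
        answer = name.upper() + ("S" if conjunction == "and" else "")
        if len(answer) == length:
            results.append(answer)
    return results
```

A clue such as "Saturn and Mars" matches the expression but finds no shared first name, which is the kind of case a Wikipedia or BabelNet lookup would be needed for.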

23 Component
And/Or pattern:
- Ways to improve:
  - Integrate with Wikipedia or BabelNet
    - Saturn and Mars => Planets
  - Extend list of current patterns
    - The Third son of Adam and Eve

24 Component
Single Word pattern:
- Example: ABANDON 2 1999 USA Desert
- Regular expression: [A-Z0-9][a-z0-9]+$
- Resources used: BabelNet for synonyms, hyponyms, hypernyms
- Evaluation: undetermined

25 Component
Single Word pattern:
- Ways to improve:
  - Implement BabelNet API
    - Accessing HTML is slow
    - Eliminates timeout issue
  - Implement stemming
    - Helps solve conjugated clues
    - Challenged => Dared
    - Use NLTK

26 Component
Prefix pattern:
- Example: STETHO 3 2003 NYT Prefix with scope
- Regular expression: [Pp]refix
- Resources used: dictionary.com all_words.text file on Morana
- Evaluation: MRAR score of 0.33, precision of 0.665
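A sketch of a dictionary-based prefix lookup: find words that end with the clue's stem and return what comes before it. The slide's expression only detects the word "prefix"; the STEM_RE expression that extracts the stem, and the short WORDS list standing in for the word file, are assumptions made for illustration.

```python
import re

# Hypothetical extension of the slide's [Pp]refix trigger that also
# captures the stem, e.g. "scope" in "Prefix with scope".
STEM_RE = re.compile(r"[Pp]refix\s+(?:with|for)\s+(\w+)")

# Hypothetical excerpt of the dictionary word list.
WORDS = ["stethoscope", "telescope", "microscope", "periscope", "horoscope"]


def prefix_candidates(clue, length, words=WORDS):
    """Return prefixes of the given length that attach to the clue's stem."""
    match = STEM_RE.search(clue)
    if not match:
        return []
    stem = match.group(1).lower()
    prefixes = {
        word[: -len(stem)].upper()
        for word in words
        if word.endswith(stem) and len(word) - len(stem) == length
    }
    return sorted(prefixes)
```

With the toy list, "Prefix with scope" and a six-letter slot gives STETHO, while a four-letter slot gives HORO, PERI and TELE.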

27 Component
Preceder pattern:
- Example: SEMI 3 2007 CSy Final preceder
- Regular expression: [Pp]receder
- Resources used: Google n-grams
- Evaluation: undetermined
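A possible shape for the preceder component, assuming bigram counts are available locally as a mapping from word pairs to frequencies. The counts below are invented for illustration and are not taken from the Google n-gram corpus.

```python
import re

# Trigger expression from the slide.
PRECEDER_RE = re.compile(r"[Pp]receder")

# Hypothetical bigram counts: (first word, second word) -> frequency.
TOY_BIGRAMS = {
    ("semi", "final"): 120,
    ("grand", "final"): 300,
    ("quarter", "final"): 80,
    ("semi", "circle"): 40,
}


def preceder_candidates(clue, length):
    """Return words seen before the clue's target, most frequent first."""
    if not PRECEDER_RE.search(clue):
        return []
    # Assume whatever remains after removing "preceder" is the target word.
    target = PRECEDER_RE.sub("", clue).strip().lower()
    ranked = sorted(
        (
            (count, first)
            for (first, second), count in TOY_BIGRAMS.items()
            if second == target and len(first) == length
        ),
        reverse=True,
    )
    return [first.upper() for count, first in ranked]
```

Reading counts from a local table rather than a web query is in line with the deck's own suggestion to use a downloaded corpus.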

28 Component
Preceder pattern:
- Ways to improve:
  - Implement downloaded corpus
    - Eliminates timeout issue

29 Thanks for listening
Are there any questions?

30 Sources
[1] https://github.com/n8barr/automotive-model-year-data
[2] http://www.nndb.com/
[3] http://www.allmusic.com/
[4] http://dictionary.reference.com/
[5] http://babelnet.org/
[6] http://wordnet.princeton.edu/
[7] http://googleresearch.blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
[8] http://www.nltk.org/

