CROSSWORD PUZZLE – TEAM 2
Members: Derek van Assche, Cody Hansen, Jonathan Juett, Seungbum Park, Anthony Vito
Date: 4/22/2014


Agenda
- Tasks
- Resources
- .puz files
- Components

Tasks
- Create components to handle patterns
- Extend the current list of clue patterns
- Write regular expressions for clue patterns
- Design and implement a GUI
- Download a larger set of .puz files

Resources
- Vehicle make and model database [1]
  - 7,352 vehicle entries
  - Model years from 1909 to 2013
- Notable Names Database [2]
  - Contains information on noteworthy people
- List of rock bands and singers [3]
  - 674 entries
- Dictionary [4]
  - Contains words found at dictionary.com
  - Large list of words and word-like tokens
- BabelNet [5]
  - Integration of WordNet, Open Multilingual WordNet, Wikipedia, and OmegaWiki
- WordNet [6]
  - Large lexical database
  - Nouns, verbs, adjectives, and adverbs grouped in synsets
- Google Ngram [7]
  - Corpus collected from online text by Google
  - Information about n-grams of various lengths and their frequencies
- Natural Language Toolkit (NLTK) [8]
  - Provides an interface to WordNet
  - Text processing for classification, tokenization, stemming, tagging, parsing, and semantic reasoning

.puz files
- Sources:
  - Number of .puz: 192
  - Number of .puz: 18
  - index.html, number of .puz: 86
  - Number of .puz: 96
- Puzzles located at /data0/projects/cross/more_puz

Component
- Input:
  - 1A Capital of Canada 6
  - 1D Jaguar, e.g. 6
- Output:
  - 1A OTTAWA 5
  - 1A QUEBEC 2
  - 1D FELINE 3
  - 1D CANINE 1
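The input and output lines above can be read as simple records. The sketch below is one possible reading, not the team's actual code: it assumes each input line is a grid slot, the clue text, and the answer length, and that the trailing number on an output line is a score where higher means a stronger guess. The names Clue, Candidate, parse_clue, and format_candidate are hypothetical.

```python
import re
from typing import NamedTuple


class Clue(NamedTuple):
    slot: str    # grid position such as "1A" (1-Across) or "1D" (1-Down)
    text: str    # the clue itself
    length: int  # number of letters in the answer


class Candidate(NamedTuple):
    slot: str
    answer: str
    score: int   # assumed meaning: higher is a stronger guess


# slot, then clue text, then a trailing integer length
CLUE_LINE = re.compile(r"^(\d+[AD])\s+(.*\S)\s+(\d+)$")


def parse_clue(line: str) -> Clue:
    """Split one input line into slot, clue text, and answer length."""
    m = CLUE_LINE.match(line.strip())
    if m is None:
        raise ValueError(f"not a clue line: {line!r}")
    return Clue(m.group(1), m.group(2), int(m.group(3)))


def format_candidate(c: Candidate) -> str:
    """Render a candidate in the 'slot ANSWER score' layout shown above."""
    return f"{c.slot} {c.answer} {c.score}"
```

A component would then take a Clue and return zero or more Candidate records for the same slot.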

Component
- Antonyms pattern
- Examples:
  - ZENITH NYT Nadir's opposite
  - ABHOR unk Antonym for "adore"
- Regular expressions:
  - ^([A-Za-z]+)(([\'][s]){0,1}|([s][\']){0,1})\s(opposite|antonym)
  - ^([Oo]pposite|[Aa]ntonym)\s(of|for)\s(\'|\"){0,1}([\w]+)(\'|\"){0,1}$
- Resources used: NLTK for access to WordNet
- Evaluation: MRAR score of 1.0; one correct answer out of one attempt
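A minimal sketch of how the antonym pattern might be applied, assuming the component first extracts the target word and then asks WordNet for antonyms of the right length. The two regular expressions are lightly simplified versions of the ones on the slide, and the function names are hypothetical. The WordNet lookup uses NLTK's wordnet corpus reader and needs NLTK and its WordNet data installed, so it is imported lazily.

```python
import re
from typing import List, Optional

# "Nadir's opposite" style: target word first, optional possessive
TARGET_FIRST = re.compile(r"^([A-Za-z]+)(?:'s|s')?\s(?:opposite|antonym)")
# 'Antonym for "adore"' style: keyword first, target word optionally quoted
KEYWORD_FIRST = re.compile(
    r"""^(?:[Oo]pposite|[Aa]ntonym)\s(?:of|for)\s['"]?(\w+)['"]?$"""
)


def antonym_target(clue: str) -> Optional[str]:
    """Return the word whose antonym is wanted, or None if no pattern matches."""
    for pattern in (TARGET_FIRST, KEYWORD_FIRST):
        m = pattern.match(clue)
        if m:
            return m.group(1).lower()
    return None


def wordnet_antonyms(word: str, length: int) -> List[str]:
    """Collect WordNet antonyms of `word` that have exactly `length` letters."""
    from nltk.corpus import wordnet as wn  # requires NLTK and the WordNet corpus

    found = set()
    for synset in wn.synsets(word):
        for lemma in synset.lemmas():
            for antonym in lemma.antonyms():
                name = antonym.name().replace("_", "").upper()
                if len(name) == length:
                    found.add(name)
    return sorted(found)
```

Filtering by answer length at lookup time keeps the candidate list short before any scoring is applied.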

Component
- E.g. clue pattern
- Example: HORSE CSy Chestnut, e.g.
- Regular expression: ,[\s][Ee]\.g\.$
- Resources used: NLTK for access to WordNet hypernyms
- Evaluation: MRAR score of 0.5; two correct answers out of four attempts

Component
- Say pattern
- Example: MISS NYT Overshoot, say
- Regular expression: , [Ss]ay$
- Resources used: NLTK for access to WordNet hypernyms
- Evaluation: MRAR score of 0; zero correct answers out of five attempts

Component
- In Brief pattern
- Example: ETS NYT Some "Stargate SG-1" characters, in brief
- Regular expression: , [Ii]n brief
- Resources used: NLTK for access to WordNet for synonyms
- Evaluation: matched zero clues out of thirty

Component
- Kind of pattern
- Example: SEAT unk Kind of belt
- Regular expression: [Kk]ind of
- Resources used: NLTK for access to WordNet for synonyms
- Evaluation: matched zero clues out of thirty

Component
- Antonym, E.g., Say, In Brief, Kind of patterns
- Ways to improve:
  - Incorporate a scoring system
  - Increase performance
    - Accessing WordNet can be slow
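Since the slide notes that accessing WordNet can be slow, one possible performance fix is to memoize lookups so each distinct word hits WordNet only once. This is a sketch under that assumption, not something the slides describe; slow_lookup is a hypothetical stand-in for a real WordNet query, and the calls list exists only to show how often the underlying lookup runs.

```python
from functools import lru_cache
from typing import Callable, List, Tuple


def cached(lookup: Callable[[str], Tuple[str, ...]], maxsize: int = 4096):
    """Wrap a slow word lookup so repeated words are served from memory."""
    return lru_cache(maxsize=maxsize)(lookup)


calls: List[str] = []  # records every time the slow path actually runs


def slow_lookup(word: str) -> Tuple[str, ...]:
    """Hypothetical stand-in for a WordNet query; returns a tuple of answers."""
    calls.append(word)
    return (word.upper(),)


fast_lookup = cached(slow_lookup)
```

Returning a tuple rather than a list keeps cached results immutable, so one caller cannot accidentally change what the next caller sees.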

Component
- Rock Band pattern
- Example: SID CSy Rocker Vicious
- Regular expression: \"[\w\s]+\"\s(rock\s)?band|[Rr]ocker[\s]+.*[A-Z]|\".+\"[\s]+.*[Rr]ocker|\'.+\'[\s]+.*[Rr]ocker
- Resources used: Rock Band Database
- Evaluation: two correct answers out of two attempts (MRAR score not stated)
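A minimal sketch of the "Rocker <name>" branch of this pattern, assuming the component looks up the given name in the rock band list and returns the other part of the performer's name. The three-entry ROCKERS list is a made-up sample standing in for the 674-entry database, and rocker_candidates is a hypothetical name.

```python
import re
from typing import List

# Hypothetical sample; the real resource has 674 entries
ROCKERS = ["Sid Vicious", "Ozzy Osbourne", "Axl Rose"]

# Only the "Rocker Vicious" form of the slide's regular expression
ROCKER_NAME = re.compile(r"^[Rr]ocker\s+([A-Z][\w']*)$")


def rocker_candidates(clue: str, length: int) -> List[str]:
    """Return the missing name parts of performers matching the clue."""
    m = ROCKER_NAME.match(clue)
    if m is None:
        return []
    given = m.group(1).lower()
    answers = []
    for entry in ROCKERS:
        parts = entry.split()
        if given in (p.lower() for p in parts):
            # Offer every other part of the name that fits the grid slot
            for part in parts:
                if part.lower() != given and len(part) == length:
                    answers.append(part.upper())
    return answers
```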

Component
- Rock Band pattern
- Ways to improve:
  - Create a more complete database
    - Include well-known songs
  - Expand the list of current patterns
    - Include songs: "Come Sail Away" rockers => Styx

Component
- Vehicle pattern
- Example: ACCORD unk Honda model
- Regular expression: [Mm]odels?|[Vv]ehicles?
- Resources used: Vehicle make and model database
- Evaluation: MRAR score and precision not stated
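A minimal sketch of the vehicle lookup, assuming the component finds a known make in the clue and returns that make's models of the right length. The VEHICLES mapping is a tiny made-up sample standing in for the 7,352-entry database, and vehicle_candidates is a hypothetical name.

```python
import re
from typing import Dict, List

# Hypothetical sample of make -> models
VEHICLES: Dict[str, List[str]] = {
    "Honda": ["Accord", "Civic", "Pilot"],
    "Pontiac": ["GTO", "Firebird"],
}

# The trigger expression from the slide
VEHICLE_CLUE = re.compile(r"[Mm]odels?|[Vv]ehicles?")


def vehicle_candidates(clue: str, length: int) -> List[str]:
    """Return models of any make named in the clue that fit the answer length."""
    if VEHICLE_CLUE.search(clue) is None:
        return []
    clue_words = set(re.findall(r"[a-z]+", clue.lower()))
    answers = []
    for make, models in VEHICLES.items():
        if make.lower() in clue_words:
            answers.extend(m.upper() for m in models if len(m) == length)
    return answers
```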

Component
- Vehicle pattern
- Ways to improve:
  - Expand the list of current patterns
    - '70s Pontiac => Pontiac GTO

Component
- And/Or pattern
- Examples:
  - LEVIS WaP Strauss and Stubbs
  - ABE unk Lincoln or Burrows
- Regular expression: [A-Z][a-z]+[\s](and|or)[\s][A-Z][a-z]+$
- Resources used: NNDB (Notable Names Database)
- Evaluation: MRAR score of 0.57; precision not stated
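Both examples above resolve to a first name shared by the two surnames, plural for "and" and singular for "or". The sketch below illustrates that reading under stated assumptions: PEOPLE is a made-up sample standing in for NNDB, pluralizing by appending "S" is a deliberately naive rule, and shared_first_names is a hypothetical name.

```python
import re
from typing import List

# Hypothetical sample of (first name, surname) pairs
PEOPLE = [
    ("Levi", "Strauss"),
    ("Johann", "Strauss"),
    ("Levi", "Stubbs"),
    ("Abe", "Lincoln"),
    ("Abe", "Burrows"),
]

# The slide's expression with capture groups added
AND_OR = re.compile(r"^([A-Z][a-z]+)\s(and|or)\s([A-Z][a-z]+)$")


def shared_first_names(clue: str, length: int) -> List[str]:
    """Return first names common to both surnames, pluralized for 'and'."""
    m = AND_OR.match(clue)
    if m is None:
        return []
    left, conjunction, right = m.groups()
    firsts_left = {first for first, last in PEOPLE if last == left}
    firsts_right = {first for first, last in PEOPLE if last == right}
    answers = []
    for name in sorted(firsts_left & firsts_right):
        # "and" asks for both people at once, so the answer is plural
        answer = name + "s" if conjunction == "and" else name
        if len(answer) == length:
            answers.append(answer.upper())
    return answers
```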

Component
- And/Or pattern
- Ways to improve:
  - Integrate with Wikipedia or BabelNet
    - Saturn and Mars => Planets
  - Extend the list of current patterns
    - The third son of Adam and Eve

Component
- Single Word pattern
- Example: ABANDON USA Desert
- Regular expression: [A-Z0-9][a-z0-9]+$
- Resources used: BabelNet for synonyms, hyponyms, hypernyms
- Evaluation: undetermined

Component
- Single Word pattern
- Ways to improve:
  - Implement the BabelNet API
    - Accessing HTML is slow
    - Eliminates the timeout issue
  - Implement stemming
    - Helps solve conjugated clues
    - Challenged => Dared
    - Use NLTK

Component
- Prefix pattern
- Example: STETHO NYT Prefix with scope
- Regular expression: [Pp]refix
- Resources used: dictionary.com all_words.text file on Morana
- Evaluation: MRAR score of 0.33; precision of 0.665
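A minimal sketch of how the prefix pattern might use the word list, assuming the component scans for words that end with the clue's stem and keeps the leading part when it has the right length. The regular expression here is narrower than the slide's [Pp]refix trigger so that the stem can be captured, WORDS is a made-up sample of the dictionary file, and prefix_candidates is a hypothetical name.

```python
import re
from typing import List, Sequence

# Narrowed form of the slide's trigger, capturing the stem word
PREFIX_CLUE = re.compile(r"^[Pp]refix (?:with|for) (\w+)$")

# Hypothetical sample of the dictionary word list
WORDS = ["stethoscope", "telescope", "microscope", "periscope", "scope"]


def prefix_candidates(clue: str, length: int,
                      words: Sequence[str] = WORDS) -> List[str]:
    """Return prefixes of the given length that combine with the clue's stem."""
    m = PREFIX_CLUE.match(clue)
    if m is None:
        return []
    stem = m.group(1).lower()
    found = set()
    for word in words:
        # The prefix is whatever precedes the stem in a dictionary word
        if word.endswith(stem) and len(word) - len(stem) == length:
            found.add(word[: len(word) - len(stem)].upper())
    return sorted(found)
```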

Component
- Preceder pattern
- Example: SEMI CSy Final preceder
- Regular expression: [Pp]receder
- Resources used: Google ngrams
- Evaluation: undetermined
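A minimal sketch of how n-gram frequencies could drive the preceder pattern, assuming the component ranks words that appear immediately before the clue word. The counts in BIGRAM_COUNTS are invented for illustration and are not taken from the Google Ngram corpus; the capturing regular expression and the name preceder_candidates are also hypothetical.

```python
import re
from typing import Dict, List, Tuple

# Narrowed form of the slide's trigger, capturing the word to be preceded
PRECEDER_CLUE = re.compile(r"^(\w+) [Pp]receder$")

# Invented (first word, second word) -> count data
BIGRAM_COUNTS: Dict[Tuple[str, str], int] = {
    ("the", "final"): 900,
    ("semi", "final"): 120,
    ("quarter", "final"): 90,
    ("grand", "final"): 40,
    ("demi", "final"): 3,
}


def preceder_candidates(clue: str, length: int,
                        counts: Dict[Tuple[str, str], int] = BIGRAM_COUNTS
                        ) -> List[str]:
    """Return words of the given length seen before the clue word, most frequent first."""
    m = PRECEDER_CLUE.match(clue)
    if m is None:
        return []
    target = m.group(1).lower()
    ranked = sorted(
        ((count, first) for (first, second), count in counts.items()
         if second == target and len(first) == length),
        reverse=True,
    )
    return [first.upper() for _, first in ranked]
```

Ranking by frequency gives a natural ordering for candidates, which could feed a scoring system if one is added later.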

Component
- Preceder pattern
- Ways to improve:
  - Implement a downloaded corpus
    - Eliminates the timeout issue

Thanks for listening. Are there any questions?

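Sources
[1] Vehicle make and model database
[2] Notable Names Database
[3] List of rock bands and singers
[4] Dictionary (dictionary.com)
[5] BabelNet
[6] WordNet
[7] Google Ngram: blogspot.com/2006/08/all-our-n-gram-are-belong-to-you.html
[8] Natural Language Toolkit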