Published by Clifton Greer. Modified over 9 years ago.
1
Course G22.2580 - Web Search Engines 3/9/2011 Wei Xu xuwei@cs.nyu.edu
2
WordNet®
▪ a large lexical database of English
▪ a combination of dictionary and thesaurus
▪ created and maintained by the Cognitive Science Lab of Princeton University
▪ designed to establish the connections between words
3
http://wordnet.princeton.edu/
4
WORDnet
4 types of Parts of Speech (POS)
▪ Noun, Verb, Adjective, Adverb
Synset
▪ the smallest unit in WordNet
▪ a synonym set
▪ represents a specific meaning of a word
5
wordNET
Synsets are connected to one another through semantic and lexical relations
Types of relations (based on POS)
▪ hypernym (kind-of): ‘vehicle’ is a hypernym of ‘car’
▪ hyponym (inverse of kind-of): ‘car’ is a hyponym of ‘vehicle’
▪ holonym (part-of): ‘building’ is a holonym of ‘window’
▪ meronym (inverse of part-of): ‘window’ is a meronym of ‘building’
▪ similar to: ‘smart’ is similar to ‘intelligent’
▪ antonym: ‘smart’ is an antonym of ‘unintelligent’
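The hypernym/hyponym relations above can be pictured as links in a small graph. The following is a minimal sketch, assuming a hand-written toy dictionary rather than the real WordNet database; the words beyond ‘car’ and ‘vehicle’, and the helper names `build_hyponyms` and `hypernym_chain`, are illustrative assumptions. In WordNet itself the links connect synsets, not plain words.

```python
from collections import defaultdict

# Toy hypernym links (child -> parent). These words and links are
# illustrative assumptions, not data read from the WordNet database.
HYPERNYM_OF = {
    "car": "vehicle",
    "truck": "vehicle",
    "vehicle": "artifact",
}

def build_hyponyms(hypernym_of):
    """Invert the child -> parent map to get parent -> children."""
    hyponyms = defaultdict(list)
    for child, parent in hypernym_of.items():
        hyponyms[parent].append(child)
    return dict(hyponyms)

def hypernym_chain(word, hypernym_of):
    """Follow kind-of links upward from `word` until no parent is left."""
    chain = []
    while word in hypernym_of:
        word = hypernym_of[word]
        chain.append(word)
    return chain

HYPONYMS_OF = build_hyponyms(HYPERNYM_OF)
print(hypernym_chain("car", HYPERNYM_OF))  # ['vehicle', 'artifact']
print(sorted(HYPONYMS_OF["vehicle"]))      # ['car', 'truck']
```

Storing only one direction and deriving the other keeps the two relations consistent: every hypernym link implies exactly one hyponym link.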
6
[Figure: hypernym and hyponym relations]
7
Unix-style manual
Web Interfaces
Local Interfaces/APIs
▪ Java
▪ Perl
▪ C#
http://wordnet.princeton.edu/wordnet/related-projects/#web
8
Definition (stemming): the process of removing suffixes from words to get their base or root form
Example: ‘fishing’, ‘fished’, ‘fish’, ‘fisher’ → ‘fish’
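The suffix-removal idea can be sketched in a few lines. This is a deliberately naive illustration, assuming a tiny hand-picked suffix list and a hypothetical function name `naive_stem`; it is not the Porter or Krovetz algorithm, which apply many more rules and conditions.

```python
# Suffixes tried in order; an illustrative list, far smaller than the
# rule set of a real stemmer such as Porter's.
SUFFIXES = ("ing", "ed", "er", "s")

def naive_stem(word, min_stem_len=3):
    """Strip the first matching suffix if the remaining stem is long enough."""
    word = word.lower()
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and len(stem) >= min_stem_len:
            return stem
    return word

words = ["fishing", "fished", "fish", "fisher"]
print([naive_stem(w) for w in words])  # ['fish', 'fish', 'fish', 'fish']
```

A sketch this simple mis-stems many words (for example, it turns ‘running’ into ‘runn’), which is why practical stemmers add rules for doubled consonants and other cases.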
9
Porter Stemmer
http://tartarus.org/~martin/PorterStemmer/
Krovetz Stemmer (in Lemur package)
http://www.lemurproject.org/phorum/read.php?11,1394
WordNet Stemmer
http://tipsandtricks.runicsoft.com/Other/JavaStemmer.html
10
Tokenization
▪ The process of breaking a stream of text up into “words” and punctuation marks.
Sentence Splitting
Part of Speech Tagging
▪ Example: He/PRP 's/VBZ at/IN peace/NN with/IN the/DT house/NN and/CC could/MD stay/VB there/RB indefinitely/RB ./.
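Tokenization of the example sentence can be approximated with one regular expression. This is a minimal sketch, assuming a simple pattern of our own choosing (the name `TOKEN_PATTERN` is hypothetical); it splits off clitics such as 's and single punctuation marks, but it does not assign the POS tags shown above, which requires a trained tagger.

```python
import re

# One token is: a clitic such as 's, a run of word characters,
# or a single punctuation mark. This pattern is an illustrative assumption.
TOKEN_PATTERN = re.compile(r"'\w+|\w+|[^\w\s]")

def tokenize(text):
    """Return the list of tokens found in `text`, in order."""
    return TOKEN_PATTERN.findall(text)

sentence = "He's at peace with the house and could stay there indefinitely."
print(tokenize(sentence))
# ['He', "'s", 'at', 'peace', 'with', 'the', 'house', 'and',
#  'could', 'stay', 'there', 'indefinitely', '.']
```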
11
Named Entity Recognition
▪ The process of labeling sequences of words that are the names of things, such as person, company, and location names.
▪ Example: Jim bought 300 shares of Acme Corp. in 2006.
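To make the labeling task concrete, here is a toy rule-based sketch applied to the example sentence. The rules, the tiny name list, and the label names `PERSON` and `ORGANIZATION` are all assumptions made for illustration; real NER systems learn these decisions from annotated data rather than from hand-written patterns.

```python
import re

# Hypothetical hand-written rules; each pairs a label with a pattern.
RULES = [
    # One or more capitalized words followed by a company suffix.
    ("ORGANIZATION", re.compile(r"\b(?:[A-Z][a-z]+ )+(?:Corp|Inc|Ltd)\.")),
    # A tiny list of first names (assumption, for illustration only).
    ("PERSON", re.compile(r"\b(?:Jim|Mary|John)\b")),
]

def tag_entities(text):
    """Return (start, label, matched text) tuples sorted by position."""
    found = []
    for label, pattern in RULES:
        for match in pattern.finditer(text):
            found.append((match.start(), label, match.group()))
    return sorted(found)

for _, label, span in tag_entities("Jim bought 300 shares of Acme Corp. in 2006."):
    print(label, span)
# PERSON Jim
# ORGANIZATION Acme Corp.
```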
12
Stanford POS tagger
http://nlp.stanford.edu/software/tagger.shtml
Stanford NER
http://nlp.stanford.edu/software/CRF-NER.shtml
GATE
http://gate.ac.uk/
JET
http://cs.nyu.edu/grishman/jet/license.html
http://www.cs.nyu.edu/courses/spring10/G22.2590-001/schedule.html