Overview of Statistical NLP
IR Group Meeting, March 7, 2006

03/07/2006 IR Group Meeting -- NLP
Slide 2: Outline
- Some basic and important NLP problems
- Topics that have recently attracted much interest
- NLP research groups
- Discussion of the relation between NLP and IR

Slide 3: Levels of Analysis in NLP (from Dan Roth’s CS598)
- Morphology: how words are constructed
- Syntax: structural relations between words
- Semantics: the meaning of words and of combinations of words
- Pragmatics: how is a sentence used? What is its purpose?
- Discourse (sometimes distinguished as a subfield of pragmatics): relationships between sentences; global context

Slide 4: Some NLP Problems
- N-gram Models
- Word Sense Disambiguation
- Lexical Acquisition
- (POS) Tagging
- (Syntactic) Parsing
- Semantic Role Labeling (Semantic Parsing)
- Named Entity Recognition
- Textual Entailment
- …

Slide 5: N-gram Models
The task: to estimate P(w_n | w_1, …, w_{n-1})
Approaches:
- Maximum likelihood estimation
- Various smoothing methods
Applications:
- Automatic speech recognition
- Spelling correction
- Handwriting recognition
- Statistical machine translation
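The two approaches on the N-gram slide can be illustrated with a minimal bigram sketch (the case n = 2). The toy corpus, the function names, and the choice of add-one (Laplace) smoothing are assumptions made for illustration; the slide only says "various smoothing methods" and does not name one.

```python
from collections import Counter

def train_bigram(sentences):
    """Count unigrams and bigrams over tokenized sentences, with boundary markers."""
    unigrams, bigrams = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>"] + tokens + ["</s>"]
        unigrams.update(padded)
        bigrams.update(zip(padded, padded[1:]))
    return unigrams, bigrams

def p_mle(word, prev, unigrams, bigrams):
    """Maximum likelihood estimate: count(prev, word) / count(prev)."""
    if unigrams[prev] == 0:
        return 0.0
    return bigrams[(prev, word)] / unigrams[prev]

def p_add_one(word, prev, unigrams, bigrams):
    """Add-one smoothing: every possible next word gets one extra count."""
    # "<s>" is never predicted, so it is excluded from the vocabulary size.
    vocab_size = len(unigrams) - 1
    return (bigrams[(prev, word)] + 1) / (unigrams[prev] + vocab_size)

# Hypothetical toy corpus.
corpus = [["the", "cat", "sat"], ["the", "dog", "sat"]]
unigrams, bigrams = train_bigram(corpus)

print(p_mle("cat", "the", unigrams, bigrams))      # seen bigram
print(p_mle("dog", "cat", unigrams, bigrams))      # unseen bigram gets zero under MLE
print(p_add_one("dog", "cat", unigrams, bigrams))  # smoothing gives it some mass
```

The unseen bigram shows why smoothing matters: under maximum likelihood, any sentence containing it would receive probability zero.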

Slide 6: Word Sense Disambiguation (WSD)
The task: to determine which of the senses of an ambiguous word is involved in a particular use of the word
Approaches:
- Supervised:
  - Log-linear models
  - Information-theoretic
  - Memory-based learning (kNN)
- Dictionary-based:
  - Sense definitions
  - Thesauri
  - Translations in a second language
- Unsupervised:
  - Clustering using the EM algorithm
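As one illustration of the dictionary-based family that uses sense definitions, the sketch below picks the sense whose definition shares the most words with the context, in the style of the simplified Lesk algorithm. The slide does not name a specific algorithm, so that choice is an assumption, and the sense inventory, glosses, and stopword list are hypothetical.

```python
def gloss_overlap_sense(senses, context_tokens, stopwords=frozenset()):
    """Return the sense whose definition overlaps most with the context words."""
    context = {t.lower() for t in context_tokens} - stopwords
    best_sense, best_overlap = None, -1
    for sense, gloss in senses.items():
        gloss_tokens = set(gloss.lower().split()) - stopwords
        overlap = len(gloss_tokens & context)
        if overlap > best_overlap:  # ties keep the first sense listed
            best_sense, best_overlap = sense, overlap
    return best_sense

# Hypothetical sense inventory for the ambiguous word "bank".
BANK_SENSES = {
    "financial": "an institution that accepts deposits and lends money",
    "river": "sloping land beside a body of water such as a river",
}
STOPWORDS = frozenset({"a", "an", "the", "of", "and", "on", "that", "as", "such", "to"})

river_use = "she sat on the bank of the river and watched the water".split()
money_use = "he went to the bank to deposit money".split()

print(gloss_overlap_sense(BANK_SENSES, river_use, STOPWORDS))
print(gloss_overlap_sense(BANK_SENSES, money_use, STOPWORDS))
```

Note that "deposit" in the second context does not match "deposits" in the gloss; the sketch does no stemming, which is one reason plain gloss overlap is a weak baseline.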

Slide 7: Word Sense Disambiguation (WSD)
Accuracy:
- Word-specific
- Easy words: > 90%
- Hard words: 50~70%
Applications:
- Statistical machine translation
- Information retrieval

Slide 8: Lexical Acquisition
The task: to develop algorithms and statistical techniques for filling the holes in existing machine-readable dictionaries by looking at the occurrence patterns of words in large text corpora
Examples:
- Verb subcategorization
- Prepositional phrase attachment disambiguation
- Selectional preferences
- Semantic similarity

Slide 9: Semantic Similarity
The task: to acquire a relative measure of similarity between two words
Approaches:
- Vector space measures (document space, word space, modifier space, etc.)
- Probabilistic measures (KL-divergence, etc.)
Applications:
- Information retrieval (query expansion)
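Both families of measures on the Semantic Similarity slide can be sketched in a few lines. Cosine is used here as a representative vector space measure, which is an assumption since the slide does not name one; the co-occurrence counts are hypothetical, and the KL-divergence helper assumes q is nonzero wherever p is.

```python
import math

def cosine(u, v):
    """Cosine similarity between two sparse count vectors stored as dicts."""
    dot = sum(x * v.get(k, 0.0) for k, x in u.items())
    norm_u = math.sqrt(sum(x * x for x in u.values()))
    norm_v = math.sqrt(sum(x * x for x in v.values()))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)

def normalize(counts):
    """Turn raw counts into a probability distribution."""
    total = sum(counts.values())
    return {k: c / total for k, c in counts.items()}

def kl_divergence(p, q):
    """D(p || q) in bits. Assumes q[k] > 0 for every k with p[k] > 0."""
    return sum(pk * math.log2(pk / q[k]) for k, pk in p.items() if pk > 0)

# Hypothetical word-space vectors: counts of neighboring words.
car = {"drive": 4, "road": 3, "engine": 2}
truck = {"drive": 3, "road": 4, "load": 1}
banana = {"eat": 5, "peel": 2}

print(cosine(car, truck))   # shared contexts, so similarity is high
print(cosine(car, banana))  # no shared contexts, so similarity is zero

p = normalize({"drive": 2, "road": 2})
q = normalize({"drive": 1, "road": 3})
print(kl_divergence(p, q))
```

KL-divergence is a dissimilarity and is asymmetric, so D(p || q) generally differs from D(q || p); it is also undefined when q assigns zero probability to something p does not, which is why smoothed or symmetric variants are often preferred in practice.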

Slide 10: POS Tagging
The task: labeling each word in a sentence with its appropriate part of speech
Major approaches:
- HMM
- Transformation-based
Advantages: speed and storage
Other approaches:
- Neural networks, decision trees, memory-based learning, maximum entropy models
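For the HMM approach listed on the POS Tagging slide, decoding is usually done with the Viterbi algorithm. The following is a minimal sketch under stated assumptions: the three-tag tagset and all probabilities are hypothetical toy values rather than estimates from a corpus, and unseen events are floored at a small constant instead of being properly smoothed.

```python
import math

def viterbi(tokens, tags, start_p, trans_p, emit_p, floor=1e-12):
    """Return the most likely tag sequence for tokens under a first-order HMM."""
    if not tokens:
        return []

    def lp(x):
        # Log probability, with a floor so unseen events do not give log(0).
        return math.log(max(x, floor))

    # best[i][t]: log probability of the best tag path ending in tag t at position i.
    best = [{t: lp(start_p.get(t, 0)) + lp(emit_p[t].get(tokens[0], 0)) for t in tags}]
    back = [{}]
    for i in range(1, len(tokens)):
        best.append({})
        back.append({})
        for t in tags:
            prev, score = max(
                ((p, best[i - 1][p] + lp(trans_p[p].get(t, 0))) for p in tags),
                key=lambda pair: pair[1],
            )
            best[i][t] = score + lp(emit_p[t].get(tokens[i], 0))
            back[i][t] = prev

    # Follow back-pointers from the best final tag.
    path = [max(tags, key=lambda t: best[-1][t])]
    for i in range(len(tokens) - 1, 0, -1):
        path.append(back[i][path[-1]])
    return path[::-1]

# Hypothetical toy model.
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.3, "VERB": 0.6},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.4, "walk": 0.2, "park": 0.4},
    "VERB": {"walk": 0.5, "walks": 0.3, "barks": 0.2},
}

print(viterbi(["the", "dog", "barks"], TAGS, START, TRANS, EMIT))
print(viterbi(["the", "walk"], TAGS, START, TRANS, EMIT))
```

In the second call, "walk" is more likely as a verb in isolation under this toy model, but the preceding determiner makes the noun reading win, which is the kind of contextual disambiguation the tagger is for.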

Slide 11: POS Tagging
Accuracy:
- 95~97%
- Achieved only when the application text and the training text come from a similar source
Applications:
- For higher-level NLP tasks: partial parsing, parsing, NER, etc.
“…the best lexicalized probabilistic parsers are now good enough that they perform better starting with untagged text and doing the tagging themselves, rather than using a tagger as preprocessor.” (Charniak 1997)

Slide 12: (Syntactic) Parsing
The task: to find the most likely syntactic parse tree of a sentence
Approaches:
- Probabilistic context-free grammar (PCFG)
  - Supervised
  - Unsupervised
- Lexicalized models
- Dependency-based models
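To make "most likely parse tree" concrete for the PCFG approach: a PCFG assigns each tree the product of the probabilities of the rules used to build it, and the parser returns the tree with the highest such score. The sketch below only scores a single given tree; it does not search over trees. The grammar, its probabilities, and the nested-tuple tree encoding are all hypothetical.

```python
import math

# Hypothetical toy grammar: (left-hand side, right-hand side) -> probability.
# Probabilities for each left-hand side sum to 1.
RULES = {
    ("S", ("NP", "VP")): 1.0,
    ("NP", ("DT", "NN")): 0.7,
    ("NP", ("NN",)): 0.3,
    ("VP", ("VB", "NP")): 0.6,
    ("VP", ("VB",)): 0.4,
    ("DT", ("the",)): 1.0,
    ("NN", ("dog",)): 0.5,
    ("NN", ("cat",)): 0.5,
    ("VB", ("saw",)): 1.0,
}

def tree_log_prob(tree, rules):
    """Sum of log rule probabilities over every node of a tree.

    A tree is a tuple (label, child, ...); a child is either a subtree
    or a plain string for a word.
    """
    label, children = tree[0], tree[1:]
    rhs = tuple(c if isinstance(c, str) else c[0] for c in children)
    log_p = math.log(rules[(label, rhs)])
    for child in children:
        if not isinstance(child, str):
            log_p += tree_log_prob(child, rules)
    return log_p

tree = (
    "S",
    ("NP", ("DT", "the"), ("NN", "dog")),
    ("VP", ("VB", "saw"), ("NP", ("DT", "the"), ("NN", "cat"))),
)
tree_prob = math.exp(tree_log_prob(tree, RULES))
print(tree_prob)
```

A full parser would compare this score across all trees the grammar licenses for the sentence, typically with a dynamic program such as probabilistic CKY rather than by enumeration.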

Slide 13: (Syntactic) Parsing
Accuracy:
- Charniak 1997: recall 0.875, precision 0.874
- Collins 1997: recall 0.881, precision 0.886
Applications:
- For other NLP tasks such as semantic role labeling and relation extraction
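The slide does not say how the recall and precision figures were computed. Assuming they are labeled bracket scores in the PARSEVAL style, a minimal sketch of the computation looks like this. The constituent sets are hypothetical, duplicate brackets are ignored for simplicity, and the F1 helper is an addition for combining the two numbers, not something the slide reports.

```python
def bracket_scores(gold, predicted):
    """Precision and recall over labeled constituents (label, start, end)."""
    gold, predicted = set(gold), set(predicted)
    correct = len(gold & predicted)
    precision = correct / len(predicted) if predicted else 0.0
    recall = correct / len(gold) if gold else 0.0
    return precision, recall

def f1(precision, recall):
    """Harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical five-word sentence: the parser mislabels one constituent.
gold = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("NP", 3, 5)]
predicted = [("S", 0, 5), ("NP", 0, 2), ("VP", 2, 5), ("PP", 3, 5)]

precision, recall = bracket_scores(gold, predicted)
print(precision, recall)

# Combining the figures quoted on the slide.
print(round(f1(0.874, 0.875), 4))  # Charniak 1997
print(round(f1(0.886, 0.881), 4))  # Collins 1997
```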

Slide 14: Semantic Role Labeling
The task: to identify the predicate-argument structures in sentences
Approaches:
- Supervised learning
Accuracy:
- Best ~70% (CoNLL 04 shared task)
Applications:
- Information extraction
- Question answering

Slide 15: Textual Entailment
The task: given two text fragments, to recognize whether the meaning of one text is entailed by (can be inferred from) the other text
Approaches:
- Word overlap
- Statistical lexical relations
- Syntactic matching
- Logic inference
Accuracy:
- ~0.56, best ~0.60 (PASCAL Challenge 05)
Applications:
- Question answering
- Multi-document summarization
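The simplest approach on the Textual Entailment slide, word overlap, can be sketched as a coverage score: the fraction of hypothesis words that also appear in the text, compared against a threshold. The function names, the whitespace tokenization, the 0.75 threshold, and the example sentences are all assumptions for illustration, not details of any PASCAL Challenge system.

```python
def lexical_coverage(text, hypothesis):
    """Fraction of hypothesis tokens that also occur in the text."""
    text_tokens = set(text.lower().split())
    hyp_tokens = hypothesis.lower().split()
    if not hyp_tokens:
        return 0.0
    return sum(tok in text_tokens for tok in hyp_tokens) / len(hyp_tokens)

def overlap_entails(text, hypothesis, threshold=0.75):
    """Predict entailment when enough of the hypothesis is covered by the text."""
    return lexical_coverage(text, hypothesis) >= threshold

text = "a dog is sleeping on the red sofa"
print(overlap_entails(text, "a dog is sleeping"))  # fully covered
print(overlap_entails(text, "a cat is eating"))    # only half covered

# Word order is ignored, so role reversal is predicted as entailment.
print(overlap_entails("the dog bit the man", "the man bit the dog"))
```

The last call shows the method's blind spot: the hypothesis is fully covered even though the meaning is reversed, which helps explain why overlap-based systems sit close to chance on this task and why the slide also lists syntactic matching and logic inference.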

Slide 16: Tools
- Brill Tagger
- Charniak Parser
- Collins Parser
- MiniPar
- Semantic Parser
  - ASSERT Parser
  - CCG’s demo

Slide 17: Corpora
- WordNet
- Penn Treebank (Sample)
- PropBank
- FrameNet

Slide 18: Other Tasks
- Automatic Speech Recognition
- Natural Language Generation
- Automatic Summarization
- …

Slide 20: Recent Topics
- Unsupervised and semi-supervised approaches
  - Knowledge acquisition bottleneck
- Semantic role labeling
  - Improve the performance of SRL
  - Use the results for other tasks
- Relation extraction
- WSD
- Parsing
- Statistical machine translation
  - Word alignment

Slide 22: NLP Research Groups
- USC/ISI
- Stanford
- UPenn
- Johns Hopkins
- UIUC
- …
