Learning Extraction Patterns for Subjective Expressions 2007/10/09 DataMining Lab 안민영
Contents
- Background
- Learning and Bootstrapping Extraction Patterns for Subjectivity
- Experimental Results
- Conclusions
Background
- Goal of the work: classify individual sentences as subjective or objective.
- Goal of the research: use high-precision subjectivity classifiers to automatically identify subjective and objective sentences.
- Extraction patterns: typically use lexico-syntactic patterns to identify relevant information.
- Hypothesis: extraction patterns can represent subjective expressions that have noncompositional meanings.
- Example: "drives (someone) up the wall"
  - George drives me up the wall.
  - She drives me up the wall.
Learning and Bootstrapping Extraction Patterns for Subjectivity
- Bootstrapping process for subjectivity classification:
  - High-precision classifiers can be used to automatically identify subjective and objective sentences.
  - This data can be used as a training set to automatically learn extraction patterns.
  - The learned patterns can be used to grow the training set.
- Subjectivity clues are divided into strongly subjective and weakly subjective clues, using a combination of manual review and empirical results.
High-precision subjective classifier
- Classifies a sentence as subjective if it contains two or more strongly subjective clues.
- Test set: 2197 sentences, 59% subjective.
- Precision: 91.5%, Recall: 31.9%.
High-precision objective classifier
- Rather than looking for the presence of lexical items, it looks for their absence.
- Precision: 82.6%, Recall: 16.4%.
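The two rule-based classifiers can be sketched roughly as follows. This is a minimal illustration, not the authors' implementation: the clue lists are hypothetical placeholders, the "two or more" threshold is assumed to refer to strongly subjective clues, and the objective rule is simplified to "no clue of either kind appears in the sentence". Sentences matched by neither rule stay unlabeled, which is why recall is low while precision is high.

```python
from typing import List, Optional

# Hypothetical clue lists; the real lexicon is not given in the slides.
STRONG_CLUES = {"outrageous", "hate", "wonderful", "absurd"}
WEAK_CLUES = {"seems", "perhaps", "quite"}


def tokenize(sentence: str) -> List[str]:
    """Lowercase whitespace tokenizer that strips surrounding punctuation."""
    return [tok.strip(".,!?;:\"'").lower() for tok in sentence.split()]


def hp_subj(sentence: str) -> bool:
    """Subjective if the sentence contains two or more strong clues."""
    return sum(tok in STRONG_CLUES for tok in tokenize(sentence)) >= 2


def hp_obj(sentence: str) -> bool:
    """Objective if no clue of any kind is present (simplified assumption)."""
    return not any(
        tok in STRONG_CLUES or tok in WEAK_CLUES for tok in tokenize(sentence)
    )


def label(sentence: str) -> Optional[str]:
    """Return 'subjective', 'objective', or None when neither rule fires."""
    if hp_subj(sentence):
        return "subjective"
    if hp_obj(sentence):
        return "objective"
    return None
```

For example, `label("I hate this absurd plan.")` fires the subjective rule, while a sentence with a single clue such as "It seems wonderful." is left unlabeled.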
Learning and Bootstrapping Extraction Patterns for Subjectivity: Learning Subjective Extraction Patterns
- Goal: automatically learn extraction patterns that are associated with subjectivity.
- Uses a learning algorithm similar to AutoSlog-TS (Riloff, 1996).
Learning and Bootstrapping Extraction Patterns for Subjectivity: What is AutoSlog-TS (Riloff, 1996)?
- AutoSlog was the first system to learn a text extraction dictionary from training examples.
- AutoSlog-TS generates extraction patterns from untagged text.
- It requires two corpora: relevant and irrelevant.
- It is based on AutoSlog, which requires tagging.
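The role of the two corpora can be illustrated with a small counting sketch. It assumes that pattern instantiation has already been done by some parser (not shown), so each sentence is represented as a list of pattern strings; the function name `pattern_stats` and the example patterns are hypothetical. For each pattern it records how often it occurs overall, how often it occurs in the relevant (here: subjective) corpus, and the resulting estimate of Pr(relevant | pattern).

```python
from collections import Counter
from typing import Dict, List, Tuple

# Each sentence is a list of pattern strings already extracted by a parser.
Corpus = List[List[str]]


def pattern_stats(
    relevant: Corpus, irrelevant: Corpus
) -> Dict[str, Tuple[int, int, float]]:
    """Map each pattern to (total_freq, relevant_freq, Pr(relevant | pattern))."""
    rel_counts = Counter(p for sentence in relevant for p in sentence)
    irr_counts = Counter(p for sentence in irrelevant for p in sentence)
    stats = {}
    for pattern in set(rel_counts) | set(irr_counts):
        rel = rel_counts[pattern]
        total = rel + irr_counts[pattern]
        stats[pattern] = (total, rel, rel / total)
    return stats


# Toy corpora with made-up pattern instances.
relevant = [
    ["<subj> was asked", "drives <dobj> up the wall"],
    ["drives <dobj> up the wall"],
]
irrelevant = [["<subj> was asked"], ["<subj> was asked"]]
stats = pattern_stats(relevant, irrelevant)
```

In this toy data, "drives <dobj> up the wall" occurs only in the relevant corpus, so its estimated probability is 1.0, while "<subj> was asked" occurs mostly in the irrelevant corpus.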
Learning and Bootstrapping Extraction Patterns for Subjectivity: For This Work
- We want a fully automatic process that does not depend on a human reviewer.
- We were most interested in finding patterns that can identify subjective expressions with high precision.
- (noun) fact = subjective expression
Experimental Results
- Subjectivity data: English-language versions of foreign news documents from FBIS, the U.S. Foreign Broadcast Information Service.
- Evaluation of the learned patterns:
  - Pool of unannotated texts: individual sentences.
  - Evaluated 18 different subsets of the patterns, selected as the patterns that pass certain thresholds.
  - The extraction patterns perform quite well: precision ranges from 71% to 85%.
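Selecting pattern subsets by threshold and measuring their precision could look roughly like the sketch below. The slides do not state which thresholds were used, so the frequency and probability cut-offs, the pattern names, and the tiny test set are all made-up assumptions; precision is computed here at the sentence level (the fraction of sentences matched by a selected pattern that are actually subjective), which may differ from how the original evaluation counted.

```python
from typing import Dict, List, Optional, Set, Tuple

# pattern -> (frequency, estimated Pr(subjective | pattern)); made-up numbers.
PatternStats = Dict[str, Tuple[int, float]]


def select_patterns(stats: PatternStats, min_freq: int, min_prob: float) -> Set[str]:
    """Keep the patterns that pass both thresholds."""
    return {
        p for p, (freq, prob) in stats.items()
        if freq >= min_freq and prob >= min_prob
    }


def precision(
    selected: Set[str], test_set: List[Tuple[List[str], bool]]
) -> Optional[float]:
    """Fraction of matched sentences that are subjective; None if nothing matched."""
    matched = [is_subj for patterns, is_subj in test_set if selected & set(patterns)]
    return sum(matched) / len(matched) if matched else None


stats = {"a": (10, 0.9), "b": (3, 1.0), "c": (20, 0.5)}
test_set = [(["a"], True), (["b"], False), (["c"], True), ([], False)]

# Evaluate a small grid of hypothetical thresholds.
results = {
    (min_freq, min_prob): precision(
        select_patterns(stats, min_freq, min_prob), test_set
    )
    for min_freq in (2, 5)
    for min_prob in (0.8, 1.0)
}
```

Stricter thresholds keep fewer patterns, so a grid like this trades the number of matched sentences against precision.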
Experimental Results: Evaluation of the Bootstrapping Process
- Use the learned extraction patterns to classify previously unlabeled sentences.
- The bootstrapping process does not learn new objective sentences.
- We did not want to simply add the new subjective sentences.
- Modify the HP-Subj classifier to use extraction patterns:
  - contains two or more learned patterns
  - contains one of the clues used by the original classifier
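One possible reading of the modified HP-Subj rule is sketched below. The slide lists the two conditions without saying how they combine, so this sketch makes two assumptions that should be checked against the paper: the original rule (two or more strong clues) is kept, and a single original clue only counts when at least one learned pattern is also present. The function name and the example clue and pattern names are hypothetical.

```python
from typing import List, Set


def hp_subj_modified(
    tokens: List[str],
    sentence_patterns: List[str],
    strong_clues: Set[str],
    learned_patterns: Set[str],
) -> bool:
    """Label a sentence subjective using both clues and learned patterns."""
    n_clues = sum(tok in strong_clues for tok in tokens)
    n_patterns = sum(p in learned_patterns for p in sentence_patterns)
    if n_clues >= 2:
        # Original HP-Subj rule (assumed to remain in force).
        return True
    if n_patterns >= 2:
        # First condition on the slide: two or more learned patterns.
        return True
    # Assumed reading of the second condition: one clue plus one pattern.
    return n_clues >= 1 and n_patterns >= 1


clues = {"hate", "absurd"}
learned = {"p1", "p2"}
```

With these toy sets, `hp_subj_modified(["hate"], ["p1"], clues, learned)` is true, while a sentence with one clue and no learned pattern is not labeled subjective.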
Conclusions
- High-precision subjectivity classification can be used to generate a large amount of labeled training data.
- An extraction pattern learning technique can learn subjective expressions that are linguistically richer than individual words or fixed phrases.
- The original high-precision subjective classifier was augmented with these newly learned extraction patterns.