1
On Learning Parsimonious Models for Extracting Consumer Opinions
International Conference on System Sciences 2005
Xue Bai and Rema Padman
The John Heinz III School of Public Policy and Management and Center for Automated Learning and Discovery, Carnegie Mellon University
{xbai,rpadman}@andrew.cmu.edu
Edoardo Airoldi
Data Privacy Laboratory, School of Computer Science, Carnegie Mellon University
eairoldi@cs.cmu.edu
2
Introduction
Past text categorization solutions yield cross-validated accuracies and areas under the curve (AUC) in the low 80%s when ported to sentiment extraction.
The reason for the failure: the features used in classification (e.g., the words) are assumed to be pairwise independent.
We propose an algorithm that captures dependencies among words and finds a minimal vocabulary sufficient for categorization purposes.
Our two-stage Markov Blanket Classifier (MBC):
1st stage: learns conditional dependencies among the words and encodes them into a Markov Blanket directed acyclic graph (MB DAG).
2nd stage: uses Tabu Search (TS) to fine-tune the MB DAG.
3
Related Work (1/2)
Hatzivassiloglou and McKeown built a log-linear model to predict the semantic orientation of conjoined adjectives.
Das and Chen:
manually constructed a lexicon and grammar rules;
categorized stock news as buy, sell, or neutral;
used five classifiers and various voting schemes to achieve an accuracy of 62%.
Turney and Littman proposed a semi-supervised method to learn the polarity of adjectives, starting from a small set of adjectives of known polarity. Turney used this method to predict the opinions of consumers about various objects (movies, cars, banks) and achieved accuracies between 66% and 84%.
4
Related Work (2/2)
Pang et al. used off-the-shelf classification methods on frequent, non-contextual words in combination with various heuristics and annotators, and achieved a maximum cross-validated accuracy of 82.9% on data from IMDb.
Dave et al. categorized positive versus negative movie reviews using SVMs, and achieved an accuracy of at most 88.9% on data from Amazon and CNET.
Liu et al. proposed a framework to categorize emotions based on a large dictionary of common-sense knowledge and on linguistic models.
5
Bayesian Networks (1/2)
A Bayesian network for a set of variables X = {X1, ..., Xn} consists of:
a network structure S that encodes a set of conditional independence assertions among the variables in X;
a set P = {p1, ..., pn} of local conditional probability distributions, one associated with each node and its parents.
6
Bayesian Networks (2/2)
P satisfies the Markov condition for S if every node Xi in S is independent of its non-descendants, conditional on its parents (pa_i denotes the set of parents of Xi).
p(Y, X1, ..., X6) = C · p(Y | X1) · p(X2 | X1) · p(X3 | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y) · p(X6 | X4)
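The factorization on this slide can be sketched in Python. This is a minimal illustration, not the paper's implementation: the DAG structure follows the slide's example (X1 a root; Y, X2, X3 children of X1; X4 a child of X2 and Y; X5 a child of X3, X4, Y; X6 a child of X4), but the CPT numbers are invented.

```python
from itertools import product

# Structure S: node -> parent tuple (the slide's example DAG).
parents = {
    "X1": (), "X2": ("X1",), "X3": ("X1",), "Y": ("X1",),
    "X4": ("X2", "Y"), "X5": ("X3", "X4", "Y"), "X6": ("X4",),
}

# Toy CPTs for binary variables: p(node = 1 | parent assignment).
p1 = {
    "X1": {(): 0.6},
    "X2": {(0,): 0.2, (1,): 0.7},
    "X3": {(0,): 0.5, (1,): 0.4},
    "Y":  {(0,): 0.3, (1,): 0.8},
    "X4": {pa: 0.1 + 0.4 * sum(pa) for pa in product((0, 1), repeat=2)},
    "X5": {pa: 0.2 + 0.2 * sum(pa) for pa in product((0, 1), repeat=3)},
    "X6": {(0,): 0.9, (1,): 0.3},
}

def joint(assignment):
    """p(assignment) = product over i of p(Xi | pa_i) -- the Markov factorization."""
    prob = 1.0
    for node, pas in parents.items():
        pa_vals = tuple(assignment[p] for p in pas)
        p_one = p1[node][pa_vals]
        prob *= p_one if assignment[node] == 1 else 1.0 - p_one
    return prob

# Because each factor is a proper conditional distribution,
# the joint sums to 1 over all 2^7 assignments.
total = sum(joint(dict(zip(parents, vals)))
            for vals in product((0, 1), repeat=len(parents)))
```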
7
Markov Blanket (1/2)
Given the set of variables X, a target variable Y, and a Bayesian network (S, P), the Markov Blanket for Y consists of:
pa_Y, the set of parents of Y;
ch_Y, the set of children of Y;
pa_chY, the set of parents of children of Y.
p(Y | X1, ..., X6) = C' · p(Y | X1) · p(X4 | X2, Y) · p(X5 | X3, X4, Y)
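Reading a Markov blanket off a structure is mechanical: union the parents, children, and parents of children. A small sketch, assuming the slide's example DAG (node names match the slide; the parent sets below encode that structure):

```python
# Structure S for the slide's example DAG: node -> parent tuple.
parents = {
    "X1": (), "X2": ("X1",), "X3": ("X1",), "Y": ("X1",),
    "X4": ("X2", "Y"), "X5": ("X3", "X4", "Y"), "X6": ("X4",),
}

def markov_blanket(target, parents):
    pa = set(parents[target])                                 # pa_Y
    ch = {n for n, ps in parents.items() if target in ps}     # ch_Y
    spouses = {p for c in ch for p in parents[c]} - {target}  # pa_chY
    return pa | ch | spouses
```

For this DAG, markov_blanket("Y", parents) returns {X1, X2, X3, X4, X5}: exactly the variables appearing in the conditional p(Y | X1, ..., X6), with X6 dropped.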
8
Markov Blanket (2/2)
Different MB DAGs that entail the same factorization of p(Y | X1, ..., X6) belong to the same Markov equivalence class. Our algorithm searches the space of Markov equivalence classes, rather than that of DAGs, thus boosting its efficiency.
9
Tabu Search
Tabu Search (TS) is a powerful meta-heuristic that helps local search heuristics explore the space of solutions by guiding them out of local optima.
The basic Tabu Search starts with a feasible solution and iteratively chooses the best move according to a specified evaluation function, keeping a tabu list of restrictions on possible moves, updated at each step, that discourages the repetition of selected moves.
10
MB Classifier 1st Stage: Learning Initial Dependencies (1/2)
1. Search for the potential parents and children (L_Y) of Y, and the potential parents and children (∪i L_Xi) of the nodes Xi ∈ L_Y. This search is performed by two recursive calls to the function findAdjacencies(Y).
2. Orient the edges using six edge-orientation rules.
3. Prune the remaining undirected edges and bi-directed edges to avoid cycles.
11
MB Classifier 1st Stage: Learning Initial Dependencies (2/2)
findAdjacencies(Node Y, Node List L, Depth d, Significance α)
1. A_Y := {Xi ∈ L : Xi is dependent on Y at level α}
2. for Xi ∈ A_Y and for all distinct subsets S ⊆ {A_Y \ Xi} of size at most d
   2.1. if Xi is independent of Y given S at level α
   2.2. then remove Xi from A_Y
3. for Xi ∈ A_Y
   3.1. A_Xi := {Xj ∈ L : Xj is dependent on Xi at level α, j ≠ i}
   3.2. for all distinct subsets S ⊆ {A_Xi} of size at most d
      3.2.1. if Xi is independent of Y given S at level α
      3.2.2. then remove Xi from A_Y
4. return A_Y
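The pseudocode above can be sketched in Python. This is a sketch under assumptions, not the paper's code: the conditional-independence test indep(x, y, given, alpha) is a hypothetical callable supplied by the caller (in practice a chi-square or G2 test on the word counts), and subsets are taken up to size d.

```python
from itertools import combinations

def find_adjacencies(y, candidates, depth, alpha, indep):
    # Step 1: keep candidates that are marginally dependent on y at level alpha.
    adj_y = [x for x in candidates if not indep(x, y, (), alpha)]

    # Step 2: drop x if some conditioning set S of size <= depth,
    # drawn from adj_y \ {x}, renders x independent of y.
    for x in list(adj_y):
        others = [z for z in adj_y if z != x]
        for d in range(1, depth + 1):
            if any(indep(x, y, s, alpha) for s in combinations(others, d)):
                adj_y.remove(x)
                break

    # Step 3: drop x if it becomes independent of y conditional on a
    # subset of x's own neighbours (catches remaining false positives).
    for x in list(adj_y):
        adj_x = [z for z in candidates
                 if z != x and not indep(z, x, (), alpha)]
        for d in range(1, depth + 1):
            if any(indep(x, y, s, alpha) for s in combinations(adj_x, d)):
                adj_y.remove(x)
                break

    return adj_y
```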
12
MB Classifier 2nd Stage: Tabu Search Optimization
Four kinds of moves are allowed: edge addition, edge deletion, edge reversal, and edge reversal with node pruning.
The best admissible move is always selected.
The tabu list keeps a record of the m previous moves.
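A generic Tabu Search skeleton following these rules can be sketched as below. This is an illustration, not the paper's optimizer: candidate_moves, apply_move, and score are assumed to be supplied by the caller (for the MBC, the moves would be the four edge operations above and score the evaluation function on the MB DAG).

```python
def tabu_search(initial, candidate_moves, apply_move, score,
                tabu_size=10, iterations=100):
    current, best = initial, initial
    best_score = score(initial)
    tabu = []  # record of the m most recent moves, discouraged from repetition
    for _ in range(iterations):
        moves = [m for m in candidate_moves(current) if m not in tabu]
        if not moves:
            break
        # Always select the best admissible move -- even a worsening one,
        # which is what lets the search escape local optima.
        move = max(moves, key=lambda m: score(apply_move(current, m)))
        current = apply_move(current, move)
        tabu.append(move)
        if len(tabu) > tabu_size:
            tabu.pop(0)  # keep only the m most recent moves
        if score(current) > best_score:
            best, best_score = current, score(current)
    return best
```

On a toy problem (maximize -(x - 7)^2 over the integers, with moves to x + 1 or x - 1), the skeleton walks to x = 7 and records it as the best solution even though the search itself keeps moving afterwards.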
13
Experiment 1 (1/3)
Data set: 700 positive and 700 negative movie reviews from the rec.arts.movies.reviews newsgroup, archived at the Internet Movie Database (IMDb).
We considered all words that appeared in more than 8 documents as input features (7716 words).
5-fold cross-validation.
14
Experiment 1 (2/3)
15
Experiment 1 (3/3)
16
Experiment 2 (1/2)
Data sets:
Mergers and acquisitions (M&A, 1000 documents)
Finance (1000 documents)
Mixed news (3 topics × 1000 documents)
We consider three sentiments: positive, neutral, and negative.
3-fold cross-validation.
17
Experiment 2 (2/2)
18
Conclusions
We believe that in order to capture sentiments we have to go beyond the search for richer feature sets and beyond the independence assumption.
A robust model is obtained by encoding dependencies among words, and by actively searching for a better dependency structure using heuristic and optimal strategies.
The MBC achieves these goals.