Text classification: In Search of a Representation
Stan Matwin
School of Information Technology and Engineering, University of Ottawa
Outline
- Supervised learning = classification
- ML/DM at U of O
- Classical approach
- Attempt at a linguistic representation
- N-grams – how to get them?
- Labelling and co-learning
- Next steps?…
Supervised learning (classification)
Given:
- a set of training instances T = {<e, t>}, where e is an example and t is its class label: one of the classes C1, …, Ck
- a concept with k classes C1, …, Ck (but the definition of the concept is NOT known)
Find:
- a description for each class which will perform well in determining (predicting) class membership for unseen instances
Classification
- Prevalent practice: examples are represented as vectors of attribute values
- Theoretical wisdom, confirmed empirically: the more examples, the better the predictive accuracy
ML/DM at U of O
- Learning from imbalanced classes: applications in remote sensing
- A relational, rather than propositional, representation: learning the maintainability concept
- Learning in the presence of background knowledge
- Bayesian belief networks and how to get them; application to distributed databases
Why text classification?
- Automatic file saving
- Internet filters
- Recommenders
- Information extraction
- …
Bag of words
Text classification: the standard approach
1. Remove stop words and markings
2. The remaining words are all attributes
3. A document becomes a vector
4. Train a boolean classifier for each class
5. Evaluate the results on an unseen sample
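A minimal sketch of this pipeline, using scikit-learn and toy documents purely for illustration (the experiments described in these slides used RIPPER and Naive Bayes on the Reuters data, not this library):

# Bag-of-words pipeline sketch (illustrative toy data and library choice).
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.multiclass import OneVsRestClassifier

train_docs = ["grain exports rose sharply this year",
              "the central bank cut interest rates",
              "wheat and corn harvests were strong",
              "currency markets reacted to the rate cut"]
train_labels = ["grain", "money-fx", "grain", "money-fx"]

# Steps 1-3: drop stop words, keep the remaining words as binary attributes, vectorize each document
vectorizer = CountVectorizer(stop_words="english", binary=True)
X_train = vectorizer.fit_transform(train_docs)

# Step 4: one boolean (one-vs-rest) classifier per class
clf = OneVsRestClassifier(MultinomialNB()).fit(X_train, train_labels)

# Step 5: evaluate on unseen documents
test_docs = ["corn prices fell", "the bank raised rates"]
print(clf.predict(vectorizer.transform(test_docs)))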
Text classification: tools
RIPPER
- A “covering” learner
- Works well with large sets of binary features
Naïve Bayes
- Efficient (no search)
- Simple to program
- Gives a “degree of belief”
“Prior art”
- Yang: best results using k-NN: 82.3% micro-averaged accuracy
- Joachims’ results using Support Vector Machines + unlabelled data
- SVMs are insensitive to high dimensionality and to sparseness of examples
SVM in text classification
- SVM: maximum separation (margin)
- Transductive SVM: margin for the test set
- Training with 17 examples in the 10 most frequent categories gives test performance of 60% on test cases available during training
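For reference (standard formulation, not specific to the cited experiments), the soft-margin SVM seeks the maximum-margin separating hyperplane:

\min_{w,b,\xi} \; \tfrac{1}{2}\|w\|^2 + C \sum_i \xi_i
\quad \text{s.t.} \quad y_i\,(w \cdot x_i + b) \ge 1 - \xi_i, \;\; \xi_i \ge 0

A transductive SVM additionally chooses labels for the unlabelled (test) examples so that they, too, lie outside the margin.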
Problem 1: aggressive feature selection
Problem 2: semantic relationships are missed
Proposed solution (Sam Scott)
- Get noun phrases and/or key phrases (Extractor) and add them to the feature list
- Add hypernyms
Hypernyms - WordNet
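A small illustration of pulling hypernyms from WordNet to extend the feature list, assuming NLTK's WordNet interface; the sense-selection strategy (first noun sense) and the depth limit are assumptions, not details taken from the original system:

# Add WordNet hypernyms of a word as extra features (sketch using NLTK's WordNet API).
from nltk.corpus import wordnet as wn   # requires: nltk.download('wordnet')

def hypernym_features(word, depth=2):
    """Return hypernym lemma names up to `depth` levels above the word's first noun sense."""
    features = set()
    synsets = wn.synsets(word, pos=wn.NOUN)
    if not synsets:
        return features
    frontier = [synsets[0]]
    for _ in range(depth):
        frontier = [h for s in frontier for h in s.hypernyms()]
        features.update(lemma.name() for s in frontier for lemma in s.lemmas())
    return features

print(hypernym_features("dog"))   # e.g. {'canine', 'domestic_animal', 'carnivore', ...}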
Evaluation (Lewis)
- Vary the “loss ratio” parameter
- For each parameter value:
  - Learn a hypothesis for each class (binary classification)
  - Micro-average the confusion matrices (add component-wise)
  - Compute precision and recall
- Interpolate (or extrapolate) to find the point where micro-averaged precision and recall are equal
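A sketch of this evaluation scheme; the function names and the linear-interpolation details are illustrative, not taken from Lewis's procedure verbatim:

# Micro-averaged precision/recall over per-class binary confusion matrices,
# and linear interpolation to the point where precision equals recall.
def micro_pr(confusions):
    """confusions: list of (tp, fp, fn) tuples, one per class, added component-wise."""
    tp = sum(c[0] for c in confusions)
    fp = sum(c[1] for c in confusions)
    fn = sum(c[2] for c in confusions)
    return tp / (tp + fp), tp / (tp + fn)

def break_even(points):
    """points: [(precision, recall)] measured at different loss-ratio settings.
    Returns the interpolated value where micro-averaged precision equals recall."""
    points = sorted(points, key=lambda pr: pr[0] - pr[1])
    for (p1, r1), (p2, r2) in zip(points, points[1:]):
        d1, d2 = p1 - r1, p2 - r2
        if d1 <= 0 <= d2:
            if d2 == d1:
                return p1
            t = -d1 / (d2 - d1)
            return p1 + t * (p2 - p1)   # precision (= recall) at the crossover
    return None   # no crossover among the measured points; extrapolation would be needed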
Results
- No gain over bag of words (BW) in the alternative representations
- But… comprehensibility…
Combining classifiers
- Comparable to best known results (Yang)
Other possibilities
- Use hypernyms with a small training set (avoids ambiguous words)
- Use Bayes + RIPPER in a cascade scheme (Gama)
- Other representations:
Collocations
- Do not need to be noun phrases, just pairs of words possibly separated by stop words
- Only the well-discriminating ones are chosen
- These are added to the bag of words, and… RIPPER
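An illustrative sketch of collocation extraction along these lines; the gap limit and the discrimination score (a simple frequency ratio here) are assumptions, not the criterion actually used:

# Collocation sketch: collect word pairs that may be separated by stop words,
# then keep only the pairs that discriminate between the positive and negative class.
from collections import Counter

STOP = {"the", "a", "of", "in", "to", "and"}

def pairs(doc, max_gap=2):
    words = [w.lower() for w in doc.split()]
    content = [(i, w) for i, w in enumerate(words) if w not in STOP]
    return {(w1, w2) for (i, w1), (j, w2) in zip(content, content[1:]) if j - i <= max_gap + 1}

def discriminating_pairs(pos_docs, neg_docs, min_ratio=3.0):
    pos, neg = Counter(), Counter()
    for d in pos_docs: pos.update(pairs(d))
    for d in neg_docs: neg.update(pairs(d))
    return [p for p in pos if pos[p] >= min_ratio * (neg[p] + 1)]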
N-grams
- N-grams are substrings of a given length
- Good results on Reuters [Mladenic, Grobelnik] with Bayes; we try RIPPER
- A different task: classifying text files
  - Attachments
  - Audio/video
  - Coded
- From n-grams to relational features
How to get good n-grams?
- We use Ziv-Lempel for frequent substring detection (.gz!)
- [Figure: incremental parse of the string abababa]
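A sketch of the LZ78-style incremental parse this slide alludes to (the actual system's implementation is not shown in the slides):

# LZ78-style incremental parse: each new phrase is the longest previously seen phrase
# plus one character; the resulting dictionary yields candidate substrings (n-grams).
def lz78_phrases(text):
    phrases, current = set(), ""
    for ch in text:
        current += ch
        if current not in phrases:
            phrases.add(current)
            current = ""
    return phrases

print(sorted(lz78_phrases("abababa")))   # ['a', 'ab', 'aba', 'b']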
N-grams
- Counting
- Pruning: substring occurrence ratio < acceptance threshold
- Building relations: string A almost always precedes string B
- Feeding into a relational learner (FOIL)
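A sketch of the pruning and precedence-relation steps; the thresholds and the document-level definition of "precedes" are illustrative assumptions:

# Prune candidate substrings whose occurrence ratio falls below an acceptance threshold,
# then record an "A precedes B" relation when A appears before B in (almost) all
# documents containing both; such relations can be fed to a relational learner.
def prune(substrings, docs, threshold=0.05):
    n = len(docs)
    return [s for s in substrings if sum(s in d for d in docs) / n >= threshold]

def precedes(a, b, docs, confidence=0.9):
    both = [d for d in docs if a in d and b in d]
    if not both:
        return False
    return sum(d.index(a) < d.index(b) for d in both) / len(both) >= confidence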
Using grammar induction (text files)
- Idea: detect patterns of substrings
- Patterns are regular languages
- Methods of automata induction: a recognizer for each class of files
- We use a modified version of RPNI2 [Dupont, Miclet]
What’s new…
- Work with marked-up text (Word, Web)
- XML with semantic tags: a mixed blessing for DM/TM
- Co-learning
- Text mining
Co-learning
- How to use unlabelled data? Or: how to limit the number of examples that need to be labelled?
- Two classifiers and two redundantly sufficient representations
- Train both, run both on the unlabelled (test) set, add the best predictions to the training set
Co-learning
- The training set grows as each learner predicts independently, thanks to redundant sufficiency (different representations)
- Would it also work with our learners if we used Bayes?
- Would it work for classifying e-mails?
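A sketch of the co-learning loop described above (often called co-training), assuming two dense feature views and scikit-learn style classifiers; the number of predictions added per round and the confidence measure are assumptions:

# Co-training sketch: two classifiers, each trained on its own (redundantly sufficient) view,
# repeatedly label the unlabelled pool and add their most confident predictions to the
# shared training set.  Assumes dense numpy feature matrices and predict_proba support.
import numpy as np

def co_train(clf1, clf2, X1_lab, X2_lab, y_lab, X1_unl, X2_unl, rounds=10, per_round=5):
    X1, X2, y = X1_lab.copy(), X2_lab.copy(), list(y_lab)
    pool = list(range(len(X1_unl)))                          # indices of still-unlabelled examples
    for _ in range(rounds):
        clf1.fit(X1, y)
        clf2.fit(X2, y)
        for clf, Xu in ((clf1, X1_unl), (clf2, X2_unl)):
            if not pool:
                break
            proba = clf.predict_proba(Xu[pool])
            best = np.argsort(proba.max(axis=1))[-per_round:]   # most confident predictions
            for i in best:
                idx = pool[i]
                X1 = np.vstack([X1, X1_unl[idx]])
                X2 = np.vstack([X2, X2_unl[idx]])
                y.append(clf.classes_[proba[i].argmax()])
            pool = [p for j, p in enumerate(pool) if j not in set(best)]
        if not pool:
            break
    return clf1.fit(X1, y), clf2.fit(X2, y)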
Co-learning
- Mitchell experimented with the task of classifying web pages (profs, students, courses, projects) – a supervised learning task
- Used: anchor text; page contents
- Error rate halved (from 11% to 5%)
Cog-sci?
- Co-learning seems to be cognitively justified
- Model: students learning in groups (pairs)
- What other social learning mechanisms could provide models for supervised learning?
Conclusion
- A practical task that needs a solution
- No satisfactory solution so far
- Fruitful ground for research