Text Classification With Support Vector Machines


Presenter: Aleksandar Milisic
Supervisor: Dr. David Albrecht

Overview
- Text Classification – What and Why?
- Text Clustering
- Support Vector Machines
- Current Techniques
- Project Aim and Plan

Text Classification – What and Why?
- Text classification: assigning documents to predefined classes (categories).
- Example: web pages can be assigned to “politics”, “sport”, “business”, “entertainment”, etc.
- There are thousands of categories associated with web pages.
- Labeling manually is time-consuming and sometimes impossible, so the process needs to be automated.

Text Classification – What and Why?
- Automated text classifiers need to be able to learn from:
  - a small set of labeled documents
  - a large set of unlabeled documents
- Otherwise a lot of labeling would have to be done by humans.
- So how is it done?

Representing Text
[Figure: the excerpt “…With paperless offices becoming more common, companies start using document databases with classification schemes…” is mapped to a feature vector with one numeric entry per word (Companies, Document, Distance, Offices, Unix, Match, ...).]
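As a rough illustration of the mapping from text to a feature vector, the following Python sketch counts word occurrences against a fixed vocabulary. The vocabulary order, the tokenization rule, and the function name `feature_vector` are assumptions made for this example; the slides do not say how the vector entries were computed.

```python
import re
from collections import Counter

def feature_vector(text, vocabulary):
    """Count how often each vocabulary word occurs in the text."""
    # Assumed tokenization: lowercase, runs of letters only.
    tokens = re.findall(r"[a-z]+", text.lower())
    counts = Counter(tokens)
    return [counts[word] for word in vocabulary]

# Vocabulary taken from the words shown on the slide (order is an assumption).
vocabulary = ["companies", "document", "distance", "offices", "unix", "match"]
text = ("With paperless offices becoming more common, companies start using "
        "document databases with classification schemes")

vector = feature_vector(text, vocabulary)
print(vector)
```

Only the short excerpt is counted here, so the numbers need not agree with those on the slide, which presumably come from the full document.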

Clustering Feature Vectors
[Figure: feature vectors of labeled documents and unlabeled documents, grouped into clusters.]

Support Vector Machines (SVM)
- Binary classifiers
- Maximize the distance between two classes (find the Optimal Separating Hyperplane, OSH)
- Support vectors are the training points closest to the OSH
[Figure: the OSH separating Class 1 from Not Class 1, with the support vectors marked.]
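The geometry behind these points can be sketched in a few lines of Python. This example does not train an SVM. It takes a hypothetical 2-D training set and a hyperplane chosen by hand to be the maximum-margin separator for that set, then computes the margin and picks out the support vectors. All data values and names are assumptions made for illustration.

```python
import math

def signed_distance(w, b, x):
    """Signed distance from point x to the hyperplane w.x + b = 0."""
    norm = math.sqrt(sum(wi * wi for wi in w))
    return (sum(wi * xi for wi, xi in zip(w, x)) + b) / norm

# Hypothetical training set: (point, label); +1 = Class 1, -1 = Not Class 1.
training = [((1.0, 1.0), 1), ((3.0, 2.0), 1), ((2.0, 4.0), 1),
            ((-1.0, -1.0), -1), ((-2.0, -3.0), -1), ((-4.0, -1.0), -1)]

# Assumed hyperplane x1 + x2 = 0. It bisects the closest opposite-class pair,
# (1, 1) and (-1, -1), so no separating hyperplane has a larger margin here.
w, b = (1.0, 1.0), 0.0

# Distance of each point to the hyperplane, positive when correctly classified.
distances = [label * signed_distance(w, b, x) for x, label in training]
margin = min(distances)

# Support vectors are the points that lie exactly at the margin.
support_vectors = [x for (x, _), d in zip(training, distances)
                   if math.isclose(d, margin)]
print(margin, support_vectors)
```

An actual SVM would find `w` and `b` by solving an optimization problem; here they are fixed so that the roles of the margin and the support vectors are easy to see.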

Current Techniques
Clustering Methods
- Rasmussen’s Single Pass Algorithm (as described by Raskutti et al. (2002))
- Reallocation Method
- Hierarchical Methods
Classification Methods
- Support Vector Machines
- Co-Training Algorithm (Blum and Mitchell, 1998)
Raskutti et al. (2002) describe an interesting approach: combining SVMs with Rasmussen’s clustering algorithm.
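For orientation, here is a minimal Python sketch of a generic single-pass clustering scheme: each document is compared with the existing cluster centroids and either joins the most similar cluster or starts a new one. The cosine similarity measure, the threshold value, and the running-mean centroid update are assumptions for this example and may differ from the variant described by Raskutti et al. (2002).

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def single_pass_cluster(vectors, threshold):
    """Assign each vector, in order, to the most similar cluster or a new one."""
    centroids, members = [], []
    for i, v in enumerate(vectors):
        best, best_sim = None, threshold
        for c, centroid in enumerate(centroids):
            sim = cosine(v, centroid)
            if sim >= best_sim:
                best, best_sim = c, sim
        if best is None:
            # Nothing is similar enough: start a new cluster.
            centroids.append(list(v))
            members.append([i])
        else:
            # Join the cluster and update its centroid as a running mean.
            members[best].append(i)
            n = len(members[best])
            centroids[best] = [(cv * (n - 1) + x) / n
                               for cv, x in zip(centroids[best], v)]
    return members

# Hypothetical feature vectors for five documents.
docs = [[1, 0, 0], [0.9, 0.1, 0], [0, 1, 0], [0, 0.8, 0.2], [1, 0.1, 0]]
clusters = single_pass_cluster(docs, threshold=0.8)
print(clusters)
```

Because each document is seen only once, the result depends on the order of the documents and on the chosen threshold.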

Combining SVM With Clustering
[Figure labels: Added Features; Labeled documents (Class 1); Labeled documents (Not Class 1); Unlabeled documents; Support Vectors; Separating Hyperplane.]
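The slide only names the idea of added features, so the following Python sketch is one plausible reading rather than the method of Raskutti et al. (2002): each document vector is extended with its similarity to every cluster centroid before it is passed to the SVM. The choice of cosine similarity as the added feature and the function name `augment` are assumptions.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is zero)."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0

def augment(vector, centroids):
    """Append one cluster-derived feature per centroid to the document vector."""
    return list(vector) + [cosine(vector, c) for c in centroids]

# Hypothetical centroids, e.g. from clustering labeled and unlabeled documents.
centroids = [[1.0, 0.0], [0.0, 1.0]]
augmented = augment([3.0, 0.0], centroids)
print(augmented)
```

The clusters can be built from unlabeled documents as well, so the added features give the classifier a way to benefit from data that has no labels.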

Project Aim
Resolve the following issues:
- Can combining SVMs with other techniques improve performance?
- Documents have thousands of features: can different feature representation (selection) techniques improve performance without affecting accuracy?
- Documents can belong to multiple classes, but SVMs are binary classifiers.
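One common way to apply binary classifiers to documents with several labels is a one-vs-rest decomposition: one binary problem per class, with a document counted as positive if it carries that class. The slides do not say which approach the project would take, so the Python sketch below is only an illustration, with hypothetical class names and documents.

```python
def one_vs_rest_labels(doc_labels, classes):
    """Build one +1/-1 label list per class from each document's label set."""
    return {c: [1 if c in labels else -1 for labels in doc_labels]
            for c in classes}

def predict_classes(scores):
    """Return every class whose binary classifier gives a positive score."""
    return sorted(c for c, s in scores.items() if s > 0)

# Hypothetical label sets for three documents.
doc_labels = [{"politics"}, {"sport", "business"}, {"business"}]
classes = ["politics", "sport", "business"]

binary_problems = one_vs_rest_labels(doc_labels, classes)
print(binary_problems)

# Hypothetical decision values from three trained binary classifiers.
predicted = predict_classes({"politics": -0.4, "sport": 0.7, "business": 1.2})
print(predicted)
```

Each label list in `binary_problems` would be used to train a separate binary SVM, and a new document receives every class whose classifier scores it as positive.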

Project Plan
- Currently implementing the clustering technique described in Raskutti et al. (2002)
- Plan to implement other clustering techniques
- Investigate different feature representation (selection) techniques
  - For example, different weights for words in different positions in the document
- Investigate the multi-class problem
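The idea of weighting words by their position can be sketched as follows in Python. The split into a title and a body, the weight values, and the function name `weighted_counts` are all assumptions made for illustration; the slides do not specify a weighting scheme.

```python
import re
from collections import Counter

def weighted_counts(sections, weights):
    """Sum per-word counts, scaling each occurrence by its section's weight."""
    totals = Counter()
    for name, text in sections.items():
        for token in re.findall(r"[a-z]+", text.lower()):
            # Sections without an explicit weight count once per occurrence.
            totals[token] += weights.get(name, 1.0)
    return totals

# Hypothetical document split into two positions.
sections = {
    "title": "Document databases",
    "body": "Companies start using document databases",
}
# Assumed weights: a word in the title counts three times as much.
weights = {"title": 3.0, "body": 1.0}

counts = weighted_counts(sections, weights)
print(counts["document"], counts["companies"])
```

The weighted counts can take the place of plain word counts as feature vector entries, with the weights treated as parameters to tune.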

References
Blum, A. and T. Mitchell (1998). Combining labeled and unlabeled data with co-training. In COLT: Proceedings of the Workshop on Computational Learning Theory. Morgan Kaufmann Publishers.
Raskutti, B., H. Ferra, and A. Kowalczyk (2002). Using unlabeled data for text classification through addition of cluster parameters. In International Conference on Machine Learning (Accepted).