Document classification task, January 23rd, 2011


January 23rd, 2011

Document classification task We are interested in solving the task of text classification, i.e., automatically assigning a given document to a fixed number of semantic categories. In general, text classification is a multi-class problem, where each document may belong to one, many, or none of the categories. Many algorithms have been proposed for this task, some of which are trained in a semi-supervised fashion.
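The task setup can be made concrete with a toy corpus. The documents, category names, and bag-of-words representation below are illustrative assumptions, not taken from the slides:

```python
from collections import Counter

# Toy corpus (illustrative): each document may belong to one, many,
# or none of the semantic categories.
docs = {
    "d1": "stock market shares rise",
    "d2": "home team wins the match",
    "d3": "market reacts to the match results",
}
labels = {
    "d1": {"finance"},
    "d2": {"sports"},
    "d3": {"finance", "sports"},  # membership in several categories at once
    # a document could also map to the empty set (none of the categories)
}

def bag_of_words(text):
    """Represent a document as a term-frequency vector."""
    return Counter(text.lower().split())

vectors = {name: bag_of_words(text) for name, text in docs.items()}
```

Note that "d3" carries two labels at once, which is exactly the overlap that one-vs-rest reductions handle awkwardly.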

Semi-supervised learning (SSL) SSL employs a small amount of labeled data together with a relatively large amount of unlabeled data to train classifiers. Graph-based SSL algorithms are an important class of SSL techniques, in which the data (both labeled and unlabeled) is embedded in a low-dimensional manifold expressed by a graph.
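A common way to realise such a graph (an illustrative sketch; the slides do not specify the construction) is a symmetrised k-nearest-neighbour graph built over all documents, labeled and unlabeled alike:

```python
import numpy as np

# Sketch: build the graph underlying graph-based SSL.  Every document,
# labeled or not, becomes a vertex; edges connect each vertex to its
# k most cosine-similar neighbours.  The feature matrix X and the
# value of k are toy choices, not from the slides.
rng = np.random.default_rng(0)
X = rng.random((6, 4))            # 6 documents, 4 features (toy data)
k = 2

norms = np.linalg.norm(X, axis=1, keepdims=True)
S = (X / norms) @ (X / norms).T   # cosine similarity matrix
np.fill_diagonal(S, -np.inf)      # exclude self-edges

W = np.zeros_like(S)
for i in range(len(X)):
    nbrs = np.argsort(S[i])[-k:]  # indices of the k most similar vertices
    W[i, nbrs] = S[i, nbrs]
W = np.maximum(W, W.T)            # symmetrise the edge weights
```

The symmetrisation step ensures the edge-weight matrix W can be treated as an undirected graph in the smoothness terms that follow.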

New approach for graph-based semi-supervised text classification Previous work on semi-supervised text classification has relied primarily on one-vs-rest approaches to overcome the multi-class nature of the text classification problem. Such an approach may be sub-optimal, since it disregards overlaps between classes. To address this drawback, a new framework is proposed, based on minimizing a loss function composed of Kullback-Leibler divergences between probability distributions defined over each graph vertex. The use of probability distributions, rather than fixed integer labels, leads to a straightforward multi-class generalization.
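As a point of reference, the KL-divergence between two label distributions attached to neighbouring vertices can be computed directly (the distributions below are illustrative):

```python
import math

def kl_divergence(p, q, eps=1e-12):
    """D_KL(p || q) for two discrete distributions over the same label set.
    A small eps guards against log(0) when a category has zero mass."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# Distributions over three categories on two neighbouring vertices.
p = [0.7, 0.2, 0.1]
q = [0.6, 0.3, 0.1]
```

Unlike a Euclidean distance, the KL-divergence is asymmetric in its arguments, which is why the objective must fix a direction for each divergence term.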

Learning problem definition

Multi-class SSL optimization procedure Let us define two sets of probability distributions over the label set: $p_i$ and $r_i$ for every vertex $i$, where the latter are fixed throughout training. We set $r_i$ to be uniform over only the possible categories and zero otherwise:

$$r_i(y) = \frac{1}{|\hat{Y}_i|} \text{ if } y \in \hat{Y}_i, \quad 0 \text{ otherwise},$$

where $\hat{Y}_i$ is the set of possible outputs for input $x_i$. The goal of learning is to find the best set of distributions $p$ by minimizing the following objective function:

$$\mathcal{C}(p) = \sum_{i=1}^{l} D_{KL}(r_i \,\|\, p_i) + \mu \sum_{i=1}^{n} \sum_{j=1}^{n} w_{ij}\, D_{KL}(p_i \,\|\, p_j) - \nu \sum_{i=1}^{n} H(p_i),$$

where $p$ denotes the entire set of distributions to be learned, $H(p_i)$ is the standard entropy function of $p_i$, $D_{KL}(p_i \,\|\, p_j)$ is the KL-divergence between $p_i$ and $p_j$, $w_{ij}$ are the graph edge weights, and $\mu, \nu$ are positive parameters. The final labels are then given by $\hat{y}_i = \arg\max_y p_i(y)$.
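The slide's objective (a KL fit to the fixed seed distributions on labeled vertices, a graph-smoothness KL term, and an entropy regulariser weighted by two positive parameters) can be sketched in code. The toy graph and parameter values are illustrative:

```python
import numpy as np

def entropy(p, eps=1e-12):
    """Shannon entropy of a discrete distribution."""
    return -np.sum(p * np.log(p + eps))

def kl(p, q, eps=1e-12):
    """KL-divergence D_KL(p || q) between discrete distributions."""
    return np.sum(p * np.log((p + eps) / (q + eps)))

def objective(P, R, W, n_labeled, mu, nu):
    """C(P) = sum_{i<=l} KL(r_i||p_i) + mu*sum_ij w_ij KL(p_i||p_j)
            - nu*sum_i H(p_i)."""
    fit = sum(kl(R[i], P[i]) for i in range(n_labeled))
    smooth = sum(W[i, j] * kl(P[i], P[j])
                 for i in range(len(P)) for j in range(len(P)))
    ent = sum(entropy(P[i]) for i in range(len(P)))
    return fit + mu * smooth - nu * ent

# Two vertices (one labeled), one edge.  With all distributions uniform,
# the fit and smoothness terms vanish, so with nu=0 the objective is 0.
P = np.full((2, 2), 0.5)
R = np.array([[0.5, 0.5]])
W = np.array([[0.0, 1.0], [1.0, 0.0]])
val = objective(P, R, W, n_labeled=1, mu=1.0, nu=0.0)
```

Making the distributions peaked and mutually inconsistent increases both the fit and the smoothness penalties, which is the behaviour the objective is designed to penalise.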

Analysis for the objective function

Learning with alternating minimization


The update equations

Conclusions The paper introduces a new graph-based semi-supervised learning approach for document classification. The approach uses Kullback-Leibler divergences between probability distributions in the objective function instead of Euclidean distances, which allows each document to be a member of several classes. Unfortunately, setting the objective's derivative to zero did not yield a closed-form solution, so an alternating minimization approach was proposed; this approach admits a closed-form and relatively simple solution at each iteration.
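The alternating-minimization scheme the conclusions describe can be sketched as follows. This is a reconstruction in the style of standard KL-based graph-SSL updates (e.g. the formulation of Subramanya and Bilmes), not the slides' own equations, which appeared only on image slides; the toy chain graph, the values of mu and nu, and the self-loop weights are all illustrative assumptions:

```python
import numpy as np

def am_step(Q, R, W, n_labeled, mu, nu, eps=1e-12):
    """One alternating-minimization iteration: a closed-form update of P
    given Q, then of Q given P (a hedged reconstruction, see lead-in).
    W is assumed to already contain self-loops (W + alpha*I), which keep
    each vertex's two distributions coupled."""
    n = len(Q)
    deg = W.sum(axis=1)
    # P-update: p_i(y) proportional to
    #   exp( mu * sum_j w_ij log q_j(y) / (nu + mu * d_i) )
    logP = (mu * (W @ np.log(Q + eps))) / (nu + mu * deg)[:, None]
    P = np.exp(logP - logP.max(axis=1, keepdims=True))
    P /= P.sum(axis=1, keepdims=True)
    # Q-update: q_i(y) = (r_i(y)*[i labeled] + mu * sum_j w_ji p_j(y))
    #                    / ([i labeled] + mu * sum_j w_ji)
    seed = np.zeros_like(Q)
    seed[:n_labeled] = R
    is_labeled = np.zeros(n)
    is_labeled[:n_labeled] = 1.0
    Q_new = (seed + mu * (W.T @ P)) / (is_labeled + mu * W.sum(axis=0))[:, None]
    return P, Q_new

# Toy chain 0-1-2 with vertex 0 labeled as class 0; the label should
# propagate to the unlabeled vertices after a few iterations.
W = np.array([[0.0, 1.0, 0.0],
              [1.0, 0.0, 1.0],
              [0.0, 1.0, 0.0]]) + np.eye(3)   # self-loops added
R = np.array([[1.0, 0.0]])
Q = np.full((3, 2), 0.5)
for _ in range(50):
    P, Q = am_step(Q, R, W, n_labeled=1, mu=1.0, nu=0.01)
```

Each half-step has a simple closed form (an exponentiated weighted average of log-distributions for P, a normalised weighted average for Q), which matches the conclusions' claim that every iteration is cheap even though the joint problem has no closed-form solution.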

Any questions?