Unsupervised Models for Named Entity Classification
Michael Collins and Yoram Singer
Presented by Yimeng Zhang, March 1st, 2007


Overview
- Unlabeled data can be used to reduce the need for supervision
- Basic idea: make use of the redundancy in the unlabeled data
  - CoTraining [Blum and Mitchell 98]
- Two unsupervised models
  - DL-CoTrain, based on decision list learning
  - CoBoost, based on the AdaBoost algorithm

Problem setting
- Given: i.i.d. labeled examples (7 seed rules) and i.i.d. unlabeled examples (about 90,000)
- x_i is a feature vector drawn from a set of possible values X
- Wish to learn a classification function f : X -> Y, with Y = {Location, Person, Organization}
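For concreteness, a minimal Python sketch of how the seed rules can be represented and applied. The specific feature strings are a reconstruction of the seven seeds reported in Collins and Singer (1999) and should be treated as illustrative, not as an exact copy of the paper's table.

```python
# A hedged sketch: seed rules as (feature, label) pairs.
SEED_RULES = [
    ("full-string=New-York",   "Location"),
    ("full-string=California", "Location"),
    ("full-string=U.S.",       "Location"),
    ("contains(Mr.)",          "Person"),
    ("full-string=Microsoft",  "Organization"),
    ("full-string=I.B.M.",     "Organization"),
    ("contains(Incorporated)", "Organization"),
]

def apply_seed_rules(features, rules=SEED_RULES):
    """Return the label of the first seed rule whose feature fires, else None."""
    for feature, label in rules:
        if feature in features:
            return label
    return None
```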

Redundantly Sufficient Features
[Figure: an example name split into a spelling feature and a context feature, both pointing to the label "person"]
- Features can be separated into two types (views), X_1 and X_2
- Either X_1 or X_2 is sufficient for classification: there exist functions f_1 and f_2 that each predict the label from one view alone
- For any example, X_1 and X_2 are conditionally independent given Y
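A minimal sketch of the two-view representation assumed on this slide: each occurrence of a candidate name is split into a spelling view and a context view. The helper and feature names below are made up for illustration.

```python
# A hedged sketch of the spelling/context split; function and feature names are illustrative.
def spelling_features(phrase):
    """Features of the string itself (view X_1)."""
    feats = [f"full-string={phrase.replace(' ', '-')}"]
    feats += [f"contains({w})" for w in phrase.split()]
    return feats

def context_features(left_words, right_words):
    """Features of the surrounding words (view X_2)."""
    return [f"context={w}" for w in (left_words + right_words)]

# Example: "Mr. Cooper, a vice president of ..."
x1 = spelling_features("Mr. Cooper")                    # spelling view
x2 = context_features([], ["a", "vice", "president"])   # context view
```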

CoTraining
[Figure: a bipartite graph linking spelling features (e.g. "Mr. Cooper") to context features (e.g. "..., a president of ..."); the graph grows as more unlabeled pairs are added]
- Each unlabeled pair is represented as an edge
- An edge indicates that the two features must have the same label

CoTraining
- Given: labeled examples (x_i, y_i) and unlabeled examples x_i = (x_{1,i}, x_{2,i})
- Induce functions f_1 and f_2 such that f_1(x_{1,i}) = f_2(x_{2,i}) = y_i on the labeled examples and f_1(x_{1,i}) = f_2(x_{2,i}) on the unlabeled examples
- Loosen the constraint: require the two classifiers to agree on as many unlabeled examples as possible
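A hedged reconstruction of the constraints the slide's (missing) formulas most likely showed, written in the notation above:

```latex
% Hard constraints: agree with the labels on labeled data, agree with each other on unlabeled data.
\begin{align*}
  f_1(x_{1,i}) = f_2(x_{2,i}) = y_i &\quad \text{for labeled } i,\\
  f_1(x_{1,i}) = f_2(x_{2,i})       &\quad \text{for unlabeled } i.
\end{align*}
% Loosened version: minimize the number of disagreements on the unlabeled data.
\[
  \min_{f_1,\, f_2} \; \sum_{i \,\in\, \text{unlabeled}} \mathbf{1}\!\left[\, f_1(x_{1,i}) \neq f_2(x_{2,i}) \,\right]
\]
```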

Supervised Algorithm based on Decision Lists
- Input: x_i is a set of features
- Output: a function h(x,y); h(x,y) is an estimate of the probability p(y|x) of seeing label y given that feature x is present
- h can be thought of as defining a decision list of rules x -> y ranked by their strength h(x,y)
[Figure: a decision list walked from the strongest rule down, over features x_1, x_2, x_3, with entries such as h(x_1, 0) and h(x_2, 1)]

Supervised Algorithm based on Decision Lists (2)
- The label for a test example x is the y that maximizes h(x,y) over the features present in x
- h(x,y) is defined from counts: Count(x,y) is the number of times feature x is seen with label y in training data
[Figure: the same decision list diagram as on the previous slide]
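A minimal sketch of this decision-list learner. The smoothed ratio used for h(x,y) follows the form in Collins and Singer (1999), (Count(x,y) + alpha) / (Count(x) + k*alpha); the smoothing constant and helper names here are assumptions for illustration.

```python
from collections import defaultdict

def train_decision_list(examples, labels, alpha=0.1):
    """examples: list of feature sets; labels: list of labels.
    Returns h[(x, y)], a smoothed estimate of p(y | feature x is present)."""
    count_xy = defaultdict(int)
    count_x = defaultdict(int)
    for feats, y in zip(examples, labels):
        for x in feats:
            count_xy[(x, y)] += 1
            count_x[x] += 1
    k = len(set(labels))
    h = {}
    for (x, y), c in count_xy.items():
        # Smoothed ratio as in Collins & Singer (1999); alpha is assumed here.
        h[(x, y)] = (c + alpha) / (count_x[x] + k * alpha)
    return h

def classify(feats, h, label_set):
    """Pick the label of the single strongest rule that fires on this example."""
    scored = [(h[(x, y)], y) for x in feats for y in label_set if (x, y) in h]
    return max(scored)[1] if scored else None
```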

DL-CoTrain (unsupervised decision list)
[Figure: an alternating loop; a growing list of spelling rules (x_{1,s} -> y, x_{2,s} -> y, ...) labels data to induce context rules (x_{1,c} -> y, ...), which in turn label data to induce more spelling rules, producing ever larger labeled data and rule lists]
- Initialize with the seed spelling rules, then alternate: label data with one rule type, induce rules of the other type
- Induce rules: choose the rules whose features appeared most often with some known label
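A hedged sketch of the DL-CoTrain loop as described on this slide: alternate between the two views, each time labeling the unlabeled data with the current rules of one type and inducing the strongest new rules of the other type. The growth schedule, threshold, and selection details are assumptions; train_decision_list is the sketch from the previous slide.

```python
def dl_cotrain(unlabeled, seed_spelling_rules, label_set,
               rounds=50, grow=5, threshold=0.95):
    """unlabeled: list of (spelling_feats, context_feats) pairs.
    seed_spelling_rules: dict mapping a spelling feature to a label."""
    rules = {"spelling": dict(seed_spelling_rules), "context": {}}
    n = grow
    for _ in range(rounds):
        for src, dst in (("spelling", "context"), ("context", "spelling")):
            # 1. Label unlabeled data with the current rules of type `src`,
            #    keeping the features of the *other* view for rule induction.
            labeled = []
            for x_spell, x_ctx in unlabeled:
                src_feats = x_spell if src == "spelling" else x_ctx
                hits = [rules[src][f] for f in src_feats if f in rules[src]]
                if hits:
                    labeled.append((x_spell if dst == "spelling" else x_ctx, hits[0]))
            if not labeled:
                continue
            # 2. Induce new rules of type `dst`: keep the strongest rules whose
            #    (smoothed) precision exceeds the threshold.
            h = train_decision_list([f for f, _ in labeled], [y for _, y in labeled])
            candidates = sorted(((v, x, y) for (x, y), v in h.items() if v > threshold),
                                reverse=True)
            for v, x, y in candidates[:n * len(label_set)]:
                rules[dst].setdefault(x, y)
        n += grow  # let the rule lists grow each round (assumed schedule)
    return rules
```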

Boosting-based algorithm: AdaBoost
- D_t is a distribution over instances; it specifies the relative weight of each example at round t
- alpha_t is the weight given to the weak learner h_t
- Choose h_t and its weight alpha_t to minimize Z_t (the normalization factor of the updated distribution)
- The training error is bounded above by the product of the Z_t
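A hedged reconstruction of the formulas this slide refers to, following Schapire and Singer's confidence-rated AdaBoost (standard results, not copied from the slide images):

```latex
% Distribution update and normalization factor at round t:
D_{t+1}(i) = \frac{D_t(i)\, e^{-\alpha_t\, y_i\, h_t(x_i)}}{Z_t},
\qquad
Z_t = \sum_i D_t(i)\, e^{-\alpha_t\, y_i\, h_t(x_i)}.

% Training error bound for the final classifier f(x) = \operatorname{sign}\!\big(\sum_t \alpha_t h_t(x)\big):
\frac{1}{m}\,\Big|\{\, i : \operatorname{sign}(f(x_i)) \neq y_i \,\}\Big| \;\le\; \prod_t Z_t.
```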

AdaBoost for named entity recognition
- Weak hypothesis: a rule that tests a single feature and predicts a label with some confidence, abstaining on examples where the feature is absent
- Choose h_t, and its weight alpha_t, so that they minimize Z_t
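For such an abstaining, single-feature weak hypothesis, the usual confidence-rated choices are as follows (a hedged sketch in the spirit of Schapire and Singer; epsilon is a smoothing constant):

```latex
% W_+ : total weight D_t(i) of examples the rule predicts correctly
% W_- : total weight of examples it predicts incorrectly
% W_0 : total weight of examples it abstains on (the feature is absent)
\alpha_t = \frac{1}{2}\ln\frac{W_+ + \varepsilon}{W_- + \varepsilon},
\qquad
Z_t = W_0 + 2\sqrt{W_+\, W_-}\ \ \text{(at the unsmoothed optimum).}
```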

CoBoost (unsupervised AdaBoost)
- Recall the criteria for CoTraining
- Given: labeled examples (x_i, y_i) and unlabeled examples x_i = (spelling, context) = (x_{1,i}, x_{2,i})
- Induce functions f_1 and f_2 that agree with the labels and with each other (as on the CoTraining slide)

CoBoost (2)
- Optimization function: an extension of Z_t, used to learn f_1 and f_2; choose h_t and alpha_t to minimize it
- g_j is the unthresholded hypothesis for f_j (f_j = sign(g_j))
- The function bounds the error on the labeled data plus the number of disagreements on the unlabeled data
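A hedged reconstruction of the objective this slide describes, using g_j for the unthresholded hypotheses (the exact form should be checked against the paper):

```latex
Z_{CO} \;=\; \sum_{i \,\in\, \text{labeled}}\!\Big( e^{-y_i\, g_1(x_{1,i})} + e^{-y_i\, g_2(x_{2,i})} \Big)
\;+\; \sum_{i \,\in\, \text{unlabeled}}\!\Big( e^{-\operatorname{sign}(g_2(x_{2,i}))\, g_1(x_{1,i})}
                                              + e^{-\operatorname{sign}(g_1(x_{1,i}))\, g_2(x_{2,i})} \Big)
```

Each exponential term is at least 1 whenever the corresponding prediction is wrong on a labeled example, or whenever the two classifiers disagree on an unlabeled example, which is why Z_CO upper-bounds the labeled error plus the number of unlabeled disagreements.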

CoBoost (3)
- At each iteration, alternate:
  - Step 1: fix the second classifier (g_2); choose h_t and alpha_t for g_1 to minimize the first part of the objective
  - Step 2: fix the first classifier (g_1); choose h_t and alpha_t for g_2 to minimize the second part

CoBoost (4)
- t indexes the iteration, j indexes the classifier (view)
- For unlabeled data, take the current output of the other classifier as the pseudo-label
- The instance weights are based on this classifier's own current predictions
- The resulting expression has the same form as Z_t in AdaBoost, so the same algorithm is used to choose h_t and alpha_t
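A hedged sketch of one CoBoost alternation in Python, following the description on the last three slides rather than the paper's exact pseudocode. `best_weak_rule` is an assumed helper: given feature sets, +/-1 pseudo-labels, and a weight distribution, it returns the abstaining single-feature rule and weight that minimize the AdaBoost normalizer Z_t.

```python
import math

def coboost_round(g, views, labels, best_weak_rule):
    """g: [g1, g2], current unthresholded scores, one list per view.
    views: [X1, X2], feature sets per example for each view.
    labels: +1/-1 for labeled examples, None for unlabeled ones."""
    for j, k in ((0, 1), (1, 0)):
        # Pseudo-labels: the true label where available, otherwise the sign of
        # the *other* view's current classifier.
        y = [yl if yl is not None else (1 if g[k][i] >= 0 else -1)
             for i, yl in enumerate(labels)]
        # Instance weights from this view's own current scores (exponential loss),
        # normalized to a distribution -- the same form as D_t in AdaBoost.
        w = [math.exp(-y[i] * g[j][i]) for i in range(len(labels))]
        total = sum(w)
        D = [wi / total for wi in w]
        # Choose the rule and its weight exactly as AdaBoost would under D.
        feature, prediction, alpha = best_weak_rule(views[j], y, D)  # assumed helper
        # Update this view's unthresholded hypothesis; the rule abstains elsewhere.
        for i, feats in enumerate(views[j]):
            if feature in feats:
                g[j][i] += alpha * prediction
    return g
```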

Evaluation
- 88,962 (spelling, context) example pairs; 7 seed rules are used
- 1,000 examples are chosen as test data (85 of them are noise)
- The test examples are labeled as location, person, organization, or noise