Co-Training
Presented by: Shankar B S, DMML Lab, 05-11-2007

Bootstrapping
Bootstrapping – use the initial labeled data to build a predictive labeling procedure, then use the newly labeled data to build a new predictive procedure.
Examples:
1. EM algorithm – in each iteration the model parameters are updated; the model defines a joint probability distribution on the observed data.
2. Rule-based bootstrapping
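A minimal sketch of this bootstrapping loop (self-training), assuming count-style feature matrices and a scikit-learn naive Bayes base learner; the function name, the 0.95 confidence threshold, and the iteration cap are illustrative choices, not from the slide.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def bootstrap_self_training(X_labeled, y_labeled, X_unlabeled,
                            threshold=0.95, max_iter=10):
    """Repeatedly label the most confident unlabeled examples and
    retrain on the enlarged labeled set (self-training / bootstrapping)."""
    X_l, y_l, X_u = X_labeled, y_labeled, X_unlabeled
    clf = MultinomialNB()
    for _ in range(max_iter):
        if len(X_u) == 0:
            break
        clf.fit(X_l, y_l)
        proba = clf.predict_proba(X_u)
        confident = proba.max(axis=1) >= threshold
        if not confident.any():
            break
        # Predicted labels for the confident examples, mapped back to class values
        y_new = clf.classes_[proba[confident].argmax(axis=1)]
        X_l = np.vstack([X_l, X_u[confident]])
        y_l = np.concatenate([y_l, y_new])
        X_u = X_u[~confident]
    return clf.fit(X_l, y_l)
```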

Co-Training
Two views – X1, X2.
Two distinct hypothesis classes H1, H2, consisting of functions predicting Y from X1 and X2 respectively.
Bootstrap using h1 ∈ H1, h2 ∈ H2.
"If X1 is conditionally independent of X2 given Y, then given a weak predictor in H1 and an algorithm which can learn H2 under random misclassification noise, it is possible to learn a good predictor in H2."

Example
The description of a web page can be partitioned into two views:
– words occurring on that page
– words occurring on the hyperlinks pointing to that page (anchor text)
Train a learning algorithm on each view, and use the predictions of each algorithm on unlabeled examples to enlarge the training set of the other.
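The loop sketched below roughly follows the Blum and Mitchell (1998) procedure for this two-view setting; the naive Bayes base learners, the binary 0/1 labels, and the per-round counts (p positives, n negatives per view) are assumptions made for illustration.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB

def co_train(X1_l, X2_l, y_l, X1_u, X2_u, rounds=30, p=1, n=3):
    """Co-training sketch: each view's classifier labels its most confident
    positive/negative unlabeled examples and adds them to the shared pool."""
    h1, h2 = MultinomialNB(), MultinomialNB()
    for _ in range(rounds):
        if len(X1_u) == 0:
            break
        h1.fit(X1_l, y_l)
        h2.fit(X2_l, y_l)
        new_idx, new_lab = [], []
        for h, X_u in ((h1, X1_u), (h2, X2_u)):
            proba = h.predict_proba(X_u)
            pos_col = list(h.classes_).index(1)   # assumes binary 0/1 labels
            order = np.argsort(proba[:, pos_col])
            new_idx.extend(order[-p:]); new_lab.extend([1] * len(order[-p:]))
            new_idx.extend(order[:n]);  new_lab.extend([0] * len(order[:n]))
        new_idx, new_lab = np.array(new_idx), np.array(new_lab)
        # Enlarge the shared labeled pool with the newly labeled pairs of views
        X1_l = np.vstack([X1_l, X1_u[new_idx]])
        X2_l = np.vstack([X2_l, X2_u[new_idx]])
        y_l = np.concatenate([y_l, new_lab])
        # Remove the chosen examples from the unlabeled pool
        keep = np.setdiff1d(np.arange(len(X1_u)), new_idx)
        X1_u, X2_u = X1_u[keep], X2_u[keep]
    return h1, h2
```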

Co-Training framework
Instance space X = X1 × X2; X1, X2 are two different views of the same example.
Label l = f1(X1) = f2(X2) = f(X); f1, f2 are the target functions, f is the combined target function.
C1 and C2 are concept classes defined over X1 and X2; f1 ∈ C1, f2 ∈ C2; f = (f1, f2) ∈ C1 × C2.
Even if C1 and C2 are large concept classes with high complexity, the set of compatible target functions might be simpler and smaller.
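One way to write the "compatible target functions" point formally (the distribution D over example pairs is an assumption of this formulation, not named on the slide):

```latex
% Target pairs that never disagree on examples the distribution D can produce
\[
  C_{\mathrm{compat}} \;=\;
  \bigl\{\, (f_1, f_2) \in C_1 \times C_2 \;:\;
      \Pr_{(x_1, x_2) \sim D}\bigl[ f_1(x_1) \neq f_2(x_2) \bigr] = 0 \,\bigr\}
\]
```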

Co-Training framework
X1 = X2 = {0,1}^n; C1 = C2 = conjunctions over {0,1}^n.
If the first coordinate of X1 is known to be 0 (and that coordinate appears in the target conjunction f1), then the example must be negative, which gives a labeled negative example of X2.
If the distribution has non-zero probability only on pairs where X1 = X2, then no useful information about f2 can be obtained.
If X2 is conditionally independent of X1 given Y, then a new random negative example is obtained, which is quite useful.
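A tiny illustration of the mechanism, with a made-up target conjunction f1 and randomly drawn views; the only point is how knowledge about one view yields labeled examples for the other.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical target conjunction over view X1, known to contain coordinate 0
def f1(x1):
    return int(x1[0] and x1[2])

negatives_for_view2 = []
for _ in range(100):
    # An unlabeled pair of views (drawn at random purely for illustration)
    x1 = rng.integers(0, 2, size=5)
    x2 = rng.integers(0, 2, size=5)
    # Since coordinate 0 appears in f1, x1[0] == 0 forces the label to be negative,
    # so x2 becomes a free labeled negative example for learning f2 over view X2.
    if x1[0] == 0:
        assert f1(x1) == 0
        negatives_for_view2.append((x2, 0))

print(len(negatives_for_view2), "labeled negatives harvested for view X2")
```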

Idea 1: Feature selection with multiple views
As in co-training, suppose we have two views with f1(X1) = f2(X2) = C.
We want to do feature selection on X1. Using X2 can reduce the number of labeled instances required; or, given a set of labeled instances, X2 can be used to select a better set of features.
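A rough sketch of one possible reading of Idea 1: pseudo-label the unlabeled pool with a classifier trained on view X2, then rank X1's features against the enlarged label set. The scikit-learn utilities, the mutual-information criterion, and the variable names are assumptions, not part of the slide.

```python
import numpy as np
from sklearn.naive_bayes import MultinomialNB
from sklearn.feature_selection import SelectKBest, mutual_info_classif

def select_x1_features(X1_l, X2_l, y_l, X1_u, X2_u, k=50):
    """Pseudo-label the unlabeled pool with a classifier trained on view X2,
    then rank view X1's features against the combined (true + pseudo) labels."""
    h2 = MultinomialNB().fit(X2_l, y_l)
    pseudo = h2.predict(X2_u)
    X1_all = np.vstack([X1_l, X1_u])
    y_all = np.concatenate([y_l, pseudo])
    k = min(k, X1_all.shape[1])
    selector = SelectKBest(mutual_info_classif, k=k).fit(X1_all, y_all)
    return selector   # selector.get_support() gives the chosen X1 feature mask
```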

Idea 2: Feature expansion
Suppose we have two views of the same data, X1 and X2, and the classifier uses the combined data set. If X2 is available only for some instances, we can use X1 to construct X2 for the rest of the instances, using the labeled training data and/or the unlabeled test data.
Related to the missing-features problem:
– EM algorithm
– KNN algorithm
– Median, mean, etc.
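A hedged sketch of the missing-features angle using standard imputers: the absent X2 columns are filled in from the rows where they are observed. The view concatenation, function names, and neighbour count are assumptions; EM-style imputation from the slide is omitted for brevity.

```python
import numpy as np
from sklearn.impute import KNNImputer, SimpleImputer

def expand_features_knn(X1, X2_partial, k=5):
    """X1: fully observed view; X2_partial: second view with np.nan where missing.
    Returns the combined matrix with the missing X2 entries filled in."""
    X = np.hstack([X1, X2_partial])
    # Neighbours are found on the observed columns (mostly X1), so X1
    # effectively drives the reconstruction of the missing X2 values.
    return KNNImputer(n_neighbors=k).fit_transform(X)

def expand_features_simple(X1, X2_partial, strategy="mean"):
    """Simpler baselines from the slide: per-column mean or median imputation."""
    X = np.hstack([X1, X2_partial])
    return SimpleImputer(strategy=strategy).fit_transform(X)
```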