PEBL: Web Page Classification without Negative Examples

Presentation transcript:

PEBL: Web Page Classification without Negative Examples
Hwanjo Yu, Jiawei Han, and Kevin Chen-Chuan Chang
IEEE Transactions on Knowledge and Data Engineering, Vol. 16, No. 1, 2004
Presented by Chirayu Wongchokprasitti

Introduction
- Web page classification is one of the main techniques for Web mining.
- Constructing a classifier normally requires both positive and negative training examples.
- Collecting negative training examples is laborious, and care is needed to avoid biasing the negative set.

Typical Learning Framework

Positive Example Based Learning (PEBL) Framework
- Learn from positive data and unlabeled data only.
- The unlabeled data consists of random samples of the universal set.
- Apply the Mapping-Convergence (M-C) algorithm.

Mapping-Convergence (M-C) Algorithm
- Divided into two stages:
  - Mapping stage: use any classifier that does not generate false negatives; the authors chose 1-DNF (monotone Disjunctive Normal Form).
  - Convergence stage: maximize the margin; the authors chose an SVM (Support Vector Machine).

Mapping Stage
- Use a weak classifier to draw an initial approximation of "strong" negative data.
- First, identify strong positive features from the positive and unlabeled data by checking feature frequencies: a feature is strongly positive if its frequency in the positive data is larger than its frequency in the universal (unlabeled) data.
- Then filter out every unlabeled document that could be positive, i.e., any document containing a strong positive feature, leaving only strong negatives (a sketch follows below).
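A minimal sketch of this mapping stage in Python, assuming binary bag-of-words document matrices; the function and variable names (mapping_stage, strong_pos, and so on) are illustrative choices, not the authors' implementation.

```python
import numpy as np

def mapping_stage(P, U):
    """Return indices of 'strong negative' documents in U.

    P: positive documents, U: unlabeled documents; both are binary
    (0/1) feature matrices of shape (n_docs, n_features).
    """
    # Feature frequency = fraction of documents containing the feature.
    freq_pos = P.mean(axis=0)
    freq_unl = U.mean(axis=0)

    # A feature is "strong positive" if it occurs more often in the
    # positive data than in the universal (unlabeled) data.
    strong_pos = freq_pos > freq_unl

    # Strong negatives: unlabeled documents containing none of the
    # strong positive features.
    has_strong_pos = U[:, strong_pos].sum(axis=1) > 0
    return np.where(~has_strong_pos)[0]
```

The returned index set is the initial approximation of negative data handed to the convergence stage.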

Convergence Stage
- Use an SVM to progressively narrow down the class boundary.
- Retrain the SVM for a number of iterations, each time extracting additional negative data from the unlabeled set.
- The boundary converges toward the true class boundary (a sketch follows below).
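A minimal sketch of the convergence stage, using scikit-learn's LinearSVC as the internal SVM; the loop structure, names, and the max_rounds cap are assumptions made for illustration, not the paper's code.

```python
import numpy as np
from sklearn.svm import LinearSVC

def convergence_stage(P, U, strong_neg_idx, max_rounds=10):
    """P: positive matrix, U: unlabeled matrix,
    strong_neg_idx: indices in U produced by the mapping stage.
    """
    N = U[strong_neg_idx]                       # current negative set
    remaining = np.delete(U, strong_neg_idx, axis=0)

    svm = None
    for _ in range(max_rounds):
        X = np.vstack([P, N])
        y = np.hstack([np.ones(len(P)), -np.ones(len(N))])
        svm = LinearSVC().fit(X, y)             # retrain on current P vs. N

        if len(remaining) == 0:
            break
        pred = svm.predict(remaining)
        new_neg = remaining[pred == -1]         # negatives extracted this round
        if len(new_neg) == 0:                   # boundary has stopped moving
            break
        N = np.vstack([N, new_neg])             # grow the negative set
        remaining = remaining[pred != -1]
    return svm                                  # final classifier
```

Under these assumptions, a hypothetical end-to-end call would be convergence_stage(P, U, mapping_stage(P, U)).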

Support Vector Machines Visualization of a Support Vector Machine

Convergence of SVM

Data Flow Diagram

Experimental Results
- Results are reported with the precision-recall breakeven point (P-R), sketched below.
- Experiment 1: the Internet, using DMOZ as the universal set.
- Experiment 2: university CS departments, using the WebKB data set.
- Mixture models.
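The breakeven point can be computed by ranking test documents by classifier score and cutting the ranking at the number of truly positive documents, where precision and recall coincide by construction. Here is a minimal sketch under that assumption; pr_breakeven and the commented usage are illustrative, not the paper's evaluation code.

```python
import numpy as np

def pr_breakeven(scores, labels):
    """scores: classifier decision values; labels: 1 for positive, 0 otherwise."""
    scores, labels = np.asarray(scores), np.asarray(labels)
    n_pos = int((labels == 1).sum())
    top = np.argsort(scores)[::-1][:n_pos]      # the n_pos highest-scoring documents
    tp = int((labels[top] == 1).sum())
    # Retrieving exactly n_pos documents makes precision equal to recall.
    return tp / n_pos

# Hypothetical usage with the final SVM from the convergence stage:
# breakeven = pr_breakeven(svm.decision_function(X_test), y_test)
```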

Experiment 1

Experiment 2

Mixture Models

Summary and Conclusions
- The PEBL framework eliminates the need for manually collecting negative training examples.
- The Mapping-Convergence (M-C) algorithm achieves classification accuracy as high as that of a traditional SVM trained from labeled positive and negative examples.
- PEBL still needs faster training: the M-C algorithm retrains the SVM multiple times, so its training time is longer than that of a single SVM.