Wrapper Learning: Cohen et al. 2002; Kushmerick 2000; Kushmerick & Freitag 2000. William Cohen, 1/26/03.

Goal: learn from a human teacher how to extract certain database records from a particular web site.

(Figure: the wrapper learner, trained from labeled example pages.)

Why learning from few examples is important: at training time, only four labeled examples are available, but one would like to generalize to future pages as well. The wrapper must generalize across time as well as across the pages of a single site.

Now, some details…

Kushmerick's WIEN system: the earliest wrapper-learning system (published at IJCAI '97). Special things about WIEN:
– Treats the document as a string of characters.
– Learns to extract a relation directly, rather than extracting fields and then associating them together in some way.
– Each training example is a completely labeled page.

WIEN system: a sample wrapper

Left delimiters L1, L2 and right delimiters R1, R2; in the sample wrapper, each delimiter is a short string of HTML markup that immediately precedes or follows a field value.

WIEN system: a sample wrapper
– Learning means finding L1,…,Lk and R1,…,Rk.
– Li must precede every instance of field i.
– Ri must follow every instance of field i.
– Li and Ri can't contain data items.
– There is a limited number of possible candidates for Li and Ri; a sketch of how such an LR wrapper is executed appears below.
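
A minimal sketch of how an LR wrapper extracts tuples once the delimiters are known. The page content and the delimiter strings here are hypothetical illustrations, not data or code from the WIEN paper:

```python
def execute_lr(page: str, lefts: list[str], rights: list[str]) -> list[tuple[str, ...]]:
    """Extract tuples from `page` using an LR wrapper.

    lefts[i] / rights[i] are the delimiters for field i; extraction scans
    left to right, pulling out one value per field until no further left
    delimiter can be found.
    """
    tuples, pos = [], 0
    while True:
        row, start = [], pos
        for l, r in zip(lefts, rights):
            i = page.find(l, start)
            if i < 0:
                return tuples              # no more rows
            i += len(l)                    # value begins right after the left delimiter
            j = page.find(r, i)
            if j < 0:
                return tuples
            row.append(page[i:j])
            start = j + len(r)             # continue scanning after the right delimiter
        tuples.append(tuple(row))
        pos = start

# Hypothetical example page: a two-field table marked up with <B>/<I> tags.
page = "<B>Congo</B> <I>242</I> <B>Egypt</B> <I>20</I>"
print(execute_lr(page, ["<B>", "<I>"], ["</B>", "</I>"]))
# [('Congo', '242'), ('Egypt', '20')]
```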

WIEN system: a more complex class of wrappers (HLRT). Extension: apply the Li, Ri delimiters only after a "head" (the first occurrence of a head delimiter H) and before a "tail" (the first occurrence of a tail delimiter T); in the example, H and T are again short markup strings from the page. A sketch follows.
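
Under the same assumptions as the sketch above, HLRT execution can be expressed as LR extraction restricted to the region between the head and tail markers (again an illustrative sketch, not the WIEN implementation):

```python
def execute_hlrt(page: str, h: str, t: str,
                 lefts: list[str], rights: list[str]) -> list[tuple[str, ...]]:
    """HLRT wrapper: run LR extraction only between the head marker `h` and
    the tail marker `t`, so navigation bars, headers, and footers outside
    that region cannot trigger spurious extractions."""
    start = page.find(h)
    end = page.find(t, start + len(h)) if start >= 0 else -1
    if start < 0 or end < 0:
        return []
    body = page[start + len(h):end]
    return execute_lr(body, lefts, rights)   # reuses execute_lr from the previous sketch
```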

Kushmerick: overview of various extensions to LR

Kushmerick and Freitag: Boosted Wrapper Induction

Review of boosting
– Generalized (confidence-rated) version of AdaBoost (Schapire & Singer, 1999).
– Allows real-valued predictions for each base hypothesis, including a value of zero (the hypothesis may abstain). The update and final combination are shown below.
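
A compact statement of the confidence-rated boosting scheme the slide refers to; this is the standard formulation, with notation chosen here rather than copied from the slides:

```latex
% Confidence-rated AdaBoost (Schapire & Singer, 1999).
% h_t(x) \in \mathbb{R}; h_t(x) = 0 means the base hypothesis abstains on x.
D_{t+1}(i) \;=\; \frac{D_t(i)\,\exp\!\big(-y_i\,h_t(x_i)\big)}{Z_t},
\qquad
Z_t \;=\; \sum_i D_t(i)\,\exp\!\big(-y_i\,h_t(x_i)\big)
\\[6pt]
H(x) \;=\; \operatorname{sign}\!\Big(\sum_{t=1}^{T} h_t(x)\Big)
```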

Learning methods: boosting rules (the SLIPPER weak learner). To find weak hypothesis h_t:
1. Split the data into Growing and Pruning sets.
2. Let R_t be an empty conjunction.
3. Greedily add conditions to R_t, guided by the Growing set.
4. Greedily remove conditions from R_t, guided by the Pruning set.
5. Convert R_t to a weak hypothesis: predict a confidence C_{R_t} = ½ ln(Ŵ₊ / Ŵ₋) on examples covered by R_t and 0 otherwise, where Ŵ₊ and Ŵ₋ are the smoothed total weights of positive and negative examples covered by R_t. Constraint: W₊ > W₋; the caret (hat) denotes smoothing.
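
For concreteness, the objectives behind steps 3 and 5 in the SLIPPER formulation (Cohen & Singer, 1999); this is a reconstruction, and the smoothing constant 1/(2n) is assumed here to be the standard choice rather than taken from these slides:

```latex
% Grow step: greedily add the condition that maximizes
\tilde{Z} \;=\; \sqrt{W_+} \;-\; \sqrt{W_-}
\\[6pt]
% Confidence of the final rule R_t (requires W_+ > W_-; 1/(2n) is the smoothing term):
C_{R_t} \;=\; \tfrac{1}{2}\,
\ln\!\left(\frac{W_+ + \tfrac{1}{2n}}{\,W_- + \tfrac{1}{2n}\,}\right)
\\[6pt]
% Weak hypothesis: predict C_{R_t} on covered examples, abstain otherwise.
h_t(x) \;=\;
\begin{cases}
C_{R_t} & \text{if } x \text{ satisfies } R_t \\
0       & \text{otherwise}
\end{cases}
```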

Learning methods: boosting rules. SLIPPER also produces fairly compact rule sets.

Learning methods: BWI. Boosted wrapper induction (BWI) learns to extract substrings from a document.
– Learns three concepts: firstToken(x), lastToken(x), and substringLength(k).
– Conditions are tests on tokens before/after x, e.g., tok_{i-2} = 'from', isNumber(tok_{i+1}).
– SLIPPER weak learner, no pruning.
– Greedy search extends the "window size" by at most L in each iteration and uses lookahead L; there is no fixed limit on window size.
– Good results in (Kushmerick and Freitag, 2000). A sketch of how the three learned concepts are combined at extraction time follows.
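
A minimal sketch of how BWI-style extraction could combine the three learned concepts at test time. Here `fore_score` and `aft_score` stand for the boosted confidence sums of the firstToken/lastToken detectors and `length_hist` for the empirical field-length distribution; the names and threshold rule are an illustrative assumption, not the authors' code:

```python
def bwi_extract(tokens, fore_score, aft_score, length_hist, tau, max_len=10):
    """Return token spans (i, j) judged to be field instances.

    fore_score(tokens, i): boosted score that a field starts at token i.
    aft_score(tokens, j):  boosted score that a field ends at token j.
    length_hist[k]:        estimated probability that a field is k tokens long.
    A span is extracted when the product of the three exceeds the threshold tau.
    """
    spans = []
    for i in range(len(tokens)):
        f = fore_score(tokens, i)
        if f <= 0:
            continue                      # no plausible field start here
        for j in range(i, min(i + max_len, len(tokens))):
            a = aft_score(tokens, j)
            h = length_hist.get(j - i + 1, 0.0)
            if f * a * h > tau:
                spans.append((i, j))
    return spans
```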

BWI algorithm (figure; an annotation marks where the lookahead search occurs).

BWI example rules

Cohen et al.

Improving a Page Classifier with Anchor Extraction and Link Analysis. William W. Cohen, NIPS 2002.

Previous work in page classification using links: exploit hyperlinks (Slattery & Mitchell, 2000; Cohn & Hofmann, 2001; Joachims, 2001): documents pointed to by the same "hub" should have the same class.
What's new in this paper:
– Use the structure of hub pages (as well as the structure of the site graph) to find better "hubs".
– Adapt an existing "wrapper learning" system to find that structure, on the task of classifying "executive bio pages".

Intuition (illustrated with a sample hub page): links from a "hub page" are informative, and some links on the page are especially informative.

Task: train a page classifier, then use it to classify pages on a new, previously unseen web site as executiveBio or other.
Question: can index pages for executive biographies be used to improve classification?
Idea: use the wrapper-learner to learn to extract links to execBio pages, smoothing the "noisy" data produced by the initial page classifier.

Background: "co-training" (Blum & Mitchell, 1998). Suppose examples are of the form (x1, x2, y), where x1 and x2 are independent given y, each xi is sufficient for classification, and unlabeled examples are cheap.
– E.g., x1 = bag of words, x2 = bag of links.
Co-training algorithm (sketched in code below):
1. Use the x1's (on labeled data D) to train f1(x1) = y.
2. Use f1 to label additional unlabeled examples U.
3. Use the x2's (on the labeled part of U + D) to train f2(x2) = y.
4. Repeat…
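
A minimal sketch of the co-training loop under these assumptions. `train` and `predict_confident` are hypothetical helpers (any classifier with a confidence score would do), not an interface from the papers cited:

```python
def co_train(labeled, unlabeled, train, predict_confident, rounds=10):
    """Co-training over two views.

    labeled:   list of ((x1, x2), y) pairs.
    unlabeled: list of (x1, x2) pairs.
    train(view_index, data) -> classifier fitted on one view of `data`.
    predict_confident(clf, view_index, pool) -> [(example, label), ...]
        for the pool examples the classifier labels most confidently.
    """
    pool, data = list(unlabeled), list(labeled)
    for _ in range(rounds):
        f1 = train(0, data)                       # classifier on view x1
        f2 = train(1, data)                       # classifier on view x2
        newly_labeled = (predict_confident(f1, 0, pool) +
                         predict_confident(f2, 1, pool))
        if not newly_labeled:
            break
        data.extend(newly_labeled)                # each view teaches the other
        taken = {id(x) for x, _ in newly_labeled}
        pool = [x for x in pool if id(x) not in taken]
    return f1, f2
```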

Simple 1-step co-training for web pages. f1 is a bag-of-words page classifier, and S is a web site containing unlabeled pages.
– Feature construction. Represent a page x in S as the bag of pages that link to x (the "bag of hubs").
– Learning. Learn f2 from the bag-of-hubs examples, labeled with f1.
– Labeling. Use f2(x) to label pages from S.
Idea: use one round of co-training to bootstrap the bag-of-words classifier into one that uses site-specific features x2/f2. A sketch appears below.
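
A rough sketch of this one-step procedure, assuming a crawl of the site is available as `links` (a list of (hub_page, target_page) pairs) and that `f1` and `learn_f2` are generic classifier objects; the names are illustrative, not from the paper:

```python
from collections import defaultdict

def one_step_cotrain(pages, links, f1, learn_f2):
    """One round of co-training with a 'bag of hubs' view.

    pages:    identifiers of pages on the new site S.
    links:    (hub, target) pairs from crawling S.
    f1:       bag-of-words classifier; f1.predict(page) -> label.
    learn_f2: fits a classifier on (bag_of_hubs, label) pairs.
    """
    # Feature construction: each page is represented by the set of pages linking to it.
    bag_of_hubs = defaultdict(set)
    for hub, target in links:
        bag_of_hubs[target].add(hub)

    # Learning: train f2 on the site-specific view, using f1's (noisy) labels.
    train = [(bag_of_hubs[x], f1.predict(x)) for x in pages]
    f2 = learn_f2(train)

    # Labeling: relabel the site's pages with the site-specific classifier.
    return {x: f2.predict(bag_of_hubs[x]) for x in pages}
```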

Improved 1-step co-training for web pages.
Feature construction:
– Label an anchor a in S as positive iff it points to a positive page x (according to f1). Let D = {(x', a): a is a positive anchor on page x'}.
– Generate many small training sets Di from D by sliding small windows over D.
– Let P be the set of all "structures" found by any builder from any subset Di.
– Say that a structure p links to x if p extracts an anchor that points to x. Represent a page x as the bag of structures in P that link to x.
Learning and Labeling: as before.

(Figures: each builder produces an extractor; the extractors List1, List2, and List3 are the "structures" used as features below.)

Bag-of-hubs (BOH) representation of the pages, as passed to the learner:
– {List1, List3, …}: PR
– {List1, List2, List3, …}: PR
– {List2, List3, …}: Other
– {List2, List3, …}: PR
– …

Experimental results (chart): in some cases co-training hurts, and in others it gives no improvement.

Experimental results

Summary
– "Builders" (from a wrapper learning system) let one discover and use the structure of web sites and index pages to smooth page classification results.
– Discovering good "hub structures" makes it possible to use 1-step co-training on small unlabeled datasets.
– Average error rate was reduced from 8.4% to 3.6%.
– The difference is statistically significant with a 2-tailed paired sign test or t-test.
– EM with probabilistic learners also works; see (Blei et al., UAI 2002).