Multi-class SVM with Negative Data Selection for Web Page Classification Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao International Joint Conference on Neural Networks 2004

Presentation transcript:

Multi-class SVM with Negative Data Selection for Web Page Classification Chih-Ming Chen, Hahn-Ming Lee and Ming-Tyan Kao International Joint Conference on Neural Networks 2004

Motivation
Several new websites are launched every day.
Search needs to be fast and efficient.
Search engines organize websites under a topic hierarchy (taxonomy).
This needs a classifier: one-against-all SVM.
Catch: huge negative data leads to increased training time.

Negative Data Selection Support vectors in the negative data are much more similar to the positive data than the other negative data are.

Negative Data Selection
1. Feature Selection: top n keywords from the positive data.
2. All websites are represented as vectors of these top n keywords.
3. Cosine Similarity: score each negative document against the positive data.
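The three steps above can be sketched in Python. This is a minimal illustration under stated assumptions, not the authors' implementation: the slides do not say how the top n keywords are ranked or which positive vector each negative document is compared with, so the sketch assumes whitespace tokenization, ranking by raw term frequency, and comparison against the centroid of the positive vectors. All function names are hypothetical.

```python
import math
from collections import Counter


def top_keywords(positive_docs, n):
    """Step 1: pick the n most frequent terms in the positive data.

    Assumption: ranking by raw frequency over whitespace tokens.
    """
    counts = Counter(term for doc in positive_docs for term in doc.split())
    return [term for term, _ in counts.most_common(n)]


def to_vector(doc, keywords):
    """Step 2: represent a document as term counts over the keywords."""
    counts = Counter(doc.split())
    return [counts[k] for k in keywords]


def cosine_similarity(u, v):
    """Step 3: cosine of the angle between two vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        return 0.0
    return dot / (norm_u * norm_v)


def score_negatives(positive_docs, negative_docs, n):
    """Score every negative document against the positive centroid.

    Assumption: the centroid stands in for "the positive data".
    """
    keywords = top_keywords(positive_docs, n)
    pos_vectors = [to_vector(d, keywords) for d in positive_docs]
    centroid = [sum(col) / len(pos_vectors) for col in zip(*pos_vectors)]
    return [cosine_similarity(to_vector(d, keywords), centroid)
            for d in negative_docs]
```

A negative document that shares none of the selected keywords gets a score of 0.0, so it ends up at the tail of the ranking.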

Negative Data Selection Plot the similarity scores of the negative documents to the positive documents in descending order.
[Figure: similarity scores in descending order plotted against the negative documents, with the convergence point marked.]
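One possible way to turn the ranked curve into a selection rule is sketched below. The transcript does not say how the convergence point is detected or what is done with it, so everything here is an assumption: the curve is treated as converged once no later consecutive drop reaches a hypothetical `tolerance`, and only the negatives ranked before that point are kept.

```python
def select_negatives(scores, tolerance=0.05):
    """Return indices of negative documents ranked before the flat tail.

    Assumption: the convergence point is the position after the last
    drop of at least `tolerance` between consecutive ranked scores.
    If the curve is flat from the start, nothing is selected.
    """
    # Rank document indices by similarity score, highest first.
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    ranked = [scores[i] for i in order]

    cut = 0
    for k in range(1, len(ranked)):
        if ranked[k - 1] - ranked[k] >= tolerance:
            cut = k  # the flat tail cannot start before position k
    return order[:cut]
```

The `tolerance` value is a tuning knob for this sketch only; a different detection rule would select a different subset.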

Experiments
Reuters dataset (10802 training, 565 test)
Table columns: Class, Number of Positive Data, Number of Negative Data
Classes: Crude, Trade, Dlr, Nat-gas, Acq
