Web Page Classification with Heterogeneous Data Fusion

Zenglin Xu, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering, The Chinese University of Hong Kong
{zlxu, king, lyu}@cse.cuhk.edu.hk
WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.

1 Motivations
For web page classification, many data sources are available, such as the plain text, the title, the meta data and the anchor text. Simply putting them together does not greatly enhance classification performance. Data sources of different dimensions and types can, however, be represented in a common format: the kernel matrix.

2 Contributions
A kernel learning approach is proposed to integrate multiple data sources. It provides a systematic way of integrating multiple data sources and achieves better classification accuracy.

3 Architecture & Model
1. Feature Extraction.
2. Similarity Representation. Each data source is represented as a kernel matrix (Ki).
3. Similarity Combination.
4. Classification. Substitute the combined kernel K into the dual SVM.
This substitution yields a QCQP problem [equation not preserved in the transcript], where α is the parameter of the dual SVM, δ is a constant and t is the trace vector.

4 Experiment results
Dataset: DMOZ
Data sources: AT: Anchor Text; LT: Link Text; MT: Meta Data; TI: Title; PT: Plain Text
Methods: UW: Universally Weighted sources; KC: sources combined by Kernel Combination
Metrics: Mi-F1: Micro-F1; Ma-F1: Macro-F1
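
Steps 2 and 3 of the architecture (one kernel matrix per data source, then a combination of those matrices) can be illustrated with a minimal Python sketch. It is not the authors' implementation. The linear kernel, the trace normalisation, the toy feature vectors and all function names are assumptions made for illustration. The weights are fixed and uniform, which corresponds to a uniformly weighted baseline; the poster's KC method instead learns the weights by solving the QCQP, which is not shown here.

```python
from typing import List, Sequence

Matrix = List[List[float]]


def linear_kernel(features: Sequence[Sequence[float]]) -> Matrix:
    """Gram matrix K[a][b] = <x_a, x_b> for one data source (assumed kernel)."""
    return [[float(sum(p * q for p, q in zip(xa, xb))) for xb in features]
            for xa in features]


def trace(k: Matrix) -> float:
    """Sum of the diagonal entries of a square matrix."""
    return sum(k[i][i] for i in range(len(k)))


def normalise_trace(k: Matrix) -> Matrix:
    """Rescale K so that trace(K) equals the number of pages.

    This keeps sources with large feature values from dominating the sum.
    It is an assumption of this sketch, not a step stated on the poster.
    """
    tr = trace(k)
    if tr <= 0:
        raise ValueError("kernel matrix must have a positive trace")
    scale = len(k) / tr
    return [[scale * v for v in row] for row in k]


def combine_kernels(kernels: Sequence[Matrix], weights: Sequence[float]) -> Matrix:
    """Weighted sum of kernel matrices: K = sum_i w_i * K_i, with w_i >= 0."""
    if len(kernels) != len(weights):
        raise ValueError("need exactly one weight per kernel")
    n = len(kernels[0])
    combined = [[0.0] * n for _ in range(n)]
    for k, w in zip(kernels, weights):
        if w < 0:
            raise ValueError("weights must be non-negative")
        for a in range(n):
            for b in range(n):
                combined[a][b] += w * k[a][b]
    return combined


# Hypothetical toy data: three pages described by two sources whose
# feature vectors have different dimensionalities.
title_features = [[1, 0], [1, 1], [0, 1]]
anchor_features = [[2, 0, 1], [0, 1, 0], [1, 0, 1]]

kernels = [normalise_trace(linear_kernel(f))
           for f in (title_features, anchor_features)]
uniform_weights = [1.0 / len(kernels)] * len(kernels)
K = combine_kernels(kernels, uniform_weights)

for row in K:
    print(row)
```

Because every source is reduced to a pages-by-pages matrix, sources with different dimensions can be added directly. The resulting K could then be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's SVC with kernel="precomputed".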