Web Page Classification with Heterogeneous Data Fusion
Zenglin Xu, Irwin King and Michael R. Lyu
Department of Computer Science and Engineering
The Chinese University of Hong Kong
{zlxu, king,

1 Motivations
For web page classification, many data sources are available, such as the text, the title, the meta data, and the anchor text.
Simply putting them together does not greatly improve classification performance.
Data sources of different dimensions and types can be represented in a common format: the kernel matrix.
A kernel learning approach is thus proposed to integrate multiple data sources.

2 Contributions
A systematic way of integrating multiple data sources.
Better classification accuracy.

3 Architecture & Model
1. Feature Extraction.
2. Similarity Representation. Each data source is represented as a kernel matrix (Ki).
3. Similarity Combination.
4. Classification. Substitute the combined kernel K into the dual SVM.
This gives a quadratically constrained quadratic programming (QCQP) problem (the equation itself is missing from this transcript), where α is the parameter of the dual SVM, δ is a constant, and t is the trace vector.

4 Experimental results
Dataset: DMOZ
AT: Anchor Text
LT: Link Text
MT: Meta Data
TI: Title
PT: Plain Text
UW: Universally Weighted sources
KC: sources by Kernel Combination
Mi-F1: Micro-F1
Ma-F1: Macro-F1

WWW 2007, May 8–12, 2007, Banff, Alberta, Canada.
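Steps 2 and 3 of the model (similarity representation and similarity combination) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy feature vectors, the helper names `linear_kernel`, `trace_normalize` and `combine_kernels`, the choice of a linear kernel, and the fixed uniform weights. The poster learns the combination by solving a QCQP, which this sketch does not attempt; uniform weights are used only as a simple stand-in.

```python
# Hypothetical helpers: a sketch of kernel-based data fusion, not the poster's code.

def linear_kernel(X):
    """Gram matrix K[i][j] = <x_i, x_j> for a list of feature vectors."""
    return [[sum(a * b for a, b in zip(xi, xj)) for xj in X] for xi in X]

def trace_normalize(K):
    """Scale K so that its trace is 1, putting all sources on a common scale."""
    tr = sum(K[i][i] for i in range(len(K)))
    return [[v / tr for v in row] for row in K]

def combine_kernels(kernels, weights):
    """Weighted sum K = sum_i w_i * K_i of equally sized kernel matrices."""
    n = len(kernels[0])
    return [
        [sum(w * K[r][c] for w, K in zip(weights, kernels)) for c in range(n)]
        for r in range(n)
    ]

# Toy data (assumed): three pages, two sources of different dimensionality.
title_feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
text_feats = [[2.0, 0.0, 1.0], [0.0, 3.0, 0.0], [1.0, 1.0, 1.0]]

# Step 2: one kernel matrix per source. Both are 3x3 regardless of feature size.
kernels = [trace_normalize(linear_kernel(X)) for X in (title_feats, text_feats)]

# Step 3: combine with fixed uniform weights (assumption; learned in the poster).
uniform = [1.0 / len(kernels)] * len(kernels)
K = combine_kernels(kernels, uniform)
```

The point of the sketch is that sources with different feature dimensions all reduce to an n-by-n matrix over the same n pages, so they can be summed. The resulting `K` is the kind of matrix that step 4 substitutes into the dual SVM; in practice it could be passed to any SVM solver that accepts a precomputed kernel, for example scikit-learn's `SVC(kernel="precomputed")`.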