TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents Haimonti Dutta 1, Xianshu Zhu 2, Tushar Muhale 2, Hillol Kargupta.

Presentation transcript:

TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents
Haimonti Dutta 1, Xianshu Zhu 2, Tushar Muhale 2, Hillol Kargupta 2, Kirk Borne 3, Codrina Lauth 4, Florian Holz 5, and Gerhard Heyer 5
1 Columbia University
2 University of Maryland, Baltimore County
3 George Mason University
4 Fraunhofer Institute for Intelligent Analysis and Information Systems
5 University of Leipzig

Outline
- Introduction and Motivation
- Related Work
- TagLearner
- Distributed Classifier-learning Algorithm
- Experiments
- Conclusion and Future Work

Introduction
Large online document repositories:
- Online newspapers, digital libraries, etc.
- Growing in size
Text categorization on the repositories:
- No automated text classification mechanism
- Performed by authorities, such as librarians
- Impractical as the repositories grow

Introduction (cont.)
Collaborative tagging:
- Del.icio.us, Flickr, Google Image Labeler
- Recruits web users to add tags to a resource
- Helps harness the power of people's knowledge
Pros and cons:
- Improves web search results and helps with classification
- Not supported by most online text repositories
- Lack of control:
  - Absence of standard keywords
  - Tagging errors due to misspellings
  - Harder to manage due to increased content diversity

Motivation
Provide an automated classification service:
- Utilize the collaborative effort of users
Collaborative tagging in a peer-to-peer network:
- Without the repositories' support
P2P classifier learning system

Related Work
Collaborative tagging:
- Recommendation system (Tso-Sutter et al.)
- Web search (Yahia et al.)
- Classification accuracy (Brooks et al.)
Distributed linear programming:
- Distributed simplex algorithm (Dutta et al.)

TagLearner: A P2P Classifier Learning System

Service provider: provides the P2P classifier learning service
TagLearner:
- Register the service by creating a tagging group
- Maintain a tagging group for this service:
  - Predefined labels used for tagging
  - Features for classification
  - Group members
  - Learnt classifier model

TagLearner
Interface:
- Join or leave the tagging group
- Tag the web documents
Distributed classifier learning algorithm
Client-side browser plugin
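The slides list what a tagging group holds (predefined labels, features, members, a learnt model) and what the interface allows (join, leave, tag). A minimal sketch of that bookkeeping follows. The class name `TaggingGroup`, its field names, and the example URL are hypothetical and are not taken from the TagLearner implementation.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaggingGroup:
    """Hypothetical record kept for one classifier learning service."""
    labels: list                      # predefined labels allowed for tagging
    features: list                    # features used for classification
    members: set = field(default_factory=set)
    tags: dict = field(default_factory=dict)   # (member, url) -> label
    model: Optional[list] = None      # learnt classifier weights, once available

    def join(self, member):
        self.members.add(member)

    def leave(self, member):
        self.members.discard(member)

    def tag(self, member, url, label):
        # Only group members may tag, and only with a predefined label.
        if member not in self.members:
            raise PermissionError(f"{member} is not in the tagging group")
        if label not in self.labels:
            raise ValueError(f"{label!r} is not a predefined label")
        self.tags[(member, url)] = label


group = TaggingGroup(labels=["Earth sciences", "Mathematical sciences"],
                     features=["term frequencies"])
group.join("alice")
group.tag("alice", "http://example.org/abstract-1", "Earth sciences")
```

Restricting tags to the predefined labels is one way to address the "absence of standard keywords" problem raised in the introduction.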

Classifier Design by Linear Programming
A classification problem can be framed as a linear programming problem.
Two classes: Class 1 and Class 2
[symbol not preserved in the transcript]: feature vector of the k-th instance
W: weight vector
We want to find a W such that: [condition not preserved in the transcript]
W can be found by minimizing the error.

Classifier Design by Linear Programming
Maximize: [objective not preserved in the transcript]
Subject to: [constraints not preserved in the transcript]
Use the simplex method to solve it!
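The exact objective and constraints did not survive in the transcript, so the sketch below uses one common formulation as an assumption, not necessarily the one on the slide: negate the Class 2 vectors so that every instance y should satisfy W·y > 0, then maximize a margin t subject to W·y >= t with each weight bounded in [-1, 1]. The function names `simplex_max` and `train_lp_classifier` are hypothetical, and the solver is a small textbook simplex that assumes all right-hand sides are non-negative.

```python
from fractions import Fraction


def simplex_max(c, A, b):
    """Maximize c.x subject to A x <= b, x >= 0, assuming every b[i] >= 0."""
    m, n = len(A), len(c)
    # Tableau rows: [A | slack identity | b]; last row holds reduced costs.
    T = [[Fraction(v) for v in A[i]]
         + [Fraction(int(i == j)) for j in range(m)]
         + [Fraction(b[i])] for i in range(m)]
    T.append([Fraction(-v) for v in c] + [Fraction(0)] * (m + 1))
    basis = [n + i for i in range(m)]
    while True:
        # Bland's rule: enter the lowest-index column with negative cost.
        col = next((j for j in range(n + m) if T[m][j] < 0), None)
        if col is None:
            break
        rows = [i for i in range(m) if T[i][col] > 0]
        if not rows:
            raise ValueError("problem is unbounded")
        # Minimum ratio test; ties broken by lowest basic variable index.
        row = min(rows, key=lambda i: (T[i][-1] / T[i][col], basis[i]))
        piv = T[row][col]
        T[row] = [v / piv for v in T[row]]
        for i in range(m + 1):
            if i != row and T[i][col] != 0:
                f = T[i][col]
                T[i] = [a - f * p for a, p in zip(T[i], T[row])]
        basis[row] = col
    x = [Fraction(0)] * (n + m)
    for i, j in enumerate(basis):
        x[j] = T[i][-1]
    return x[:n], T[m][-1]


def train_lp_classifier(class1, class2):
    """Return (W, margin) maximizing the margin t with |W_i| <= 1."""
    d = len(class1[0])
    ys = [list(x) for x in class1] + [[-v for v in x] for x in class2]
    # Variables: p (d entries), q (d entries), t; the weights are W = p - q.
    c = [0] * (2 * d) + [1]
    A, b = [], []
    for y in ys:                      # -(p - q).y + t <= 0
        A.append([-v for v in y] + list(y) + [1])
        b.append(0)
    for i in range(2 * d):            # p_i <= 1 and q_i <= 1
        row = [0] * (2 * d + 1)
        row[i] = 1
        A.append(row)
        b.append(1)
    x, margin = simplex_max(c, A, b)
    return [x[i] - x[d + i] for i in range(d)], margin


W, margin = train_lp_classifier([(2, 1), (3, 2)], [(-1, -2), (-2, -1)])
print(W, margin)
```

A positive margin means the two classes are linearly separable through the origin; a bias term can be modeled by appending a constant feature of 1 to every instance.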

Distributed Linear Programming
Distributed data:
- Each user has only a collection of constraints
Objective function: [not preserved in the transcript]
Constraints: [not preserved in the transcript]
Simplex tableau (column headers): Z, W1, W2, W3, value

Distributed Simplex Algorithm
Each user has different constraints, but all users want to optimize the same objective function.
User A, User B, User C, User D

Distributed Simplex Algorithm
0.5/7 = 1/14, 0.5/3 = 1/6, 0.5/2 = 1/4, 0.5/3 = 1/6, 0.5/6.5 = 1/13
User A, User B, User C, User D
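The fractions on this slide look like the simplex minimum ratio test (tableau value divided by the pivot column entry). One plausible reading, stated here as an assumption, is that each user computes the smallest ratio over its own constraint rows and the smallest ratio across all users selects the pivot row. The sketch below simulates that in a single process; the function names, the two-column row layout, and the assignment of rows to users are hypothetical, and a real P2P deployment would exchange messages instead of looping over a dictionary.

```python
from fractions import Fraction


def local_min_ratio(rows, col):
    """Smallest value/entry ratio among one user's rows, as (ratio, row_index).

    Only rows with a positive entry in the pivot column are eligible.
    Returns None when the user has no eligible row.
    """
    best = None
    for i, row in enumerate(rows):
        entry, value = row[col], row[-1]
        if entry > 0:
            ratio = Fraction(value) / Fraction(entry)
            if best is None or ratio < best[0]:
                best = (ratio, i)
    return best


def global_pivot_row(peers, col):
    """Combine the local minima; returns (ratio, user, row_index) or None."""
    candidates = []
    for name, rows in peers.items():
        local = local_min_ratio(rows, col)
        if local is not None:
            candidates.append((local[0], name, local[1]))
    return min(candidates) if candidates else None


# Each row is [pivot column entry, value]; the numbers are those on the slide,
# but which user owns which row is a made-up assignment.
peers = {
    "A": [[7, 0.5]],
    "B": [[3, 0.5], [2, 0.5]],
    "C": [[3, 0.5]],
    "D": [[6.5, 0.5]],
}
print(global_pivot_row(peers, 0))
```

Because each user sends only one candidate ratio rather than its full set of constraints, the per-step communication does not depend on how many constraints a user holds.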

Experimental Results
- Distributed Data Mining Toolkit (DDMT)
- “NSF Research Awards Abstracts” data set from the UCI Machine Learning Repository
- Only abstracts belonging to Earth and Mathematical sciences are considered
- Features used for classification do not rely on collaboratively generated annotations

Experiments (cont.) Figure 1. Communication cost versus the number of nodes in the network

Experiments (cont.)

Conclusion and Future Work
Conclusion:
- P2P classifier learning system prototype
- Scalable distributed classification algorithm based on linear programming
Future work:
- Extend the classification algorithm to multi-class classification problems
- Improve classification accuracy

Thank you! Questions?