TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents Haimonti Dutta 1, Xianshu Zhu 2, Tushar Muhale 2, Hillol Kargupta.

1 TagLearner: A P2P Classifier Learning System from Collaboratively Tagged Text Documents Haimonti Dutta 1, Xianshu Zhu 2, Tushar Muhale 2, Hillol Kargupta 2, Kirk Borne 3, Codrina Lauth 4, Florian Holz 5, and Gerhard Heyer 5 1 Columbia University 2 University of Maryland, Baltimore County 3 George Mason University 4 Fraunhofer Institute for Intelligent Analysis and Information Systems 5 University of Leipzig

2 Outline Introduction and Motivation Related Work TagLearner Distributed Classifier-learning Algorithm Experiments Conclusion and Future Work

3 Introduction Large Online Document Repositories: –Online Newspapers, Digital Libraries, etc. –Growing in size Text categorization on the repositories: –No automated text classification mechanism –Performed by authorities, such as librarians Impractical

4 Introduction (cont.) Collaborative tagging –Del.icio.us, Flickr, Google Image Labeler –Recruits web users to add tags to a resource –Helps harness the power of people's knowledge Pros and cons –Improves web search results, helps with classification –Not supported by most online text repositories –Lack of control Absence of standard keywords Errors in tagging due to spelling errors Harder to manage due to increased content diversity

5 Motivation Provide automated classification service –Utilize collaborative effort of users Collaborative tagging in Peer-to-Peer network –Without repositories' support P2P Classifier learning system

6 Related Work Collaborative tagging: –Recommendation System (Tso-Sutter et al.) –Web search (Yahia et al.) –Classification accuracy (Brooks et al.) Distributed Linear Programming: –Distributed Simplex Algorithm (Dutta et al.)

7 TagLearner: A P2P Classifier Learning System

11 Service provider: provide P2P classifier learning service TagLearner -Register service by creating a tagging group -Maintain a tagging group for this service -Predefined Labels used for tagging -Features for classification -Group members -Learnt classifier model

12 TagLearner Interface: - Join or leave the tagging group - Tag the web documents Distributed classifier learning algorithm Client side browser plugin

14 Classifier Design by Linear Programming Classification problem can be framed as a linear programming problem Class 1 Class 2 :feature vector of k-th instance W : weight vector We want to find a W such that: W can be found by minimizing the error

15 Classifier Design by Linear Programming Maximize: Subject to: where Use Simplex Method to solve it!
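For readers who want a concrete form, one standard way to pose two-class linear classification as a linear program is sketched below. This is an assumed textbook-style formulation, not necessarily the one used on the slides: the labels $y_k$, the slack variables $\xi_k$, the feature-vector symbol $x_k$, and the instance count $n$ are notation introduced here for illustration.

```latex
% Assumption: y_k = +1 for Class 1 and y_k = -1 for Class 2.
% x_k is the feature vector of the k-th instance, W the weight vector,
% and \xi_k >= 0 measures how badly instance k violates the separation.
\begin{aligned}
\text{Minimize:}   \quad & \sum_{k=1}^{n} \xi_k \\
\text{Subject to:} \quad & y_k \, W^{\top} x_k \;\ge\; 1 - \xi_k, \qquad k = 1, \dots, n, \\
                         & \xi_k \;\ge\; 0, \qquad k = 1, \dots, n.
\end{aligned}
```

Minimizing $\sum_k \xi_k$ is the same as maximizing $-\sum_k \xi_k$, so the problem fits the "Maximize ... Subject to ..." shape that the Simplex Method expects. Each instance contributes one constraint.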

16 Distributed Linear Programming Distributed data –Each user only has a collection of constraints Objective function (Z row of the tableau): Z = 7 W1 + 16 W2 + 21.5 W3 Constraints (one per remaining row), e.g.: W1 + 4 W2 + 2 W3 ≤ 0.5
Simplex Tableau
Z    W1    W2     W3      value
1    -7    -16    -21.5   0
0    2     1      7       0.5
0    1     3      3       0.5
0    1     4      2       0.5
0    1     1      3       0.5
0    2     7      6.5     0.5
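As a minimal sketch of the first simplex step on this tableau, the snippet below picks the entering (pivot) column as the most negative coefficient in the Z row. It assumes the tableau digits are read as a Z row of (-7, -16, -21.5 | 0) followed by five constraint rows with right-hand side 0.5; the names `Z_ROW`, `CONSTRAINT_ROWS`, and `entering_column` are hypothetical and do not come from the slides.

```python
from fractions import Fraction as F

# Columns are W1, W2, W3, value (the Z column is omitted for brevity).
# Assumption: this is how the tableau on the slide reads.
Z_ROW = [F(-7), F(-16), F("-21.5"), F(0)]
CONSTRAINT_ROWS = [
    [F(2), F(1), F(7), F("0.5")],
    [F(1), F(3), F(3), F("0.5")],
    [F(1), F(4), F(2), F("0.5")],
    [F(1), F(1), F(3), F("0.5")],
    [F(2), F(7), F("6.5"), F("0.5")],
]


def entering_column(z_row):
    """Return the index of the most negative Z-row coefficient.

    Returns None when no coefficient is negative, i.e. the current
    solution is already optimal for a maximization problem.
    """
    coeffs = z_row[:-1]  # drop the value column
    best = min(range(len(coeffs)), key=lambda j: coeffs[j])
    return best if coeffs[best] < 0 else None


if __name__ == "__main__":
    col = entering_column(Z_ROW)
    print("entering column:", "W%d" % (col + 1))
    print("pivot-column entries:", [str(r[col]) for r in CONSTRAINT_ROWS])
```

Under this reading the entering column is W3, whose entries 7, 3, 2, 3, and 6.5 are exactly the denominators that appear in the ratio test on the later slides.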

17 Distributed Simplex Algorithm Each user has different constraints, but wants to solve the same objective function. User A User B User C User D

18 Distributed Simplex Algorithm User A User B User C User D

19 Distributed Simplex Algorithm 0.5/7 = 1/14, 0.5/3 = 1/6, 0.5/2 = 1/4, 0.5/3 = 1/6, 0.5/6.5 = 1/13 User A User B User C User D

20 Distributed Simplex Algorithm 0.5/7 = 1/14, 0.5/3 = 1/6, 0.5/2 = 1/4, 0.5/3 = 1/6, 0.5/6.5 = 1/13 User A User B User C User D
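The ratio test on these slides can be sketched as a two-stage minimum: each user computes the smallest value/coefficient ratio over its own constraint rows, and the smallest of those local results identifies the pivot row. The sketch below makes several assumptions that are not stated on the slides: the assignment of rows to users A to D is hypothetical, the pivot column is taken to be W3, and a plain loop stands in for whatever message exchange the peers would actually use.

```python
from fractions import Fraction as F

# Hypothetical split of the five constraint rows across four users;
# the slides do not say which user holds which row.
# Each row is (W1, W2, W3, value).
LOCAL_ROWS = {
    "A": [(F(2), F(1), F(7), F(1, 2))],
    "B": [(F(1), F(3), F(3), F(1, 2))],
    "C": [(F(1), F(4), F(2), F(1, 2))],
    "D": [(F(1), F(1), F(3), F(1, 2)), (F(2), F(7), F("6.5"), F(1, 2))],
}
PIVOT_COL = 2  # assumption: W3 is the entering column


def local_min_ratio(rows, col):
    """Smallest value/coefficient ratio among this user's rows.

    Only rows with a positive pivot-column entry are eligible.
    Returns (ratio, local_row_index), or None if no row is eligible.
    """
    ratios = [(row[-1] / row[col], i) for i, row in enumerate(rows) if row[col] > 0]
    return min(ratios) if ratios else None


def global_pivot(local_rows, col):
    """Combine the per-user minima into one global pivot choice.

    Returns (ratio, user, local_row_index), or None if the problem
    is unbounded along this column.
    """
    candidates = []
    for user, rows in local_rows.items():
        best = local_min_ratio(rows, col)
        if best is not None:
            ratio, idx = best
            candidates.append((ratio, user, idx))
    return min(candidates) if candidates else None


if __name__ == "__main__":
    for user, rows in LOCAL_ROWS.items():
        print(user, local_min_ratio(rows, PIVOT_COL))
    print("pivot:", global_pivot(LOCAL_ROWS, PIVOT_COL))
```

With this split, each user only has to share a single ratio rather than its constraint rows, and the row with ratio 1/14 is selected as the pivot row.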

21 Experimental Results Distributed Data Mining Toolkit (DDMT) “NSF Research Awards Abstracts 1990-2003” data set from the UCI Machine Learning Repository We only consider abstracts belonging to Earth and Mathematical sciences Features used for classification do not rely on collaboratively generated annotations.

22 Experiments (cont.) Figure 1. Communication cost versus the number of nodes in the network

23 Experiments (cont.)

24 Conclusion and Future Work Conclusion: –P2P classifier learning system prototype –Scalable distributed classification algorithm based on linear programming Future work: –Extend the classification algorithm to multi-class classification problems –Improve classification accuracy

25 Thank you! Questions?

