Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox Community Grids Laboratory Indiana University.


1 Jong Y. Choi, Joshua Rosen, Siddharth Maini, Marlon E. Pierce, and Geoffrey C. Fox Community Grids Laboratory Indiana University

2
Delicious example
[Figure labels: Bookmark; Tags; Social Networks; People-generated]

3
- Collaborative Tagging
  - Online bookmarking with annotations
  - Create social networks
  - Utilize the power of people's knowledge
- Pros and cons
  - High-quality classifier by using human intelligence
  - But lack of control or authority

5
Collective Collaborative Tagging (CCT) System
[Architecture diagram. Labels: Distributed Tagging Data; RDF, RSS, Atom, HTML; Data Importer; Populate bookmarks/tags; Repository; Data Coordinator; User Service; SOAP, REST, …; Query with various options; Search Result]

6
- 1st: Service and algorithm development
  - Identify services and algorithms
- 2nd: Interface development
  - Web 2.0 style interface
  - REST, SOAP, …
- 3rd: Export/import service development
  - Merging distributed data sets
  - Export data to build mash-up sites
- So far, we are mainly in the 1st stage and are doing some experiments in the 2nd stage

7
[Figure callouts: Different Data Sources; Various IR algorithms; Flexible Options; Result Comparison]

8
Type | Service | Description | Algorithm
I | Searching | Given input tags, return the most relevant X (X = URLs, tags, or users) | Latent Semantic Indexing (LSI), FolkRank
II | Recommendation | Given indirect input tags, return undiscovered X |
III | Clustering | Community discovery: finding a group or community with similar interests | K-Means, Deterministic Annealing Clustering
IV | Trend detection | Analyze tagging activities as a time series and detect abnormalities | Time Series Analysis

9
- Vector-space model (bag-of-words model)
  - Assume n URLs and q tags
  - A URL can be represented by a q-dimensional vector, d_i = (t_1, t_2, …, t_q)
  - The total data set can be represented by an n-by-q matrix
- Pairwise dissimilarity matrix
  - n-by-n symmetric matrix
  - Distance (Euclidean, Manhattan, …)
  - Angles, cosine, sine, …
  - O(n^2) complexity
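A minimal sketch of the vector-space model and the pairwise dissimilarity matrix described on this slide, in plain Python. The sample bookmarks, the binary tag weighting, and the helper names (`tag_matrix`, `pairwise`, and so on) are illustrative assumptions and do not come from the CCT system itself.

```python
import math

def tag_matrix(bookmarks, tags):
    """Build the n-by-q matrix: row i is the tag vector d_i of URL i.
    Binary weights (1 = tagged, 0 = not) are an assumption of this sketch."""
    return [[1.0 if t in url_tags else 0.0 for t in tags] for url_tags in bookmarks]

def euclidean(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def cosine_dissimilarity(u, v):
    """1 - cos(angle between u and v); defined as 1.0 if either vector is zero."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / norm if norm else 1.0

def pairwise(matrix, dist):
    """Fill the symmetric n-by-n dissimilarity matrix; O(n^2) calls to dist."""
    n = len(matrix)
    d = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i + 1, n):
            d[i][j] = d[j][i] = dist(matrix[i], matrix[j])
    return d

# Hypothetical data: three URLs (n = 3) described by three tags (q = 3).
tags = ["python", "web", "grid"]
bookmarks = [{"python", "web"}, {"python"}, {"grid"}]

X = tag_matrix(bookmarks, tags)
D_euclid = pairwise(X, euclidean)
D_cosine = pairwise(X, cosine_dissimilarity)
```

Only the upper triangle is computed and then mirrored, which is where the symmetry of the matrix saves work; the cost is still quadratic in the number of URLs.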

10
- Graph model
  - Building a graph with nodes and edges
  - Edges indicate relationships
  - Becomes a complex network (tag graph)
- Dissimilarity
  - Related to path distance
  - Finding paths is important (shortest path problem)
  - Naive approach: O(n^3) complexity
[Figure source: MSI-CIEC]
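As a sketch of path-based dissimilarity on a tag graph, the Floyd-Warshall algorithm computes all-pairs shortest paths in O(n^3) time. The slide does not name the naive approach it has in mind, so the choice of Floyd-Warshall is an assumption, as are the small example graph and its unit edge weights.

```python
import math

def floyd_warshall(n, edges):
    """All-pairs shortest path lengths on an undirected weighted graph.
    edges is a list of (u, v, weight); runs in O(n^3) time."""
    dist = [[math.inf] * n for _ in range(n)]
    for i in range(n):
        dist[i][i] = 0.0
    for u, v, w in edges:
        dist[u][v] = min(dist[u][v], w)
        dist[v][u] = min(dist[v][u], w)
    for k in range(n):              # allow node k as an intermediate stop
        for i in range(n):
            for j in range(n):
                through_k = dist[i][k] + dist[k][j]
                if through_k < dist[i][j]:
                    dist[i][j] = through_k
    return dist

# Hypothetical tag graph: URLs and tags are both nodes, and an edge means
# "this URL carries this tag".
# 0 = url_a, 1 = tag "python", 2 = url_b, 3 = tag "grid", 4 = url_c
edges = [(0, 1, 1.0), (1, 2, 1.0), (2, 3, 1.0), (3, 4, 1.0)]
hops = floyd_warshall(5, edges)
```

Here `hops[0][2]` is the dissimilarity between url_a and url_b, which share a tag, and `hops[0][4]` is the larger dissimilarity between url_a and url_c, which are connected only through url_b.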

11
- Latent Semantic Indexing
  - Using the vector-space model, find the URLs most similar to the user's query tags
  - Dimension reduction from high q to low d (q >> d)
  - Removing noisy terms, extracting latent concepts
[Figure: precision vs. recall. Labels: 2 terms; 4 terms; 8 terms; 20% dim. reduction; None; Ideal Line]
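The following is a rough sketch of the LSI idea, assuming a tiny binary URL-by-tag matrix. To stay within the standard library it finds the top d right singular vectors by power iteration with deflation rather than a full SVD; that substitution, the sample data, and all function names are assumptions of this example, and d must not exceed the rank of the matrix.

```python
import math

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine(u, v):
    nu, nv = math.sqrt(dot(u, u)), math.sqrt(dot(v, v))
    return dot(u, v) / (nu * nv) if nu and nv else 0.0

def top_right_singular_vectors(A, d, iters=200):
    """Approximate the top-d right singular vectors of A (n-by-q) by power
    iteration on A^T A, keeping each new vector orthogonal to earlier ones."""
    q = len(A[0])
    basis = []
    for k in range(d):
        v = [1.0 / (i + k + 1) for i in range(q)]    # deterministic start
        for _ in range(iters):
            Av = [dot(row, v) for row in A]          # A v, length n
            v = [dot(col, Av) for col in zip(*A)]    # A^T (A v), length q
            for b in basis:                          # deflation step
                p = dot(v, b)
                v = [x - p * y for x, y in zip(v, b)]
            norm = math.sqrt(dot(v, v))
            v = [x / norm for x in v]
        basis.append(v)
    return basis

def project(x, basis):
    """Coordinates of a q-dimensional tag vector in the d-dimensional latent space."""
    return [dot(x, b) for b in basis]

# Hypothetical data: five URLs, four tags (q = 4), reduced to d = 2 concepts.
tags = ["python", "programming", "grid", "cluster"]
A = [
    [1, 1, 0, 0],   # url0: python, programming
    [1, 0, 0, 0],   # url1: python only
    [0, 0, 1, 1],   # url2: grid, cluster
    [0, 0, 0, 1],   # url3: cluster only
    [1, 1, 0, 0],   # url4: python, programming
]
query = [0, 1, 0, 0]                                 # query tag: "programming"

basis = top_right_singular_vectors(A, d=2)
raw_scores = [cosine(row, query) for row in A]
latent_scores = [cosine(project(row, basis), project(query, basis)) for row in A]
```

In the raw tag space url1 shares no tag with the query and scores zero, but in the reduced space it lands on the same latent concept as the "programming" URLs, which is the behavior the slide attributes to extracting latent concepts.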

12
- Discover the group structures of URLs
- Non-parametric learning algorithm
- Non-trivial optimization problem
- Should avoid local minima/maxima solutions

13
- Deterministically avoid local minima
- Tracing the global solution by changing the level of energy
- Analogy to the physical annealing process (high to low)
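A minimal sketch of deterministic annealing clustering, assuming squared Euclidean distance and soft assignments proportional to exp(-distance^2 / T). The temperature schedule, the small deterministic nudge applied at each temperature level, and the sample points are all assumptions of this example rather than details taken from the slides.

```python
import math

def deterministic_annealing(points, k, t_start=500.0, t_end=0.01,
                            cooling=0.9, inner=20, eps=1e-4):
    """Cluster points into k groups by cooling the temperature from high to low."""
    dim = len(points[0])
    mean = [sum(p[j] for p in points) / len(points) for j in range(dim)]
    centers = [list(mean) for _ in range(k)]     # all centers start at the mean
    t = t_start
    while t > t_end:
        # Nudge the centers apart slightly so they can split once the
        # temperature drops below a critical value.
        for c in range(k):
            shift = eps * (c - (k - 1) / 2)
            centers[c] = [x + shift for x in centers[c]]
        for _ in range(inner):
            # Soft assignment of every point to every center at temperature t.
            weights = []
            for p in points:
                d2 = [sum((a - b) ** 2 for a, b in zip(p, ctr)) for ctr in centers]
                lo = min(d2)                     # subtract the min for stability
                e = [math.exp(-(d - lo) / t) for d in d2]
                s = sum(e)
                weights.append([x / s for x in e])
            # Each center moves to the weighted mean of all points.
            for c in range(k):
                total = sum(w[c] for w in weights)
                if total > 0:
                    centers[c] = [
                        sum(w[c] * p[j] for w, p in zip(weights, points)) / total
                        for j in range(dim)
                    ]
        t *= cooling
    return centers

# Hypothetical 2-D data with two well-separated groups.
points = [(0, 0), (1, 0), (0, 1), (10, 10), (11, 10), (10, 11)]
centers = sorted(deterministic_annealing(points, k=2))
```

At high temperature every point belongs almost equally to every center, so the centers sit together at the data mean; as the temperature falls the assignments harden and the result approaches what K-Means would give, without depending on a random initialization.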

14
- Classification
  - To respond more quickly to users' requests
  - Train on users' input and answer questions based on the training results
  - Artificial Neural Network, Support Vector Machine, …
- Trend detection
  - Can be used for prediction/forecasting
  - Time-series analysis of tagging activities
  - Markov chain model, Fourier transform, …
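As a simple illustration of detecting abnormal tagging activity in a time series, the sketch below flags days whose count is far above a trailing window's mean. This rolling z-score is a deliberately simpler stand-in for the Markov chain and Fourier methods the slide mentions; the window size, threshold, standard-deviation floor, and sample counts are all assumptions.

```python
import statistics

def detect_bursts(counts, window=5, threshold=3.0, min_std=1.0):
    """Return the indices whose count exceeds the trailing-window mean by more
    than `threshold` standard deviations. min_std is a floor that keeps a
    perfectly flat history from producing a division by zero."""
    flagged = []
    for i in range(window, len(counts)):
        history = counts[i - window:i]
        mean = statistics.mean(history)
        std = max(statistics.pstdev(history), min_std)
        if (counts[i] - mean) / std > threshold:
            flagged.append(i)
    return flagged

# Hypothetical daily counts of how often one tag was applied.
daily_counts = [4, 5, 3, 4, 5, 4, 30, 5, 4]
bursts = detect_bursts(daily_counts)
```

With these counts only the spike on day index 6 is flagged; the days after it are compared against a window that already contains the spike, so they are not reported.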

15
- The goal of our Collective Collaborative Tagging (CCT) system
  - Utilize various data sets
  - Provide various information retrieval (IR) algorithms
  - Help to utilize people-powered knowledge
- Currently, various models and algorithms are being investigated
- Service interfaces and import/export functions will be added soon

17
 | Vector-space Model | Graph Model
Representation | q-dimensional vector; q-by-n matrix | G(V, E); V = {URL, tags, users}
Dissimilarity | Distances, cosine, …; O(N^2) complexity | Paths, hops, connectivity, …; O(N^3) complexity
Algorithm | Latent Semantic Indexing; dimension reduction schemes; PCA | PageRank, FolkRank, …; pairwise clustering; MDS

18
- Pairwise clustering
  - Input from vector-based model vs. graph model
  - How to avoid local minima/maxima? (e.g., K-Means)
[Figure panels: Graph model; Vector-space model]

