Exploring Social Tagging Graph for Web Object Classification

Slides:



Advertisements
Similar presentations
Consistent Bipartite Graph Co-Partitioning for High-Order Heterogeneous Co-Clustering Tie-Yan Liu WSM Group, Microsoft Research Asia Joint work.
Advertisements

Knowledge Transfer via Multiple Model Local Structure Mapping Jing Gao, Wei Fan, Jing Jiang, Jiawei Han l Motivate Solution Framework Data Sets Synthetic.
Using Large-Scale Web Data to Facilitate Textual Query Based Retrieval of Consumer Photos.
January 23 rd, Document classification task We are interested to solve a task of Text Classification, i.e. to automatically assign a given document.
Active Learning for Streaming Networked Data Zhilin Yang, Jie Tang, Yutao Zhang Computer Science Department, Tsinghua University.
Semi-supervised Learning Rong Jin. Semi-supervised learning  Label propagation  Transductive learning  Co-training  Active learning.
Social Media Mining Chapter 5 1 Chapter 5, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool, September, 2010.
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Intelligent Systems Lab. Recognizing Human actions from Still Images with Latent Poses Authors: Weilong Yang, Yang Wang, and Greg Mori Simon Fraser University,
An Approach to Evaluate Data Trustworthiness Based on Data Provenance Department of Computer Science Purdue University.
Communities in Heterogeneous Networks Chapter 4 1 Chapter 4, Community Detection and Mining in Social Media. Lei Tang and Huan Liu, Morgan & Claypool,
Maggie Zhou COMP 790 Data Mining Seminar, Spring 2011
Li-Jia Li Yongwhan Lim Li Fei-Fei Chong Wang David M. Blei B UILDING AND U SING A S EMANTIVISUAL I MAGE H IERARCHY CVPR, 2010.
Using Structure Indices for Efficient Approximation of Network Properties Matthew J. Rattigan, Marc Maier, and David Jensen University of Massachusetts.
Co-Training and Expansion: Towards Bridging Theory and Practice Maria-Florina Balcan, Avrim Blum, Ke Yang Carnegie Mellon University, Computer Science.
Jierui Xie, Boleslaw Szymanski, Mohammed J. Zaki Department of Computer Science Rensselaer Polytechnic Institute Troy, NY 12180, USA {xiej2, szymansk,
Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao † Wei Fan ‡ Yizhou Sun † Jiawei Han † †University of Illinois at Urbana-Champaign.
Heterogeneous Consensus Learning via Decision Propagation and Negotiation Jing Gao† Wei Fan‡ Yizhou Sun†Jiawei Han† †University of Illinois at Urbana-Champaign.
Presented by Zeehasham Rasheed
Semi-Supervised Learning D. Zhou, O Bousquet, T. Navin Lan, J. Weston, B. Schokopf J. Weston, B. Schokopf Presents: Tal Babaioff.
The Social Web: A laboratory for studying s ocial networks, tagging and beyond Kristina Lerman USC Information Sciences Institute.
Jinhui Tang †, Shuicheng Yan †, Richang Hong †, Guo-Jun Qi ‡, Tat-Seng Chua † † National University of Singapore ‡ University of Illinois at Urbana-Champaign.
Transfer Learning From Multiple Source Domains via Consensus Regularization Ping Luo, Fuzhen Zhuang, Hui Xiong, Yuhong Xiong, Qing He.
Tag-based Social Interest Discovery
Extracting Places and Activities from GPS Traces Using Hierarchical Conditional Random Fields Yong-Joong Kim Dept. of Computer Science Yonsei.
MediaEval Workshop 2011 Pisa, Italy 1-2 September 2011.
Modeling Relationship Strength in Online Social Networks Rongjing Xiang: Purdue University Jennifer Neville: Purdue University Monica Rogati: LinkedIn.
Random Walk with Restart (RWR) for Image Segmentation
DATA MINING LECTURE 13 Absorbing Random walks Coverage.
Web Search. Structure of the Web n The Web is a complex network (graph) of nodes & links that has the appearance of a self-organizing structure  The.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
Page 1 Ming Ji Department of Computer Science University of Illinois at Urbana-Champaign.
Universit at Dortmund, LS VIII
DATA MINING LECTURE 13 Pagerank, Absorbing Random Walks Coverage Problems.
A General Optimization Framework for Smoothing Language Models on Graph Structures Qiaozhu Mei, Duo Zhang, ChengXiang Zhai University of Illinois at Urbana-Champaign.
On Node Classification in Dynamic Content-based Networks.
1 SIGIR 2004 Web-page Classification through Summarization Dou Shen Zheng Chen * Qiang Yang Presentation : Yao-Min Huang Date : 09/15/2004.
LOGO Finding High-Quality Content in Social Media Eugene Agichtein, Carlos Castillo, Debora Donato, Aristides Gionis and Gilad Mishne (WSDM 2008) Advisor.
Zibin Zheng DR 2 : Dynamic Request Routing for Tolerating Latency Variability in Cloud Applications CLOUD 2013 Jieming Zhu, Zibin.
Understanding User’s Query Intent with Wikipedia G 여 승 후.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology Mining Logs Files for Data-Driven System Management Advisor.
SemiBoost : Boosting for Semi-supervised Learning Pavan Kumar Mallapragada, Student Member, IEEE, Rong Jin, Member, IEEE, Anil K. Jain, Fellow, IEEE, and.
Automatic Video Tagging using Content Redundancy Stefan Siersdorfer 1, Jose San Pedro 2, Mark Sanderson 2 1 L3S Research Center, Germany 2 University of.
Towards Social User Profiling: Unified and Discriminative Influence Model for Inferring Home Locations Rui Li, Shengjie Wang, Hongbo Deng, Rui Wang, Kevin.
Panther: Fast Top-k Similarity Search in Large Networks JING ZHANG, JIE TANG, CONG MA, HANGHANG TONG, YU JING, AND JUANZI LI Presented by Moumita Chanda.
Discovering Evolutionary Theme Patterns from Text - An Exploration of Temporal Text Mining Qiaozhu Mei and ChengXiang Zhai Department of Computer Science.
Improving Music Genre Classification Using Collaborative Tagging Data Ling Chen, Phillip Wright *, Wolfgang Nejdl Leibniz University Hannover * Georgia.
MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
Social Information Processing March 26-28, 2008 AAAI Spring Symposium Stanford University
Enhanced hypertext categorization using hyperlinks Soumen Chakrabarti (IBM Almaden) Byron Dom (IBM Almaden) Piotr Indyk (Stanford)
CIS750 – Seminar in Advanced Topics in Computer Science Advanced topics in databases – Multimedia Databases V. Megalooikonomou Link mining ( based on slides.
Paper Presentation Social influence based clustering of heterogeneous information networks Qiwei Bao & Siqi Huang.
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
Multi-Class Sentiment Analysis with Clustering and Score Representation Yan Zhu.
Applying Deep Neural Network to Enhance EMPI Searching
Cross Domain Distribution Adaptation via Kernel Mapping
Integrating Meta-Path Selection With User-Guided Object Clustering in Heterogeneous Information Networks Yizhou Sun†, Brandon Norick†, Jiawei Han†, Xifeng.
Location Recommendation — for Out-of-Town Users in Location-Based Social Network Yina Meng.
RankClus: Integrating Clustering with Ranking for Heterogeneous Information Network Analysis Yizhou Sun, Jiawei Han, Peixiang Zhao, Zhijun Yin, Hong Cheng,
Graph Based Multi-Modality Learning
A weight-incorporated similarity-based clustering ensemble method based on swarm intelligence Yue Ming NJIT#:
Presented by: Prof. Ali Jaoua
Adaptive entity resolution with human computation
Discriminative Frequent Pattern Analysis for Effective Classification
Jiawei Han Department of Computer Science
Knowledge Transfer via Multiple Model Local Structure Mapping
Binghui Wang, Le Zhang, Neil Zhenqiang Gong
Graph-based Security and Privacy Analytics via Collective Classification with Joint Weight Learning and Propagation Binghui Wang, Jinyuan Jia, and Neil.
Web Mining Research: A Survey
Presentation transcript:

Exploring Social Tagging Graph for Web Object Classification Zhijun Yin, Rui Li, Qiaozhu Mei, Jiawei Han Department of Computer Science University of Illinois at Urbana-Champaign

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Web Object Classification Web objects become increasingly popular (>106-109) products sold on Amazon videos uploaded to YouTube research papers referenced on CiteULike photos uploaded to and collected by Flickr and Facebook Why classifying web objects into semantic categories? Index and organize web objects efficiently Browse and search of web objects conveniently Discover interesting patterns from web objects

Subtlety on Web Object Classification “Harry Potter” DVD Class: “Movies & TV” The fifth book of “Harry Potter” Class: “Books” “Harry Potter” Halloween costume Class: “Apparel & Accessories”

Challenges for Web Obj. Classification Lack of features Limited text description, e.g., title of a picture on Flickr Inaccurate/difficult content features of images/videos Lack of interconnections Often in isolate settings, w. limited interconnections E.g. Michael Jordan: a basketball star or a Berkeley professor? Lack of labels Impractical to obtain a huge number of labels Without enough labels, how can one do effective classification? 5

Social Tagging I: Tagging Web Pages 6 6

Social Tagging II: Tagging Products

Social Tagging Does Exist There exist many existing social tagging sites Flickr (tagging pictures) Digg (tagging news articles) Technorati (tagging blogs) Live search QnA (tagging questions) …

Intuition Social tagging can tackle the above challenges Lack of features Tags “art” and “architecture” are good features to characterize the book “ancient Greek art and architecture”.

Intuition (Cont.) Social tagging can tackle the above challenges Lack of interconnections Although web page P1 and web page P2 do not have any tags in common, there is an implicit path from P1 to P2 via two tags and P2. Class of P1 can infer the class of P2.

Intuition (Cont.) Social tagging can tackle the above challenges Lack of labels Assume that the labels of web pages are easy to obtain, the class of web page “www.art.com” can infer the class of an art picture in Flickr via tag “art”.

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Related Work Web object classification Web page classification Multimedia classification Social tag usage Web search Information retrieval Semantic web Web page clustering User interest mining

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Given: Social Tagging Graph Objects of type T are the target objects to be assigned category labels Objects of type S are labeled objects from another domain

Notations: Social Tagging Graph C: a category set, {c1, c2, … , ck} G = (V,E): a social tagging graph. Every object, u, and every tag, v, is a vertex in the graph G. If an object u is associated with a tag v, there will be an edge between u and v VS: a set of objects of type S : a set of labeled objects of type T : a set of unlabeled objects of type T Vtag: a set of tags

Web Object Classification Problem Achieve consistency on social tagging graph Category assignment of a vertex in Vs should not deviate much from its original label Category assignment of the vertex in should remain the same with its original label if it is fully trustable Category of the vertex in should take the prior knowledge into consideration if there is any Category assignment of any vertex in graph G should be as consistent as possible to the categories of its neighbors

The Optimization Framework 18

The Optimization Framework 19 19

Target: Minimizing O(f) Our target is to find f * to minimize the O(f) The class label c of object o

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Classification Algorithm Finding the close solution of the above optimization problem requires the computation of the inverse of a matrix with the size of all web objects and tags. In reality, this is usually not feasible due to the complexity of computation. An efficient iterative algorithm to solve the optimization problem.

Classification Algorithm Overall, it takes O(k(|V | + iter|E|))time The initialization steps (i.e., steps 1-4) take O(k|V |) time The iteration steps (i.e., steps 5-12) take O(2k|E|) time At step 7, the class distributions of objects of type S are updated from the class distributions of the associated tags At step 8, the class distributions of the labeled objects of type T are updated from the class distributions of the associated tags At step 9, the class distributions of the unlabeled objects of type T are updated from the class distributions of the associated tags At step 10, the class distributions of the tags are updated from the class distributions of the connected object It takes O(k| |) time to get the class labels (i.e., steps 13-14). 23

Parameter Setting (Semi-Supervised Learning)

Parameter Setting (Transfer Learning)

Parameter Setting (Prior Integration) prior knowledge

Convergence of the Algorithm Convergence proof Equivalent to absorption random walk on a new graph General idea: Construct a new graph based on the previously built social tagging graph. Then make some duplicate nodes with their initial labels Take these labeled nodes as the absorbing boundary for random walk The probability that the node belongs to class i is equal to the probability that a particle starts from the node and hits (i.e., be absorbed by) the set of nodes with label i

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Experiments: Data Collections 6123 products from Amazon 5536 web pages (under ODP Shopping category) Tags of web pages are collected from Delicious

Experiments: Measurement Micro-averaged scores (MicroF1) tend to be dominated by the performance on common categories Macro-averaged scores (MacroF1) are influenced by the performance in rare categories Baseline SVM+TITLE: SVM using product titles as feature SVM+TAG: SVM using tags as feature HG+TITLE: Harmonic Gaussian field method using titles. Use cosine similarity of the titles as edge weight HG+TAG: Harmonic Gaussian field method using tags. Use cosine similarity of the tags as edge weight

Experiment: Overall Performance Overall performance comparison TM (Tag-based classification Model) to refer to our method.

Experiment (Cont.) Challenge in web object classification Lack of features Lack of interconnections Lack of labels 32

Experiment (Cont.) Lack of features? Effectiveness of tag feature Lack of interconnection? Exploring the interconnections of objects

Experiment (Cont.) Lack of labels? Handling lack of labeling issue 34

Experiment (Cont.) Sensitivity of parameter 35

Experiment (Cont.) Prior Knowledge With prior > Without prior SVM+TAG with prior > SVM+TAG HG+TAG with prior > HG+TAG

Outline Motivation: Web Object Classification Related Work Problem Formulation Classification Algorithm Experiments Conclusions

Conclusions Web object classification: An emerging task and increasingly important Web object classification problem can take advantage from social tags in three aspects represent web objects in a meaningful feature space interconnect objects to indicate implicit relationship bridging heterogeneous objects so that category information can be propagated from one domain to another We propose a general framework to model the problem as an optimization problem on a social tagging graph, which covers different scenarios of web object classification problem In our model, we only consider the setting of two types of web objects It is interesting to generalize our model to manage multi-types of objects

THANKS! Q&A 39