Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.

Slides:



Advertisements
Similar presentations
Chapter 5: Introduction to Information Retrieval
Advertisements

Entity-Centric Topic-Oriented Opinion Summarization in Twitter Date : 2013/09/03 Author : Xinfan Meng, Furu Wei, Xiaohua, Liu, Ming Zhou, Sujian Li and.
A UTOMATICALLY A CQUIRING A S EMANTIC N ETWORK OF R ELATED C ONCEPTS Date: 2011/11/14 Source: Sean Szumlanski et. al (CIKM’10) Advisor: Jia-ling, Koh Speaker:
Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , Chapter 8.
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Stephan Gammeter, Lukas Bossard, Till Quack, Luc Van Gool.
Landmark Classification in Large- scale Image Collections Yunpeng Li David J. Crandall Daniel P. Huttenlocher ICCV 2009.
DIMENSIONALITY REDUCTION BY RANDOM PROJECTION AND LATENT SEMANTIC INDEXING Jessica Lin and Dimitrios Gunopulos Ângelo Cardoso IST/UTL December
Efficient Processing of Top-k Spatial Keyword Queries João B. Rocha-Junior, Orestis Gkorgkas, Simon Jonassen, and Kjetil Nørvåg 1 SSTD 2011.
Vector Space Information Retrieval Using Concept Projection Presented by Zhiguo Li
Analyzing System Logs: A New View of What's Important Sivan Sabato Elad Yom-Tov Aviad Tsherniak Saharon Rosset IBM Research SysML07 (Second Workshop on.
Visual Querying By Color Perceptive Regions Alberto del Bimbo, M. Mugnaini, P. Pala, and F. Turco University of Florence, Italy Pattern Recognition, 1998.
Ranking by Odds Ratio A Probability Model Approach let be a Boolean random variable: document d is relevant to query q otherwise Consider document d as.
Finding Wormholes with Flickr Geotags Maarten Clements Marcel Reinders Arjen de Vries Pavel Serdyukov December 3 rd, 2009 GIS.
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
SEEKING STATEMENT-SUPPORTING TOP-K WITNESSES Date: 2012/03/12 Source: Steffen Metzger (CIKM’11) Speaker: Er-gang Liu Advisor: Dr. Jia-ling Koh 1.
1 Context-Aware Search Personalization with Concept Preference CIKM’11 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
Unsupervised Learning Reading: Chapter 8 from Introduction to Data Mining by Tan, Steinbach, and Kumar, pp , , (
Mining High Utility Itemsets without Candidate Generation Date: 2013/05/13 Author: Mengchi Liu, Junfeng Qu Source: CIKM "12 Advisor: Jia-ling Koh Speaker:
Multi-Prototype Vector Space Models of Word Meaning __________________________________________________________________________________________________.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
A Scalable Self-organizing Map Algorithm for Textual Classification: A Neural Network Approach to Thesaurus Generation Dmitri G. Roussinov Department of.
Glasgow 02/02/04 NN k networks for content-based image retrieval Daniel Heesch.
MINING FREQUENT ITEMSETS IN A STREAM TOON CALDERS, NELE DEXTERS, BART GOETHALS ICDM2007 Date: 5 June 2008 Speaker: Li, Huei-Jyun Advisor: Dr. Koh, Jia-Ling.
Feature selection LING 572 Fei Xia Week 4: 1/29/08 1.
Date: 2013/8/27 Author: Shinya Tanaka, Adam Jatowt, Makoto P. Kato, Katsumi Tanaka Source: WSDM’13 Advisor: Jia-ling Koh Speaker: Chen-Yu Huang Estimating.
ON THE SELECTION OF TAGS FOR TAG CLOUDS (WSDM11) Advisor: Dr. Koh. Jia-Ling Speaker: Chiang, Guang-ting Date:2011/06/20 1.
Wei Feng , Jiawei Han, Jianyong Wang , Charu Aggarwal , Jianbin Huang
You Are What You Tag Yi-Ching Huang and Chia-Chuan Hung and Jane Yung-jen Hsu Department of Computer Science and Information Engineering Graduate Institute.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
Intelligent Database Systems Lab 國立雲林科技大學 National Yunlin University of Science and Technology 1 New Unsupervised Clustering Algorithm for Large Datasets.
Beyond Nouns Exploiting Preposition and Comparative adjectives for learning visual classifiers.
Date : 2013/03/18 Author : Jeffrey Pound, Alexander K. Hudek, Ihab F. Ilyas, Grant Weddell Source : CIKM’12 Speaker : Er-Gang Liu Advisor : Prof. Jia-Ling.
LOGO 1 Corroborate and Learn Facts from the Web Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Shubin Zhao, Jonathan Betz (KDD '07 )
Clustering C.Watters CS6403.
Finding Experts Using Social Network Analysis 2007 IEEE/WIC/ACM International Conference on Web Intelligence Yupeng Fu, Rongjing Xiang, Yong Wang, Min.
1 AC-Close: Efficiently Mining Approximate Closed Itemsets by Core Pattern Recovery Advisor : Dr. Koh Jia-Ling Speaker : Tu Yi-Lang Date : Hong.
Effective Automatic Image Annotation Via A Coherent Language Model and Active Learning Rong Jin, Joyce Y. Chai Michigan State University Luo Si Carnegie.
Intelligent DataBase System Lab, NCKU, Taiwan Josh Jia-Ching Ying 1, Wang-Chien Lee 2, Tz-Chiao Weng 1 and Vincent S. Tseng 1 1 Department of Computer.
A Novel Visualization Model for Web Search Results Nguyen T, and Zhang J IEEE Transactions on Visualization and Computer Graphics PAWS Meeting Presented.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Predicting the Location and Time of Mobile Phone Users by Using Sequential Pattern Mining Techniques Mert Özer, Ilkcan Keles, Ismail Hakki Toroslu, Pinar.
Divided Pretreatment to Targets and Intentions for Query Recommendation Reporter: Yangyang Kang /23.
Duc-Tien Dang-Nguyen, Giulia Boato, Alessandro Moschitti, Francesco G.B. De Natale Department to Information and Computer Science –University of Trento.
Topical Clustering of Search Results Date : 2012/11/8 Resource : WSDM’12 Advisor : Dr. Jia-Ling Koh Speaker : Wei Chang 1.
Y OU T UBE A ROUND THE W ORLD : G EOGRAPHIC P OPULARITY OF V IDEOS Date : 2012/9/2 Resource : WWW’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu 1.
Identifying “Best Bet” Web Search Results by Mining Past User Behavior Author: Eugene Agichtein, Zijian Zheng (Microsoft Research) Source: KDD2006 Reporter:
{ Adaptive Relevance Feedback in Information Retrieval Yuanhua Lv and ChengXiang Zhai (CIKM ‘09) Date: 2010/10/12 Advisor: Dr. Koh, Jia-Ling Speaker: Lin,
Finding similar items by leveraging social tag clouds Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: SAC 2012’ Date: October 4, 2012.
Cluster Analysis What is Cluster Analysis? Types of Data in Cluster Analysis A Categorization of Major Clustering Methods Partitioning Methods.
Short Text Similarity with Word Embedding Date: 2016/03/28 Author: Tom Kenter, Maarten de Rijke Source: CIKM’15 Advisor: Jia-Ling Koh Speaker: Chih-Hsuan.
Where Did You Go: Personalized Annotation of Mobility Records
Dr. Unnikrishnan P.C. Professor, EEE
#VisualHashtags Visual Summarization of Social Media Events using Mid-Level Visual Elements Sonal Goel (IIIT-Delhi), Sarthak Ahuja (IBM Research, India),
Special Topics in Geo-Business Data Analysis
Ying Dai Faculty of software and information science,
Ying Dai Faculty of software and information science,
Intent-Aware Semantic Query Annotation
Enriching Taxonomies With Functional Domain Knowledge
Date: 2012/11/15 Author: Jin Young Kim, Kevyn Collins-Thompson,
Color Image Retrieval based on Primitives of Color Moments
Ying Dai Faculty of software and information science,
TOPTRAC: Topical Trajectory Pattern Mining
DENSE ITEMSETS JOUNI K. SEPPANEN, HEIKKI MANNILA SIGKDD2004
Presentation transcript:

Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu 1

Outline Introduction Discovering Tag Relationships Experiments and Visualizations Conclusion 2

Introduction Existing approaches have largely relied on tag co- occurrences But some related tags may seldom co-occur Ex : the Statue of Liberty and Central Park  Both are major New York City landmarks “statueofliberty”,”centralpark” it is nearly impossible to take a photo that includes both tatbtatb o1o1 o2o2 o3o3 o4o4 … onon 3

Introduction Motivation How to find connections between tags by comparing their distributions over time and space Goal To cluster the tags based on their geo-spatial and temporal properties 4

Outline Introduction Discovering Tag Relationships Experiments and Visualizations Conclusion 5 Collection of tagged object Geo-spatial feature vectors Temporal feature vectors Geo-temporal feature Co-occurrence feature

Geo-spatial feature vectors The distribution of photographs over the world is highly non-uniform by raw geo-tags Divide the world into n bins, each with s degrees of latitude by s degrees of longitude Use s = 1 degree, which corresponds to grid cells of roughly 100 km x 100 km at the middle latitudes 6

Geo-spatial feature vectors 7 (user,tag,time,geo)

Temporal feature vectors Tags also have different temporal distributions EX : “beach”, ”restaurant” 26-dimensional 2-week vectors 7-dimensional day-of-week vectors 24-dimensional hour-of-day vectors Index : 1~m 8

Temporal feature vectors 9

Geo-temporal (motion) features 10

Co-occurrence features The pairwise co-occurrence between two tags t1 and t2, co_occur(t1, t2) Mutual information 11

Outline Introduction Discovering Tag Relationships Experiments and Visualizations Conclusion 12 Tag relationships Clustering tags Evaluation

Dataset Collect nearly 80 million photos Selected only the photos in North America Computed the top 2000 most frequent text tags Each tags has been used by at least 1,100 unique users 13

Tag relationships 14 Compute the pairwise Euclidean distances between these vectors co_occur(t, t’) mutual_info(t, t’) Missed tags : “whitehouse”,“april”

Clustering tags To cluster the 2000 tags using k-means Since k-means clustering is sensitive to the initial choice of centroids  With different random initial cluster centers : 5 times  Set k = Clustering the following data into 2 clusters using k-means -2, 5, 7, 8, 10, 16, 18, 20 -Initial centroid : 2, 20 2: 2,5,7,8,10 → 6.4 :2,5,7,8,10 20: 16,18,20 → 18: 16,18,20 -Initial centroid : 2, 5 2: 2 → 4.6 : 2,5,7,8 5: 5,7,8,10,16,18,20 → 14.8 : 10,16,18,20

Clustering tags Measure the peakiness of a vector v by computing its second moment 16 Example : Assume : After geo clustering,there were three tags in a group {sea,beach, shearwater} sea → Beach → shearwater →

Geo-spatial clusters 17

Temporal clusters 18

Evaluation Define the quality of a tag cluster → difficult Experiment → humans judge Goal : To test whether these techniques produce coherent tag clusters that correspond to intuitive geo- spatial and temporal concepts Try to measure how well these visualizations could help people appreciate subtle semantic connections between tags. 19

Evaluation results Geo relevant Temporally relevant 20 Geo clusters29(58%) Motion clusters30(60%) Co-occurrence clusters11(22%) Mutual information clusters11(22%) Temporal clusters13(26%) Motion clusters5(10%) Co-occurrence clusters1(2%) Mutual information clusters6(12%)

Geo and temporally relevant cluster retrieval 21 Dataset : 5000 data  500 item related to data mining  User gives a query “data mining”, system retrieves 2000 data,  400 data related to data mining Precision = 400/2000 = 0.2 Recall = 400/500 = 0.8

Outline Introduction Discovering Tag Relationships Experiments and Visualizations Conclusion 22

Conclusion Proposed techniques to measure the semantic similarity of tags by comparing geo-spatial, temporal, and geo- temporal patterns of use Proposed novel methods to visualize the resulting clusters 23

Future work Using only North American data and clustering the top 2000 most used tags into 50 clusters It would also be interesting to build a tag recommendation system that integrates these techniques Their approach could be applied to other collections of objects with geographical and temporal attributes, such as data from Wikipedia or Twitter 24