A Semantic Clustering-based Approach For Searching And Browsing Tag Spaces Date: 2011/10/17 Source:Damir Vandic et. al (SAC’11) Speaker:Chiang,guang-ting.

Slides:



Advertisements
Similar presentations
Collaborative Filtering in Social Tagging System on Joint Item-Tag Recommendations Date : 2011/11/7 Source : Jing Peng et. al (CIKM’10) Speaker : Chiu.
Advertisements

Date: 2012/8/13 Source: Luca Maria Aiello. al(CIKM’11) Advisor: Jia-ling, Koh Speaker: Jiun Jia, Chiou Behavior-driven Clustering of Queries into Topics.
Sequence Clustering and Labeling for Unsupervised Query Intent Discovery Speaker: Po-Hsien Shih Advisor: Jia-Ling Koh Source: WSDM’12 Date: 1 November,
DOMAIN DEPENDENT QUERY REFORMULATION FOR WEB SEARCH Date : 2013/06/17 Author : Van Dang, Giridhar Kumaran, Adam Troy Source : CIKM’12 Advisor : Dr. Jia-Ling.
Bring Order to Your Photos: Event-Driven Classification of Flickr Images Based on Social Knowledge Date: 2011/11/21 Source: Claudiu S. Firan (CIKM’10)
Query Dependent Pseudo-Relevance Feedback based on Wikipedia SIGIR ‘09 Advisor: Dr. Koh Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/01/24 1.
Searchable Web sites Recommendation Date : 2012/2/20 Source : WSDM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh Jia-ling 1.
GENERATING AUTOMATIC SEMANTIC ANNOTATIONS FOR RESEARCH DATASETS AYUSH SINGHAL AND JAIDEEP SRIVASTAVA CS DEPT., UNIVERSITY OF MINNESOTA, MN, USA.
Mining Query Subtopics from Search Log Data Date : 2012/12/06 Resource : SIGIR’12 Advisor : Dr. Jia-Ling Koh Speaker : I-Chih Chiu.
Syntactic And Sub-lexical Features For Turkish Discriminative Language Models ICASSP 2010 Ebru Arısoy, Murat Sarac¸lar, Brian Roark, Izhak Shafran Bang-Xuan.
Explorations in Tag Suggestion and Query Expansion Jian Wang and Brian D. Davison Lehigh University, USA SSM 2008 (Workshop on Search in Social Media)
布林代數的應用--- 全及項(最小項)和全或項(最大項)展開式
1.1 線性方程式系統簡介 1.2 高斯消去法與高斯-喬登消去法 1.3 線性方程式系統的應用(-Skip-)
: The Playboy Chimp ★★☆☆☆ 題組: Problem Set Archive with Online Judge 題號: 10611: The Playboy Chimp 解題者:蔡昇宇 解題日期: 2010 年 2 月 28 日 題意:給一已排序的數列 S( 升冪.
PE 2 文書編輯 張基昇.
Modern Information Retrieval 第三組 陳國富 王俊傑 夏希璿.
按一下以編輯母片標題樣式 按一下以編輯母片 第二層 第三層 第四層 第五層 1 按一下以編輯母片標題樣式 按一下以編輯母片 第二層 第三層 第四層 第五層 1 Problem E: Jolly Jumpers A sequence of n > 0 integers is called a jolly.
最新計算機概論 第 5 章 系統程式. 5-1 系統程式的類型 作業系統 (OS) : 介於電腦硬體與 應用軟體之間的 程式,除了提供 執行應用軟體的 環境,還負責分 配系統資源。
7.1 背景介紹 7.2 多解析度擴展 7.3 一維小波轉換 7.4 快速小波轉換 7.5 二維小波轉換 7.6 小波封包
: Ahoy, Pirates! ★★★★☆ 題組: Contest Archive with Online Judge 題號: 11402: Ahoy, Pirates! 解題者:李重儀 解題日期: 2008 年 8 月 26 日 題意:有一個海盜島有 N 個海盜,他們的編號 (id)
: Multisets and Sequences ★★★★☆ 題組: Problem Set Archive with Online Judge 題號: 11023: Multisets and Sequences 解題者:葉貫中 解題日期: 2007 年 4 月 24 日 題意:在這個題目中,我們要定義.
: A-Sequence ★★★☆☆ 題組: Problem Set Archive with Online Judge 題號: 10930: A-Sequence 解題者:陳盈村 解題日期: 2008 年 5 月 30 日 題意: A-Sequence 需符合以下的條件, 1 ≤ a.
: Lucky Number ★★★★☆ 題組: Proble Set Archive with Online Judge 題號: 10909: Lucky Number 解題者:李育賢 解題日期: 2008 年 4 月 25 日 題意:給一個奇數數列 1,3,5,7,9,11,13,15…
資料結構實習-二.
845: Gas Station Numbers ★★★ 題組: Problem Set Archive with Online Judge 題號: 845: Gas Station Numbers. 解題者:張維珊 解題日期: 2006 年 2 月 題意: 將輸入的數字,經過重新排列組合或旋轉數字,得到比原先的數字大,
Chapter 10 m-way 搜尋樹與B-Tree
本章重點 2-1 有序串列(Ordered List) 2-2 介紹陣列(array) 2-3 矩陣(matrix)的應用
1 Short Vectors of Planar Lattices Via Continued Fraction Friedrich Eisenbrand Information Processing Letters, 田錦燕 95/05/5.
電腦的基本單位 類比訊號 (analog signal) 指的是連續的訊號 數位訊號 (digital signal) 指的是以預先定義的符號表示不連續 的訊號 one bit 8 bits=one byte 電腦裡的所有資料,包括文 字、數據、影像、音訊、視 訊,都是用二進位來表示的。
冷凍空調自動控制 - 系統性能分析 李達生. Focusing here … 概論 自動控制理論發展 自控系統設計實例 Laplace Transform 冷凍空調自動控制 控制系統範例 控制元件作動原理 控制系統除錯 自動控制理論 系統穩定度分析 系統性能分析 PID Controller 自動控制實務.
電腦的基本單位 類比訊號 (analog signal) 指的是連續的訊號
蔡中皓 郭尚豪 紀羽軒 1. Outline Background Motive and purpose Method Conclusion 2.
Mapping Techniques and Visualization of Statistical Indicators Haitham Zeidan Palestinian Central Bureau of Statistics IAOS 2014 Conference.
-Artificial Neural Network- Matlab操作介紹 -以類神經網路BPN Model為例
Mapping - 1 Mapping From ER Model to Relational DB.
◎Series Solutions of ODEs. 微分方程式之級數解 ◎Special Functions 特殊方程式
Managing Large RDF Graphs (Infinite Graph) Vaibhav Khadilkar Department of Computer Science, The University of Texas at Dallas FEARLESS engineering.
Web Usage Mining with Semantic Analysis Date: 2013/12/18 Author: Laura Hollink, Peter Mika, Roi Blanco Source: WWW’13 Advisor: Jia-Ling Koh Speaker: Pei-Hao.
Tag Clouds Revisited Date : 2011/12/12 Source : CIKM’11 Speaker : I- Chih Chiu Advisor : Dr. Koh. Jia-ling 1.
An Integrated Approach to Extracting Ontological Structures from Folksonomies Huairen Lin, Joseph Davis, Ying Zhou ESWC 2009 Hyewon Lim October 9 th, 2009.
RuleML-2007, Orlando, Florida1 Towards Knowledge Extraction from Weblogs and Rule-based Semantic Querying Xi Bai, Jigui Sun, Haiyan Che, Jin.
Date: 2012/10/18 Author: Makoto P. Kato, Tetsuya Sakai, Katsumi Tanaka Source: World Wide Web conference (WWW "12) Advisor: Jia-ling, Koh Speaker: Jiun.
Beyond Co-occurrence: Discovering and Visualizing Tag Relationships from Geo-spatial and Temporal Similarities Date : 2012/8/6 Resource : WSDM’12 Advisor.
PERSONALIZED SEARCH Ram Nithin Baalay. Personalized Search? Search Engine: A Vital Need Next level of Intelligent Information Retrieval. Retrieval of.
UOS 1 Ontology Based Personalized Search Zhang Tao The University of Seoul.
Exploring Online Social Activities for Adaptive Search Personalization CIKM’10 Advisor : Jia Ling, Koh Speaker : SHENG HONG, CHUNG.
April 14, 2003Hang Cui, Ji-Rong Wen and Tat- Seng Chua 1 Hierarchical Indexing and Flexible Element Retrieval for Structured Document Hang Cui School of.
TOPIC CENTRIC QUERY ROUTING Research Methods (CS689) 11/21/00 By Anupam Khanal.
Automatic Image Annotation by Using Concept-Sensitive Salient Objects for Image Content Representation Jianping Fan, Yuli Gao, Hangzai Luo, Guangyou Xu.
Interoperable Visualization Framework towards enhancing mapping and integration of official statistics Haitham Zeidan Palestinian Central.
1 Automatic Classification of Bookmarked Web Pages Chris Staff Second Talk February 2007.
You Are What You Tag Yi-Ching Huang and Chia-Chuan Hung and Jane Yung-jen Hsu Department of Computer Science and Information Engineering Graduate Institute.
Binxing Jiao et. al (SIGIR ’10) Presenter : Lin, Yi-Jhen Advisor: Dr. Koh. Jia-ling Date: 2011/4/25 VISUAL SUMMARIZATION OF WEB PAGES.
Feature Detection in Ajax-enabled Web Applications Natalia Negara Nikolaos Tsantalis Eleni Stroulia 1 17th European Conference on Software Maintenance.
BioSnowball: Automated Population of Wikis (KDD ‘10) Advisor: Dr. Koh, Jia-Ling Speaker: Lin, Yi-Jhen Date: 2010/11/30 1.
Enhancing Cluster Labeling Using Wikipedia David Carmel, Haggai Roitman, Naama Zwerdling IBM Research Lab (SIGIR’09) Date: 11/09/2009 Speaker: Cho, Chin.
資訊碩一 吳振華 An Extended Method of the Parametric Eigenspace Method by Automatic Background Elimination.
Web mining:a survey in the fuzzy framework
Web Information Retrieval Prof. Alessandro Agostini 1 Context in Web Search Steve Lawrence Speaker: Antonella Delmestri IEEE Data Engineering Bulletin.
A Classification-based Approach to Question Answering in Discussion Boards Liangjie Hong, Brian D. Davison Lehigh University (SIGIR ’ 09) Speaker: Cho,
Date: 2012/08/21 Source: Zhong Zeng, Zhifeng Bao, Tok Wang Ling, Mong Li Lee (KEYS’12) Speaker: Er-Gang Liu Advisor: Dr. Jia-ling Koh 1.
1 Adaptive Subjective Triggers for Opinionated Document Retrieval (WSDM 09’) Kazuhiro Seki, Kuniaki Uehara Date: 11/02/09 Speaker: Hsu, Yu-Wen Advisor:
Multi-Aspect Query Summarization by Composite Query Date: 2013/03/11 Author: Wei Song, Qing Yu, Zhiheng Xu, Ting Liu, Sheng Li, Ji-Rong Wen Source: SIGIR.
MMM2005The Chinese University of Hong Kong MMM2005 The Chinese University of Hong Kong 1 Video Summarization Using Mutual Reinforcement Principle and Shot.
Fire detection based on vision sensor and support vector machines Adviser: Li Yu-Chiang Speaker: Wu Wei-Cheng Date: 2009/03/10 Fire Safety Journal, Volume.
Instance Discovery and Schema Matching With Applications to Biological Deep Web Data Integration Tantan Liu, Fan Wang, Gagan Agrawal {liut, wangfa,
CiteData: A New Multi-Faceted Dataset for Evaluating Personalized Search Performance CIKM’10 Advisor : Jia-Ling, Koh Speaker : Po-Hsien, Shih.
哈工大信息检索研究室 HITIR ’ s Update Summary at TAC2008 Extractive Content Selection Using Evolutionary Manifold-ranking and Spectral Clustering Reporter: Ph.d.
University Of Seoul Ubiquitous Sensor Network Lab Query Dependent Pseudo-Relevance Feedback based on Wikipedia 전자전기컴퓨터공학 부 USN 연구실 G
Connecting the Dots Between News Article
Presentation transcript:

A Semantic Clustering-based Approach For Searching And Browsing Tag Spaces Date: 2011/10/17 Source:Damir Vandic et. al (SAC’11) Speaker:Chiang,guang-ting Advisor: Dr. Koh. Jia-ling 1

Index Introduction Framework design Implementation Experiment Conclusion 2

Introduction Today’s Web offers many services that enable users to label content on the Web by means of tags. Even though tags are a flexible way of categorizing data, they have their limitations. Tags are prone to typographical errors or syntactic variations due to the amount of freedom users have, e,q, ”waterfal” and “waterfall”. 3

4

Introduction Motivation: Many of the existing cloud tagging systems are unable to cope with the syntactic and semantic tag variations during user search and browse activities. Goal: Propose the Semantic Tag Clustering Search, a framework able to cope with these needs. 5

6

Framework design 7

1. Clean data set 2. Syntatic variations 3. Semantic clustering 4. Searching tag spaces 8

Input data 9 Framework design D={User, Tags, Pic} apple { Mac, apple, iphone, iPod } t1 t2 t3 t4 t5 t6 t7t8 t9 Jack123 website t1 Base on Flickr

Clean data set Some pictures have many unusable tags due to the freedom of the users in setting picture tags. Apply a sequence of filters that remove tags with “unrecognizable” signs, tags which are complete sentences. 10 Framework design

Syntatic variations 11 Framework design

12 P1{apple, fruit, food} P2{apple, apples, fruit, food} P3{apples, fruit} P4{apples, food} P5{apples, food} P6{food} P7{fruit, food} Base on “ Co-occurance ” 1*+083*0.35 =0.83

Semantic clustering 13 Framework design

Semantic clustering 14 Framework design C1 P1{apple, fruit} P5{apples, fruit, food} C2 P2{apples, food} P4{apples,fruit. food}

Semantic clustering 15 C1 t1{a, b} t3{a, b, c} C2 t2{a, b, c, e} t4{a, b, c}

Searching tag spaces 16 Framework design

Searching tag spaces Feature: 1. Automatic replacement of syntactic variations by their corresponding labels. 2. The ability to detect contexts. If a tag can have multiple meanings, the search engine asks the user to choose a cluster to indicate the sense that was actually meant. 17

Implementation The STCS framework has been implemented in a Javabased Web application i.e., The application uses a subset from the Flickr database. Clean data set: 18 Raw data Users57,009 Pictures166,544 tags317,657 Cleaned data Users50,986 Pictures147,132 tags27,401

Implementation Auto-completion 19

Implementation Syntatic variation detection 20

Implementation Context selection 21

Implementation Context for different selection 22

Experiment 1. Syntatic variations 2. Semantic clustering 3. Searching tag spaces 23

Syntatic variations 24 Experiment

Semantic clustering 100 randomly chosen clusters. Our analysis three thresholds. After generating 100 random clusters, obtain 458 tags. Misplaced tags: 44 misplaced tags and thus the error rate is 9.6%. 25 Experiment Determines whether or not a tag is added to a cluster during the initial cluster creation. Defines the minimum average cosine similarity when merging two sets of which the smaller set has elements that the larger set does not contain. As parameters for the function that defines the dynamic threshold.

Searching tag spaces Compare the cluster-driven search engines”NHC”, “NHC STCS”. This comparison is based on the precision of the first 24 results of an arbitrary query In this paper finds more contexts than the original approach. 26 Experiment NHC % NHC STCS %

Conclusion Proposed the Semantic Tag Clustering Search (STCS) framework for building and utilizing semantic clusters from a social tagging system. The framework has three core tasks: removing syntactic variations, creating semantic clusters, and utilizing obtained clusters to improve search and exploration of tag spaces. Proposed a measure based on the normalized Levenshtein value, combined with the cosine value. With respect to a traditional search engine, searching tag spaces using STCS retrieves more relevant results and achieves a higher precision. 27

Thx for your listening ….. 28

SUPPLEMENT 29

Levenshtein distance 又稱 Edit distance. 其定義是一單字, 集合, 序列轉換成另一組所需 的最少編輯次數。 編輯的操作可分為三種: 取代:將一個字元取代為另外一個字元。 插入:在序列中插入一個字元。 刪除:刪除序列中的一個字元。 Ex: Levenshtein distance between "kitten" and "sitting" is 3 kitten → sitten (substitution of 's' for 'k') sitten → sittin (substitution of 'i' for 'e') sittin → sitting (insertion of 'g' at the end). 30

Cosine similarity 31