Searching and Browsing Using Tags
Nikos Sarkas
Social Information Systems Seminar, DCS, University of Toronto, Winter 2007



Social Resource Sharing
- The del.icio.us paradigm: users store links to web pages of interest on a server, along with arbitrary, user-specified tags.
- The model is independent of the resource being shared:
  - Music (Last.fm)
  - Photos (Flickr)
  - Publications (CiteULike)
  - …

Part I: Searching

Ranking Web Search Results
Two prevalent models:
- Ranking based on query-document similarity
  - TF/IDF
  - Metadata extraction
  - Link analysis
- Query-independent static ranking
  - PageRank
  - “Quality” based

Similarity Ranking, Take I
- Query: $q = \{q_1, q_2, \dots, q_n\}$.
- Tags of URL $p$: $T(p) = \{t_1, t_2, \dots, t_m\}$.
- Define similarity as $|q \cap T(p)| / |T(p)|$.
- Problems
  - Synonymy (according to the authors)
  - Others?
- Synonymy example: Linux, Ubuntu and Gnome
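The simple measure above can be sketched in a few lines of Python. This is only an illustration: the function name `simple_tag_similarity` and the case-insensitive matching are assumptions, not something the slides specify.

```python
def simple_tag_similarity(query_terms, url_tags):
    """Fraction of a URL's tags that also appear in the query: |q ∩ T(p)| / |T(p)|."""
    # Lowercasing is an assumption made for this sketch; the slides do not
    # say how terms and tags are normalized.
    query = {term.lower() for term in query_terms}
    tags = {tag.lower() for tag in url_tags}
    if not tags:
        return 0.0  # an untagged URL cannot match anything
    return len(query & tags) / len(tags)


# A URL tagged with the query term itself gets a non-zero score ...
score_exact = simple_tag_similarity(["linux"], ["linux", "opensource"])
# ... while a URL tagged only with related terms scores zero.
score_synonym = simple_tag_similarity(["linux"], ["ubuntu", "gnome"])
```

The second call shows the synonymy problem from the slide: a page tagged only "ubuntu" and "gnome" is invisible to the query "linux" under this measure.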

Similarity Ranking, Take II
- Use tags with “similar” meaning to enrich the query.
- Create 3 matrices:
  - $M_{TP}$, the tag-URL count matrix
  - $S_T$, the tag-tag similarity matrix
  - $S_P$, the URL-URL similarity matrix
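As a rough illustration of the first of these matrices, the sketch below builds a tag-URL count matrix from raw annotations. The `(user, tag, url)` triple format, the function name and the sample data are assumptions made for the example; the slides do not describe the input format.

```python
from collections import Counter


def build_tag_url_counts(annotations):
    """Count how many times each tag was attached to each URL.

    `annotations` is an iterable of (user, tag, url) triples (assumed format).
    Returns the sorted tag list, the sorted URL list, and a list-of-rows
    matrix with one row per tag and one column per URL.
    """
    counts = Counter()
    for _user, tag, url in annotations:
        counts[(tag, url)] += 1
    tags = sorted({tag for tag, _ in counts})
    urls = sorted({url for _, url in counts})
    matrix = [[counts[(tag, url)] for url in urls] for tag in tags]
    return tags, urls, matrix


# Hypothetical sample data, for illustration only.
annotations = [
    ("alice", "linux", "ubuntu.com"),
    ("bob", "linux", "ubuntu.com"),
    ("bob", "ubuntu", "ubuntu.com"),
    ("carol", "linux", "kernel.org"),
]
tags, urls, m_tp = build_tag_url_counts(annotations)
```

Here row `i` of `m_tp` holds the counts for `tags[i]` and column `j` corresponds to `urls[j]`.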

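Similarity Ranking, Take II
- Iterate: [equation not preserved in the transcript]
- Similarly update $S_P$, until convergence.
- Then, the similarity between a query $q$ and a URL $p$ is: [equation not preserved in the transcript]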

Social PageRank
- “Popular web pages are tagged by many up-to-date users, using hot tags.”
- Transfer popularity between entities.
- Define matrices $M_{PU}$, $M_{UT}$, $M_{TP}$.
- Iterate: [equation not preserved in the transcript]
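The transcript does not preserve the update equations, so the following is only one simple reading of "transfer popularity between entities", not necessarily the authors' exact iteration. It assumes that $M_{PU}$ is pages × users, $M_{UT}$ is users × tags and $M_{TP}$ is tags × pages, that popularity flows pages → users → tags → pages, and that each vector is normalized to sum to 1. All function names and the sample matrices are hypothetical.

```python
def transpose_times(matrix, vector):
    """Return matrix^T · vector for a list-of-rows matrix."""
    cols = len(matrix[0])
    return [
        sum(matrix[r][c] * vector[r] for r in range(len(matrix)))
        for c in range(cols)
    ]


def normalize(vector):
    """Scale a non-negative vector so that its entries sum to 1."""
    total = sum(vector)
    return [v / total for v in vector] if total else vector


def social_pagerank_sketch(m_pu, m_ut, m_tp, iterations=50, tol=1e-9):
    """Pass popularity pages -> users -> tags -> pages until it stabilizes."""
    n_pages = len(m_pu)
    pages = [1.0 / n_pages] * n_pages  # start from uniform popularity
    for _ in range(iterations):
        users = normalize(transpose_times(m_pu, pages))    # pages -> users
        tags = normalize(transpose_times(m_ut, users))     # users -> tags
        new_pages = normalize(transpose_times(m_tp, tags)) # tags -> pages
        converged = max(abs(a - b) for a, b in zip(new_pages, pages)) < tol
        pages = new_pages
        if converged:
            break
    return pages


# Hypothetical toy data: 2 pages, 2 users, 2 tags.
m_pu = [[2, 1], [0, 1]]  # page x user annotation counts
m_ut = [[2, 0], [1, 1]]  # user x tag usage counts
m_tp = [[3, 0], [0, 1]]  # tag x page annotation counts
scores = social_pagerank_sketch(m_pu, m_ut, m_tp)
```

In this toy example the first page, which is annotated more often and with the more heavily used tag, ends up with the higher score.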

Putting It All Together
- Train a ranking function (RankSVM) using the following features:
  - BM25 similarity between query and URL content
  - Simple query-URL tags similarity measure
  - Complex query-URL tags similarity measure
  - PageRank
  - Social PageRank
- Results
  - Precision, NDCG at k
  - Small improvement over BM25, up to 25% for NDCG and synthetic queries
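For readers unfamiliar with the two evaluation measures, here is a minimal sketch of precision at k and NDCG at k. It assumes graded relevance judgments listed in ranked order and uses the common gain `2^rel - 1` with a `log2` discount; the slides do not say which NDCG variant was used, and the function names are illustrative.

```python
import math


def precision_at_k(relevance, k):
    """Fraction of the top-k results that are relevant (relevance > 0)."""
    return sum(1 for r in relevance[:k] if r > 0) / k


def dcg_at_k(relevance, k):
    """Discounted cumulative gain with gain 2^rel - 1 and a log2 discount."""
    return sum(
        (2 ** r - 1) / math.log2(i + 2)
        for i, r in enumerate(relevance[:k])
    )


def ndcg_at_k(relevance, k):
    """DCG divided by the DCG of the ideal (descending) ordering."""
    # Simplification: the ideal ordering is built from the same list,
    # i.e. only the retrieved documents are considered.
    ideal = dcg_at_k(sorted(relevance, reverse=True), k)
    return dcg_at_k(relevance, k) / ideal if ideal > 0 else 0.0


# Hypothetical relevance grades of a ranked result list (3 = best, 0 = irrelevant).
ranked = [3, 2, 0, 1]
p_at_3 = precision_at_k(ranked, 3)
ndcg_at_4 = ndcg_at_k(ranked, 4)
```

A ranking that is already in ideal order scores an NDCG of 1; misplacing a relevant result lower in the list pulls the value below 1.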

Part II: Browsing

Tag Assisted Browsing
- Currently two methods for tag-driven browsing:
  - Keyword search
  - Clouds of popular tags
- We would like to support:
  - Semantic browsing: also present URLs annotated with similar tags
  - Hierarchical browsing: browse in a top-down fashion

Semantic Browsing
- Define similarity between tags: [equation not preserved in the transcript]
- Synonymic tags: similarity above a threshold.
- The synonymic tags and the tag itself define its semantic concept.
- Given that the user has selected $L$ tags that define semantic concepts $S_c = \{C_1, \dots, C_L\}$, the related URLs are: [equation not preserved in the transcript]
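Since the transcript omits both formulas, the sketch below fills them with loudly labeled assumptions: tag similarity is taken to be the cosine between the tags' URL-count vectors, and the related URLs are taken to be those carrying at least one tag from every selected concept. Neither choice is confirmed by the slides, and all names and sample data are hypothetical.

```python
import math


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(u[k] * v.get(k, 0) for k in u)
    norm = (math.sqrt(sum(x * x for x in u.values()))
            * math.sqrt(sum(x * x for x in v.values())))
    return dot / norm if norm else 0.0


def semantic_concept(tag, tag_vectors, threshold):
    """The tag itself plus every tag whose similarity exceeds the threshold."""
    return {
        other for other, vec in tag_vectors.items()
        if other == tag or cosine(tag_vectors[tag], vec) > threshold
    }


def related_urls(selected_tags, tag_vectors, threshold):
    """URLs annotated with at least one tag from every selected concept.

    Intersecting across concepts is an assumption of this sketch.
    """
    result = None
    for tag in selected_tags:
        concept = semantic_concept(tag, tag_vectors, threshold)
        urls = {url for t in concept for url in tag_vectors[t]}
        result = urls if result is None else result & urls
    return result or set()


# Hypothetical tag -> {url: count} data.
tag_vectors = {
    "linux": {"ubuntu.com": 2, "kernel.org": 3},
    "ubuntu": {"ubuntu.com": 4, "kernel.org": 1},
    "photo": {"flickr.com": 5},
}
concept = semantic_concept("linux", tag_vectors, 0.5)
urls_for_linux = related_urls(["linux"], tag_vectors, 0.5)
urls_for_both = related_urls(["linux", "photo"], tag_vectors, 0.5)
```

With this toy data, "ubuntu" joins the concept of "linux" because the two tags co-occur on the same URLs, while "photo" stays in a concept of its own.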

Hierarchical Browsing
- Observations
  - No neat tree structure
  - Multiple ways to target resource URLs associated with different categories
  - Dynamic structure: leaves can become inner nodes

Hierarchical Browsing
- Generating sub-tags
  - Train a classifier to identify which of the tags in the semantic concept are sub-tags
  - Features used: ratio of tag counts, intersection size, etc.
- Clustering sub-tags
  - Ranks tags based on a complex formula
  - Greedy clustering technique
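The two named features can be illustrated as follows. The slides do not give exact definitions, so this sketch assumes that "ratio of tag counts" compares total annotation counts of the candidate and the parent tag, and that "intersection size" is the number of URLs both tags annotate. The classifier itself and the remaining features ("etc.") are not shown, and all names are hypothetical.

```python
def subtag_features(parent_urls, candidate_urls):
    """Features for deciding whether `candidate` is a sub-tag of `parent`.

    Both arguments map URL -> number of times the tag was applied to it
    (assumed input format).
    """
    parent_total = sum(parent_urls.values())
    candidate_total = sum(candidate_urls.values())
    shared = set(parent_urls) & set(candidate_urls)
    return {
        # How often the candidate is used relative to the parent tag.
        "count_ratio": candidate_total / parent_total if parent_total else 0.0,
        # How many URLs carry both tags.
        "intersection_size": len(shared),
    }


# Hypothetical counts for a broad tag and a narrower candidate.
parent = {"a.example": 5, "b.example": 3, "c.example": 2}
candidate = {"a.example": 2, "b.example": 1}
features = subtag_features(parent, candidate)
```

A feature dictionary like this could then be fed, one row per tag pair, to whatever classifier is trained to separate sub-tags from other related tags.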