
1 One Table Stores All: Enabling Painless Free-and-Easy Data Publishing and Sharing
Bei Yu 1, Guoliang Li 2, Beng Chin Ooi 1, Li-zhu Zhou 2
1 National University of Singapore
2 Tsinghua University

2 Folksonomy (folk + taxonomy)
- Examples: Delicious, Flickr, Google Base, YouTube
- An Internet-based information-sharing methodology
- Users collaboratively publish information resources (e.g., webpages, photos) using self-defined metadata
- Users' collaborative behavior decides the data semantics
- The system categorizes information resources based on user-defined metadata to facilitate searching, browsing, etc.

3 Our Attempt
- Devise a general system framework for supporting folksonomy-based data sharing
- Allow a rich and flexible structure for the metadata (called data units) that describes published resources
- Categorize data units
- Efficiently store all data units
- Provide browsing and querying services

4 Data Units
The metadata for a published resource, called a data unit, consists of a user-created title, fields (attribute-value pairs), and tags.
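As a rough illustration of the structure described on this slide, a data unit could be held in memory as follows. This is only a sketch: the class name `DataUnit`, its field names, and the sample values are assumptions made for the example, not names from the presentation.

```python
from dataclasses import dataclass, field


@dataclass
class DataUnit:
    """Hypothetical in-memory form of one data unit (illustrative names)."""
    title: str
    fields: dict = field(default_factory=dict)  # attribute name -> value
    tags: set = field(default_factory=set)      # free-form user tags

    def attributes(self) -> set:
        # Attribute names only, i.e. the keys of the fields mapping.
        return set(self.fields)


# Sample values are invented for the example.
photo = DataUnit(
    title="Sunset at Marina Bay",
    fields={"camera": "Nikon D80", "location": "Singapore"},
    tags={"sunset", "travel"},
)
```

Because the vocabulary is uncontrolled, both the attribute names and the tags are left as open-ended strings rather than a fixed schema.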

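5 Data Model
- A generic relational table for storing all data units
- A set of virtual relations (VRs), defined as views over the generic table, that serve as the querying interface
[The example figures on this slide (the generic table and the views VR1 and VR2) are not preserved in the transcript.]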
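The relationship between the generic table and a virtual relation can be sketched with an in-memory SQLite database. Everything concrete here is an assumption for illustration: the column names, the sample rows, and in particular the `vr` column, which records a single VR per data unit as a simplification (the slides allow a unit to be assigned to more than one VR).

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# One wide, sparse table holds every data unit; unused attributes stay NULL.
conn.execute(
    "CREATE TABLE generic (id INTEGER PRIMARY KEY, title TEXT, "
    "camera TEXT, location TEXT, brand TEXT, price TEXT, vr TEXT)"
)
rows = [
    (1, "Sunset at Marina Bay", "Nikon D80", "Singapore", None, None, "photos"),
    (2, "Used laptop", None, None, "Lenovo", "400", "products"),
    (3, "Great Wall", "Canon 400D", "Beijing", None, None, "photos"),
]
conn.executemany("INSERT INTO generic VALUES (?, ?, ?, ?, ?, ?, ?)", rows)

# A virtual relation is a view: it selects the member tuples and projects
# only the attributes that are meaningful for that cluster.
conn.execute(
    "CREATE VIEW vr_photos AS "
    "SELECT id, title, camera, location FROM generic WHERE vr = 'photos'"
)

photo_titles = [r[0] for r in conn.execute("SELECT title FROM vr_photos ORDER BY id")]
```

Queries are then written against `vr_photos` as if it were an ordinary narrow table, while storage stays in the single generic table.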

6 System Framework
[The architecture diagram is not preserved in the transcript; the only surviving label is "queries".]

7 Data Units Categorizer
- Constructs and maintains VRs dynamically as data units are constantly published
- Clustering is based on attributes and tags: VR ≡ cluster of data units with similar topics
- Needs an online, one-pass clustering model:
  - Accept a data unit u and extract its attributes and tags
  - Compare u with the existing VRs and assign it to those that match
  - If no VR is suitable for u, create a new VR with u as its only tuple
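The one-pass procedure above can be sketched as follows. The slides do not say how a "match" is decided, so the Jaccard similarity and the `threshold` value are assumptions, as are the function names and the sample stream.

```python
def jaccard(a, b):
    """Jaccard similarity of two sets (assumed match measure)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def assign(uid, features, vrs, threshold=0.3):
    """Assign one data unit to every matching VR, or open a new VR for it.

    `features` is the set of the unit's attribute names and tags.
    Returns the indexes of the VRs that received the unit.
    """
    matched = [i for i, vr in enumerate(vrs)
               if jaccard(features, vr["features"]) >= threshold]
    if not matched:
        # No suitable VR: create one with this unit as its only tuple.
        vrs.append({"features": set(features), "members": [uid]})
        return [len(vrs) - 1]
    for i in matched:
        vrs[i]["members"].append(uid)
        vrs[i]["features"] |= features
    return matched


vrs = []
stream = [
    ("u1", {"camera", "location", "sunset"}),
    ("u2", {"brand", "price", "laptop"}),
    ("u3", {"camera", "location", "travel"}),
]
# Each unit is seen exactly once, in arrival order.
assignments = {uid: assign(uid, feats, vrs) for uid, feats in stream}
```

Here `u3` joins the VR opened by `u1` because they share two of four distinct features, while `u2` has nothing in common with it and starts its own VR.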

8 Challenges for Categorizing
- Uncontrolled vocabulary for both attributes and tags
- A large portion of them is "noise": used very infrequently
- The number of unique attributes and tags keeps growing
- Problems with synonyms, polysemy, etc.

9 Our Current Approach
- Characterize each VR with a set of popular attributes (PAS) and a set of popular tags (PTS) that represent its dominating features
- Compare new data units against PAS and PTS to limit the effect of "noise"
- Maintain PAS and PTS as each new data unit is assigned
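One way to maintain PAS and PTS incrementally is with frequency counters, as in the sketch below. The slides do not define "popular" or the matching rule, so the `min_support` fraction, the `threshold`, and the class and method names are all assumptions.

```python
from collections import Counter


class VirtualRelation:
    """Hypothetical VR that tracks popular attributes (PAS) and tags (PTS)."""

    def __init__(self, min_support=0.5):
        self.size = 0
        self.attr_counts = Counter()
        self.tag_counts = Counter()
        self.min_support = min_support  # assumed popularity cut-off

    def add(self, attributes, tags):
        # Updating the counters is all the maintenance an assignment needs.
        self.size += 1
        self.attr_counts.update(attributes)
        self.tag_counts.update(tags)

    def _popular(self, counts):
        return {k for k, c in counts.items() if c / self.size >= self.min_support}

    @property
    def pas(self):
        return self._popular(self.attr_counts)

    @property
    def pts(self):
        return self._popular(self.tag_counts)

    def matches(self, attributes, tags, threshold=0.5):
        # Compare only against popular features, so rare "noise" is ignored.
        total = len(self.pas) + len(self.pts)
        if total == 0:
            return False
        overlap = len(self.pas & set(attributes)) + len(self.pts & set(tags))
        return overlap / total >= threshold


vr = VirtualRelation()
vr.add({"camera", "location"}, {"sunset"})
vr.add({"camera", "location"}, {"travel"})
vr.add({"camera", "iso"}, {"travel"})
```

After these three units, `iso` and `sunset` each appear in only one of three members, so they fall below the assumed cut-off and do not influence later comparisons.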

10 Storage Manager
Function
- Store and index the generic table (very sparse)
- Maintain mappings with VRs
Challenges
- Space efficiency
- Scalability over the number of attributes and the data volume
- Efficiency for both retrieval and update

11 Storage with Sparse Table
- Store only the non-null values of each tuple
- Build an inverted index over attributes for processing attribute-based queries
- Build an inverted index over keywords for processing keyword queries
- Other approaches? Bitmap index?
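The storage idea on this slide can be sketched in memory as below. The class `SparseStore`, its method names, and the whitespace tokenization of values into keywords are assumptions made for the example; the slides do not describe the physical layout.

```python
from collections import defaultdict


class SparseStore:
    """Hypothetical sparse-table store with two inverted indexes."""

    def __init__(self):
        self.rows = {}                         # tuple id -> {attribute: value}
        self.attr_index = defaultdict(set)     # attribute -> tuple ids
        self.keyword_index = defaultdict(set)  # keyword -> tuple ids

    def insert(self, tid, fields):
        # Keep only the non-null values of the tuple.
        row = {a: v for a, v in fields.items() if v is not None}
        self.rows[tid] = row
        for attr, value in row.items():
            self.attr_index[attr].add(tid)
            for word in str(value).lower().split():
                self.keyword_index[word].add(tid)

    def with_attribute(self, attr):
        return self.attr_index.get(attr, set())

    def with_keyword(self, word):
        return self.keyword_index.get(word.lower(), set())


store = SparseStore()
store.insert(1, {"camera": "Nikon D80", "location": "Singapore", "price": None})
store.insert(2, {"brand": "Lenovo", "price": "400"})
store.insert(3, {"camera": "Canon 400D", "location": "Beijing"})
```

Each tuple costs space only for the attributes it actually uses, and both query types become set lookups instead of scans over a mostly empty table.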

12 Browsing and Query Processing
- VRs are ordered by popularity for browsing
- VRs may be presented in different views, e.g., based on attributes or based on tags
- Both keyword queries and structured queries are supported
  - Inverted index
  - Effective ranking
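A minimal sketch of the three services named on this slide follows. The slides give no ranking function or popularity measure, so ranking by the number of matched query terms and ordering VRs by member count are assumptions, as are the sample data and function names.

```python
from collections import Counter, defaultdict

# Sample data units, invented for the example.
units = {
    1: {"title": "Sunset at Marina Bay", "location": "Singapore"},
    2: {"title": "Marina Bay hotel deal", "price": "200"},
    3: {"title": "Great Wall at sunset", "location": "Beijing"},
}

# Inverted index from keyword to the ids of units containing it.
index = defaultdict(set)
for uid, fields in units.items():
    for value in fields.values():
        for word in value.lower().split():
            index[word].add(uid)


def keyword_query(text):
    """Rank units by how many distinct query terms they contain (assumed)."""
    hits = Counter()
    for term in set(text.lower().split()):
        for uid in index.get(term, ()):
            hits[uid] += 1
    return [uid for uid, _ in sorted(hits.items(), key=lambda kv: (-kv[1], kv[0]))]


def structured_query(attr, value):
    """Exact-match selection on one attribute."""
    return sorted(uid for uid, f in units.items() if f.get(attr) == value)


# Browsing order: VRs sorted by member count (assumed popularity measure).
vr_sizes = {"deals": 1, "photos": 2}
browse_order = sorted(vr_sizes, key=vr_sizes.get, reverse=True)
```

For the query "marina sunset", unit 1 contains both terms and ranks first, followed by units 2 and 3 with one term each.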

13 Conclusion
- We have presented the design of a folksonomy-based data sharing system
- We devised a generic-table data model for representing and storing the data units
Future work
- Port the system to P2P networks