Mining a Web 2.0 service for the discovery of semantically similar terms: A case study with Del.icio.us. Kwan Yi, School of Library and Information Science.

Presentation transcript:

Mining a Web 2.0 service for the discovery of semantically similar terms: A case study with Del.icio.us
Kwan Yi
School of Library and Information Science
College of Communications and Information Studies
University of Kentucky

Social bookmarking: Del.icio.us
Del.icio.us is one of the most popular social bookmarking systems. As of September 2007:
– 3 million registered users
– 100 million unique URLs bookmarked

Folksonomy
We define folksonomy as a collective set of tags (keywords or terms) assigned by participants in a social tagging system.
– User-created vocabulary
– Uncontrolled vocabulary
– Built in a collaborative manner

Example: A folksonomy in Delicious.com
[Screenshot with labeled parts: resource title, resource taggers, resource URL, tagging history, popular tags]

Objective of the Study
To examine an effective way of mining semantically similar terms from a folksonomy, in order to assess the feasibility of folksonomies as a data source of semantically similar terms

Proposed algorithms for mining similar terms from Folksonomy
– Co-occurrence-based similarity algorithm
– Correlation-based similarity algorithm

Experiment (I)
To identify similar terms for each of the 121 most popular tags on Del.icio.us, as posted on May 15, 2008

Result: How many similar terms for the 121 popular tags?
Co-occurrence-based algorithm
– 2.6 similar terms (Level of similarity = 0.9)
– 5.1 similar terms (Level of similarity = 0.7)
– 10.1 similar terms (Level of similarity = 0.5)
Correlation-based algorithm
– 0.9 similar terms (Level of similarity = 0.9)
– 1.6 similar terms (Level of similarity = 0.7)
– 2.6 similar terms (Level of similarity = 0.5)

Experiment (II)
To identify similar terms for each of the 32 tags (out of the 121) that are not listed in the online version of the Merriam-Webster Dictionary

Result: How many similar terms for the 32 not-in-the-dictionary tags?
Co-occurrence-based algorithm
– 3.3 similar terms (Level of similarity = 0.9)
– 5.9 similar terms (Level of similarity = 0.7)
– 10.1 similar terms (Level of similarity = 0.5)
Correlation-based algorithm
– 1 similar term (Level of similarity = 0.9)
– 1.7 similar terms (Level of similarity = 0.7)
– 2.4 similar terms (Level of similarity = 0.5)

Webdesign (similarity level: 0.9)
– Co-occurrence [12]: resources, css, web, design, reference, html, tutorial, tutorials, inspiration, gallery, development, webdev
– Correlation [4]: css, design, html, inspiration

Findings
– The correlation-based algorithm is more selective than the co-occurrence-based algorithm.
– The co-occurrence-based algorithm appears most attractive at a similarity level of 0.7.

Conclusion
As social bookmarking systems become more widely used, the potential of their folksonomies for this mining task will increase.

Thanks!

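Co-occurrence-based similarity algorithm (Identifying similar terms of the term W)
Tags, with their counts, on four resources tagged with W:
– W (100) A (50) B (20) C (10)
– W (87) B (57) C (40) A (30)
– W (1032) A (250) F (120) D (78)
– W (37) A (29) B (16) F (9)
Number of these four resources on which each tag co-occurs with W:
– A (4) B (3) C (2) F (2) D (1)
Resulting similarity levels (co-occurrences divided by the four resources):
– CoSA(s=1: A W)
– CoSA(s=0.75: B W)
– CoSA(s=0.5: C W)
– CoSA(s=0.5: F W)
– CoSA(s=0.25: D W)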
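
The worked example on this slide can be reproduced with a short Python sketch. It is an illustration under stated assumptions, not the study's implementation: the similarity level of a tag is taken to be the fraction of W-tagged resources on which that tag also appears (which matches the slide's numbers), a tag is assumed to count as "similar at level s" when that fraction is at least s, and the per-tag counts in parentheses are assumed not to enter the calculation. The function names `cosa_levels` and `similar_terms` are hypothetical.

```python
from collections import Counter

# Tag profiles of the four resources tagged with W, copied from the slide.
resources = [
    {"W": 100, "A": 50, "B": 20, "C": 10},
    {"W": 87, "B": 57, "C": 40, "A": 30},
    {"W": 1032, "A": 250, "F": 120, "D": 78},
    {"W": 37, "A": 29, "B": 16, "F": 9},
]


def cosa_levels(target, resources):
    """Map each co-occurring tag to the fraction of target-tagged
    resources on which it also appears (assumed reading of the slide)."""
    tagged = [r for r in resources if target in r]
    if not tagged:
        return {}
    counts = Counter(tag for r in tagged for tag in r if tag != target)
    return {tag: n / len(tagged) for tag, n in counts.items()}


def similar_terms(target, resources, level):
    """Tags whose level is at least `level` (assumed threshold rule)."""
    levels = cosa_levels(target, resources)
    return sorted(tag for tag, s in levels.items() if s >= level)


levels = cosa_levels("W", resources)
print(levels)
print(similar_terms("W", resources, 0.5))
```

Under these assumptions, lowering the level admits more terms, which is consistent with the larger term counts reported at lower similarity levels in the results.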

Correlation-based similarity algorithm
Term X is said to be similar to term W on the basis of the correlation-based algorithm: CrSA(s: X W)
CrSA(s: X W) holds only if both CoSA(s: X W) and CoSA(s: W X) are satisfied.
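
A minimal Python sketch of this two-way condition follows. It assumes that CoSA(s: X W) is satisfied when X appears on at least a fraction s of the resources tagged with W; the tag sets are hypothetical and are not data from the study, and the function names `cosa` and `crsa` are illustrative.

```python
def cosa(x, w, resources):
    """Fraction of the resources tagged with w that are also tagged with x."""
    tagged_w = [tags for tags in resources if w in tags]
    if not tagged_w:
        return 0.0
    return sum(1 for tags in tagged_w if x in tags) / len(tagged_w)


def crsa(x, w, resources, level):
    """True only if the co-occurrence condition holds in both directions."""
    return cosa(x, w, resources) >= level and cosa(w, x, resources) >= level


# Hypothetical tag sets, one per bookmarked resource (not from the study).
resources = [
    {"css", "design", "web"},
    {"css", "design"},
    {"css", "design", "web"},
    {"web", "news"},
    {"web", "blog"},
]

print(cosa("web", "css", resources))       # web on css-tagged resources
print(cosa("css", "web", resources))       # css on web-tagged resources
print(crsa("design", "css", resources, 0.9))
print(crsa("web", "css", resources, 0.6))
```

In this toy data, "web" passes the one-way test against "css" at level 0.6 but fails the reverse direction, so the two-way check rejects it. This illustrates why requiring both directions yields fewer similar terms than the co-occurrence-based algorithm alone.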