Semantically Enriching Folksonomies with FLOR


Semantically Enriching Folksonomies with FLOR. Sofia Angeletou, Marta Sabou and Enrico Motta

Semantic Web 2.0: “The combination of Semantic Web formal structures and Web 2.0 user-generated content can lead the Web to its full potential.” Semantically Enriching Folksonomies with FLOR

Web 2.0: easy upload; free tagging that requires minimal annotation effort; an open, dynamic and evolving vocabulary; all leading to a content-intensive Web. However...

Tagging systems’ characteristics: content retrieval mechanisms are limited to keyword-based search and tag-cloud navigation. Search may suffer from poor precision and recall due to the basic-level variation problem (whale vs. orca) and to syntactic inconsistencies (singular vs. plural; concatenated or misspelled tags).

An example query: “animal live water”, looking for photos of “animals which live in the water”. The results include photos of a dog, a bird, a tiger, a cat and a landscape: 5/24 ≈ 21% relevant. This is the first page of results from Flickr. These are the “most interesting” results as opposed to the most recent ones, so higher relevance to the query can be expected.

Some missed photos, tagged: whale, dolphin, dolphin, whale, dolphin, whale, sea elephant, seal, whale.

Modifying the query to “animal habitat water”, “animal sea” or “animal water” gives similar results. Moreover, it is not easy for the user to formulate the most effective query.

Our goal: improve content retrieval in folksonomies (enhance precision and recall in search, enable complex queries, support intelligent navigation) by applying a semantic layer on top of folksonomy tagspaces. [Figure: a semantic layer of ontology concepts (Animal, Mammal, Marine Mammal, Terrestrial Mammal, Dolphin, Seal, Sea Elephant, Whale, Tiger, Lion, Body of Water, Sea, Ocean) and the relation hasHabitat, placed above a tag cloud of the folksonomy (e.g., kitten, furry, pets, whale, dolphin, seal, ocean, zoo).]

Our goal, Step 1: semantically enriching folksonomies. [Figure: the folksonomy tag cloud linked to the concepts and relations (e.g., hasHabitat) of the semantic layer.]

Our goal, Step 2: querying folksonomies through the semantic layer. [Figure: a query mechanism operating on the semantic layer, which is linked to the folksonomy tag cloud.]

Expanded query: “Dolphin OR Seal OR Sea Elephant OR Whale”: 21/24 = 87.5% relevant.
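The effect of querying through a semantic layer can be sketched with a toy ontology. The concept hierarchy below is an assumption reconstructed from the example concepts on the slides, and `expand_query` is a hypothetical helper, not FLOR's query mechanism (listed as future work). It rewrites “animals that live in water” into an OR query over the matching leaf concepts.

```python
# Hypothetical toy semantic layer; the structure is assumed from the concepts
# shown on the slides, not taken from any real ontology.
SUBCLASS_OF = {
    "Mammal": "Animal",
    "Marine Mammal": "Mammal",
    "Terrestrial Mammal": "Mammal",
    "Dolphin": "Marine Mammal",
    "Seal": "Marine Mammal",
    "Sea Elephant": "Marine Mammal",
    "Whale": "Marine Mammal",
    "Tiger": "Terrestrial Mammal",
    "Lion": "Terrestrial Mammal",
    "Sea": "Body of Water",
    "Ocean": "Body of Water",
}
HAS_HABITAT = {"Marine Mammal": "Sea"}  # inherited by subclasses


def ancestors(concept):
    """All superclasses of concept, nearest first."""
    chain = []
    while concept in SUBCLASS_OF:
        concept = SUBCLASS_OF[concept]
        chain.append(concept)
    return chain


def is_a(concept, target):
    return concept == target or target in ancestors(concept)


def habitat(concept):
    """Habitat declared on the concept or on its nearest ancestor, if any."""
    for c in [concept] + ancestors(concept):
        if c in HAS_HABITAT:
            return HAS_HABITAT[c]
    return None


def lives_in(concept, place):
    h = habitat(concept)
    return h is not None and is_a(h, place)


def expand_query(kind, place):
    """Turn 'kind that lives in place' into an OR query over leaf concepts."""
    parents = set(SUBCLASS_OF.values())
    leaves = [c for c in SUBCLASS_OF if c not in parents]
    return " OR ".join(c for c in leaves if is_a(c, kind) and lives_in(c, place))


def precision(relevant, returned):
    return relevant / returned


query = expand_query("Animal", "Body of Water")
```

Here `query` evaluates to "Dolphin OR Seal OR Sea Elephant OR Whale", and `precision(21, 24)` gives 0.875 for the expanded query versus about 0.21 for `precision(5, 24)`.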

Existing work on folksonomy enrichment. Tag clustering based on co-occurrence frequency identifies groups of related tags. It works well in certain contexts but does not bring explicit semantics into the system: co-occurrence has no formal meaning, so it still cannot address the problem of “animals living in water”. Existing semantic approaches are limited in their semantic coverage: some use a thesaurus, others use a predefined ontology; some require human intervention; some are domain specific.
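A minimal sketch of co-occurrence clustering, to make the limitation concrete. The threshold and the grouping by connected components are illustrative assumptions, not the method of any specific cited system. The output says that whale and sea go together, but not that a whale lives in the sea.

```python
from collections import Counter
from itertools import combinations


def cooccurrence(tagsets):
    """Count how many photos each unordered pair of tags appears on together."""
    counts = Counter()
    for tags in tagsets:
        for a, b in combinations(sorted(set(tags)), 2):
            counts[(a, b)] += 1
    return counts


def clusters(tagsets, min_count):
    """Group tags linked by pairs co-occurring at least min_count times
    (connected components, found with a small union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for (a, b), n in cooccurrence(tagsets).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    groups = {}
    for tag in list(parent):
        groups.setdefault(find(tag), set()).add(tag)
    return sorted(sorted(g) for g in groups.values())


photos = [["whale", "sea", "blue"], ["whale", "sea"], ["cat", "pet"], ["cat", "pet", "cute"]]
groups = clusters(photos, min_count=2)
```

With these toy photos, `groups` is `[["cat", "pet"], ["sea", "whale"]]`: related tags are found, but the kind of relation stays implicit.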

Our approach: automatic semantic enrichment of tagspaces, exploiting the entire Semantic Web as well as other sources of background knowledge; domain independent; the enrichment includes the semantic neighbourhood of a concept found in an ontology (as opposed to only linking the tag with a concept).

FLOR architecture. [Figure: Input (Tagset) → 1. Lexical Processing (Lexical Isolation, Lexical Normalisation; Dictionary) → 2. Semantic Expansion (Sense Definition, Semantic Expansion; Thesauri) → 3. Semantic Enrichment (Entity Discovery, Entity Selection, Relation Discovery; Online Ontologies) → Output (Semantically Enriched Tagset); intermediate outputs: Isolated Tags, Normalised Tagset, Semantically Expanded Tagset.] FLOR is modular: it is composed of three phases, and each phase can be altered individually as long as it accepts and produces the predefined input and output (described in the next slide).
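The modularity claim can be illustrated by treating each phase as a swappable function with a fixed input and output type. The type aliases, `run_flor` and the stub phases below are hypothetical; they mirror the data passed between phases in the diagram, not FLOR's actual code.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical data types passed between the three phases (names are ours).
Tagset = List[str]
NormalisedTagset = Dict[str, List[str]]           # tag -> lexical representations
ExpandedTagset = Dict[str, Dict[str, List[str]]]  # tag -> {"lexical", "synonyms", "hypernyms"}
EnrichedTagset = Dict[str, List[str]]             # tag -> URIs of linked Semantic Web entities

LexicalPhase = Callable[[Tagset], Tuple[NormalisedTagset, Tagset]]
ExpansionPhase = Callable[[NormalisedTagset], ExpandedTagset]
EnrichmentPhase = Callable[[ExpandedTagset], EnrichedTagset]


def run_flor(tags: Tagset, lexical: LexicalPhase, expansion: ExpansionPhase,
             enrichment: EnrichmentPhase) -> Tuple[EnrichedTagset, Tagset]:
    """Chain the three phases; any phase can be swapped if it keeps its signature."""
    normalised, isolated = lexical(tags)
    expanded = expansion(normalised)
    return enrichment(expanded), isolated


# Trivial stand-in phases, only to show the wiring.
def toy_lexical(tags: Tagset) -> Tuple[NormalisedTagset, Tagset]:
    kept = [t for t in tags if t.isalpha() and len(t) > 2]
    return {t: [t] for t in kept}, [t for t in tags if t not in kept]


def toy_expansion(normalised: NormalisedTagset) -> ExpandedTagset:
    return {t: {"lexical": reps, "synonyms": [], "hypernyms": []}
            for t, reps in normalised.items()}


def toy_enrichment(expanded: ExpandedTagset) -> EnrichedTagset:
    return {t: [] for t in expanded}


enriched, isolated = run_flor(["road", "bw", "neil101"],
                              toy_lexical, toy_expansion, toy_enrichment)
```

Replacing, say, `toy_expansion` with a WordNet-backed function changes nothing else in the pipeline, which is the point of fixing each phase's input and output.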

1.1. Lexical Isolation. Isolate the tags that cannot be processed by the next steps of FLOR: tags with special characters (“:P”, “(raw -> jpg)”), non-English tags (“sillon”, “arbol”) and tags containing numbers (“356days”, “tag1”). [Figure: Lexical Processing phase: Tagset, Lexical Isolation, Lexical Normalisation, Dictionary; outputs: Isolated Tags and Normalised Tagset.] As mentioned in the previous slide, FLOR components can be altered or enhanced to deal with more types of tags. The lexical processing phase isolates the tags that, in a given FLOR run, cannot be tackled by the next phases. For example, the tag “:P” can be found neither in WordNet (WN) nor in the Semantic Web (SW). The same happens for the other tags with special characters, so we isolate them in this phase.
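A minimal sketch of the isolation rules, assuming simple regular-expression checks and a tiny hypothetical word list (`ENGLISH_WORDS`) in place of a real dictionary such as WordNet. The rule order and the two-character rule (taken from the experiment slide) are assumptions; a real dictionary check would also need to try the normalised forms of concatenated tags before isolating them.

```python
import re

# Hypothetical stand-in for a real English dictionary such as WordNet.
ENGLISH_WORDS = {"building", "buildings", "corporation", "road", "england", "whale"}


def isolation_reason(tag):
    """Return why a tag is isolated, or None if later phases can process it."""
    if re.search(r"[^A-Za-z0-9]", tag):
        return "special characters"
    if re.search(r"\d", tag):
        return "numbers"
    if len(tag) <= 2:
        return "two characters or fewer"
    if tag.lower() not in ENGLISH_WORDS:
        return "not found in dictionary"
    return None


def lexical_isolation(tags):
    """Split tags into those kept for the next phases and those isolated."""
    kept, isolated = [], {}
    for tag in tags:
        reason = isolation_reason(tag)
        if reason is None:
            kept.append(tag)
        else:
            isolated[tag] = reason
    return kept, isolated


kept, isolated = lexical_isolation(
    ["buildings", "corporation", "road", "england", "bw", "neil101", ":P", "sillon"])
```

On the running example this keeps buildings, corporation, road and england, and isolates bw, neil101, ":P" and sillon, each with the rule that caught it.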

1.2. Lexical Normalisation. Enhance anchoring: the same term is written differently in each source (folksonomies: santabarbara; Semantic Web: Santa-Barbara or Santa+Barbara; WordNet: Santa Barbara). For each tag we therefore produce its lexical variants: {santaBarbara, santa.barbara, santa_barbara, santa barbara, santa-barbara, santa+barbara, ...}.
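One way to generate these variants is sketched below, assuming a hypothetical word list (`WORDS`) for splitting concatenated tags; the slides do not say how FLOR segments tags, and singularisation (buildings → building) is not shown.

```python
from functools import lru_cache

# Hypothetical word list used to split concatenated tags.
WORDS = {"santa", "barbara", "sea", "elephant"}


def segment(tag):
    """Split a concatenated tag into dictionary words, or return None."""
    @lru_cache(maxsize=None)
    def split(i):
        if i == len(tag):
            return ()
        for j in range(len(tag), i, -1):  # try the longest word first
            if tag[i:j] in WORDS:
                rest = split(j)
                if rest is not None:
                    return (tag[i:j],) + rest
        return None
    return split(0)


def lexical_variants(tag):
    """Join the words with each separator seen in the sources, plus camelCase."""
    words = segment(tag.lower()) or (tag.lower(),)
    variants = {sep.join(words) for sep in ("", ".", "_", " ", "-", "+")}
    variants.add(words[0] + "".join(w.capitalize() for w in words[1:]))
    return variants


variants = lexical_variants("santabarbara")
```

For "santabarbara" this yields santabarbara, santa.barbara, santa_barbara, santa barbara, santa-barbara, santa+barbara and santaBarbara; a single-word tag such as "road" yields only itself.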

FLOR methodology. Example tagset: buildings, corporation, road, england, bw, neil101. 1. Lexical Processing (bw and neil101 are isolated): buildings: <buildings, building>; corporation: <corporation>; road: <road>; england: <england>. Each step generates the information shown in the same colour.

2. Sense Definition & Semantic Expansion. Goals: define the appropriate sense for each tag, based on its context (the other tags in the tagset); expand the tag with synonyms and hypernyms. [Figure: Semantic Expansion phase: Normalised Tagset → Sense Definition → Semantic Expansion (using Thesauri) → Semantically Expanded Tagset.]

2.1. Sense Definition: Wu & Palmer conceptual similarity [1]. [1] Z. Wu and M. Palmer. Verb semantics and lexical selection. In 32nd Annual Meeting of the Association for Computational Linguistics, 1994.
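The formula itself did not survive extraction. The usual statement of the Wu & Palmer measure, which we assume is what the slide showed, scores two concepts c_1 and c_2 by the depth of their least common subsumer (lcs), the deepest concept that is an ancestor of both, with depths counted from the root of the hierarchy:

```latex
\mathrm{sim}_{WP}(c_1, c_2) = \frac{2 \cdot \mathrm{depth}\big(\mathrm{lcs}(c_1, c_2)\big)}{\mathrm{depth}(c_1) + \mathrm{depth}(c_2)}
```

It equals 1 for identical concepts and decreases as the lcs moves toward the root relative to the two concepts.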

2.1. Sense Definition (cont.). Tagset: building, corporation, road, england. [Figure: WordNet path linking building and road through construction, way, artifact, object and entity; Wu and Palmer similarity: 0.666.] Using the Wu and Palmer similarity formula on WordNet, we calculate the pairwise similarity for all combinations of tags. Building and england do not connect in WordNet.
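A minimal sketch of the pairwise computation, assuming a toy child-to-parent hierarchy simplified from the slide's figure (the dictionary `PARENT` is hypothetical). On this toy hierarchy building and road score 0.6 rather than the slide's 0.666, because WordNet's real paths are longer; concepts with no common ancestor (building and england here) score 0.

```python
from itertools import combinations

# Toy fragment of a WordNet-like hierarchy (child -> parent), simplified from the slide.
PARENT = {
    "object": "entity",
    "artifact": "object",
    "construction": "artifact",
    "building": "construction",
    "way": "artifact",
    "road": "way",
}


def path_to_root(concept):
    """Return [concept, parent, ..., root]."""
    path = [concept]
    while path[-1] in PARENT:
        path.append(PARENT[path[-1]])
    return path


def depth(concept):
    return len(path_to_root(concept))  # the root has depth 1


def wu_palmer(a, b):
    """2 * depth(lcs) / (depth(a) + depth(b)); 0.0 if a and b share no ancestor."""
    ancestors_b = set(path_to_root(b))
    lcs = next((c for c in path_to_root(a) if c in ancestors_b), None)
    if lcs is None:
        return 0.0
    return 2 * depth(lcs) / (depth(a) + depth(b))


tags = ["building", "road", "england"]
pairwise = {(a, b): wu_palmer(a, b) for a, b in combinations(tags, 2)}
```

For real WordNet data, NLTK's `Synset.wup_similarity` computes this measure over synsets, so in practice it is applied to every pair of senses of two tags rather than to the tags themselves.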

2.1. Sense Definition (cont.). [Figure: the sense of building meaning “the occupants of a building” (“the entire building complained about the noise”) and the sense of corporation, connected through the concepts gathering, social group, organization, group, enterprise, business and firm; Wu and Palmer similarity: 0.363.] Wu and Palmer similarity is calculated by looking at the path that connects each possible pair of senses of the two tags in the WordNet hierarchy.

2.1. Sense Definition (cont.). Selected senses: building: “a structure that has a roof and walls and stands more or less permanently in one place” (“there was a three-story building on the corner”); corporation: “a business firm whose articles of incorporation have been approved in some state”; road: “an open way (generally public) for travel or transportation”; england: “a division of the United Kingdom”. For building (and likewise for the rest of the tags) we select the sense that returned the highest similarity with another tag of the tagset. If a tag has no similarity with any other tag in the tagset, its first WordNet sense (the most frequent one) is selected.
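The selection rule can be sketched as follows. The sense identifiers are hypothetical, and the only non-zero scores are the two Wu and Palmer values from the previous slides; `toy_sim` stands in for a real similarity function over WordNet senses.

```python
def select_senses(senses, sim):
    """Pick one sense per tag.

    senses: tag -> list of sense ids in dictionary order (first = most frequent).
    sim: function (sense_a, sense_b) -> similarity in [0, 1]; 0.0 if unconnected.
    A tag keeps the sense with the highest similarity to any sense of any other
    tag; if all its similarities are 0, it falls back to its first sense.
    """
    selected = {}
    for tag, candidates in senses.items():
        best_sense, best_score = candidates[0], 0.0
        for sense in candidates:
            for other, other_candidates in senses.items():
                if other == tag:
                    continue
                for other_sense in other_candidates:
                    score = sim(sense, other_sense)
                    if score > best_score:
                        best_sense, best_score = sense, score
        selected[tag] = best_sense
    return selected


# Toy input: hypothetical sense ids; the two scores are those reported on the slides.
SENSES = {
    "building": ["building.structure", "building.occupants"],
    "corporation": ["corporation.firm"],
    "road": ["road.way"],
    "england": ["england.division"],
}
SCORES = {
    frozenset({"building.structure", "road.way"}): 0.666,
    frozenset({"building.occupants", "corporation.firm"}): 0.363,
}


def toy_sim(a, b):
    return SCORES.get(frozenset({a, b}), 0.0)


chosen = select_senses(SENSES, toy_sim)
```

Building gets its structure sense (0.666 beats 0.363), and england, which connects to nothing, falls back to its first sense. Each tag chooses independently, as described on the slide, so no joint consistency between the chosen senses is enforced.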

2.2. Semantic Expansion. The synonyms and hypernyms of the selected senses are used to expand the tags, written as tag: < <synonyms>, <hypernyms> >: buildings: < <edifice>, <structure, construction, artefact, …> >; corporation: < <corp>, <firm, business, concern, …> >; road: < <route>, <way, artefact, object, …> >; england: < < >, <European_Country, European_Nation, land, …> >.
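A minimal sketch of the expansion step, assuming a small hypothetical thesaurus (`SYNONYMS`, `PARENT`) in place of WordNet. Hypernyms are collected transitively up the hierarchy; unlike WordNet, this toy does not list the synonyms of each hypernym (e.g., construction for structure).

```python
# Hypothetical toy thesaurus keyed by selected sense, loosely following the slide.
SYNONYMS = {"building": ["edifice"], "corporation": ["corp"], "road": ["route"]}
PARENT = {
    "building": "structure",
    "structure": "artefact",
    "road": "way",
    "way": "artefact",
    "artefact": "object",
    "corporation": "firm",
    "firm": "business",
}


def hypernyms(concept):
    """Transitive hypernyms of concept, nearest first."""
    chain = []
    while concept in PARENT:
        concept = PARENT[concept]
        chain.append(concept)
    return chain


def expand(representations, concept):
    """Build the <lexical, synonyms, hypernyms> description of one tag."""
    return {
        "lexical": list(representations),
        "synonyms": SYNONYMS.get(concept, []),
        "hypernyms": hypernyms(concept),
    }


expanded = expand(["buildings", "building"], "building")
```

For buildings this gives synonyms ["edifice"] and hypernyms ["structure", "artefact", "object"], the same shape as the slide's example.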

FLOR methodology (cont.). Tagset: buildings, corporation, road, england, bw, neil101. 1. Lexical Processing: buildings: <buildings, building>; corporation: <corporation>; road: <road>; england: <england>. 2. Disambiguation & Semantic Expansion: buildings: < <buildings, building>, <edifice>, <structure, construction, artefact, …> >; corporation: < <corporation>, <corp>, <firm, business, concern, …> >; road: < <road>, <route>, <way, artefact, object, …> >; england: < <england>, < >, <European_Country, European_Nation, land, …> >. Each step generates the information shown in the same colour.

3. Semantic Enrichment. The final phase links the tags with ontological entities (Semantic Web Entities, SWEs): classes, properties and individuals. [Figure: Semantic Enrichment phase: Entity Discovery, Entity Selection and Relation Discovery over Online Ontologies; input: Semantically Expanded Tagset; output: Semantically Enriched Tagset.]

3.1. Entity Discovery. Query the Semantic Web with Watson: identify all entities whose local name or label contains the tag, one of its lexical representations or one of its synonyms. This is done for each tag of each tagset.
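The matching criterion can be sketched over an in-memory list of entities. The list, its URIs and the `discover` helper are hypothetical stand-ins for a Watson search; the real Watson API is not used here. "Contains" is read as a case-insensitive substring match after removing separators, which is why TwoStoryBuilding is returned for building.

```python
import re


def squash(text):
    """Lower-case and drop separators, so Santa-Barbara, santa_barbara and
    SantaBarbara all become santabarbara."""
    return re.sub(r"[^a-z0-9]", "", text.lower())


def discover(terms, entities):
    """Return URIs whose local name or label contains any of the terms.

    terms: the tag, its lexical representations and its synonyms.
    entities: iterable of (uri, local_name, label) triples; label may be None.
    """
    keys = {squash(t) for t in terms if squash(t)}
    found = []
    for uri, local_name, label in entities:
        names = [squash(local_name)] + ([squash(label)] if label else [])
        if any(key in name for key in keys for name in names):
            found.append(uri)
    return found


# Hypothetical entities standing in for a Watson search result.
ENTITIES = [
    ("http://example.org/A#Building", "Building", None),
    ("http://example.org/B#TwoStoryBuilding", "TwoStoryBuilding", None),
    ("http://example.org/A#Pier", "Pier", None),
    ("http://example.org/D#Building", "Building", "Gebäude"),
]

hits = discover(["buildings", "building", "edifice"], ENTITIES)
```

Substring matching favours recall; short tags (e.g., sea inside season) will over-match, which is one reason a separate Entity Selection step is needed.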

3.1. Entity Discovery (cont.). [Figure: Watson results for the tag building from four ontologies (A to D), including the entities HumanShelterConstruction, BuiltStructure, Building, Railway, Pier, Bridge, Tower, PublicConstant, FixedStructure, SpaceInAHOC, PartOfAnHSC, OneStoryBuilding, TwoStoryBuilding, ThreeStoryBuilding, Spot and Structure; in Ontology D, Building has the label “Gebäude”. A dashed line represents disjointness.] The shaded results are very similar according to the similarity measure explained in the next slide. The BuiltEntity is something I created (it was not found in Watson) to demonstrate the next slide.

3.2. Entity Selection. The discovered Semantic Web Entities are compared against the semantically expanded tags, e.g., buildings: < <edifice>, <structure, construction, artefact, …> >. [Figure: the Building entity of Ontology B and its neighbourhood: HumanShelterConstruction, FixedStructure, PublicConstant, ThreeStoryBuilding, PartOfAnHSC, SpaceInAHOC, OneStoryBuilding, TwoStoryBuilding.] The parents of each entity are checked against the tag's hypernyms; if they match (flexibly), the entity is linked to the tag. The entity from Ontology B is strongly connected to the tag, as two of its parents match two hypernyms.
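A sketch of the parent-versus-hypernym check. The slide does not define the "flexible" match, so here we assume that a parent matches a hypernym when one name contains the other after normalisation; the candidate entities and their parents are hypothetical. Under this assumption HumanShelterConstruction matches construction and FixedStructure matches structure, giving the entity from Ontology B a support of two.

```python
import re


def squash(text):
    return re.sub(r"[^a-z0-9]", "", text.lower())


def flexible_match(parent, hypernym):
    """Assumed flexible match: one normalised name contains the other."""
    p, h = squash(parent), squash(hypernym)
    if not p or not h:
        return False
    return h in p or p in h


def support(entity_parents, hypernyms):
    """Number of hypernyms matched by at least one parent of the entity."""
    return sum(any(flexible_match(p, h) for p in entity_parents) for h in hypernyms)


def select_entities(candidates, hypernyms, min_support=1):
    """candidates: uri -> parent class names. Returns URIs by descending support."""
    scored = [(support(parents, hypernyms), uri) for uri, parents in candidates.items()]
    ranked = sorted(scored, key=lambda s: (-s[0], s[1]))
    return [uri for score, uri in ranked if score >= min_support]


HYPERNYMS = ["structure", "construction", "artefact"]
CANDIDATES = {
    "http://example.org/B#Building": ["HumanShelterConstruction", "FixedStructure"],
    "http://example.org/X#Building": ["Person"],
}

selected = select_entities(CANDIDATES, HYPERNYMS)
```

Ranking by support reflects the slide's intuition that more matching parents mean a stronger connection; the threshold `min_support` is an illustrative parameter.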

FLOR methodology (complete example). Tagset: buildings, corporation, road, england, bw, neil101. 1. Lexical Processing: buildings: <buildings, building>; corporation: <corporation>; road: <road>; england: <england>. 2. Disambiguation & Semantic Expansion: buildings: < <buildings, building>, <edifice>, <structure, construction, artefact, …> >; corporation: < <corporation>, <corp>, <firm, business, concern, …> >; road: < <road>, <route>, <way, artefact, object, …> >; england: < <england>, < >, <European_Country, European_Nation, land, …> >. 3. Semantic Enrichment: buildings: < <buildings, building>, <edifice>, <structure, construction, artefact, …>, <URI1#Building, URI2#Building> >; corporation: < <corporation>, <corp>, <firm, business, concern, …>, <URI1#Corporation, URI2#Corp> >; road: < <road>, <route>, <way, artefact, object, …>, <URI1#Route> >; england: < <england>, < >, <European_Country, European_Nation, land, …>, <URI1#England, URI2#England> >. Each enriched tag has the form < Lexical Representations, Synonyms, Hypernyms, Semantic Web Entities >. Each step generates the information shown in the same colour.

Preliminary experiments. We randomly selected 250 photos tagged with 2819 distinct tags. The Lexical Isolation phase removed 59% of the tags, resulting in 1146 distinct tags and 226 photos. The isolated tags included: 45 two-character tags (e.g., pb, ak); 333 tags containing numbers (e.g., 356days, tag1); 86 tags containing special characters (e.g., :P, (raw-> jpg)); 818 non-English tags (e.g., sillon, arbol). The remaining 250 - 226 = 24 photos contained only isolated tags.

Tag-based results (manual evaluation). Tag enrichment = CORRECT if the tag was linked to an appropriate SWE (according to the context of the tag); Tag enrichment = INCORRECT if the tag was linked to an inappropriate SWE (according to the context of the tag); Tag enrichment = UNDETERMINED if we were not able to determine the correctness of the enrichment from the context; Tag NON ENRICHED if the tag was not linked to any entity.

Tag-based results. 93% enrichment precision; 73.4% of the tags were not enriched. From the non-enriched tags we selected a random 10% (85 tags) and were able to manually enrich 29 of them; thus ~70% of the non-enriched tags are due to knowledge sparseness in Watson or the Semantic Web, and ~30% are due to FLOR algorithm issues. FLOR failed to enrich 841 tags, i.e., 73.4% of the tags (see Table 1). Because this is a significant number of tags, we wished to understand whether the enrichment failed because of FLOR's recall or because most of the tags have no equivalent coverage in online ontologies.
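The reported percentages can be re-derived from the counts on this and the previous experiment slide. We assume the 85-tag sample was drawn from the 841 non-enriched tags and that the 29 tags enriched manually are the ones attributed to FLOR algorithm issues.

```python
def pct(part, whole):
    """Percentage rounded to one decimal place."""
    return round(100 * part / whole, 1)


distinct_tags, after_isolation = 2819, 1146
non_enriched = 841
sample, manually_enriched = 85, 29

isolated_share = pct(distinct_tags - after_isolation, distinct_tags)  # removed by Lexical Isolation
non_enriched_share = pct(non_enriched, after_isolation)              # tags FLOR failed to enrich
algorithm_share = pct(manually_enriched, sample)                     # enrichable, so a FLOR issue
sparseness_share = pct(sample - manually_enriched, sample)           # no suitable entity found
```

This gives 59.3% isolated, 73.4% non-enriched, and a 34.1% / 65.9% split between algorithm issues and knowledge sparseness, which the slide rounds to ~30% / ~70%.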

FLOR algorithm issues. 24% of the non-enriched tags were defined incorrectly in Phase 2 (i.e., assigned the wrong sense), e.g., <square> was assigned to <geometrical-shape> rather than <geographical-area>. 55% of the non-enriched tags were defined differently in WordNet and in the ontologies. For example, love: in WordNet, Love → Emotion → Feeling → Psychological feature (gloss: “a strong positive emotion of regard and affection”); in the Semantic Web, Love subClassOf Affection. Although both definitions refer to the same sense, and the superclass Affection even appears in the WordNet gloss of Love, they were not matched because Affection is not a hypernym of Love in WordNet. Current work investigates alternative ways of performing Semantic Expansion.

Photo-based results. Photo enrichment = CORRECT if all enriched tags are CORRECT; Photo enrichment = INCORRECT if all enriched tags are INCORRECT; Photo enrichment = MIXED if some tags are INCORRECT and some are CORRECT; Photo enrichment = UNDETERMINED if all enriched tags are UNDETERMINED (i.e., we could not decide on correctness); Photo NON ENRICHED if none of the tags was enriched.
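The photo-level rules can be written as a small classifier over the tag-level labels. The label strings are our own encoding. The rules as stated do not cover every combination (e.g., CORRECT plus UNDETERMINED only), so the sketch returns None there rather than guess; a photo with both CORRECT and INCORRECT tags counts as MIXED even if some tags are UNDETERMINED, which is our reading.

```python
def classify_photo(tag_labels):
    """tag_labels: one of CORRECT, INCORRECT, UNDETERMINED, NON_ENRICHED per tag."""
    enriched = [label for label in tag_labels if label != "NON_ENRICHED"]
    if not enriched:
        return "NON_ENRICHED"
    kinds = set(enriched)
    if kinds == {"CORRECT"}:
        return "CORRECT"
    if kinds == {"INCORRECT"}:
        return "INCORRECT"
    if kinds == {"UNDETERMINED"}:
        return "UNDETERMINED"
    if {"CORRECT", "INCORRECT"} <= kinds:
        return "MIXED"
    return None  # combination not covered by the stated rules


example = classify_photo(["CORRECT", "NON_ENRICHED", "CORRECT"])
```

Non-enriched tags are ignored when judging a photo, so `example` is "CORRECT": only the enriched tags decide the label.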

Photo-based results. [Figure: chart of the photo-based results.]

Future work: use a semantic relatedness measure instead of a similarity measure; process the lexically isolated tags using other background knowledge resources, e.g., Wikipedia; relation discovery between tags; Step 2: intelligent query interface; large-scale evaluation; expand the tags with more hypernyms and synonyms (the love/affection case).

Conclusions: automatic semantic enrichment of tagspaces is possible; 93% precision on the 24.5% of tags that were enriched; 79% of resources enriched; the three-phase architecture works well; we identified the steps of each phase that require improvement.

Thank you. S.Angeletou@open.ac.uk http://flor.kmi.open.ac.uk/