Published by Morris Holmes. Modified over 9 years ago.
1
Semantic Search: different meanings
2
Semantic search: different meanings
Definition 1: Semantic search as the problem of searching documents beyond the syntactic level of matching keywords
– Hakia, PowerSet, SearchMonkey
Definition 2: Semantic search as the problem of searching large Semantic Web datasets
– Watson, PowerAqua, Swoogle, Sindice, SWSE
3
Facing keyword-based search problems
Relations between search terms:
– “books about recommender systems” vs. “systems that recommend books”
Polysemy:
– “mouth” as part of the body vs. “mouth” as part of a stream
Synonymy:
– “movies” vs. “films”
Documents about individuals where the query keywords do not appear:
– query “English banks”, individual “Abbey”
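The synonymy problem above can be illustrated with a minimal sketch. The two-document corpus and the hand-written synonym table are hypothetical toy data, and the expansion step is only one simple way of going beyond literal keyword matching, not the method of any system named in these slides.

```python
# Hypothetical toy data: a two-document corpus and a hand-written synonym table.
DOCS = {
    "d1": "a list of classic films from the 1950s",
    "d2": "recipes for a quick dinner",
}
SYNONYMS = {"movies": {"films"}, "films": {"movies"}}


def keyword_search(query, docs):
    """Return ids of documents that contain every query term verbatim."""
    terms = query.lower().split()
    return sorted(
        doc_id
        for doc_id, text in docs.items()
        if all(term in text.lower().split() for term in terms)
    )


def expanded_search(query, docs, synonyms):
    """Like keyword_search, but a term may also match one of its synonyms."""
    terms = query.lower().split()
    hits = []
    for doc_id, text in docs.items():
        words = set(text.lower().split())
        # A term is satisfied if the term itself or any synonym occurs.
        if all(words & ({term} | synonyms.get(term, set())) for term in terms):
            hits.append(doc_id)
    return sorted(hits)


plain = keyword_search("classic movies", DOCS)
expanded = expanded_search("classic movies", DOCS, SYNONYMS)
print(plain, expanded)
```

Literal matching misses d1 because it says "films" rather than "movies"; the expanded search finds it. Expansion of this kind does nothing for polysemy or for relations between terms, which is why the later slides turn to richer conceptual models.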
4
Several attempts from the IR community
Early 80s: elaboration of conceptual frameworks and their introduction into IR models
– Taxonomies (categories + hierarchical relations), e.g., the ODP (Open Directory Project)
– Thesauri (categories + fixed hierarchical & associative relations), e.g., WordNet (used by linguistic approaches)
– Algebraic methods such as LSA
Limitations: the level of conceptualization is often shallow (especially at the level of relations)
5
The emergence of the SW
Late 90s: introduction of ontologies as a conceptual framework (classes + instances (KBs) + arbitrary semantic relations + rules)
– Semantic search: exploiting ontologies as richer conceptualizations and formal languages to enhance traditional keyword-based document retrieval
– Semantic search: the need to search this emergent and continuously growing structured information space (the Web of Data)
DBLP, Geonames, DBPedia, BBC Music, ... (http://esw.w3.org/TaskForces/CommunityProjects/LinkingOpenData/DataSets)
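As a rough illustration of what "classes + instances + relations + rules" buys over keywords, the sketch below answers the earlier "English banks" query against a toy knowledge base. The triples, the predicate names (type, subClassOf, locatedIn), and the helper functions are all assumptions made for the example; real Semantic Web data would use RDF vocabularies and a query language such as SPARQL.

```python
# Hypothetical toy knowledge base as (subject, predicate, object) triples.
TRIPLES = {
    ("Abbey", "type", "Bank"),
    ("Abbey", "locatedIn", "England"),
    ("Bank", "subClassOf", "Organization"),
    ("Louvre", "type", "Museum"),
    ("Louvre", "locatedIn", "France"),
}


def match(pattern, triples):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        triple
        for triple in triples
        if all(p is None or p == v for p, v in zip(pattern, triple))
    ]


def instances_of(cls, triples):
    """Instances of cls, following subClassOf links (a very simple rule)."""
    classes, frontier = {cls}, [cls]
    while frontier:
        current = frontier.pop()
        for sub, _, _ in match((None, "subClassOf", current), triples):
            if sub not in classes:
                classes.add(sub)
                frontier.append(sub)
    return {s for c in classes for s, _, _ in match((None, "type", c), triples)}


def located_in(place, triples):
    """Subjects with a locatedIn relation to the given place."""
    return {s for s, _, _ in match((None, "locatedIn", place), triples)}


english_banks = instances_of("Bank", TRIPLES) & located_in("England", TRIPLES)
english_orgs = instances_of("Organization", TRIPLES) & located_in("England", TRIPLES)
print(english_banks, english_orgs)
```

"Abbey" is returned even though neither "English" nor "bank" appears in its name, and the subclass rule lets the broader query for organizations find it as well.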
6
The Web of Data
[Figure: the Web of Data in 2007, 2008, and 2009]
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
7
LOD cloud May 2007
Figure from [4]
Facts:
– Focal points:
  – DBPedia: RDFized version of Wikipedia; many ingoing and outgoing links
  – Music-related datasets
– Big datasets include FOAF, US Census data
– Size approx. 1 billion triples, 250k links
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
8
LOD cloud September 2008
Facts:
– More than 35 datasets interlinked
– Commercial players joined the cloud, e.g., BBC
– Companies began to publish and host datasets, e.g., OpenLink, Talis, or Garlik
– Size approx. 2 billion triples, 3 million links
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
9
LOD cloud March 2009
Facts:
– A large part comes from the Linking Open Drug cloud and the BIO2RDF project
– Notable new datasets: Freebase, OpenCalais, ACM/IEEE
– Size > 10 billion triples
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
10
The LOD clouds
Extracted from: Linked Data Tutorial (Florianópolis) http://www.slideshare.net/ocorcho/linked-data-tutorial-florianpolis
11
Commercial interest by publishers
12
Commercial interest by search engines
2007: Yahoo! presents SearchMonkey
13
Commercial interest by search engines
July 2008: Microsoft buys Powerset
14
Commercial interest by search engines
April 2010: Facebook announces the use of the Open Graph protocol
15
Commercial interest by search engines
May 2009: Google announces Rich Snippets and its official use of RDFa and Microformats
16
Commercial interest by search engines
July 2010: Google buys Metaweb (the company behind Freebase)
17
Commercial interest by search engines
November 2010: Google announces support for the GoodRelations vocabulary in Google Rich Snippets
18
Challenges
Exploiting this new information space for semantic search purposes opens new research challenges:
– Scalability
– Heterogeneity
– Uncertainty
19
Scalability
Effective exploitation of linked data requires infrastructure that scales to a large and ever-growing collection of interlinked data.
20
Heterogeneity
[Diagram: identifiers for the same things in different sources
– DATA-LEVEL: Dbpedia:Rudi_Studer, Dblp:Studer:Rudi.html, SW:/en/rudi_studer
– SCHEMA-LEVEL: Dblp:~ley/db/../author, SW:Person, Dbpedia:Professor
– Operations shown: Align; Reconcile, Combine]
Effective exploitation of the data web requires an effective mechanism for:
– finding the relevant data sources
– integrating data sources
– combining elements from different data sources
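Data-level reconciliation can be sketched as grouping identifiers that are known to denote the same thing. The example below assumes that equivalence links between identifiers are already available (for instance from owl:sameAs-style statements) and groups them with a union-find structure. The links themselves, and the ex:A / ex:B identifiers, are hypothetical; the slides do not say how such links are obtained.

```python
def reconcile(links):
    """Group identifiers connected by equivalence links (union-find)."""
    parent = {}

    def find(x):
        # Register unseen identifiers as their own representative.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for a, b in links:
        parent[find(a)] = find(b)

    groups = {}
    for identifier in list(parent):
        groups.setdefault(find(identifier), set()).add(identifier)
    return sorted(groups.values(), key=lambda group: sorted(group))


# Hypothetical equivalence links; the first three identifiers come from the slide.
LINKS = [
    ("Dbpedia:Rudi_Studer", "Dblp:Studer:Rudi.html"),
    ("Dblp:Studer:Rudi.html", "SW:/en/rudi_studer"),
    ("ex:A", "ex:B"),
]

clusters = reconcile(LINKS)
for cluster in clusters:
    print(sorted(cluster))
```

Once identifiers are grouped, facts attached to any member of a cluster can be combined into a single view of the entity. Schema-level alignment (for example relating an author class to a Person class) is a separate problem that this sketch does not address.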
21
Uncertainty
Incomplete representation of the user’s needs and content meanings
– The user cannot completely specify the need
– The semantic information in the search space is incomplete
Effective exploitation requires:
– matching the user’s needs to data in an imprecise way
– ranking the results
– being flexible enough to adjust to changes in constraints
Example: “Find action films directed by some Hong Kong film director and starring Chinese martial arts actors”
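One simple way to match in an imprecise way and rank the results is to score each candidate by how many query constraints it satisfies, giving partial credit when the data says nothing either way. The sketch below applies that idea to the film query; the film records, the attribute names, and the 0.5 credit for unknown values are all hypothetical choices made for illustration.

```python
# Hypothetical film records; missing keys model incomplete semantic information.
FILMS = [
    {"title": "Film A", "genre": "action", "director_origin": "Hong Kong",
     "star_is_martial_artist": True},
    {"title": "Film B", "genre": "action", "director_origin": "Hong Kong"},
    {"title": "Film C", "genre": "drama", "director_origin": "France",
     "star_is_martial_artist": False},
]

# The query expressed as attribute constraints (an assumed encoding).
CONSTRAINTS = {
    "genre": "action",
    "director_origin": "Hong Kong",
    "star_is_martial_artist": True,
}


def score(item, constraints, unknown_credit=0.5):
    """Fraction of constraints satisfied; unknown values earn partial credit."""
    total = 0.0
    for key, wanted in constraints.items():
        if key not in item:
            total += unknown_credit
        elif item[key] == wanted:
            total += 1.0
    return total / len(constraints)


def rank(items, constraints):
    """Return (score, title) pairs, best first, ties broken by title."""
    scored = [(score(item, constraints), item["title"]) for item in items]
    return sorted(scored, key=lambda pair: (-pair[0], pair[1]))


ranking = rank(FILMS, CONSTRAINTS)
for value, title in ranking:
    print(f"{title}: {value:.2f}")
```

Because constraints are just entries in a dictionary, dropping or changing one and re-ranking is cheap, which is one reading of being flexible to changes in constraints.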
22
The Search Space: different representations
23
The search space: different representations
Unstructured search space
– The Web of documents (textual and multimedia content)
Structured search space
– The Web of data (ontologies + Knowledge Bases)
Hybrid search space
– Unstructured content is enriched with metadata
  – Embedded annotations
  – Non-embedded annotations
24
The unstructured search space
The Web of human-understandable content. The Web of documents and links
[Figure labels: Documents; Search space. Image under a CC License]
25
Search engines
26
The structured search space
The Web of machine-understandable content. The Web of objects and relations
[Figure labels: objects; Search space. Image under a Creative Commons License]
27
Search engines
28
The hybrid search space
Enriching documents with metadata
[Figure labels: Objects; Documents; Search space]
How to interlink documents and data?
29
Two ways of interlinking metadata and documents
– Information Extraction
– By relying on Web publishers
  – More in the section Data on the (Semantic) Web
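The information-extraction route can be sketched with a dictionary-based annotator that links documents to entities and keeps the links outside the documents (non-embedded annotations). The gazetteer, the ex: identifiers, and the documents are hypothetical, and dictionary lookup is only the simplest possible extraction technique.

```python
import re

# Hypothetical gazetteer mapping lowercase surface forms to entity identifiers.
GAZETTEER = {"abbey": "ex:Abbey", "rudi studer": "ex:Rudi_Studer"}

# Hypothetical documents.
DOCS = {
    "d1": "Abbey opens a new branch in London.",
    "d2": "A talk by Rudi Studer on the Semantic Web.",
}


def annotate(docs, gazetteer):
    """Return stand-off annotations: entity id -> set of document ids."""
    index = {}
    for doc_id, text in docs.items():
        lowered = text.lower()
        for label, entity in gazetteer.items():
            # Whole-word match of the surface form in the document text.
            if re.search(r"\b" + re.escape(label) + r"\b", lowered):
                index.setdefault(entity, set()).add(doc_id)
    return index


annotations = annotate(DOCS, GAZETTEER)
print(annotations)
```

A search over the structured space that returns ex:Abbey can then follow the annotation index back to d1. A plain dictionary lookup cannot tell the bank "Abbey" from a building called an abbey, so a realistic extractor would need disambiguation on top of this.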
30
Search engines