Download presentation
Presentation is loading. Please wait.
1
A. Frank From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il
2
2 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!
3
3 A. Frank “The good, the bad and the …”
4
4 A. Frank Semantics in Web 2.0?!, Open Gardens blog, Ajit Jaokar http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html
5
5 A. Frank Web 1.0, Web 2.0, Web 3.0, Web X.0…
6
6 A. Frank Where are we headed?!
7
7 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!
8
8 A. Frank Googlism
9
9 A. Frank Google Presents us with a search box and allows us to enter free form queries. Basically retrieves documents based on keyword searches. The search algorithms are based on string/pattern matching, statistical frequencies and Page rank (based on hyperlink popularity analysis). There is some light-weight semantics – when you search for known objects or public data, Google knows about these types of entities (OneBox) and provides factual answers.
10
10 A. Frank Actual Answers to Factual Questions
11
11 A. Frank Find Public Data
12
12 A. Frank Compare Public Data
13
13 A. Frank Current State The current generation of search engines is severely limited in its understanding of the user's intent and the Web's content. Semantic search has attracted a lot of attention in the past year, largely due to the growth of the Semantic Web as a whole. The term “Semantic Search” itself is popular enough to be considered overused/overloaded. But in general it refers to methods of searching documents beyond the syntactic level of matching keywords.
14
14 A. Frank Syntactic vs. Semantic Search Syntactic search – can match the query against –index of the textual content of the resources –URIs (URLs, URNs) in the system –literals in the RDF metadata –or a combination of these, possibly using: Exact, prefix or substring match, stemming, minimal edit distance Semantic search – in addition to syntactic search, can use –index of the meaning of sentences in each resource –semantic knowledge structures and analysis –the graph structure of RDF metadata –or a combination of these, possibly using: query expansion, classification/categorization, tagging, graph traversal, microformats, RDF & OWL inferencing and reasoning
15
15 A. Frank Can Semantic Search answer this :-?)
16
16 A. Frank Semantics & NLP Semantics is a subfield of linguistics that is devoted to the study of meaning, as expressed by words, phrases, sentences, and even larger units of text. Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages by computers. Semantic NLP integrates these to provide meaning and understanding within context of computer applications.
17
17 A. Frank Semantic Search in General Collective methods that look at information available beyond the level of individual words or phrases. Semantic search methods are diverse and cover a range of disciplines –Information Retrieval, Natural Language Processing, Semantic Analysis, the Semantic Web, Databases, Information Extraction, Information Visualization, etc. An understanding of content at the semantic level also makes it possible to aggregate, compare and reason with content as structured data. Enables people to type in arbitrarily complex questions and then interpret these queries and execute them.
18
18 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!
19
19 A. Frank Types of “Semantic Search” Engines Semantic Web Search Engines –search on the Semantic Web data (RDF, OWL, etc). Semantically-enhanced Search Engines –mostly augment and refine the displayed results. NLP-based Search Engines –mainly work on the indexing and query side of the search. Semantic-NLP-based Search Engines –employ semantic knowledge structures and analysis. Computational-NLP-based Search Engines –employ an inference engine that enables reasoning and computation.
20
20 A. Frank Sample of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …
21
21 A. Frank Sample queries used here Query involves extracting concepts –Artificial Intelligence Query involves ambiguous term –Jaguar, Cycle Query involves aggregating data –What did Albert Einstein do Query involves inference –When was California’s governor born Query involves computation –What is the distance from San Francisco to Michigan
22
22 A. Frank Results of sample queries used here True Knowledge CognitionhakiaPowersetFreebase Engines Queries √√√√√ extracting concepts √√√√√ ambiguous term ~~ √√√ aggregating data √√ ~~ X inference √X ~ XX computation
23
23 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …
24
24 A. Frank Google Orion Technology that can better understand associations and concepts related to search. It finds relationships between queries and related concepts and presents those as search refinements: –Pages are scanned in “real-time” by Google after a query is entered. –Conceptually and contextually related sites/pages are then identified and expressed in the form of the improved refinements. Also provides longer “snippets” (text extracts containing the keywords) when users input queries of three words or more.
25
25 A. Frank Orion Search Refinements
26
26 A. Frank Yahoo! SearchMonkey The SearchMonkey platform is mainly based on the Semantic Web approaches. Supports the ability to modify the presentation of search results through plug-ins to the search interface. Content providers or third-party developers can create custom data services to extract information from the Web page. After defining the extracted properties per entry, an enhanced display can be provided.
27
27 A. Frank SearchMonkey Enhanced Display
28
28 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …
29
29 A. Frank MetaWeb Freebase Open, shared database of the world’s knowledge that collects data from the Web to build a massive, collaboratively-edited database of cross-linked data. Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites. Focus is on organizing and managing complex data structures by use of Semantic Web technologies. Enables extraction of ordered knowledge out of the information chaos that is the current Web.
30
30 A. Frank Freebase Interface
31
31 A. Frank Freebase Repository Covers millions of topics in hundreds of categories. Freebase contains structured information on many popular topics, like movies, music, people and locations – all reconciled and freely available. A base is a new kind of Website that anyone can build using Freebase; It is a place to organize and share collections of information about a particular subject. Freebase has two search interfaces: –Free form queries; a text search box. –MQL (MetaWeb Query Language) queries.
32
32 A. Frank Freebase Structure Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple bases. For example, Arnold Schwarzenegger would appear in a movie base as an actor, a political base as a governor and a bodybuilder base as a Mr. Universe. In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together.
33
33 A. Frank Freebase vs. Wikipedia The difference lies in the way they store information. Wikipedia arranges information in the form of articles. Freebase lists facts and statistics. Its list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other Web sites/software. Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.
34
34 A. Frank Query involves extracting concepts
35
35 A. Frank Query involves ambiguous term
36
36 A. Frank Query involves aggregating data (1)
37
37 A. Frank Query involves aggregating data (2)
38
38 A. Frank Query involves inference
39
39 A. Frank Query involves computation
40
40 A. Frank Microsoft Powerset Provides a NLP-based search engine. Powerset reads and understands every sentence on a Web page and allows asking questions in plain English. Powerset search and discovery experience is currently based on Wikipedia and Freebase. In large, Powerset is trying to match the meaning of a query with the meaning of sentences in Wikipedia or facts in Freebase.
41
41 A. Frank Powerset Interface
42
42 A. Frank Powerset Services In the search box, you can express yourself in keywords, phrases, or simple questions. On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregating information from across multiple articles. Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content.
43
43 A. Frank Powerset Factz Factz are concise representations of information extracted from sentences. They are represented in three parts: the subject, relation and object (e.g., Oswald shot JFK). Factz do not always represent truth, but rather propositions that are asserted in the text of Wikipedia. On the search results page, you will often see Factz for general topic queries; these Factz are collected from pages across Wikipedia. On a particular topic page, you can see the Factz extracted from the given page in the outline.
44
44 A. Frank Snippet of a Powerset Result Page
45
45 A. Frank Query involves extracting concepts
46
46 A. Frank Query involves ambiguous term
47
47 A. Frank Query involves aggregating data
48
48 A. Frank Query involves inference
49
49 A. Frank Query involves computation
50
50 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …
51
51 A. Frank hakia Brings relevant results based on concept match rather than keyword match or popularity ranking. hakia has constructed three major components: 1.QDEX (Query Detection and Extraction) is an advanced system that enables semantic analysis of Web pages, and "meaning-based" search. 2.SemanticRank algorithm is comprised of innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics. 3.OntoSem (Ontological Semantics) is based on a formal and comprehensive linguistic theory of meaning in natural language.
52
52 A. Frank hakia Interface
53
53 A. Frank System hakia QDEX
54
54 A. Frank hakia QDEX The QDEX algorithm –analyzes the entire content of a Web page (including HTML). –extracts all possible queries that can be asked to this content, at various lengths and forms. –queries (called sequences) become gateways to the originating documents, paragraphs and sentences during the retrieval mode. –All this is done off-line before any actual query is received from a user. The critical point in QDEX system is to be able to decompose sentences into a handful of meaningful sequences without getting lost in the combinatory explosion space (hakia uses OntoSem technology to meet this challenge). Decomposing content in this way provides great flexibility in a search engine platform for utilizing semantically rich data and multiple-thread processing of equivalent sequences.
55
55 A. Frank SemanticRank Algorithm
56
56 A. Frank hakia SemanticRank Ranks search results in the order relevancy: –A pool of relevant paragraphs come from the QDEX system for terms of a given query. –The final relevancy is determined based on advanced sentence analysis and concept match between the query and the best sentence of each paragraph. –Morphological and syntactic analyses are also performed. Among the criteria for ranking, the credibility and age (of the Web page) are also taken into account via proprietary measurements. Popularity of keywords by means of link referrals is not used, which eliminates the possibilities of –meaningless results ranking high. –manipulation of results by fictitious page design.
57
57 A. Frank hakia OntoSem A formal and comprehensive linguistic theory of meaning in natural language. OntoSem offers an advanced methodology and technology for natural language processing. Has set of well-developed and constantly improving resources: –language-independent ontology of thousands of interrelated concepts. –an ontology-based English lexicon of 100,000 word senses, and counting (plus, the lexicons for several other languages under construction). –an ontological parser which "translates" every sentence of the text into its text meaning representation, approximating the complete understanding of the sentence by the native speaker.
58
58 A. Frank Query involves extracting concepts
59
59 A. Frank Query involves ambiguous term
60
60 A. Frank Query involves aggregating data
61
61 A. Frank Query involves inference
62
62 A. Frank Query involves computation
63
63 A. Frank Cognition Technologies Semantic NLP technology employs a unique mix of linguistics and mathematical algorithms –has, in effect, taught the computer the meanings (or associated concepts) of, and relations between, nearly all the words and frequently used phrases within the common English language. Focuses on the understanding of word and phrase meanings within context –provides an application or end-user with actionable content based upon semantic knowledge. Is able to simultaneously deliver significantly higher levels of precision and recall than is usually possible.
64
64 A. Frank Cognition Interface
65
65 A. Frank Cognition’s Semantic Map Cognition's Semantic NLP enabling technology contains one of the world's largest computational dictionaries, also known as a Semantic Map. This Semantic Map encodes a wealth of morphological, syntactic and semantic information about the words of the English language and their relationships to each other. These resources were created and reviewed by lexicographers and linguists over a span of 24 years.
66
66 A. Frank How Cognition Is Different Cognition has changed the NLP paradigm through its unique and complete combination of a complete semantic map and linguistic elements to optimize semantic understanding: –Morphology The various forms of word, e.g. singular, plural, tense –Syntax The grammatical structure, e.g. verbs, nouns –Semantics Word and sentence meaning, augmented by synonymy and taxonomy –Spelling The various ways words are spelled (or misspelled)
67
67 A. Frank Wikipedia.Cognition Interface
68
68 A. Frank Query involves extracting concepts
69
69 A. Frank Query involves ambiguous term
70
70 A. Frank Query involves aggregating data
71
71 A. Frank Query involves inference (1)
72
72 A. Frank Query involves inference (2)
73
73 A. Frank Query involves computation
74
74 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …
75
75 A. Frank True Knowledge Provides an Answer Engine aimed at dramatically improving the experience of finding known facts on the Web. Represents the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers. Combines knowledge through a process of inference, drawing conclusions, and cross- referencing stored information to produce a reasoned answer.
76
76 A. Frank True Knowledge Interface
77
77 A. Frank Ways of looking at True Knowledge 1.A Q/A (Question-Answering) Site 2.The Answer Engine – The Perfect Complement to Search 3.A "Wikipedia for Facts" 4.A Universal Knowledge Base 5.A Platform for Building Knowledge Services
78
78 A. Frank True Knowledge Components Knowledge Base (KB) – a huge database of facts on any topic. Knowledge Generator – infers facts either using KB, other generated facts or external feeds of knowledge. Browser interface – supports NLP & query language. Query/Answer System – uses KB and generated facts to answer queries. System Assessment – further processes existing facts in order to maintain semantic consistency of KB. User Assessment – enables user to endorse or contradict particular facts in KB.
79
79 A. Frank Query involves extracting concepts
80
80 A. Frank Query involves ambiguous term
81
81 A. Frank Query involves aggregating data (1)
82
82 A. Frank Query involves aggregating data (2)
83
83 A. Frank Query involves aggregating data (3)
84
84 A. Frank Query involves inference
85
85 A. Frank Query involves computation (1)
86
86 A. Frank Query involves computation (2)
87
87 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!
88
88 A. Frank Alex Iskold Spectrum
89
89 A. Frank Ken Ewell Debuking the Spectrum Ken Ewell for Semiotica
90
90 A. Frank Google “goes religious”
91
91 A. Frank Freebase “what’s” it
92
92 A. Frank … but knows what vocation is!
93
93 A. Frank PowerSet “does his best” for “me now”
94
94 A. Frank hakia mimics Google
95
95 A. Frank or wants to “end the occupation now”
96
96 A. Frank Cognition mixes it up
97
97 A. Frank True Knowledge seems intelligent
98
98 A. Frank … but isn’t really
99
99 A. Frank Computational-Semantic-NLP-based Search :-?)
100
10 0 A. Frank Sources P. Mika, Semantic Search Arrives at the Web, DevX, July 18, 2008, http://www.devx.com/semantic/Article/38595/0/page/1http://www.devx.com/semantic/Article/38595/0/page/1 A. Iskold, Semantic Search: The Myth and Reality, ReadWriteWeb, May 29, 2008, http://www.readwriteWeb.com/archives/semantic_search_the_ myth_and_reality.php http://www.readwriteWeb.com/archives/semantic_search_the_ myth_and_reality.php K. Ewell, The Search for Semantic Search, Semiotica, June 18, 2008, http://commonsensical.wordpress.com/2008/06/18/the- search-for-semantic-search/ http://commonsensical.wordpress.com/2008/06/18/the- search-for-semantic-search/ N. Karandikar, Powerset vs. Cognition: A Semantic Search Shoot-out, Gigaom, June 7, 2008, http://gigaom.com/2008/06/07/powerset-vs-cognition-a- semantic-search-shoot-out/ http://gigaom.com/2008/06/07/powerset-vs-cognition-a- semantic-search-shoot-out/
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.