Presentation is loading. Please wait.

Presentation is loading. Please wait.

A. Frank From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן

Similar presentations


Presentation on theme: "A. Frank From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן"— Presentation transcript:

1 A. Frank From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן ariel@cs.biu.ac.il

2 2 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!

3 3 A. Frank “The good, the bad and the …”

4 4 A. Frank Semantics in Web 2.0?!, Open Gardens blog, Ajit Jaokar http://opengardensblog.futuretext.com/archives/2005/12/mobile_Web_20_w.html

5 5 A. Frank Web 1.0, Web 2.0, Web 3.0, Web X.0…

6 6 A. Frank Where are we headed?!

7 7 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!

8 8 A. Frank Googlism

9 9 A. Frank Google Presents us with a search box and allows us to enter free form queries. Basically retrieves documents based on keyword searches. The search algorithms are based on string/pattern matching, statistical frequencies and Page rank (based on hyperlink popularity analysis). There is some light-weight semantics – when you search for known objects or public data, Google knows about these types of entities (OneBox) and provides factual answers.

10 10 A. Frank Actual Answers to Factual Questions

11 11 A. Frank Find Public Data

12 12 A. Frank Compare Public Data

13 13 A. Frank Current State The current generation of search engines is severely limited in its understanding of the user's intent and the Web's content. Semantic search has attracted a lot of attention in the past year, largely due to the growth of the Semantic Web as a whole. The term “Semantic Search” itself is popular enough to be considered overused/overloaded. But in general it refers to methods of searching documents beyond the syntactic level of matching keywords.

14 14 A. Frank Syntactic vs. Semantic Search Syntactic search – can match the query against –index of the textual content of the resources –URIs (URLs, URNs) in the system –literals in the RDF metadata –or a combination of these, possibly using: Exact, prefix or substring match, stemming, minimal edit distance Semantic search – in addition to syntactic search, can use –index of the meaning of sentences in each resource –semantic knowledge structures and analysis –the graph structure of RDF metadata –or a combination of these, possibly using: query expansion, classification/categorization, tagging, graph traversal, microformats, RDF & OWL inferencing and reasoning

15 15 A. Frank Can Semantic Search answer this :-?)

16 16 A. Frank Semantics & NLP Semantics is a subfield of linguistics that is devoted to the study of meaning, as expressed by words, phrases, sentences, and even larger units of text. Natural Language Processing (NLP) is a subfield of artificial intelligence and computational linguistics that studies the problems of automated generation and understanding of natural human languages by computers. Semantic NLP integrates these to provide meaning and understanding within context of computer applications.

17 17 A. Frank Semantic Search in General Collective methods that look at information available beyond the level of individual words or phrases. Semantic search methods are diverse and cover a range of disciplines –Information Retrieval, Natural Language Processing, Semantic Analysis, the Semantic Web, Databases, Information Extraction, Information Visualization, etc. An understanding of content at the semantic level also makes it possible to aggregate, compare and reason with content as structured data. Enables people to type in arbitrarily complex questions and then interpret these queries and execute them.

18 18 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!

19 19 A. Frank Types of “Semantic Search” Engines Semantic Web Search Engines –search on the Semantic Web data (RDF, OWL, etc). Semantically-enhanced Search Engines –mostly augment and refine the displayed results. NLP-based Search Engines –mainly work on the indexing and query side of the search. Semantic-NLP-based Search Engines –employ semantic knowledge structures and analysis. Computational-NLP-based Search Engines –employ an inference engine that enables reasoning and computation.

20 20 A. Frank Sample of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …

21 21 A. Frank Sample queries used here Query involves extracting concepts –Artificial Intelligence Query involves ambiguous term –Jaguar, Cycle Query involves aggregating data –What did Albert Einstein do Query involves inference –When was California’s governor born Query involves computation –What is the distance from San Francisco to Michigan

22 22 A. Frank Results of sample queries used here True Knowledge CognitionhakiaPowersetFreebase Engines Queries √√√√√ extracting concepts √√√√√ ambiguous term ~~ √√√ aggregating data √√ ~~ X inference √X ~ XX computation

23 23 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …

24 24 A. Frank Google Orion Technology that can better understand associations and concepts related to search. It finds relationships between queries and related concepts and presents those as search refinements: –Pages are scanned in “real-time” by Google after a query is entered. –Conceptually and contextually related sites/pages are then identified and expressed in the form of the improved refinements. Also provides longer “snippets” (text extracts containing the keywords) when users input queries of three words or more.

25 25 A. Frank Orion Search Refinements

26 26 A. Frank Yahoo! SearchMonkey The SearchMonkey platform is mainly based on the Semantic Web approaches. Supports the ability to modify the presentation of search results through plug-ins to the search interface. Content providers or third-party developers can create custom data services to extract information from the Web page. After defining the extracted properties per entry, an enhanced display can be provided.

27 27 A. Frank SearchMonkey Enhanced Display

28 28 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …

29 29 A. Frank MetaWeb Freebase Open, shared database of the world’s knowledge that collects data from the Web to build a massive, collaboratively-edited database of cross-linked data. Free for anyone to query, contribute to, build applications on top of, or integrate into their Web sites. Focus is on organizing and managing complex data structures by use of Semantic Web technologies. Enables extraction of ordered knowledge out of the information chaos that is the current Web.

30 30 A. Frank Freebase Interface

31 31 A. Frank Freebase Repository Covers millions of topics in hundreds of categories. Freebase contains structured information on many popular topics, like movies, music, people and locations – all reconciled and freely available. A base is a new kind of Website that anyone can build using Freebase; It is a place to organize and share collections of information about a particular subject. Freebase has two search interfaces: –Free form queries; a text search box. –MQL (MetaWeb Query Language) queries.

32 32 A. Frank Freebase Structure Freebase spans domains, but requires that a particular topic exist only once, even if it might normally be found in multiple bases. For example, Arnold Schwarzenegger would appear in a movie base as an actor, a political base as a governor and a bodybuilder base as a Mr. Universe. In Freebase, there is only one topic for Arnold Schwarzenegger, with all three facets of his public persona brought together.

33 33 A. Frank Freebase vs. Wikipedia The difference lies in the way they store information. Wikipedia arranges information in the form of articles. Freebase lists facts and statistics. Its list form is good not only for people who like to glance at facts, but also for people who want to use the data to build other Web sites/software. Topics covered by Freebase include subjects that are too obscure for Wikipedia, which strives for notability appropriate to an encyclopedia.

34 34 A. Frank Query involves extracting concepts

35 35 A. Frank Query involves ambiguous term

36 36 A. Frank Query involves aggregating data (1)

37 37 A. Frank Query involves aggregating data (2)

38 38 A. Frank Query involves inference

39 39 A. Frank Query involves computation

40 40 A. Frank Microsoft Powerset Provides a NLP-based search engine. Powerset reads and understands every sentence on a Web page and allows asking questions in plain English. Powerset search and discovery experience is currently based on Wikipedia and Freebase. In large, Powerset is trying to match the meaning of a query with the meaning of sentences in Wikipedia or facts in Freebase.

41 41 A. Frank Powerset Interface

42 42 A. Frank Powerset Services In the search box, you can express yourself in keywords, phrases, or simple questions. On the search results page, Powerset gives more accurate results, often answering questions directly, and aggregating information from across multiple articles. Powerset's technology follows you into enhanced Wikipedia articles, giving you a better way to quickly digest and navigate content.

43 43 A. Frank Powerset Factz Factz are concise representations of information extracted from sentences. They are represented in three parts: the subject, relation and object (e.g., Oswald shot JFK). Factz do not always represent truth, but rather propositions that are asserted in the text of Wikipedia. On the search results page, you will often see Factz for general topic queries; these Factz are collected from pages across Wikipedia. On a particular topic page, you can see the Factz extracted from the given page in the outline.

44 44 A. Frank Snippet of a Powerset Result Page

45 45 A. Frank Query involves extracting concepts

46 46 A. Frank Query involves ambiguous term

47 47 A. Frank Query involves aggregating data

48 48 A. Frank Query involves inference

49 49 A. Frank Query involves computation

50 50 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …

51 51 A. Frank hakia Brings relevant results based on concept match rather than keyword match or popularity ranking. hakia has constructed three major components: 1.QDEX (Query Detection and Extraction) is an advanced system that enables semantic analysis of Web pages, and "meaning-based" search. 2.SemanticRank algorithm is comprised of innovative solutions from the disciplines of Ontological Semantics, Fuzzy Logic, Computational Linguistics, and Mathematics. 3.OntoSem (Ontological Semantics) is based on a formal and comprehensive linguistic theory of meaning in natural language.

52 52 A. Frank hakia Interface

53 53 A. Frank System hakia QDEX

54 54 A. Frank hakia QDEX The QDEX algorithm –analyzes the entire content of a Web page (including HTML). –extracts all possible queries that can be asked to this content, at various lengths and forms. –queries (called sequences) become gateways to the originating documents, paragraphs and sentences during the retrieval mode. –All this is done off-line before any actual query is received from a user. The critical point in QDEX system is to be able to decompose sentences into a handful of meaningful sequences without getting lost in the combinatory explosion space (hakia uses OntoSem technology to meet this challenge). Decomposing content in this way provides great flexibility in a search engine platform for utilizing semantically rich data and multiple-thread processing of equivalent sequences.

55 55 A. Frank SemanticRank Algorithm

56 56 A. Frank hakia SemanticRank Ranks search results in the order relevancy: –A pool of relevant paragraphs come from the QDEX system for terms of a given query. –The final relevancy is determined based on advanced sentence analysis and concept match between the query and the best sentence of each paragraph. –Morphological and syntactic analyses are also performed. Among the criteria for ranking, the credibility and age (of the Web page) are also taken into account via proprietary measurements. Popularity of keywords by means of link referrals is not used, which eliminates the possibilities of –meaningless results ranking high. –manipulation of results by fictitious page design.

57 57 A. Frank hakia OntoSem A formal and comprehensive linguistic theory of meaning in natural language. OntoSem offers an advanced methodology and technology for natural language processing. Has set of well-developed and constantly improving resources: –language-independent ontology of thousands of interrelated concepts. –an ontology-based English lexicon of 100,000 word senses, and counting (plus, the lexicons for several other languages under construction). –an ontological parser which "translates" every sentence of the text into its text meaning representation, approximating the complete understanding of the sentence by the native speaker.

58 58 A. Frank Query involves extracting concepts

59 59 A. Frank Query involves ambiguous term

60 60 A. Frank Query involves aggregating data

61 61 A. Frank Query involves inference

62 62 A. Frank Query involves computation

63 63 A. Frank Cognition Technologies Semantic NLP technology employs a unique mix of linguistics and mathematical algorithms –has, in effect, taught the computer the meanings (or associated concepts) of, and relations between, nearly all the words and frequently used phrases within the common English language. Focuses on the understanding of word and phrase meanings within context –provides an application or end-user with actionable content based upon semantic knowledge. Is able to simultaneously deliver significantly higher levels of precision and recall than is usually possible.

64 64 A. Frank Cognition Interface

65 65 A. Frank Cognition’s Semantic Map Cognition's Semantic NLP enabling technology contains one of the world's largest computational dictionaries, also known as a Semantic Map. This Semantic Map encodes a wealth of morphological, syntactic and semantic information about the words of the English language and their relationships to each other. These resources were created and reviewed by lexicographers and linguists over a span of 24 years.

66 66 A. Frank How Cognition Is Different Cognition has changed the NLP paradigm through its unique and complete combination of a complete semantic map and linguistic elements to optimize semantic understanding: –Morphology The various forms of word, e.g. singular, plural, tense –Syntax The grammatical structure, e.g. verbs, nouns –Semantics Word and sentence meaning, augmented by synonymy and taxonomy –Spelling The various ways words are spelled (or misspelled)

67 67 A. Frank Wikipedia.Cognition Interface

68 68 A. Frank Query involves extracting concepts

69 69 A. Frank Query involves ambiguous term

70 70 A. Frank Query involves aggregating data

71 71 A. Frank Query involves inference (1)

72 72 A. Frank Query involves inference (2)

73 73 A. Frank Query involves computation

74 74 A. Frank Types of “Semantic Search” Engines Semantic Web Search (not covered here) –Stooge, Sindice, SWSE, Falcon-S, Watson, Shoe, … Semantically-enhanced Search (barely covered here) –Google Orion, Yahoo! SearchMonkey, (MS Kumo?), … NLP-based Search –MetaWeb Freebase, MS Powerset, … Semantic-NLP-based Search –hakia, Cognition, (Readware?), … Computational-NLP-based Search –True Knowledge, (Wolfram Alpha?), …

75 75 A. Frank True Knowledge Provides an Answer Engine aimed at dramatically improving the experience of finding known facts on the Web. Represents the world's knowledge in a form that is clear and accessible to humans, as well as being comprehensible to computers. Combines knowledge through a process of inference, drawing conclusions, and cross- referencing stored information to produce a reasoned answer.

76 76 A. Frank True Knowledge Interface

77 77 A. Frank Ways of looking at True Knowledge 1.A Q/A (Question-Answering) Site 2.The Answer Engine – The Perfect Complement to Search 3.A "Wikipedia for Facts" 4.A Universal Knowledge Base 5.A Platform for Building Knowledge Services

78 78 A. Frank True Knowledge Components Knowledge Base (KB) – a huge database of facts on any topic. Knowledge Generator – infers facts either using KB, other generated facts or external feeds of knowledge. Browser interface – supports NLP & query language. Query/Answer System – uses KB and generated facts to answer queries. System Assessment – further processes existing facts in order to maintain semantic consistency of KB. User Assessment – enables user to endorse or contradict particular facts in KB.

79 79 A. Frank Query involves extracting concepts

80 80 A. Frank Query involves ambiguous term

81 81 A. Frank Query involves aggregating data (1)

82 82 A. Frank Query involves aggregating data (2)

83 83 A. Frank Query involves aggregating data (3)

84 84 A. Frank Query involves inference

85 85 A. Frank Query involves computation (1)

86 86 A. Frank Query involves computation (2)

87 87 A. Frank Contents Web 1.0, Web 2.0, Web 3.0, Web X.0… Syntactic vs. Semantic Search Semantic Search Engines Spectrum of Semantic Search Problems?!

88 88 A. Frank Alex Iskold Spectrum

89 89 A. Frank Ken Ewell Debuking the Spectrum Ken Ewell for Semiotica

90 90 A. Frank Google “goes religious”

91 91 A. Frank Freebase “what’s” it

92 92 A. Frank … but knows what vocation is!

93 93 A. Frank PowerSet “does his best” for “me now”

94 94 A. Frank hakia mimics Google

95 95 A. Frank or wants to “end the occupation now”

96 96 A. Frank Cognition mixes it up

97 97 A. Frank True Knowledge seems intelligent

98 98 A. Frank … but isn’t really

99 99 A. Frank Computational-Semantic-NLP-based Search :-?)

100 10 0 A. Frank Sources P. Mika, Semantic Search Arrives at the Web, DevX, July 18, 2008, http://www.devx.com/semantic/Article/38595/0/page/1http://www.devx.com/semantic/Article/38595/0/page/1 A. Iskold, Semantic Search: The Myth and Reality, ReadWriteWeb, May 29, 2008, http://www.readwriteWeb.com/archives/semantic_search_the_ myth_and_reality.php http://www.readwriteWeb.com/archives/semantic_search_the_ myth_and_reality.php K. Ewell, The Search for Semantic Search, Semiotica, June 18, 2008, http://commonsensical.wordpress.com/2008/06/18/the- search-for-semantic-search/ http://commonsensical.wordpress.com/2008/06/18/the- search-for-semantic-search/ N. Karandikar, Powerset vs. Cognition: A Semantic Search Shoot-out, Gigaom, June 7, 2008, http://gigaom.com/2008/06/07/powerset-vs-cognition-a- semantic-search-shoot-out/ http://gigaom.com/2008/06/07/powerset-vs-cognition-a- semantic-search-shoot-out/


Download ppt "A. Frank From Syntactic Search to Semantic Search מחיפוש תחבירי לחיפוש משמעותי אריאל פרנק מחלקה למדעי המחשב אוניברסיטת בר-אילן"

Similar presentations


Ads by Google