1 NAGA: Searching and Ranking Knowledge
Gjergji Kasneci
Joint work with: Fabian M. Suchanek, Georgiana Ifrim, Maya Ramanath, and Gerhard Weikum

2 Motivation
Example queries:
- Which politicians are also scientists?
- Which gods do the Maya and the Greeks have in common?
Keyword queries are too weak to express advanced user intentions such as
- concepts,
- entity properties,
- relationships between entities.

3 Motivation

5 Keyword queries are too weak to express advanced user intentions such as
- concepts,
- entity properties,
- relationships between entities.
Data is not knowledge: data extraction and organization are needed.

6 Query: ($x, isA, Politician), ($x, isA, Scientist)
Results: Benjamin Franklin, Paul Wolfowitz, Angela Merkel, ...

7 Query: ($x, type, Mayan god), ($y, type, Greek god), with $x and $y linked to a common node $z
Results: Thunder god, Wisdom god, Agricultural god, ...

9 Landscape of related systems, positioned along two dimensions: data organization (Web → universal knowledge, via information extraction into a semantic database: relational, XML/XLinks, RDF) and querying capability (entity/keyword/proximity search, ranking, question answering, question answering & ranking).
Systems: KnowItAll (Etzioni et al., WWW 2004), ALICE (Banko et al., K-CAP 2007), TextRunner (Banko et al., IJCAI 2007), YAGO (Suchanek et al., WWW 2007), Libra (Nie et al., WWW 2007), Entity Search (Cheng et al., CIDR 2007), ExDBMS (Cafarella et al., CIDR 2007), START (Katz et al., TREC 2005), BANKS (Bhalotia et al., ICDE 2002), DISCOVER (Hristidis et al., VLDB 2002), BLINKS (He et al., SIGMOD 2007), and NAGA (question answering & ranking over a semantic database).

10 Outline
Framework
- Data model
- Query language
- Ranking model
Evaluation
- Setting
- Metrics
- Results

11 Framework (Data model)
Excerpt from YAGO (Suchanek et al., WWW 2007): Max Planck Institute --locatedIn--> Germany
Entity-relationship (ER) graph
- Node label: entity
- Edge label: relation
- Edge weight: relation "strength"
Fact
- Represented by an edge
Evidence pages for a fact f
- Web pages from which f was derived
Computation of fact confidence (i.e., edge weights)
- Combines the certainty of the information extraction with the trust/authority of the evidence pages (see the sketch below)
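
As an illustration of this data model, here is a minimal sketch of weighted fact edges with evidence pages. The Fact structure, the combine_confidence helper (extraction certainty scaled by source authority), and the placeholder evidence URL are assumptions for exposition, not YAGO's or NAGA's actual implementation.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Fact:
    """One edge of the ER graph: subject --relation--> object."""
    subject: str
    relation: str
    obj: str
    confidence: float = 1.0                          # edge weight ("strength")
    evidence: tuple = field(default_factory=tuple)   # pages the fact was derived from

def combine_confidence(extraction_accuracy: float, page_authorities: list) -> float:
    """Illustrative edge weight: extraction certainty scaled by the best
    authority value among the evidence pages (one of many possible choices)."""
    return extraction_accuracy * max(page_authorities, default=0.0)

# A tiny excerpt in the spirit of the YAGO example on the slide
kb = [
    Fact("Max Planck Institute", "locatedIn", "Germany",
         confidence=combine_confidence(0.98, [0.9]),
         evidence=("http://example.org/evidence-page",)),   # placeholder URL
    Fact("Max Planck", "bornInYear", "1858", confidence=0.96),
]
```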

16 Framework (Query language)
R: set of relationship labels
RegEx(R): set of regular expressions over R-labels
E: set of entity labels
V: set of variables
Definition (fact template): A fact template is a triple (e1, r, e2) where e1, e2 ∈ E ∪ V and r ∈ RegEx(R) ∪ V.
Examples:
- (Albert Einstein, $x, Mileva Maric)
- (Liu, givenNameOf | familyNameOf, $x)
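
A sketch of how a fact template, including a regular expression over relation labels, could be checked against a single (subject, relation, object) triple. The Template class, its $-prefixed variable convention, and the fictitious fact in the usage example are hypothetical, chosen only to mirror the definition above.

```python
import re

class Template:
    """Fact template (e1, r, e2): e1, e2 are entity labels or variables ($x, ...);
    r is a variable or a regular expression over relation labels."""
    def __init__(self, e1, r, e2):
        self.e1, self.r, self.e2 = e1, r, e2

    def matches(self, fact):
        """Return variable bindings if fact = (subject, relation, object)
        matches this template, otherwise None."""
        subj, rel, obj = fact
        bindings = {}
        # relation slot: either a variable or a regex over relation labels
        if self.r.startswith("$"):
            bindings[self.r] = rel
        elif not re.fullmatch(self.r, rel):
            return None
        # entity slots: either variables or exact entity labels
        for slot, value in ((self.e1, subj), (self.e2, obj)):
            if slot.startswith("$"):
                if bindings.get(slot, value) != value:   # same variable must bind consistently
                    return None
                bindings[slot] = value
            elif slot != value:
                return None
        return bindings

# The second example from the slide, checked against a fictitious fact:
t = Template("Liu", "givenNameOf|familyNameOf", "$x")
print(t.matches(("Liu", "familyNameOf", "SomeScientist")))   # {'$x': 'SomeScientist'}
```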

17 Framework (Query language)
Definition (NAGA query): A NAGA query is a connected directed graph in which each edge represents a fact template.
Examples:
1) Which physicist was born in the same year as Max Planck?
   ($x, isA, Physicist), ($x, bornInYear, $y), (Max Planck, bornInYear, $y)
2) Which politician is also a scientist?
   ($x, isA, Politician), ($x, isA, Scientist)
3) Which scientists are called Liu?
   (Liu, givenNameOf | familyNameOf, $x), ($x, isA, Scientist)
4) Which mountain is located in Africa?
   ($x, isA, Mountain), ($x, locatedIn*, Africa)
5) What connects Einstein and Bohr?
   (Albert Einstein, *, Niels Bohr)

19 Framework (Query language)
Definition (NAGA answer): A NAGA answer is a subgraph of the underlying ER graph that matches the query graph.
Examples:
1) Which physicist was born in the same year as Max Planck?
   Query: ($x, isA, Physicist), ($x, bornInYear, $y), (Max Planck, bornInYear, $y)
   Answer: (Mihajlo Pupin, isA, Physicist), (Mihajlo Pupin, bornInYear, 1858), (Max Planck, bornInYear, 1858) with edge weights 0.96, 0.97, 0.96
2) Which mountain is located in Africa?
   Query: ($x, isA, Mountain), ($x, locatedIn*, Africa)
   Answer: (Kilimanjaro, isA, Mountain), (Kilimanjaro, locatedIn, Tanzania), (Tanzania, locatedIn, Africa)
3) What connects Einstein and Bohr?
   Query: (Albert Einstein, *, Niels Bohr)
   Answer: (Albert Einstein, hasWonPrize, Nobel Prize), (Niels Bohr, hasWonPrize, Nobel Prize) with edge weights 0.95, 0.98
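
To make the query and answer definitions concrete, here is a naive matcher that binds a query's fact templates conjunctively against a list of triples, reusing the hypothetical Template class from the sketch above. The toy knowledge base and the exhaustive join are for illustration only and ignore regular-expression paths such as locatedIn*.

```python
def answers(query_templates, kb):
    """Enumerate consistent variable bindings for all templates of a query.
    query_templates: list of Template objects (one per query edge)
    kb:              list of (subject, relation, object) triples"""
    results = [{}]
    for template in query_templates:
        extended = []
        for partial in results:
            for fact in kb:
                b = template.matches(fact)
                # keep the binding only if it is consistent with what we have so far
                if b is not None and all(partial.get(v, x) == x for v, x in b.items()):
                    extended.append({**partial, **b})
        results = extended
    return results

kb = [
    ("Max Planck", "bornInYear", "1858"),
    ("Mihajlo Pupin", "bornInYear", "1858"),
    ("Mihajlo Pupin", "isA", "Physicist"),
]
# "Which physicist was born in the same year as Max Planck?"
query = [Template("Max Planck", "bornInYear", "$y"),
         Template("$x", "bornInYear", "$y"),
         Template("$x", "isA", "Physicist")]
print(answers(query, kb))   # [{'$y': '1858', '$x': 'Mihajlo Pupin'}]
```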

21 Framework (Ranking model)
Question: How to rank multiple matches to the same query?
Ranking desiderata
Confidence: prefer correct answers
- Certainty of the information extraction
- Trust/authority of the source
- e.g. "Max Planck born in Kiel" yields bornIn(Max_Planck, Kiel) (source: Wikipedia), whereas "They believe Elvis hides on Mars" yields livesIn(Elvis_Presley, Mars) (source: The One and Only King's Blog)
Informativeness: prefer prominent results
- Frequency of facts in the corpus
- e.g. for the query (Einstein, isA, $x), "Einstein isA Scientist" is preferable to "Einstein isA Vegetarian"
Compactness: prefer "tightly" connected answers
- Size of the answer graph
NAGA exploits language models for ranking (a simplified combined sketch follows below).
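
A simplified sketch of how the three desiderata could be folded into one score per answer graph: per-edge confidence, an informativeness term driven by corpus frequency, and a compactness discount by answer size. The multiplicative combination, the alpha weight, and the toy numbers are assumptions for illustration, not the exact NAGA formula (which the next slides derive from language models).

```python
def score_answer(edges, corpus_size, alpha=0.5):
    """edges: list of (confidence, corpus_frequency) pairs, one per fact in the
    answer subgraph. Returns a score rewarding confident, prominent, compact answers."""
    confidence = 1.0
    informativeness = 1.0
    for conf, freq in edges:
        confidence *= conf
        informativeness *= freq / corpus_size    # frequent facts count as prominent
    compactness = 1.0 / len(edges)               # smaller answer graphs score higher
    return (confidence ** alpha) * (informativeness ** (1 - alpha)) * compactness

# Toy comparison (made-up numbers): a confident, frequently witnessed one-edge
# answer scores higher than a rarer two-edge answer.
print(score_answer([(0.96, 1200)], corpus_size=10_000))
print(score_answer([(0.95, 35), (0.90, 20)], corpus_size=10_000))
```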

27 Framework (Ranking model)
Statistical language models for document IR [Maron/Kuhns 1960, Ponte/Croft 1998, Lafferty/Zhai 2001]
- Each document d has a language model: a generative probability distribution with parameters θ
- The query q is viewed as a sample from such a model
- Estimate the likelihood that q is a sample of the LM of document d
- Rank documents by descending likelihood (the best "explanation" of q)
- Maximum-likelihood estimation suffers from sparseness, hence a mixture with a background model (smoothing)
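
A minimal query-likelihood scorer for the document-IR setting sketched above, smoothing the maximum-likelihood document model with a background collection model. The Jelinek-Mercer-style interpolation weight and the tiny toy corpus are common illustrative choices, not anything prescribed by the slide.

```python
from collections import Counter

def query_likelihood(query_terms, doc_terms, collection_terms, lam=0.8):
    """P(q | d) under a mixture of the document LM (maximum-likelihood estimate)
    and a background collection LM, which smooths away zero probabilities
    caused by sparseness."""
    doc_counts, coll_counts = Counter(doc_terms), Counter(collection_terms)
    likelihood = 1.0
    for term in query_terms:
        p_doc = doc_counts[term] / len(doc_terms)                # document model (MLE)
        p_background = coll_counts[term] / len(collection_terms) # background model
        likelihood *= lam * p_doc + (1 - lam) * p_background
    return likelihood

# Rank documents by descending likelihood of "generating" the query
docs = {"d1": "max planck born in kiel".split(),
        "d2": "elvis hides on mars".split()}
collection = [t for d in docs.values() for t in d]
query = "planck born".split()
print(sorted(docs, key=lambda d: query_likelihood(query, docs[d], collection),
             reverse=True))   # ['d1', 'd2']
```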

28 Framework (Ranking model)
Scoring answers
- A query q with templates q1, q2, ..., qn, e.g. (Albert Einstein, isA, $x)
- An answer graph g with facts g1, g2, ..., gn, e.g. (Albert Einstein, isA, Physicist)
- We use generative mixture models to compute P[q | g], combining
  - an informativeness component estimated from the knowledge-base graph structure,
  - a background model estimated from correlation statistics,
  - fact confidences based on IE accuracy and authority analysis.
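
In the same spirit, a hedged sketch of how P[q | g] could be assembled per matched template: an informativeness component (from the knowledge-base graph structure) interpolated with a background component (from correlation statistics), discounted by the fact's confidence. The factorization over templates, the beta weight, and the example probabilities are assumptions for exposition, not the exact model of the talk.

```python
def p_query_given_answer(matched_pairs, beta=0.8):
    """matched_pairs: one (p_informative, p_background, confidence) tuple per
    (query template, answer fact) pair:
      p_informative - estimated from the knowledge-base graph structure
      p_background  - estimated from correlation statistics on the corpus
      confidence    - from IE accuracy and authority analysis (the edge weight)"""
    score = 1.0
    for p_inf, p_bg, conf in matched_pairs:
        score *= conf * (beta * p_inf + (1 - beta) * p_bg)
    return score

# (Albert Einstein, isA, $x) with illustrative numbers:
print(p_query_given_answer([(0.30, 0.05, 0.96)]))   # Physicist binding, higher score
print(p_query_given_answer([(0.02, 0.05, 0.95)]))   # Vegetarian binding, lower score
```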

29 Framework (Ranking model)
Informativeness
Consider the query (Albert Einstein, isA, $x)
Possible results under NAGA's ranking (by informativeness):
1. (Albert Einstein, isA, Physicist)
2. (Albert Einstein, isA, Vegetarian)

30 Framework (Ranking model)
Informativeness
Consider the query (Albert Einstein, isA, $x)
Possible results under BANKS' ranking (Bhalotia et al., ICDE 2002):
1. (Albert Einstein, isA, Vegetarian)
2. (Albert Einstein, isA, Physicist)
BANKS relies only on the underlying graph structure: the importance of an entity is proportional to its degree, so Vegetarian comes out as more important than Physicist (a toy comparison follows below).
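
A toy contrast of the two notions of prominence: BANKS-style importance from node degree in the graph versus NAGA-style informativeness from how often a fact is witnessed in the corpus. All degrees and counts below are invented solely to reproduce the slide's point.

```python
# Hypothetical neighbourhood of the two candidate bindings for (Einstein, isA, $x)
node_degree = {"Vegetarian": 40, "Physicist": 12}        # BANKS: graph degree
witness_count = {"Physicist": 1200, "Vegetarian": 35}    # NAGA: corpus frequency of the fact

banks_order = sorted(node_degree, key=node_degree.get, reverse=True)
naga_order = sorted(witness_count, key=witness_count.get, reverse=True)
print(banks_order)   # ['Vegetarian', 'Physicist'] -> degree favours Vegetarian
print(naga_order)    # ['Physicist', 'Vegetarian'] -> frequency favours Physicist
```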

31 Evaluation (Setting)
Knowledge graph: YAGO (Suchanek et al., WWW 2007), 16 million facts
85 NAGA queries:
- 55 queries from TREC 2005/2006
- 12 queries from the work on SphereSearch (Graupmann et al., VLDB 2005)
- 18 regular-expression queries provided by us

32 Evaluation (Setting)
The queries were issued to:
- Google,
- Yahoo! Answers,
- START (http://start.csail.mit.edu/),
- NAGA with BANKS scoring, which relies only on the structure of the underlying graph (see Bhalotia et al., ICDE 2002),
- NAGA with NAGA scoring.
The top-10 answers were assessed by 20 human judges as relevant (2), less relevant (1), or irrelevant (0).

34 Evaluation (Metrics & Results)
NDCG (normalized discounted cumulative gain)
- Rewards result lists in which relevant results are ranked higher than less relevant ones
- Useful when comparing result lists of different lengths
P@1
- Measures how satisfied the user was, on average, with the first answer of the search engine
We report Wilson confidence intervals at the 95% confidence level (see the sketch below).
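
For concreteness, straightforward implementations of the metrics and the interval. These follow standard textbook definitions (a log2 rank discount for NDCG, grades on the 2/1/0 scale from the previous slide) rather than anything NAGA-specific; counting grade >= 1 as a P@1 hit is an assumption.

```python
import math

def dcg(grades):
    """grades: relevance grades (2 relevant, 1 less relevant, 0 irrelevant) in rank order."""
    return sum(g / math.log2(rank + 2) for rank, g in enumerate(grades))

def ndcg(grades):
    ideal = dcg(sorted(grades, reverse=True))
    return dcg(grades) / ideal if ideal > 0 else 0.0

def precision_at_1(grades):
    """1 if the top-ranked answer is at least 'less relevant', else 0 (an assumption)."""
    return 1.0 if grades and grades[0] > 0 else 0.0

def wilson_interval(successes, n, z=1.96):
    """Wilson score interval for a proportion at the 95% confidence level (z = 1.96)."""
    if n == 0:
        return (0.0, 0.0)
    p = successes / n
    denom = 1 + z * z / n
    centre = (p + z * z / (2 * n)) / denom
    margin = z * math.sqrt(p * (1 - p) / n + z * z / (4 * n * n)) / denom
    return (centre - margin, centre + margin)

print(ndcg([2, 0, 1]))           # ~0.95: relevant result first, but not the ideal order
print(wilson_interval(84, 100))  # interval around an observed 84% P@1
```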

36 Evaluation (Metrics & Results)

Benchmark     #Q  #A    Metric  Google   Yahoo! Answers  START   BANKS scoring  NAGA scoring
TREC          55  1098  NDCG    75.88%   26.15%          75.38%  87.93%         92.75%
                        P@1     67.81%   17.20%          73.23%  69.54%         84.40%
SphereSearch  12  343   NDCG    38.22%   17.23%          2.87%   88.82%         91.01%
                        P@1     19.38%   6.15%           -       84.28%         84.94%
OWN           18  418   NDCG    54.09%   17.98%          13.35%  85.59%         91.33%
                        P@1     27.95%   6.57%           13.57%  76.54%         86.56%

37 Summary
NAGA is a search engine for advanced querying of information in ER graphs (NAGA queries, NAGA answers).
We propose a novel scoring mechanism
- based on generative language models,
- applied to the specific and unexplored setting of ER graphs,
- incorporating confidence, informativeness, and compactness.
The viability of the approach is demonstrated in comparison to state-of-the-art search engines and QA systems.

40 Thank you
NAGA: http://www.mpi-inf.mpg.de/~kasneci/naga
YAGO: http://www.mpi-inf.mpg.de/~suchanek/yago

