Presentation is loading. Please wait.

Presentation is loading. Please wait.

EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.

Similar presentations


Presentation on theme: "EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois."— Presentation transcript:

1 EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois at Urbana-Champaign VLDB ’07, September 23-28, 2007, Vienna, Austria 2008. 1. 11 Presented by Sangkeun Lee, IDS Lab., Seoul National University

2 Copyright  2007 by CEBT Motivating Scenario 2 Customer Service phone number of Amazon?

3 Copyright  2007 by CEBT 3 Search on Amazon?

4 Copyright  2007 by CEBT 4 Search on Google?

5 Copyright  2007 by CEBT Many many similar cases:  The email of Luis Gravano?  What profs are doing databases at UIUC?  The papers and presentations of ICDE 2007?  Due date of SIGMOD 2008?  Sale price of “Canon PowerShot A400”?  “Hamlet” books available at bookstores?  Often times, we are looking for data entities, e.g. emails, dates, prices, etc, not pages. 5

6 Copyright  2007 by CEBT 6 What you search is not what you want

7 Copyright  2007 by CEBT 7 Traditional SearchEntity Search Keywords Entities Results Support Entity Search Problem

8 Copyright  2007 by CEBT Entity Search Problem 8

9 Copyright  2007 by CEBT Challenge How to rank Entities? Why a novel Problem? 9

10 Copyright  2007 by CEBT Core Challenges  Contextual: pattern (phrase, uw, ow) & proximity  Holistic: aggregated occurrences  Uncertainty: extraction confidence probability  Associative: distinguish true associations from accidental  Discriminative: entity instances matched on more popul ar pages should receive higher scores than entity instances from less popular pages  A novel problem: solve all together, probabilistic 10

11 Copyright  2007 by CEBT Impression Model 11

12 Copyright  2007 by CEBT Recognition Layer: Local Assessment  Given a document d, how to assess a particular tuple t= matches the query q = α (E 1,…, E m, k 1,…, k l ) = α (γ):  Two orthogonal factors Extraction uncertainty Association context –Boolean Pattern Qualification Doc, phrase, uw, ow –Probabilistic Proximity Quantification  * s: the span length-the shortest window that covers the entire occurence

13 Copyright  2007 by CEBT 13 Recognition Layer: Local Assessment C ontextual U ncertain H olistic D iscriminative A ssociative Input: L1L1 L2L2 Extraction Conf = 1.0Extraction Conf = 0.3 Output:

14 Copyright  2007 by CEBT 14 Access Layer: Global Aggregation C ontextual U ncertain H olistic D iscriminative A ssociative Holistic Discriminative Output: Input: e.g

15 Copyright  2007 by CEBT Validation Layer: Hypothesis Testing  Accidental association E.g: info@acm.org appears very frequently with keywords “Luis”, “G ravano”. However, such association is only accidental as info@acm. org appears on many pages.info@acm.orginfo@acm. org  Validate if the association is not accidental

16 Copyright  2007 by CEBT EntityRank: The Scoring Function Local RecognitionGlobal Aggregation Validation

17 Copyright  2007 by CEBT Comparison … EntityRank Naïve approch Local only Global only Combine L by simple summation L+G without hypothesis testing %Satisfied Queries at #Rank Query Type I: Phone for Top-30 Fortune500 Companies Query Type II: Email for 51 of 88 SIGMOD07 PC Corpus: General crawl of the Web(Aug, 2006), around 2TB with 93M pages. Entities: Phone (8.8M distinctive instances) Email (4.6M distinctive instances) System: A cluster of 34 machines

18 Copyright  2007 by CEBT Conclusions  Formulate the entity search problem  Study and define the characteristics and requirements of entity search  Propose Impression Model and EntityRank framework for ranking entities  Implement a prototype with real Web


Download ppt "EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois."

Similar presentations


Ads by Google