EntityRank: Searching Entities Directly and Holistically. Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang, CS Department, UIUC. Presented by: Md. Abdus Salam, PhD Student, CSE Department, UTA
Motivating Scenario Customer service phone number of Amazon?
Search on Amazon?
Search on Google?
Many Similar Cases: The email of Luis Gravano? What profs are doing databases at UIUC? The papers and presentations of ICDE 2007? Due date of SIGMOD 2008? Sale price of “Canon PowerShot A400”? “Hamlet” books available at bookstores? Often, we are looking for data entities (e.g. emails, dates, prices), not pages.
What you search is not what you want.
From pages to entities. Traditional Search vs. Entity Search (figure labels: Keywords, Entities, Results, Support)
Concretely, what is meant by Entity Search?
Entity Search Problem. Given: an entity collection over a document collection. Input: a query made of a tuple pattern, entity types, and keywords, e.g. ow(David DeWitt #phone #email). Output: a ranked list of entity tuples t, sorted by Score(q(t)), the query score of t. In short, the input is keywords and entities (optionally with a pattern), e.g. Amazon Customer Service #phone, and the output is ranked entity tuples.
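To make the query form concrete, here is a minimal sketch of splitting a query string into keywords and entity types. The `EntityQuery` class and `parse_query` function are hypothetical names for illustration, not part of the EntityRank system, and the sketch ignores patterns such as ow(...).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityQuery:
    keywords: tuple      # plain keywords, lowercased
    entity_types: tuple  # entity types, written with a leading '#' in the query


def parse_query(text):
    """Split a query string into keywords and '#'-prefixed entity types."""
    keywords, entity_types = [], []
    for token in text.split():
        if token.startswith("#") and len(token) > 1:
            entity_types.append(token[1:].lower())
        else:
            keywords.append(token.lower())
    return EntityQuery(tuple(keywords), tuple(entity_types))


query = parse_query("Amazon Customer Service #phone")
print(query)
```

Keeping keywords and entity types apart matters because they are matched differently: keywords against document text, entity types against extracted entity instances.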
Challenge: How to rank entities?
Characteristic I: Contextual. Utilize entities’ surrounding context (figure labels: Content, Context).
Characteristic II: Uncertain. Extractions are not “perfect”.
Characteristic III: Holistic. Evidence comes from many sources.
Characteristic IV: Discriminative. Web pages are of varying quality.
Characteristic V: Associative. Tell true associations from accidental ones. Example: finding Prof. Luis Gravano’s email. Observation: an address can appear very frequently with the keyword “Luis” and still be only accidentally associated with it, because that address appears on many pages.
EntityRank: The Impression Model. Tireless Observer. Access Layer: Global Aggregation. Recognition Layer: Local Assessment. Validation Layer: Hypothesis Testing.
Recognition Layer: Local Assessment. Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative. (Input and output shown in a figure, not recovered.)
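A minimal sketch of what a local assessment inside one document could look like, assuming a window-based context match and a linear distance decay. Both are illustrative assumptions rather than the talk's actual formula, and `local_score` is a hypothetical name.

```python
def local_score(keyword_positions, entity_occurrences, window=10):
    """Score each entity instance within a single document.

    keyword_positions: one list of token positions per query keyword.
    entity_occurrences: (position, instance, confidence) triples.
    Returns {instance: best local score found in this document}.
    """
    scores = {}
    if not keyword_positions:
        return scores
    for position, instance, confidence in entity_occurrences:
        # Contextual: distance from this occurrence to the nearest hit of each keyword.
        nearest = []
        for positions in keyword_positions:
            if not positions:
                nearest = None  # a keyword is missing from the document
                break
            nearest.append(min(abs(position - p) for p in positions))
        if nearest is None or max(nearest) > window:
            continue
        # Closer matches get more weight (linear decay is an assumption).
        context_weight = 1.0 - max(nearest) / (window + 1)
        # Uncertain: scale by the extractor's confidence in this occurrence.
        score = confidence * context_weight
        scores[instance] = max(score, scores.get(instance, 0.0))
    return scores


# Hypothetical document: three keywords at positions 10, 11, 12.
scores = local_score(
    [[10], [11], [12]],
    [(13, "phone-A", 1.0), (42, "phone-B", 0.8)],
)
print(scores)
```

Here "phone-A" sits right next to the keywords and is kept, while "phone-B" falls outside the window and gets no local score.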
Access Layer: Global Aggregation. Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative. This layer addresses Holistic and Discriminative. (Input and output shown in a figure, not recovered.)
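A minimal sketch of global aggregation, assuming each document contributes its local score weighted by a per-document quality weight. The weights, document IDs, and the `aggregate` function are hypothetical; a weighted sum is one simple choice, not necessarily the one used in EntityRank.

```python
from collections import defaultdict


def aggregate(local_scores_by_doc, doc_weight):
    """Combine per-document scores into one global score per entity instance."""
    totals = defaultdict(float)
    for doc_id, scores in local_scores_by_doc.items():
        # Discriminative: pages of higher quality count for more.
        weight = doc_weight.get(doc_id, 0.0)
        for instance, score in scores.items():
            # Holistic: evidence from all documents is summed.
            totals[instance] += weight * score
    return sorted(totals.items(), key=lambda item: item[1], reverse=True)


local = {
    "d3": {"phone-A": 0.9},
    "d7": {"phone-A": 0.5, "phone-B": 0.8},
    "d9": {"phone-B": 0.6},
}
weights = {"d3": 0.5, "d7": 0.3, "d9": 0.2}
ranking = aggregate(local, weights)
print(ranking)
```

An instance seen on several well-regarded pages outranks one that is seen equally often on weaker pages.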
Validation Layer: Hypothesis Testing. Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative. Input: collection E over D. Output: virtual collection E’ over D’, obtained by randomizing.
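A minimal sketch of the validation idea: compare how often a keyword and an entity instance are observed together with how often they would co-occur by chance. As a stand-in for building a randomized virtual collection, this sketch assumes independence of keyword and entity as the null hypothesis and uses a G-test-style term. The function name and all counts are hypothetical.

```python
import math


def validated_score(n_docs, n_keyword_docs, n_entity_docs, n_both_docs):
    """Compare observed co-occurrence with what chance alone would give."""
    if n_both_docs == 0:
        return 0.0
    p_observed = n_both_docs / n_docs
    # Null hypothesis: keyword and entity occur independently of each other.
    p_random = (n_keyword_docs / n_docs) * (n_entity_docs / n_docs)
    return p_observed * math.log(p_observed / p_random)


# Hypothetical counts: 1000 documents, the keyword appears in 20 of them.
rare = validated_score(1000, 20, 10, 8)      # entity on 10 pages, 8 with the keyword
common = validated_score(1000, 20, 500, 10)  # entity on 500 pages, 10 with the keyword
print(rare, common)
```

The common instance co-occurs with the keyword more often in absolute terms (10 pages vs. 8), yet it scores about zero because that is exactly what chance predicts for something appearing on half of all pages.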
EntityRank: The Scoring Function. Components: Local Recognition, Global Aggregation, Validation.
Sort-merge Join Query Processing. (Figure: document posting lists for the keywords Amazon, Customer, and Service and for the entity type #phone, whose postings also carry an extraction confidence such as 1.0 or 0.8. The lists are joined by document, and the matches flow through Aggregation and the Hypothesis Test to the Result.)
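A minimal sketch of a sort-merge join over posting lists, assuming every list is sorted by document ID. The `merge_join` function and the sample postings are hypothetical and only loosely modeled on the figure.

```python
def merge_join(posting_lists):
    """Intersect posting lists that are each sorted by document ID.

    Each list holds (doc_id, payload) pairs. Yields (doc_id, payloads)
    for every document that appears in all lists.
    """
    if not posting_lists or any(not plist for plist in posting_lists):
        return
    cursors = [0] * len(posting_lists)
    while all(c < len(p) for c, p in zip(cursors, posting_lists)):
        current = [p[c][0] for c, p in zip(cursors, posting_lists)]
        target = max(current)
        if all(doc_id == target for doc_id in current):
            yield target, [p[c][1] for c, p in zip(cursors, posting_lists)]
            cursors = [c + 1 for c in cursors]
        else:
            # Advance every cursor that is behind the largest doc ID.
            cursors = [c + 1 if doc_id < target else c
                       for c, doc_id in zip(cursors, current)]


# Keyword postings: (doc_id, [positions]).
amazon = [(1, [25]), (3, [5]), (6, [10]), (7, [3]), (9, [33])]
customer = [(3, [11]), (5, [66]), (7, [24])]
service = [(3, [12]), (7, [9]), (8, [44])]
# Entity postings: (doc_id, [(position, instance, confidence)]).
phone = [(2, [(42, "phone-B", 0.8)]),
         (3, [(18, "phone-A", 1.0)]),
         (7, [(78, "phone-A", 1.0)])]

matches = list(merge_join([amazon, customer, service, phone]))
print([doc_id for doc_id, _ in matches])
```

Because each cursor only moves forward, the join reads every posting list once, which is why keeping the lists sorted by document ID pays off.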
Experiment Setup. Corpus: general crawl of the Web (Aug 2006), around 2 TB with 93M pages. Entities: Phone (8.8M distinct instances), Email (4.6M distinct instances). System: a cluster of 34 machines.
Comparing EntityRank to Different Approaches. Approaches: Naïve, Local, Global, Combine, Without, EntityRank. Characteristics compared: Contextual, Uncertain, Holistic, Discriminative, Associative.
Online Demo.
Example Query Results
Conclusions: Formulate the entity search problem. Study and define the characteristics of entity search. Conceptual Impression Model and concrete EntityRank framework for ranking entities. An online prototype with a real Web corpus.
Thank You! Questions?
Reference: T. Cheng, X. Yan, and K. C.-C. Chang. EntityRank: Searching Entities Directly and Holistically. In Proceedings of the 33rd Very Large Data Bases Conference (VLDB 2007), Vienna, Austria, September 2007. Slides: vldb07-cyc-sep07.ppt