Published by Pauline Andrews. Modified over 9 years ago.
1
EntityRank: Searching Entities Directly and Holistically
Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang, CS Department, UIUC
Presented by: Md. Abdus Salam, PhD Student, CSE Department, UTA
2
Motivating Scenario: Customer service phone number of Amazon?
3
Search on Amazon?
4
Search on Google?
5
Many, Many Similar Cases
- The email of Luis Gravano?
- What profs are doing databases at UIUC?
- The papers and presentations of ICDE 2007?
- Due date of SIGMOD 2008?
- Sale price of “Canon PowerShot A400”?
- “Hamlet” books available at bookstores?
Often we are looking for data entities (e.g. emails, dates, prices), not pages.
6
What you search is not what you want.
7
From pages to entities: Traditional Search vs. Entity Search (comparison figure; row labels: Keywords, Entities, Results, Support)
8
Concretely, what is meant by Entity Search?
9
Entity Search Problem
Given: an entity collection over a document collection.
Input: a query made of a tuple pattern, entities, and keywords, e.g. ow(David DeWitt #phone #email).
Output: a ranked list of entity tuples t, sorted by Score(q(t)), the query score of t.
In short: the input is keywords and entities (optionally with a pattern), e.g. Amazon Customer Service #phone; the output is ranked entity tuples (the slide shows example scores 0.90, 0.80, 0.60).
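To make the query form concrete, here is a minimal sketch of splitting such a query into keywords, `#`-prefixed entities, and an optional pattern name. The `EntityQuery` class and `parse_query` function are hypothetical names introduced for illustration; the slides do not describe a parser.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class EntityQuery:
    """Hypothetical container for a parsed entity-search query."""
    keywords: List[str]
    entity_types: List[str]
    pattern: Optional[str] = None


def parse_query(text: str) -> EntityQuery:
    """Split a query into plain keywords and #entity tokens.

    Assumption: an optional pattern wraps the whole query as name(...),
    as in the slide's example ow(David DeWitt #phone #email).
    """
    text = text.strip()
    pattern = None
    if text.endswith(")") and "(" in text:
        head, _, rest = text.partition("(")
        if head.isidentifier():
            pattern, text = head, rest[:-1]  # drop the closing parenthesis
    tokens = text.split()
    return EntityQuery(
        keywords=[t for t in tokens if not t.startswith("#")],
        entity_types=[t[1:] for t in tokens if t.startswith("#")],
        pattern=pattern,
    )


simple = parse_query("Amazon Customer Service #phone")
windowed = parse_query("ow(David DeWitt #phone #email )")
```

Here `simple` carries three keywords and one entity (`phone`) with no pattern, while `windowed` carries the pattern name `ow` and two entities.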
10
Challenge: How to rank entities?
11
Characteristics I: Contextual - Utilize entities’ surrounding context (figure labels: Content, Context)
12
Characteristics II: Uncertain - Extractions are not “perfect”
13
Characteristics III: Holistic - Much evidence from multiple sources
14
Characteristics IV: Discriminative - Web pages are of varying quality
15
Characteristics V: Associative - Tell true associations from accidental ones
Example: finding Prof. Luis Gravano’s email.
Observation: info@acm.org appears very frequently with the keywords “Luis” and “Gravano”. However, this association is only accidental, because info@acm.org appears on many pages.
16
EntityRank: The Impression Model
A tireless observer works through three layers:
- Access Layer: Global Aggregation
- Recognition Layer: Local Assessment
- Validation Layer: Hypothesis Testing
Output: ranked entity tuples (example scores 0.90, 0.80, 0.60)
17
Recognition Layer: Local Assessment
Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative
(Input/output diagram omitted; labels: L1, L2)
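As a rough illustration of local assessment, the sketch below scores one co-occurrence of keywords and entities inside a single document. It assumes, and this is not stated on the slide, that the score multiplies the extraction confidences (the "uncertain" characteristic) by a context weight that decays linearly with the span of the match (the "contextual" characteristic). The function name `local_score` and the `window` parameter are hypothetical.

```python
def local_score(confidences, positions, window=20):
    """Score one keyword/entity co-occurrence within one document.

    confidences: extraction confidence of each entity instance, in [0, 1]
    positions:   token offsets of all matched keywords and entity instances
    window:      hypothetical maximum span; wider matches score zero
    """
    span = max(positions) - min(positions)
    if span > window:
        return 0.0
    # Assumed context weight: tighter matches count more (linear decay).
    context = 1.0 - span / window
    confidence = 1.0
    for c in confidences:
        confidence *= c
    return confidence * context


tight = local_score([1.0], [3, 8, 9, 13])    # span 10 inside a window of 20
loose = local_score([0.8], [1, 2, 3, 42])    # span 41 exceeds the window
```

A real system would likely use a pattern-specific context model rather than a single linear decay; the point is only that confidence and proximity are both assessed per document.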
18
Access Layer: Global Aggregation
Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative
Addressed at this layer: Holistic, Discriminative
(Input/output diagram omitted)
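Global aggregation can be sketched as a weighted sum of local scores over all documents. The code assumes a per-document quality weight (for example a normalized popularity score) and keeps only the best local score per document and tuple; both choices are illustrative assumptions, and `aggregate` and `doc_weight` are hypothetical names.

```python
from collections import defaultdict


def aggregate(observations, doc_weight):
    """Combine local scores across documents.

    observations: iterable of (doc_id, entity_tuple, local_score)
    doc_weight:   dict mapping doc_id to an assumed quality weight
    """
    # Keep the best local score of each tuple within each document.
    best = defaultdict(float)
    for doc, tup, score in observations:
        best[(doc, tup)] = max(best[(doc, tup)], score)

    # Holistic: sum over documents. Discriminative: weight by page quality.
    total = defaultdict(float)
    for (doc, tup), score in best.items():
        total[tup] += doc_weight.get(doc, 0.0) * score
    return dict(total)


observations = [
    ("d3", "800-202-7575", 0.5),
    ("d7", "800-202-7575", 0.5),
    ("d7", "800-322-9266", 0.25),
]
weights = {"d3": 0.5, "d7": 0.25}  # made-up quality weights
global_scores = aggregate(observations, weights)
```

With these made-up numbers, the phone number seen in two documents accumulates more weight than the one seen in a single lower-weight document.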
19
Validation Layer: Hypothesis Testing
Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative
Input: Collection E over D
Output: Virtual Collection E’ over D’ (randomized)
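The idea of validating an association against a randomized collection can be sketched by comparing the observed co-occurrence probability with the probability expected if keywords and entity were placed independently. The log-ratio form below is a G-test-style term chosen for illustration; the slide does not spell out the formula, and every number in the example is hypothetical.

```python
import math


def validated_score(p_observed, p_random):
    """Reward associations that occur more often than chance predicts.

    p_observed: probability of the association in the real collection
    p_random:   probability expected in a randomized (virtual) collection
    """
    if p_random <= 0.0:
        raise ValueError("p_random must be positive")
    if p_observed <= 0.0:
        return 0.0
    return p_observed * math.log(p_observed / p_random)


# Hypothetical frequencies, as fractions of all pages.
p_keywords = 0.001      # pages containing the query keywords
p_generic = 0.05        # pages containing a very common address
p_personal = 0.00002    # pages containing the person's own address

# Under independence the expected co-occurrence is the product.
accidental = validated_score(0.00006, p_keywords * p_generic)
genuine = validated_score(0.00002, p_keywords * p_personal)
```

The common address co-occurs with the keywords more often in absolute terms, yet barely more than chance predicts, so `genuine` comes out higher than `accidental`.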
20
EntityRank: The Scoring Function (components: Local Recognition, Global Aggregation, Validation)
21
Sort-merge Join Query Processing
Keyword posting lists (slide labels: Amazon, Customer, Service), shown as Doc: Posting
- List 1: d1: 8, 25; d3: 5; d6: 10; d7: 3; d9: 7, 33
- List 2: d3: 11; d5: 66; d7: 8, 24
- List 3: d3: 12; d7: 9; d8: 44
Entity posting list for #phone, shown as Doc: Posting
- d2: (42, 851-0400, 0.8)
- d3: (18, 800-202-7575, 1.0)
- d7: (13, 800-202-7575, 1.0), (78, 800-322-9266, 1.0)
Aggregation: 800-202-7575: p1, p2, p4; 800-322-9266: p3, p5
Followed by: Hypothesis Test, Result
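The join step can be sketched as a multi-way merge over posting lists sorted by document id. The data below is read off the slide's posting lists, with document ids written as integers and the three keyword lists kept in the order they appear; `merge_join` is a hypothetical helper, not the authors' implementation.

```python
def merge_join(posting_lists):
    """Return (doc_id, [payload per list]) for docs present in every list.

    Each posting list must be sorted by doc_id and hold (doc_id, payload).
    """
    if not posting_lists:
        return []
    cursors = [0] * len(posting_lists)
    matches = []
    while all(c < len(pl) for c, pl in zip(cursors, posting_lists)):
        current = [pl[c][0] for c, pl in zip(cursors, posting_lists)]
        target = max(current)
        if all(doc == target for doc in current):
            payloads = [pl[c][1] for c, pl in zip(cursors, posting_lists)]
            matches.append((target, payloads))
            cursors = [c + 1 for c in cursors]
        else:
            # Advance only the lists that lag behind the largest doc id.
            cursors = [c + 1 if doc < target else c
                       for c, doc in zip(cursors, current)]
    return matches


keyword_1 = [(1, [8, 25]), (3, [5]), (6, [10]), (7, [3]), (9, [7, 33])]
keyword_2 = [(3, [11]), (5, [66]), (7, [8, 24])]
keyword_3 = [(3, [12]), (7, [9]), (8, [44])]
phone = [
    (2, [(42, "851-0400", 0.8)]),
    (3, [(18, "800-202-7575", 1.0)]),
    (7, [(13, "800-202-7575", 1.0), (78, "800-322-9266", 1.0)]),
]

joined = merge_join([keyword_1, keyword_2, keyword_3, phone])
joined_docs = [doc for doc, _ in joined]
```

Only documents 3 and 7 appear in all four lists, so only their postings move on to local scoring and aggregation. Because every list is consumed in a single forward pass, the join never needs random access into an index.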
22
Experiment Setup
Corpus: general crawl of the Web (Aug 2006), around 2 TB with 93M pages.
Entities: Phone (8.8M distinct instances), Email (4.6M distinct instances).
System: a cluster of 34 machines.
23
Comparing EntityRank to the Following Different Approaches
Characteristics: Contextual, Uncertain, Holistic, Discriminative, Associative
Approaches: Naïve, Local, Global, Combine, Without, EntityRank
24
Online Demo.
25
Example Query Results
26
Conclusions
- Formulate the entity search problem
- Study and define the characteristics of entity search
- Conceptual Impression Model and concrete EntityRank framework for ranking entities
- An online prototype with a real Web corpus
27
Thank You! Questions?
28
Reference
T. Cheng, X. Yan, and K. C.-C. Chang. EntityRank: Searching Entities Directly and Holistically. In Proceedings of the 33rd Very Large Data Bases Conference (VLDB 2007), pages 387-398, Vienna, Austria, September 2007.
http://www-forward.cs.uiuc.edu/talks/2007/entityrank-vldb07-cyc-sep07.ppt