Data-oriented Content Query System: Searching for Data into Text on the Web
Mianwei Zhou, Kevin Chen-Chuan Chang
Department of Computer Science, UIUC


1 Data-oriented Content Query System: Searching for Data into Text on the Web
Mianwei Zhou, Kevin Chen-Chuan Chang
Department of Computer Science, UIUC

2 Web Info Extraction, Typed Entity Search, Web-based Q/A
In most cases, what we really want are not pages, but the information units inside.

3 Specialized Information Extractors
Web Information Extraction (WIE) (Marius 2006, Cafarella 2005, Etzioni 2004)
Pattern: “#Number people die of #Disease each year”
Disease      Death
Influenza    63730
Pneumonia    61776
……
Limitation: Focus on simple patterns. Lack of interactivity.
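The pattern on this slide can be illustrated with a small sketch. It assumes the pattern is compiled into a regular expression and run over sentences; the sentences, the names SENTENCES, PATTERN and extract_disease_deaths, and the regular expression itself are hypothetical and are not taken from the cited WIE systems.

```python
import re

# Hypothetical mini-corpus; these sentences are made up for illustration.
SENTENCES = [
    "About 63,730 people die of influenza each year.",
    "Roughly 61,776 people die of pneumonia each year.",
    "Influenza spreads quickly in winter.",
]

# One possible reading of "#Number people die of #Disease each year"
# as a regular expression (an assumption, not the extractors' real syntax).
PATTERN = re.compile(
    r"(?P<number>\d[\d,]*) people die of (?P<disease>[A-Za-z ]+?) each year"
)


def extract_disease_deaths(sentences):
    """Return (disease, deaths) rows for every sentence matching the pattern."""
    rows = []
    for sentence in sentences:
        match = PATTERN.search(sentence)
        if match:
            deaths = int(match.group("number").replace(",", ""))
            disease = match.group("disease").strip().capitalize()
            rows.append((disease, deaths))
    return rows


if __name__ == "__main__":
    for disease, deaths in extract_disease_deaths(SENTENCES):
        print(disease, deaths)
```

A fixed pattern like this only finds sentences phrased exactly this way, which is the "simple patterns" limitation the slide names.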

4 Web-based Question Answering (WQA) (Wu 2007, Lin 2003, Brill 2002)
Question: How many people die from seasonal flu each year in the US?
Parse into keywords: “seasonal flu death”
Top-k results
Answer: Around 36,000
Limitation: Relies only on the top-k pages to retrieve the answer.

5 Typed-Entity Search (TES) (Cheng 2007, Cafarella 2007, Chakrabarti 2006)
Amazon Phone …… 0.60 0.80 0.90 (Ranked Entity List)
But … Where is Professor
Limitation: Limited number of data types. Lack of flexibility.

6 Data-oriented Content Query System
Web Info Extraction, Typed Entity Search, Web-based QA
Requirements
1. Extensible Data Types
2. Flexible Contextual Patterns
3. Customizable Scoring

7 Input: CQL (Content Query Language) Output Entity Search Web QA Data-oriented Content Query System


9 What we need / Relational Model
Person Organization Location Number Person Organization Location

10 What we need / Relational Model
Find the population of China
FROM #number WHERE pattern(…) GROUP BY #number ORDER BY conf()
Text:
China has a population of 1.3 billion
China with its population of 1.3 billion people
China is established in 1949.
Shanghai is the largest city with 15 million inhabitants in China
Matched numbers: 1.3 billion, 15 million
Ranked result:
1. 1.3 billion
2. 15 million
…
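The query on this slide (match a pattern, group by #number, order by conf()) can be sketched as follows. The slide elides the pattern and does not define conf(), so this sketch makes two loud assumptions: the pattern is "the sentence mentions the entity and one of the words population or inhabitants", and confidence is the number of supporting sentences. The names NUMBER, CONTEXT_WORDS and query_population are hypothetical.

```python
import re
from collections import Counter

# The four sentences shown on the slide.
SENTENCES = [
    "China has a population of 1.3 billion",
    "China with its population of 1.3 billion people",
    "China is established in 1949.",
    "Shanghai is the largest city with 15 million inhabitants in China",
]

# Assumed recognizer for the #number data type.
NUMBER = re.compile(r"\d+(?:\.\d+)?(?: (?:million|billion))?")

# Assumed stand-in for the elided pattern(…): required context words.
CONTEXT_WORDS = ("population", "inhabitants")


def query_population(sentences, entity):
    """Group matched numbers and rank them by how many sentences support them."""
    support = Counter()
    for sentence in sentences:
        lowered = sentence.lower()
        if entity.lower() not in lowered:
            continue  # the sentence does not mention the entity
        if not any(word in lowered for word in CONTEXT_WORDS):
            continue  # the sentence fails the assumed contextual pattern
        for number in NUMBER.findall(sentence):
            support[number] += 1  # GROUP BY #number
    return support.most_common()  # ORDER BY the assumed conf(), descending


if __name__ == "__main__":
    for rank, (number, conf) in enumerate(query_population(SENTENCES, "China"), 1):
        print(rank, number, conf)
```

Under these assumptions the sentence about 1949 is filtered out because it has no population context, and 1.3 billion outranks 15 million because two sentences support it.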

11 What we need / Relational Model
Number Location Person
Population Phone Price Capital Headquarter Professor CEO President
Table View
Number population price phone

12 Index Layer
Parsing Layer
Index Selection Module
Execution Tree
INPUT: SELECT … FROM … WHERE …
OUTPUT
Index Design: Special Inverted Index, Contextual Index, Join Index
Query Optimization: Graph Coverage Problem
Data Type Repository: Data Type Definition
Experimental Result
Speed improvement: 6-10 times
Space overhead: around 2 times the original corpus size.
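The slide names the index structures but does not define them. As a rough illustration of the general idea of indexing data types alongside keywords, here is a generic positional inverted index in which every numeric token is also posted under a type token "#number". This is an assumption made for illustration, not the authors' Special Inverted Index, Contextual Index, or Join Index; build_index and numbers_near are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed recognizer for a bare numeric token.
NUMBER = re.compile(r"^\d+(?:\.\d+)?$")


def build_index(docs):
    """Map each token (and the type token '#number') to (doc_id, position) postings."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
            if NUMBER.match(token):
                # Post numeric tokens under the data type as well.
                index["#number"].append((doc_id, pos))
    return index


def numbers_near(index, keyword, window):
    """Return #number postings that lie within `window` tokens of `keyword`."""
    hits = []
    for doc_id, num_pos in index.get("#number", []):
        for kw_doc, kw_pos in index.get(keyword, []):
            if kw_doc == doc_id and abs(kw_pos - num_pos) <= window:
                hits.append((doc_id, num_pos))
                break  # one supporting keyword occurrence is enough
    return hits


if __name__ == "__main__":
    docs = [
        "China has a population of 1.3 billion",
        "China is established in 1949",
    ]
    index = build_index(docs)
    print(numbers_near(index, "population", 3))
```

Posting type instances at index time means a query can look up "#number" directly instead of scanning and parsing every document at query time, at the cost of a larger index.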

13 Data-oriented Content Query System Web Info Extraction Typed Entity Search Web-based Q/A

