Data-oriented Content Query System: Searching for Data into Text on the Web
Mianwei Zhou, Kevin Chen-Chuan Chang, Department of Computer Science, UIUC

Web Info Extraction, Typed Entity Search, Web-based Q/A: in most cases, what we really want is not pages but the information units inside them.

Specialized Information Extractors: Web Information Extraction (WIE) (Marius 2006, Cafarella 2005, Etzioni 2004)
Pattern: “#Number people die of #Disease each year”
Disease | Deaths
Influenza | 63730
Pneumonia | 61776
……
Limitations: focuses on simple patterns; lacks interactivity.
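
To make the WIE idea concrete, here is a minimal sketch (not the cited systems' implementation) that turns the slide's pattern into a Python regular expression, assuming #Number means a run of digits and #Disease a single word. The function name extract_disease_deaths and the sample sentences are hypothetical; the counts come from the table above.

```python
import re

# Hypothetical regex for "#Number people die of #Disease each year":
# #Number -> digits (commas allowed), #Disease -> one word.
PATTERN = re.compile(
    r"(?P<number>\d[\d,]*) people die of (?P<disease>[A-Za-z]+) each year"
)

def extract_disease_deaths(sentences):
    """Build a {Disease: deaths} table from sentences that match PATTERN."""
    table = {}
    for sentence in sentences:
        for m in PATTERN.finditer(sentence):
            deaths = int(m.group("number").replace(",", ""))
            table[m.group("disease").capitalize()] = deaths
    return table

sentences = [
    "About 63730 people die of influenza each year.",
    "Roughly 61776 people die of pneumonia each year.",
    "Many people die of boredom in meetings.",
]
print(extract_disease_deaths(sentences))
# {'Influenza': 63730, 'Pneumonia': 61776}
```

Because the template is fixed, a sentence such as "influenza kills 63730 people a year" is missed, which is the "simple patterns" limitation noted on the slide.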

Web-based Question Answering (WQA) (Wu 2007, Lin 2003, Brill 2002)
Question: How many people die from seasonal flu each year in the US?
Keywords: “seasonal flu death” → parse top-k results → answer: around 36,000
Limitation: relies only on the top-k pages to retrieve the answer.
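
A minimal sketch of the last WQA step, assuming the answer is chosen by majority vote over numbers found in the top-k snippets. The regex, the voting rule, the function name answer_from_top_k, and the snippets are all hypothetical; the point is to show why relying only on the top k pages is fragile.

```python
import re
from collections import Counter

# Hypothetical answer-candidate recognizer: "36,000"-style or plain integers.
NUMBER = re.compile(r"\b\d{1,3}(?:,\d{3})+\b|\b\d+\b")

def answer_from_top_k(snippets, k):
    """Majority vote over numbers found in the first k snippets only."""
    votes = Counter()
    for snippet in snippets[:k]:
        votes.update(set(NUMBER.findall(snippet)))  # one vote per snippet per number
    return votes.most_common(1)[0][0] if votes else None

snippets = [
    "Flu deaths in 3 states rose this week.",
    "Around 36,000 people die from seasonal flu each year.",
    "Seasonal flu kills about 36,000 people in a typical year.",
]
print(answer_from_top_k(snippets, k=1))  # 3
print(answer_from_top_k(snippets, k=3))  # 36,000
```

With k=1 the only candidate is the irrelevant "3"; widening to k=3 recovers "36,000".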

Typed-Entity Search (TES) (Cheng 2007, Cafarella 2007, Chakrabarti 2006)
Query: Amazon Phone → ranked entity list ……
But … where is Professor?
Limitations: limited number of data types; lack of flexibility.
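
A sketch of typed-entity search for the query Amazon Phone, assuming a hard-coded #phone recognizer (US-style numbers) and a score equal to the number of pages where the phone appears near the keyword. This simple count is an illustrative stand-in, not the ranking model of any cited system; rank_entities, the window size, and the pages are hypothetical.

```python
import re
from collections import Counter

# Hypothetical #phone recognizer: US-style numbers such as 800-555-0199.
PHONE = re.compile(r"\b\d{3}-\d{3}-\d{4}\b")

def rank_entities(pages, keyword, window=50):
    """Rank #phone entities found within `window` characters of `keyword`.

    Score = number of pages with such a co-occurrence (an illustrative
    stand-in, not a published entity-ranking model).
    """
    scores = Counter()
    kw = re.compile(re.escape(keyword), re.IGNORECASE)
    for page in pages:
        found = set()
        for m in PHONE.finditer(page):
            context = page[max(0, m.start() - window): m.end() + window]
            if kw.search(context):
                found.add(m.group())
        scores.update(found)  # each page supports an entity at most once
    return [phone for phone, _ in scores.most_common()]

pages = [
    "Call Amazon customer service at 800-555-0199 today.",
    "Amazon support line: 800-555-0199.",
    "Amazon HQ front desk: 206-555-0100.",
    "Pizza delivery: 312-555-0123.",
]
print(rank_entities(pages, "Amazon"))  # ['800-555-0199', '206-555-0100']
```

Supporting a Professor type would require writing another recognizer by hand, which is the "limited number of data types" problem.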

Data-oriented Content Query System: covering Web Info Extraction, Typed Entity Search, and Web-based QA
Requirements:
1. Extensible data types
2. Flexible contextual patterns
3. Customizable scoring

Data-oriented Content Query System
Input: CQL (Content Query Language)
Output: results for applications such as Entity Search and Web QA

What we need: Relational Model
Data types: Person, Organization, Location, Number

What we need: Relational Model
Example: Find the population of China
FROM #number WHERE pattern(…) GROUP BY #number ORDER BY conf()
Matching text:
- China has a population of 1.3 billion
- China with its population of 1.3 billion people
- China is established in …
- Shanghai is the largest city with 15 million inhabitants in China
Ranked output: 1. 1.3 billion; 2. 15 million; …
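
A sketch of how the example query could be evaluated over the three number-bearing sentences above, assuming a #number recognizer for magnitudes like 1.3 billion, a pattern that is simply a keyword set (China, population), and a conf() that sums, per distinct number, the fraction of keywords present in each supporting sentence. These choices and the name query_numbers are hypothetical, not the CQL semantics defined by the authors; the sketch happens to reproduce the slide's ranking of 1.3 billion above 15 million.

```python
import re
from collections import defaultdict

# Hypothetical #number recognizer for magnitudes such as "1.3 billion".
NUMBER = re.compile(r"\d+(?:\.\d+)?\s+(?:thousand|million|billion)")

def query_numbers(sentences, keywords):
    """Sketch of FROM #number WHERE pattern(keywords) GROUP BY #number ORDER BY conf().

    Assumed scoring: an occurrence scores the fraction of keywords in its
    sentence, and conf() sums those scores for each distinct number.
    """
    conf = defaultdict(float)
    wanted = [k.lower() for k in keywords]
    for sentence in sentences:
        words = set(re.findall(r"\w+", sentence.lower()))
        score = sum(k in words for k in wanted) / len(wanted)
        if score == 0:
            continue  # WHERE: skip sentences that match none of the keywords
        for m in NUMBER.finditer(sentence):
            conf[m.group()] += score  # GROUP BY #number
    # ORDER BY conf() descending
    return sorted(conf.items(), key=lambda kv: kv[1], reverse=True)

sentences = [
    "China has a population of 1.3 billion",
    "China with its population of 1.3 billion people",
    "Shanghai is the largest city with 15 million inhabitants in China",
]
print(query_numbers(sentences, ["China", "population"]))
# [('1.3 billion', 2.0), ('15 million', 0.5)]
```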

What we need: Relational Model
Data types: Number (Population, Phone, Price), Location (Capital, Headquarter), Person (Professor, CEO, President)
Table view: the Number table with columns population, price, phone
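
A sketch of a data type repository in which each derived type is recorded with its base type, plus a table view of the Number type with one column per derived type. The cue words that decide membership, REPOSITORY, and table_view are hypothetical; the slide does not say how derived types are defined.

```python
# Hypothetical data type repository: derived type -> (base type, context cue words).
REPOSITORY = {
    "population": ("Number", {"population", "inhabitants"}),
    "price": ("Number", {"price", "cost"}),
    "phone": ("Number", {"phone", "call", "tel"}),
    "capital": ("Location", {"capital"}),
    "headquarter": ("Location", {"headquarters", "headquartered"}),
    "professor": ("Person", {"professor", "prof"}),
    "ceo": ("Person", {"ceo"}),
    "president": ("Person", {"president"}),
}

def table_view(base, mentions):
    """Table view of `base`: one row per (value, context) mention and one
    boolean column per derived type of `base` whose cue words occur in the context."""
    columns = [name for name, (b, _) in REPOSITORY.items() if b == base]
    rows = []
    for value, context in mentions:
        words = set(context.lower().split())
        row = {"value": value}
        for name in columns:
            row[name] = bool(REPOSITORY[name][1] & words)
        rows.append(row)
    return columns, rows

columns, rows = table_view("Number", [
    ("1.3 billion", "China has a population of"),
    ("800-555-0199", "call Amazon at"),
])
print(columns)  # ['population', 'price', 'phone']
for row in rows:
    print(row)
```

In this sketch, adding a new type is a single repository entry, in the spirit of the "extensible data types" requirement.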

System architecture
Input: a CQL query of the form SELECT … FROM … WHERE …; Output: query results
Components: Parsing Layer, Execution Tree, Index Selection Module, Index Layer, Data Type Repository (data type definitions)
Index Design: special inverted index, contextual index, join index
Query Optimization: modeled as a graph coverage problem
Experimental Result: speed improvement of 6-10 times; space overhead around 2 times the original corpus size
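
A sketch of the idea behind a type-aware inverted index, assuming number tokens are posted under a pseudo-term #number alongside ordinary words so that a keyword can be joined with a data type by position. build_index, near, and the window parameter are hypothetical; the contextual and join indexes named on the slide are not modeled here.

```python
import re
from collections import defaultdict

# Hypothetical #number recognizer applied to single tokens such as "1.3".
NUMBER = re.compile(r"\d+(?:\.\d+)?")

def build_index(docs):
    """Positional inverted index; numeric tokens are also posted under '#number'."""
    index = defaultdict(list)
    for doc_id, text in enumerate(docs):
        for pos, token in enumerate(text.lower().split()):
            index[token].append((doc_id, pos))
            if NUMBER.fullmatch(token):
                index["#number"].append((doc_id, pos))
    return index

def near(index, keyword, type_term, window=3):
    """Return (doc_id, position) of type_term instances within `window` tokens of keyword."""
    type_positions = defaultdict(list)
    for doc_id, pos in index.get(type_term, []):
        type_positions[doc_id].append(pos)
    hits = []
    for doc_id, kpos in index.get(keyword, []):
        for tpos in type_positions.get(doc_id, []):
            if 0 < abs(tpos - kpos) <= window:
                hits.append((doc_id, tpos))
    return hits

docs = [
    "China has a population of 1.3 billion",
    "China with its population of 1.3 billion people",
    "Shanghai is the largest city with 15 million inhabitants in China",
]
index = build_index(docs)
print(near(index, "population", "#number"))  # [(0, 5), (1, 5)]
```

The per-document scan is for clarity; a real positional join would merge sorted posting lists.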

Data-oriented Content Query System: one system for Web Info Extraction, Typed Entity Search, and Web-based Q/A