Mianwei Zhou, Tao Cheng, Kevin Chen-Chuan Chang WSDM 2010, New York, USA 1.

Slides:



Advertisements
Similar presentations
XML DOCUMENTS AND DATABASES
Advertisements

1 EntityRank: Searching Entities Directly and Holistically Tao Cheng Joint work with : Xifeng Yan, Kevin Chang VLDB 2007, Vienna, Austria.
EntityRank: Searching Entities Directly and Holistically - Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang CS Department, UIUC Presented By: Md. Abdus Salam.
Vikas BhardwajColumbia University NLP for the Web – Spring 2010 Improving QA Accuracy by Question Inversion Prager et al. IBM T.J. Watson Res. Ctr. 02/18/2010.
1 Oct 30, 2006 LogicSQL-based Enterprise Archive and Search System How to organize the information and make it accessible and useful ? Li-Yan Yuan.
Web Search – Summer Term 2006 VI. Web Search - Indexing (c) Wolfgang Hürst, Albert-Ludwigs-University.
Information Retrieval in Practice
Query Execution Professor: Dr T.Y. Lin Prepared by, Mudra Patel Class id: 113.
Search Engines and Information Retrieval
Data-oriented Content Query System: Searching for Data into Text on the Web Mianwei Zhou, Kevin Chen-Chuan Chang Department of Computer Science UIUC 1.
KnowItNow: Fast, Scalable Information Extraction from the Web Michael J. Cafarella, Doug Downey, Stephen Soderland, Oren Etzioni.
The Informative Role of WordNet in Open-Domain Question Answering Marius Paşca and Sanda M. Harabagiu (NAACL 2001) Presented by Shauna Eggers CS 620 February.
INFO 624 Week 3 Retrieval System Evaluation
Automatic Set Expansion for List Question Answering Richard C. Wang, Nico Schlaefer, William W. Cohen, and Eric Nyberg Language Technologies Institute.
IST NeOn-project.org The Semantic Web is growing… #SW Pages Lee, J., Goodwin, R. (2004) The Semantic.
Information Retrieval and Extraction 資訊檢索與擷取 Chia-Hui Chang National Central University
CS 345 Data Mining Lecture 1 Introduction to Web Mining.
Enhance legal retrieval applications with an automatically induced knowledge base Ka Kan Lo.
Overview of Search Engines
A Web-based Question Answering System Yu-shan & Wenxiu
 Fatemeh Lashkari UNB University May 7 th  Indexing  Semantic Search  Semantic Search Architecture  Index process  Index Maintenance.
CSC 9010 Spring Paula Matuszek A Brief Overview of Watson.
Search Engines and Information Retrieval Chapter 1.
Probabilistic Model for Definitional Question Answering Kyoung-Soo Han, Young-In Song, and Hae-Chang Rim Korea University SIGIR 2006.
1 Beyond Pages: Supporting Efficient, Scalable Entity Search with Dual-Inversion Index Tao Cheng and Kevin Chang Computer.
1 The BT Digital Library A case study in intelligent content management Paul Warren
Information Need Question Understanding Selecting Sources Information Retrieval and Extraction Answer Determina tion Answer Presentation This work is supported.
Tables to Linked Data Zareen Syed, Tim Finin, Varish Mulwad and Anupam Joshi University of Maryland, Baltimore County
AnswerBus Question Answering System Zhiping Zheng School of Information, University of Michigan HLT 2002.
Master Thesis Defense Jan Fiedler 04/17/98
Chapter 2 Architecture of a Search Engine. Search Engine Architecture n A software architecture consists of software components, the interfaces provided.
Question Answering.  Goal  Automatically answer questions submitted by humans in a natural language form  Approaches  Rely on techniques from diverse.
Pete Bohman Adam Kunk. What is real-time search? What do you think as a class?
1 Probability, Internet Search, and the Success of Google October 2012 © 2012 Massachusetts Institute of Technology. All rights reserved. Robert M. Freund.
Internet Information Retrieval Sun Wu. Course Goal To learn the basic concepts and techniques of internet search engines –How to use and evaluate search.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
EntityRank :Searching Entities Directly and Holistically Tao Cheng, Xifeng Yan, Kevin Chen-Chuan Chang Computer Science Department, University of Illinois.
Answering Definition Questions Using Multiple Knowledge Sources Wesley Hildebrandt, Boris Katz, and Jimmy Lin MIT Computer Science and Artificial Intelligence.
Q2Semantic: A Lightweight Keyword Interface to Semantic Search Haofen Wang 1, Kang Zhang 1, Qiaoling Liu 1, Thanh Tran 2, and Yong Yu 1 1 Apex Lab, Shanghai.
Question Answering over Implicitly Structured Web Content
Efficient Instant-Fuzzy Search with Proximity Ranking Authors: Inci Centidil, Jamshid Esmaelnezhad, Taewoo Kim, and Chen Li IDCE Conference 2014 Presented.
Searching the web Enormous amount of information –In 1994, 100 thousand pages indexed –In 1997, 100 million pages indexed –In June, 2000, 500 million pages.
6.1 © 2010 by Prentice Hall 6 Chapter Foundations of Business Intelligence: Databases and Information Management.
Mianwei Zhou, Hongning Wang, Kevin Chen-Chuan Chang University of Illinois Urbana Champaign Learning to Rank from Distant Supervision: Exploiting Noisy.
Next Generation Search Engines Ehsun Daroodi 1 Feb, 2003.
Templated Search over Relational Databases Date: 2015/01/15 Author: Anastasios Zouzias, Michail Vlachos, Vagelis Hristidis Source: ACM CIKM’14 Advisor:
Entity Search Are you searching for what you want? Kevin C. Chang Joint work with: Bin He, Zhen Zhang, Chengkai Li, Govind Kabra, Shui-Lung Chuang, Joe.
L JSTOR Tools for Linguists 22nd June 2009 Michael Krot Clare Llewellyn Matt O’Donnell.
Text Mining Application Programming Chapter 1 Introduction Manu Konchady, 2006.
Search Result Interface Hongning Wang Abstraction of search engine architecture User Ranker Indexer Doc Analyzer Index results Crawler Doc Representation.
Named Entity Recognition in Query Jiafeng Guo 1, Gu Xu 2, Xueqi Cheng 1,Hang Li 2 1 Institute of Computing Technology, CAS, China 2 Microsoft Research.
Information Retrieval Transfer Cycle Dania Bilal IS 530 Fall 2007.
Comparing Document Segmentation for Passage Retrieval in Question Answering Jorg Tiedemann University of Groningen presented by: Moy’awiah Al-Shannaq
CSE 6392 – Data Exploration and Analysis in Relational Databases April 20, 2006.
Advisor: Koh Jia-Ling Nonhlanhla Shongwe EFFICIENT QUERY EXPANSION FOR ADVERTISEMENT SEARCH WANG.H, LIANG.Y, FU.L, XUE.G, YU.Y SIGIR’09.
1 Entity Search Engine: Towards Agile Best-Effort Information Integration over the Web Tao Cheng, Kevin Chang University Of Illinois, Urbana-Champaign.
Keyword Translation Accuracy and Cross-Lingual Question Answering in Chinese and Japanese Teruko Mitamura Mengqiu Wang Hideki Shima Frank Lin In CMU EACL.
Improving QA Accuracy by Question Inversion John Prager, Pablo Duboue, Jennifer Chu-Carroll Presentation by Sam Cunningham and Martin Wintz.
Integrated Departmental Information Service IDIS provides integration in three aspects Integrate relational querying and text retrieval Integrate search.
Integrating linguistic knowledge in passage retrieval for question answering J¨org Tiedemann Alfa Informatica, University of Groningen HLT/EMNLP 2005.
Consumer Health Question Answering Systems Rohit Chandra Sourabh Singh
General Architecture of Retrieval Systems 1Adrienn Skrop.
Information Retrieval in Practice
Improving Search Relevance for Short Queries in Community Question Answering Date: 2014/09/25 Author : Haocheng Wu, Wei Wu, Ming Zhou, Enhong Chen, Lei.
Prepared by Rao Umar Anwar For Detail information Visit my blog:
PJ SEO Specialists WordPress Web Development and SEO.
Shanghai Largest city china
CSE 635 Multimedia Information Retrieval
Information Retrieval and Web Design
Introduction to Search Engines
Presentation transcript:

Mianwei Zhou, Tao Cheng, Kevin Chen-Chuan Chang WSDM 2010, New York, USA 1

Web Info Extraction Typed Entity Search Web-based Q/A In most cases, what we really want are not pages, but the information units inside. ? ? 2

Specialized Information Extractors Web Information Extraction (WIE) (Marius 2006, Cafarella 2005, Etzioni 2004) Pattern: “X is CEO of Y” CompanyCEO GoogleEric Schmidt IBMS. Palmisano …… Limitation Focus on simple patterns. Lack of interactivity. 3

Web-based Question Answering (WQA) (Wu 2007, Lin 2003, Brill 2002) Who is CEO of Dell? Keywords: “CEO Dell” Parse Top-k results Michael Dell Limitation Only rely on top-k pages to retrieve the answer. 4

Typed-Entity Search (TES) (Cheng 2007, Cafarella 2007, Chakrabarti 2006) Amazon Phone …… Ranked Entity List But … Where is CEO Limitation Limited Number of Data Type Lack of Flexibility 5

? ? Data-oriented Content Query System Web Info Extraction Typed Entity Search Web-based QA Requirements 1.Extensible Data Types 2.Flexible Contextual Patterns 3.Customizable Scoring 6

Input: CQL (Content Query Language) Output Entity Search Web QA Data-oriented Content Query System 7

8

What we needRelational Model Person Organization Location Number Person Organization Location 9

What we needRelational Model Find the population of China WHERE pattern(…) GROUP BY #number ORDER BY conf() FROM #number 10 China has a population of 1.3 billion China with its population of 1.3 billion people China is established in Shanghai is the largest city with 15 million inhabitants in China 1.3 billion 15 million billion 2.15 million … billion 2.15 million …

What we needRelational Model Number Location Person Population Phone Price Capital Headquarter Professor CEO President Table View Number population price phone 11

Index Layer Parsing Layer Index Selection Module Execution Tree INPUT SELECT … FROM … WHERE … OUTPUT Index Design Special Inverted Index Contextual Index Join Index Index Design Special Inverted Index Contextual Index Join Index Query Optimization Graph Coverage Problem Query Optimization Graph Coverage Problem Data Type Repository Data Type Definition 12 Experimental Result Speed improvement: 6-10 times Space overhead: Around 2 times original corpus size. E x p e r i m e n t a l R e s u l t S p e e d i m p r o v e m e n t : t i m e s S p a c e o v e r h e a d : A r o u n d 2 t i m e s o r i g i n a l c o r p u s s i z e.

Data-oriented Content Query System Web Info Extraction Typed Entity Search Web-based Q/A 13

14