Databases & Information Retrieval
Maya Ramanath
Further Reading:
  – Combining Database and Information-Retrieval Techniques for Knowledge Discovery. G. Weikum, G. Kasneci, M. Ramanath and F.M. Suchanek, CACM, April 2009
  – DB & IR: Both Sides Now. G. Weikum, Keynote at SIGMOD 2007

DB and IR: Different Motivations
Both deal with large amounts of information, but…

              DB                             IR
Applications  online reservation, banking    libraries
Emphasis      data consistency, efficiency   result quality, user satisfaction
Data          structured records             unstructured text
Queries       precise                        interpretations vary
Results       exact match / all results      ranked / top-k results

Why Combine Now?
The applications drive the need
  – The need to manage both structured and unstructured data in an integrated manner
Healthcare example
  – Find young patients in central Europe who have been reported, in the last two weeks, to have symptoms of tropical virus diseases and an indication of anomalies.
Newspaper archives, product catalogues, etc.

Integrating DB & IR
[Figure: diagram relating query type to data type; labels listed below]
Queries
  – Structured queries / boolean match results (SQL)
  – Unstructured queries / ranked results (keywords/top-k)
Data
  – Structured data (relational)
  – Unstructured data (text)
Systems
  – DB Systems
  – IR Systems
Integration topics
  – top-k processing, keyword search on graphs
  – extracting entities and relationships, ranking for entities
  – query processing for text search, effective query interfaces, ranking for structured data

Modules
1. Top-k Processing
2. Query Processing and Interfaces
3. Keyword Search on Graphs
4. Entity and Relationship Extraction
5. Ranking and Structured Data

1. Top-k Processing (1/2)
Structured data, with scores in multiple dimensions
Return the top-k “objects”

Car           Color
BMW X1        0.9
Honda City    0.8
Maruti Swift  0.6
Tata Nano     0.1

Car           Mileage
Honda City    0.8
Maruti Swift  0.6
Tata Nano     0.3
BMW X1        0.1

Car           Service
Tata Nano     0.7
Maruti Swift  0.6
Honda City    0.3
BMW X1        0.1
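One way to make this slide concrete is a threshold-style top-k scan over the three sorted lists. The slide names neither an algorithm nor a scoring function, so both are assumptions here: the sketch uses the sum of the three scores and a textbook threshold algorithm, and the function name `threshold_top_k` is hypothetical.

```python
import heapq

# Score lists from the slide, each sorted by descending score.
COLOR = [("BMW X1", 0.9), ("Honda City", 0.8), ("Maruti Swift", 0.6), ("Tata Nano", 0.1)]
MILEAGE = [("Honda City", 0.8), ("Maruti Swift", 0.6), ("Tata Nano", 0.3), ("BMW X1", 0.1)]
SERVICE = [("Tata Nano", 0.7), ("Maruti Swift", 0.6), ("Honda City", 0.3), ("BMW X1", 0.1)]


def threshold_top_k(lists, k):
    """Return (top-k objects with summed scores, number of rows read per list).

    Assumption: the overall score of an object is the sum of its per-list scores.
    """
    lookup = [dict(lst) for lst in lists]  # random access: object -> score
    seen = {}                              # object -> aggregated score
    max_depth = max(len(lst) for lst in lists)
    for depth in range(max_depth):
        last_scores = []
        for lst in lists:
            if depth < len(lst):
                obj, score = lst[depth]    # sorted access
                last_scores.append(score)
                if obj not in seen:
                    seen[obj] = sum(d.get(obj, 0.0) for d in lookup)
            else:
                last_scores.append(0.0)
        # No unseen object can score higher than the sum of the last-read scores.
        threshold = sum(last_scores)
        best = heapq.nlargest(k, seen.items(), key=lambda kv: kv[1])
        if len(best) >= k and best[-1][1] >= threshold:
            return best, depth + 1
    return heapq.nlargest(k, seen.items(), key=lambda kv: kv[1]), max_depth


if __name__ == "__main__":
    top, rows_read = threshold_top_k([COLOR, MILEAGE, SERVICE], k=2)
    print(top, rows_read)
```

On the slide's data with k = 2, this returns Honda City and Maruti Swift after reading three of the four rows in each list, which is the point of top-k processing: stop as soon as no unseen object can enter the result.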

1. Top-k Processing (2/2)
Top-k Joins
  – Example: Return the best house-school pair

Houses  Rating  Location
H1      0.9     L1
H2      0.8     L2
H3      0.6     L3
H4      0.1     L3

Schools  Rating  Location
S1       0.4     L2
S2       0.2     L2
S3       0.8     L3
S4       0.1     L3
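The house-school example can be sketched as a naive baseline: join on location, score each pair, keep the best k. The slide does not say how a pair is scored, so summing the two ratings is an assumption, and `top_k_pairs` is a hypothetical name. A real top-k join would try to avoid materialising every joined pair; this sketch only shows what result such an algorithm has to produce.

```python
import heapq
from collections import defaultdict

# (name, rating, location) rows from the slide.
HOUSES = [("H1", 0.9, "L1"), ("H2", 0.8, "L2"), ("H3", 0.6, "L3"), ("H4", 0.1, "L3")]
SCHOOLS = [("S1", 0.4, "L2"), ("S2", 0.2, "L2"), ("S3", 0.8, "L3"), ("S4", 0.1, "L3")]


def top_k_pairs(houses, schools, k):
    """Join houses and schools on location and return the k best pairs.

    Assumption: a pair's score is house rating + school rating.
    """
    schools_by_location = defaultdict(list)
    for name, rating, location in schools:
        schools_by_location[location].append((name, rating))

    # Generate every joined pair lazily: (house, school, combined score).
    pairs = (
        (house, school, house_rating + school_rating)
        for house, house_rating, location in houses
        for school, school_rating in schools_by_location.get(location, [])
    )
    return heapq.nlargest(k, pairs, key=lambda pair: pair[2])


if __name__ == "__main__":
    print(top_k_pairs(HOUSES, SCHOOLS, k=1))
```

Under the summed-rating assumption the best pair is (H3, S3). H1 has the highest house rating but no school in its location, so it never appears in the join result.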

2. Query Processing and Interfaces (1/3)
Given: Database of text documents and a text-centric task
  – Extract information about disease outbreaks
Strategies
  – Scan all documents (very expensive)
  – Filter promising documents (affects recall)
Develop cost models and execution strategies appropriate for this setting
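The trade-off between the two strategies can be illustrated with a toy comparison. Everything in this sketch is hypothetical: the four documents, the regex standing in for an expensive extractor, and the keyword filter. It counts extractor calls as a crude cost measure and compares what each strategy finds.

```python
import re

# Toy corpus (invented for illustration).
DOCS = [
    "Cholera outbreak reported in region A.",
    "Local football team wins the cup.",
    "Health officials confirm dengue cases in region B.",
    "New bakery opens downtown.",
]

# Stand-in for an expensive information-extraction step.
DISEASE = re.compile(r"\b(cholera|dengue)\b", re.IGNORECASE)


def extract(doc):
    """Return the set of disease names mentioned in a document."""
    return {name.lower() for name in DISEASE.findall(doc)}


def scan(docs):
    """Run the extractor on every document."""
    found, calls = set(), 0
    for doc in docs:
        calls += 1
        found |= extract(doc)
    return found, calls


def filter_then_extract(docs, keyword):
    """Run the extractor only on documents containing the keyword."""
    found, calls = set(), 0
    for doc in docs:
        if keyword in doc.lower():
            calls += 1
            found |= extract(doc)
    return found, calls


if __name__ == "__main__":
    all_found, scan_calls = scan(DOCS)
    some_found, filter_calls = filter_then_extract(DOCS, "outbreak")
    recall = len(some_found) / len(all_found)
    print(scan_calls, filter_calls, recall)
```

Here the filter cuts extractor calls from four to one but misses the dengue report, which never uses the word "outbreak". A cost model for this setting would weigh that loss of recall against the saved extraction work.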

2. Query Processing and Interfaces (2/3)
Querying with “typed” keywords
  – Keyword querying: easy to use
  – Structured queries: precise
  – Find the middle ground…
Instead of the keyword query “german has won nobel award” or the structured query
  q(x) :- GERMAN(x), hasWonPrize(x,y), NOBEL_PRIZE(y)
use typed keywords: “german, has won (nobel award)”
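The structured form of the query on this slide is a conjunctive query: x is an answer if x is German and has won some y that is a Nobel prize. A minimal sketch of evaluating it over a toy fact base follows. The entity names (`person_a`, `nobel_physics`, and so on) are placeholders invented for illustration, and this is not how any particular system evaluates typed keywords.

```python
# Toy fact base with placeholder entities (invented for illustration).
GERMAN = {"person_a", "person_b"}
NOBEL_PRIZE = {"nobel_physics", "nobel_literature"}
HAS_WON_PRIZE = {
    ("person_a", "nobel_physics"),     # German, Nobel prize: matches
    ("person_b", "local_award"),       # German, but not a Nobel prize
    ("person_c", "nobel_literature"),  # Nobel prize, but not German
}


def q():
    """Evaluate q(x) :- GERMAN(x), hasWonPrize(x, y), NOBEL_PRIZE(y)."""
    return sorted(
        {x for (x, y) in HAS_WON_PRIZE if x in GERMAN and y in NOBEL_PRIZE}
    )


if __name__ == "__main__":
    print(q())
```

Only `person_a` satisfies all three conditions. A plain keyword query cannot express that "german" constrains the person and "nobel award" constrains the prize, which is the gap typed keywords aim to close.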

2. Query Processing and Interfaces (3/3)
Does the output have to be a boring list of ranked results? Nope!

3. Keyword Search on Graphs (1/3)
Lots of graphs around
  – Relational DB (tuples + foreign keys)
  – XML data (elements/sub-elements/id/idrefs)
  – RDF (graph-structured knowledge-bases)
Easy to query with keywords, instead of SQL/XQuery/SPARQL
Results are the top-k interconnections between the keywords

3. Keyword Search on Graphs (2/3)

3. Keyword Search on Graphs (3/3)
Query: “Einstein”, “Bohr”
[Figure: knowledge graph with nodes Einstein, Bohr, Nobel Prize, vegetarian, Tom Cruise, 1962 and edge labels isa, bornIn, diedIn, won]
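For a two-keyword query such as “Einstein”, “Bohr”, the simplest notion of an interconnection is a shortest path between the two matching nodes. The sketch below finds one with breadth-first search. The edge list is an assumption: the original figure is not recoverable from the transcript, so the edges are guessed from the node and edge labels, and `shortest_connection` is a hypothetical name. Ranking several interconnections to get a top-k result is not covered here.

```python
from collections import deque

# Assumed (subject, label, object) edges; the slide's figure is not recoverable.
EDGES = [
    ("Einstein", "won", "Nobel Prize"),
    ("Bohr", "won", "Nobel Prize"),
    ("Einstein", "isa", "vegetarian"),
    ("Tom Cruise", "bornIn", "1962"),
]


def build_adjacency(edges):
    """Treat the graph as undirected for the purpose of connecting keywords."""
    adjacency = {}
    for subject, label, obj in edges:
        adjacency.setdefault(subject, []).append((label, obj))
        adjacency.setdefault(obj, []).append((label, subject))
    return adjacency


def shortest_connection(edges, source, target):
    """Return the node sequence of a shortest path, or None if unconnected."""
    adjacency = build_adjacency(edges)
    if source not in adjacency or target not in adjacency:
        return None
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:  # walk back to the source
                path.append(node)
                node = parent[node]
            return path[::-1]
        for _label, neighbour in adjacency[node]:
            if neighbour not in parent:
                parent[neighbour] = node
                queue.append(neighbour)
    return None


if __name__ == "__main__":
    print(shortest_connection(EDGES, "Einstein", "Bohr"))
```

With these assumed edges, Einstein and Bohr are connected through the Nobel Prize node, while Einstein and Tom Cruise have no connection at all.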

4. Entity and Relationship Extraction (1/2)
Information Extraction (or Knowledge Harvesting)

Bill Gates was the founder of Microsoft and later its CEO.
Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.
Infosys was founded on 2 July 1981 by seven entrepreneurs: N. R. Narayana Murthy, Nandan Nilekani, …

Company    Founder
Microsoft  Bill Gates
Apple      Steve Jobs
Apple      Steve Wozniak
Infosys    N. R. Narayana Murthy
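A rule-based extractor for the founder relation can be sketched with two regular expressions, one per sentence shape. The patterns are assumptions written for the first two example sentences only. They do not handle the Infosys sentence, whose founder list is introduced by "seven entrepreneurs:" and is truncated on the slide, and they would break on most real text.

```python
import re

SENTENCES = [
    "Bill Gates was the founder of Microsoft and later its CEO.",
    "Apple was established on April 1, 1976 by Steve Jobs, Steve Wozniak, and Ronald Wayne.",
]

# "<person> was the founder of <Company>"
FOUNDER_OF = re.compile(r"^(?P<person>.+?) was the founder of (?P<company>[A-Z]\w+)")

# "<Company> was established|founded on <date> by <person>, <person>, and <person>."
FOUNDED_BY = re.compile(
    r"^(?P<company>[A-Z]\w+) was (?:established|founded) on .+? by (?P<people>.+?)\.$"
)

# Splits "A, B, and C" or "A and B" into individual names.
NAME_SEPARATOR = re.compile(r",\s*(?:and\s+)?|\s+and\s+")


def extract_founders(sentences):
    """Return (company, founder) pairs matched by the two hand-written rules."""
    facts = []
    for sentence in sentences:
        match = FOUNDER_OF.match(sentence)
        if match:
            facts.append((match.group("company"), match.group("person")))
            continue
        match = FOUNDED_BY.match(sentence)
        if match:
            for person in NAME_SEPARATOR.split(match.group("people")):
                if person.strip():
                    facts.append((match.group("company"), person.strip()))
    return facts


if __name__ == "__main__":
    for company, founder in extract_founders(SENTENCES):
        print(company, "|", founder)
```

The rules also yield (Apple, Ronald Wayne), a pair that is stated in the sentence but absent from the slide's table.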

4. Entity and Relationship Extraction (2/2)
How to build a knowledge-base of facts?
  – Structurize Wikipedia
  – Construct rules for extraction
How do I acquire all the facts in the world?
  – Extract “everything”
  – Don’t stop extracting

5. Ranking and Structured Data
Not the same as top-k processing
Given: Data with structure in it
  – Relational tables (flat)
  – XML (trees/graphs)
  – Text documents consisting of entities
Task: Rank the query results
  – SQL/XQuery/“typed” keywords

QUESTIONS?