Copyright © 2003 by Kyu-Young Whang

Slides:



Advertisements
Similar presentations
IMPLEMENTATION OF INFORMATION RETRIEVAL SYSTEMS VIA RDBMS.
Advertisements

Search Engines and Information Retrieval
CSE 190: Internet E-Commerce Lecture 10: Data Tier.
Information Retrieval in Practice
Chapter 14 The Second Component: The Database.
1 External Sorting for Query Processing Yanlei Diao UMass Amherst Feb 27, 2007 Slides Courtesy of R. Ramakrishnan and J. Gehrke.
10. Creating and Maintaining Geographic Databases.
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
1 CS222: Principles of Database Management Fall 2010 Professor Chen Li Department of Computer Science University of California, Irvine Notes 01.
What is IIS? IIS (Internet Information Server) is a group of Internet servers (including a Web or Hypertext Transfer Protocol server and a File Transfer.
Search Engines and Information Retrieval Chapter 1.
Course Introduction Introduction to Databases Instructor: Joe Bockhorst University of Wisconsin - Milwaukee.
DBXplorer: A System for Keyword- Based Search over Relational Databases Sanjay Agrawal Surajit Chaudhuri Gautam Das Presented by Bhushan Pachpande.
CS461: Principles and Internals of Database Systems Instructor: Ying Cai Department of Computer Science Iowa State University Office:
1 CS 430 Database Theory Winter 2005 Lecture 17: Objects, XML, and DBMSs.
CPS120: Introduction to Computer Science Lecture 19 Introduction to SQL.
Chapter 15 Relational Implementation with DB2 David M. Kroenke Database Processing © 2000 Prentice Hall.
Instructor: Dema Alorini Database Fundamentals IS 422 Section: 7|1.
Demo: Power Tools for P8 Presenter: Jay Bowen Demonstration Topic: Choice List Features Demo URL below Power Tools Choice List Support 1. Native P8 Choice.
Creating and Maintaining Geographic Databases. Outline Definitions Characteristics of DBMS Types of database Relational model SQL Spatial databases.
GLOBEX INFOTEK Copyright © 2013 Dr. Emelda Ntinglet-DavisSYSTEMS ANALYSIS AND DESIGN METHODSINTRODUCTORY SESSION EFFECTIVE DATABASE DESIGN for BEGINNERS.
Lecture 10 Creating and Maintaining Geographic Databases Longley et al., Ch. 10, through section 10.4.
Toward Entity Retrieval over Structured and Text Data Mayssam Sayyadian, Azadeh Shakery, AnHai Doan, ChengXiang Zhai Department of Computer Science University.
Presented By: Carlton Northern and Jeffrey Shipman The Anatomy of a Large-Scale Hyper-Textural Web Search Engine By Lawrence Page and Sergey Brin (1998)
Christoph F. Eick: Final Words COSC Topics Covered in COSC 3480  Data models (ER, Relational, XML)  Using data models; learning how to store real.
Database System Architecture and Implementation Execution Costs 1 Slides Credit: Michael Grossniklaus – Uni-Konstanz.
Information Retrieval in Practice
Chapter 1 Introduction.
DBMS & TPS Barbara Russell MBA 624.
CS320 Web and Internet Programming SQL and MySQL
CS122B: Projects in Databases and Web Applications Winter 2017
Information Retrieval (in Practice)
Advanced Accounting Information Systems
Using E-Business Suite Attachments
Supervisor: Prof Michael Lyu Presented by: Lewis Ng, Philip Chan
CS422 Principles of Database Systems Course Overview
9. Creating and Maintaining Geographic Databases
Lecture 8 Database Implementation
15-826: Multimedia Databases and Data Mining
Database Application Development
ICT Database Lesson 1 What is a Database?.
Datamining : Refers to extracting or mining knowledge from large amounts of data Applications : Market Analysis Fraud Detection Customer Retention Production.
Translation of ER-diagram into Relational Schema
Tools for Memory: Database Management Systems
Objective % Explain concepts used to create websites.
File Organizations Chapter 8 “How index-learning turns no student pale
DATABASE MANAGEMENT SYSTEM
WIRED Week 2 Syllabus Update Readings Overview.
Database.
DATABASE SYSTEM UNIT I.
Teaching slides Chapter 8.
Team Project, Part II NOMO Auto, Part II IST 210 Section 4
Selected Topics: External Sorting, Join Algorithms, …
The Relational Model The slides for this text are organized into chapters. This lecture covers Chapter 3. Chapter 1: Introduction to Database Systems Chapter.
Instructor 彭智勇 武汉大学软件工程国家重点实验室 电话:
Database management concepts
Chapter 7 Searching Your Products
CS3220 Web and Internet Programming SQL and MySQL
CS122B: Projects in Databases and Web Applications Winter 2019
Database Management Systems
Chapter 1 Introduction to Database Processing
CS122B: Projects in Databases and Web Applications Spring 2018
Objective Explain concepts used to create websites.
CS3220 Web and Internet Programming SQL and MySQL
CS122B: Projects in Databases and Web Applications Winter 2018
Course Instructor: Supriya Gupta Asstt. Prof
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Director.
Lecture 1: Overview of CSCI 485 Notes: I presented parts of this lecture as a keynote at Educator’s Symposium of OOPSLA Shahram Ghandeharizadeh Associate.
Database Application Development
Geographic Information Systems
Presentation transcript:

Copyright © 2003 by Kyu-Young Whang Tight-Coupling: A Way of Building High-Performance Application Specific Engines 2003.3 Kyu-Young Whang, Professor Department of Computer Science Korea Advanced Institute of Science and Technology(KAIST) Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Data in Internet Era The amount of data grows exponentially (10 times in 3~4 years) Performance with such large DBs is an issue New database applications Information Retrieval (IR) Spatial Databases Data Mining OLAP Data Streaming Multimedia Databases … Copyright © 2003 by Kyu-Young Whang

Adding Application-Specific Features to DBMS Conventional techniques Oracle’s Cartridge IBM DB2 Extender Informix DataBlade Example ⇒ Performance not optimal due to relatively high-level interface with the main engine Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Database Module by Raghu Ramakrishnan Build a suite of “data service” modules that can plug and play Loose coupling across modules => Performance and consistency issues DBMS Spatial module OLAP module Text module Updates Text Query Spatial Query OLAP Query Sync Master DB Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang New Technique: Tight Coupling (implementor’s approach) Direct implementation of a new feature at the level of records and indexes (e.g., at the same level of a B+-tree) Examples Tight Coupling of Spatial Features with DBMS Tight Coupling of IR with DBMS The engine code that must be modified to accommodate a new feature is not that big (approximately 20,000 lines) Copyright © 2003 by Kyu-Young Whang

Tight-Coupling with Spatial Features Embedding spatial indexes Integrated management of spatial and nonspatial attributes Queries Find the phone number of McDonald within 10km of Stanford University Integrated query against spatial and nonspatial attributes data record spatial integer spatial index B+-tree index SELECT s.phone_number FROM universities u, stores s WHERE DISTANCE(u.location, s.location)<10 AND u.name=“Stanford” AND s.name=“McDonald”; Copyright © 2003 by Kyu-Young Whang

Conventional Systems (e.g., ArcInfo) Spatial DB Non-Spatial DB Spatial Query Processing Module DBMS Query Processor Copyright © 2003 by Kyu-Young Whang

Tight-Coupling with IR Embedding IR Indexes Integerated management of text and nontext attributes Queries Find the papers that contain the word “database” in the title and that have been published after year 2000 Integrated query against text and nontext attributes data record text integer IR index B+-tree index SELECT p.oid FROM paper p WHERE MATCH(p.title, “database”)>0 AND p.pubyear>2000; Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Features Crash recovery and fine-granularity locking and logging for the IR index Bulk loading and bulk deletion for the IR index Immediate update for the IR index Copyright © 2003 by Kyu-Young Whang

Application of IR Index Site-limited Search (Hosted SiteSearch in Google) Find the web pages that include the word “SQL” from the site “Microsoft” Directory Search Find the web pages that include the word “park” from the directory “Recreation > Travel > Attractions” Site 1 Site 2 Site n search database server Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Implementation Example Schema Query Find the web pages that contain the word “Korea” from the site having siteId=6000 (Site-limited Search) siteInfo table pageInfo table column name column type column name column type description description siteId integer Site identifier siteId integer Site identifier URL varchar Site URL siteIdText text Site identifier title text Site title title text Page title description text Site description URL varchar Page URL content text Page content SELECT p.oid FROM pageInfo p WHERE MATCH(p.content, “korea”)>0 AND p.siteId = 6000; Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Naïve Implementation Find the record of the web page containing the word “Korea” Access the records and select those whose siteId = 6000 ⇒ Very bad in performance (Tuples are scattered all over the database causing excessive random accesses) Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang IR Index Join Method We use siteIdText of type ‘text’ instead of siteId of type integer Do join between the posting list (matching “Korea”) from the IR Index on the attribute content and the posting list (matching “6000”) from the IR Index on the attribute siteIdText ⇒ Performance is good since join is done in the indexes without accessing tuples IR Index on pageInfo.siteIdText Keyword “6000” Posting list doc10 doc12 doc13 doc20 doc25 doc35 IR Index on pageInfo.content “Korea” doc1 doc2 doc22 Copyright © 2003 by Kyu-Young Whang

Copyright © 2003 by Kyu-Young Whang Embedded Posting Method Embedded Posting: Other attribute values of the record are embedded in the posting of another attribute In this example, siteId (of type integer) is embedded in the posting of the attribute content ⇒ Performance is good since query is resolved by accessing only one posting list without accessing the tuples (vs. space overhead) IR Index on pageInfo.content doc1 7500 doc10 6000 doc13 doc22 4522 doc25 Keyword “Korea” docId siteId Posting list a posting Copyright © 2003 by Kyu-Young Whang

ODYSSEUS Object-Relational DBMS Being developed at KAIST for over thirteen years Tightly coupled with IR (IR Index – patented) and Spatial (MLGF) features A DBMS and, at the same time, a search engine Concurrency control and recovery (coarse granularity and fine granularity) IR performance is comparable or better than commercial search engines Allows immediate updates An object-relational DBMS and, at the same time, a spatial DBMS Integrated management of spatial and nonspatial attributes Many commercial applications Approximately 400,000 lines of C/C++ (precision) codes Copyright © 2003 by Kyu-Young Whang