1 “One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker.

Slides:



Advertisements
Similar presentations
Alternate Title The elephants are selling 30 year old “bloatware”
Advertisements

One Size Fits All An Idea Whose Time Has Come and Gone by Michael Stonebraker.
The End of an Architectural Era Shimin Chen (Big Data Reading Group) (many slides are copied from Stonebraker’s presentation)
Functions of Database Management Systems Data storage retrieval and update facilities A user-accessible catalogue or data dictionary Support for shared.
ICS (072)Database Systems: A Review1 Database Systems: A Review Dr. Muhammad Shafique.
Monday, 08 June 2015Dr. Mohamed Osman1 What is Database Administration A high level function (technical Function) that is responsible for ► physical DB.
Database Software File Management Systems Database Management Systems.
IS 4420 Database Fundamentals Chapter 6: Physical Database Design and Performance Leon Chen.
1 9 Concepts of Database Management, 4 th Edition, Pratt & Adamski Chapter 9 Database Management Approaches.
The University of Akron Dept of Business Technology Computer Information Systems Database Management Approaches 2440: 180 Database Concepts Instructor:
CSC 2720 Building Web Applications Database and SQL.
Chapter 1 An Overview of Database Management. 1-2 Topics in this Chapter What is a Database System? What is a Database? Why Database? Data Independence.
Chapter 4 Database Management Systems. Chapter 4Slide 2 What is a Database Management System (DBMS)?  Database An organized collection of related data.
BigTable CSE 490h, Autumn What is BigTable? z “A BigTable is a sparse, distributed, persistent multidimensional sorted map. The map is indexed by.
Lecture 6 Indexing Part 2 Column Stores. Indexes Recap Heap FileBitmapHash FileB+Tree InsertO(1) O( log B n ) DeleteO(P)O(1) O( log B n ) Range Scan O(P)--
Database Design and Introduction to SQL
Database System Concepts and Architecture Lecture # 3 22 June 2012 National University of Computer and Emerging Sciences.
Cloud Computing Lecture Column Store – alternative organization for big relational data.
Chapter 6 Physical Database Design. Introduction The purpose of physical database design is to translate the logical description of data into the technical.
Information storage: Introduction of database 10/7/2004 Xiangming Mu.
1 Intro to Info Tech Database Management Systems Copyright 2003 by Janson Industries This presentation can be viewed on line at:
CS370 Spring 2007 CS 370 Database Systems Lecture 2 Overview of Database Systems.
The Worlds of Database Systems Chapter 1. Database Management Systems (DBMS) DBMS: Powerful tool for creating and managing large amounts of data efficiently.
Chapter 2 CIS Sungchul Hong
DATABASE. A database is collection of information that is organized so that it can easily be accessed, managed and updated. It is also the collection.
1 CS 430 Database Theory Winter 2005 Lecture 1: Introduction.
STORING ORGANIZATIONAL INFORMATION— DATABASES CIS 429—Chapter 7.
Database System Concepts and Architecture Lecture # 2 21 June 2012 National University of Computer and Emerging Sciences.
 DATABASE DATABASE  DATABASE ENVIRONMENT DATABASE ENVIRONMENT  WHY STUDY DATABASE WHY STUDY DATABASE  DBMS & ITS FUNCTIONS DBMS & ITS FUNCTIONS 
Chapter 1 In-lab Quiz Next week
Physical Database Design & Performance. Optimizing for Query Performance For DBs with high retrieval traffic as compared to maintenance traffic, optimizing.
TM 7-1 Copyright © 1999 Addison Wesley Longman, Inc. Physical Database Design.
C-Store: Column-Oriented Data Warehousing Jianlin Feng School of Software SUN YAT-SEN UNIVERSITY May 17, 2010.
1 © Prentice Hall, 2002 Chapter 6: Physical Database Design and Performance Modern Database Management 6 th Edition Jeffrey A. Hoffer, Mary B. Prescott,
Chapter 6 1 © Prentice Hall, 2002 The Physical Design Stage of SDLC (figures 2.4, 2.5 revisited) Project Identification and Selection Project Initiation.
1 CS 430 Database Theory Winter 2005 Lecture 17: Objects, XML, and DBMSs.
1099 Why Use InterBase? Bill Todd The Database Group, Inc.
“One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker.
Ch. 1 데이터베이스시스템 (2). Ch.1 Database System 데이터베이스시스템 2 What to Learn Database System Overview Entity-Relationship diagram Relational Data Model  Structure.
1 C-Store: A Column-oriented DBMS By New England Database Group.
Instructor: Dema Alorini Database Fundamentals IS 422 Section: 7|1.
CHAPTER 3 DATABASES AND DATA WAREHOUSES. 2 OPENING CASE STUDY Chrysler Spins a Competitive Advantage with Supply Chain Management Software Chapter 2 –
Database Management COP4540, SCS, FIU Physical Database Design (ch. 16 & ch. 3)
Introduction to Database Systems1. 2 Basic Definitions Mini-world Some part of the real world about which data is stored in a database. Data Known facts.
1 CS 430 Database Theory Winter 2005 Lecture 2: General Concepts.
Introduction to Database AIT632 Chapter 1 Sungchul Hong.
Data resource management
Creating and Maintaining Geographic Databases. Outline Definitions Characteristics of DBMS Types of database Relational model SQL Spatial databases.
INNOV-10 Progress® Event Engine™ Technical Overview Prashant Thumma Principal Software Engineer.
Non-Traditional Databases. Reading 1. Scientific data management at the Johns Hopkins institute for data intensive engineering and science Yanif Ahmad,
SYS364 Database Design Continued. Database Design Definitions Initial ERD’s Normalization of data Final ERD’s Database Management Database Models File.
GLOBEX INFOTEK Copyright © 2013 Dr. Emelda Ntinglet-DavisSYSTEMS ANALYSIS AND DESIGN METHODSINTRODUCTORY SESSION EFFECTIVE DATABASE DESIGN for BEGINNERS.
1 “One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker.
MGA Duplica Replication Tool. 1. High Availability and Avoidance of Data Loss  Replicate to alternate databases 2. Split activities across databases.
Physical Database Design Purpose- translate the logical description of data into the technical specifications for storing and retrieving data Goal - create.
Chapter 8 Physical Database Design. Outline Overview of Physical Database Design Inputs of Physical Database Design File Structures Query Optimization.
Chapter 4 Logical & Physical Database Design
Lecture 10 Creating and Maintaining Geographic Databases Longley et al., Ch. 10, through section 10.4.
CS 440 Database Management Systems Lecture 6: Data storage & access methods 1.
CS 540 Database Management Systems
1 Advanced Database Systems: DBS CB, 2 nd Edition Advanced Topics of Interest: In- Memory DB (IMDB) and Column-Oriented DB.
What is Database Administration ?
Database Management.
Non-Traditional Databases
CHAPTER 5: PHYSICAL DATABASE DESIGN AND PERFORMANCE
Chapter 8 Data Base Security
Physical Database Design
Using Columnstore indexes in Azure DevOps Services. Lessons learned
Using Columnstore indexes in Azure DevOps Services. Lessons learned
overview today’s ideas relational databases
Presentation transcript:

1 “One Size Fits All” An Idea Whose Time Has Come and Gone by Michael Stonebraker

2 Current DBMS Gold Standard Current DBMS Gold Standard  Store fields in one record contiguously on disk  Use B-tree indexing  Use small (e.g. 4K) disk blocks  Align fields on byte or word boundaries  Conventional (row-oriented) query optimizer and executor  Store fields in one record contiguously on disk  Use B-tree indexing  Use small (e.g. 4K) disk blocks  Align fields on byte or word boundaries  Conventional (row-oriented) query optimizer and executor

3 Terminology -- “Row Store” Record 2 Record 4 Record 1 Record 3 E.g. DB2, Oracle, Sybase, SQLServer, …

4 Row Stores are Write Optimized Row Stores are Write Optimized  Can insert and delete a record in one physical write  Good for OLTP  But not for the data warehouse and other read- mostly markets  Can insert and delete a record in one physical write  Good for OLTP  But not for the data warehouse and other read- mostly markets

5 The Elephants and Warehouses  Bitmap indexes  Star schema optimization  Materialized views  Compression (coding) or attributes  But there is a better idea……  Bitmap indexes  Star schema optimization  Materialized views  Compression (coding) or attributes  But there is a better idea……

6 A Column Store (Like Sybase IQ)

7 Among the Ideas……  Only read the attributes you need  Coding is more effective  No alignment  Big data blocks  Only read the attributes you need  Coding is more effective  No alignment  Big data blocks Huge win on stuff like TPC-H!! Stream Processing is another example

8 Example Application – Feed Alarms  Custom-coded Feed alarm application Feed A Feed B alarms

9 Characteristics of Feed Alarm Pilot Characteristics of Feed Alarm Pilot  500 rapidly updating tickers (5 sec. interval) slowly updating tickers (60 sec. interval) in each FEED.  Problem Types 1.Low-level alarm  Ticker not seen within update interval. 2.Problem in Feed  More than 100 low-alarms from Feed A or Feed B 3.Problem in Exchange  More than 100 low-level alarms from NASDAQ or NYSE  Suppression: –When problems of type 2 or 3 detected, do not emit (distracting) problems of type 1.

10 Results  StreamBase implementation: –~ 150K msgs/sec on a 3.2GHz Linux pentium  Elephant solution –~900 msgs/sec on the same hardware More than 2 orders of magnitude difference……

11  Inbound vs outbound processing  The right primitives  Integration of application logic Why?

12 Traditional Model Outbound Processing Storage Updates Data Processing And queries

13 Stream Processing Model Inbound Processing Storage Data Application

14 Alarm Correlation Application Alarm D=5 sec Alarm D=60 sec U Count fast 4000 slow Reuters Hi = Prob in Reuters Filter ex =NY U Count 100 Hi = Prob in NY U Count 100 Filter ex = NY NY NA Alarm D=5 sec Alarm D=60 sec U Count fast 4000 slow ComStock Filter hi/lo Lo = Problem in Security In Reuters Hi = Prob in Comstock Filter hi/lo Lo = Prob in Security in Comstock Map Filter hi=1 Hi = Prob in NA Filter hi=1 Filter Alarm=1 Filter Alarm=1 Filter Alarm=1 Filter Alarm=1 Map

15 Inbound Processing  Never store the data!  Lower overhead  Lower latency

16 Inbound Processing in DBMSs  Triggers (glue-on)  Limited support  Often slow In theory, a DBMS could be both inbound and outbound, but this is a research project…. Hooking a query plan up to a stream is a start…..

17 Windowed Time Series Operators  Windowed time series operators –Group by stock_id –Window is 2 ticks –Slide by 1 tick  Resilient to stream imperfections –User-specified timeouts for late data

18 Alarm Correlation Application Alarm D=5 sec Alarm D=60 sec U Count fast 4000 slow Reuters Hi = Prob in Reuters Filter ex =NY U Count 100 Hi = Prob in NY U Count 100 Filter ex = NY NY NA Alarm D=5 sec Alarm D=60 sec U Count fast 4000 slow ComStock Filter hi/lo Lo = Problem in Security In Reuters Hi = Prob in Comstock Filter hi/lo Lo = Prob in Security in Comstock Map Filter hi=1 Hi = Prob in NA Filter hi=1 Filter Alarm=1 Filter Alarm=1 Filter Alarm=1 Filter Alarm=1 Map

19 Windowed Aggregates with Timeout in DBMSs  In the trigger system?  On stored data (polling)?

20 Integration of Application Logic  All required capabilities in single system –No process switches –Integrated storage (not client-server)

21 Integrated Code Count 100 same as Map F.evaluate: cnt++ if (cnt % 100 != 0) if !suppress emit lo-alarm else emit drop-alarm else emit hi-alarm, set suppress = true Lets first 100 low-alarms through. Emits one high-alarm for every 100 low-alarms. Suppresses low-alarms after 1 st high-alarm.

22 Application Integration in DBMSs  Client-server present for protection  Stored procedures are a start –tough to do control flow  Object-relational blades are better –But still tough to do control flow  Unified programming language never made it –E.g. Rigel or Pascal R  No support for embedded DBMS applications

23 Transactions in Streams  Locking –Critical sections are enough; no need for xacts  Crash recovery –Log-based recovery slow –doesn’t recover whole state –System unavailable during recovery  Much better to just do HA –Failover to a backup (Tandem-style) –Forget about state recovery

24 Net-Net  Inbound vs outbound processing  Windowed primitives vs end-of-table primitives  Separate app vs embedded app  HA failover vs transactions

25 Whenever These Matter a Lot  Separate engine  To get 2 orders of magnitude benefit

26 Candidates for a Separate Engine  OLTP  Warehouses  Stream processing  Sensor networks (TinyDB, etc.)  Text retrieval (Google, etc.)  Scientific data bases (lineage, arrays, etc.)

27 Obvious Research Template  Pick an area where “one size doesn’t fit”  And figure out what does

28 More Generally  Current system software factored into –App server (e.g. Websphere) –Messaging system (e.g, MQSeries) –DBMS (e.g. DB2)  Stream processing engines integrate pieces of all three –To avoid process switches  How many other interesting factorings are there?

29 High Level Stream Processing Bit  StreamSQL?  Rule engine?

30 Interesting Stream Issues  Morph from history to real time seamlessly  Replay  On the fly  Stream imperfections  Late  Missing  Out-of-order  Causality  Tick (symbol, volume, price, time)  Splits (symbol, factor) Produce the split-adjusted price!