History of Database Systems

Slides:



Advertisements
Similar presentations
Andy Pavlo April 13, 2015April 13, 2015April 13, 2015 NewS QL.
Advertisements

2 Proprietary & Confidential What is Sharding Benefits of Sharding Alternatives of Sharding When to start Sharding Agenda.
Data Management in the Cloud Paul Szerlip. The rise of data Think about this o For the past two decades, the largest generator of data was humans -- now.
Jennifer Widom NoSQL Systems Overview (as of November 2011 )
HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
NoSQL Database.
CS 405G: Introduction to Database Systems 24 NoSQL Reuse some slides of Jennifer Widom Chen Qian University of Kentucky.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
Database Design and Introduction to SQL
AN INTRODUCTION TO NOSQL DATABASES Karol Rástočný, Eduard Kuric.
SQL vs NOSQL Discussion
CS345: Advanced Databases Chris Ré. What this course is Database fundamentals: –Theory –Old Crusty, Good SQL stuff –No/New/Not-Yet SQL New stuff: Knowledge.
Goodbye rows and tables, hello documents and collections.
Modern Databases NoSQL and NewSQL Willem Visser RW334.
NoSQL Not Only SQL Edel Sherratt. What is NoSQL? Not Only SQL Large volumes of data No schema Partition tolerance – scale by adding more commodity servers.
Changwon Nati Univ. ISIE 2001 CSCI5708 NoSQL looks to become the database of the Internet By Lawrence Latif Wed Dec Nhu Nguyen and Phai Hoang CSCI.
Chapter 1 Introduction Yonsei University 1 st Semester, 2015 Sanghyun Park.
CSE 3330 Database Concepts MongoDB. Big Data Surge in “big data” Larger datasets frequently need to be stored in dbs Traditional relational db were not.
Lecture 8: Databases and Data Infrastructure CS 6071 Big Data Engineering, Architecture, and Security Fall 2015, Dr. Rozier.
Chapter 1 Introduction Yonsei University 1 st Semester, 2014 Sanghyun Park.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
NoSQL Systems Motivation. NoSQL: The Name  “SQL” = Traditional relational DBMS  Recognition over past decade or so: Not every data management/analysis.
NOSQL DATABASE Not Only SQL DATABASE
CPT-S Advanced Databases 11 Yinghui Wu EME 49.
NoSQL: Graph Databases. Databases Why NoSQL Databases?
Data and Information Systems Laboratory University of Illinois Urbana-Champaign Data Mining Meeting Mar, From SQL to NoSQL Xiao Yu Mar 2012.
Big Data Yuan Xue CS 292 Special topics on.
Group members: Phạm Hoàng Long Nguyễn Huy Hùng Lê Minh Hiếu Phan Thị Thanh Thảo Nguyễn Đức Trí 1 BIG DATA & NoSQL Topic 1:
BIG DATA/ Hadoop Interview Questions.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
NoSQL: Graph Databases
Neo4j: GRAPH DATABASE 27 March, 2017
CSCI5570 Large Scale Data Processing Systems
CS 405G: Introduction to Database Systems
NoSQL: Graph Databases
and Big Data Storage Systems
CSE 775 – Distributed Objects Bekir Turkkan & Habib Kaya
Database Introduction
CS122B: Projects in Databases and Web Applications Winter 2017
Introduction In the computing system (web and business applications), there are enormous data that comes out every day from the web. A large section of.
MongoDB Er. Shiva K. Shrestha ME Computer, NCIT
Database Management System
Dremel.
Modern Databases NoSQL and NewSQL
NOSQL.
Introduction to NewSQL
NOSQL databases and Big Data Storage Systems
New Mexico State University
A Comparison of SQL and NoSQL Databases
NoSQL Systems Overview (as of November 2011).
Massively Parallel Cloud Data Storage Systems
1 Demand of your DB is changing Presented By: Ashwani Kumar
A brief history of data and databases
NOSQL and CAP Theorem.
NoSQL Databases An Overview
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
11/18/2018 2:14 PM © Microsoft Corporation. All rights reserved. MICROSOFT MAKES NO WARRANTIES, EXPRESS, IMPLIED OR STATUTORY, AS TO THE INFORMATION IN.
NoSQL Databases Antonino Virgillito.
Overview of big data tools
NoSQL W2013 CSCI 2141.
Interpret the execution mode of SQL query in F1 Query paper
Chapter 1: Introduction
H-store: A high-performance, distributed main memory transaction processing system Robert Kallman, Hideaki Kimura, Jonathan Natkins, Andrew Pavlo, Alex.
April 13th – Semi-structured data
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Introduction to NoSQL Database Systems
Chapter 1: Introduction
NoSQL & Document Stores
Copyright © JanBask Training. All rights reserved Get Started with Hadoop Hive HiveQL Languages.
Presentation transcript:

History of Database Systems Two parts Jiannan Wang Simon Fraser University Jiannan @ SFU

Database Systems in a Half Century (1960s – 2010s) When What Early 1960 – Early 1970 The Navigational Database Empire Mid 1970 – Mid 1980 The Database World War I Mid 1980 – Early 2000 The Relational Database Empire Mid 2000 – Now The Database World War II References. https://en.wikipedia.org/wiki/Database#History What Goes Around Comes Around (Michael Stonebraker, Joe Hellerstein) 40 Years VLDB Panel Jiannan @ SFU

The Navigational Database Empire (Early1960 – Early 1970) Data Model How to organize data How to access data Navigational Data Model Organize data into a multi-dimensional space (i.e., A space of records) Access data by following pointers between records Inventor: Charles Bachman The 1973 ACM Turing Award Turing Lecture: “The Programmer As Navigator” Jiannan @ SFU

The Navigational Database Empire (Early1960 – Early 1970) Representative Navigational Database Systems Integrated Data Store (IDS), 1964, GE Information Management System (IMS), 1966, IBM Integrated Database Management System (IDMS), 1973, Goodrich CODASYL Short for “Conference/Committee on Data Systems Languages” Define navigational data model as standard database interface (1969) Jiannan @ SFU

The Birth of Relational Model Ted Codd Born in 1923 PHD in 1965 “A Relational Model of Data for Large Shared Data Banks” in 1970 Relational Model Organize data into a collection of relations Access data by a declarative language (i.e., tell me what you want, not how to find it) Data Independence Jiannan @ SFU

The Database World War I: Background One Slide (Navigational Model) Led by Charles Bachman (1973 ACM Turing Award) Has built mature systems Dominated the database market The other Slide (Relational Model) Led by Ted Codd (mathematical programmer, IBM) A theoretical paper with no system built Little support from IBM Jiannan @ SFU

The Database World War I: Three Big Campaigns 1. Which data model is better in theory? (Mid 1970) 2. Which data model is better in practice? (Late 1970 – Early 1980) 3. Which data model is better in business? (Early 1980 – Mid 1980) Jiannan @ SFU

The “Theory” Campaign A debate at ACM SIGFIDET (precursor of SIGMOD) 1974 Navigational model is bad Data Organization: So complex Data Access: No declarative language Relational model is bad Data Organization: A special case of navigational model Data Access: No system proof that declarative language is viable Jiannan @ SFU

The “Practice” Campaign The Big Question Can a relational database system perform as good as a navigational system? System prototypes Ingres at UC Berkeley (early and pioneering) System R at IBM (arguably got more stuff “right”) The System R Team Query Optimization (Patricia P. Griffiths et al.) SQL ( Donald D. Chamberlin et al.) Transaction (Jim Gray et al.) Jiannan @ SFU

The “Business” Campaign Commercialization of Relational Database Systems Not as easy as we thought Three reasons (required) that led relational database systems to won The minicomputer revolution (1977) Competing products (e.g. IDMS) could not be ported to the minicomputer Relational front end was not added to navigational database systems Jiannan @ SFU

What Can We Learn? The winning of theory The winning of practice Lesson 1. Lesson 2. Lesson 3. The winning of theory The winning of practice The winning of practice The winning of business Everyone can get a chance to win Jiannan @ SFU

The Relational Database Empire (Mid1980 – Early 2000) Parallel and distributed DBs (1980 – 1990) SystemR*, Distributed Ingres, Gamma, etc. Objected-oriented DB (1980 – 1990) Objects: Data/Code Integration Extensibility: User-defined functions, User-defined data types MySQL and PostgresSQL (1990s) Widely used open-source relational DB systems Jiannan @ SFU

Database Systems in a Half Century (1960s – 2010s) When What Early 1960 – Early 1970 The Navigational Database Empire Mid 1970 – Mid 1980 The Database World War I Mid 1980 – Early 2000 The Relational Database Empire Mid 2000 – Now The Database World War II Why do we talk about History? - Help you to learn new tools - Avoid you to reinvent wheels Navigational DB exists before relational DB Why relational DB can win the war Relational DB dominates the market and people think this is a solved problem The DB world War References. https://en.wikipedia.org/wiki/Database#History What Goes Around Comes Around (Michael Stonebraker, Joe Hellerstein) 40 Years VLDB Panel Jiannan @ SFU

The Database World War II: Background Internet Boom (Early 2000) Larger data volume that cannot be fit in a single machine Faster data updates that cannot be handled by a single machine Commercial distributed database systems are expensive  Open-source database systems do not support distributed computing well  Jiannan @ SFU

OLTP OLAP Workload Workload OnLine Transaction Processing OnLine Analytical Processing Workload High-frequent Updates + Small Queries Workload Low-frequent Updates + Big Queries Jiannan @ SFU

The Database World War II: Two Big Campaigns Which is better for (distributed) OLTP: NoSQL vs. Relational DBMS? Which is better for (distributed) OLAP: MapReduce vs. Relational DBMS? Jiannan @ SFU

The Database World War II: Two Big Campaigns Which is better for OLTP: NoSQL vs. Relational DBMS? Which is better for OLAP: MapReduce vs. Relational DBMS? Jiannan @ SFU

What Happened To OLTP? OldSQL (1970 – Now) NoSQL (2000 – Now) NewSQL (2010 – Now) Jiannan @ SFU

OldSQL (1970 – Now) Still very big market!!! Traditional SQL vendors Limitation 1: Not Scalable Limitation 2: Pre-defined Schema . . . Still very big market!!! Jiannan @ SFU

The advent of Web 2.0 Read-only Web  Read-write Web Highly Scalable Scale to1,000,000 users and 1000 servers Highly Available Available 24 hours a day, 7 days a week Highly Flexible Flexible schema and flexible data types Jiannan @ SFU

NoSQL Pioneers Memcached [Fitzpatrick 2004] In-memory indexes can be highly scalable BigTable [Chang et al. 2006] Persistent record storage could be scaled to thousands of nodes Dynamo [DeCandia et al. 2007] Eventual consistency allows for higher availability and scalability Jiannan @ SFU

NoSQL Categories NoSQL Data Model Example Systems Key-value Stores Hash DynamoDB, Riak, Redis, Membase Document Stores Json SimpleDB, CouchBase, MongoDB Wide-column Stores Big Table Hbase, Cassandra, HyperTable Graph Database Graph Neo4J, InfoGrid, GraphBase DynamoDB, Riak, Redis, Membase, SimpleDB, CouchBase, MongoDB, Hbase, Cassandra, HyperTable, Neo4J, InfoGrid, GraphBase, … Jiannan @ SFU

NoSQL Limitations Low-level Language Weak Consistency Simple read/write database operators Weak Consistency Eventual Consistency Lack of Standardization 100+ NoSQL systems Jiannan @ SFU

NewSQL + 90% query time spent on overhead Strong Consistency High Scalability + 90% query time spent on overhead Jiannan @ SFU

NewSQL Market Limitations Available but not highly available Scalable but not highly scalable Available but not highly available Flexible but not highly flexible Jiannan @ SFU

The Database World War II: Two Big Campaigns Which is better for OLTP: NoSQL vs. Relational DBMS? Which is better for OLAP: MapReduce vs. Relational DBMS? Jiannan @ SFU

How to store 1PB using 10,000 machines? GFS (HDFS) = 100GB *10,000 Machines How to store 1PB using 10,000 machines? GFS (HDFS) How to process 1PB using 10,000 machines? MapReduce Jiannan @ SFU

(Your program will be OK when failures happen) Why MapReduce? 1. Fault Tolerant (Your program will be OK when failures happen) # of machines Failure Probability 1 0.1% 10 0.9% 100 9.5% 1000 63.2% 10,000 ??? Cost for 5 nodes Failure Probability $100/day 0.1% $1/day 10% Reserved Instance Spot Instance 99.9% Jiannan @ SFU

Why MapReduce? 2. Complex Analytics 3. Heterogeneous Storage Systems SQL Machine Learning Graph Processing 2. Complex Analytics MapReduce 3. Heterogeneous Storage Systems Jiannan @ SFU

The Great MapReduce Debate (2008-2010) http://tiny.cc/mapreduce-debate Jiannan @ SFU

MapReduce vs. SQL SELECT Map(v) Map (k, v) FROM Table SELECT Reduce(v) GROUP BY k Reduce (k, v_list) Jiannan @ SFU

MapReduce: A Major Step Backwards 1. MapReduce is a step backwards in database access 2. MapReduce is a poor implementation 3. MapReduce is not novel 4. MapReduce is missing features 5. MapReduce is incompatible with the DBMS tools Dewitt, D. and Stonebraker, M. MapReduce: A Major Step Backwards blogpost; January 17, 2008 Jiannan @ SFU

Comments From The Other Side vs. MapReduce is a program model rather than a database system Jiannan @ SFU

From Stonebraker et al. From Dean and Ghemawat Jiannan @ SFU

What They Agree On? Fault Tolerant Complex Analytics Advantages of MapReduce: Fault Tolerant Complex Analytics Heterogeneous Storage Systems No Data Loading Requirement Both should Learn from Each Other Jiannan @ SFU

Who won the debate? Nobody is writing MapReduce code right now. Many new systems (e.g., Spark, HIVE) were built on MapReduce Jiannan @ SFU

Summary Why Relational Database? ( Mid 1970 – Now) Why NoSQL? (Mid 2000 – Now) Why MapReduce? (Mid 2000 – Now) Jiannan @ SFU