Daniel Abadi, Yale University

* The Big Data phenomenon is the best thing that could have happened to the database community
* Despite other definitions related to ‘3 Vs’ --- Big Data means BIG Data
* Which means we need scalable database systems
* Still two main components of Big Data
* Performing data analysis at scale
* Performing requests on data at scale

* Database community has won the battle
* Some thought that MapReduce might replace traditional database technology as the primary means to perform analysis at scale
* Just about every MapReduce vendor has abandoned this goal
* Hadapt, Impala, Tez, and several others are in a race to see who can add the most traditional database execution technology to Hadoop fastest
* Everyone is going in the direction of cost-based optimizers, traditional database operators, and push-based query execution
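
The slide names push-based query execution without defining it. As a rough illustration only, the sketch below assumes the usual meaning: the source operator drives the pipeline and each operator hands rows to its consumer, instead of the consumer pulling rows from its child. All names here (`scan`, `make_filter`, `make_project`, `run_query`, the `id` and `amount` columns) are hypothetical and not taken from the talk or from any of the systems mentioned.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, int]
Sink = Callable[[Row], None]  # anything that accepts one pushed row


def scan(rows: Iterable[Row], consumer: Sink) -> None:
    """Source operator: drives execution by pushing every row downstream."""
    for row in rows:
        consumer(row)


def make_filter(predicate: Callable[[Row], bool], consumer: Sink) -> Sink:
    """Filter operator: forwards a pushed row only if the predicate holds."""
    def push(row: Row) -> None:
        if predicate(row):
            consumer(row)
    return push


def make_project(columns: List[str], consumer: Sink) -> Sink:
    """Projection operator: forwards only the requested columns."""
    def push(row: Row) -> None:
        consumer({name: row[name] for name in columns})
    return push


def run_query(rows: Iterable[Row]) -> List[Row]:
    """Hypothetical query: SELECT id FROM rows WHERE amount > 10."""
    results: List[Row] = []
    # The pipeline is assembled from the sink backwards; data then flows forwards.
    pipeline = make_filter(
        lambda row: row["amount"] > 10,
        make_project(["id"], results.append),
    )
    scan(rows, pipeline)
    return results
```

Calling `run_query` on a small list of rows returns the projected rows that pass the filter. A production engine would push batches or compiled code rather than one Python dictionary at a time; the sketch only shows the direction of control flow.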

* The database community is losing the battle
* NoSQL systems still have very little traditional database technology inside (despite adding SQL interfaces)
* No race to add DB technology --- why?
* Don’t blame CAP --- CAP is only relevant when there’s a network partition
* We never figured out how to do ACID and active replication at scale
* Many new proposals make simplifying assumptions in order to handle scale
* It’s been 30 years --- why can’t we build a distributed database that can handle distributed transactions over actively replicated data at scale?
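
The slide does not define active replication. As a minimal sketch, assume the common meaning: every replica executes every operation itself, in the same agreed order, so deterministic operations leave all replicas in the same state. The `Replica` class, the operation format, and the `replicate` and `demo` helpers are hypothetical names chosen for illustration. The sketch takes the agreed order as given and ignores failures and multi-key transactions, which are the parts the slide describes as unsolved at scale.

```python
from typing import Dict, List, Tuple

# Hypothetical operation format: (kind, key, amount), where kind is "set" or "add".
Op = Tuple[str, str, int]


class Replica:
    """One copy of the data that executes every operation itself."""

    def __init__(self) -> None:
        self.state: Dict[str, int] = {}
        self.applied = 0  # number of log entries executed so far

    def apply(self, op: Op) -> None:
        kind, key, amount = op
        if kind == "set":
            self.state[key] = amount
        elif kind == "add":
            self.state[key] = self.state.get(key, 0) + amount
        else:
            raise ValueError(f"unknown operation kind: {kind}")
        self.applied += 1


def replicate(log: List[Op], replicas: List[Replica]) -> None:
    """Deliver the same operations, in the same order, to every replica."""
    for op in log:
        for replica in replicas:
            replica.apply(op)


def demo() -> List[Dict[str, int]]:
    """Run a three-operation log against three replicas and return their states."""
    log: List[Op] = [("set", "x", 1), ("add", "x", 4), ("set", "y", 7)]
    replicas = [Replica() for _ in range(3)]
    replicate(log, replicas)
    return [replica.state for replica in replicas]
```

Because each operation is deterministic and the order is identical everywhere, the three states returned by `demo` are equal. Getting replicas to agree on that order under partitions and failures, while also supporting distributed transactions, is the difficulty the final bullet points at.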