2014.11.25- SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School of Information IS 257: Database Management.

Slides:



Advertisements
Similar presentations
Andy Pavlo April 13, 2015April 13, 2015April 13, 2015 NewS QL.
Advertisements

CS525: Special Topics in DBs Large-Scale Data Management MapReduce High-Level Langauges Spring 2013 WPI, Mohamed Eltabakh 1.
HadoopDB Inneke Ponet.  Introduction  Technologies for data analysis  HadoopDB  Desired properties  Layers of HadoopDB  HadoopDB Components.
BigData Tools Seyyed mohammad Razavi. Outline  Introduction  Hbase  Cassandra  Spark  Acumulo  Blur  MongoDB  Hive  Giraph  Pig.
Paula Ta-Shma, IBM Haifa Research 1 “Advanced Topics on Storage Systems” - Spring 2013, Tel-Aviv University Big Data and.
HBase Presented by Chintamani Siddeshwar Swathi Selvavinayakam
Hive: A data warehouse on Hadoop
NoSQL and NewSQL Justin DeBrabant CIS Advanced Systems - Fall 2013.
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark 2.
Daniel Abadi Yale University. * The Big Data phenomenon is the best thing that could have happened to the database community * Despite other definitions.
+ Hbase: Hadoop Database B. Ramamurthy. + Motivation-1 HDFS itself is “big” Why do we need “hbase” that is bigger and more complex? Word count, web logs.
Raghav Ayyamani. Copyright Ellis Horowitz, Why Another Data Warehousing System? Problem : Data, data and more data Several TBs of data everyday.
Hive – A Warehousing Solution Over a Map-Reduce Framework Presented by: Atul Bohara Feb 18, 2014.
SQL on Hadoop. Todays agenda Introduction Hive – the first SQL approach Data ingestion and data formats Impala – MPP SQL.
Massively Parallel Cloud Data Storage Systems S. Sudarshan IIT Bombay.
Hive: A data warehouse on Hadoop Based on Facebook Team’s paperon Facebook Team’s paper 8/18/20151.
HADOOP ADMIN: Session -2
Distributed Data Stores and No SQL Databases S. Sudarshan IIT Bombay.
Facebook (stylized facebook) is a Social Networking System and website launched in February 2004, operated and privately owned by Facebook, Inc. As.
The Multiple Uses of HBase Jean-Daniel Cryans, DB Berlin Buzzwords, Germany, June 7 th,
Hive : A Petabyte Scale Data Warehouse Using Hadoop
CS525: Special Topics in DBs Large-Scale Data Management Hadoop/MapReduce Computing Paradigm Spring 2013 WPI, Mohamed Eltabakh 1.
Distributed Data Stores and No SQL Databases S. Sudarshan Perry Hoekstra (Perficient) with slides pinched from various sources such as Perry Hoekstra (Perficient)
HBase A column-centered database 1. Overview An Apache project Influenced by Google’s BigTable Built on Hadoop ▫A distributed file system ▫Supports Map-Reduce.
Presented by CH.Anusha.  Apache Hadoop framework  HDFS and MapReduce  Hadoop distributed file system  JobTracker and TaskTracker  Apache Hadoop NextGen.
SLIDE 1IS 257 – Fall 2014 MapReduce, HBase, Pig and Hive University of California, Berkeley School of Information IS 257: Database Management.
© , OrangeScape Technologies Limited. Confidential 1 Write Once. Cloud Anywhere. Building Highly Scalable Web applications BASE gives way to ACID.
Goodbye rows and tables, hello documents and collections.
Distributed Indexing of Web Scale Datasets for the Cloud {ikons, eangelou, Computing Systems Laboratory School of Electrical.
Hadoop tutorials. Todays agenda Hadoop Introduction and Architecture Hadoop Distributed File System MapReduce Spark Cluster Monitoring 2.
Hadoop/MapReduce Computing Paradigm 1 Shirish Agale.
Introduction to Hadoop and HDFS
Modern Databases NoSQL and NewSQL Willem Visser RW334.
Hive Facebook 2009.
1 Yasin N. Silva Arizona State University This work is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 4.0 International License.
Storage and Analysis of Tera-scale Data : 2 of Database Class 11/24/09
Introduction to Hadoop Programming Bryon Gill, Pittsburgh Supercomputing Center.
An Introduction to HDInsight June 27 th,
+ Hbase: Hadoop Database B. Ramamurthy. + Motivation-0 Think about the goal of a typical application today and the data characteristics Application trend:
A NoSQL Database - Hive Dania Abed Rabbou.
Data and SQL on Hadoop. Cloudera Image for hands-on Installation instruction – 2.
SLIDE 1IS 240 – Spring 2013 MapReduce, HBase, and Hive University of California, Berkeley School of Information IS 257: Database Management.
Hive. What is Hive? Data warehousing layer on top of Hadoop – table abstractions SQL-like language (HiveQL) for “batch” data processing SQL is translated.
CS525: Big Data Analytics MapReduce Computing Paradigm & Apache Hadoop Open Source Fall 2013 Elke A. Rundensteiner 1.
IBM Research ® © 2007 IBM Corporation A Brief Overview of Hadoop Eco-System.
NoSQL Or Peles. What is NoSQL A collection of various technologies meant to work around RDBMS limitations (mostly performance) Not much of a definition...
Nov 2006 Google released the paper on BigTable.
NOSQL DATABASE Not Only SQL DATABASE
Hadoop/MapReduce Computing Paradigm 1 CS525: Special Topics in DBs Large-Scale Data Management Presented By Kelly Technologies
Cloudera Kudu Introduction
1 HBASE – THE SCALABLE DATA STORE An Introduction to HBase XLDB Europe Workshop 2013: CERN, Geneva James Kinley EMEA Solutions Architect, Cloudera.
CMU SCS KDD '09Faloutsos, Miller, Tsourakakis P8-1 Large Graph Mining: Power Tools and a Practitioner’s guide Task 8: hadoop and Tera/Peta byte graphs.
Learn. Hadoop Online training course is designed to enhance your knowledge and skills to become a successful Hadoop developer and In-depth knowledge of.
BIG DATA. Big Data: A definition Big data is a collection of data sets so large and complex that it becomes difficult to process using on-hand database.
BIG DATA/ Hadoop Interview Questions.
B ig D ata Analysis for Page Ranking using Map/Reduce R.Renuka, R.Vidhya Priya, III B.Sc., IT, The S.F.R.College for Women, Sivakasi.
1 Gaurav Kohli Xebia Breaking with DBMS and Dating with Relational Hbase.
Image taken from: slideshare
CSCI5570 Large Scale Data Processing Systems
How did it start? • At Google • • • • Lots of semi structured data
INTRODUCTION TO PIG, HIVE, HBASE and ZOOKEEPER
CS122B: Projects in Databases and Web Applications Winter 2017
MapReduce, HBase, Pig and Hive
Operational & Analytical Database
Modern Databases NoSQL and NewSQL
Massively Parallel Cloud Data Storage Systems
Introduction to PIG, HIVE, HBASE & ZOOKEEPER
Overview of big data tools
Cloud Computing for Data Analysis Pig|Hive|Hbase|Zookeeper
Pig Hive HBase Zookeeper
Presentation transcript:

SLIDE 1IS 257 – Fall 2014 NewSQL and VoltDB University of California, Berkeley School of Information IS 257: Database Management

SLIDE 2 Project Presentation Sign up at IS 257 – Fall 2014

SLIDE 3 History of the World, Part 1 Relational Databases – mainstay of business Web-based applications caused spikes –Especially true for public-facing e-Commerce sites Developers begin to front RDBMS with memcache or integrate other caching mechanisms within the application (ie. Ehcache)

SLIDE 4 Scaling Up Issues with scaling up when the dataset is just too big RDBMS were not designed to be distributed Began to look at multi-node database solutions Known as ‘scaling out’ or ‘horizontal scaling’ Different approaches include: –Master-slave –Sharding

SLIDE 5 Scaling RDBMS – Master/Slave Master-Slave –All writes are written to the master. All reads performed against the replicated slave databases –Critical reads may be incorrect as writes may not have been propagated down –Large data sets can pose problems as master needs to duplicate data to slaves

SLIDE 6 Scaling RDBMS - Sharding Partition or sharding –Scales well for both reads and writes –Not transparent, application needs to be partition-aware –Can no longer have relationships/joins across partitions –Loss of referential integrity across shards

SLIDE 7 Other ways to scale RDBMS Multi-Master replication INSERT only, not UPDATES/DELETES No JOINs, thereby reducing query time –This involves de-normalizing data In-memory databases

SLIDE 8 NoSQL NoSQL databases adopted these approaches to scaling, but lacked ACID transaction and SQL At the same time, many Web-based services needed to deal with Big Data (the Three V’s we looked at last time) and created custom approaches to do this In particular, MapReduce… IS 257 – Fall 2014

SLIDE 9 MapReduce and Hadoop MapReduce developed at Google MapReduce implemented in Nutch –Doug Cutting at Yahoo! –Became Hadoop (named for Doug’s child’s stuffed elephant toy) IS 257 – Fall 2014

SLIDE 10 Example Page 1: the weather is good Page 2: today is good Page 3: good weather is good. From “MapReduce: Simplified data Processing… ”, Jeffrey Dean and Sanjay Ghemawat IS 257 – Fall 2014

SLIDE 11 Map output Worker 1: –(the 1), (weather 1), (is 1), (good 1). Worker 2: –(today 1), (is 1), (good 1). Worker 3: –(good 1), (weather 1), (is 1), (good 1). From “MapReduce: Simplified data Processing… ”, Jeffrey Dean and Sanjay Ghemawat IS 257 – Fall 2014

SLIDE 12 Reduce Input Worker 1: –(the 1) Worker 2: –(is 1), (is 1), (is 1) Worker 3: –(weather 1), (weather 1) Worker 4: –(today 1) Worker 5: –(good 1), (good 1), (good 1), (good 1) From “MapReduce: Simplified data Processing… ”, Jeffrey Dean and Sanjay Ghemawat IS 257 – Fall 2014

SLIDE 13 Reduce Output Worker 1: –(the 1) Worker 2: –(is 3) Worker 3: –(weather 2) Worker 4: –(today 1) Worker 5: –(good 4) From “MapReduce: Simplified data Processing… ”, Jeffrey Dean and Sanjay Ghemawat IS 257 – Fall 2014

SLIDE 14 But – Raw Hadoop means code Most people don’t want to write code if they don’t have to Various tools layered on top of Hadoop give different, and more familiar, interfaces Hbase – intended to be a NoSQL database abstraction for Hadoop Hive and it’s SQL-like language IS 257 – Fall 2014

SLIDE 15 Introduction to Pig PIG – A data-flow language for MapReduce

SLIDE 16 Pig Latin Data flow language –User specifies a sequence of operations to process data –More control on the processing, compared with declarative language Various data types are supported ”Schema”s are supported User-defined functions are supported 16

SLIDE 17 Motivation by Example Suppose we have user data in one file, website data in another file. We need to find the top 5 most visited pages by users aged

SLIDE 18 Hive - SQL on top of Hadoop IS 257 – Fall 2014

SLIDE 19 Hive A database/data warehouse on top of Hadoop –Rich data types (structs, lists and maps) –Efficient implementations of SQL filters, joins and group-by’s on top of map reduce Allow users to access Hive data without using Hive Link: – /trunk/ IS 257 – Fall 2014

SLIDE 20 Hive Architecture HDFS Hive CLI DDL Queries Browsing Map Reduce SerDe ThriftJuteJSON Thrift API MetaStore Web UI Mgmt, etc Hive QL PlannerExecutionParser Planner IS 257 – Fall 2014

SLIDE 21 Overview Review –Big Data Technologies –Hadoop, Pig, Hive… NewSQL –The concept –VoltDB IS 257 – Fall 2014

SLIDE 22 Spark One problem with Hadoop/MapReduce is that it is fundamental batch oriented, and everything goes through a read/write on HDFS for every step in a dataflow Spark was developed to leverage the main memory of distributed clusters and to, whenever possible, use only memory-to- memory data movement (with other optimizations Can give up to 100fold speedup over MR IS 257 – Fall 2014

SLIDE 23 Meanwhile… The database community continues to develop new approaches, some of which try to provide the benefits of SQL and ACID relations to big data As we have seen before there is a broad spectrum of database systems and capabilities… IS 257 – Fall 2014

SLIDE 24IS 257 – Fall 2014

SLIDE 25 NewSQL NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional RDBMS IS 257 – Fall 2014

SLIDE 26 NewSQL NewSQL systems focus on workloads that have large numbers of transactions that are: –Short duration –Touch a small fraction of data in the DB using index lookups (i.e., no full table scans or large distributed joins) –Repetitive (i.e. executing the same queries with different inputs) IS 257 – Fall 2014

SLIDE 27 NewSQL There are many different architectures for NewSQL DBs –E.g., Google Spanner, SAP HANA, Clustrix, VoltDB, etc. We are going to look at just one example (VoltDB) and see how it works for very high-velocity workloads.. IS 257 – Fall 2014