Column-Oriented Storage Techniques for MapReduce
Avrilia Floratou (University of Wisconsin – Madison), Jignesh M. Patel (University of Wisconsin – Madison), Eugene J. Shekita (while at IBM Almaden Research Center), Sandeep Tata (IBM Almaden Research Center)

Motivation
Parallel DBMSs offer performance; MapReduce offers ease of use and fault tolerance. Column-oriented storage is a key technique for bringing parallel-DBMS performance to MapReduce.

Challenges
How can columnar storage be incorporated into an existing MapReduce system (Hadoop) without changing its core parts? How can columnar storage operate efficiently on top of a distributed file system (HDFS)? Two problems are unique to MapReduce: the use of complex data types, which are common in many MapReduce jobs, and Hadoop's choice of Java as its default programming language.

Outline (addressing the problems above)
Column-Oriented Storage
Lazy Tuple Construction
Compression
Experimental Evaluation
Conclusions

Complex Data Types
The use of complex types causes two major problems: deserialization costs (switching to a binary storage format can improve Hadoop's scan performance by 3x) and the lack of effective column-oriented compression techniques (column-oriented storage formats tend to exhibit better compression ratios).

Hadoop's Choice of Java
Java objects require deserialization, and the overhead of deserializing and creating the objects corresponding to a complex type can be substantial. Lazy record construction mitigates this deserialization overhead in Hadoop.
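To make the overhead concrete, here is a minimal sketch (hypothetical classes, not from the paper's code) contrasting a record that eagerly materializes Java objects for a map-typed column with one that keeps the raw bytes and parses them only on first access:

    import java.util.HashMap;
    import java.util.Map;

    class EagerRecord {
        Map<String, String> info;

        EagerRecord(byte[] serialized) {
            this.info = parse(serialized); // always pays parsing and object-creation cost
        }

        static Map<String, String> parse(byte[] bytes) {
            Map<String, String> m = new HashMap<>();
            // ... decode key/value pairs from the serialized bytes ...
            return m;
        }
    }

    class DeferredRecord {
        private final byte[] serialized;
        private Map<String, String> info; // stays null until first accessed

        DeferredRecord(byte[] serialized) { this.serialized = serialized; }

        Map<String, String> getInfo() {
            if (info == null) info = EagerRecord.parse(serialized); // parse on demand
            return info;
        }
    }

If the map function never touches the info field, the deferred variant never pays for parsing it.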

Outline
Column-Oriented Storage, and its interaction with Hadoop's data replication and scheduling
Lazy Tuple Construction
Compression
Experimental Evaluation
Conclusions

Column-Oriented Storage in Hadoop
Implementing a column-oriented storage format raises two questions: how to generate roughly equal-sized splits so that the job can be effectively parallelized over the cluster, and how to make sure that corresponding values from different columns in the dataset are co-located on the same node. HDFS does not provide any co-location guarantees.

Consider a dataset with three columns c1, c2, c3 stored in three different files. The files are randomly spread over the cluster, so remote accesses will occur. We will introduce a new format to avoid these problems.

Row-Store: Merits/Limits with MapReduce
Merits: data loading is fast (no additional processing); all columns of a data row are located in the same HDFS block.
Limits: not all columns are used (unnecessary storage bandwidth); compression across different types may add additional overhead.
(One slide of Professor Xiaodong Zhang's was used as a reference.)

Column-Store: Merits/Limits with MapReduce
Merits: unnecessary I/O costs can be avoided, since only the needed columns are loaded, and compression is easy.
Limits: column grouping requires additional network transfers.

Read Operation in Row-Store
Read local rows concurrently; discard unneeded columns.

Read Operation in Column-Store (figure; this slide is by Professor Zhang)
The RCFile format avoids the problems that occur in both row-store and column-store.

Goals of RCFile
Eliminate unnecessary I/O costs, like column-store: only read the needed columns from disk.
Eliminate network costs in row construction, like row-store.
Keep the fast data loading speed of row-store.
Apply efficient data compression algorithms conveniently, like column-store.
In short: eliminate all the limits of row-store and column-store.

RCFile Format
A fast and space-efficient placement structure. Each row group consists of:
A special synchronization marker.
Metadata, describing the columns in the data region, their starting offsets, and the number of rows in the data region.
A data region that packs all the columns, laid out in a column-oriented fashion.
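A minimal sketch of this layout, assuming hypothetical marker bytes and a simplified metadata header (the real RCFile writer records more, such as compressed column lengths):

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.List;

    class RowGroupWriter {
        static final byte[] SYNC = {0x52, 0x43, 0x31, 0x0A}; // hypothetical sync marker

        // columns.get(c) holds the already-serialized values of column c
        // for this row group.
        static byte[] writeRowGroup(List<byte[]> columns, int numRows) throws IOException {
            ByteArrayOutputStream buf = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(buf);
            out.write(SYNC);           // synchronization marker
            out.writeInt(numRows);     // metadata: number of rows in the data region
            int offset = 0;
            for (byte[] col : columns) {
                out.writeInt(offset);  // metadata: starting offset of each column
                offset += col.length;
            }
            for (byte[] col : columns) {
                out.write(col);        // data region: one whole column after another
            }
            return buf.toByteArray();
        }
    }

A reader that needs only some columns can seek directly to their offsets within the row group and ignore the rest.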

RCFile: Partitioning a Table into Row Groups (figure)

Inside a Row Group (figure)

RCFile: Inside Each Row Group (figure)

RCFile: Distributing Row-Group Data among Nodes (figure)

Optimizing RCFile
Main disadvantages:
Tuning the row-group size is critical.
Extra metadata must be written for each row group, leading to additional space overhead.
Adding a column to a dataset is expensive: the entire dataset has to be read and each block rewritten.

CIF Storage Format
When a dataset is loaded, it is broken into smaller partitions, each referred to as a split-directory. Each partition contains a set of files, one per column in the dataset; an additional file describing the schema is also kept in each split-directory. The question remains: how do we guarantee co-location of the columns of a row? A sketch of the layout follows.
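A minimal sketch of the split-directory layout, using hypothetical names and the local filesystem for brevity (the real implementation writes to HDFS):

    import java.io.IOException;
    import java.nio.charset.StandardCharsets;
    import java.nio.file.Files;
    import java.nio.file.Path;

    class CifLoader {
        // Writes one partition: a split-directory holding a schema file
        // and one file per column.
        static void writeSplit(Path baseDir, int splitId,
                               String[] columnNames, byte[][] columnData,
                               String schema) throws IOException {
            Path split = baseDir.resolve("split-" + splitId); // the split-directory
            Files.createDirectories(split);
            Files.write(split.resolve("schema"),              // schema descriptor
                        schema.getBytes(StandardCharsets.UTF_8));
            for (int c = 0; c < columnNames.length; c++) {    // one file per column
                Files.write(split.resolve(columnNames[c] + ".col"), columnData[c]);
            }
        }
    }

On plain HDFS, nothing forces the per-column files of one split onto the same node, which is exactly the co-location problem the next slide addresses.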

Column Placement Policy (CPP)
CPP is a new HDFS block placement policy that solves the co-location problem: it guarantees that the files corresponding to the different columns of a split are always co-located across replicas. HDFS allows its placement policy to be changed by setting the configuration property "dfs.block.replicator.classname" to point to the appropriate class.
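For illustration, this is how a deployment might point HDFS at such a policy; the property name is taken from the slide, while the class name is hypothetical:

    import org.apache.hadoop.conf.Configuration;

    class ConfigureCpp {
        public static void main(String[] args) {
            Configuration conf = new Configuration();
            // Hypothetical policy class; the NameNode instantiates whatever
            // class this property names instead of the default placement policy.
            conf.set("dfs.block.replicator.classname",
                     "com.example.cif.ColumnPlacementPolicy");
        }
    }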

Column-Oriented Storage in CIF Format in Hadoop (figure)
The example dataset has columns Name, Age, Info:
Joe, 23, "hobbies": {tennis}, "friends": {Ann, Nick}
David, 32, "friends": {George}
John, 45, "hobbies": {tennis, golf}
Smith, 65, "hobbies": {swimming}, "friends": {Helen}
The first two rows form one split stored on the 1st node, the last two rows a second split stored on the 2nd node. Within each split, the Name, Age, and Info values are stored in separate column files.

Replication and Co-location (figure; perhaps made by the authors)
Under the default HDFS replication policy, the Name, Age, and Info column files of a split are replicated to arbitrary nodes (A, B, C, D), scattering the replicas of a split's columns. Under CPP, all column files of a split are co-located on the same nodes across replicas.

Outline
Column-Oriented Storage
Lazy Tuple Construction, used to mitigate the deserialization overhead in Hadoop as well as eliminate disk I/O
Compression
Experiments
Conclusions

Implementation
The basic idea: deserialize only those columns of a record that are actually accessed in the map function. We use one class, called LazyRecord, with two pointers:
curPos: keeps track of the current record the map function is working on.
lastPos: keeps track of the last record that was actually read and deserialized for a particular column file.
A minimal sketch of this bookkeeping follows.
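A per-column sketch of the curPos/lastPos bookkeeping behind LazyRecord, with assumed names (ColumnFileReader is hypothetical):

    // Assumed per-column reader interface.
    interface ColumnFileReader {
        void skip(long numRecords);   // advance past records without deserializing
        Object readAndDeserialize();  // read and deserialize the record at the cursor
    }

    class LazyColumn {
        private final ColumnFileReader reader;
        private long curPos = 0;  // record the map function is currently on
        private long lastPos = 0; // record the column file's cursor points at

        LazyColumn(ColumnFileReader reader) { this.reader = reader; }

        // Called once per record by the RecordReader; performs no I/O.
        void next() { curPos++; }

        // I/O and deserialization happen only here, on demand.
        Object get() {
            reader.skip(curPos - lastPos); // jump over records that were never accessed
            Object value = reader.readAndDeserialize();
            lastPos = curPos + 1;          // cursor now sits just past the current record
            return value;
        }
    }

Columns whose get() is never called contribute no deserialization work and, with the skip list format described later, almost no disk I/O.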

class MyMapper {
  void map(NullWritable key, Record rec) {
    // get() deserializes the url column only for this record
    String url = (String) rec.get("url");
    if (url.contains("ibm.com/jp"))
      // the metadata column is touched only for matching records
      output.collect(null, rec.get("metadata").get("content-type"));
  }
}

Each time the RecordReader is asked to read the next record, it increments curPos. No bytes are actually read or deserialized until one of the get() methods is called on the resulting Record object.

Example (figure)
A map method reads the Age and Name columns: if (age < 35) return name. For a record whose age is 35 or more, no bytes of the Name column are read; we avoid reading and deserializing the name field entirely.

Skip List Format
A skip list format can be used within each column file to efficiently skip records. A column file contains two kinds of values: regular serialized values and skip blocks. Skip blocks contain information about byte offsets that enables skipping the next N records. The skip() method is called by LazyRecord as skip(curPos - lastPos).
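A sketch of how skip() might consume this format, assuming a layout in which a skip block precedes every N records and each value is length-prefixed; the actual CIF encoding may differ:

    import java.io.DataInputStream;
    import java.io.IOException;

    class SkipListColumnReader {
        static final int N = 10;      // records covered by each skip block (assumed)
        private final DataInputStream in;
        private long recordIndex = 0; // index of the next record in the stream

        SkipListColumnReader(DataInputStream in) { this.in = in; }

        void skip(long numRecords) throws IOException {
            while (numRecords > 0) {
                if (recordIndex % N == 0 && numRecords >= N) {
                    long byteLen = in.readLong(); // skip block: bytes in next N records
                    in.skipBytes((int) byteLen);  // jump over N records at once
                    recordIndex += N;
                    numRecords -= N;
                } else {
                    if (recordIndex % N == 0) in.readLong(); // consume skip-block entry
                    int valueLen = in.readInt();  // each value is length-prefixed (assumed)
                    in.skipBytes(valueLen);
                    recordIndex++;
                    numRecords--;
                }
            }
        }
    }

Whole runs of unread records cost one readLong and one seek instead of per-record reads.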

Example (figure)
The Info column file interleaves skip blocks with serialized values: a skip-100 block covers the next 100 rows, and skip-10 blocks (e.g., skip10 = 2013, skip10 = 1246, giving byte offsets) each cover the next 10 rows. For the query if (age < 35) return hobbies, runs of Info values belonging to non-matching records can be skipped wholesale.

Outline
Column-Oriented Storage
Lazy Record Construction
Compression: we propose two schemes to compress columns of complex types, both amenable to lazy decompression
Experiments
Conclusions

Compressed Blocks
This scheme compresses a block of contiguous column values. The compressed block size is set at load time and affects both the compression ratio and the decompression overhead. A header indicates the number of records in a compressed block and the block's size.
Advantage: a whole block can be skipped if none of its values are accessed.
Disadvantage: if any value in the block is accessed, the entire block needs to be decompressed.
A sketch of the block layout follows.
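A sketch of writing one such block, assuming the header layout [record count][compressed size] and DEFLATE compression; the actual CIF format may differ:

    import java.io.ByteArrayOutputStream;
    import java.io.DataOutputStream;
    import java.io.IOException;
    import java.util.zip.DeflaterOutputStream;

    class CompressedBlockWriter {
        // Returns [numRecords][compressedSize][compressed bytes] for one block.
        static byte[] writeBlock(byte[] serializedValues, int numRecords) throws IOException {
            ByteArrayOutputStream compressed = new ByteArrayOutputStream();
            DeflaterOutputStream deflate = new DeflaterOutputStream(compressed);
            deflate.write(serializedValues);
            deflate.close();                  // flushes and finishes the deflater

            ByteArrayOutputStream block = new ByteArrayOutputStream();
            DataOutputStream out = new DataOutputStream(block);
            out.writeInt(numRecords);         // header: records in this block
            out.writeInt(compressed.size());  // header: compressed size in bytes
            compressed.writeTo(block);        // payload
            return block.toByteArray();
        }
    }

A reader that skips the block simply reads the two header fields and advances past compressedSize bytes without decompressing anything.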

Dictionary Compressed Skip List
This scheme is tailored for map-typed columns: build a dictionary of keys for each block of map values, and store the compressed keys in a map using a skip list format.
Advantage: a value can be accessed without having to decompress an entire block of values.
Disadvantage: compared to compressed blocks, it provides a worse compression ratio, though with lower CPU overhead for decompression.
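A sketch of the per-block key dictionary, with an assumed integer-id encoding (the actual CIF encoding may differ):

    import java.util.ArrayList;
    import java.util.HashMap;
    import java.util.List;
    import java.util.Map;

    class KeyDictionary {
        private final Map<String, Integer> idOf = new HashMap<>();
        private final List<String> keyOf = new ArrayList<>();

        // Built while loading a block: each distinct map key gets a small id,
        // and the id is stored in place of the repeated key string.
        int encode(String key) {
            Integer id = idOf.get(key);
            if (id == null) {
                id = keyOf.size();
                idOf.put(key, id);
                keyOf.add(key);
            }
            return id;
        }

        // Decoding one value needs only the dictionary, not the whole block.
        String decode(int id) { return keyOf.get(id); }
    }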

Outline
Column-Oriented Storage
Lazy Record Construction
Compression
Experiments
Conclusions

Experimental Setup
42-node cluster. Each node: 8 cores, 32 GB main memory, five 500 GB SATA 1.0 disks. Network: 1 Gbit Ethernet switch. Hadoop version:

Overhead of Columnar Storage (figure: scan time, single-node experiment)
Synthetic dataset: 57 GB, 13 columns (6 integers, 6 strings, 1 map). Query: SELECT *. Using a binary format can dramatically improve Hadoop's performance.

Benefits of Column-Oriented Storage (figure; single-node experiment)
Query: projection of different columns. CIF reads much less data than SEQ (the row-oriented SequenceFile format), which leads to the speedup; however, gathering data from columns stored in different files incurs additional seeks.

Conclusions
We described a new column-oriented binary storage format for MapReduce, introduced the skip list layout, described the implementation of lazy record construction, and showed that lightweight dictionary compression for complex columns can be beneficial.
