Avrilia Floratou (University of Wisconsin – Madison) Jignesh M. Patel (University of Wisconsin – Madison) Eugene J. Shekita (While at IBM Almaden Research.

Presentation transcript:

Column-Oriented Storage Techniques for MapReduce
Avrilia Floratou (University of Wisconsin – Madison), Jignesh M. Patel (University of Wisconsin – Madison), Eugene J. Shekita (while at IBM Almaden Research Center), Sandeep Tata (IBM Almaden Research Center)
Presented by: Luyang Zhang and Yuguan Li

Motivation
Databases offer performance; MapReduce offers programmability and fault tolerance. Column-oriented storage is a way to bring database-style performance to MapReduce.

Column-Oriented Storage
Benefits:
- Column-oriented organizations are more efficient when an aggregate needs to be computed over many rows but only for a notably smaller subset of all columns.
- Column-oriented organizations are more efficient when new values of a column are supplied for all rows at once.
- Column data is of uniform type, which provides opportunities for storage size optimization (e.g., compression).
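
To make the first benefit concrete, here is a minimal Java sketch (not from the paper) that computes SUM(age) over the four-row example dataset used later in these slides. It runs once against a row-oriented byte layout and once against a separate age column. The class and method names are hypothetical, and the byte format (DataOutputStream fields written back to back) is an assumption chosen for simplicity.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch: SUM(age) over the example rows, stored row-wise vs. column-wise.
class RowVsColumnScan {
    static final String[] NAMES = {"Joe", "David", "John", "Smith"};
    static final int[] AGES = {23, 32, 45, 65};
    static final String[] INFOS = {
        "{hobbies: {tennis}, friends: {Ann, Nick}}",
        "{friends: {George}}",
        "{hobbies: {tennis, golf}}",
        "{hobbies: {swimming}, friends: {Helen}}"
    };

    // Row layout: one file holding (name, age, info) for each record in turn.
    static byte[] writeRowFile() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (int i = 0; i < AGES.length; i++) {
                out.writeUTF(NAMES[i]);
                out.writeInt(AGES[i]);
                out.writeUTF(INFOS[i]);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Column layout: the age column on its own, as consecutive 4-byte ints.
    static byte[] writeAgeColumnFile() {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (int age : AGES) {
                out.writeInt(age);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Row file: every name and info value must be read just to reach the next age.
    static long sumAgesFromRowFile(byte[] file) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        long sum = 0;
        try {
            while (in.available() > 0) {
                in.readUTF();          // name: not needed, but still read
                sum += in.readInt();   // age
                in.readUTF();          // info: not needed, but still read
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    // Column file: only the bytes of the age column are read.
    static long sumAgesFromColumnFile(byte[] file) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(file));
        long sum = 0;
        try {
            while (in.available() > 0) {
                sum += in.readInt();
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return sum;
    }

    public static void main(String[] args) {
        byte[] rowFile = writeRowFile();
        byte[] ageFile = writeAgeColumnFile();
        System.out.println("row layout:    sum=" + sumAgesFromRowFile(rowFile) + ", bytes read=" + rowFile.length);
        System.out.println("column layout: sum=" + sumAgesFromColumnFile(ageFile) + ", bytes read=" + ageFile.length);
    }
}
```

Both scans return 165. The row layout must read the name and info bytes even though the aggregate never uses them. The column layout reads only the 16 bytes of the age column.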

Questions
- How can columnar storage be incorporated into an existing MR system (Hadoop) without changing its core parts?
- How can columnar storage operate efficiently on top of a DFS (HDFS)?
- Is it easy to apply well-studied techniques from the database field to the MapReduce framework, given that:
  - it processes one tuple at a time,
  - it does not use a restricted set of operators, and
  - it is used to process complex data types?

Challenges
In Hadoop, it is often convenient to model data with complex types such as arrays, maps, and nested records. This leads to a high deserialization cost and a lack of effective column-oriented compression techniques.
- Serialization: in-memory data structure → bytes that can be transmitted
- Deserialization: bytes → in-memory data structure
Because Hadoop is written in Java, deserialization is more complex than in C++.
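
The serialization/deserialization round trip described above can be sketched in Java as follows. This is an illustration under stated assumptions, not Hadoop's Writable machinery. The Person class and the byte format are hypothetical. The point is that deserializing one record with a nested map allocates a new String for every name, key, and value, plus a map and one list per key.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: in-memory record <-> bytes for a record with a complex (map of lists) field.
class SerDeSketch {
    static final class Person {
        final String name;
        final int age;
        final Map<String, List<String>> info; // complex type, e.g. hobbies -> [tennis]
        Person(String name, int age, Map<String, List<String>> info) {
            this.name = name;
            this.age = age;
            this.info = info;
        }
    }

    // Serialization: in-memory record -> bytes.
    static byte[] serialize(Person p) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            out.writeUTF(p.name);
            out.writeInt(p.age);
            out.writeInt(p.info.size());
            for (Map.Entry<String, List<String>> entry : p.info.entrySet()) {
                out.writeUTF(entry.getKey());
                out.writeInt(entry.getValue().size());
                for (String v : entry.getValue()) {
                    out.writeUTF(v);
                }
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Deserialization: bytes -> new objects (Strings, a map, and a list per key).
    static Person deserialize(byte[] data) {
        DataInputStream in = new DataInputStream(new ByteArrayInputStream(data));
        try {
            String name = in.readUTF();
            int age = in.readInt();
            int entries = in.readInt();
            Map<String, List<String>> info = new LinkedHashMap<>();
            for (int i = 0; i < entries; i++) {
                String key = in.readUTF();
                int n = in.readInt();
                List<String> values = new ArrayList<>(n);
                for (int j = 0; j < n; j++) {
                    values.add(in.readUTF());
                }
                info.put(key, values);
            }
            return new Person(name, age, info);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        Map<String, List<String>> info = new LinkedHashMap<>();
        info.put("hobbies", List.of("tennis"));
        info.put("friends", List.of("Ann", "Nick"));
        Person back = deserialize(serialize(new Person("Joe", 23, info)));
        System.out.println(back.name + " " + back.age + " " + back.info);
    }
}
```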

Challenges
Compression: although column data tends to be more similar and therefore compresses well, complex types prevent some existing compression techniques from being applied in Hadoop.
Programming API: some techniques are not feasible for hand-coded map and reduce functions.

Outline
- Column-Oriented Storage
- Lazy Tuple Construction
- Compression
- Experimental Evaluation
- Conclusions

Column-Oriented Storage in Hadoop
Main idea: store each column of the dataset in a separate file.
Problems:
- How can we generate roughly equal-sized splits so that a job can be effectively parallelized over the cluster?
- How do we make sure that the corresponding values from different columns of the dataset are co-located on the same node running the map task?

Column-Oriented Storage in Hadoop
Example dataset (Name, Age, Info):
Joe, 23, hobbies: {tennis}, friends: {Ann, Nick}
David, 32, friends: {George}
John, 45, hobbies: {tennis, golf}
Smith, 65, hobbies: {swimming}, friends: {Helen}
The dataset (/data/…/) is horizontally partitioned into split-directories:
/data/…/s1 (1st node): Joe, David
/data/…/s2 (2nd node): John, Smith
Within each split-directory, the Name, Age, and Info columns are stored in separate files.
Introduce a new InputFormat/OutputFormat: ColumnInputFormat (CIF) and ColumnOutputFormat (COF).
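
Here is a minimal sketch of the two-step layout above: rows are first split horizontally, then each split is stored column by column. Files are modeled as an in-memory map from "split-directory/column" to values. The s1/s2 naming follows the slide. Splitting by a fixed row count is a simplifying assumption; the slides' goal is roughly equal-sized splits, which in practice means splitting by size.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Sketch: horizontal partitioning into split-directories, then one "file" per column in each.
class SplitDirectoryLayout {
    // Returns "splitDir/column" -> column values (an in-memory stand-in for HDFS files).
    static Map<String, List<String>> layout(List<String[]> rows, String[] columns, int rowsPerSplit) {
        Map<String, List<String>> files = new TreeMap<>();
        for (int start = 0; start < rows.size(); start += rowsPerSplit) {
            String splitDir = "s" + (start / rowsPerSplit + 1); // s1, s2, ... (hypothetical naming)
            int end = Math.min(start + rowsPerSplit, rows.size());
            for (int c = 0; c < columns.length; c++) {
                List<String> columnFile = new ArrayList<>();
                for (int r = start; r < end; r++) {
                    columnFile.add(rows.get(r)[c]);
                }
                files.put(splitDir + "/" + columns[c], columnFile);
            }
        }
        return files;
    }

    public static void main(String[] args) {
        List<String[]> rows = List.of(
            new String[] {"Joe", "23", "hobbies: {tennis} friends: {Ann, Nick}"},
            new String[] {"David", "32", "friends: {George}"},
            new String[] {"John", "45", "hobbies: {tennis, golf}"},
            new String[] {"Smith", "65", "hobbies: {swimming} friends: {Helen}"});
        layout(rows, new String[] {"Name", "Age", "Info"}, 2)
            .forEach((file, values) -> System.out.println(file + " -> " + values));
    }
}
```

Because every split-directory holds complete columns for its own rows, a map task can be given one split-directory and read only the column files it needs.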

ColumnInputFormat vs. RCFile Format
RCFile format:
- Avoids the replication and co-location problem.
- Uses PAX instead of a true column-oriented format: all columns are packed together in row groups, so every split contains all columns.
- Efficient I/O elimination becomes difficult.
- Metadata adds space overhead.
CIF:
- Needs to tackle replication and co-location.
- Allows efficient I/O elimination.
- Consider adding a column to a dataset: with CIF, this only requires adding one new column file to each split-directory.

Replication and Co-location
HDFS replication policy: the default policy does not guarantee that the column files of one split-directory (Name, Age, Info) are placed on the same nodes. The figure shows replicas of the Name, Age, and Info files for the split (Joe, David) distributed over Nodes A to D.
CPP: introduce a new column placement policy (CPP) that co-locates the column files of each split-directory. It is enabled by assigning it to the dfs.block.replicator.classname property.
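
The co-location goal can be illustrated with a small Java sketch. It is neither HDFS's block placement API nor the paper's CPP implementation. A deterministic hash of the split-directory name stands in for the real policy, and the class and method names are hypothetical. The sketch shows the invariant CPP is meant to provide: every column file in the same split-directory gets the same replica nodes.

```java
import java.util.ArrayList;
import java.util.List;

// Sketch of column co-location. NOT the HDFS BlockPlacementPolicy API:
// hashing the split-directory name is a hypothetical stand-in for the real policy.
class ColumnPlacementSketch {
    // "s1/Name" -> "s1": the split-directory is everything before the last '/'.
    static String splitDirectoryOf(String columnFile) {
        int slash = columnFile.lastIndexOf('/');
        return slash < 0 ? "" : columnFile.substring(0, slash);
    }

    // Choose `replicas` distinct nodes keyed only by the split-directory, so all column
    // files of one split share a replica set and a map task can read them locally.
    static List<String> chooseNodes(String columnFile, List<String> nodes, int replicas) {
        int start = Math.floorMod(splitDirectoryOf(columnFile).hashCode(), nodes.size());
        List<String> chosen = new ArrayList<>();
        for (int i = 0; i < Math.min(replicas, nodes.size()); i++) {
            chosen.add(nodes.get((start + i) % nodes.size()));
        }
        return chosen;
    }

    public static void main(String[] args) {
        List<String> nodes = List.of("Node A", "Node B", "Node C", "Node D");
        for (String file : List.of("s1/Name", "s1/Age", "s1/Info", "s2/Name")) {
            System.out.println(file + " -> " + chooseNodes(file, nodes, 3));
        }
    }
}
```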

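Example
Columns read: Age, Name (Name values: Joe, David, John, Mary, Ann, …)
Map method: if (age < 35) return name
Records that pass the filter: (23, Joe), (32, David)
The columns are selected with ColumnInputFormat.setColumns(job, Age, Name).
What if age is 35 or more? Can we avoid reading and deserializing the name field?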

Lazy Tuple Construction
Deserialization of each record field is deferred to the point where it is actually accessed, i.e., when the get() methods are called. Only the columns that are actually accessed in the map function are deserialized.
Eager version:
    map(NullWritable key, Record value) {
        String name;
        int age = value.get("age");
        if (age < 35) name = value.get("name");
    }
Lazy version:
    map(NullWritable key, LazyRecord value) {
        String name;
        int age = value.get("age");
        if (age < 35) name = value.get("name");
    }
The map code is identical; only the value type changes from Record to LazyRecord, so name is deserialized only for records with age < 35.
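
Here is a minimal sketch of lazy tuple construction, assuming a record whose fields are held as raw bytes (a 4-byte int for age, UTF-8 for name). LazyRecord here is a hypothetical stand-in, not the paper's class. getInt/getString decode a field on first access and cache it, and the counter exists only to make the saved work visible.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Sketch of lazy tuple construction: fields stay as raw bytes until they are requested.
class LazyRecordSketch {
    static final class LazyRecord {
        private final Map<String, byte[]> raw;                 // still-serialized field values
        private final Map<String, Object> decoded = new HashMap<>();
        int fieldsDeserialized = 0;                             // illustration only

        LazyRecord(Map<String, byte[]> raw) {
            this.raw = raw;
        }

        // Decode on first access, then serve from the cache.
        int getInt(String column) {
            return (Integer) decoded.computeIfAbsent(column, c -> {
                fieldsDeserialized++;
                return ByteBuffer.wrap(raw.get(c)).getInt();
            });
        }

        String getString(String column) {
            return (String) decoded.computeIfAbsent(column, c -> {
                fieldsDeserialized++;
                return new String(raw.get(c), StandardCharsets.UTF_8);
            });
        }
    }

    static LazyRecord record(String name, int age) {
        return new LazyRecord(Map.of(
            "name", name.getBytes(StandardCharsets.UTF_8),
            "age", ByteBuffer.allocate(4).putInt(age).array()));
    }

    // The slide's map logic: name is only touched (and so only decoded) when age < 35.
    static String map(LazyRecord value) {
        int age = value.getInt("age");
        return age < 35 ? value.getString("name") : null;
    }

    public static void main(String[] args) {
        for (LazyRecord r : List.of(record("Joe", 23), record("John", 45))) {
            String result = map(r);
            System.out.println(result + " (fields deserialized: " + r.fieldsDeserialized + ")");
        }
    }
}
```

For the record with age 23 both fields are decoded. For the record with age 45 only age is decoded.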

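LazyRecord implements Record
Each column (e.g., name, age) keeps two pointers: curPos, the record the map function is currently on, and lastPos, the last record actually read from that column. When a field is accessed, the reader skips forward from lastPos to curPos and then sets lastPos = curPos.
Why do we need both? Without the lastPos pointer, each nextRecord call would require all the columns to be deserialized to extract the length information needed to update their respective curPos pointers.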
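
The curPos/lastPos bookkeeping can be sketched for a single column of length-prefixed values. The value encoding (DataOutputStream.writeUTF, i.e., a 2-byte length followed by the bytes) and all names are assumptions for illustration. nextRecord only increments curPos. Bytes are touched only when get() is called, and only the length prefixes of the skipped values are read.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;

// Sketch of the curPos / lastPos idea for one column of length-prefixed values.
class ColumnCursor {
    private final byte[] data;  // values written with DataOutputStream.writeUTF
    private int lastPos = 0;    // record whose value starts at byteOffset
    private int byteOffset = 0; // byte offset of record lastPos
    private int curPos = -1;    // record the map function is currently on
    int lengthsRead = 0;        // values stepped over via their length prefix only

    ColumnCursor(byte[] data) {
        this.data = data;
    }

    static byte[] encode(String... values) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (String v : values) {
                out.writeUTF(v);
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return bytes.toByteArray();
    }

    // Called for every record: advances curPos only, touching no bytes of this column.
    void nextRecord() {
        curPos++;
    }

    // Called only when the map function accesses this column.
    String get() {
        while (lastPos < curPos) { // catch up by skipping values that were never accessed
            int len = ((data[byteOffset] & 0xff) << 8) | (data[byteOffset + 1] & 0xff);
            byteOffset += 2 + len;
            lastPos++;
            lengthsRead++;
        }
        // Now lastPos == curPos: decode just this one value.
        try {
            return new DataInputStream(new ByteArrayInputStream(data, byteOffset, data.length - byteOffset)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        ColumnCursor names = new ColumnCursor(encode("Joe", "David", "John", "Mary", "Ann"));
        names.nextRecord();                  // record 0
        System.out.println(names.get());
        names.nextRecord();                  // record 1: name not accessed
        names.nextRecord();                  // record 2: name not accessed
        names.nextRecord();                  // record 3
        System.out.println(names.get() + ", length prefixes read: " + names.lengthsRead);
    }
}
```

Reading record 0 and then record 3 steps over Joe, David, and John by reading only their length prefixes; none of them is decoded into a String.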

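Skip List (Logical Behavior)
Records are linked at three levels: every record (R1, R2, …, R10, …, R20, …, R90, …, R99, R100), every 10th record (R1, R10, R20, …, R90), and every 100th record (R1, R100). To skip 100 records, the reader follows one pointer at the top level instead of stepping through each record.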

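Example (Age and Name columns)
Name column: Joe, Jane, David, …, John, Mary, Ann, …
The Name column's skip list stores skip bytes, i.e., how many bytes to jump to pass the next 10 rows (Skip10) or the next 100 rows (Skip100): Skip10 = 1002, Skip100 = 9017, Skip10 = 868, …
Map function: … if (age < 35) return name …
For rows that fail the age < 35 test, the name values are never accessed, so the reader can jump over them using the skip bytes instead of deserializing each one.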
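
Here is a sketch of a column with skip bytes, assuming skip entries every 10 and every 100 rows as in the slides. Values are written with DataOutputStream.writeUTF. The class name, the skip10/skip100 arrays, and the hop counter are hypothetical. Reading row 237 from the start uses two Skip100 entries, three Skip10 entries, and seven single-value skips.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.ArrayList;
import java.util.List;

// Sketch: one column of length-prefixed values plus skip-byte entries every 10 and 100 rows.
class SkipListColumn {
    private final byte[] data;
    private final int[] skip10;  // skip10[k]  = bytes occupied by rows 10k .. 10k+9
    private final int[] skip100; // skip100[k] = bytes occupied by rows 100k .. 100k+99
    int hops = 0;                // jumps and single-value skips used by the last read()

    SkipListColumn(List<String> values) {
        int[] offset = new int[values.size() + 1]; // byte offset where each row starts
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(bytes);
        try {
            for (int i = 0; i < values.size(); i++) {
                offset[i] = out.size();
                out.writeUTF(values.get(i));
            }
            out.flush();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        offset[values.size()] = out.size();
        data = bytes.toByteArray();
        skip10 = new int[values.size() / 10];
        for (int k = 0; k < skip10.length; k++) {
            skip10[k] = offset[10 * k + 10] - offset[10 * k];
        }
        skip100 = new int[values.size() / 100];
        for (int k = 0; k < skip100.length; k++) {
            skip100[k] = offset[100 * k + 100] - offset[100 * k];
        }
    }

    // Read row `target` starting from row 0, always taking the largest skip that fits.
    String read(int target) {
        int row = 0;
        int pos = 0;
        hops = 0;
        while (row < target) {
            if (row % 100 == 0 && target - row >= 100) {
                pos += skip100[row / 100];
                row += 100;
            } else if (row % 10 == 0 && target - row >= 10) {
                pos += skip10[row / 10];
                row += 10;
            } else { // step over one value using its 2-byte length prefix
                pos += 2 + (((data[pos] & 0xff) << 8) | (data[pos + 1] & 0xff));
                row++;
            }
            hops++;
        }
        try {
            return new DataInputStream(new ByteArrayInputStream(data, pos, data.length - pos)).readUTF();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>();
        for (int i = 0; i < 250; i++) {
            names.add("name" + i);
        }
        SkipListColumn column = new SkipListColumn(names);
        System.out.println(column.read(237) + " after " + column.hops + " hops"); // name237 after 12 hops
    }
}
```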

Example (Info column)
Info column: hobbies: tennis; friends: Ann, Nick; Null; friends: George; …; hobbies: tennis, golf; …
Skip-list entries (skip bytes over the next 10 and 100 rows): Skip10 = 2013, Skip100 = …, Skip10 = 1246, …
Map function: … if (age < 35) return hobbies …

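Compression
Two schemes for the Info column:
- Compressed blocks: records are grouped into blocks (B1, B2, …), each compressed with LZO/ZLIB. The header stores the number of records in each block, and each compressed block is tagged with a RID. A block must be decompressed before any of its values can be read.
- Dictionary compressed skip lists: map keys are replaced by dictionary codes (hobbies: 0, friends: 1), so the values become 0: {tennis} 1: {Ann, Nick}; Null; 1: {George}; …; 0: {tennis, golf}; … The skip lists (Skip10 = 210, Skip100 = 1709, Skip10 = 304, …) are kept, so values can still be skipped without decompressing a whole block.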
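
The dictionary part of the scheme above can be sketched as follows: map keys are replaced by small integer codes shared across the column, while the values are left as they are. This is an illustrative Java sketch with hypothetical names. It shows only key encoding, not the skip lists or the LZO/ZLIB block compression. With the slide's records, hobbies is assigned 0 and friends 1.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Sketch: dictionary-encode the keys of map-typed column values.
class MapKeyDictionary {
    private final Map<String, Integer> codes = new HashMap<>(); // key -> code
    private final List<String> keys = new ArrayList<>();        // code -> key

    // Assign codes in first-seen order: 0, 1, 2, ...
    int codeFor(String key) {
        Integer code = codes.get(key);
        if (code == null) {
            code = keys.size();
            codes.put(key, code);
            keys.add(key);
        }
        return code;
    }

    Map<Integer, List<String>> encode(Map<String, List<String>> value) {
        Map<Integer, List<String>> encoded = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> entry : value.entrySet()) {
            encoded.put(codeFor(entry.getKey()), entry.getValue());
        }
        return encoded;
    }

    Map<String, List<String>> decode(Map<Integer, List<String>> encoded) {
        Map<String, List<String>> value = new LinkedHashMap<>();
        for (Map.Entry<Integer, List<String>> entry : encoded.entrySet()) {
            value.put(keys.get(entry.getKey()), entry.getValue());
        }
        return value;
    }

    public static void main(String[] args) {
        MapKeyDictionary dict = new MapKeyDictionary();
        Map<String, List<String>> joe = new LinkedHashMap<>();
        joe.put("hobbies", List.of("tennis"));
        joe.put("friends", List.of("Ann", "Nick"));
        Map<String, List<String>> david = new LinkedHashMap<>();
        david.put("friends", List.of("George"));
        System.out.println(dict.encode(joe));   // {0=[tennis], 1=[Ann, Nick]}
        System.out.println(dict.encode(david)); // {1=[George]}
    }
}
```

Repeated key strings shrink to small integers, which is cheap to decode. That is consistent with the slides' point that this lightweight scheme keeps values skippable.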

RCFile
The example dataset (Joe 23, David 32, John 45, Smith 65) is stored in row groups. Within each row group, a metadata header is followed by the values column by column (Name | Age | Info):
Row Group 1: Metadata | Joe, David | 23, 32 | {hobbies: {tennis}, friends: {Ann, Nick}}, {friends: {George}}
Row Group 2: Metadata | John, Smith | 45, 65 | {hobbies: {tennis, golf}}, {hobbies: {swimming}, friends: {Helen}}

Experimental Setup
- 42-node cluster
- Each node: two quad-core 2.4 GHz sockets, 32 GB main memory, four 500 GB HDDs
- Network: 1 Gbit Ethernet switch

Overhead of Columnar Storage (single-node experiment)
Synthetic dataset: 57 GB, 13 columns (6 integers, 6 strings, 1 map)
Query: SELECT *

Benefits of Column-Oriented Storage (single-node experiment)
Query: projection of different columns

Workload
Schema:
    URLInfo {
        String url
        String srcUrl
        time fetchTime
        String inlink[]
        Map metadata
        Map annotations
        byte[] content
    }
Query: if (url contains "ibm.com/jp"), find all the distinct encodings reported by the page
Dataset: 6.4 TB
Query selectivity: 6%
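
The query's filter logic can be sketched in Java. This is a hypothetical simplification: UrlInfo keeps only url and metadata, and the encoding is assumed to be stored under a metadata key named encoding, which the slides do not specify. The counter marks the point where a lazy record would deserialize metadata. That point is reached only for matching pages, which is why lazy construction pays off at 6% selectivity.

```java
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeSet;

// Sketch of the workload query's map-side logic (simplified, hypothetical schema).
class EncodingQuery {
    static final class UrlInfo {
        final String url;
        final Map<String, String> metadata;
        UrlInfo(String url, Map<String, String> metadata) {
            this.url = url;
            this.metadata = metadata;
        }
    }

    static int metadataReads = 0; // where a lazy record would have to deserialize metadata

    // Keep pages whose url contains "ibm.com/jp" and collect their distinct encodings.
    // ASSUMPTION: the encoding lives under the metadata key "encoding".
    static Set<String> distinctEncodings(List<UrlInfo> pages) {
        metadataReads = 0;
        Set<String> encodings = new TreeSet<>();
        for (UrlInfo page : pages) {
            if (!page.url.contains("ibm.com/jp")) {
                continue; // non-matching page: metadata is never accessed
            }
            metadataReads++;
            String encoding = page.metadata.get("encoding");
            if (encoding != null) {
                encodings.add(encoding);
            }
        }
        return encodings;
    }

    public static void main(String[] args) {
        // Hypothetical sample pages, for illustration only.
        List<UrlInfo> pages = List.of(
            new UrlInfo("http://www.ibm.com/jp/a", Map.of("encoding", "Shift_JIS")),
            new UrlInfo("http://www.ibm.com/us/b", Map.of("encoding", "UTF-8")));
        System.out.println(distinctEncodings(pages) + ", metadata reads: " + metadataReads);
    }
}
```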

Comparison of Column-Layouts (Map phase)
SEQ: 754 sec

Comparison of Column-Layouts (Map phase)

Comparison of Column-Layouts (Total job)
SEQ: 806 sec

Conclusions
- Describe a new column-oriented binary storage format for MapReduce.
- Introduce a skip-list layout.
- Describe the implementation of lazy record construction.
- Show that lightweight dictionary compression for complex columns can be beneficial.

Comparison of Sequence Files

RCFile

Comparison of Column-Layouts
Layout          Data Read (GB)  Map Time (sec)  Map Time Ratio  Total Time (sec)  Total Time Ratio
Seq - uncomp    …               …               …               …                 …
Seq - record    …               …               …               …                 …
Seq - block     …               …               …               …                 …
Seq - custom    …               …               …               806               1.0x
RCFile          …               …               …               761               1.1x
RCFile - comp   …               …               …               291               2.8x
CIF - ZLIB      …               …               …               77                10.4x
CIF             …               …               …               78                10.3x
CIF - LZO       …               …               …               79                10.2x
CIF - SL        …               …               …               70                11.5x
CIF - DCSL      …               …               …               63                12.8x
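
The Total Time Ratio column appears to be the baseline total time of Seq - custom (806 sec) divided by each layout's total time. The Java snippet below recomputes the ratios from the recovered values under that assumption.

```java
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Recompute Total Time Ratio = baseline total time / layout total time.
// ASSUMPTION: the baseline is Seq - custom at 806 sec (the "SEQ: 806 sec" total-job figure).
class SpeedupRatios {
    static final double BASELINE_TOTAL_SEC = 806.0;

    static String ratio(double totalSec) {
        return String.format(Locale.ROOT, "%.1fx", BASELINE_TOTAL_SEC / totalSec);
    }

    public static void main(String[] args) {
        Map<String, Integer> totalSec = new LinkedHashMap<>();
        totalSec.put("Seq - custom", 806);
        totalSec.put("RCFile", 761);
        totalSec.put("RCFile - comp", 291);
        totalSec.put("CIF - ZLIB", 77);
        totalSec.put("CIF", 78);
        totalSec.put("CIF - LZO", 79);
        totalSec.put("CIF - SL", 70);
        totalSec.put("CIF - DCSL", 63);
        totalSec.forEach((layout, sec) -> System.out.println(layout + ": " + ratio(sec)));
    }
}
```

This reproduces the listed ratios except for CIF - ZLIB, where 806 / 77 gives 10.5x rather than 10.4x, probably because the times in the table are rounded.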

Comparison of Column-Layouts
SEQ: 754 sec
CIF-DCSL results in the highest map-time speedup and improves the total job time by more than an order of magnitude (12.8x).

RCFile
SEQ: 754 sec

Comparison of Sequence Files
SEQ: 754 sec