Application-driven Energy-efficient Architecture Explorations for Big Data
Authors: Xiaoyan Gu, Rui Hou, Ke Zhang, Lixin Zhang, Weiping Wang (Institute of Computing Technology, Chinese Academy of Sciences)
Reviewed by: Siddharth Bhave (University of Washington, Tacoma)

Big Data
- What is Big Data?
- Problems with big data:
  - Energy consumption
  - Velocity (operation latency and throughput)
  - Volume (storage capacity)
  - Variety
- Managing big data problems:
  - Storage technologies
  - Partitioning
  - Multithreading
  - Parallel processing
  - Efficient architecture
- Hadoop, MapReduce, Mahout
- Find the bottleneck

Introduction
- Big data management at the architecture level
- Two cluster systems compared:
  - Xeon-based cluster
  - Atom-based (micro-server) cluster
- Comparison based on:
  - Energy consumption
  - Execution time

Motivation
- Ever-increasing data volumes
- Energy/time trade-off between Xeon- and Atom-based clusters
- Compression/decompression is a processing bottleneck
- Stateless data processing

Mastiff
- Mastiff: the target application for performance analysis
- A big data processing engine
- Columnar store policy

[Table: compression ratios of Mastiff vs. Hadoop HDFS on 3 GB, 100 GB, and 500 GB data sets; the values did not survive in this transcript.]
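Mastiff's columnar store policy groups the values of each column together on disk, which is what drives the compression-ratio advantage over row-oriented HDFS files that the table compares. A minimal sketch of the effect, using synthetic data and zlib as a stand-in for whatever codec Mastiff actually uses (table shape, column names, and sizes are all illustrative):

```python
import random
import zlib

random.seed(0)
categories = ["search", "mail", "video", "news", "maps"]

# Hypothetical 3-column table: (date, category, value), 20,000 rows.
rows = [("2013-01-%02d" % random.randrange(1, 29),
         random.choice(categories),
         str(random.randrange(1_000_000)))
        for _ in range(20000)]

# Row-oriented layout: all fields of one record stored together.
row_layout = "\n".join(",".join(r) for r in rows).encode()

# Column-oriented (Mastiff-style) layout: each column stored contiguously,
# so repetitive values sit next to each other and compress better.
col_layout = "\n".join(",".join(col) for col in zip(*rows)).encode()

row_size = len(zlib.compress(row_layout))
col_size = len(zlib.compress(col_layout))
print("row-layout compressed:", row_size, "bytes")
print("col-layout compressed:", col_size, "bytes")
```

Both layouts hold exactly the same bytes; only the ordering differs, which is why a columnar engine can trade layout for compression ratio without changing the data model.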

Workflow of Mastiff

Methodology
- TPC-H benchmark queries with concurrent data
- 1 TB of verification data
- Two cases: data load and data query
- Power measured with a Fluke NORMA 4000 analyzer
- Runs are averaged and median results are reported

Power and Performance Evaluation
- Three configurations measured for time and energy consumption:
  - 31 nodes, Atom cluster (1 master node)
  - 31 nodes, Xeon cluster (1 master node)
  - 16 nodes, Xeon cluster (1 master node)

             Atom (30 nodes)   Xeon (30 nodes)   Xeon (15 nodes)
  Data Load  3.435 hours       1.543 hours       3.242 hours
  Data Query 5.877 hours       2.724 hours       5.564 hours

Power and Performance Evaluation (cont'd)
[Figure: energy consumption of the 30-node Atom cluster vs. the 30-node Xeon cluster.]

Power and Performance Evaluation (cont'd)
[Figure: energy consumption of the 30-node Atom cluster vs. the 15-node Xeon cluster.]

Power and Performance Evaluation (cont'd)
[Figure: time breakdown in the map phase.]

Power and Performance Evaluation (cont'd)
[Figure: time breakdown in the reduce phase.]

Findings
- The Atom platform is more power-efficient
- Data compression and decompression occupy a significant percentage of execution time
- Compression and decompression can be done in a software-pipelined fashion, i.e. as multiple interleaved tasks

Propositions
- Heterogeneous architecture
- Accelerators to perform data compression/decompression
- Multiple interleaved compression/decompression tasks
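The "multiple interleaved compression/decompression" proposition can be sketched in software: split the input into chunks and compress them as overlapping tasks, so one chunk's compression hides another's latency. This is only an illustration of the interleaving idea (chunk size, worker count, and zlib are all stand-ins, not the paper's implementation):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_interleaved(data, chunk_size=256 * 1024, workers=4):
    # Split the input into fixed-size chunks and compress them as
    # interleaved tasks; CPython's zlib releases the GIL on large
    # buffers, so chunk compressions genuinely overlap.
    chunks = [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(zlib.compress, chunks))

def decompress_interleaved(blobs):
    # Chunks are independent zlib streams, so decompression can also
    # be interleaved; done serially here for simplicity.
    return b"".join(zlib.decompress(b) for b in blobs)

data = b"sample log record 12345\n" * 200_000   # ~4.8 MB of input
blobs = compress_interleaved(data)
assert decompress_interleaved(blobs) == data
```

A hardware accelerator would play the role of the worker pool here, with the host queueing chunks instead of threads compressing them.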

Off-chip and On-chip Accelerators

Multiple Interleaved Tasks

Strengths
- A much-needed, innovative concept
- Well organized
- Detailed description of the energy and time investigation
- The propositions have already been implemented

Weaknesses
- Not enough power meters to monitor all nodes, forcing two assumptions:
  - Each network router's power is divided evenly across the nodes
  - Each node's energy consumption is similar
- Results are generalized from Hadoop even though they may not hold for every application
- Vague description of how the propositions would be implemented

FAWN: A Fast Array of Wimpy Nodes
Authors: David G. Andersen, Jason Franklin, Michael Kaminsky, Amar Phanishayee, Lawrence Tan, Vijay Vasudevan (Carnegie Mellon University)

Introduction
- A high-performance, energy-efficient storage system
- A large number of small, low-performance (hence "wimpy") nodes with moderate amounts of local storage
- Two parts: FAWN-DS (data store) and FAWN-KV (key-value store)
- Motivation:
  - Traditional architectures consume too much power
  - I/O bottlenecks caused by the limits of current storage

Features
- Pairs low-powered embedded nodes with flash storage
- FAWN-DS is the back end, consisting of the large number of nodes
- Each node has some RAM and flash
- FAWN-KV is a consistent, replicated, highly available, high-performance key-value storage system
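The core idea behind FAWN-DS is that flash serves random reads well but sequential writes best, so each node appends writes to a log and keeps an in-memory index from key to log offset. A minimal sketch of that pattern, simplified well past the real design (full keys in the index, no compact hash buckets, and none of FAWN-KV's chain replication):

```python
class LogStore:
    """FAWN-DS-style log-structured store: writes append sequentially
    (flash-friendly); an in-memory index maps each key to the offset
    of its latest record. Greatly simplified sketch."""

    def __init__(self):
        self.log = bytearray()
        self.index = {}

    def put(self, key: bytes, value: bytes):
        offset = len(self.log)
        # Record layout: 4-byte key length, key, 4-byte value length, value.
        self.log += len(key).to_bytes(4, "big") + key
        self.log += len(value).to_bytes(4, "big") + value
        self.index[key] = offset  # newest write wins; old record is garbage

    def get(self, key: bytes):
        off = self.index.get(key)
        if off is None:
            return None
        klen = int.from_bytes(self.log[off:off + 4], "big")
        off += 4 + klen
        vlen = int.from_bytes(self.log[off:off + 4], "big")
        return bytes(self.log[off + 4:off + 4 + vlen])

store = LogStore()
store.put(b"k1", b"v1")
store.put(b"k1", b"v2")   # an overwrite appends; the index moves forward
assert store.get(b"k1") == b"v2"
assert store.get(b"missing") is None
```

Because old records remain in the log until compaction, the design trades space for strictly sequential write I/O, which is what lets wimpy nodes keep their flash busy.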

FAWN Architecture

Efficient Data Streaming with On-chip Accelerators: Opportunities and Challenges
Authors: Rui Hou, Lixin Zhang, Michael C. Huang, Kun Wang, Hubertus Franke, Yi Ge, Xiaotao Chang (University of Rochester)

Motivation
- Transistor density keeps increasing
- Many cores are now integrated on a single die
- On-chip accelerators have advantages over PCI-attached ones

On-Chip Accelerator Architecture

Features
- Three types of accelerators studied:
  - Crypto accelerators
  - Decompression accelerators
  - Network-offload accelerators
- The data streams through all three accelerators share common characteristics
- Those characteristics can be used to optimize the accelerators' power and performance
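The shared characteristic across all three accelerator types is a streaming producer/consumer pattern: the host enqueues input blocks, the accelerator processes them asynchronously, and the host collects results in order. A software sketch of that pattern, modeling the accelerator as a worker thread running zlib decompression (names and block sizes are illustrative, not from the paper):

```python
import queue
import threading
import zlib

def accelerator(in_q, out_q):
    # Stand-in for an on-chip decompression engine: drain the input
    # queue, decompress each block, emit (sequence, result) pairs.
    while True:
        item = in_q.get()
        if item is None:          # sentinel: no more work
            break
        seq, block = item
        out_q.put((seq, zlib.decompress(block)))

in_q, out_q = queue.Queue(maxsize=8), queue.Queue()
worker = threading.Thread(target=accelerator, args=(in_q, out_q))
worker.start()

# Host side: stream 16 compressed blocks to the "accelerator".
blocks = [zlib.compress(b"payload %d " % i * 100) for i in range(16)]
for i, b in enumerate(blocks):
    in_q.put((i, b))
in_q.put(None)
worker.join()

results = dict(out_q.get() for _ in range(16))
assert results[3] == b"payload 3 " * 100
```

The bounded input queue mirrors the finite buffering between a core and an on-chip engine: when the accelerator falls behind, the producer stalls, which is exactly the back-pressure behavior the paper's stream optimizations target.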

Thank You