Active Disks: Programming Model, Algorithm and Evaluation Anurag Acharya, Mustafa Uysal, Joel Saltz.

Slides:



Advertisements
Similar presentations
A Workflow Engine with Multi-Level Parallelism Supports Qifeng Huang and Yan Huang School of Computer Science Cardiff University
Advertisements

INTRODUCTION TO SIMULATION WITH OMNET++ José Daniel García Sánchez ARCOS Group – University Carlos III of Madrid.
Department of Computer Science and Engineering University of Washington Brian N. Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gun Sirer, Marc E. Fiuczynski,
Chapter 101 Virtual Memory Chapter 10 Sections and plus (Skip:10.3.2, 10.7, rest of 10.8)
DIRECT MEMORY ACCESS CS 147 Thursday July 5,2001 SEEMA RAI.
11/13/01CS-550 Presentation - Overview of Microsoft disk operating system. 1 An Overview of Microsoft Disk Operating System.
Extensibility, Safety and Performance in the SPIN Operating System Brian Bershad, Stefan Savage, Przemyslaw Pardyak, Emin Gun Sirer, Marc E. Fiuczynski,
Query Processing (overview)
Memory Management 2010.
Introduction  What is an Operating System  What Operating Systems Do  How is it filling our life 1-1 Lecture 1.
1 Chapter 8 Virtual Memory Virtual memory is a storage allocation scheme in which secondary memory can be addressed as though it were part of main memory.
Chapter 9 Virtual Memory Produced by Lemlem Kebede Monday, July 16, 2001.
Figure 1.1 Interaction between applications and the operating system.
Modified from Silberschatz, Galvin and Gagne ©2009 CS 446/646 Principles of Operating Systems Lecture 1 Chapter 1: Introduction.
Lecture 1: Introduction CS170 Spring 2015 Chapter 1, the text book. T. Yang.
HW/SW Co-Synthesis of Dynamically Reconfigurable Embedded Systems HW/SW Partitioning and Scheduling Algorithms.
1/16/2008CSCI 315 Operating Systems Design1 Introduction Notice: The slides for this lecture have been largely based on those accompanying the textbook.
Chapter 8: Main Memory.
Microsoft ® SQL Server ® 2008 and SQL Server 2008 R2 Infrastructure Planning and Design Published: February 2009 Updated: January 2012.
Database Systems: Design, Implementation, and Management Eighth Edition Chapter 10 Database Performance Tuning and Query Optimization.
Database System Concepts, 5th Ed. ©Silberschatz, Korth and Sudarshan See for conditions on re-usewww.db-book.com Chapter 13: Query Processing.
Introduction and Overview Questions answered in this lecture: What is an operating system? How have operating systems evolved? Why study operating systems?
Chapter 1. Introduction What is an Operating System? Mainframe Systems
Disklets –Take streams as inputs, generate streams as outputs –Streams accessed using interface that delivers data in buffers with known size –Cannot allocate.
CGS 3763 Operating Systems Concepts Spring 2013 Dan C. Marinescu Office: HEC 304 Office hours: M-Wd 11: :30 AM.
©Silberschatz, Korth and Sudarshan13.1Database System Concepts Chapter 13: Query Processing Overview Measures of Query Cost Selection Operation Sorting.
Chapter 1: Introduction. 1.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 1: Introduction What Operating Systems Do Computer-System.
Computer Emergency Notification System (CENS)
1 Distributed Energy-Efficient Scheduling for Data-Intensive Applications with Deadline Constraints on Data Grids Cong Liu and Xiao Qin Auburn University.
Computer Organization & Assembly Language © by DR. M. Amer.
EXTENSIBILITY, SAFETY AND PERFORMANCE IN THE SPIN OPERATING SYSTEM
Main Memory. Chapter 8: Memory Management Background Swapping Contiguous Memory Allocation Paging Structure of the Page Table Segmentation Example: The.
File Systems cs550 Operating Systems David Monismith.
Full and Para Virtualization
Lecture 14- Parallel Databases Advanced Databases Masood Niazi Torshiz Islamic Azad University- Mashhad Branch
CS 440 Database Management Systems Lecture 5: Query Processing 1.
CS 540 Database Management Systems
Introduction Contain two or more CPU share common memory and peripherals. Provide greater system throughput. Multiple processor executing simultaneous.
CS4315A. Berrached:CMS:UHD1 Introduction to Operating Systems Chapter 1.
Done by: Chelsea Bryan Friday, October 10,2014.   The BIOS (aka) Basic input/output system, is a built in software that determines what's a computer.
MINIX Presented by: Clinton Morse, Joseph Paetz, Theresa Sullivan, and Angela Volk.
Virtual Memory By CS147 Maheshpriya Venkata. Agenda Review Cache Memory Virtual Memory Paging Segmentation Configuration Of Virtual Memory Cache Memory.
Silberschatz, Galvin and Gagne ©2009 Operating System Concepts – 8 th Edition, Chapter 8: Main Memory.
Chapter 8: Memory Management. 8.2 Silberschatz, Galvin and Gagne ©2005 Operating System Concepts Chapter 8: Memory Management Background Swapping Contiguous.
1 بسم الله الرحمن الرحيم MEMORY AND I/O. Introduction to 8086 Microprocessor 8086 Pin Configuration Pin Configuration 8086 Architecture & Modes 2.
Introduction to Computers - Hardware
Memory Management.
CS 540 Database Management Systems
CS 440 Database Management Systems
Parallel Databases.
Parallel Data Laboratory, Carnegie Mellon University
Chapter 12: Query Processing
Chapter 1: Introduction
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy 11/12/2018.
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy Chapter 8 11/24/2018.
Multistep Processing of a User Program
Chapter 2: The Linux System Part 1
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy Chapter 9 12/1/2018.
So far… Text RO …. printf() RW link printf Linking, loading
Main Memory Background Swapping Contiguous Allocation Paging
Chapter 8: Memory management
Outline Module 1 and 2 dealt with processes, scheduling and synchronization Next two modules will deal with memory and storage Processes require data to.
File-System Structure
Chapter 8: Memory Management strategies
Java Programming Introduction
Memory Management Lectures notes from the text supplement by Siberschatz and Galvin Modified by B.Ramamurthy Chapter 9 4/5/2019.
Introduction to Operating Systems
6- General Purpose GPU Programming
Presentation transcript:

Active Disks: Programming Model, Algorithm and Evaluation Anurag Acharya, Mustafa Uysal, Joel Saltz

Introduction Active Disk architectures integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from(written to) disk. Motivation of this architecture: Offload bulk of the processing to the disk resident processors and to use the host processor primarily for coordination, scheduling and combination of results from individual disks.

Motivation “While processors are doubling performance every 18 months, customers are doubling data storage every 5 months.” –Greg Papadopolous These trends have two implications: 1.Large data warehouses will always have a large number of disks 2.Architectures that do not scale the processing power as the dataset grows may not be able to keep up with the processing requirement.

Motivation (cont.) The disk transfer rate has been increasing rapidly. And the power of cheap processors and the size of cheap memory is increasing rapidly. These trends has two implications: 1.Given the improvements in data transfer rates, even a state-of-the-art processor can keep only a small number of drives busy. 2. Given the price drop of processors and memory chips, it’s becoming economically feasible to place substantial computational capability on individual disks.

Active disk architecure An application is partitioned between a host-resident component and a disk-resident component. Stream-based programming model for disklets(disk- resident code). A disklet cannot allocate (or free) memory. A disklet is sandboxed within the buffers corresponding to each of its input streams, which are allocated and freed by the operating system. A disklet is also not allowed to initiate I/O operations on its own.

Programming Model Stream-based programming model for disklets and their interaction with host-resident peers. Three types of streams: Disk-resident streams which are files or ranges in files Host-resident streams which are used by host-resident code to interact with disklets Pipe streams which are used to pipe results of one disklet into another. Communication between a disklet and its environment is restricted to its input and output streams. The source and sinks for these streams are specified by the host- resident program as a part of the installation of the disklet.

Operating system support Active disks require operating system support both at the host and on the disk. Design of the OS layer at the disk(DiskOS) has conflicting requirements. – We would like the DiskOS to be as thin as possible so that the disklets can make full use of the limited resources – We would like to move as much as possible of the common functionality into the DiskOS so that disklets can be small and easy to analyze.

Operating system support (cont.) The DiskOS provides three services – memory management, stream communication and disklet scheduling. The stream-based model simplifies memory management as all memory is allocated in contiguous blocks whose size is known a priori and the lifetime of all blocks is known. It also simplifies the communication support required as all stream buffers are allocated and managed by the DiskOS. A disklet is ready to run whenever there’s new data available on one or more of its input streams.

Operating system support (cont.) Host-level OS Support: Limited new host-level OS function is needed – support for installation of disklets and management of host- resident streams. Disklet installation requires analysis of disklet code to ensure memory safety, linking against the DiskOS environment and downloading the code to the disk. The primary difference between the semantics of streams currently used in operating systems and the semantics proposed for Active Disk Streams is that the latter deliver data in a quantized manner. These buffers are allocated by the operating system and are freed implicitly.

Algorithms Six algorithms from three application domains-relational database processing, image processing and satellite data processing. Conventional-disk and active disk version are compared. SQL SELECT SQL GROUP BY EXTERNAL SORT Datacube Image convolution Generating composite satellite images

Algorithms (cont.) SQL SELECT: The active disk algorithm applies the SELECT predicate at the disk and forwards only the successful tuples to the host. SQL GROUP BY: The disklet performs local group-bys as long as the number of aggregates being computed fits in the disk- memory. When it runs out of space, it ships the partial results to the host and reinitializes the disk-memory. The host accumulates the partial results forwarded by all disklets.

Evaluation For the core set of experiments, we use two configurations –one corresponding to systems that can be purchased today and the other corresponding to systems that are likely to be available by the end of the decade. To explore the scalability of the two architectures, we varied the number of disks in each configuration from 4 to 32. A simulator is developed to simulate both conventional- disk and active-disk architectures.

Performanceof active disks Our result indicate that active disks achieve performance improvements between 1.07 times and 3.15 times for 4-disk configurations with Today’s components and between 1.18 times and 3.2 times for Future components.

Performanceof active disks (Cont.) We note that increasing the number of disks beyond four for conventional-disk architecture provides little or no advantage. On the other hand, active-disk architectures scale perfectly up to 16 disks.

Conclusion Active disk architectures integrate significant processing power and memory into a disk drive and allow application-specific code to be downloaded and executed on the data that is being read from (written to) disk. Application is partitioned between a host-resident component and a disk-resident component. It uses stream-based programming model. And the results indicate that active-disk architecture scale well with the number of disks.

Questions 1.What’s the motivation of active disk architecture? 2.What’s the programming model of active disk ?