Active Storage Processing in Parallel File Systems
Jarek Nieplocha, Evan Felix, Juan Piernas-Canovas
SDM CENTER

2 Active Storage in Parallel Filesystems

Active Storage exploits the old concept of moving computation to the data source. It avoids data movement across the network in a parallel machine by allowing applications to use the compute resources on the I/O nodes of the cluster for data processing.

[Diagram: traditional approach vs. Active Storage. Traditionally, compute nodes read X from the I/O nodes over the network, compute Y = foo(X), and write Y back across the network; with Active Storage, Y = foo(X) runs directly on the I/O nodes.]

3 Example: BLAS DSCAL on Disk, Y = α·Y

Experiment:
- Traditional: the input file is read from the filesystem and the output file is written to the same filesystem. The input file has 120,586,240 doubles.
- Active Storage: each server receives the scale factor, reads its array of doubles from its disk locally, and stores the resulting array on the same disk. Each server processes 120,586,240/N doubles, where N is the number of servers.

The speedup is attributed to avoiding data movement between the client and the servers.
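
A minimal sketch of the per-server DSCAL step in C; the assumption that each server sees its stripe as a local file of raw doubles, plus the path handling and buffer size, are illustrative, not the paper's actual code:

    /* dscal_local.c: per-server Y = alpha * Y over a local stripe file.
       Assumes the stripe is a plain file of raw doubles (hypothetical). */
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <alpha> <stripe-file>\n", argv[0]);
            return 1;
        }
        double alpha = atof(argv[1]);
        FILE *f = fopen(argv[2], "r+b");      /* update the file in place */
        if (!f) { perror("fopen"); return 1; }

        double buf[4096];
        size_t n;
        long pos = 0;
        while ((n = fread(buf, sizeof(double), 4096, f)) > 0) {
            for (size_t i = 0; i < n; i++)
                buf[i] *= alpha;              /* the DSCAL kernel itself */
            fseek(f, pos, SEEK_SET);          /* rewind to overwrite chunk */
            fwrite(buf, sizeof(double), n, f);
            pos += (long)(n * sizeof(double));
            fseek(f, pos, SEEK_SET);          /* reposition before next read */
        }
        fclose(f);
        return 0;
    }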

4 Related Work

The Active Disk/Storage concept was introduced a decade ago to use processing resources "near" the disk:
- on the disk controller
- on processors connected to disks
in order to reduce network bandwidth/latency limitations.

References:
- DiskOS, a stream-based model (ASPLOS '98: Acharya, Uysal, Saltz)
- Active Storage for Large-Scale Data Mining and Multimedia (VLDB '98: Riedel, Gibson, Faloutsos)

Research proved the Active Disk idea interesting, but difficult to take advantage of in practice:
- processors in disk controllers are not designed for this purpose
- vendors have not been providing an SDK

5 Lustre Architecture

[Diagram: O(10000) clients communicate over the NAL with O(1000) OSTs for file I/O and locking, and with O(10) MDSs for directory metadata and concurrency, recovery, file status, and file creation.]

6 Lustre Client

[Stack diagram: Application → LLITE → LOV → OSC → NAL]

Application I/O requests enter the stack from user space:
- The LLITE module implements the Linux VFS layer.
- LOV stripes the object and targets the I/O to the correct Object Storage Client (OSC).
- The OSC packages up the request for transmission over the NAL.

The striping step LOV performs is plain round-robin arithmetic; a sketch follows below.
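
A sketch of that round-robin (RAID-0 style) striping arithmetic in C; the structure and field names are illustrative, not Lustre's real data structures:

    /* Map a logical file offset to (OST index, offset within that OST's
       object) under round-robin striping. Illustrative only. */
    #include <stdint.h>

    struct stripe_map {
        uint64_t stripe_size;   /* bytes per stripe, e.g. 1 MB */
        uint32_t stripe_count;  /* number of OSTs the file spans */
    };

    static void lov_locate(const struct stripe_map *m, uint64_t file_off,
                           uint32_t *ost_idx, uint64_t *obj_off)
    {
        uint64_t stripe_no = file_off / m->stripe_size;  /* which stripe */
        *ost_idx = (uint32_t)(stripe_no % m->stripe_count);
        /* object offset = completed rounds over all OSTs, plus the
           remainder within the current stripe */
        *obj_off = (stripe_no / m->stripe_count) * m->stripe_size
                 + file_off % m->stripe_size;
    }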

7 Lustre Object Storage Server

[Stack diagram: NAL → OST → OBDfilter → ext3]

- Requests arrive from the Portals NAL.
- The Object Storage Target (OST) directs each request to the appropriate lower-level OBD.
- OBDfilter presents ext3 as an object-based disk.

8 Current Implementation of Active Storage

[Diagram: the OSS stack (NAL, OST, OBDfilter, ext3) extended with an ASOBD module and an ASDEV character device that connect it to a user-space processing component.]

- An extra module (ASOBD) passes data through unchanged until told to pipe it elsewhere.
- Data is sent to the user-space process through a Unix character device file.
- Processed data is written back to disk.
- Pattern: 1W->2W.

A sketch of the user-space side appears below.
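
What the user-space half of the 1W->2W pipe could look like in C; the device path and the chunk-at-a-time read/write framing are assumptions, since the slide does not show the actual ASDEV interface:

    /* as_worker.c: sketch of a user-space processing component. Reads
       data forwarded by ASOBD/ASDEV, transforms it, and writes the
       result back so it lands in the derived file. The device path
       and framing are hypothetical. */
    #include <fcntl.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(void)
    {
        int dev = open("/dev/asdev", O_RDWR);  /* hypothetical device node */
        if (dev < 0) { perror("open"); return 1; }

        char buf[65536];
        ssize_t n;
        while ((n = read(dev, buf, sizeof buf)) > 0) {
            /* transform the chunk in place here; identity for brevity */
            if (write(dev, buf, (size_t)n) != n) { perror("write"); break; }
        }
        close(dev);
        return 0;
    }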

9 Active Storage Application: High-Throughput Proteomics

9.4 Tesla high-throughput mass spectrometer:
- 1 experiment per hour
- 5,000 spectra per experiment
- 4 MB per spectrum
- Per instrument: 20 GB per hour, 480 GB per day
- Next-generation technology will increase data rates 200x.

Problem: given two floating-point inputs, a target mass and a tolerance, find all the possible protein sequences that fit into the specified range.

Active Storage solution: each OST receives the float pair sent by the client, processes its part of the data, and stores the resulting output in its Lustre OBD (object-based disk). A sketch of the filtering step follows below.
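
The core of that per-OST step is a range filter. A minimal C sketch, with a hypothetical record format (one "mass<TAB>sequence" line per candidate) standing in for the real data layout:

    /* mass_filter.c: keep candidate sequences whose mass falls within
       [target - tol, target + tol]. The line-oriented record format
       is a hypothetical stand-in for the real one. */
    #include <math.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        if (argc != 3) {
            fprintf(stderr, "usage: %s <target-mass> <tolerance>\n", argv[0]);
            return 1;
        }
        double target = atof(argv[1]), tol = atof(argv[2]);

        char line[4096];
        while (fgets(line, sizeof line, stdin)) {  /* local OST data */
            double mass = atof(line);              /* mass leads each record */
            if (fabs(mass - target) <= tol)
                fputs(line, stdout);               /* kept for the local OBD */
        }
        return 0;
    }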

10 SC’2004 StorCloud Most Innovative Use Award

Sustained 4 GB/s of Active Storage write processing:
- 320 TB Lustre file system
- 40 Lustre OSSs running Active Storage, each with 2 Xeon processors and 4 logical disks (160 OSTs in total)
- 1 MDS
- 1 client creating files

[Diagram: a client system connected over a gigabit network to Lustre OSS 0 through OSS 39, each serving Lustre OSTs, plus the Lustre MDS.]

11 Real-Time Visualization

12 Active Storage Processing Patterns

Pattern   Description
1W->2W    Data is written to the original raw file; a new file is created that receives the data after it has passed through a processing component.
1W->1W    Data is processed, then written to the original file.
1R->1W    Data previously stored on the OBD can be re-processed into a new file.
1W->0     Data is written to the original file and also passed out to a processing component. There is no return path for the data; the processing component does 'something' with it.
1R->0     Data previously stored on the OBD is read and sent to a processing component. There is no return path.
1W->#W    Data is read from one file and processed, but there may be many files output from the processing component.
#W->1W    There are many inputs from various files, written as one output from the processing component.
1R->1R    Data is read from a file on disk, sent to a processing component, and the output is returned to the reading process.
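
The patterns differ only in fan-in, fan-out, and whether the original raw data is kept; one way to make that explicit is a small table in code (illustrative, not taken from the implementation):

    /* Illustrative encoding of the patterns above; -1 means "many"
       (the # in the table). Not the implementation's actual types. */
    struct as_pattern {
        const char *name;
        int n_inputs;   /* streams feeding the processing component */
        int n_outputs;  /* destinations for the processed data */
        int keep_raw;   /* is the original raw file also written? */
    };

    static const struct as_pattern patterns[] = {
        { "1W->2W",  1,  1, 1 },  /* raw file kept plus one derived file */
        { "1W->1W",  1,  1, 0 },  /* processed data replaces the raw data */
        { "1R->1W",  1,  1, 0 },  /* re-process stored data into a new file */
        { "1W->0",   1,  0, 1 },  /* no return path; side effects only */
        { "1R->0",   1,  0, 0 },  /* read-only feed, no return path */
        { "1W->#W",  1, -1, 0 },  /* one input, many output files */
        { "#W->1W", -1,  1, 0 },  /* many inputs, one output file */
        { "1R->1R",  1,  1, 0 },  /* output returned to the reading process */
    };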

13 Status and Future Work

Status:
- Proof-of-concept 1W->2W code works now.
- Difficult to administer and use (currently requires 2 people).
- Memory copies between user and kernel space.

Future work:
- Implement the other processing patterns.
- Optimize performance by eliminating memory copies.
- Implement Active Storage for PVFS.
- Support different striping in files.
- HDF, NetCDF, database stored procedure calls, ...