Optimized Caching Policies for Storage Systems
Final Presentation
Amir Rachum, Chai Ronen
Industrial Supervisor: Dr. Roee Engelberg, LSI

Introduction – Storage Tiering

• System data is stored across different types of storage devices.
• Generally speaking, in data storage, for a given price, the higher the speed, the lower the volume.
• The idea is to enable the use of larger, low-cost disk space with the benefits of high-speed hardware, that is, to optimize data placement for the fastest overall disk access.
• This requires a dynamic algorithm for managing (migrating) the data across the tiers.

SSD: high cost, high performance, low volume
SATA drive: low cost, low performance, high volume

Goals

• Create a platform that allows us to test different algorithms in system-specific scenarios.
• Test several algorithms and find the best one among them for storage tiering in different scenarios.

Methodology

• We coded a simulator that represents the platform running the tiered storage system.
• We created several data structures that represent the data on the system and its location at all times, record read/write operations, and support several other unique features.
• We used a recording of real I/O calls from such a system to simulate an actual scenario.
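The replay step described above can be sketched roughly as follows. This is a minimal illustration, not the project's code: the trace record layout (`offset`, `length`, `is_write`) and the names `IORecord`, `chunks_touched`, and `replay` are assumptions made for the example.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class IORecord:
    """One recorded I/O call (hypothetical layout, assumed for this sketch)."""
    offset: int      # starting byte offset of the request
    length: int      # number of bytes read or written
    is_write: bool   # True for a write, False for a read


def chunks_touched(record: IORecord, chunk_size: int) -> range:
    """Return the IDs of all fixed-size chunks the request overlaps."""
    first = record.offset // chunk_size
    last = (record.offset + record.length - 1) // chunk_size
    return range(first, last + 1)


def replay(trace, chunk_size: int) -> Counter:
    """Replay a trace and count how many requests touch each chunk."""
    accesses = Counter()
    for record in trace:
        for chunk_id in chunks_touched(record, chunk_size):
            accesses[chunk_id] += 1
    return accesses


# A tiny made-up trace; a real run would read the recorded I/O calls instead.
trace = [
    IORecord(offset=0, length=16, is_write=False),
    IORecord(offset=8, length=16, is_write=True),
    IORecord(offset=40, length=4, is_write=False),
]
counts = replay(trace, chunk_size=16)
```

With a chunk size of 16 B, the second request straddles chunks 0 and 1, which is why a per-chunk view of the trace depends on the chosen chunk size.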

Accomplishments

• Created an Algorithm interface that supports any algorithm, multiple tiers, and multiple platform data structures.
• Our design is generic enough to make adding usage statistics and platform data very easy.
• The CLI enables quick entry of the input file, chunk size, and tier information.
• Varying the chunk size let us study its effect on run time and algorithm effectiveness.
• We implemented 2 caching algorithms:
  • A “naïve” algorithm that transfers every chunk to the top tier upon I/O.
  • A more efficient algorithm that minimizes migrations.
• A smart implementation (using a default tier) resulted in low disk space usage for the various data structures.
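One way such an interface and the two policies might look is sketched below. Everything here is an assumption for illustration: the class and method names (`Platform`, `Algorithm`, `on_io`, `promote`), the oldest-first eviction, the choice to count a demotion as a migration, and in particular the access-count threshold used by `ThresholdAlgorithm`, since the slides do not say how the second algorithm minimizes migrations. The sketch also shows the default-tier idea: only chunks that are not in the default tier are stored explicitly.

```python
from abc import ABC, abstractmethod
from collections import defaultdict

TOP_TIER = 0       # fastest tier (e.g. SSD); numbering assumed for this sketch
DEFAULT_TIER = 1   # slowest tier; chunks not recorded anywhere live here


class Platform:
    """Tracks chunk locations; only chunks outside the default tier are stored."""

    def __init__(self, top_capacity_chunks: int):
        self.top_capacity = top_capacity_chunks
        self.top = {}          # chunk_id -> None, insertion-ordered (oldest first)
        self.migrations = 0
        self.top_hits = 0

    def tier_of(self, chunk_id: int) -> int:
        return TOP_TIER if chunk_id in self.top else DEFAULT_TIER

    def promote(self, chunk_id: int) -> None:
        """Move a chunk to the top tier, evicting the oldest one if full."""
        if chunk_id in self.top:
            return
        if len(self.top) >= self.top_capacity:
            oldest = next(iter(self.top))
            del self.top[oldest]
            self.migrations += 1   # demotion back to the default tier
        self.top[chunk_id] = None
        self.migrations += 1       # promotion to the top tier


class Algorithm(ABC):
    @abstractmethod
    def on_io(self, platform: Platform, chunk_id: int) -> None:
        """Called after every I/O to a chunk; may trigger migrations."""


class NaiveAlgorithm(Algorithm):
    """Promotes every chunk to the top tier as soon as it is accessed."""

    def on_io(self, platform, chunk_id):
        platform.promote(chunk_id)


class ThresholdAlgorithm(Algorithm):
    """Promotes a chunk only after `threshold` accesses (assumed policy)."""

    def __init__(self, threshold: int = 2):
        self.threshold = threshold
        self.hits = defaultdict(int)   # the algorithm's own bookkeeping

    def on_io(self, platform, chunk_id):
        self.hits[chunk_id] += 1
        if self.hits[chunk_id] >= self.threshold:
            platform.promote(chunk_id)


def run(algorithm: Algorithm, accesses, top_capacity_chunks: int) -> Platform:
    """Feed a sequence of chunk accesses to an algorithm and collect counters."""
    platform = Platform(top_capacity_chunks)
    for chunk_id in accesses:
        if platform.tier_of(chunk_id) == TOP_TIER:
            platform.top_hits += 1
        algorithm.on_io(platform, chunk_id)
    return platform


accesses = [1, 2, 3, 4, 5, 1, 1, 1]
naive = run(NaiveAlgorithm(), accesses, top_capacity_chunks=4)
smart = run(ThresholdAlgorithm(threshold=2), accesses, top_capacity_chunks=4)
```

On this toy sequence both policies serve two accesses from the top tier, but the naive policy performs 8 migrations against 1 for the threshold policy. The numbers are only meant to show the kind of counters such a platform can expose, not to reproduce the project's results.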

Algorithm conclusions

We ran 3 different scenarios:
• Small chunk size (16 B), small SSD size (64 B, 4 × chunk size)
• Large chunk size (2048 B), relatively small SSD size (8192 B, 4 × chunk size)
• Small chunk size (16 B), relatively large SSD size (8192 B, 512 × chunk size)
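The three configurations can be checked with a little arithmetic. The snippet below is only a worked example: it takes the larger SSD size as 8192 B, the value given in the slide titles (2048/8192 and 16/8192) and the one consistent with the stated multipliers. The helper name `ssd_capacity_in_chunks` is made up for the illustration.

```python
# (label, chunk size in bytes, SSD size in bytes)
SCENARIOS = [
    ("small chunk, small SSD", 16, 64),
    ("large chunk, small SSD", 2048, 8192),
    ("small chunk, large SSD", 16, 8192),
]


def ssd_capacity_in_chunks(chunk_size: int, ssd_size: int) -> int:
    """Number of whole chunks that fit in the SSD tier."""
    if ssd_size % chunk_size != 0:
        raise ValueError("SSD size is not a whole number of chunks")
    return ssd_size // chunk_size


capacities = {
    label: ssd_capacity_in_chunks(chunk, ssd) for label, chunk, ssd in SCENARIOS
}
# capacities maps the three labels to 4, 4 and 512 chunks respectively
```

The first two scenarios therefore give the SSD tier room for the same number of chunks (4) while differing by a factor of 128 in chunk size, and the third keeps the small chunks but gives the SSD tier 512 slots.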

Algorithm conclusions

When using an extremely small SSD size (4 × chunk size), both caching algorithms are ineffective:
• The naïve algorithm showed a high number of reads from the higher tier, but performed twice as many migrations between tiers as the smart algorithm.
• The smart algorithm, despite performing half the migrations of the naïve algorithm, showed very few reads from the higher tier.
• In this case, the dummy algorithm proved very efficient, as it saved all the time that would otherwise be spent on relatively useless migrations.

Algorithm Conclusions (16/64)

Algorithm conclusions

• When running with a large chunk size and an SSD size of 4 × chunk size, the caching algorithms achieved much better results than the dummy algorithm.
• However, the 2 caching algorithms did not differ from each other.

Algorithm Conclusions (2048/8192)

Algorithm conclusions

• When running with a small chunk size and a large SSD size, the 2 caching algorithms again gave similar results.
• However, these results were far inferior to those of the previous run.

Algorithm Conclusions (16/8192)

General Conclusions

• Chunk size greatly affects the run time of the platform, but a “standard” size does not take long to run.
• Smart use of Boost greatly reduces the amount of work and is very effective.
• A good implementation can save a large amount of disk space.
• Although the platform provides data structures, most non-naïve algorithms also need a data structure of their own.
• Working with Git source control proved very helpful for:
  • Retrieving old code that was once thought to be obsolete.
  • Collaboration.