Evaluating the performance of Seagate Kinetic drive technology and its integration with the CERN EOS storage system
Ivana Pejeva, openlab Summer Student

Presentation transcript:

Evaluating the performance of Seagate Kinetic drive technology and its integration with the CERN EOS storage system
Ivana Pejeva, openlab Summer Student, IT-DSS group
Supervisor: Andreas J. Peters
19/08/2015

The Project Goal
› Motivation
  - Scalability (individual drives abstracted by larger clusters)
  - Simplicity (reduced API and deployment model)
  - Performance (optimized for streaming access)
  - Cost efficiency
› Goal
  - Evaluating the performance of the Seagate Kinetic drives as a promising storage solution for CERN

Seagate Kinetic Platform
› Directly Attached Storage (DAS) vs. Ethernet drive technology
› What is the Seagate Kinetic Platform?
  - Key/value store (access model illustrated in the sketch below)
  - OpenStack Swift object storage protocol
  - Kinetic Drive API
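The drives replace the block interface with a key/value interface reached directly over Ethernet. The toy class below is only a runnable illustration of that access model (put/get/delete and key-range listing over opaque byte-string keys); it is not the Seagate Kinetic client library, and the address and key layout are made up for the example.

    # Toy in-memory stand-in for a Kinetic drive's key/value interface.
    # Real drives expose the same operations over an Ethernet connection.
    class ToyKineticDrive:
        def __init__(self, address):
            self.address = address          # each drive is its own network endpoint (IP:port)
            self._store = {}                # key (bytes) -> value (bytes)

        def put(self, key, value):
            self._store[key] = value

        def get(self, key):
            return self._store[key]

        def delete(self, key):
            self._store.pop(key, None)

        def get_key_range(self, start, end):
            # Drives enumerate keys in lexicographic order; clients use this
            # to find the chunks that make up one logical file.
            return sorted(k for k in self._store if start <= k <= end)

    # A client splits a file into chunks and derives one key per chunk.
    drive = ToyKineticDrive("192.168.1.10:8123")
    data = b"x" * (3 * 1024 * 1024)
    chunk = 1024 * 1024
    for i in range(0, len(data), chunk):
        drive.put(b"myfile/%08d" % (i // chunk), data[i:i + chunk])
    print(drive.get_key_range(b"myfile/", b"myfile/\xff"))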

Integration with EOS
› Kinetic IO plug-in: communication between the Kinetic drives and EOS
› Allows arbitrary Reed-Solomon encoding using the Intel ISA-L library with a Cauchy matrix and CRC32 block checksumming
› (32,4): up to 4 drives can be lost without data loss, at 12.5% space overhead
› (10,2): up to 2 drives can be lost without data loss, at 20% space overhead (overhead arithmetic sketched below)
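The quoted overheads follow directly from the stripe geometry, assuming the (a,b) notation means a data stripes plus b parity stripes: b parity blocks are written for every a data blocks, and up to b drives may fail per stripe. A quick check of both layouts:

    # Space overhead and fault tolerance of an (n_data, n_parity) Reed-Solomon
    # layout, assuming the slide's (a,b) notation is (data stripes, parity stripes).
    def rs_layout(n_data, n_parity):
        return {
            "drives per stripe": n_data + n_parity,
            "tolerated drive losses": n_parity,
            "space overhead": f"{n_parity / n_data:.1%}",   # extra bytes per payload byte
        }

    print(rs_layout(32, 4))   # -> 36 drives/stripe, 4 losses tolerated, 12.5% overhead
    print(rs_layout(10, 2))   # -> 12 drives/stripe, 2 losses tolerated, 20.0% overhead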

Test deployment
› Two Kinetic configurations (32,4 and 10,2) via a 10 Gb gateway
› One conventional configuration (eos dev) with directly attached disks and 2 replicas
› Client machines (1 GE and 10 GE)
› Storage server

Write performance
› Sequential upload
  - Writing to the three instances with 1 GE and 10 GE clients using the xrdcopy utility (see the timing sketch below)
  - File sizes from 4 kB to 4 GB
  - Each file is uploaded 10 times
› Expected single-drive performance (sequential write)
  - 50 MB/s for a Kinetic drive
  - ~100 MB/s for a conventional drive
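A rough sketch of how such a sequential-upload timing could be scripted around xrdcopy is shown below. The EOS endpoint and destination path are placeholders and this is not the harness actually used for the measurements; it only illustrates the method (copy each file size 10 times and record MB/s).

    # Sketch of a sequential-upload timing loop around the xrdcopy utility.
    # Endpoint and paths are placeholders, not the real test setup.
    import subprocess, time

    ENDPOINT = "root://eos-test.example.cern.ch//eos/test/kinetic"   # placeholder URL
    SIZES = [4 * 1024, 4 * 1024**2, 4 * 1024**3]                     # 4 kB, 4 MB, 4 GB
    REPEATS = 10

    for size in SIZES:
        src = f"/tmp/upload_{size}.bin"
        with open(src, "wb") as f:
            f.truncate(size)                 # sparse test file of the requested size
        rates = []
        for n in range(REPEATS):
            dst = f"{ENDPOINT}/upload_{size}_{n}_{time.time_ns()}.bin"
            t0 = time.time()
            subprocess.run(["xrdcopy", src, dst], check=True)
            rates.append(size / (time.time() - t0) / 1e6)            # MB/s
        print(f"{size} bytes: mean {sum(rates) / len(rates):.1f} MB/s over {REPEATS} uploads")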

Write performance – results plot

› Confirmed high performance (using a single client)
› Usually one is not concerned about the maximum performance of a single drive, but about the aggregated speed delivered to many clients

Read performance
› Using a ROOT benchmark
› Reading an ATLAS file, accessed by a single client
› Sequential and sparse access (100% and 50% of entries)
› Measuring CPU time, real time and the CPU ratio (see the PyROOT sketch below)
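A minimal PyROOT sketch of such a timing is given below. The file URL and tree name are placeholders and the real benchmark code is not shown in the slides; the sparse case is approximated here by reading every second entry.

    # Minimal PyROOT sketch: time a sequential (100%) and a sparse (50%) read
    # of a tree and report real time, CPU time and the CPU ratio.
    import ROOT

    def timed_read(url, tree_name, fraction):
        f = ROOT.TFile.Open(url)                    # works for local and root:// URLs
        tree = f.Get(tree_name)
        step = max(1, int(round(1.0 / fraction)))   # 1.0 -> every entry, 0.5 -> every 2nd
        sw = ROOT.TStopwatch()
        sw.Start()
        for i in range(0, int(tree.GetEntries()), step):
            tree.GetEntry(i)                        # forces the baskets for entry i to be read
        sw.Stop()
        f.Close()
        cpu, real = sw.CpuTime(), sw.RealTime()
        print(f"{fraction:.0%} of entries: real {real:.1f}s, CPU {cpu:.1f}s, ratio {cpu / real:.2f}")

    # Placeholder file and tree name:
    # timed_read("root://eos-test.example.cern.ch//eos/test/atlas_sample.root", "CollectionTree", 1.0)
    # timed_read("root://eos-test.example.cern.ch//eos/test/atlas_sample.root", "CollectionTree", 0.5)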

Read performance – single client (results plot)

Read performance – single client
› The Kinetic plug-in needs to read blocks of 1 MB from each drive
› If only a fraction of the data is requested, the Kinetic plug-in still reads all of the data, while the dev drives read only the requested bytes (illustrated below)
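A back-of-the-envelope illustration of that read amplification follows. The 1 MB block size comes from the slide; the 1 GB file size and 50% sparse fraction are made-up example numbers, and the worst case (requested bytes spread over nearly every block) is assumed.

    # Read amplification for sparse access: the Kinetic plug-in fetches whole
    # 1 MB blocks, so a 50% sparse read can still transfer the entire file,
    # while a conventional drive transfers only the requested bytes.
    BLOCK = 1 * 1024**2                       # 1 MB Kinetic block size (from the slide)
    file_size = 1 * 1024**3                   # example: 1 GB file
    requested = 0.5 * file_size               # example: sparse read touching 50% of the data

    n_blocks = file_size // BLOCK             # blocks that make up the file
    kinetic_bytes = n_blocks * BLOCK          # worst case: every block holds requested bytes
    conventional_bytes = requested            # only the requested byte ranges are read

    print(f"Kinetic:      {kinetic_bytes / 1024**2:.0f} MB transferred")
    print(f"Conventional: {conventional_bytes / 1024**2:.0f} MB transferred")
    print(f"Amplification vs. requested data: {kinetic_bytes / requested:.1f}x")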

Summary
› A promising solution for the future
› The expected performance was confirmed
› Next step: study the results for multiple clients by measuring the aggregated throughput with sequential and sparse access (one possible setup is sketched below)
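One possible way to run the planned multi-client measurement is sketched below. In a real study each client would run on a separate machine; here parallel processes on one host stand in for them, and the endpoint, file size and client count are placeholders rather than the actual follow-up setup.

    # Sketch of an aggregated-throughput measurement: N parallel clients copy
    # the same file with xrdcopy and the total bytes moved are divided by the
    # wall-clock time. Placeholders only; not the actual follow-up study.
    import subprocess, time
    from multiprocessing import Pool

    SRC = "root://eos-test.example.cern.ch//eos/test/atlas_sample.root"   # placeholder
    N_CLIENTS = 8
    FILE_SIZE = 4 * 1024**3         # assumed size of the test file, in bytes

    def one_client(i):
        subprocess.run(["xrdcopy", SRC, f"/tmp/client_{i}.root"], check=True)
        return FILE_SIZE

    if __name__ == "__main__":
        t0 = time.time()
        with Pool(N_CLIENTS) as pool:
            total = sum(pool.map(one_client, range(N_CLIENTS)))
        print(f"{N_CLIENTS} clients: aggregated {total / (time.time() - t0) / 1e6:.0f} MB/s")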