Evaluation of Gluster at IHEP. CHENG Yaodong, CC/IHEP, 2011-5-4.

Presentation transcript:

Slide 1: Evaluation of Gluster at IHEP. CHENG Yaodong, CC/IHEP

Slide 2: Motivation
Currently we deploy Lustre (1.7 PB) for local experiments. It delivers good performance, but it also has disadvantages, for example:
- No replication support
- No data rebalance support
- Poor metadata performance for large numbers of small files
- Difficult to add and remove OSTs elastically
- Tightly coupled with the Linux kernel, hard to debug
- ...
In practice it is difficult to find a file system that meets all our requirements, and we will continue to use Lustre for some time to come, but it is still necessary to go through the requirement checklist and look for potential alternatives.

Slide 3: Checklist
- Performance: high aggregate bandwidth; reasonable I/O speed for each task
- Scalability: near-linear scaling in both capacity and performance; scales to a large number of files (metadata scalability)
- Stability/robustness: runs unattended for long periods of time without operational intervention
- Reliability: keeps data safe in case of hardware, software, or network failure
- Elasticity: flexibly adapts to the growth (or reduction) of data; resources can be added to or removed from the storage pool as needed, without disrupting the system
- Ease of use and maintenance
- Security: authentication and authorization
- Monitoring and alerting

Slide 4: What is Gluster
- GlusterFS is a scalable open-source clustered file system.
- It aggregates multiple storage bricks over Ethernet or InfiniBand RDMA interconnects into one large parallel network file system and offers a global namespace, giving N times the performance and capacity of a single brick.
- Its focus is on scalability, elasticity, reliability, performance, and ease of use and management.
- Website:

Slide 5: Gluster Features
- More scalable and reliable: no metadata server, thanks to the elastic hash algorithm
- More flexible volume management (stackable features)
- Elastic: storage bricks can be added, replaced, or removed
- Application-specific scheduling / load balancing: round robin, adaptive least usage, non-uniform file access
- Automatic file replication, snapshots, and undelete
- Good performance: striping and I/O accelerators (I/O threads, I/O cache, read-ahead, and write-behind)
- FUSE-based client: fully POSIX compliant, unified VFS, runs in user space (see the mount sketch below)
- Easy to install, update, and debug
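As a brief illustration of the FUSE-based client, a Gluster volume is mounted like any other network file system. This is a hedged sketch; the server name, volume name, and mount point are examples, not the configuration used at IHEP:

    # Mount a Gluster volume through the FUSE client
    mount -t glusterfs server1:/disvol /mnt/gluster

    # Or record it in /etc/fstab so it is mounted at boot
    # server1:/disvol  /mnt/gluster  glusterfs  defaults,_netdev  0 0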

Slide 6: Gluster Architecture

Slide 7: Elastic Volume Management
Storage volumes are abstracted from the underlying hardware and can grow, shrink, or be migrated across physical systems as necessary. Storage servers can be added or removed on the fly, with data automatically rebalanced across the cluster (see the sketch below); data stays online and there is no application downtime.
Three types of volume:
- Distributed volume
- Distributed replicated volume
- Distributed striped volume
(Diagram: a volume built from brick1, brick2, and brick3, accessed by a client.)
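A hedged sketch of on-the-fly expansion with the gluster command-line tool; the volume and brick names are examples, not the IHEP test-bed configuration:

    # Add a new brick to an existing volume and rebalance existing data onto it
    gluster volume add-brick disvol server4:/data01/gdata01
    gluster volume rebalance disvol start
    gluster volume rebalance disvol status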

Slide 8: Distributed Volume
Files are assigned to different bricks on multiple servers. There is no replication and no striping; each file is kept whole in its original format.
(Diagram: in the client view the files aaa, bbb, and ccc appear under ../files/; each file is stored in full on one of brick 1 (server01/data), brick 2 (server02/data), or brick 3 (server03/data).)
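Creating such a volume is straightforward; the following is a hedged sketch using the brick layout from the diagram, not the exact commands used in the evaluation:

    # Create and start a purely distributed volume over three bricks
    gluster volume create disvol transport tcp \
        server01:/data server02:/data server03:/data
    gluster volume start disvol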

Slide 9: Distributed Replicated Volume
Data is replicated (mirrored) across two or more nodes. This type is used in environments where high availability and high reliability are critical; it also offers improved read performance in most environments, but it needs two or more times the raw disk capacity.
(Diagram: with replica count 2, the files aaa, bbb, and ccc are each mirrored on a pair of bricks spread over server01 to server04.)
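A hedged sketch of creating a replicated volume with replica count 2, matching the diagram's four example bricks; the brick paths are illustrative:

    # Bricks are grouped into replica pairs in the order they are listed
    gluster volume create repvol replica 2 transport tcp \
        server01:/data server02:/data server03:/data server04:/data
    gluster volume start repvol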

Slide 10: Distributed Striped Volume
Data is striped across two or more nodes in the cluster. This type is generally only used in high-concurrency environments that access very large files. A file becomes inaccessible if one of its bricks fails.
(Diagram: with stripe count 3, the files aaa, bbb, and ccc are each split into chunks spread across brick 1, brick 2, and brick 3.)
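A hedged sketch of creating a striped volume with stripe count 3, again with example brick paths:

    # Each file is split into stripes spread over the three bricks
    gluster volume create stripevol stripe 3 transport tcp \
        server01:/data server02:/data server03:/data
    gluster volume start stripevol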

Slide 11: Evaluation of Gluster
- Evaluation environment
- Function testing: reliability, ...
- Performance testing: single-client read/write, aggregate I/O, stress testing, metadata testing

Slide 12: Evaluation Environment
Server configuration (three storage servers):
- Processor: 2 x Intel Xeon 2.66 GHz
- Memory: 24 GB
- Disk: 12 SATA disks attached to each server
- OS kernel: SLC, EL.cernsmp, x86_64
Client configuration (8 clients):
- Processor: 2 x Intel Xeon 2.66 GHz
- Memory: 16 GB
- OS kernel: SL, el5, x86_64
Network links:
- 10 Gbps interconnect between the storage servers
- 10 Gbps links between the storage servers and 2 of the clients
- 1 Gbps links between the storage servers and the other 6 clients
Software version: GlusterFS
(Diagram: storage servers on 10 GbE, clients on 1 GbE.)

Slide 13: Three Volumes
Distributed volume
- Name: disvol
- Bricks: server1:/data02/gdata01 (11 TB), server2:/data05/gdata01 (2 TB), server3:/data06/gdata01 (2 TB)
- Total capacity: 15 TB
Distributed replicated volume
- Name: repvol
- Bricks: server1:/data01/udata (863 GB), server2:/data06/gdata01 (2 TB)
- Total capacity: 863 GB
Distributed striped volume
- Name: stripevol
- Bricks: server1:/data08/udata (2 TB), server2:/data08/udata (2 TB)
- Total capacity: 4 TB
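Once the volumes are defined, their brick layout and reported capacity can be checked from the server and client side. A minimal sketch, assuming the volume names above and an example mount point:

    # On a storage server: list the bricks and options of each volume
    gluster volume info disvol
    gluster volume info repvol
    gluster volume info stripevol

    # On a client: mount a volume and check the capacity it reports
    mount -t glusterfs server1:/disvol /mnt/disvol
    df -h /mnt/disvol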

Slide 14: Reliability Testing
Two cases were tested:
- A storage server or the network fails for a moment and then recovers
- A disk is destroyed and all data on it is lost permanently
The different volume types (distributed, striped, replicated) and running operations (read, write) are affected differently in the two cases. "Running operations" means that a client is reading or writing files at the moment the storage server or network fails.

Slide 15: Results
First case: server1 fails for a moment, then recovers
- Capacity: disvol shrinks to 4 TB; repvol expands to 2 TB; stripevol reports an error and cannot display its capacity
- File accessibility: disvol: files on server1 disappear; repvol: all files remain accessible; stripevol: "Transport endpoint is not connected"
- Running read: disvol: reads of files on server1 fail and exit, files on the other servers are fine; repvol: no interruption at all; stripevol: read error
- Running write: disvol: same as read; repvol: continues after a short break; stripevol: write error
Second case: disks on server1 destroyed
- Capacity: disvol shrinks to 4 TB; repvol expands to 2 TB; stripevol reports an error and cannot display its capacity
- Files: disvol: files on server1 are lost; repvol: all files remain accessible; stripevol: files are lost

Slide 16: Performance Test - Single-Client Read/Write
Method: use "dd" to write or read a 10 GB file from one client (see the sketch below). "read2" means two threads reading the same file.
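A hedged sketch of the single-client measurement; the mount point and file name are examples:

    # Write a 10 GB file, then read it back, reporting throughput
    dd if=/dev/zero of=/mnt/disvol/testfile bs=1M count=10240
    dd if=/mnt/disvol/testfile of=/dev/null bs=1M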

Slide 17: Multiple Clients
Method: multiple "dd" processes read and write different files from multiple clients simultaneously, while the number of clients is varied (a sketch follows below).
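One way to drive such a test is to launch one dd per client over ssh. This is a hedged sketch with example host names and paths, not the scripts actually used in the evaluation:

    # Start a 10 GB write on each client in parallel, then wait for all of them
    for host in client1 client2 client3 client4; do
        ssh "$host" "dd if=/dev/zero of=/mnt/disvol/${host}.dat bs=1M count=10240" &
    done
    wait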

Slide 18: Stress Testing
8 clients with 8 threads per client read and write for 10 hours against 2 servers, each with 12 SATA disks attached.

Slide 19: Metadata Testing
mpirun -n 4 mdtest -d /data03/z -n i 10
4 tasks creating files/directories, 10 iterations (a reconstructed form of the command is sketched below).
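The mdtest flags used here are -d (test directory), -n (number of files/directories per task), and -i (number of iterations). The per-task item count did not survive in the transcript, so the value below is a placeholder, not the one used in the original test:

    # Hedged reconstruction: 4 MPI tasks, 10 iterations, placeholder item count
    NUM_ITEMS=1000   # placeholder; the real per-task count is not known
    mpirun -n 4 mdtest -d /data03/z -n $NUM_ITEMS -i 10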

Slide 20: Conclusions
In this test bed, Gluster showed good scalability, performance, reliability, elasticity, and so on. But:
- It is not yet stable enough; for example, lock problems occurred many times during the testing.
- It does not support file-based replication. Currently it only supports volume-based replication, and the replication policy cannot be changed after a volume is created.
- A lot of work, such as maintaining the global tree view, is done by the client, which overloads the client; operations such as "ls" then become very slow.
- ...
Some other considerations are also taken into account when selecting storage: cost of administration, ease of deployment, capacity scaling, and support for large namespaces.
Overall, Gluster is a good file system, and we will continue to keep an eye on it.

Slide 21: Thanks!!