Wide Area Network Access to CMS Data Using the Lustre Filesystem
J. L. Rodriguez†, P. Avery*, T. Brody†, D. Bourilkov*, Y. Fu*, B. Kim*, C. Prescott*, Y. Wu*
†Florida International University (FIU), *University of Florida (UF)

The Florida State Wide Lustre Testbed

Network: Florida Lambda Rail (FLR)
– FIU: servers are connected to the FLR via a dedicated 1 Gbps Campus Research Network link; however, local hardware issues limit FIU's actual bandwidth to ~600 Mbps
– UF: servers are connected to the FLR via their own dedicated 2x10 Gbps links
– Flatech: servers are connected at 1 Gbps
– Server TCP buffers are set to a maximum of 16 MB

Lustre Fileserver at UF-HPC/Tier2 Center: Gainesville, FL
– Storage subsystem: six RAID Inc. Falcon III shelves, each with a redundant dual-port 4 Gbit FC RAID controller and 24x750 GB hard drives, for a total raw storage of 104 TB
– Attached to: two servers, each a dual quad-core Barcelona Opteron 2350 with 16 GB RAM, three FC cards and one 10 GigE Chelsio NIC
– The storage system has been clocked at greater than 1 GB/s via TCP/IP large-block I/O

FIU Lustre Clients: Miami, FL
– CMS analysis server: medianoche.hep.fiu.edu, dual 4-core Intel X5355 with 16 GB RAM, dual 1 GigE
– FIU fileserver: fs1.local, dual 2-core Intel Xeon with 16 GB RAM, 3ware 9000-series RAID controller, NFS ver. 3.x, RAID 5 (7+1) with 16 TB of raw disk
– OSG gatekeeper: dgt.hep.fiu.edu, dual 2-core Xeon with 2 GB RAM, single GigE; used in the Lustre tests, and we experimented with NAT (it works, but was not tested further)
– System configuration: Lustre-patched kernel (EL_lustre series); both systems mount UF-HPC's Lustre filesystem on a local mount point (see the client-setup sketch below)

Flatech Lustre Client: Melbourne, FL
– CMS server: flatech-grid3.fit.edu, dual 4-core Intel E5410 with 8 GB RAM, GigE
– System configuration: unpatched SL4 kernel; Lustre enabled via runtime kernel modules

Site Configuration and Security
– All sites share common UID/GID domains
– Mount access is restricted to specific IPs via firewall
– ACLs and the root_squash security feature are not currently implemented in the testbed
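Client setup amounts to raising the kernel TCP buffer ceilings and mounting the remote Lustre filesystem over TCP. The following is a minimal sketch only: the MGS address, filesystem name and mount point are placeholders rather than the actual UF-HPC values, and it assumes root privileges and an installed Lustre client stack.

```python
#!/usr/bin/env python
"""Minimal Lustre WAN client setup sketch (hypothetical parameters)."""
import subprocess

MGS_NID = "mgs.example.ufl.edu@tcp0"   # placeholder management-server NID
FSNAME = "ufhpc"                       # placeholder Lustre filesystem name
MOUNT_POINT = "/lustre/ufhpc"          # local mount point on the client

def raise_tcp_buffers(max_bytes=16 * 1024 * 1024):
    """Raise the kernel TCP send/receive buffer ceilings to 16 MB."""
    for key in ("net.core.rmem_max", "net.core.wmem_max"):
        subprocess.check_call(["sysctl", "-w", "%s=%d" % (key, max_bytes)])

def mount_lustre():
    """Mount the remote Lustre filesystem over TCP (Lustre modules must be loaded)."""
    subprocess.check_call(["mount", "-t", "lustre",
                           "%s:/%s" % (MGS_NID, FSNAME), MOUNT_POINT])

if __name__ == "__main__":
    raise_tcp_buffers()
    mount_lustre()
```

On a patchless client such as the Flatech node, the same mount applies once the Lustre kernel modules have been loaded at runtime.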
Computing facilities in the distributed computing model for the CMS experiment at CERN: in the US, Tier2 sites are medium-sized facilities with approximately 10^6 kSI2K of computing power and 200 TB of disk storage. These facilities are centrally managed, each with dedicated computing resources and manpower. Tier3 sites, on the other hand, range in size from a single interactive analysis computer or small cluster to large facilities that rival the Tier2s in resources. Tier3s are usually found at universities in close proximity to CMS researchers.

[Diagram: the tiered CMS computing model. The CMS online system feeds the Tier 0 at the CERN computer center, which serves Tier 1 centers (FermiLab, Korea, Russia, UK) over 10 Gb/s links; these in turn serve Tier 2 centers (U Florida, Caltech, UCSD) and Tier 3 sites (FIU, FlaTech, FSU) down to desktop or laptop PCs and Macs. The experiment involves ~3000 physicists in 60 countries and expects tens of petabytes per year by 2010, with a CERN/outside resource ratio of 10-20%. The Florida sites (UF HPC/UF Tier2, FIU Tier3, FlaTech Tier3) are interconnected via the OSG and the 10 Gbps FLR.]

Introduction
We explore the use of the Lustre cluster filesystem over the WAN to access CMS (Compact Muon Solenoid) data stored on a storage system located hundreds of miles away. The Florida State Wide Lustre Testbed consists of two client sites located at CMS Tier3s, one in Miami, FL and one in Melbourne, FL, and a Lustre storage system located in Gainesville at the University of Florida's HPC Center. In this paper we report on I/O rates between sites, using both the CMS application suite CMSSW and the I/O benchmark tool IOzone. We describe our configuration, outlining the procedures implemented, and conclude with suggestions on the feasibility of implementing distributed Lustre storage to facilitate CMS data access for users at remote Tier3 sites.

Lustre is a POSIX-compliant, network-aware, highly scalable, robust and reliable cluster filesystem developed by Sun Microsystems Inc. The system can run over several different types of networking infrastructure, including Ethernet, InfiniBand, Myrinet and others. It can be configured with redundant components to eliminate single points of failure. It has been tested with tens of thousands of nodes, provides petabytes of storage, and can move data at hundreds of GB/s. The system employs state-of-the-art security features, and GSS- and Kerberos-based security is planned for future releases. It is available as public open source under the GNU General Public License. Lustre is deployed on a broad array of computing facilities, both large and small; commercial and public organizations, including some of the largest supercomputing centers in the world, currently use Lustre as their distributed filesystem.

IO Performance with the IOzone Benchmark Tool: FIU to UF
The IOzone benchmark tool was used to establish the maximum possible I/O performance of Lustre over the WAN, both between FIU and UF and between Flatech and UF. Here we report only on the results between FIU and UF.
– The Lustre filesystem at UF-HPC was mounted on a local mount point on medianoche.hep.fiu.edu, located in Miami
– File sizes were set to 2x RAM to avoid caching effects
– Measurements were made as a function of record length
– Checked in multi-processor mode: 1 through 8 concurrent processes
– Cross-checked against dd read/write rates
– All tests are consistent with the IOzone results shown
With large-block I/O we can saturate the network link between UF and FIU using the standard I/O benchmark tool IOzone; a sketch of the benchmark sweep follows below.

[Figure: I/O performance of the testbed between FIU and UF — sequential and random read/write rates, in MB/s, measured with IOzone as a function of record length.]
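The record-length sweep summarized above is easy to script. The sketch below is illustrative only: the mount path, file name, file size and the exact IOzone option set used in the actual measurements are assumptions, chosen to mirror the stated test conditions (file size of roughly 2x the client's 16 GB RAM, sequential and random tests, varied record length).

```python
#!/usr/bin/env python
"""Illustrative IOzone record-length sweep over a WAN-mounted Lustre path."""
import subprocess

TEST_FILE = "/lustre/ufhpc/iozone/testfile"        # placeholder Lustre path
FILE_SIZE = "32g"                                  # ~2x RAM, to defeat client caching
RECORD_LENGTHS = ["64k", "256k", "1m", "4m", "16m"]

def run_iozone(record_length):
    """Run sequential and random tests at one record length."""
    subprocess.check_call([
        "iozone",
        "-i", "0", "-i", "1", "-i", "2",   # 0=write/rewrite, 1=read/reread, 2=random
        "-s", FILE_SIZE,                    # file size
        "-r", record_length,                # record (block) length under test
        "-f", TEST_FILE,
    ])

if __name__ == "__main__":
    # Multi-process throughput runs (1-8 concurrent processes) use IOzone's
    # "-t N -F file1 ... fileN" throughput mode; dd can serve as a cross-check.
    for rl in RECORD_LENGTHS:
        run_iozone(rl)
```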
IO Performance with CMSSW: FIU to UF
Using the CMSSW application we tested the I/O performance of the testbed between the FIU Tier3 and the UF-HPC Lustre storage. An I/O-bound CMSSW application was used for the tests; its main function was to skim objects from data collected during the Cosmic Run at Four Tesla (CRAFT). The application is the same as that used by the Florida cosmic analysis group, with the output data file redirected to /dev/null (an illustrative sketch of such a skim configuration is given at the end of this transcript).
We report the aggregate and average read I/O rates:
– Aggregate I/O rate: the total I/O rate per node vs. the number of jobs running concurrently on a single node
– Average I/O rate: the rate per process per job vs. the number of jobs running concurrently on a single node
We compare I/O rates between Lustre, NFS and local disk:
– NFS: the FIU fileserver (16 TB, 3ware 9000) over NFS ver. 3
– Local: a single 750 GB SATA II hard drive
Observations:
– For the NFS and Lustre filesystems the I/O rates scale linearly with the number of jobs; this is not the case for local disk
– Average I/O rates remain relatively constant as a function of jobs per node for the distributed filesystems
– The Lustre I/O rates are significantly lower than those seen with IOzone and lower than those obtained with NFS
We are now investigating the cause of the discrepancy between the Lustre CMSSW I/O rates and the rates observed with IOzone.

Summary and Conclusion
Summary:
– Lustre is very easy to deploy, particularly as a client installation
– Direct I/O operations show that the Lustre filesystem mounted over the WAN works reliably and with a high degree of performance; we have demonstrated that we can easily saturate a 1 Gbps link with I/O-bound applications
– CMSSW remote data access was observed to be slower than expected, both relative to the rates obtained with I/O benchmarks and relative to other distributed filesystems
– We have demonstrated that the CMSSW application can access data located hundreds of miles away through the Lustre filesystem; data can be accessed this way seamlessly, reliably and with a reasonable degree of performance, even with all components "out of the box"
– Lustre clients are easy to deploy; mounts are easy to establish, reliable and robust
– Security is established by restricting IPs and sharing UID/GID domains between all sites
Conclusion: The Florida State Wide Lustre Testbed demonstrates an alternative method for accessing data stored at dedicated CMS computing facilities. This method has the potential to greatly simplify access to data sets, whether large, medium or small, for remote experimenters with limited local computing resources.
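For illustration only, here is a minimal sketch of the kind of I/O-bound CMSSW skim configuration referred to above. It is not the actual Florida cosmic-analysis skim: the process name, input file path and kept branches are placeholders. The essential points are that the input files are read directly from the WAN-mounted Lustre path and that the output is written to /dev/null, so that only read I/O is exercised.

```python
# Hypothetical CMSSW skim configuration sketch -- not the actual analysis code.
import FWCore.ParameterSet.Config as cms

process = cms.Process("WANSKIM")
process.maxEvents = cms.untracked.PSet(input=cms.untracked.int32(-1))

# Read CRAFT-era data directly from the WAN-mounted Lustre path (placeholder file name)
process.source = cms.Source("PoolSource",
    fileNames=cms.untracked.vstring(
        "file:/lustre/ufhpc/store/data/Commissioning08/Cosmics/RECO/example.root"
    )
)

# Send the skimmed output to /dev/null so that only read I/O is measured
process.out = cms.OutputModule("PoolOutputModule",
    fileName=cms.untracked.string("/dev/null"),
    outputCommands=cms.untracked.vstring(
        "drop *",
        "keep *_cosmicMuons_*_*"   # placeholder: keep only cosmic-muon tracks
    )
)

process.end = cms.EndPath(process.out)
```

Such a configuration would be run with the standard cmsRun driver from a node that has the Lustre filesystem mounted.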