CMS analysis job and data transfer test results


Using Lustre Filesystem with dCache for CMS Analysis
Y. Wu*, B. Kim*, J. L. Rodriguez†, Y. Fu*, P. Avery*
*University of Florida, †Florida International University (FIU)

Introduction

Abstract
This poster presents our experiments with using the Lustre filesystem for CMS analysis with direct POSIX file access, while keeping dCache as the frontend for data distribution and management. We describe our implementation for integrating dCache with the Lustre filesystem and how users can access data without going through a dCache file-read protocol. Our initial CMS analysis job measurements show very good data access performance, and transfer performance is also very good when Lustre is used as a dCache pool.

Motivations
- Lustre is a scalable, secure, robust, highly available cluster filesystem that has been used successfully at many high performance computing facilities;
- dCache has been used successfully to manage large amounts of data, especially HEP/LHC data, on a global scale; however, data access in dCache is relatively slow;
- Combining the two systems lets us take advantage of both.

Implementations

Use Lustre directly as dCache pool
- Lustre directories are used directly as dCache pool storage;
- Soft links between the pnfsid and the logical filename are created separately, so jobs can read files by name with plain POSIX I/O (a sketch of this mapping is given below).

Use Lustre as tertiary storage
- Takes full advantage of the dCache HSM interface with a customized callout script (a sketch is given below);
- Files are first transferred to stageout pools and then migrated to Lustre automatically by dCache;
- Files can be staged in from the Lustre filesystem through stagein pools;
- The pnfsid is mapped to the regular filename during migration, enabling direct data access.

The PoolManager configuration and the pnfs directory tags are modified so that these files use the special Lustre pools. The diagram for using the Lustre filesystem as tertiary storage is shown below:

[Diagram: the dCache core (pnfs, pnfs manager, pool manager, and the gridftp/dcap doors) exports the namespace paths /pnfs/ihepa.ufl.edu/data/raid/… and /pnfs/ihepa.ufl.edu/data/lustre/; the dCache pools (raid, resilient, stageout, and stagein pools) reach the Lustre storage, mounted at /crn/scratch/phedex and holding /data/store/mc/…., through the HSM interface.]

CMS analysis job and data transfer test results

CMSSW job execution time
CMS user analysis jobs execute much more efficiently when they access data stored on the Lustre filesystem directly. The accompanying figure compares the CMSSW user analysis job execution time for three access methods: direct Lustre access, dcap access with an XFS-backed dCache pool, and dcap access with a Lustre-backed dCache pool (a configuration sketch illustrating these access methods is given below).

Transfer performance
Using Lustre directly as a dCache pool gives very good transfer performance. The performance with Lustre as tertiary storage is lower because of the two-step copy through the stagein pool.

Conclusions
- We described our implementation for using the Lustre filesystem for CMS analysis with direct POSIX file access while keeping dCache as the frontend for data distribution and management;
- Using the Lustre filesystem significantly improves data access performance compared with access through the dCache dcap protocol: over 60% improvement has been observed when jobs read data directly from Lustre. dcap access with a Lustre-backed dCache pool is also faster for smaller numbers of jobs, but slows down as the number of jobs increases;
- dCache with directly mounted Lustre pools shows very good transfer performance.
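
The poster does not reproduce the tooling behind the pnfsid-to-logical-filename soft links used when Lustre serves directly as a dCache pool. The following is a minimal sketch of how such a mapping could be built, assuming the pnfs namespace is mounted locally and supports the standard ".(id)(<name>)" dot command; the pool and link directory locations are hypothetical placeholders, not the actual Florida setup.

    #!/usr/bin/env python3
    # Sketch: link a logical (pnfs) filename to the pnfsid-named file kept in a
    # Lustre-backed dCache pool, so analysis jobs can open it with plain POSIX I/O.
    # Directory locations below are illustrative placeholders.
    import os
    import sys

    PNFS_ROOT = "/pnfs/ihepa.ufl.edu/data/lustre"   # namespace path (from the poster)
    POOL_DATA = "/lustre/dcache-pool/data"          # hypothetical pool data directory on Lustre
    LINK_ROOT = "/lustre/by-name"                   # hypothetical tree of logical-name soft links

    def pnfsid_of(logical_path):
        """Read the pnfsid of a file via the pnfs '.(id)(<name>)' dot command."""
        directory, name = os.path.split(logical_path)
        with open(os.path.join(directory, ".(id)(%s)" % name)) as f:
            return f.read().strip()

    def link_logical_name(logical_path):
        """Create LINK_ROOT/<relative logical path> -> POOL_DATA/<pnfsid>."""
        pnfsid = pnfsid_of(logical_path)
        rel = os.path.relpath(logical_path, PNFS_ROOT)
        link = os.path.join(LINK_ROOT, rel)
        os.makedirs(os.path.dirname(link), exist_ok=True)
        target = os.path.join(POOL_DATA, pnfsid)
        if not os.path.islink(link):
            os.symlink(target, link)
        return link

    if __name__ == "__main__":
        for path in sys.argv[1:]:
            print(link_logical_name(path))

Run against a pnfs path under /pnfs/ihepa.ufl.edu/data/lustre/, this would create the corresponding logical-name link pointing at the pnfsid-named pool file on Lustre.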
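
Similarly, the customized HSM callout script used in the "Lustre as tertiary storage" setup is not shown in the poster. Below is a minimal sketch of such a script, under the assumption that dCache invokes the configured command as "<script> put|get <pnfsid> <pool-file-path> [options]" and treats exit code 0 as success; the exact calling convention depends on the dCache version (consult its HSM interface documentation), and the Lustre directory here is a placeholder.

    #!/usr/bin/env python3
    # Sketch of an HSM callout script that moves files between dCache
    # stageout/stagein pools and a Lustre directory acting as tertiary storage.
    # Assumed invocation (to be adapted to the actual dCache HSM interface):
    #   hsm-lustre.py put <pnfsid> <pool-file-path> [options...]
    #   hsm-lustre.py get <pnfsid> <pool-file-path> [options...]
    # Exit code 0 signals success to dCache; anything else signals failure.
    import os
    import shutil
    import sys

    LUSTRE_STORE = "/lustre/hsm-store"   # hypothetical Lustre area used as the tape back-end

    def put(pnfsid, poolfile):
        """Migrate a file from a stageout pool into Lustre, keyed by its pnfsid."""
        dest = os.path.join(LUSTRE_STORE, pnfsid)
        tmp = dest + ".in-progress"
        shutil.copy(poolfile, tmp)       # copy first, then rename, so readers never see partial files
        os.rename(tmp, dest)
        return 0

    def get(pnfsid, poolfile):
        """Stage a file from Lustre back into a stagein pool."""
        src = os.path.join(LUSTRE_STORE, pnfsid)
        if not os.path.exists(src):
            return 1                     # report the failed stage-in to dCache
        shutil.copy(src, poolfile)
        return 0

    if __name__ == "__main__":
        if len(sys.argv) < 4 or sys.argv[1] not in ("put", "get"):
            sys.exit("usage: hsm-lustre.py put|get <pnfsid> <pool-file-path> [options]")
        handler = put if sys.argv[1] == "put" else get
        sys.exit(handler(sys.argv[2], sys.argv[3]))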
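
Finally, the three access methods compared in the CMSSW execution-time measurement can be illustrated with a short CMSSW job configuration. This is only a sketch: the door hostname and the file name are placeholders, not the actual Florida site values.

    import FWCore.ParameterSet.Config as cms

    process = cms.Process("LustreTest")
    process.maxEvents = cms.untracked.PSet(input = cms.untracked.int32(-1))

    # The same dataset can be opened three ways; uncomment one entry at a time.
    process.source = cms.Source("PoolSource",
        fileNames = cms.untracked.vstring(
            # 1) Direct POSIX access to the file on the Lustre mount:
            "file:/crn/scratch/phedex/data/store/mc/sample.root",
            # 2) dcap access through a door backed by an XFS-filesystem pool:
            # "dcap://dcache-door.example.edu:22125/pnfs/ihepa.ufl.edu/data/raid/store/mc/sample.root",
            # 3) dcap access through a door backed by a Lustre-filesystem pool:
            # "dcap://dcache-door.example.edu:22125/pnfs/ihepa.ufl.edu/data/lustre/store/mc/sample.root",
        )
    )

The first form is what gives the direct-Lustre timings reported above; the commented dcap URLs correspond to the two pool configurations used for comparison.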