OSG Public Storage and iRODS

Presentation transcript:

OSG Public Storage and iRODS

Problem Statement
Opportunistic storage provisioning and usage across all OSG sites face:
- Limited or no support for dynamic storage allocation and automatic management.
- VOs relying on opportunistic storage have difficulty locating appropriate storage, determining its availability, and monitoring its utilization.
- VOs need tools to manage metadata.
- Managing storage space currently requires the involvement of a Production Manager, site admins, and VO support personnel.

iRODS - integrated Rule-Oriented Data System
- Developed since 2006 with support from the National Science Foundation and the National Archives and Records Administration.
- Developed and supported by the Data Intensive Cyber Environments (DICE) group at the University of North Carolina at Chapel Hill and the University of California, San Diego.
- Can manage collections on the order of hundreds of millions of files totaling petabytes in size.
- Highly configurable and extensible for custom use cases through user-defined micro-services.

iRODS basic components
- User(s)
- iRODS data server(s)
- Rule Engine
- iCAT metadata catalog (RDBMS)
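To make the client / data server / iCAT roles concrete, here is a minimal sketch (not from the slides) that assumes the standard iRODS icommands (`ils`, `iquest`) are installed and an iRODS environment has already been initialized with `iinit`; the collection path is a hypothetical placeholder.

```python
import subprocess

def list_collection(collection):
    """Ask an iRODS data server for the contents of a collection (the server consults the iCAT)."""
    return subprocess.run(["ils", collection],
                          capture_output=True, text=True, check=True).stdout

def query_icat(general_query):
    """Run a general query directly against the iCAT catalog via iquest."""
    return subprocess.run(["iquest", general_query],
                          capture_output=True, text=True, check=True).stdout

if __name__ == "__main__":
    # Hypothetical OSG public-storage collection.
    print(list_collection("/osg/home/public"))
    # List data objects and sizes the iCAT knows about in that collection.
    print(query_icat("SELECT DATA_NAME, DATA_SIZE WHERE COLL_NAME = '/osg/home/public'"))
```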

Proposed Implementation (architecture diagram; numbered callouts not reproduced in the transcript)

iRODS node layout (diagram showing the flow of user authentication and access information; numbered callouts not reproduced in the transcript)

Runtime (Read) - Worker Node
- Case 1: File location known.
- Case 2: File location unknown - query iRODS for the storage resource holding the file.
- Case 3: No site copy found - get the file via iRODS.
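A sketch of this three-case read logic as it might run on a worker node, assuming the icommands (`iquest`, `iget`) are available there; the local resource name and paths are hypothetical, and the real implementation would be defined by the project.

```python
import os
import subprocess

LOCAL_RESOURCE = "osgSiteResc"   # hypothetical name of this site's iRODS storage resource

def locate_replica(logical_path):
    """Case 2: ask the iCAT which storage resource(s) hold a replica and where."""
    coll, name = os.path.dirname(logical_path), os.path.basename(logical_path)
    query = (f"SELECT DATA_RESC_NAME, DATA_PATH WHERE "
             f"COLL_NAME = '{coll}' AND DATA_NAME = '{name}'")
    out = subprocess.run(["iquest", "%s %s", query], capture_output=True, text=True)
    replicas = []
    for line in out.stdout.splitlines():
        parts = line.split()
        if len(parts) == 2:
            replicas.append((parts[0], parts[1]))   # (resource, physical path)
    return replicas

def read_input(logical_path, known_local_path, workdir):
    # Case 1: the file location on this site is already known to the job.
    if known_local_path and os.path.exists(known_local_path):
        return known_local_path
    # Case 2: location unknown -- query iRODS; use a site-local replica if one exists.
    for resource, physical_path in locate_replica(logical_path):
        if resource == LOCAL_RESOURCE and os.path.exists(physical_path):
            return physical_path
    # Case 3: no site copy found -- pull the file through iRODS.
    local_copy = os.path.join(workdir, os.path.basename(logical_path))
    subprocess.run(["iget", "-f", logical_path, local_copy], check=True)
    return local_copy
```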

Runtime (Write) - Worker Node
- Case 1: Metadata message sent to the iCAT.
- Case 2: Metadata and data object(s) sent to iRODS.
- All writes are monitored by the watchdog for resource space and quota limits.
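A sketch of the two write cases using the standard `imeta` and `iput` icommands; the logical paths and metadata attributes are hypothetical examples, not part of the project design.

```python
import subprocess

def register_metadata_only(logical_path, attributes):
    """Case 1: the data object is already registered in iRODS; only metadata goes to the iCAT."""
    for attr, value in attributes.items():
        subprocess.run(["imeta", "add", "-d", logical_path, attr, value], check=True)

def upload_with_metadata(local_file, logical_path, attributes):
    """Case 2: ship both the data object and its metadata through iRODS."""
    subprocess.run(["iput", "-f", local_file, logical_path], check=True)
    register_metadata_only(logical_path, attributes)

if __name__ == "__main__":
    # Hypothetical job output file and its provenance metadata.
    upload_with_metadata("result.root", "/osg/home/myvo/run42/result.root",
                         {"vo": "myvo", "run": "42"})
```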

Hardware Infrastructure
- A high-availability (HA) central node run by OSG Operations.
- This node should have a significant amount of iRODS disk cache (~tens of TB).
- At least 2 GB of memory for the PostgreSQL database backing the iCAT.

iRODS Server Node Software
- iCAT
- Custom rules/micro-services for:
  - Registration
  - Validation
  - Data transfer
  - Watchdog (a sketch of this piece follows)
- A custom driver to access the OSG storage resources.
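The watchdog itself is meant to be implemented as iRODS rules/micro-services; the following is only an external-script approximation of its logic, assuming `iquest` is available on the server node. The quota table, collection layout, and notification addresses are hypothetical.

```python
import smtplib
import subprocess
import time
from email.message import EmailMessage

QUOTA_BYTES = {"myvo": 10 * 1024**4}   # hypothetical 10 TB quota for one VO
CHECK_INTERVAL = 3600                  # seconds between checks

def vo_usage_bytes(vo_collection):
    """Sum the sizes of all data objects below a VO's collection, as recorded in the iCAT."""
    query = f"SELECT sum(DATA_SIZE) WHERE COLL_NAME like '{vo_collection}%'"
    out = subprocess.run(["iquest", "%s", query], capture_output=True, text=True)
    fields = out.stdout.split()
    return int(fields[0]) if fields and fields[0].isdigit() else 0

def notify(recipient, subject, body):
    """Send a success/failure notification (assumes a local SMTP relay)."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = "irods-watchdog@example.org", recipient, subject
    msg.set_content(body)
    with smtplib.SMTP("localhost") as smtp:
        smtp.send_message(msg)

def watchdog_loop():
    while True:
        for vo, quota in QUOTA_BYTES.items():
            used = vo_usage_bytes(f"/osg/home/{vo}")
            if used > quota:
                notify("osg-storage@example.org", f"{vo} over quota",
                       f"{vo} uses {used} of {quota} bytes")
        time.sleep(CHECK_INTERVAL)
```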

Worker Node Software
- The site admin must allocate a separate partition for all VOs that use public storage via iRODS.
- The iRODS client should be installed on the worker node via the pilot job or by installing the OSG WN stack.
- VO-wide write permission to files (see the sketch below).
- The WatchDog service should run on the (storage?) site.
- iRODS clients to become part of OSG Software.
- ** XSEDE worker nodes' support of iRODS.
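VO-wide write permission could be granted with the standard `ichmod` icommand; a minimal sketch with a hypothetical VO group and collection.

```python
import subprocess

def grant_vo_write(vo_group, collection):
    # Recursively grant write access on the VO's public-storage collection,
    # and turn on ACL inheritance so new objects pick up the same permission.
    subprocess.run(["ichmod", "-r", "write", vo_group, collection], check=True)
    subprocess.run(["ichmod", "-r", "inherit", collection], check=True)

grant_vo_write("myvo", "/osg/home/myvo")
```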

Performance Goals
- Run as a High Availability service; benchmark iRODS scalability under simultaneous access by multiple VOs.
- Federated iRODS installations.
- Benchmark the network requirements of the iRODS server for file uploads (a simple benchmarking sketch follows).
- Estimate requirements based on transfer patterns, and account and plan for peak needs.
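One way such upload benchmarks could be driven, assuming `iput` is available; file names and the target collection are hypothetical, and real tests would vary file sizes and client counts.

```python
import subprocess
import time
from concurrent.futures import ThreadPoolExecutor

def timed_upload(local_file, logical_path):
    """Upload one file and return how long it took."""
    start = time.time()
    subprocess.run(["iput", "-f", local_file, logical_path], check=True)
    return time.time() - start

def benchmark(files, collection, concurrency):
    """Upload files (from the current directory) with the given concurrency and report throughput."""
    start = time.time()
    with ThreadPoolExecutor(max_workers=concurrency) as pool:
        durations = list(pool.map(
            lambda name: timed_upload(name, f"{collection}/{name}"), files))
    wall = time.time() - start
    print(f"{len(files)} files, concurrency {concurrency}: "
          f"wall {wall:.1f}s, per-file mean {sum(durations)/len(durations):.1f}s")
```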

Known Limitations
- If the WatchDog malfunctions on quota limits, a file could be deleted immediately after a job finishes with a successful status.
- iRODS cannot detect the failure of an OSG site pool (disk), so the space used on that disk still counts against the resource allocation. We need to come up with a mechanism for purging unavailable files from the iCAT (one possible approach is sketched below).
- Additional space added to a resource can be discovered if it is properly updated in the BDII, but the quota must be changed by a Production Manager.
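One possible purge mechanism, sketched under the assumption that a server-side checksum (`ichksum`) fails when the physical replica is gone; the resource name is hypothetical, and whether to delete or merely unregister the catalog entry would be a project decision.

```python
import subprocess

def data_objects_on_resource(resource):
    """List logical paths of all data objects the iCAT believes live on a resource."""
    query = f"SELECT COLL_NAME, DATA_NAME WHERE DATA_RESC_NAME = '{resource}'"
    out = subprocess.run(["iquest", "%s/%s", query], capture_output=True, text=True)
    return [line.strip() for line in out.stdout.splitlines() if line.startswith("/")]

def purge_unavailable(resource, dry_run=True):
    """Probe each catalogued object; drop iCAT entries whose replica can no longer be read."""
    for path in data_objects_on_resource(resource):
        probe = subprocess.run(["ichksum", "-f", path], capture_output=True, text=True)
        if probe.returncode != 0:
            print(("would remove " if dry_run else "removing ") + path)
            if not dry_run:
                # irm -f removes the catalog entry; it may complain because the
                # physical file is already gone, so unregistering may be preferable.
                subprocess.run(["irm", "-f", path])
```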

Implementation Plan
- Acquire a dedicated node with enough memory and a reasonable amount of disk for testing.
- Install and configure iRODS, iCAT, etc.
- Develop the missing software modules.
- Perform various tests to understand iRODS performance and scalability using ITB resources.
- Identify a VO that could immediately benefit from access to managed public storage.
- Identify a couple of sites that are willing to participate.
- Install the iRODS client on VO/user machines and on the worker nodes of participating sites.
- Work with a Production Manager to test resource allocation.
- Work with a VO admin to test resource sub-allocation.
- Help the VO modify their workflow to incorporate the needed changes.
- Monitor and measure the results of workflow execution.
- Simulate resource failure and recovery.

Operational Plan
- Impact on the OSG sites should be minimal. OSG sites that offer access to public storage should provide a separate partition for all participating VOs.
- VOs should be able to install the iRODS client and access public storage via iRODS. The client should also be available on all XSEDE Service Providers.
- OSG Operations will carry the majority of the responsibility for running iRODS and related software; it should be run as a High Availability service.
- As a service provider, Operations has to cover software support services, including software installation and upgrades. Service upgrades should be announced and performed according to a TBD policy.

Project Plan and Phase 1
The project is divided into three incremental phases, with Phase 1 focusing on:
- Acquire a working iRODS OSG storage driver.
- Acquire/implement iRODS rules and micro-services that allow the following:
  - Set quotas per resource, VO, group, and user.
  - Prevent users from uploading files once a quota is reached.
  - Delete files in order to comply with changed quotas.
  - Send notifications about success/failure.
- Get access to a FermiCloud VM that satisfies the memory and storage requirements for a test installation.
- Install and configure iRODS.
- Register the Engage VO, and several of its users, with iRODS.
- Register a couple of sites (UCSD & Nebraska).
- Perform functionality tests (a quota-setting sketch follows this list):
  - A Production Manager can set/modify a quota.
  - A VO admin can set/modify a quota.
  - A user can upload/download files.
  - A user cannot write via iRODS once the quota is reached.
  - iRODS can delete files from storage if needed.
  - iRODS can send notifications.
- Verify the accuracy of resource allocation and of usage accounting and monitoring.
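A sketch of how the quota-related tests might be driven with the administrative icommands (`iadmin sgq` to set a group quota, `iadmin cu` to recompute usage, `iquota` to report); the group and resource names are hypothetical.

```python
import subprocess

def set_group_quota(group, resource, limit_bytes):
    """Production Manager action: cap how much a VO group may store on one resource
    (passing 'total' instead of a resource name would set a global quota)."""
    subprocess.run(["iadmin", "sgq", group, resource, str(limit_bytes)], check=True)

def refresh_and_show_usage():
    # Recompute quota usage in the iCAT, then print the current quota report.
    subprocess.run(["iadmin", "cu"], check=True)
    print(subprocess.run(["iquota"], capture_output=True, text=True, check=True).stdout)

if __name__ == "__main__":
    # Hypothetical: 5 TB for the Engage VO group on a UCSD resource.
    set_group_quota("engage", "ucsdResc", 5 * 1024**4)
    refresh_and_show_usage()
```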

Phase 2
- Implement a Condor transfer plugin for file transfer from a worker node, with help from the iRODS developers (a plugin sketch follows this list).
- Perform scalability tests:
  - Pre-stage data (~100 GB?) from a user laptop.
  - Upload data from the worker node to remote storage and update iRODS metadata.
- Test basic failure conditions and recovery.
- Test resource allocation and management with two VOs and two sites; modify rules if needed.
- Identify a VO and users that can benefit from access to public storage via iRODS.
- Negotiate with two OSG sites so that a separate partition is allocated and configured.
- Help the selected user adopt the new workflow.
- If the results are suitable, identify another VO.
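The slides do not specify the plugin design; the following sketch assumes the classic HTCondor file-transfer plugin interface (the executable advertises itself when invoked with -classad, and is otherwise called with a source URL and a destination path) and a hypothetical irods:// URL-to-logical-path mapping.

```python
#!/usr/bin/env python
"""Hypothetical irods_plugin: lets the condor_starter fetch irods:// URLs via iget."""
import subprocess
import sys

def print_classad():
    # Advertise the plugin to HTCondor (classic file-transfer plugin protocol).
    print('PluginVersion = "0.1"')
    print('PluginType = "FileTransfer"')
    print('SupportedMethods = "irods"')

def transfer(src_url, dest_path):
    # Hypothetical mapping: irods://host/zone/path -> logical path /zone/path.
    logical_path = "/" + src_url.split("://", 1)[1].split("/", 1)[1]
    return subprocess.call(["iget", "-f", logical_path, dest_path])

if __name__ == "__main__":
    if len(sys.argv) == 2 and sys.argv[1] == "-classad":
        print_classad()
        sys.exit(0)
    sys.exit(transfer(sys.argv[1], sys.argv[2]))
```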

Phase 3
- Work with the iRODS developer team to:
  - Implement/test/deploy automatic VO user registration with iRODS (a registration sketch follows this list).
  - Implement/test/deploy automatic resource discovery and verification.
  - Implement/test/deploy a WatchDog service.
- Negotiate with the OSG sites the allocation of X% of their storage for public storage via iRODS.
- Provide documentation for the Operations, Production, and VO teams.
- Provide reasonable packaging of the software.
- Work with OSG Software to add the iRODS client and related scripts to the OSG and WN clients.
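Automatic VO user registration might reduce, in its simplest form, to the administrative icommands shown below (`iadmin mkuser`, `iadmin atg` to add the user to the VO group, `iadmin aua` to attach a certificate DN); the user, group, and DN are hypothetical, and the actual mechanism would be worked out with the iRODS developers.

```python
import subprocess

def register_vo_user(irods_user, vo_group, user_dn=None):
    """Create an iRODS account for a VO member, add it to the VO's group,
    and optionally attach the member's grid certificate DN for GSI login."""
    subprocess.run(["iadmin", "mkuser", irods_user, "rodsuser"], check=True)
    subprocess.run(["iadmin", "atg", vo_group, irods_user], check=True)
    if user_dn:
        subprocess.run(["iadmin", "aua", irods_user, user_dn], check=True)

# Hypothetical member of the Engage VO, identified by a grid certificate DN.
register_vo_user("jdoe", "engage", "/DC=org/DC=doegrids/OU=People/CN=Jane Doe 12345")
```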