Large Output and Shared File Systems


Large Output and Shared File Systems
Thursday PM, Lecture 2
Lauren Michael, CHTC, UW-Madison

Per-job transfer limits

[diagram: submit server and execute servers connected by HTCondor file transfer; input files listed in the submit file should be <10 MB/file and ~1 GB total per job, while output files returned from the (exec dir)/ should be <1 GB per file and in total]
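
For reference, a minimal submit-file sketch that stays within these limits might look like the following (the filenames run_job.sh and small_input.dat are hypothetical placeholders, not from the slides):

# Sketch: a submit file that relies on HTCondor file transfer, within the per-job limits.
executable = run_job.sh
# keep each listed input under ~10 MB, and ~1 GB total
transfer_input_files = small_input.dat
should_transfer_files = YES
when_to_transfer_output = ON_EXIT
# new or changed files in the exec dir (under ~1 GB total) come back automatically
log = job.log
output = job.out
error = job.err
queue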

What's Different for Output?
- Output files are always unique (right?), so caching won't help.
- Files are not associated with your local username on the execute server.
- Security barriers exist outside of your local context.
- World-writability raises security issues (whereas world-readability is okay for input).

Output for HTC and OSG

file size    | method of delivery
words        | within executable or arguments?
tiny – 1 GB  | HTCondor file transfer (up to 1 GB total per job)
1 GB+        | shared file system (local execute servers)

Large input in HTC and OSG

file size                       | method of delivery
words                           | within executable or arguments?
tiny – 10 MB per file           | HTCondor file transfer (up to 1 GB total per job)
10 MB – 1 GB, shared            | download from web proxy (network-accessible server)
1 GB – 10 GB, unique or shared  | StashCache (regional replication)
10 GB – TBs, unique or shared   | shared file system (local copy, local execute servers)
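
For the web-proxy row, the job's wrapper script typically fetches the shared file over HTTP at run time; a minimal sketch, with a hypothetical URL and filename, is:

#!/bin/bash
# Sketch: fetch a shared input file over HTTP inside the job's wrapper script.
# The URL below is a hypothetical placeholder for a VO's web/proxy-served location.
set -e
wget -q http://proxy.example.org/shared/refdata.tar.gz
tar -xzf refdata.tar.gz
rm refdata.tar.gz
# ... run the computation against the extracted data here ...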

(Local) Shared Filesystems
- Data is stored on file servers, but network-mounted to local submit and execute servers.
- Uses local user accounts for file permissions: jobs run as YOU!
- Readable (input) and writable (output, most of the time).
- MOST perform better with fewer large files (versus the many small files typical of HTC).

Shared FS Technologies
- Via network mount:
  - NFS
  - AFS
  - Lustre
  - Gluster (may use an NFS mount)
  - Isilon (may use an NFS mount)
- Distributed file systems (data spread across many execute servers):
  - HDFS (Hadoop)
  - Ceph

Shared FS Configurations
- Submit directories WITHIN the shared filesystem
  - most campus clusters
  - limits HTC capabilities!!
- Shared filesystem separate from local submission directories
  - supplements local HTC systems
  - treated more as a repository for VERY large data (>GBs)
- Read-only (input-only) shared filesystem
  - treated as a repository for VERY large input, only

Submit dir within shared FS

[diagram: submit server and execute servers all mount the shared FS; the submit directory lives on the shared FS and contains file.sub, the input and software, and the log, error, and output files]

Submit dir within shared FS

[same diagram, with the relevant submit-file settings shown:]
# file.sub:
should_transfer_files = NO
transfer_input_files =
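
A fuller sketch of such a file.sub might look like this (the executable name and the log/output/error filenames are illustrative assumptions):

# file.sub (sketch): the submit directory lives on the shared filesystem,
# so file transfer is turned off and the job reads and writes shared paths directly.
executable = run_job.sh
should_transfer_files = NO
# no transfer_input_files needed: input, software, log, error, and output
# all live in the shared submit directory
log = job.log
output = job.out
error = job.err
queue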

Separate shared FS

[diagram: the usual HTCondor setup (submit server with submit file, executable, and a dir/ of input and output; execute servers with an (exec dir)/ holding the executable, input, and output), plus a separate shared FS mounted alongside]

Separate shared FS - Input
1. Place the compressed input file into the shared FS (/path/to/lgfile).

Separate shared FS - Input
2. The executable copies the file into the exec dir and decompresses it.

Separate shared FS - Input
3. The executable removes the copied file from the exec dir after use.
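
Put together, the input-side steps above might look like the following wrapper-script sketch (the tarball name, the /path/to/ location, and the compression format are assumptions, not from the slides):

#!/bin/bash
# Sketch of the in-job input steps; assumes the large input was staged by hand
# (step 1) as a gzipped tarball at /path/to/lgfile.tar.gz.
set -e
# 2. copy the file from the shared FS into the exec dir and decompress it
cp /path/to/lgfile.tar.gz .
tar -xzf lgfile.tar.gz
rm lgfile.tar.gz

# ... run the computation against the decompressed input here ...

# 3. remove the decompressed input before the job ends, so it is not
#    picked up by HTCondor's output transfer
rm -rf lgfile/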

Separate shared FS - Output
1. The executable creates and compresses the output file in the exec dir.

Separate shared FS - Output
2. The executable copies the file to the shared FS (/path/to/lgfile).

Separate shared FS - Output
3. The executable removes the file from the exec dir.
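
The output-side steps above might look like this wrapper-script sketch (again, the results directory, filenames, and /path/to/ destination are assumptions):

#!/bin/bash
# Sketch of the in-job output steps for a separate shared FS.
set -e
# ... the computation writes its large results into ./results/ ...

# 1. create and compress the large output file in the exec dir
tar -czf lgfile.tar.gz results/
# 2. copy it out to the separate shared FS
cp lgfile.tar.gz /path/to/lgfile.tar.gz
# 3. remove it from the exec dir so HTCondor does not transfer it back
rm lgfile.tar.gz
rm -rf results/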

At UW-Madison (Ex. 4.1-4.2)

[diagram: at CHTC, the separate shared FS is the Gluster mount; the large file lives at /mnt/gluster/user/lgfile and appears in the job's (exec dir)/ as lgfile]
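
For the UW-Madison exercises, the same pattern simply uses the Gluster mount path shown above; a sketch (your actual username and filenames will differ):

#!/bin/bash
# Sketch for the CHTC Gluster mount used in Exercises 4.1-4.2.
# Replace "user" with your username; "lgfile" is the placeholder name above.
set -e
# large input: copy it in from Gluster, use it, then clean up
cp /mnt/gluster/user/lgfile .
# ... use lgfile here ...
rm lgfile
# large output: copy results back out to Gluster the same way, e.g.
# cp large_output.tar.gz /mnt/gluster/user/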

Shared FS Configurations
- Submit directories WITHIN the shared filesystem
  - most campus clusters
  - limits HTC capabilities!!
- Shared filesystem separate from local submission directories
  - supplements local HTC systems
  - treated more as a repository for VERY large data (>GBs)
- Read-only (input-only) shared filesystem
  - treated as a repository for VERY large input, only

Large input in HTC and OSG

file size                       | method of delivery
words                           | within executable or arguments?
tiny – 10 MB per file           | HTCondor file transfer (up to 1 GB total per job)
10 MB – 1 GB, shared            | download from web proxy (network-accessible server)
1 GB – 10 GB, unique or shared  | StashCache (regional replication)
10 GB – TBs, unique or shared   | shared file system (local copy, local execute servers)

Output for HTC and OSG

file size    | method of delivery
words        | within executable or arguments?
tiny – 1 GB  | HTCondor file transfer (up to 1 GB total per job)
1 GB+        | shared file system (local execute servers)

Review

- HTCondor file transfer
  Input or Output? Both
  File size limits: 10 MB/file (in), 1 GB/file (out); 1 GB total (either)
  Placing files: via HTCondor submit node
  In-job file movement: via HTCondor submit file
  Accessibility: anywhere HTCondor jobs can run

- Web proxy
  Input or Output? Shared input only
  File size limits: 1 GB/file
  Placing files: specific to VO
  In-job file movement: HTTP download
  Accessibility: anywhere, by anyone

- StashCache
  Input or Output? Shared and unique input
  File size limits: 10 GB/file (will increase!)
  Placing files: via OSG Connect submit server
  In-job file movement: via stashcp command (and module)
  Accessibility: OSG-wide (90% of sites), by anyone

- Shared filesystem
  Input or Output? Input, likely output
  File size limits: TBs (may vary)
  Placing files: via mount location (may vary)
  In-job file movement: use directly, or copy into/out of execute dir
  Accessibility: local cluster, only by YOU (usually)
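
As a rough illustration of the StashCache row, an OSG job might fetch a shared file like this (the module name and path are assumptions based on the table above; consult the OSG Connect documentation for exact usage):

# Sketch: fetching a shared input via StashCache from inside a job
module load stashcache
stashcp /user/username/public/lgfile ./lgfile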

Exercises
4.1 Shared Filesystem for Large Input
4.2 Shared Filesystem for Large Output

Questions?
Feel free to contact me: lmichael@wisc.edu
Next: Exercises 4.1-4.2
Later: Wrap-up