Published by John Boone. Modified over 8 years ago.
1
Large Input in DHTC
Thursday PM, Lecture 1
Lauren Michael, CHTC, UW-Madison
2
OSG User School 2016
Hardware transfer limits
[Diagram: HTCondor moves files between the submit server (submit file, executable, dir/ with input and output) and the exec servers ((exec dir)/ with executable, input and output). Limits shown: <10MB/file, 1GB total; <1GB/file and total]
3
Reducing data needs
An HTC best practice!
- split large input for better throughput and less per-job data
- eliminate unnecessary data
- compress and combine files
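The splitting, compressing and combining steps above can be sketched in shell. This is a minimal illustration that assumes a line-oriented input file; the function names (split_input, compress_chunks, combine_files), the chunk prefix and the chunk size are hypothetical and not part of the lecture.

```shell
#!/bin/bash
# Sketch only: helper names and file names are illustrative assumptions.

# split_input FILE LINES_PER_CHUNK PREFIX
# Writes PREFIXaa, PREFIXab, ... each holding at most LINES_PER_CHUNK lines.
split_input() {
    split -l "$2" "$1" "$3"
}

# compress_chunks PREFIX
# Gzips every two-letter-suffix chunk made by split_input (PREFIXaa -> PREFIXaa.gz).
compress_chunks() {
    local chunk
    for chunk in "$1"??; do
        if [ -f "$chunk" ]; then
            gzip -f "$chunk"
        fi
    done
}

# combine_files ARCHIVE FILE...
# Bundles several files into one compressed tarball.
combine_files() {
    local archive="$1"
    shift
    tar -czf "$archive" "$@"
}
```

For example, `split_input big.txt 1000 chunk_` followed by `compress_chunks chunk_` would leave one small compressed input per job, and `combine_files bundle.tar.gz a.dat b.dat` would pack many small files into a single transfer.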
4
Large input in HTC and OSG
file size | method of delivery
words | within executable or arguments?
tiny – 10MB per file | HTCondor file transfer (up to 1GB total per job)
10MB – 1GB, shared | download from web proxy (network-accessible server)
1GB – 10GB, unique or shared | StashCache (regional replication)
10GB – TBs | shared file system (local copy, local execute servers)
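The size-based choice in the table can be expressed as a small shell function. This is only a sketch: choose_delivery is a hypothetical helper, sizes are whole megabytes with 1GB taken as 1024MB, the exact boundary handling is an assumption, and sending unique files between 10MB and 1GB to StashCache is an assumption based on the "unique or shared" wording rather than something the table states.

```shell
#!/bin/bash
# choose_delivery SIZE_MB SHARED
# SIZE_MB: per-file size in whole megabytes; SHARED: "yes" or "no".
# Prints the delivery method suggested by the size table (sketch only).
choose_delivery() {
    local size_mb="$1"
    local shared="${2:-no}"
    if [ "$size_mb" -lt 10 ]; then
        echo "HTCondor file transfer"
    elif [ "$size_mb" -lt 1024 ] && [ "$shared" = "yes" ]; then
        echo "web proxy"
    elif [ "$size_mb" -le 10240 ]; then
        # Assumption: unique files over 10MB also go to StashCache.
        echo "StashCache"
    else
        echo "shared file system"
    fi
}
```

Calling `choose_delivery 500 yes` prints `web proxy`, while `choose_delivery 5000 yes` prints `StashCache`.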
10
Using a Web Proxy
- Place the file onto a proxy-configured web server
- Have HTCondor download via HTTP address
[Diagram labels: submit server, proxy web server, proxy web cache, exec servers, HTCondor, file]
11
Downloading Proxy Files
HTCondor submit file (recommended):
    transfer_input_files = http://host.univ.edu/path/to/shared.tar.gz
Anywhere (in-executable, or test download):
    wget http://host.univ.edu/path/to/shared.tar.gz
In-executable: make sure to delete the downloaded file after un-tar or at the end of the job! Otherwise HTCondor thinks it is 'new' and treats it as job output.
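For the in-executable route, the download, un-tar and delete steps might look like the sketch below. fetch_and_unpack and the DOWNLOAD_CMD override are hypothetical names introduced here for illustration; the default downloader is wget, as on the slide.

```shell
#!/bin/bash
# fetch_and_unpack URL
# Downloads a tarball into the current directory, unpacks it, then removes
# the tarball so that HTCondor does not see it as a new output file.
# DOWNLOAD_CMD is an assumed override hook; it defaults to "wget -q".
fetch_and_unpack() {
    local url="$1"
    local tarball
    tarball="$(basename "$url")"
    ${DOWNLOAD_CMD:-wget -q} "$url" || return 1
    tar -xzf "$tarball" || return 1
    rm -f "$tarball"
}

# Usage inside a job executable (URL taken from the slide):
#   fetch_and_unpack http://host.univ.edu/path/to/shared.tar.gz
```

Any unpacked files that are not wanted as output should likewise be removed before the job exits.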
12
Web Proxy Considerations
- Managed per-VO
- Memory limited, max file size: 1 GB
- Local caching at OSG sites
  - good for shared input files only
  - perfect for software and common input
  - need to rename changed files!
- Files are downloadable by ANYONE who has the specific HTTP address
- Will work on 100% of OSG sites, though not all sites will have a local cache
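Because cached copies are looked up by address, one way to follow the "rename changed files" advice is to embed a checksum of the contents in the file name. The sketch below is an assumption about how that could be done; versioned_name is a hypothetical helper and the naming scheme is not prescribed by the lecture.

```shell
#!/bin/bash
# versioned_name FILE
# Prints a name for FILE that is prefixed with a CRC of its contents,
# so a changed file always gets a new name (and a new HTTP address).
versioned_name() {
    local sum
    sum=$(cksum < "$1" | cut -d ' ' -f 1)
    echo "${sum}_$(basename "$1")"
}

# Usage:
#   cp shared.tar.gz "/squid/user/$(versioned_name shared.tar.gz)"
```

The job's download address then has to use the same generated name.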
13
At UW-Madison (Ex. 3.1)
- place files in /squid/user on a local submit server
- address: http://proxy.chtc.wisc.edu/SQUID/user/shared.tar.gz
[Diagram labels: local server learn.chtc.wisc.edu with /squid/user, proxy web server, proxy web cache, any HTC submit server, exec servers, HTCondor, file]
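The mapping from a staged file to its download address can be sketched as follows. stage_to_squid and the SQUID_ROOT variable are hypothetical; the /squid directory and the proxy.chtc.wisc.edu address pattern are taken from the slide, and the per-user directory is assumed to exist already.

```shell
#!/bin/bash
# stage_to_squid FILE USER
# Copies FILE into the user's squid directory and prints the matching
# HTTP address. SQUID_ROOT is an assumed override; it defaults to /squid.
stage_to_squid() {
    local file="$1"
    local user="$2"
    local root="${SQUID_ROOT:-/squid}"
    cp "$file" "$root/$user/" || return 1
    echo "http://proxy.chtc.wisc.edu/SQUID/$user/$(basename "$file")"
}
```

The printed address is what would go into `transfer_input_files` in the submit file.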
15
Using StashCache for Input
- regionally-cached repository managed by OSG Connect
16
Placing Files in StashCache
- place files in /home/username/public on login.osgconnect.net
[Diagram labels: local server login.osgconnect.net with /home/username/public, "Stash" origin, regional cache, any OSG submit server, exec servers, file]
17
Placing Files in StashCache
- regional cache updates from origin every hour
18
Obtaining Files in StashCache
- Use HTCondor transfer for other files
19
Obtaining Files in StashCache
- Download using the stashcp command (available as an OASIS software module)
20
In the Submit File
Require StashCache sites in the submit file:
    +WantsStashCache
Require sites with OASIS modules (for stashcp):
    Requirements = (HAS_MODULES =?= true)
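A submit file carrying both settings could be generated from the shell as in the sketch below. write_submit_file is a hypothetical helper, and the `= true` value on `+WantsStashCache` is an assumption, since the slide shows only the attribute name; check current OSG documentation before relying on it.

```shell
#!/bin/bash
# write_submit_file SUBMIT_FILE EXECUTABLE
# Writes a minimal HTCondor submit file that asks for StashCache-capable
# sites with OASIS modules. Sketch only; attribute values are assumptions.
write_submit_file() {
    cat > "$1" <<EOF
# Hypothetical submit file for a job that fetches input with stashcp
executable = $2
+WantsStashCache = true
Requirements = (HAS_MODULES =?= true)
queue
EOF
}

# Usage:
#   write_submit_file job.sub run_job.sh
```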
21
In the Job Executable
    #!/bin/bash
    # setup: load the OASIS module system, then the stashcp module
    . /cvmfs/oasis.opensciencegrid.org/osg/modules/lmod/current/init/bash
    module load stashcp
    # copy the input from StashCache into the job's working directory
    stashcp /user/username/public/file.tar.gz ./
    # END
22
StashCache Considerations
- Available at ~90% of OSG sites
- Regional caches on very fast networks
- Max file size: 10 GB
- shared OR unique data
- Caches are updated ~hourly
  - rename files to update them (safest)
- Currently in transition to a new method, but stashcp will stay around!
23
[Figure: StashCache Speed]
25
Other Options?
- Some distributed projects with LARGE, shared datasets may have project-specific repositories that exist only on certain sites (e.g. CMS, ATLAS, LIGO?, FIFE?, others?)
  - Jobs will require specific sites with local copies and use project-specific access methods
- OASIS?
  - Best for lots of small files per job (e.g. software)
  - StashCache and proxies are better for fewer, larger files per job
26
Cleaning Up Old Data
- For StashCache AND web proxies: make sure to delete data from the origin when you no longer need it!
- StashCache and VO-managed web proxy servers do NOT have unlimited space!
- Some may regularly clean old data for you. Check with local support.
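Finding candidates for cleanup can be sketched with find. list_stale is a hypothetical helper and the age threshold is an arbitrary assumption; it only lists files, leaving the decision to delete to the user.

```shell
#!/bin/bash
# list_stale DIR DAYS
# Lists regular files under DIR last modified more than DAYS days ago.
list_stale() {
    find "$1" -type f -mtime +"$2"
}

# Usage on an origin directory (path from the slides):
#   list_stale /home/username/public 30
```

After reviewing the list, the files that are no longer needed can be removed with rm.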
27
Other Considerations
- Only use these options if you MUST!
- Each comes with limitations on site accessibility and/or job performance, and extra data management concerns
file size | method of delivery
words | within executable or arguments?
tiny – 10MB per file | HTCondor file transfer (up to 1GB total per job)
10MB – 1GB, shared | download from web proxy (network-accessible server)
1GB – 10GB, unique or shared | StashCache (regional replication)
10GB – TBs | shared file system (local copy, local execute servers)
28
Exercises
- 3.1 Using a web proxy for shared input: place the BLAST database on the web proxy
- 3.2 StashCache for shared input: place the BLAST database in StashCache
- 3.3 StashCache for unique input: convert movie files
29
Questions?
Feel free to contact me: lmichael@wisc.edu
Next: Exercises 3.1-3.3
Later: Large output and shared filesystems