SixDesk Scripts – an Update

Slides:



Advertisements
Similar presentations
1 CRAB Tutorial 19/02/2009 CERN F.Fanzago CRAB tutorial 19/02/2009 Marco Calloni CERN – Milano Bicocca Federica Fanzago INFN Padova.
Advertisements

NovaBACKUP 10 xSP Technical Training By: Nathan Fouarge
9 Chapter Nine Compiled Web Server Programs. 9 Chapter Objectives Learn about Common Gateway Interface (CGI) Create CGI programs that generate dynamic.
Wenjing Wu Andrej Filipčič David Cameron Eric Lancon Claire Adam Bourdarios & others.
Portal User Group Meeting June 13, Agenda I. Welcome II. Updates on the following: –Migration Status –New Templates –DB Breakup –Keywords –Streaming.
SE: CHAPTER 7 Writing The Program
Computer Science and Engineering The Ohio State University  Widely used, especially in the opensource community, to track all changes to a project and.
The huge amount of resources available in the Grids, and the necessity to have the most up-to-date experimental software deployed in all the sites within.
Stuart Wakefield Imperial College London Evolution of BOSS, a tool for job submission and tracking W. Bacchi, G. Codispoti, C. Grandi, INFN Bologna D.
Continuous Integration and Code Review: how IT can help Alex Lossent – IT/PES – Version Control Systems 29-Sep st Forum1.
2-Dec Offline Report Matthias Schröder Topics: Scientific Linux Fatmen Monte Carlo Production.
2012 Objectives for CernVM. PH/SFT Technical Group Meeting CernVM/Subprojects The R&D phase of the project has finished and we continue to work as part.
Pavel Nevski DDM Workshop BNL, September 27, 2006 JOB DEFINITION as a part of Production.
STAR Scheduling status Gabriele Carcassi 9 September 2002.
THE EYESWEB PLATFORM - GDE The EyesWeb XMI multimodal platform GDE 5 March 2015.
Alien and GSI Marian Ivanov. Outlook GSI experience Alien experience Proposals for further improvement.
Log Shipping, Mirroring, Replication and Clustering Which should I use? That depends on a few questions we must ask the user. We will go over these questions.
Wouter Verkerke, NIKHEF 1 Using ‘stoomboot’ for NIKHEF-ATLAS batch computing What is ‘stoomboot’ – Hardware –16 machines, each 2x quad-core Pentium = 128.
AFS Home Directory Migration Details Andy Romero Core Computing Division.
1 April 14, Starting New Open Source Software Projects William Cohen NCSU CSC 591W April 14, 2008.
L1Calo Databases ● Overview ● Trigger Configuration DB ● L1Calo OKS Database ● L1Calo COOL Database ● ACE Murrough Landon 16 June 2008.
AFS Phase Out experience R. De Maria. AFS and LSF Phase Out (1) I have learned about relevant changes in IT services, only while iterating with IT experts.
Advanced Higher Computing Science
Start-SPPowerShell – Introduction to PowerShell for SharePoint Admins and Developers Paul BAker.
How to Contribute to System Testing and Extract Results
Jonathan Walpole Computer Science Portland State University
Processes and threads.
CSE 103 Day 20 Jo is out today; I’m Carl
How to link a test to a launcher (in this case a shell launcher)
Archiving and Document Transfer Utilities
Recent Improvements in Collsoft
AFS/LSF Phase Out personal experience
LOCO Extract – Transform - Load
Overview of Needs for SixTrack on-line Aperture Check
IW2D migration to HTCondor
Operating Systems (CS 340 D)
Overview – SOE PatchTT December 2013.
MCTS Guide to Microsoft Windows 7
Update on SixDesk Scripts
Chapter 2: System Structures
USAJOBS – Application Manager
Savannah to Jira Migration
A Short Course on Geant4 Simulation Toolkit How to learn more?
LCGAA nightlies infrastructure
Basic Work-Flow with SQL Server Standard
Software Testing With Testopia
Operating Systems (CS 340 D)
Lecture 5: GPU Compute Architecture
What is Bash Shell Scripting?
SharePoint Saturday Omaha April 2016
X in [Integration, Delivery, Deployment]
SharePoint Administrative Communications Planning: Dynamic User Notifications for Upgrades, Migrations, Testing, … Presented by Robert Freeman (
湖南大学-信息科学与工程学院-计算机与科学系
Chapter 9: Virtual-Memory Management
Learning to Program in Python
Lecture 5: GPU Compute Architecture for the last time
Chapter 5: CPU Scheduling
Turbo-Charged Transaction Logs
Exploring the Power of EPDM Tasks - Working with and Developing Tasks in EPDM By: Marc Young XLM Solutions
Loops CIS 40 – Introduction to Programming in Python
Process Description and Control
Git CS Fall 2018.
Agile testing for web API with Postman
A Short Course on Geant4 Simulation Toolkit How to learn more?
A Short Course on Geant4 Simulation Toolkit How to learn more?
Project Iterations.
PyWBEM Python WBEM Client: Overview #2
MapReduce: Simplified Data Processing on Large Clusters
Chapter 13: I/O Systems “The two main jobs of a computer are I/O and [CPU] processing. In many cases, the main job is I/O, and the [CPU] processing is.
Presentation transcript:

SixDesk Scripts – an Update A. Mereghetti Special thanks to: R. De Maria, M. Giovannozzi, I. Zacharov for stimulating discussions and guidance; E. McIntosh and F. Schmidt, for having given us the present system! P.D. Hermes and F. Van der Veken, for continuous feedback on new scripts and help with coding; 12 Apr 2017 A.Mereghetti

Outline Rationale Behind Changes Present Workflow and Updates (HTCondor) A Demo Outlook 12 Apr 2017 A.Mereghetti

Outline Rationale Behind Changes Present Workflow and Updates (HTCondor) A Demo Outlook 12 Apr 2017 A.Mereghetti

Rationale – Foreseen Changes AFS phasing out (mv home folders by 2017… - J. Iven, 2nd ABP CWG meeting, 2016-09-29): Potential migration to EOS, but with critical points: Limit to 1M files (total per user); System very slow when dealing with many small files  way better behavior with few big files; SQL db (eg 6DB) basically un-usable (TBC); Problems of synchronization between redundancy images of files (TBC); Personally not fully convinced by proposed solution  further iteration with IT; HTCondor as replacement of LSF, and as interface to BOINC; New SixTrack features, requiring new input parameters (i.e. new blocks in fort.3 treated by SixDesk): FMA, wires for bb compensation, eLenses, .zip functionality, collimation, … New variables to be added to scans – eg: x-ing angle, octupole / chroma (now outside SixDesk), momentum, …; 12 Apr 2017 A.Mereghetti

Rationale – Foreseen Changes (II) Trash bin mechanism, in case user buffer dir is full… Other additions: Scan on a squared domain in tune space, not only on a line; Extend use of SixDesk to FCC - e.g. integer part of tunes to be correctly represented; 12 Apr 2017 A.Mereghetti

Rationale – Odd Behaviors AFS work.boinc volume (SPOOLDIRs) becoming un-responsive: BOINC server: dispatching WUs to volunteers and validation/assimilation of results gets BLOCKED! User side: submission of WUs to BOINC (run_six) and retrieval of results (run_results) slowed down a lot; Countermeasures: Auto-restart of server where work.boinc volume is located; Reduce load by scripts; 12 Apr 2017 A.Mereghetti

Rationale – Odd Behaviors (II) Random problems in AFS / lxplus, mainly with run_six: Segmentation faults: The script stops, without much info useful for debugging; Painful to restart submission; Suspect: failures taking place at calling awk / gawk / sed; Countermeasures: Use bash built-in functionalities as much as possible, skipping external components (awk / gawk / sed); Avoid tmp files; Re-factorize code, optimizing operations as much as possible (zip/gzip, setacl, …) Generation of empty strings of study names: Dissemination of files in track/ folder; Entire dirs of MadX seeds being removed!  incomplete studies; Generalized multiple trial algorithm; 12 Apr 2017 A.Mereghetti

Rationale – Odd Behaviors (III) Random problems in AFS / lxplus, mainly with run_six: Sed failures: “read error on stdin: Bad address” Direct consequence: generation of empty strings; Countermeasures: Generalized multiple trial algorithm; DB corruption: run_incomplete_* becomes un-usable; Counter measures: Fixing DB (taskids, incomplete_/completed_cases, my*, lsfjobs/boincjobs files) from existing info (thanks to PDHermes!); Other failures: bsub, kinit, cp-ing to BOINC buffer… 12 Apr 2017 A.Mereghetti

Rationale – Miscellanea Separate tools from workspace: whenever an update is available, a user does not need to go through all their workspaces; Central management of scripts: Scripts are stored in a shared area on AFS: /afs/cern.ch/project/sixtrack/SixDesk_utilities It is easier to make sure that all users use the latest version! Some general clean-up and down-sizing of code, for better and faster maintenance (hopefully!); 12 Apr 2017 A.Mereghetti

Outline Rationale Behind Changes Present Workflow and Updates (HTCondor) A Demo Outlook 12 Apr 2017 A.Mereghetti

Present Workflow step script Set up of study set_env Input generation run_mad6t Submit to BOINC run_six Retrieve results run_results Re-submit to LSF run_incomplete_cases_lsf run_status General workflow foresees: First job submission to BOINC: Very large CPU power, though ETA is floating (days/weeks/months); Completion of most of the jobs; Second job submission to LSF: Finite CPU power, and ETA more under control; Completion of missing jobs; Submission to BOINC is essential to keep load on LSF at a reasonable level! 12 Apr 2017 A.Mereghetti

BOINC Is a Vital Resource BOINC server status (1st March 2017) Auto-restart of work.boinc volume (thanks to IT): seems not to be due to large traffic of WUs from/to volunteers… SixTrack app on BOINC (1st March 2017) 12 Apr 2017 A.Mereghetti

BOINC Is a Vital Resource (II) (Cumulative) submissions (1st March 2017) ~5.8M WUs being processed in 43 d (looming IPAC/LARP…) …more to come!!! Approaching limit of BOINC server in receiving new work from users! SixTrack app on BOINC (1st March 2017) 12 Apr 2017 A.Mereghetti

New Scripts New scripts are available in shared space of AFS: /afs/cern.ch/project/sixtrack/SixDesk_utilities dev/ : most recent stable updates; pro/ : production version (previous version with most recent updates); old/ : previous pro/ version; Development being tracked in Git: https://github.com/amereghe/SixDesk/tree/isolateScripts New scripts: Presently, only management of simulations  for analysis of results: 6DB…; contained in utilities/, in sub-folders divided according to implementation language; ported to bash  calls to external scripting languages kept (sed / awk / …); Code refactoring + use of getops: Reduce redundant lines / functionalities to essential minimum; Possible to collect all concerned lines of code in very few files while triggering specific actions; Each script must be called with terminal-line arguments; Mini help / how-to-use if no argument is given; Not all functionalities have been fully ported to new scripts (e.g. run_results)  old ones are available in new path! templates/ folder 12 Apr 2017 A.Mereghetti

Overview of Recent Improvements Mainly in run_six.sh, but not only: Some scripts have been (almost) fully re-written: set_env, run_mad6t, run_six; All the others: collected in one place only; In general: trying to reduce load on OS/FS, minimizing operations to bare essential ones: gzip/gunzip, cp, use of tmp files; awk/sed calls replaced by built-in bash string manipulation; points in scan computed only before the big loop in run_six.sh, …; Random errors: Glitches in OS: introduced a generalised multiple trial algorithm, used in strategic positions; Warning email to admin anytime algorithm is used, error email also sent to user when algorithm fails – mail contains also dump of envs; Fatality: -R option, to resume submission from where it was interrupted; Treating any tune / amplitude / angle values – limitation on format of integer / fractional parts no longer there; Tune scan not only on line but also on squared domain  SixTrack! Trash bin / megazip mechanisms – retrieval of trashed results still manual… Checks of buffer dirs of old studies; Reworked sixdeskmess syntax (thanks to PDHermes): Reduction of number of code lines; Removed auto-logging  anyway, it does not capture STDERR! Porting to HTCondor; 12 Apr 2017 A.Mereghetti

New Workflow DEMO! step Old script New script Set up of study set_env set_env.sh Input generation run_mad6t mad6t.sh Submit to BOINC run_six run_six.sh Retrieve results run_results run_six.sh (in progress) Re-submit to LSF run_incomplete_cases_lsf run_status Recent additions: Control.sh: teeing of STDOUT / STDERR to a log file; Suitable for acrontab jobs; List of studies / workspaces can be provided on terminal line or via txt file  manage entire workspaces! backup.sh: Back up (archiving) complete studies (no longer single points), no matter the platform; To EOS/CASTOR/local path; DEMO! 12 Apr 2017 A.Mereghetti

Outline Rationale Behind Changes Present Workflow and Updates (HTCondor) A Demo Outlook 12 Apr 2017 A.Mereghetti

Few Notes Aim: to show new scripts and their basic use  more infos available in the twiki and in the help of each scripts; Additionally, give an insight to use of HTCondor: All you have to do is to set: export platform=HTCONDOR; All the rest is for your education; Useful commands in HTCondor: condor_q, condor_rm; This morning, AFS issues when using HTCondor! IT partner on vacation for a week, more news asap! Latest news on batch system (this afternoon!): 18th Apr 2017: start of grace period (1 month), i.e. 50% HTCondor, 50% LSF; From mid May: 90% HTCondor, 10% LSF, for ~6 months; Afterwards, only HTCondor; Demo still makes use of my private version of scripts – recent bug fixes, will commit asap! 12 Apr 2017 A.Mereghetti

Key Commands Set path var (bash): export TOOLSdev=/afs/cern.ch/project/sixtrack/SixDesk_utilities/dev Set study: ${TOOLSdev}/utilities/bash/set_env.sh -s Submit MADX jobs: ${TOOLSdev}/utilities/bash/mad6t.sh -s First Submission (to BOINC): ${TOOLSdev}/utilities/bash/run_six.sh -a -p BOINC Retrieve results from BOINC: ${TOOLSdev}/utilities/bash/run_results Submit missing jobs: ${TOOLSdev}/utilities/bash/run_six.sh -i -s -p HTCONDOR Update DBs: ${TOOLSdev}/utilities/bash/run_status 12 Apr 2017 A.Mereghetti

Key Commands (II) Resume submission after killing: ${TOOLSdev}/utilities/bash/run_six.sh -a -R <last_WU_name> Selective submission: ${TOOLSdev}/utilities/bash/run_six.sh -s -S Example of acrontab job for retrieving results: 0 0,6,12,18 * * * lxplus.cern.ch /afs/cern.ch/user/a/amereghe/private/repos/SixDesk/utilities/bash/control.sh -R -f ~/studies.lis >> /dev/null 2>&1 Backing up to EOS (no need to set up env vars!): ${TOOLSdev}/utilities/bash/backUp.sh -d <study_name> Restore back up from EOS (no need to set up env vars!): ${TOOLSdev}/utilities/bash/backUp.sh -d <study_name> -R 12 Apr 2017 A.Mereghetti

Outline Rationale Behind Changes Present Workflow and Updates (HTCondor) A Demo Outlook 12 Apr 2017 A.Mereghetti

Outlook – High Priority Update SixTrack exes (thanks to Kyrre and James for new compilation procedure – very effective!) on BOINC and lxplus (LSF/HTCondor), to have new features in: fma, wires, eLenses, ZIPF, fcc lattice (bignblz), various improvements in DYNK, bignpart; User decides flavor of exe on BOINC; Adapt run_status / run_results to ZIPF functionality; Submission to HTBOINC: Removal of spooldirs! Very long simulations; Add new parameter scans: Momentum, collimation, etc… Incorporate as much as possible about the scan in .mask file, and leave the rest to SixTrack input parameters; Finalize test suite  anyone willing to have a particular case to be constantly checked at every release, please send me sixdeskenv / sysenv / .mask / .db; 12 Apr 2017 A.Mereghetti

Outlook – Mid / Low Priority Fixing DB (all DB files! - PDHermes); Faster status of study (FVanDerVeken); Archiving of study (FVanDerVeken); Import run_results / run_status functionality in run_six.sh: try to make these tools less demanding on system; Lower number of code lines; Parallelization of tasks; User workspace: Folder structure; Merge long / short / DA type of simulations: At declaration of input parameters; At handling of files / submission of jobs; Only one input file among sixdeskenv / sysenv; Cleaning: All checks performed at set_env.sh; Remove: BNL / bignblz / test flags; Rm superseded platforms: CPSS, GRID; 12 Apr 2017 A.Mereghetti

Final Remarks If you use BOINC for producing results, please remember to acknowledge the BOINC volunteers in your papers, as done here: “The volunteers of LHC@home are warmly thanked as, without their contribution of CPU-time, the intense tracking campaign reported in this paper would not have been possible.” I should update the list of papers with results from BOINC volunteers, quoted on BOINC web page (dates back to 2013!!) – plese send me any new contribution since last year!; Whoever wants to help with discussions / feedback / maintenance / coding / development is more than welcome! 8th BOINC pentathlon (message boards announcement – thanks to E. McIntosh) Additional info: New scripts: twiki page  in continuous update, so please be patient!  help is more than welcome! Development in Repo on GitHub; HTCondor: CERN guide / Submit guide; 12 Apr 2017 A.Mereghetti

Back up slides 12 Apr 2017 A.Mereghetti

User Work-Space – A First Idea The idea is to have: All files / folders in only one path; Minimize number of links; 12 Apr 2017 A.Mereghetti