Event Service
Wen Guan, University of Wisconsin

Content
Event Service
– Event Service introduction
– Event Service queue setup
– Event Service monitor
Yoda: Event Service on HPC
– Yoda on HPC
– Yoda on Edison
– Yoda on ARC

What is the Event Service?
Event-level processing.
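For illustration, the unit of work in the Event Service is an "event range": a small slice of events from one input file that can be handed to any worker. A rough sketch of what such a range carries is shown below; the field names and all values are indicative only, not the exact PanDA schema.

```python
# Illustrative sketch of an "event range" in the Event Service. Field names
# and values are indicative only, not an exact PanDA schema; everything here
# is made up for illustration.
event_range = {
    "eventRangeID": "10204500-1234567890-1-1-10",    # hypothetical identifier
    "startEvent": 1,                                  # first event in the range
    "lastEvent": 10,                                  # last event in the range
    "LFN": "EVNT.01234567._000001.pool.root.1",       # input file holding the events
    "GUID": "00000000-0000-0000-0000-000000000000",   # placeholder file GUID
    "scope": "mc15_13TeV",                            # placeholder dataset scope
}
```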

Event Service Processing
[Diagram] The Pilot gets an ES job from PanDA (getJob), requests event ranges (getEvents, e.g. events (1-10), (10-20), ...), processes them, stages each output out to the S3 object store and reports back to PanDA (updateEvent). A separate merge job then runs through a conventional pilot: getJob, stage in the per-range outputs, merge, and stage out to dCache/DPM etc.
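A minimal sketch of the loop in the diagram above; every function here is a hypothetical stand-in for the real pilot, PanDA server and object-store calls.

```python
# Minimal sketch of the Event Service flow shown in the diagram above; every
# function is a hypothetical stand-in for the real pilot / PanDA / object-store
# calls.

def get_event_ranges(n_events, chunk=10):
    """Split the events into ranges like (1-10), (11-20), ... as dispatched by PanDA."""
    return [(start, min(start + chunk - 1, n_events))
            for start in range(1, n_events + 1, chunk)]

def run_es_pilot(n_events):
    objectstore = []                                   # stand-in for the S3 object store
    for first, last in get_event_ranges(n_events):
        output = "events_%d_%d.out" % (first, last)    # Process: run the payload on the range
        objectstore.append(output)                     # StageOut: small output to the object store
        # the real pilot would now report updateEvent back to PanDA
    return objectstore

def run_merge_job(objectstore):
    # Merge job: getJob, stage in the per-range outputs, merge, stage out to dCache/DPM.
    return "merged(%d range outputs)" % len(objectstore)

print(run_merge_job(run_es_pilot(95)))
```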

Difference between an ES job and a normal job
The Pilot runs getJob to request work from PanDA. A payload is returned from PanDA, which can be normal or ES work:
– eventService=True for an ES job.
– A normal job doesn't have it.
The Pilot parses the payload and automatically selects the appropriate processing path for each job type.
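A sketch of that branching, assuming a dict-like payload; the key handling and the runner names are illustrative rather than the exact pilot code.

```python
# Sketch of how the pilot could branch on the job type. The payload keys and
# the runner names are illustrative, not the exact pilot code.

def select_runner(job):
    # ES payloads carry eventService=True; normal payloads simply lack the flag.
    if str(job.get("eventService", "")).lower() == "true":
        return "RunJobEvent"   # event-level processing path
    return "RunJob"            # conventional whole-job path

normal_job = {"PandaID": 1, "transformation": "Sim_tf.py"}
es_job = {"PandaID": 2, "transformation": "Sim_tf.py", "eventService": True}
print(select_runner(normal_job), select_runner(es_job))   # -> RunJob RunJobEvent
```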

Define an ES queue
Differences from a normal queue:
– corecount can be 1, but cannot be None.
– catchall: localEsMerge
– jobseed = es, std (non-ES) or all (ES and non-ES); jobseed is used by PanDA to schedule ES jobs to the queue.
– Attach an object store (OS):
  - If no OS is attached, ES jobs will fail.
  - In AGIS, associate an OS with the queue.
  - A default OS is already attached to a queue.
Example: see the next slides; a rough sketch of the relevant settings also follows below.
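A rough sketch of the relevant ES queue settings, expressed as a Python dict purely for illustration: the queue name, object-store endpoint and exact field layout are made-up examples, and the real configuration is done through AGIS.

```python
# Illustrative ES queue settings in schedconfig/AGIS style. The queue name,
# object-store endpoint and exact field layout are made-up examples; the real
# configuration is done through AGIS.
es_queue = {
    "nickname": "EXAMPLE_SITE_ES",      # hypothetical queue name
    "corecount": 8,                     # can be 1, but must not be None
    "catchall": "localEsMerge",         # as listed on the slide
    "jobseed": "es",                    # "es", "std" (non-ES) or "all"; used by PanDA brokerage
    "objectstore": "s3://os.example.org:443/atlas_eventservice",  # attached OS (example)
}

# Sanity checks mirroring the constraints listed above.
assert es_queue["corecount"] is not None, "corecount cannot be None"
assert es_queue["jobseed"] in ("es", "std", "all")
assert es_queue["objectstore"], "ES jobs will fail if no object store is attached"
```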

Attach OS to queue (1)

Attach OS to queue (2)

Event Service Monitor

Summary
Easy to set up an ES queue.
Documentation is available; comments on it are welcome.
– Includes OS setup for a queue, plus some debugging info.
– Help:
Many ES queues are already set up.

Content
Event Service
– Event Service introduction
– Event Service queue setup
– Event Service monitor
Yoda: Event Service on HPC
– Yoda on HPC
– Yoda on Edison
– Yoda on ARC

Yoda on HPC
Purpose:
– Make use of HPCs with many CPUs in one job.
– No outbound internet connection prevents us from running the conventional ES.
Yoda:
– Run the ES as a single MPI job.
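To make the "single MPI job" idea concrete, here is a minimal mpi4py sketch of the pattern: rank 0 (Yoda) hands out event ranges on request and the remaining ranks (Droids) process them. This only illustrates the scheme, not the actual Yoda code, and the file name in the usage line is hypothetical.

```python
# Minimal mpi4py sketch of the Yoda pattern: rank 0 (Yoda) serves event ranges,
# the other ranks (Droids) process them. Not the actual Yoda code.
from mpi4py import MPI

comm = MPI.COMM_WORLD
rank, size = comm.Get_rank(), comm.Get_size()

if rank == 0:
    # Yoda: in reality the event ranges come from events.json on the shared
    # file system; here they are generated inline.
    ranges = [(i, i + 9) for i in range(1, 100, 10)]
    finished = 0
    while finished < size - 1:
        droid = comm.recv(source=MPI.ANY_SOURCE)   # a Droid asks for work
        work = ranges.pop(0) if ranges else None   # None means "no more work"
        comm.send(work, dest=droid)
        if work is None:
            finished += 1
else:
    # Droid: request event ranges until Yoda has none left.
    while True:
        comm.send(rank, dest=0)                    # ask Yoda for work
        work = comm.recv(source=0)
        if work is None:
            break
        # a real Droid would now run the payload on this event range and
        # stage its output to the shared file system
```

Run with, e.g., `mpirun -n 8 python yoda_sketch.py` (one Yoda rank plus seven Droids).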

Schematic view of Yoda

Yoda on NERSC (in production)
[Diagram] On the frontend (login machine) the Pilot runs RunJobHPCEvent: getJob from PanDA, stageIn, getEventRanges, getOutputs from the HPCManager, stageOut. The HPCManager, via its Slurm plugin, submits the HPC job and polls the job and its outputs. On the HPC cluster the job runs Yoda (rank 0) and Droids (ranks 1 to n), which communicate with the frontend through the shared file system: input files, the PFC, job.json and events.json go in, outputs come back.
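A rough sketch of the HPCManager's Slurm-plugin role on the login node: place job.json and events.json on the shared file system, submit the MPI job with sbatch and poll it with squeue. This is not the real plugin code; the batch script name and directory layout are hypothetical.

```python
# Rough sketch of what the HPCManager's Slurm plugin does on the login node.
# Not the real plugin code; the batch script name and layout are hypothetical.
import json
import os
import subprocess
import time

def submit_yoda(workdir, job, event_ranges, batch_script="yoda.sbatch"):
    with open(os.path.join(workdir, "job.json"), "w") as f:
        json.dump(job, f)              # job definition fetched from PanDA (getJob)
    with open(os.path.join(workdir, "events.json"), "w") as f:
        json.dump(event_ranges, f)     # event ranges for Yoda to distribute
    out = subprocess.check_output(["sbatch", batch_script], cwd=workdir)
    return out.split()[-1].decode()    # "Submitted batch job <id>" -> "<id>"

def wait_for_yoda(job_id, poll_interval=60):
    while True:
        out = subprocess.run(["squeue", "-h", "-j", job_id],
                             capture_output=True, text=True).stdout
        if not out.strip():            # job no longer listed: finished or failed
            return
        time.sleep(poll_interval)      # poll job; outputs are picked up from the shared FS
```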

Yoda on ARC (testing)
[Diagram] As on NERSC, a frontend (login machine) runs RunJobHPCEvent (getJob from PanDA, stageIn, getEventRanges, getOutputs from the HPCManager, stageOut) and an HPCManager with a Slurm plugin that submits the job and polls the job and its outputs; in the ARC setup a CE and an HPCManager MPI plugin are added, and mpirun launches Yoda (rank 0) and the Droids (ranks 1 to n) on the HPC cluster. Input files, job.json and events.json reach the shared file system, and outputs come back through it.

Yoda on ARC (testing)
[Diagram] The ARC Control Tower submits through the CE, which places the Pilot, input files, job.json and events.json on the shared file system; Yoda (rank 0) and the Droids (ranks 1 to n) run as the HPC job and write their outputs back to the shared file system. This releases the interactive node.
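A sketch of the launcher side under this scheme: everything runs inside the batch allocation, so nothing needs to stay on the interactive node; the CE (driven by the ARC Control Tower) has already placed the pilot, input files, job.json and events.json on the shared file system, and the batch job simply starts Yoda with mpirun. Paths and the yoda_droid.py entry point are hypothetical.

```python
# Sketch of a launcher running inside the batch allocation in the ARC scheme.
# The CE has already staged the pilot, input files, job.json and events.json
# to the shared file system. Paths and yoda_droid.py are hypothetical.
import os
import subprocess
import sys

def launch_yoda(workdir):
    for required in ("job.json", "events.json"):
        if not os.path.exists(os.path.join(workdir, required)):
            sys.exit("missing %s: CE staging incomplete" % required)
    # mpirun starts rank 0 (Yoda) plus one Droid per remaining rank; outputs
    # are written back to the shared file system for collection via the CE.
    return subprocess.call(["mpirun", sys.executable, "yoda_droid.py"], cwd=workdir)

if __name__ == "__main__":
    raise SystemExit(launch_yoda(os.environ.get("YODA_WORKDIR", ".")))
```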

Summary
Yoda is an ES solution for HPC.
In production on NERSC:
– Running since last year on the Edison HPC.
– Switched from PBS to Slurm on Edison.
– Tested on the new NERSC Cori system.
Yoda on ARC:
– Releases the interactive node.
– Simulated on NERSC Edison.
– Integration testing with ARC-CT.
– Will be tested on ARC sites.