STAR scheduling future directions
Gabriele Carcassi
9 September 2002

Directions  Current system  refine, add other features  Queue system  Condor, Condor-g/Globus  File catalog  Active policies and connection to file movers (HRM, GDMP), migration to other tools (MAGDA, GDMP, RLS,...)

Current system  Optimize the policy  as the scheduler is being used, fine tune the policy so that the farm is used at top efficiency  Optimize the queries to the file catalog  make the queries smarter: for example, when a limit on the file exists, take the files from the machine that is less busy

Current system  Other refinement  produce less garbage (all the temporary scripts and file lists)  better installation  make it available to both BNL and LBL

Queue system  Investigating the use of Condor  first deploy just Condor, and gather experience with that  migrate to Condor-g and Globus, and start submitting jobs across sites  integrate DAGMAN with the file catalog and the file movers (?)

Queue system  Setup a testbed to gather experience with Condor by itself  RCF is investigating on how to deploy Condor  deploy Condor on some nodes to gather experience  integrate Condor in our resource broker  when everything works, we can start migrating the whole farm

Queue system  Setup a testbed to gather experience with Condor-g  setup two farms with Condor-g and gather experience  make changes to the resource broker, to be able to submit across farms  needs a file catalog that works on both sites

File catalog  Currently analyzing the options  Keep for now the STAR catalog as it is  making the query smarter  interface with tools like HRM/DRM

File catalog  Migration to other grid tools  right now, the only part to be rewritten is the method that resolves the query  MAGDA’s catalog is not completely different, though some issues has to be addressed  integration with a grid catalog might provide other feature (such as file migration) for free