Liverpool Experience of MDC 1

Liverpool Experience of MDC 1
MAP (and, in our belief, any system which attempts to be scalable to thousands of nodes) broadcasts the code to all the nodes. Even using an NFS mount, we believe multiple servers would be needed for systems of even a few hundred nodes. On the 300-processor MAP farm, starting with 1.1 GB of software is therefore horribly time-consuming, particularly as only 153 MB turned out to be needed once the installation was stripped down (atlsim.exe, ROOT and a few other shared-object libraries). This stripping could have been done by the authors prior to distribution or, much better, static executables could have been distributed, as is done by LHCb.
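As an illustration of the saving, here is a minimal sketch of pushing only the stripped-down software set to each node rather than the full 1.1 GB release. The host names, paths and use of rsync over ssh are assumptions for illustration, not a description of the actual MAP setup.

    #!/usr/bin/env python
    # Illustrative sketch only: node names and paths are hypothetical.
    # Assumes rsync over ssh is available and that the stripped-down
    # software set (~153 MB: atlsim.exe, ROOT libraries, etc.) has
    # already been collected under STRIPPED_DIR on the submit host.
    import subprocess

    STRIPPED_DIR = "/opt/mdc1-stripped/"            # hypothetical path
    NODES = ["map%03d" % i for i in range(1, 301)]  # hypothetical hostnames, 300 nodes
    DEST = "/scratch/mdc1/"                         # hypothetical per-node destination

    failed = []
    for node in NODES:
        # Copy only the stripped software, not the full 1.1 GB release.
        rc = subprocess.call(["rsync", "-a", "--delete",
                              STRIPPED_DIR, "%s:%s" % (node, DEST)])
        if rc != 0:
            failed.append(node)

    print("distributed to %d nodes, %d failures" % (len(NODES) - len(failed), len(failed)))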

Liverpool Experience of MDC 1
The 4-vectors were supplied as 5 lots of 100k events (2 GB each), and a whole file had to be copied to each node just to read 5k events, which proved very wasteful. In this format we were obliged to run as 5 queues of 20 jobs, i.e. using one third of MAP, but to the exclusion of everything else, since each job took 60 hours; this is not really compatible with a multi-user system. A shared batch system needs the flexibility of much shorter jobs. Had we been able to split the input files into 500 events each, this would have given much less redundancy and full usage of the system. As it was, we had to physically rewire the cluster.
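A rough back-of-the-envelope comparison of the two job layouts, using the figures quoted above; the assumption that job time scales linearly with event count is ours, for illustration only.

    # Illustrative arithmetic only, based on the numbers quoted above.
    total_events = 5 * 100000   # 5 input files of 100k events each
    events_per_job = 5000       # what each 60-hour job actually read
    hours_per_job = 60
    nodes_total = 300           # MAP farm size
    nodes_used = 100            # 5 queues of 20 jobs = one third of MAP

    jobs = total_events // events_per_job              # 100 jobs
    wall_hours = jobs * hours_per_job / nodes_used     # ~60 h, monopolising 100 nodes

    # With 500-event input files (assuming linear scaling), jobs shrink to
    # ~6 hours and can spread over the whole farm or share it with others.
    small_jobs = total_events // 500                    # 1000 jobs
    small_hours = hours_per_job * 500.0 / events_per_job  # ~6 h per job
    small_wall = small_jobs * small_hours / nodes_total   # ~20 h on all 300 nodes

    print(wall_hours, small_wall)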

Liverpool Experience of MDC 1
We really needed either code to merge the output files, or code that reads the files so that we could write our own merging programme. Output file sizes also led to 25% of the jobs needing to be rerun; with smaller jobs we would have had far fewer problems and would have lost much less time through miscalculations of disk space. The validation code also needs adapting to each site, and we could again do with being able to run a small number of events first to check that things are working as they should. On a completely different topic, communications: on current form we would expect to be able to ship 20 files of 1 GB to CERN over 24 hours.
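For reference, a quick calculation (ours, not from the slides) of the sustained rate implied by shipping 20 x 1 GB to CERN in 24 hours:

    # Sustained rate implied by 20 x 1 GB transferred in 24 hours.
    # Treats 1 GB as 10**9 bytes; illustrative arithmetic only.
    files = 20
    bytes_per_file = 10**9
    seconds = 24 * 3600

    rate_MB_s = files * bytes_per_file / float(seconds) / 1e6   # ~0.23 MB/s
    rate_Mbit_s = rate_MB_s * 8                                 # ~1.9 Mbit/s sustained

    print("%.2f MB/s, %.2f Mbit/s" % (rate_MB_s, rate_Mbit_s))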