Converting ASGARD into a MC-Farm for Particle Physics. Beowulf-Day, 17.01.05. A. Biland, IPP/ETHZ.


Slide 1: Converting ASGARD into a MC-Farm for Particle Physics. Beowulf-Day, 17.01.05. A. Biland, IPP/ETHZ

Slides 2-6: Beowulf Concept
Three Main Components (built up one per slide): CPU Nodes, Network, Fileserver.
CPU Nodes: $$$$$$$$$ ?   Network: $$$$ ?   Fileserver: $$$ ?
How much of the (limited) money to spend on what?

Slide 7: Beowulf Concept, Intended (main) usage
"Eierlegende Woll-Milch-Sau" (German idiom, literally an egg-laying wool-milk-sow: one size fits everything).
Put a roughly equal amount of money into each component ==> ok for (almost) any possible use, but a waste of money for most applications.

Slide 8: Beowulf Concept, Intended (main) usage
Money split ~80% / ~10% / ~10% (CPU nodes / network / fileserver) [ASGARD, HREIDAR-I]:
CPU-bound jobs with limited I/O and inter-CPU communication.

Slide 9: Beowulf Concept, Intended (main) usage
Money split ~50% / ~40% / ~10% (CPU nodes / network / fileserver) [HREIDAR-II]:
jobs with high inter-CPU communication needs (parallel processing).

Slide 10: Beowulf Concept, Intended (main) usage
Money split ~50% / ~10% / ~40% (CPU nodes / network / fileserver):
jobs with high I/O needs or large datasets (data analysis).

Slide 11: Fileserver Problems: a) Speed (parallel access)
Inexpensive fileservers reach a disk I/O of ~50 MB/s.
500 single-CPU jobs ==> 50 MB/s / 500 jobs = 100 kB/s per job (an upper limit; the values typically reached are much smaller).
Using several fileservers in parallel:
-- difficult data management (where is which file?) [use parallel filesystems?]
-- hot spots (all jobs want to access the same dataset) [data replication ==> $$$]
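
A quick back-of-the-envelope check of this sharing argument, as a minimal Python sketch (the 50 MB/s and 500-job figures are the ones quoted on the slide):

```python
# Aggregate fileserver bandwidth shared by concurrent jobs.
SERVER_IO_MB_S = 50.0   # inexpensive fileserver, sequential disk I/O
N_JOBS = 500            # single-CPU jobs reading/writing at the same time

per_job_kb_s = SERVER_IO_MB_S * 1000.0 / N_JOBS
print(f"Upper limit per job: {per_job_kb_s:.0f} kB/s")  # -> 100 kB/s
# In practice it is worse: concurrent access forces disk seeks, so the
# server no longer delivers its sequential 50 MB/s in the first place.
```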

Slide 12: Fileserver Problems: a) Speed (parallel access)
How (not) to read/write the data:
Bad: NFS (constant transfer of small chunks of data) ==> constant disk repositioning ==> disk I/O --> 0.
(Somewhat improved with a large cache (>>100 MB) in memory; but once the write cache is full, it takes a long time to flush to disk ==> the server blocks.)
~ok: rcp (transfer of large blocks from/to local /scratch). But /scratch is rather small on ASGARD, and what if many jobs want to transfer at the same time?
Best: the fileserver initiates the rcp transfers on request. Drawbacks: relies on user discipline, not very transparent, ...
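
A minimal sketch of the "large blocks via local /scratch" pattern; the hostname, paths and the worker command are hypothetical placeholders, and scp stands in for the rcp used on the slide:

```python
import os
import subprocess

FILESERVER = "fileserver.example"          # assumed fileserver host
REMOTE_IN = "/work/mc/input/run042.dat"    # hypothetical input file
REMOTE_OUT = "/work/mc/output/"            # hypothetical output area
SCRATCH = "/scratch/job042"                # per-job scratch directory

os.makedirs(SCRATCH, exist_ok=True)

# Stage in: one large sequential copy instead of many small NFS reads.
subprocess.run(["scp", f"{FILESERVER}:{REMOTE_IN}", f"{SCRATCH}/input.dat"], check=True)

# Run the job entirely against the local disk.
subprocess.run(["./my_job", f"{SCRATCH}/input.dat", f"{SCRATCH}/output.dat"], check=True)

# Stage out: again one large sequential transfer back to the fileserver.
subprocess.run(["scp", f"{SCRATCH}/output.dat", f"{FILESERVER}:{REMOTE_OUT}"], check=True)
```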

Slide 13: Fileserver Problems: b) Capacity
500 jobs producing data, each writing 100 kB/s ==> 50 MB/s to the fileserver ==> 4.2 TB/day!
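
The daily volume follows directly from the numbers above; a one-line check in Python:

```python
mb_per_s = 500 * 100 / 1000.0         # 500 jobs x 100 kB/s -> 50 MB/s aggregate
tb_per_day = mb_per_s * 86400 / 1e6   # 86400 s per day, MB -> TB
print(f"{tb_per_day:.1f} TB/day")     # -> 4.3 TB/day; the slide's 4.2 differs only by rounding / binary vs decimal units
```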

Slide 14: Particle Physics MC
Needs a huge number of statistically independent events, #events >> #CPUs
==> an 'embarrassingly parallel' problem ==> 5x 500 MIPS is as good as 1x 2500 MIPS.
Usually two sets of programs:
a) Simulation: produces huge, very detailed MC files (adapted standard packages [GEANT, CORSIKA, ...]).
b) Reconstruction: reads the MC files, writes much smaller reco files with the selected events and physics data (special software developed by each experiment).
Mass production: only the reco files are needed ==> combine both tasks in one job and use /scratch.
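
A sketch of such a combined mass-production job, assuming hypothetical `simulate` and `reconstruct` executables and made-up paths (the real jobs would wrap the experiment-specific binaries the same way):

```python
import os
import subprocess

SCRATCH = "/scratch/mcjob"                          # local scratch area on the node
RECO_STORE = "fileserver.example:/work/mc/reco/"    # assumed destination on the fileserver

os.makedirs(SCRATCH, exist_ok=True)
mc_file = os.path.join(SCRATCH, "events.sim")       # huge, detailed, stays local
reco_file = os.path.join(SCRATCH, "events.reco")    # small, the only file kept

# a) Simulation: writes the large intermediate MC file to local disk only.
subprocess.run(["./simulate", "--events", "10000", "--output", mc_file], check=True)

# b) Reconstruction: reads the MC file, writes the compact reco file.
subprocess.run(["./reconstruct", "--input", mc_file, "--output", reco_file], check=True)

# Only the reco file leaves the node; the bulky simulation output never
# touches the fileserver.
subprocess.run(["scp", reco_file, RECO_STORE], check=True)
os.remove(mc_file)
```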

Slide 15: ASGARD Status
24 nodes per frame, 10 frames. Central storage: /home, /work, /arch.
Local disk per node: 1 GB /, 1 GB swap, 4 GB /scratch.

Slide 16: ASGARD Status (same layout as slide 15)
Needed: a fileserver with ++bandwidth and ++capacity, and guaranteed /scratch space per job.

Slide 17: ASGARD Upgrade
24 nodes per frame, 10 frames. Central storage: /home, /work, /arch.
Local disk per node: 0.2 GB /, 0.3 GB swap, 2.5 GB /scr1, 2.5 GB /scr2.
Per frame: 4x 400 GB SATA, RAID-10 (800 GB usable).

Slide 18: ASGARD Upgrade (same layout as slide 17)
Adding 10 fileservers (~65 kFr), ASGARD can serve for ~2 years as an MC farm and GRID testbed ...
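
The usable capacity quoted above follows from the RAID-10 layout; a quick check, assuming each of the ten frames gets the same 4-disk array:

```python
# RAID-10 on 4 disks: two mirrored pairs striped together,
# so half of the raw capacity is usable.
disks_per_frame = 4
disk_gb = 400
usable_per_frame_gb = disks_per_frame * disk_gb // 2
print(usable_per_frame_gb, "GB usable per frame")          # -> 800 GB, as on the slide

frames = 10
print(frames * usable_per_frame_gb / 1000, "TB in total")  # -> 8.0 TB across the farm
```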