CDF Production Farms Yen-Chu Chen, Roman Lysak, Stephen Wolbers May 29, 2003.

The CDF Production Farms
151 PCs (317 GHz)
– 16 of the 151 used for I/O Servers
Disk Pool (dfarm - 13 TB)
Network Switch + connections
– Most 100 Mbit
– 16 Gbit connections for the I/O PCs
Software to control everything.
– Most Important!
– Fairly complicated system
8 input streams, 35 output datasets
Thousands of files/day – kept in order
People to keep it all going
– Roman Lysak and Yen-Chu Chen
– Lots of help from Pasha, Liz and Dmitri!
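
A minimal back-of-the-envelope sketch (Python, not part of the original talk) of what this hardware can deliver: only the 317 GHz, the 151 PCs, and the 16 I/O nodes come from the slide; the per-event ProductionExe cost and the duty factor are assumptions, tuned to be roughly consistent with the ~7.5 million events/day quoted in the daily report later in this talk.

    total_ghz = 317.0                  # aggregate CPU quoted on the slide
    io_fraction = 16.0 / 151.0         # 16 of the 151 PCs serve I/O, not reconstruction
    cpu_sec_per_event_per_ghz = 2.5    # assumed ProductionExe cost (not from the talk)
    duty_factor = 0.75                 # assumed fraction of wall time spent processing

    worker_ghz = total_ghz * (1.0 - io_fraction)
    events_per_sec = worker_ghz * duty_factor / cpu_sec_per_event_per_ghz
    events_per_day = events_per_sec * 86400

    print(f"worker CPU ~ {worker_ghz:.0f} GHz")
    print(f"throughput ~ {events_per_sec:.0f} events/s (~{events_per_day / 1e6:.1f} M events/day)")

With the assumed 2.5 CPU-s per event per GHz this gives roughly 7 M events/day, in the same ballpark as the observed rate; a slower executable or a lower duty factor scales the answer down linearly.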

Very Short History (up to end of March)
– Includes reprocessing
– 750 million events

Recent Processing
– Tape problems
– Executable change

Daily Reports
Updated status of the farms as of 8:00 A.M. 5/29/03
Behind by approx. 6.3 million on raw data
7.5 million total events processed in last 24 hours
– 6.6 million raw events through 4.8.4g1
– 0.9 million gjet08 processed through e
Issues:
1. e running. Output going to tape.
2. Caught up to calibrations on most streams.
Plan for the day:
1. Process raw data with 4.8.4g1.
2. Run streams to catch up to calibrations with priority on B and G.
3. Run e reprocessing.
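
The two rates quoted above allow a simple catch-up estimate. A minimal sketch, assuming an illustrative raw-data logging rate (the logging rate is not given on the slide):

    backlog_events = 6.3e6          # "Behind by approx. 6.3 million on raw data"
    farm_events_per_day = 7.5e6     # "7.5 million total events processed in last 24 hours"
    logging_events_per_day = 4.0e6  # assumed raw-data logging rate (illustrative only)

    net_rate = farm_events_per_day - logging_events_per_day
    if net_rate <= 0:
        print("At this logging rate the farm never catches up.")
    else:
        print(f"Time to clear the backlog ~ {backlog_events / net_rate:.1f} days")

At the assumed 4 million raw events/day, the 6.3 million-event backlog clears in about two days, provided calibrations are available for the waiting runs.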

Daily Reports

Stream              Run     Date run was taken    Events behind*
Stream A                    May-03                1,410,665
Stream B                    May                   ,380
Stream C                    May                   ,471
Stream D                    May-03                2,119,913
Stream E                    May                   ,091
Stream G                    May                   ,608
Stream H                    May                   ,833
Stream J                    May                   ,415
Calibrations (07)           May-03                Current data-taking

* This is more or less behind raw data on tape. The farms cannot process events if the calibrations aren't prepared.

Recent Problems/Issues
Tape reading problems from late March to late April/early May.
– Bizarre but extremely disruptive error.
– Difficult to diagnose and work around.
– Finally got past the bad set of tapes. May 8 got past the last bad tape. In the end lost about 10 files.
– The bug was diagnosed by STK and presumably is now fixed.

Recent Problems/Issues
Reprocessing with
– This did not go well.
– Lots of errors, debugging.
– Lesson for next time: start earlier, check at each step, expect to start more than once, fix and restart.
Change to 4.8.4g1
– Tricky, not completely smooth.
– 4.8.4a was used from October until last Friday.
– New data required the change – the change is not trivial.
– Took a week or so to get this correct.

Goals/Near Future
Keep up with the data (up to calibrations).
Prepare for compressed data.
– More events/second!
Work on changes proposed in processing:
– Preprocessing stream G.
– Validation after concatenation.
– Reading/writing from/to dCache disk.
– gcc compiler.
Prepare for full reprocessing for Winter ’04 conferences.
Prepare estimate for FY03 farm purchases (see the capacity sketch below).
– Understand farm utilization.
– Understand ProductionExe CPU time.
– Model peak and steady-state requirements.
Documentation!
Look for more people to work on the farms.
Good communication with physics groups, the physics coordinator, online operations, etc. is essential for setting priorities and deciding what to do on the farms.
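
One way to "model peak and steady-state requirements" for the FY03 estimate is a toy capacity calculation like the sketch below; the event rates and the per-event ProductionExe cost are assumed, illustrative values, not CDF planning numbers.

    cpu_sec_per_event_per_ghz = 2.5   # assumed ProductionExe cost
    duty_factor = 0.75                # assumed usable fraction of wall time
    installed_ghz = 317.0             # current farm, from the overview slide

    def ghz_needed(events_per_day):
        """GHz-equivalent CPU needed to keep up with events_per_day, every day."""
        return events_per_day * cpu_sec_per_event_per_ghz / (86400 * duty_factor)

    scenarios = {
        "steady state (data taking only)": 5.0e6,                # assumed rate
        "peak (data taking + Winter '04 reprocessing)": 10.0e6,  # assumed rate
    }
    for label, rate in scenarios.items():
        need = ghz_needed(rate)
        shortfall = max(0.0, need - installed_ghz)
        print(f"{label}: need ~{need:.0f} GHz, shortfall ~{shortfall:.0f} GHz")

The same function can be rerun once the real ProductionExe CPU time and farm utilization are measured, which is exactly what the two "Understand ..." items above call for.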

Farms Utilization (utilization plot)
– Input and output nodes shown in black and red

Goals – Future
Prepare a long-term farm hardware enhancement/replacement plan.
Plan for increased L3 output (bandwidth arithmetic sketched below).
– 20 Mbyte/s now.
– ?? Mbyte/s in future.
Other enhancements/changes as needed for CDF.
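
A quick check of what a higher L3 rate implies for the 13 TB dfarm buffer (sketch only; 20 MB/s and 13 TB are from the slides, while the higher rates are hypothetical, since the future number is left as "??"):

    dfarm_tb = 13.0          # dfarm disk pool size from the overview slide
    seconds_per_day = 86400

    for rate_mb_s in (20, 40, 60):                       # 20 MB/s today; 40 and 60 are hypothetical
        tb_per_day = rate_mb_s * seconds_per_day / 1e6   # decimal TB per day
        days_of_buffer = dfarm_tb / tb_per_day
        print(f"{rate_mb_s:3d} MB/s -> {tb_per_day:4.1f} TB/day, "
              f"dfarm holds ~{days_of_buffer:3.1f} days of input")

At today's 20 MB/s the pool holds roughly a week of input; doubling or tripling the L3 rate shrinks that margin accordingly, which is part of the motivation for the hardware plan above.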