Slide 1: CDF Run II Computing Workshop, Fermilab, February 10, 1999. A user's perspective. Stefano Belforte, INFN Trieste.

Slide 2: "A user of what?" (Avi Yagil)
The perspective of a user of data at a remote institution → data analysis in Italy for CDF Run II.
Why? I have to make a plan for computing for the Italian CDF collaborators for Run II:
- what hardware
- where
- when
- how much money

Slide 3: Which hardware for data analysis in Italy?
CPU, disks, tapes (robots?), network (LAN, WAN).
I am going to share my exercise with you now:
- hopefully I learn something from the discussion
- maybe I help you focus your questions
Italy is many institutions, of all sizes; one will be like yours. What is really different? The WAN, maybe. But in Run II there are many transoceanic institutions...

Slide 4: Run I vs. Run II
Bottleneck: I/O. It is very difficult to put data into the CPU. Solution: bring the CPU to the data and build a powerful cluster.
Beware these numbers: it is very difficult to make good predictions. The hope is that the conclusions do not change if the numbers are a bit wrong.

Slide 5: Hardware at home: copy the FNAL setup again?
FERMILAB:
- 500 GBytes → 20 TBytes: x40!
- VMS cluster → high-performance Fibre Channel based Storage Area Network
- hand tapes + silo → million-dollar robot
PADOVA / PISA:
- 30 GBytes → 2 TBytes?
- VMS cluster → just a bunch of Unix/Linux boxes
- hand tapes → hand tapes?
Anyhow, simple scaling doesn't work: the data cannot be parted out among physicists. We would also like to do better than in Run I: more analysis, more easily.
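
For reference, the growth factors quoted above, as a small Python check (the byte counts are the slide's own):

```python
# Run I -> Run II data volumes from the slide, and the implied growth
sites = {"FNAL": (500e9, 20e12), "Padova/Pisa": (30e9, 2e12)}  # bytes
for site, (run1, run2) in sites.items():
    print(f"{site}: {run1/1e9:.0f} GB -> {run2/1e12:.0f} TB, x{run2/run1:.0f}")
# FNAL: 500 GB -> 20 TB, x40
# Padova/Pisa: 30 GB -> 2 TB, x67
```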

Slide 6: How much data must I handle at home?
Liz's table; see offline_minutes/buckley_vr_talk_jan_99.ps, page 5.

Slide 7: PAD vs. Ntuple
Ntuple: 1000 variables/event = 4 KBytes/event (PAD = 60 KBytes/event).
High Pt, O(1% of total data), keeping all events:
- PAD: 2 TBytes, 20 tapes → analyze at home
- Ntuple: 200 GBytes → keep on a PC hard disk
- but: we need several versions of the Ntuple, so we can reduce by 1/4 at most
Low Pt, O(10% of total data):
- PAD: 20 TBytes, 200 tapes → have to do something
- Ntuple: 2 TBytes → doesn't fit on disk!
- reduce the data sample? At 1/10th, Low Pt = High Pt
Anyhow it is analysis dependent; many people, many opinions. Still: how do I bring those data to Italy? Anywhere from a few tens of GBytes to a few TBytes.
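
The volumes above can be reproduced with a back-of-envelope sketch. The event sizes are the slide's; the total sample size is my assumption, chosen so that the 1% PAD sample matches the quoted 2 TBytes:

```python
# Event sizes from the slide; TOTAL_EVENTS is an assumption that
# reproduces the quoted "High Pt PAD = 2 TB"
PAD_EV, NTUPLE_EV = 60e3, 4e3    # bytes/event (4 KB = 1000 variables x 4 bytes)
TOTAL_EVENTS = 3.3e9             # assumed Run II sample

for name, frac in [("High Pt", 0.01), ("Low Pt", 0.10)]:
    n = frac * TOTAL_EVENTS
    print(f"{name}: PAD = {n*PAD_EV/1e12:.0f} TB, Ntuple = {n*NTUPLE_EV/1e9:.0f} GB")
# High Pt: PAD = 2 TB,  Ntuple = 132 GB  (the slide rounds up to 200 GB)
# Low Pt:  PAD = 20 TB, Ntuple = 1320 GB (the slide rounds up to 2 TB)
```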

Slide 8: Network needs for analysis (from a talk to INFN in Bologna, January 19, 1999)
Three scenarios (the two extremes and a middle way):
- copy all PADs to Italy: almost no network to the US needed
- leave all PADs and Ntuples in the US: use X terminals from Italy
- copy some PADs here, keep most Ntuples here (copied or created locally)
Requirements are difficult to estimate; better to turn the question around: given 4 Mbit/sec dedicated to CDF, what can we do?
- 4 Mbit/sec = 200 GBytes/week = 2 tapes/week: can't beat DHL!
- 1 tape a day = 100 GBytes/day = 10 Mbit/sec → PADs don't travel on the net
- 4 Mbit/sec shared by 10 users = 1 GByte per 5 hours per person for copying Ntuples
- one analysis = 1/10th of the data → PAD = 20 TBytes, Ntuple = 20 GBytes? Refreshing the Ntuple takes 4 days minimum! And there will be more data and more users...
- converging argument: a 10 GByte Ntuple per physicist is the minimum!
- so we can't make Ntuples offsite and copy them over the net
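
Spelling out the bandwidth arithmetic (all inputs are the slide's; note that 4 Mbit/sec sustained is nominally ~300 GBytes/week, so the slide's 200 GBytes/week evidently allows for overheads):

```python
MBIT = 1e6 / 8                     # bytes/s per Mbit/s
link = 4 * MBIT                    # 4 Mbit/s dedicated to CDF

print(f"per week: {link*7*86400/1e9:.0f} GB")           # ~302 GB nominal
tape = 100e9                                            # one 100 GB tape
print(f"1 tape/day = {tape*8/86400/1e6:.1f} Mbit/s")    # ~9.3, i.e. ~10 Mbit/s
per_user = link / 10                                    # 10 users sharing
print(f"1 GB Ntuple copy: {1e9/per_user/3600:.1f} h")   # ~5.6 h ("1 GB / 5 hr")
print(f"20 GB refresh: {20e9/per_user/86400:.1f} days") # ~4.6 days minimum
```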

Slide 9: What goes out of FNAL?
PADs don't travel on the net. Ntuples don't travel on the net. What do I do?

Slide 10: What is an Ntuple anyway?
Do we really need to refresh ~200 GBytes of Ntuples "continuously"? The Ntuple is what we use for interactive histogramming:
- if it takes one hour to get the histogram, we may as well submit a job and get the histograms back
- the data actually transferred is small either way, so it makes no difference where the job runs!
- an Ntuple is a data set you go through in a few minutes at most; disk → CPU runs at 50 MBytes/sec at most, i.e. 3 GBytes/min at most, so the Ntuple will always fit in your desk(lap)top PC!
- notice: the Run I equivalent (200 MBytes) required a good chunk of a big 5-inch SCSI disk!
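
The definition can be put as arithmetic; the 50 MBytes/sec figure is the slide's, the implied size cap is my extrapolation:

```python
DISK_RATE = 50e6                   # bytes/s: "50 Mbytes/sec at most"
print(f"{DISK_RATE*60/1e9:.0f} GB/min")    # 3 GB/min, as on the slide
for minutes in (1, 3, 5):
    print(f"{minutes} min scan -> {DISK_RATE*60*minutes/1e9:.0f} GB")
# "a few minutes" therefore caps the Ntuple at roughly 10-15 GB,
# consistent with the "10 GB Ntuple per physicist" on slide 8
```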

Slide 11: Disk to CPU: standard desktop PC vs. power server.

Slide 12: Getting the Ntuple home
The easier way: the WAN. A T1 shared by two (500 Kbit/sec/user) → 0.5 GByte/hour. (Today we move CDF notes of a few MBytes; here we go x1000!)
- 6 hours to get my Ntuple, a day or two more likely... NO WAY!
- the future of the internet may be brighter, but Ntuples may be bigger too...
- if possible at all: maybe slow, likely unsafe, but easy
Three alternative solutions:
- Don't do it! Run "PAW" at FNAL (just an X terminal over telnet): fast, easy and safe. 500 Kbit/sec = 10 good X sessions (or 5 perfect ones).
- FedEx (1 lb, 5 days a week): easy and safe. FNAL → US, 1st (2nd) day: $400 ($200)/year; FNAL → Europe: $7k/year.
- Create the Ntuple locally from FedEx'ed PADs: safe but hard.
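
The courier-versus-WAN trade-off behind "can't beat DHL", as a sketch (link speed and tape capacity from the earlier slides; the courier latency is the slide's next-day/second-day delivery):

```python
WAN = 4e6 / 8          # 4 Mbit/s link, in bytes/s
TAPE = 100e9           # one 100 GB tape
COURIER_DAYS = 2       # FNAL -> Europe, "1st (2nd) day"

print(f"one tape over the WAN: {TAPE/WAN/86400:.1f} days")   # ~2.3 days
print(f"WAN moves {WAN*86400/1e9:.0f} GB/day")               # ~43 GB/day
# a daily courier shipment of even one tape outruns the WAN,
# and a shipment can carry as many tapes as you like
```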

Slide 13: Data analysis in Pisa: the final answer (from a talk to INFN in Pisa, May 12, 1998)
We will have to try: we cannot pick the right approach before the collaboration has finalized the data handling and distribution tools and the analysis topics have been pinpointed. We will try everything; user pressure will drive it. Needs will be dominated by physics output (maybe we find SUSY in the 3-lepton samples and everybody looks at this small data set...). We will exploit local computing as much as possible to reduce the network load (the likely bottleneck, as it always has been). We will still need access to the FNAL PADs to produce the data sets to copy to Pisa; if the network is no good we will use tapes (expensive, though!). But we desperately need guaranteed bandwidth for interactive work: if we cannot log in at FNAL, there is no way to do most analysis here, only to work on "dead" data sets (no express data, no hot topics, just late sidelines)... or the good old way: take the plane and go to FNAL.

Slide 14: Dealing with PADs at home
Easily 1 to 5 TBytes; what to do?
- All on disk? 20 to 100 disks... and no VMS cluster...
- A multi-CPU server with RAID? A small Sun, or a big PC?
- A PC farm (our own small Level 3)? The LAN!
- A tape stacker? Only 1 to 2 TBytes, a couple of drives, and so slow!
- Taking shifts at tape mounting? 5 PCs, 10 drives, 50 tapes... but will it beat the robot?
- A power server at FCC?
Up to 500 GBytes: all on disk, 2 or 3 PCs working together. LAN, LAN, LAN! The moral: the less you need it, the better it is.

Slide 15: Where to put a powerful Unix server with 5 TBytes of disk?
See e.g. doc/hardware/hard_arch.ps (a.k.a. cdfsga:/cdf/pub/cdf4707_r2dh_hard_arch.ps), Figure 3.

Slide 16: Tapes
We will need tapes for more than importing data to disk anyhow: PADs, simulation, Monte Carlo, ... We will need to run analysis jobs from tape, just like at FNAL. But in Run II all tape data must be spooled to disk first. Spool space: 100 GBytes each (one full tape)?
- not likely
- better to spool 10 GBytes at a time
- better still, make it a free parameter
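
A minimal sketch of the "spool in configurable chunks" idea, with the chunk size as the free parameter the slide asks for; stage_from_tape and run_analysis are hypothetical stand-ins, not real CDF tools:

```python
def stage_from_tape(gb):           # hypothetical: tape -> disk spool area
    print(f"staged {gb} GB")

def run_analysis(gb):              # hypothetical: job reads from the spool
    print(f"analyzed {gb} GB")

def process_tape(tape_gb=100, spool_gb=10):
    """Spool a tape to disk in spool_gb chunks, analyzing each chunk.

    spool_gb is the free parameter: 100 stages a full tape at once,
    10 needs only a tenth of the spool disk space.
    """
    done = 0
    while done < tape_gb:
        chunk = min(spool_gb, tape_gb - done)
        stage_from_tape(chunk)
        run_analysis(chunk)        # spool space can be reused afterwards
        done += chunk

process_tape()
```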

Slide 17: Summary
- need FedEx
- need to run at FNAL: a low-latency WAN
- need flexible software at home
- need a good LAN at home
- need flexible, expandable hardware at home

Slide 18: Conclusions
- recommendations to INFN
- recommendations (requests) to the managers

Slide 19: To my colleagues
- buy "nothing" now
- buy little next year (a few PCs, a few tapes, a little disk); add disks as needed (just in time!)
- get the best LAN and WAN you can
- try to do the most at FNAL: ship tapes every day if need be; put CPU and/or disks in FCC if needed
- see how it goes, and see where the market goes
- be prepared to handle a few TBytes in 2001/2:
  - get a computer room ready
  - we don't know which hardware will be best, but it will likely not fit on your desktops

Slide 20: To the offline managers
- tapes, disks and CPU for everybody (lots of them)
- a friendly, low-latency batch interface from home (via the web?)
- a fast, easy-to-use interface from the robot to FedEx
- help with a simple Linux system at home:
  - a suggested/supported hardware configuration
  - software that is easy to install and use, adaptable to a limited hardware setup
  - one example system on site, OFF the LAN