Slide 1 – Computing for CDF: Status and Requests for 2003
18-Sep-02, Stefano Belforte, INFN – Trieste
Slide 2 – The CDF-Italy Computing Plan
(CSN1, 18-Sep-02, Stefano Belforte – INFN Trieste, "CDF computing")
- Presented on June 24, 2002
- Referees (and CSN1) postponed discussion/approval until November 2002, to decide based on experience
- Collecting experience now; no reason to modify the plan so far
- Today:
  - Status report on the analysis farm at FNAL
  - Update on the work toward de-centralization (GRID, CNAF)
  - Progress toward MOU/MOF
  - Rationale for the 2003 requests
Slide 3 – Status of the CAF
- The FNAL Central Analysis Farm (CAF) is a big success so far: easy to use, effective, convenient
- Measures of success:
  - 100% used now; upgrade in progress
  - Many institutions spending their own money there
  - Cloning started (Korea)
Slide 4 – CDF Central Analysis Farm
- Compile/link/debug everywhere; submit from everywhere; execute at FNAL
- Submission of N parallel jobs with a single command
- Access data from CAF disks (now); access tape data via a transparent cache (soon → now)
- Get the job output anywhere; store small outputs on a local scratch area for later analysis; access the scratch area from everywhere
- IT WORKS NOW
(Diagram: my desktop / my favorite computer connect through a gateway to the FNAL farm, a pile of PCs with local data servers and a scratch server; job submission, log retrieval, ftp, NFS and rootd access)
Slide 5 – Tape to Disk to CPU
(Plot: ~2 TB/day moved, split between "from disk" and "from tape", vs. days in September 2002)
- "Spec. from the 2000 review": the disk cache should satisfy 80% of all data requests
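The 80% requirement is simply a condition on the fraction of requested data served from the disk cache rather than from tape. A minimal check, with illustrative volumes (the actual per-day numbers live in the plot, not in the text):

```python
def disk_hit_fraction(from_disk_tb, from_tape_tb):
    """Fraction of the data volume served from the disk cache rather than tape."""
    return from_disk_tb / (from_disk_tb + from_tape_tb)

# Illustrative only: of 2 TB/day total, 1.6 TB from disk meets the 80% spec
frac = disk_hit_fraction(1.6, 0.4)
print(frac >= 0.80)  # True
```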
Slide 6 – CAF promise fulfilled
- Giorgio Chiarelli runs 100-section jobs and integrates 120 x 7 x 24 x 3% = 600 CPU hours in a few days, using up to more than half of the full CAF at the same time
- Goes through 1 TB of data in a few hours
- All of this with a single few-line script that automatically divides the input among the various job sections
- Made in Italy
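The automatic input division the slide describes can be sketched as a round-robin split over the dataset's file list. This is an illustrative stand-in only: the helper name and the modulo assignment rule are assumptions, not the actual CAF script or the real splitInput semantics.

```python
def split_input(files, tot_sect, caf_section):
    """Assign each input file to exactly one of tot_sect job sections,
    round-robin. Illustrative: the real CAF splitInput module may use
    a different assignment rule."""
    return [f for i, f in enumerate(files) if i % tot_sect == caf_section]

# 7 input files divided among 3 parallel sections
files = [f"file{i}.root" for i in range(7)]
for s in range(3):
    print(s, split_input(files, 3, s))
```

Every file lands in exactly one section, so the sections together cover the whole dataset with no overlap.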
Slide 7 – Monitoring jobs and sections on the Web (Made in Italy)
Slide 8 – Managing users' areas on the CAF, O(100 GB) (Made in Italy)
Slide 9 – The CAF this summer
- CAF Stage 1 saved the day for the summer conferences
- 61 duals (10 INFN, 16 Pitt/CMU); 15 fileservers (4 INFN, 1 MIT)
- CPU usage ~90% since June; users happy
- Made in Italy
Slide 10 – The CAF today
- Wait times are getting longer; users want more
- Ready for Stage 2: new hardware ready this fall, in time for the ski conferences
- Made in Italy
Slide 11 – CAF Stage 2 (Stage 1 x 4)
- FNAL/CD runs a centralized bid about twice a year; the CDF procurement for Stage 2 was this summer, just in time to catch the INFN funds released in June (x3)
- Bids are in; hope to have the hardware up and running in November (CSN1 → users in 6 months)
- Many others will join the CAF in Stage 2:
  - KEK-Japan: 2 fileservers, 38 duals
  - Korea: 0.5 fileserver (+2 later)
  - Spain: 1 fileserver
  - Canada: 1 fileserver
  - US (8 universities): 10 fileservers, 4 duals
  - More to come
Slide 12 – Why the CAF is a success
- The CAF is more than a pile of PCs: an integrated hardware/software design for the farm and its access tools
- Designed for optimized access to data: lots of disk-resident data; a large transparent disk cache in front of the tape robot; tuned disk access (data striping, minimal NFS, ...)
- Designed for the users' convenience: simple GUIs, Kerberos-based authentication, large local user areas
- Professional system management and a closed loop with the vendors
- Several hardware/firmware/software problems solved so far: RAID controllers, defective RAM, file-system and kernel bugs, plus the normal failure rate of disks, power supplies, etc.
- 2 FTE on CAF infrastructure
Slide 13 – Will the CAF's success last?
- The user community is ramping up in these days: 20 → 200, from the pioneers to the masses, with exposure to all kinds of access patterns
- Hardware expansion: up to a factor of 10 over the next 2 years
- Only experience will tell
- The CAF is built with the cheapest hardware: we will have to learn to live with 10-20% of the hardware broken at any given time
Slide 14 – Beyond the CAF
- FERMILAB wants to join the GRID; FNAL will be a Tier1 for CMS-US
- Foreign CDF institutions want to integrate their local farms (Spain, Korea, UK, Germany, Canada, Italy), in many cases to exploit LHC/GRID hardware
- So far no big offer of help for common work, unlike D0; exception: Canada, 224 nodes "now" for CDF MC
- No software tool yet does this integration "transparently"
- Not clear how much this will help CDF analysis
Slide 15 – Decentralizing analysis computing
- FNAL-CD is working hard to promote SAM for remote work:
  - SAM: a metadata catalog plus distributed disk caches
  - Run the analysis locally; copy data as needed (only the 1st time)
  - Works in Trieste (as in other places)
  - SAM is to become "the" CDF data-access tool; SAM integration with the (EuroData)GRID is being tried
- CDF is working on "packaging the CAF for export": decentralized CAFs, each handling data independently; cloning the FNAL CAF is the easiest way (Korea's choice)
- Remote farms = extra costs for FNAL
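The "copy data as needed (only the 1st time)" behaviour is plain copy-on-first-use caching. A minimal sketch of the idea, with names that are purely illustrative (this is not SAM's real interface):

```python
# Copy-on-first-use cache, illustrating the SAM behaviour described
# above (illustrative only; not SAM's actual API).
local_cache = {}

def fetch(file_id, transfer):
    """Return the local path of a file, transferring it from the
    remote repository only on the first request."""
    if file_id not in local_cache:
        local_cache[file_id] = transfer(file_id)
    return local_cache[file_id]

transfers = []
def transfer_from_fnal(fid):
    transfers.append(fid)               # pretend to copy the file over the WAN
    return "/local/sam-cache/" + fid

fetch("run1234.root", transfer_from_fnal)
fetch("run1234.root", transfer_from_fnal)   # served from the local cache
print(len(transfers))  # 1
```

The second request costs nothing over the network, which is the whole point for a remote site far from the FNAL tapes.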
Slide 16 – CDF computing outside the US (approx.)

                2002            2003
                TB     duals    TB      duals   Notes
Spain           -      -        10      50      Shared with CMS, plan for EDG tools; no plan for shared access
Germany         3+1    20+10    20+20   50+40   Tier1 (shared with LHC) + Tier3 (CDF); no plan for shared access; testing SAM on the Tier3
UK (4 sites)    24     16       80      64      Maybe 5x the CPU if 8-way duals; no EDG; Kerberos for user access, SAM for data; maybe open
Korea           1      20       7       40      Want to clone the CAF by end of 2002; Kerberos for user access, open to all; start w/o SAM
Canada          1      8        28      224     No GRID tools; run official CDF MC and copy it to FNAL
Italy           1      5        7       29      No plan for shared access; exploring SAM on a single node
Slide 17 – MOU/MOF
- Moving toward a way to recognize foreign contributions
- The IFC and the Scrutiny Group are to work on this; INFN is present in both
- Issues being discussed:
  - Computing will have to enter the MOF somehow
  - Allow and encourage contributions
  - Take into account the history and the present situation
- No indication of a "crisis" that has to be dumped on the collaborators for help
Slide 18 – 2003 requests in detail: 5 items
- Stick to the June plan:
  1) Invest the majority of the resources in the FNAL CAF
  2) Modest growth in Italy for interactive work
- Summer experience: needs do not scale down with luminosity; no reason to expect large variations from the June numbers
- The requested resources are well within the June forecast; nevertheless, a prudent, incremental approach (referees)
- New in 2003:
  3) Start MC
  4) Interactive work at FNAL
  5) Start the transition to CNAF
Slide 19 – The Tevatron keeps us busy
- By next summer, tune the analysis to the same level as Run 1: alignments, precision tracking, secondary vertices, b-tagging; jet energy corrections, underlying event
- Do interesting physics in the meanwhile
- Example: the all-Italian D → hh
  - By end of year (100 pb^-1): 10^6 events in the mass peak, 10^7 in the histogram
  - 4 TB of data by spring, 16 TB by end 2003
  - This channel alone saturates the disk financed so far (15 TB)
  - A learning field for B → hh
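The saturation claim follows directly from the numbers on the slide; a trivial check of the disk budget against the sample growth:

```python
# Numbers from the slide: disk financed so far vs. D -> hh sample growth
financed_tb = 15
sample_tb = {"spring 2003": 4, "end 2003": 16}

for when, tb in sample_tb.items():
    print(when, "saturates the financed disk:", tb > financed_tb)
```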
Slide 20 – Monte Carlo
- CDF has talked about central production, but there is no overall estimate of the needs yet
- Next year the safe bet is everybody on his/her own, just as in Run 1
- The Italian groups are starting on this now
- Plan for a capacity of 10^7 events/month; modest hardware need: 10 dual-CPU nodes
- Adequate for most analyses (10x a given dataset)
- Future growth should be small; further requests only on the basis of clear "cases"
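As a sanity check on the "10 dual-CPU nodes for 10^7 events/month" sizing, one can compute the CPU-time budget this allows per generated event. The days-per-month value and the 100% utilization are assumptions of the sketch, not numbers from the slide:

```python
def cpu_seconds_per_event(nodes, cpus_per_node, events_per_month,
                          days_per_month=30):
    """CPU-time budget per MC event to hit the monthly target,
    assuming full utilization (an optimistic simplification)."""
    total_cpu_seconds = nodes * cpus_per_node * days_per_month * 86400
    return total_cpu_seconds / events_per_month

# 10 dual-CPU nodes, 1e7 events/month -> about 5.2 CPU seconds per event
print(round(cpu_seconds_per_event(10, 2, 1e7), 1))  # 5.2
```

So the plan is consistent as long as a full simulated event costs of order 5 CPU seconds on these nodes.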
Slide 21 – Interactive work at FNAL
- When at FNAL one cannot run root on the machines in Italy
- Need "some" better-than-desktop PCs (cf. June's talk)
- The referees asked for central management: a total cap of 10 "power PCs" was defined; asking for 5 in 2003
- 4 full-time physicists doing analysis at FNAL: P. Azzi, R. Carosi, S. Giagu, M. Rescigno
- Explore a central alternative in 2003: an interactive login pool in the CAF; some ideas so far, will try and see
Slide 22 – Moving the CAF to CNAF

PROs:
- Spend the money in Italy
- Join the INFN effort in building a world-class computing center
- Easier access to 3rd data and/or interactive resources (GARR vs. WAN)
- Tap the GRID/LHC hardware pool for peak needs
- Import here the tools and experience learnt on the CAF

CONs:
- Not an "experiment need": the FNAL CAF may be enough
- Costs more
- Poor access to the main data repository (the FNAL tapes)
- Need to replicate the ease of use and operation of the FNAL CAF
- Different hardware = different problems
- Have to divert time and effort from data analysis
Slide 23 – Moving the CAF to CNAF: the proposal
- Start with limited but significant hardware at CNAF in 2003: half of the private share of the CAF in 2002
- 7 TB of disk and 29 dual processors, estimated on the basis of the expected data needs for top → 6j and Z → bbar
- Explore the effectiveness of the work environment; don't give up on CAF features; look for added value
- Will need help (manpower)
- Will try and see: a decision to leave FNAL will have to be based on proof of the existence of a valid alternative here
Slide 24 – Summary of requests (the June 24 "plan", after CSN1's June decision)
- Analysis at FNAL:
  - FNAL CAF: 22 TB disk + 63 dual nodes = 132 + 173 = 306 KEu
  - Monte Carlo: 10 dual nodes = 28 KEu (FNAL price)
- CNAF: 7 TB disk + 29 dual nodes = 70 + 96 = 166 KEu
- Interactive at FNAL: 5 "power PCs" = 22.5 KEu
- Interactive in Italy: disk and CPU for Pd/Pi/Rm/Ts/... = 50 KEu total
Slide 25 – SPARE: spare slides from here on
Slide 26 – Working on the CDF CAF is easy
1. Pick a dataset by name
2. Decide how many parallel execution threads (sections)
3. Prepare 1 executable, 1 tcl and 1 script file
- Submit from anywhere via a simple GUI; query the CAF status at any time via the web monitor; retrieve logs/data anywhere via a simple GUI
- Two-step submission of 100 sections:
  1) In the script (csh):
       setenv TOT_SECT 100
       @ section = $1 - 1
       setenv CAF_SECTION $section
  2) In the tcl file (only one tcl file):
       module talk DHInput
         include dataset bhmu03
         setInput cache=DCACHE
         splitInput slots=$env(TOT_SECT) this=$env(CAF_SECTION)
Slide 27 – Working on the CAF is effective
- Quickly go through any CDF dataset (on disk or tape)
- Create a personalized output and store it locally
- Run on that output (data file or root ntuple): locally on the CAF nodes, or remotely via rootd (e.g. Root from the desktop)
Slide 28 – The CAF is convenient: you can work from anywhere
- All the code and tools needed for CDF offline are available via anonymous ftp, or simply from /afs/infn.it
- Everything runs on plain RedHat 6.x, 7.x, even on the GRID testbed; no need for a customized system install
- You need a Kerberos ticket to talk to FNAL, but:
  - One-click install of the Kerberos client from the web; no need for a system manager
  - Just type "kinit" and your Fermilab password
- Many people work from their laptops!
Slide 29 – CAF future
Slide 30 – Little data? No way!
- The DAQ runs at full speed; typical luminosity better than Run 1
- The 2-track trigger from SVT is full of charm
- We are refocusing attention on samples that in the default scenario would have been limited in statistics:
  - Low-pT jets (20 GeV) and leptons (8 GeV)
  - Charm: interesting for physics (improve on the PDG in the charm sector); fundamental control samples; particle ID on D → hh as a learning field for B → hh
  - Heavy-flavor content in jets; b-jet tagging; jet resolution; ...