Offline/MC status report
M. Moulson
6th KLOE Physics Workshop, Sabaudia, 10-13 May 2006


Offline/MC tasks for 2006
- Define definitive KLOE data set:
  - Close holes in data reconstruction and DST coverage
  - DST file size problem
  - Make sure all reconstructed runs are complete
- Reprocessing of various data sets
  - Most critical is 2004 data with bad wire maps
- Data quality
  - Completeness of HepDB entries for MC production
  - Fine survey for analysis purposes
- MC production for 2004-2005
- Re-reconstruction of 2001/2002 MC sample?
  - Requested by K± group: problem of bad wire maps

Reconstruction of 2004-2005 data
Integrated luminosity ∫L dt (pb⁻¹) by year:
  2001: 169
  2002: 292
  2004: 691
  2005: 1242
  Total: 2394
- Runs 28700 (9 May 04) to 41902 (5 Dec 05)
- Updated to reflect DB work, 21 Mar 06

DST size problem
- 32-bit I/O pointers in Fortran: maximum file size = 2 GB
  - There may be a workaround (esp. for reading w/ KID), but it is not easy!
- What does the 2 GB limit mean? Big headache!
  Stream (DST/MCDST)   Max size (nb⁻¹)
                       DST      MCDST
  dkc/mkc              210      200
  dk0/mk0              290
  d3p/m3p              2700     780
  drn/mrn              6800     14600
  drc/mrc              730      2800
- Max sizes are difficult to calculate:
  - Fluctuating background components: 10% for most streams, 33% for dkc/mkc
  - MC numbers assume the full cross section for the stream; size refers to scaled luminosity
- Need to split DSTs into pieces: 100 nb⁻¹ chunks
  - Must modify scripts
  - Working on a DST standalone script to plug holes
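A minimal Python sketch of the 100 nb⁻¹ splitting, assuming that file size scales linearly with luminosity, so that each per-file maximum in the table corresponds to 2 GB, and treating the quoted background fluctuation (10%, or 33% for dkc/mkc) as a safety margin. The function names are hypothetical and are not part of the KLOE DST scripts. dk0/mk0 is left out because that row is incomplete.

```python
import math

# Maximum integrated luminosity (nb^-1) per 2 GB file, from the slide's table.
# dk0/mk0 is omitted: only one value survives for that row.
MAX_NB_PER_FILE = {
    "dkc": 210, "d3p": 2700, "drn": 6800, "drc": 730,   # data DSTs
    "mkc": 200, "m3p": 780, "mrn": 14600, "mrc": 2800,  # MC DSTs
}

# Assumption: the quoted background fluctuation is used as a safety margin.
FLUCTUATION = {"dkc": 0.33, "mkc": 0.33}
DEFAULT_FLUCTUATION = 0.10

FILE_LIMIT_BYTES = 2**31  # 32-bit signed I/O pointers -> 2 GB maximum file size
CHUNK_NB = 100.0          # proposed DST chunk size (nb^-1)


def estimated_chunk_bytes(stream: str, chunk_nb: float = CHUNK_NB) -> float:
    """Worst-case size of one chunk, assuming file size scales linearly with luminosity."""
    margin = FLUCTUATION.get(stream, DEFAULT_FLUCTUATION)
    bytes_per_nb = FILE_LIMIT_BYTES / MAX_NB_PER_FILE[stream]
    return chunk_nb * (1.0 + margin) * bytes_per_nb


def chunk_fits(stream: str, chunk_nb: float = CHUNK_NB) -> bool:
    """True if a chunk stays below the 2 GB limit even with the fluctuation margin."""
    return estimated_chunk_bytes(stream, chunk_nb) <= FILE_LIMIT_BYTES


def split_run(run_lumi_nb: float, chunk_nb: float = CHUNK_NB) -> list[tuple[float, float]]:
    """Cut a run's luminosity into consecutive (start, end) intervals of at most chunk_nb."""
    n_chunks = max(1, math.ceil(run_lumi_nb / chunk_nb))
    edges = [min(i * chunk_nb, run_lumi_nb) for i in range(n_chunks + 1)]
    return list(zip(edges[:-1], edges[1:]))


if __name__ == "__main__":
    for stream in MAX_NB_PER_FILE:
        print(stream, chunk_fits(stream))
    print(split_run(250.0))
```

Under these assumptions every listed stream has headroom with 100 nb⁻¹ chunks. dkc/mkc is the tightest case: about 133 nb⁻¹ worst case against a limit of 200-210 nb⁻¹.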

Reconstructed KLOE data
trgmon luminosity in pb⁻¹ by year and DBV:
- DBV 12-15: 2001: 163; 2002: 34. Bad cut in FILFO; no bias sample for FILFO; no bias sample for rad
- DBV 19-21: 2004: 645 (bad wires!); 2005: 217. Various significant ECL mods; minor changes to recon; no bias sample for rad
- DBV 22: 2002: 222; 2005: 395. Stable ECL; old bias sample for rad
- DBV 23-24: 2002: 12; 2005: 502; scan: 61. Stable ECL for √s = m_φ; ufo as bias sample
- DBV 25: 16, 205. Stable ECL for off-peak; ufo as bias sample
- Analyzed: 2001: 163; 2002: 269; 2004: 645; 2005: 1114; scan: 266
- Total: 2001: 169; 2002: 292; 2004: 691; 2005: 1242; scan: 284
645 pb⁻¹ to be reprocessed for sure; 278 pb⁻¹ (617 pb⁻¹) as time permits.

Notes on reprocessing
- Reprocessing is most critical for 2004 data with bad wire maps
  - Reprocess 2004 data first
  - Other reprocessing priorities can be decided afterwards
- When data are reprocessed, old reconstructed files will be deleted:
  - If data have already been used for analysis (e.g. 2001-2002 data): files deleted, database record and old DSTs kept
  - If data have not been used for analysis (e.g. 2004-2005 data): database record and DSTs deleted, as if the run had never been reconstructed in the first place
  - Basically the same treatment is planned for incomplete runs in the data set
- Off-peak data reconstructed with DBV-24 need reprocessing for ππγ use
  - 61 pb⁻¹: all scan data, small amount of √s = 1000 MeV data
  - Keep current files until ready to start reprocessing
- Plan: start 2004 reprocessing and 2005 MC production in parallel
  - If there are problems with DH-induced latency, hold reprocessing until additional disk space for the I/O cache arrives (expected in June)

MC development items (responsible)
- New IR geometry (CB)
- Revision of constants in GEANFI (CB)
- Generators: η → ππee, e⁺e⁻ → ωπ⁰, K_S → ππee (CG, RV, ADS)
- Nuclear interactions/regeneration (CB)
- Adjustments to EmC energy response (PG)
- Adjustments to EmC time resolution (CG)
- Simulation of EmC cluster efficiency (MP, TS)
- SQZ (compression) fix (MM, CB)
- Fix CELE/CSPS/cluster banks/structures (MM, MP)
- Data-quality parameters (√s, etc.) (GV)
- Trigger params.: quality, DC thresh, DISH maps (MP, BS)
- LSB background insertion (MM, MP)
- Correlated noise for charged kaons (EDL, PDS)
- dE/dx simulation (VP)
- Check ECL code: e⁺e⁻ → ωπ⁰ (SG)
- Split large (> 2 GB) DSTs (MCDSTs) (MM)

MC sign-off: Data quality & DB issues
- Data-quality parameters & DB loading
  - Not to be confused with the fine survey for analysis (S. Fiore)
  - 2004 run parameters (√s, p_φ, etc.) already loaded into HepDB
  - 2005 run parameters obtained, need to be loaded
  - Trigger parameters in DB2 for 2004 & 2005
  - Cluster efficiency, time resolution parameters updated
  - Dead/hot wire maps, DC efficiencies: need to unify tables
- HepDB-to-DB2 migration?
  - F. Sborzacchi has developed:
    - DB2 tables to contain data currently in HepDB: all detector calibration data, much run-condition information (√s, p_φ, etc.)
    - Code to fill and maintain the new tables
    - Interface code (drop-in replacements for HepDB calls)
  - Ready to go (but wait until MC has started?)

MC sign-off: LSB background
- Work on LSB background to deal with:
  - Inconsistent cluster/KINE matching
  - Timescale alignment for MC and data
  - Treatment of "noise" hits (missing tA or tB)
- Diagnose performance: t0_rec - t0_MC distribution
[Figures: (1) K_S → π⁺π⁻, K_L crash events, t0_true from π⁺π⁻: t0_rec - t0_true (ns), linear scale, data vs. MC; (2) K_S → π⁰π⁰ from MC, t0_rec from T0_FIND: t0_rec - t0_MC (ns), for t0 from MC track, t0 from LSB cluster, and mixed t0]

MC sign-off: LSB background
Accidental rate, 7 MeV in window (Presenter / reconstructed LSB files):
  E: 3.6% / 3.1%
  B: 1.8% / 1.7%
  W: 2.8% / 2.7%
- LSB files look OK!
- Overall probability for an LSB cluster to set t0 = 0.82(4)%
  - Roughly half will steal t0 in a φ event
- Fraction of events with stolen t0 (Data / MC OR / MC AND):
  - From tail in K_S → π⁺π⁻, K_L crash events: 0.5% / 0.7% / 1.1%
  - From MC truth (t0 cluster acc. or mixed): n/a / 0.9% / 1.4%
- Harder filtering of noise hits increases the stolen-t0 probability
- Are we dropping LSB events with no clusters?
- Before starting production, must check DC timescale alignment!
  - Compare to data and reconstructed MC (all_phys test runs)

MC sign-off: K± background
- New A/C module (SIMKBCK) adds background hits correlated to K± tracks
  - Samples the distribution of K±-correlated background hits in layer, distance (in wires) from the reconstructed track, and time
- Private reconstruction of ~20 pb⁻¹ of 2002 MC events to study how the new background parameterization affects MC tracking efficiency
- Note: large correction (×5) for the absolute probability of background hits:
  - Background measured using reconstructed tracks
  - Doesn't account for tracks not reconstructed because of background
[Figure: ε_trk+vtx as a function of t*(K±) (ns), ε(MC) vs. ε(data), before and after]

MC sign-off: dE/dx simulation
- Calibration of A/C module (DIGIDCADC) to simulate the dE/dx response of the DC
  - ΔE distribution rescaled and smeared to match data
  - Different s-t relations for data/MC → effective integration gates different
- General note: difficulties in calibrating for space-charge effects undermine resolution
  - dE/dx resolution adequate for K± ID, but not e.g. for K_e2/K_μ2 separation
[Figures: dE/dx distribution and truncated-mean distribution, dE/dx (count/cm), for K_π2 and K_μ2 samples, data vs. MC]

Monte Carlo tests in 2006 (all tests based on all_phys card, LSF = 0.2)
- 20/2-13/3: 105 pb⁻¹, runs 39000-39600
  - Test new cluster efficiency parameters
  - Test new EmC time resolution parameters
  - Additional EmC studies on energy scale calibration
  - Test LSB cluster background
  - Complications from bugs in MC truth variables for EmC Fortran structures
  - DH problems complicate CPU/runtime analysis
- 3/5-4/5: 19 pb⁻¹, runs 39601-39750
  - Reconstructed with DBV-26
  - Correct EmC structures
  - Test SIMKBCK module
- 4/5-5/5: 29 pb⁻¹, runs 39751-40000
  - Fix EmC timescale for noise hits
- 7/5-8/5: 42 pb⁻¹, runs 40001-40250
  - Attempt to optimize EmC background

Monte Carlo production plans
2001-2002 MC production:
- φ → all, scale = 0.2
- K_S K_L, scale = 1
- K⁺K⁻, scale = 1
- φ radiative, scale = 5
- Other (1M evts/pb⁻¹)
- Total: 3.1M evts/pb⁻¹ (about the same as the number of φ decays in data)
Estimated time:
  2001-2002: 450 pb⁻¹, 1.85G evts, 8800 B80 days
  2004-2005: 2000 pb⁻¹, 8.25G evts, 39000 B80 days
- Averaged over the entire MC sample: 0.21M evts/B80 day = 2.4 Hz, i.e. 0.41 s/evt (simulation + reconstruction + DST)
2004-2005 MC:
- φ → all, K_S K_L, K⁺K⁻ campaigns to be combined: all_phys, LSF = 1.2
- Start with 2005 data: reprocessing not necessary for comparison
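The throughput figures above can be cross-checked with a few lines of Python. This is only arithmetic on the quoted average of 0.21M events per B80 day, not the production bookkeeping; the helper name `b80_days` is hypothetical.

```python
SECONDS_PER_DAY = 86_400
EVENTS_PER_B80_DAY = 0.21e6  # average throughput quoted on the slide


def b80_days(n_events: float) -> float:
    """CPU time in B80 days needed to produce n_events at the quoted throughput."""
    return n_events / EVENTS_PER_B80_DAY


rate_hz = EVENTS_PER_B80_DAY / SECONDS_PER_DAY  # events per second on one B80 CPU
seconds_per_event = 1.0 / rate_hz               # simulation + reconstruction + DST

samples = {"2001-2002": 1.85e9, "2004-2005": 8.25e9}

if __name__ == "__main__":
    print(f"{rate_hz:.2f} Hz, {seconds_per_event:.2f} s/evt")
    for period, n_events in samples.items():
        print(f"{period}: {b80_days(n_events):,.0f} B80 days")
```

The unrounded results (about 8810 and 39290 B80 days) are consistent with the 8800 and 39000 quoted on the slide.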

Offline resources: CPU
  Nodes             Type              CPUs   ms/ev MC   ms/ev Rec   B80 "MC"   B80 "CW"
  fibm14-15,17-34   P3 375 MHz         80      211        192          80         80
  fibm35-44         P4+ 1.45 GHz       40       98         85          88         96
  fibm45-47         P5 1.5 GHz SMT    "96"     105        202         126        184
  All                                                                 294        360
- 1 "B80 CPU" = 1 P3 CPU installed in a B80 server (KLOE standard unit)
- Accuracy of estimates depends on B80/P4+ and B80/P5 ratios
- "MC" results based on MC tests (all_phys, '05 data)
  - 3-8 May 06: 87 CPU (132 B80 "MC") configuration, no competing offline work
  - No serious DH problems observed
- Conventional wisdom ("CW") based on past experience:
  - 1 P4+ = 2.4 B80 (compare 2.2 above)
  - 1 P5 = 0.8 P4+ = 1.9 B80 (compare 1.3 above)
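One way to reproduce the B80 "MC" column is to scale each node type's CPU count by the ratio of P3 time to node time for simulation plus reconstruction per event. This definition is an inference, not something stated on the slide, but it gives 2.2 B80 per P4+ and 1.3 B80 per P5 CPU, matching the "compare" figures. The "CW" column instead uses the conventional-wisdom ratios. The class and function names are illustrative.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    cpus: int        # for the P5 nodes these are SMT hardware threads ("96")
    ms_mc: float     # simulation time per event (ms)
    ms_rec: float    # reconstruction time per event (ms)
    cw_ratio: float  # "conventional wisdom" B80 units per CPU


# Reference unit: one P3 CPU (B80), simulation + reconstruction per event.
B80_MS_PER_EVENT = 211 + 192

GROUPS = [
    NodeGroup("fibm14-15,17-34 (P3)", 80, 211, 192, 1.0),
    NodeGroup("fibm35-44 (P4+)", 40, 98, 85, 2.4),
    NodeGroup("fibm45-47 (P5, SMT)", 96, 105, 202, 0.8 * 2.4),
]


def b80_mc(group: NodeGroup) -> float:
    """B80 equivalent from measured per-event times (assumed definition)."""
    return group.cpus * B80_MS_PER_EVENT / (group.ms_mc + group.ms_rec)


def b80_cw(group: NodeGroup) -> float:
    """B80 equivalent from the conventional-wisdom per-CPU ratios."""
    return group.cpus * group.cw_ratio


if __name__ == "__main__":
    for g in GROUPS:
        print(f"{g.name:24s} MC {b80_mc(g):6.1f}  CW {b80_cw(g):6.1f}")
    print("All:", round(sum(map(b80_mc, GROUPS))), round(sum(map(b80_cw, GROUPS))))
```

The ~403 ms per event on a P3 also matches the 0.41 s/evt averaged over the MC sample.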

Offline CPU needs
  2006 offline project                                B80 days   Days/214 B80
  Reprocessing of 2004 data                             16000         75
  MC production for 2004-2005 data                      39000        180
  All reprocessing for '02-'04-'05 data (900 pb⁻¹)      22000        100
  Minimum total                                         55000        260
  Maximum total                                         77000        360
- 9-12 months if there are no serious DH problems
- Assuming:
  - 294 B80 total (MC test, not CW)
  - 80 B80 left to users for analysis
  - 214 B80 for offline
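The days column follows from dividing B80 days by the 214 B80 left for offline. In the sketch below, the minimum total is taken as 2004 reprocessing plus MC production and the maximum as the sum of all three projects, which reproduces the 55000 and 77000 quoted. The names are illustrative.

```python
B80_TOTAL = 294      # from MC tests, not conventional wisdom
B80_FOR_USERS = 80   # left to users for analysis
B80_FOR_OFFLINE = B80_TOTAL - B80_FOR_USERS
DAYS_PER_MONTH = 365 / 12

PROJECTS_B80_DAYS = {
    "Reprocessing of 2004 data": 16_000,
    "MC production for 2004-2005 data": 39_000,
    "All reprocessing for '02-'04-'05 data": 22_000,
}


def calendar_days(b80_days: float) -> float:
    """Wall-clock days if all offline B80 units work on the task."""
    return b80_days / B80_FOR_OFFLINE


minimum = (PROJECTS_B80_DAYS["Reprocessing of 2004 data"]
           + PROJECTS_B80_DAYS["MC production for 2004-2005 data"])
maximum = sum(PROJECTS_B80_DAYS.values())

if __name__ == "__main__":
    for name, days in PROJECTS_B80_DAYS.items():
        print(f"{name}: {calendar_days(days):.0f} days")
    for label, total in (("Minimum", minimum), ("Maximum", maximum)):
        days = calendar_days(total)
        print(f"{label} total: {total} B80 days = {days:.0f} days = {days / DAYS_PER_MONTH:.1f} months")
```

The minimum and maximum correspond to roughly 8.5 and 11.8 months, which is the basis for the 9-12 month estimate.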

Offline resources: Disk space
- DSTs cached on NFS-mounted disks for fast analysis access
- DST volume*:
  Current: 35 TB data, 10 TB MC, 45 TB total
  Final: 42 TB data, 44 TB MC, 86 TB total
- Current DST cache capacity: 12 TB
- New purchases: 21 TB FC disk + new controller
  - Tender (gara) closed yesterday
  - Delivery by end of June?
  - ~30 TB available for disk cache
*Updated to account for scan data
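A quick sketch comparing the final DST volume with the expected cache size. The ~30 TB usable figure is taken from the slide rather than from the 33 TB raw sum of current capacity and new purchases, and the variable names are illustrative.

```python
dst_volume_final_tb = {"data": 42, "MC": 44}
cache_now_tb = 12
cache_purchase_tb = 21
cache_usable_tb = 30  # "~30 TB available" as quoted; below the 33 TB raw sum

final_total_tb = sum(dst_volume_final_tb.values())
fraction_cached = cache_usable_tb / final_total_tb

if __name__ == "__main__":
    print(f"raw cache after purchase: {cache_now_tb + cache_purchase_tb} TB")
    print(f"final DST volume: {final_total_tb} TB, fraction cacheable: {fraction_cached:.0%}")
```

Only about a third of the final DST volume would fit in the cache, which is why expanding it is likely to come up again.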

Offline resources: Tape space
  Type     Current (TB)   Temporary allocation (TB)   Final allocation (TB)
  raw      248.7          250                         250
  rec      181.2          250                         250
  DST       34.7           45                          45
  MC        61.2          150                         350
  MC DST    10.0           25                          60
  Total    535.8          720                         955
1. Allocations include currently occupied space
2. MC DSTs probably appear as datarec files to the archiver
3. Current library system capacity ~720 TB; new cassettes will have to be ordered in future
4. Temporary allocation based on 720 TB library; assumes MC production is slow
5. Final allocation assumes completion of the KLOE offline program
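The allocation totals and the gap between the final allocation and the current library capacity can be checked as follows. This assumes that raw, rec and DST have equal temporary and final allocations; that reading of the flattened table is the only one under which the 720 TB and 955 TB totals add up.

```python
LIBRARY_CAPACITY_TB = 720

# (current, temporary allocation, final allocation) in TB
TAPE_TB = {
    "raw": (248.7, 250, 250),
    "rec": (181.2, 250, 250),
    "DST": (34.7, 45, 45),
    "MC": (61.2, 150, 350),
    "MC DST": (10.0, 25, 60),
}

# Column sums, rounded to remove floating-point noise.
current, temporary, final = (round(sum(col), 1) for col in zip(*TAPE_TB.values()))
shortfall_tb = final - LIBRARY_CAPACITY_TB  # tape to be added via new cassettes

if __name__ == "__main__":
    print(f"current {current} TB, temporary {temporary} TB, final {final} TB")
    print(f"final allocation exceeds library capacity by {shortfall_tb} TB")
```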

Outlook and summary
- Starting MC production is a very high priority, but we need to make a few last checks
- We are planning for a near-simultaneous start of:
  - MC production, all_phys, LSF = 1.2, 2005 data
  - Reprocessing of 2004 data with bad wire maps
- We have the CPU power needed to generate a definitive MC sample and reprocess as necessary on a time scale compatible with the beginning of 2007
- We will probably want to expand the DST disk cache
- We will need to order new cassettes towards the end of the year

