CDF Offline Operations Status

Reran the zee validation sample for 5.1.1. No discrepancies found, as expected.
Checked farm crashes. Reproduced 3 crashes:

CdfTrack.cc (in prewrite).
    if (_siHits.size() > 0) {
      CdfTrackHits* storedSvxHits;
      storedSvxHits = new CdfTrackHits;
      for (SiHitIterator ihit = beginSIHits(); ihit != endSIHits(); ++ihit) {
        int packed = ((*ihit)->id() & 0x1FFFFFFF) | (((*ihit)->getAmbIndex() & 0x7 ) << 29);   <-- crash (ihit != 0x0)
        storedSvxHits->accumulate(packed);
      }
Matt and Chris
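The packing in the CdfTrack.cc excerpt (29 bits of hit id, 3 bits of ambiguity index) can be sketched in isolation. The slide does not state the root cause of the crash, so the null check below is only one defensive option, and it would not catch a dangling pointer. SiHit here is a hypothetical stand-in, and packHit, unpackId, unpackAmbIndex and packHitChecked are illustrative names that are not part of the CDF code.

```cpp
#include <cstdint>

// Hypothetical stand-in for the silicon hit class; only the two
// accessors used in the slide's excerpt are modelled.
struct SiHit {
  std::uint32_t hitId;
  std::uint32_t ambIndex;
  std::uint32_t id() const { return hitId; }
  std::uint32_t getAmbIndex() const { return ambIndex; }
};

// Pack a 29-bit id and a 3-bit ambiguity index into one 32-bit word.
// Unsigned arithmetic keeps the shift into bit 31 well defined.
inline std::uint32_t packHit(std::uint32_t id, std::uint32_t ambIndex) {
  return (id & 0x1FFFFFFFu) | ((ambIndex & 0x7u) << 29);
}

// Recover the two fields from a packed word.
inline std::uint32_t unpackId(std::uint32_t packed) {
  return packed & 0x1FFFFFFFu;
}
inline std::uint32_t unpackAmbIndex(std::uint32_t packed) {
  return packed >> 29;
}

// Guarded variant (an assumption, not the actual fix): returns false
// instead of dereferencing a null hit pointer.
inline bool packHitChecked(const SiHit* hit, std::uint32_t& packed) {
  if (hit == nullptr) return false;
  packed = packHit(hit->id(), hit->getAmbIndex());
  return true;
}
```

Using an unsigned 32-bit type instead of int avoids shifting a set bit into the sign position when the ambiguity index is 4 or larger.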
Crashes

Mark Fischler (needs help with debugging)
The location in ELextendedID is
    basic_string& operator=(const basic_string& str);
    basic_string& operator=(const charT* s) {return assign( s,traits::length(s) );}   <-- crash
    basic_string& operator=(charT c) {return assign( size_type(1), c );}
and in ErrorObj::clear() is
    mySerial = 0;
    myXid.clear();   <-- crash
    myIdOverflow = "";
Mark Fischler (needs help with debugging)

0x8fa13bd in SiStripCorrectorManager::correctStripSet (this=0xcd5b338,stripSet=0xe392094)
    at /home/cdfsoft/dist/packages/SvxDaqObjects/V00-00-74/src/SiStripCorrectorManager.cc:62
Matt (fixed)
Valgrind

Run valgrind over the other crashes:

Other: (Matt & Jason)
==18449== Conditional jump or move depends on uninitialised value(s)
==18449==    at 0x420A6879: __mktime_internal (in /lib/i686/libc-2.2.5.so)
==18449==    by 0x420A6EBE: timelocal (in /lib/i686/libc-2.2.5.so)
==18449==    by 0x9B0D0C1: DateUtil::time_from_string(char const *) (/home/cdfsoft/dist/packages/DBObjects/V00-00-72/src/TimeStamp.cc:264)
==18449==    by 0x904C794: ChipStatus::__ct(std::basic_string<char,std::char_traits<char>,std::allocator<char>>, int) (/home/cdfsoft/dist/packages/TrackingObjects/V00-01-73/src/ChipStatus.cc:54)
==18449==    by 0x8F94AE5: PedestalUpdator::changed(void) (/home/cdfsoft/dist/packages/SvxDaqObjects/V00-00-74/src/PedestalUpdator.cc:226)

Other: (Matt & Jason)
==18449==    at 0x904EFBB: ChipStatus::putBit(char *, int, int) (/home/cdfsoft/dist/packages/TrackingObjects/V00-01-73/src/ChipStatus.cc:133)
==18449==    by 0x904F372: ChipStatus::sortBitString(int, int, char *) (/home/cdfsoft/dist/packages/TrackingObjects/V00-01-73/src/ChipStatus.cc:252)
==18449==    by 0x904EC15: ChipStatus::makeMap(int) (/home/cdfsoft/dist/packages/TrackingObjects/V00-01-73/src/ChipStatus.cc:212)
==18449==    by 0x904C8CC: ChipStatus::__ct(std::basic_string<char,std::char_traits<char>,std::allocator<char>>, int ) (/home/cdfsoft/dist/packages/TrackingObjects/V00-01-73/src/ChipStatus.cc:67)
==18449==    by 0x8F94AE5: PedestalUpdator::changed(void) (/home/cdfsoft/dist/packages/SvxDaqObjects/V00-00-74/src/PedestalUpdator.cc:226)
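A frequent source of this valgrind message underneath mktime/timelocal is a struct tm with fields left unset, often tm_isdst. Whether that is what happens at TimeStamp.cc:264 is not confirmed by the trace. The sketch below shows the usual remedy under that assumption. parseTimestamp and its "YYYY-MM-DD HH:MM:SS" input format are hypothetical; the format accepted by DateUtil::time_from_string is not shown on the slide.

```cpp
#include <cstdio>
#include <cstring>
#include <ctime>

// Hypothetical parser for "YYYY-MM-DD HH:MM:SS" in local time.
// Returns (time_t)-1 on a null pointer or a malformed string.
inline std::time_t parseTimestamp(const char* text) {
  if (text == nullptr) return static_cast<std::time_t>(-1);

  std::tm t;
  std::memset(&t, 0, sizeof t);  // every field gets a defined value

  int year = 0, month = 0, day = 0, hour = 0, minute = 0, second = 0;
  if (std::sscanf(text, "%d-%d-%d %d:%d:%d",
                  &year, &month, &day, &hour, &minute, &second) != 6) {
    return static_cast<std::time_t>(-1);
  }

  t.tm_year = year - 1900;  // struct tm counts years from 1900
  t.tm_mon = month - 1;     // and months from 0
  t.tm_mday = day;
  t.tm_hour = hour;
  t.tm_min = minute;
  t.tm_sec = second;
  t.tm_isdst = -1;          // let mktime decide whether DST applies

  return std::mktime(&t);
}
```

Zeroing the whole struct before filling it also covers platform-specific extra fields that the code never sets explicitly.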
Valgrind

Still there (1X) (Aseet)
==6977== Conditional jump or move depends on uninitialised value(s)
==6977==    at 0x914484D: PadSqz::Huffman_T::operator<<( (PadSqz::BitStream_T &)) (/home/cdfsoft/dist/packages/PADSObjects/V00-00-23/src/Huffman.cc:368)
==6977==    by 0x9145E4C: PadSqz::PadRawBank::Fluff( (int)) (/home/cdfsoft/dist/packages/PADSObjects/V00-00-23/src/PadRawBank.cc:173)
==6977==    by 0x84CF42C: PadRawModule<PadSqz::COTQ>::event(EventRecord *) (/home/cdfsoft/dist/releases/5.1.1/include/PADSMods/PadRawModule.icc:57)
Nodes

Check crash rate per node:
    Node 171 (take out)
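The per-node check amounts to dividing crashed jobs by total jobs for each node. A minimal sketch follows, assuming one record per finished job; JobRecord and crashRatePerNode are hypothetical names, and the bookkeeping actually used on the farm is not described on the slide.

```cpp
#include <map>
#include <utility>
#include <vector>

// One finished farm job: which node ran it and whether it crashed.
// This record layout is hypothetical.
struct JobRecord {
  int node;
  bool crashed;
};

// Return the crash rate (crashes / jobs) for every node that ran
// at least one job.
inline std::map<int, double> crashRatePerNode(const std::vector<JobRecord>& jobs) {
  std::map<int, std::pair<int, int> > counts;  // node -> (crashes, total)
  for (const JobRecord& job : jobs) {
    std::pair<int, int>& c = counts[job.node];
    if (job.crashed) ++c.first;
    ++c.second;
  }

  std::map<int, double> rates;
  for (const auto& entry : counts) {
    rates[entry.first] =
        static_cast<double>(entry.second.first) / entry.second.second;
  }
  return rates;
}
```

A node whose rate stands well above the others is a candidate for removal from the farm, as was done for node 171.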
Memory usage
Memory usage per Run Large memory usage
Memory increase
Daily checking

New cron job checks the log files for severe errors. Found yesterday:

%ERLOG-s : *Fluffed bank(s) != original(s) PadRawBanks
%ERLOG-s CalDataMaker: /home/cdfsoft/dist/packages/Calor/V00-01-52/src/CalDataMaker.cc : 754 unpack HATD bank : more than 8 hits in PHA
    GlobalLibraryLogger vxfit0() 28-Oct-2003 10:26:23 CST run = 163956 event = 262325
/home/cdfsoft/dist/packages/Calor/V00-01-52/src/CalDataMaker.cc: 745 unpack HATD bank : more than 8 hits in WHA
    GlobalLibraryLogger chi2wrtVertex() 28-Oct-2003 10:07:22 CST run = 163955 event =191711
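The scan such a daily job performs can be sketched as a filter over log lines. This assumes that the "%ERLOG-s" prefix seen in the messages above is what marks a severe entry. The cron job itself is not shown on the slide, and findSevereLines and findSevereLinesInText are hypothetical names.

```cpp
#include <istream>
#include <sstream>
#include <string>
#include <vector>

// Collect every line of a log stream that contains the severity marker.
// The default marker is an assumption taken from the messages on the slide.
inline std::vector<std::string> findSevereLines(
    std::istream& log, const std::string& marker = "%ERLOG-s") {
  std::vector<std::string> hits;
  std::string line;
  while (std::getline(log, line)) {
    if (line.find(marker) != std::string::npos) hits.push_back(line);
  }
  return hits;
}

// Convenience wrapper for log text that is already in memory.
inline std::vector<std::string> findSevereLinesInText(
    const std::string& text, const std::string& marker = "%ERLOG-s") {
  std::istringstream in(text);
  return findSevereLines(in, marker);
}
```

Because continuation lines (logger name, timestamp, run and event number) do not repeat the marker, a real check would likely also keep a few lines of context after each hit.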
fcdflnx3

Problems with disk space:
    Take more scratch space
    Get a new disk