John Marshall, University of Cambridge
LCD WG6 Meeting, April
Overview
Reconstruction of events with overlaid background is a challenge for our reconstruction software. Even intrinsically efficient functions cannot help but be affected by the huge increase in the number of combinations of tracks and calorimeter hits. CPU time is clearly important, so efforts have been made specifically to address the problems caused by overlaid γγ→hadrons background, with the default NumberBackground=3.2.
André has been focusing on improving the performance of processors in the MarlinReco library, whilst I have examined PandoraPFANew. There are some differences between the CPU times we report, but we are now satisfied that these are due to machine specifications rather than build configurations or similar.
André has been able to access the Intel VTune Amplifier XE 2011 package, which provides an impressive amount of profiling information: it reports the actual CPU time required by each function and can even provide a line-by-line breakdown of CPU time.
We start with a reminder of the performance at the time of the previous meeting.
Status at last meeting
Division of total CPU time between Marlin processors for ten 91 GeV Z->uds events with overlaid γγ→hadrons background and NumberBackground=3.2:

Processor name          Seconds per event (10-event sample), 01/04/2011*
MarlinPandora
FullLDCTracking
LCIOOutputProcessor     7.964
LEPTrackingProcessor    7.039
SiliconTrackingCLIC     3.579
TPCDigiProcessor        2.107
KinkFinder              0.325
V0Finder                0.244
RecoMCTruthLinker       0.197
ILDCaloDigi             0.097
Total

Without background, the total CPU time is just 0.565 s per event.
* MarlinReco revision 2151, PandoraPFANew revision 1100
Costly functions

Function                                                CPU time, 01/04/2011
ClusterHelper::GetTrackClusterDistance                  42.839 s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070 s
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802 s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070 s
CartesianVector::GetZ                                   16.702 s
ClusterHelper::GetDistanceToClosestHit                  15.620 s
ClusterContact::HitDistanceComparison                   11.077 s
Cluster::GetCentroid                                     8.950 s
CaloHitHelper::GetDensityWeightContribution              6.739 s
operator- (CartesianVector)                              6.562 s
ConeClusteringAlgorithm::FindHitsInSameLayer             5.399 s
TrackClusterAssociationAlgorithm::Run                    4.279 s
CaloHitHelper::IsolationCountNearbyHits                  4.139 s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit    3.870 s
ClusterHelper::GetDistanceToClosestCentroid              3.661 s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit    3.330 s
FragmentRemovalHelper::GetClusterContactDetails          2.591 s
ClusterHelper::GetTrackClusterDistance                   2.461 s
CartesianVector::GetUnitVector                           2.362 s

The TestPandora application was used with input Pandora binary files to perform standalone Pandora reconstruction and concentrate purely on PandoraPFANew; MarlinPandora was not considered.
Reduce function calls
The most costly function is GetTrackClusterDistance, used to help identify track-cluster associations. With background, this function is called for many track-cluster combinations. For each combination, it examines the hits in the first n cluster layers to find the closest perpendicular distance between a straight line (defined by the track state at the calorimeter) and a hit in the cluster.
After basic C++ optimization, it is difficult to reduce the CPU time further without changing the function's behaviour. Instead, we try to avoid comparing tracks and clusters with very different “expected directions”. Similar cuts are implemented in the cone-based clustering algorithms and in the SoftClusterMerging, IsolatedHitMerging and FragmentRemoval algorithms.
This is potentially dangerous, but the cut values are configurable, and the default cut, cos(angle) > 0, should be safe. Validation is crucial.
[Figure: track direction and parallel distance region; find the smallest perpendicular distance to a hit within the parallel distance region.]
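The scheme described above can be sketched in C++ as follows. This is a minimal illustration under stated assumptions, not Pandora's implementation: Vec3, Hit, DirectionsCompatible, TrackClusterDistance and GetDistanceIfCompatible are hypothetical names, each hit is assumed to carry a pseudo-layer index, and the "parallel distance region" is assumed to be the interval [0, maxParallelDistance] along the track direction. The direction cut is evaluated once per track-cluster pair, before the loop over hits, which is where the saving comes from.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <vector>

// Minimal 3-vector for this sketch (hypothetical; not Pandora's CartesianVector).
struct Vec3 {
    double x, y, z;
};

inline Vec3 operator-(const Vec3& a, const Vec3& b) { return {a.x - b.x, a.y - b.y, a.z - b.z}; }
inline double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }
inline Vec3 Cross(const Vec3& a, const Vec3& b) {
    return {a.y * b.z - a.z * b.y, a.z * b.x - a.x * b.z, a.x * b.y - a.y * b.x};
}

// Hypothetical calorimeter hit: pseudo-layer index plus position.
struct Hit {
    unsigned int layer;
    Vec3 position;
};

// Cheap pre-selection: true if cos(angle between expected directions) > minCosAngle.
bool DirectionsCompatible(const Vec3& trackDir, const Vec3& clusterDir, double minCosAngle = 0.0) {
    const double mag2Product = Dot(trackDir, trackDir) * Dot(clusterDir, clusterDir);
    assert(mag2Product > 0.0);
    return Dot(trackDir, clusterDir) / std::sqrt(mag2Product) > minCosAngle;
}

// Smallest perpendicular distance between the line (trackPos, trackDir) and any hit in
// layers [firstLayer, firstLayer + nLayers) whose projection along the line lies in
// [0, maxParallelDistance]. Returns +infinity if no hit qualifies.
double TrackClusterDistance(const Vec3& trackPos, const Vec3& trackDir, const std::vector<Hit>& hits,
                            unsigned int firstLayer, unsigned int nLayers, double maxParallelDistance) {
    const double dirMag = std::sqrt(Dot(trackDir, trackDir));
    assert(dirMag > 0.0);
    const Vec3 unit{trackDir.x / dirMag, trackDir.y / dirMag, trackDir.z / dirMag};

    double best = std::numeric_limits<double>::infinity();
    for (const Hit& hit : hits) {
        if (hit.layer < firstLayer || hit.layer >= firstLayer + nLayers) continue;
        const Vec3 d = hit.position - trackPos;
        const double parallel = Dot(d, unit);  // distance along the track line
        if (parallel < 0.0 || parallel > maxParallelDistance) continue;
        const Vec3 perp = Cross(d, unit);      // |d x u| is the perpendicular distance
        best = std::min(best, std::sqrt(Dot(perp, perp)));
    }
    return best;
}

// Direction cut first, so incompatible track-cluster pairs never reach the hit loop.
double GetDistanceIfCompatible(const Vec3& trackPos, const Vec3& trackDir, const Vec3& clusterDir,
                               const std::vector<Hit>& hits, unsigned int firstLayer, unsigned int nLayers,
                               double maxParallelDistance, double minCosAngle = 0.0) {
    if (!DirectionsCompatible(trackDir, clusterDir, minCosAngle))
        return std::numeric_limits<double>::infinity();
    return TrackClusterDistance(trackPos, trackDir, hits, firstLayer, nLayers, maxParallelDistance);
}
```

The cut costs one dot product per pair, compared with a loop over every hit in the first n layers; with minCosAngle = 0, only pairs whose directions point into opposite hemispheres are skipped.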
Change approach
Another costly function is used by the IsolatedHitMerging algorithm, which matches isolated hits to nearby clusters based on the distance to the nearest hit in the cluster. This algorithm is not unimportant, but it is still a rather small part of Pandora reconstruction, so the fact that it is one of the most time-consuming processes justifies a change in approach.
Isolated hits are now matched to clusters based upon the distance to the nearest layer centroid position. This avoids the nested loop over the hits in each layer. The change in behaviour is small, and it is not obvious whether it is better or worse. Again, validation is crucial.
[Figure: "Get distance to nearest layer centroid" (new approach) versus "Get distance to nearest hit" (old approach).]
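The change can be illustrated with the following sketch, which contrasts the nested hit loop with a single loop over per-layer centroids. It is hypothetical: LayerHitMap, ComputeLayerCentroids and the other names are illustrative, centroids are assumed to be simple unweighted means, and they are assumed to be computed once per cluster and reused for every isolated hit.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <limits>
#include <map>
#include <vector>

struct Vec3 {
    double x, y, z;
};

double Distance(const Vec3& a, const Vec3& b) {
    const double dx = a.x - b.x, dy = a.y - b.y, dz = a.z - b.z;
    return std::sqrt(dx * dx + dy * dy + dz * dz);
}

// Hypothetical cluster representation: hit positions grouped by pseudo-layer.
using LayerHitMap = std::map<unsigned int, std::vector<Vec3>>;

// Old behaviour: nested loop over layers and over every hit in each layer.
double DistanceToNearestHit(const Vec3& point, const LayerHitMap& cluster) {
    double best = std::numeric_limits<double>::infinity();
    for (const auto& layer : cluster)
        for (const Vec3& hit : layer.second)
            best = std::min(best, Distance(point, hit));
    return best;
}

// One unweighted centroid per occupied layer, computed once per cluster.
std::vector<Vec3> ComputeLayerCentroids(const LayerHitMap& cluster) {
    std::vector<Vec3> centroids;
    for (const auto& layer : cluster) {
        const std::vector<Vec3>& hits = layer.second;
        assert(!hits.empty());
        Vec3 sum{0.0, 0.0, 0.0};
        for (const Vec3& hit : hits) {
            sum.x += hit.x;
            sum.y += hit.y;
            sum.z += hit.z;
        }
        const double n = static_cast<double>(hits.size());
        centroids.push_back({sum.x / n, sum.y / n, sum.z / n});
    }
    return centroids;
}

// New behaviour: a single loop over layer centroids, with no inner loop over hits.
double DistanceToNearestLayerCentroid(const Vec3& point, const std::vector<Vec3>& centroids) {
    double best = std::numeric_limits<double>::infinity();
    for (const Vec3& centroid : centroids)
        best = std::min(best, Distance(point, centroid));
    return best;
}
```

Per isolated hit, the work drops from the number of hits in the cluster to the number of occupied layers. The two distances generally differ (a hit lying on top of a cluster hit can still be some way from that layer's centroid), which is why the behaviour changes slightly.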
Change approach
The CartesianVector class is crucial to Pandora reconstruction and is used extensively throughout all algorithms, so even small efficiency improvements to this class can help.
Previously, this class offered a default constructor:
inline CartesianVector::CartesianVector() : m_x(0.f), m_y(0.f), m_z(0.f), m_isInitialized(false) { }
This meant that each instance needed an initialization flag, set to true only when explicit component values were assigned. The flag needed to be checked in most member functions: GetDotProduct, GetCrossProduct, GetMagnitude, GetOpeningAngle, GetX,Y,Z,...
Removal of the default constructor means that the fully qualified constructor must be used:
inline CartesianVector::CartesianVector(float x, float y, float z) : m_x(x), m_y(y), m_z(z) { }
There is no longer any need for the initialization flag, which removes checks from many important functions.
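A stripped-down class in the spirit of this change might look as follows. CartesianVectorSketch is a hypothetical stand-in written for illustration, not PandoraPFANew's CartesianVector; it shows the two ingredients described above: only a fully qualified constructor exists, and member functions do their work without first checking an initialization flag. The static_assert confirms at compile time that default construction is unavailable.

```cpp
#include <cassert>
#include <cmath>
#include <type_traits>

// Hypothetical stand-in modelled on the description above; not PandoraPFANew's source.
class CartesianVectorSketch {
public:
    // The only constructor: every instance is created with explicit components,
    // so there is no "uninitialized" state to track.
    CartesianVectorSketch(float x, float y, float z) : m_x(x), m_y(y), m_z(z) {}

    // Accessors and operations need no m_isInitialized check before doing real work.
    float GetX() const { return m_x; }
    float GetY() const { return m_y; }
    float GetZ() const { return m_z; }

    float GetMagnitudeSquared() const { return m_x * m_x + m_y * m_y + m_z * m_z; }
    float GetMagnitude() const { return std::sqrt(GetMagnitudeSquared()); }

    float GetDotProduct(const CartesianVectorSketch& rhs) const {
        return m_x * rhs.m_x + m_y * rhs.m_y + m_z * rhs.m_z;
    }

    CartesianVectorSketch GetCrossProduct(const CartesianVectorSketch& rhs) const {
        return CartesianVectorSketch(m_y * rhs.m_z - m_z * rhs.m_y,
                                     m_z * rhs.m_x - m_x * rhs.m_z,
                                     m_x * rhs.m_y - m_y * rhs.m_x);
    }

    CartesianVectorSketch GetUnitVector() const {
        const float magnitude = GetMagnitude();
        assert(magnitude > 0.f);  // a zero vector has no direction
        return CartesianVectorSketch(m_x / magnitude, m_y / magnitude, m_z / magnitude);
    }

private:
    float m_x, m_y, m_z;  // no m_isInitialized member
};

// Default construction is deliberately unavailable; checked at compile time.
static_assert(!std::is_default_constructible<CartesianVectorSketch>::value,
              "CartesianVectorSketch must be constructed with explicit components");
```

Code such as `CartesianVectorSketch v;` no longer compiles, so whether a vector holds meaningful components is settled by the compiler rather than by a run-time check in every call.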
General optimization
Two of the functions badly affected by the increased calorimeter occupancies are those used to calculate the “density weight” and “surrounding energy” values for each hit. These quantities are intended for use with digital calorimeters and are not actually used in CLIC_CDR reconstruction. Entries can be added to PandoraSettings to skip these calculations: … false …
Finally, a general optimization of the remaining costly functions was attempted: avoiding square roots, avoiding trigonometric functions and simply avoiding unnecessary instructions. However, not much was gained here, as the functions were already designed for efficiency.
There are still some potential further changes and savings, but these now require more aggressive changes. Such changes are likely to make the code less readable or maintainable (e.g. repeated code to avoid function calls) and/or to change the physics output (requiring step-by-step validation).
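The two tricks mentioned, avoiding square roots and trigonometric functions, can be sketched as follows. This is a minimal illustration with a hypothetical Vec3 struct and hypothetical function names (WithinDistance, CosOpeningAngleAbove), not Pandora's code: distance cuts compare squared quantities, and an angular cut is expressed as a comparison against a cosine computed once in advance, with the sign handled explicitly so that neither sqrt nor acos is needed per call. maxDistance is assumed to be non-negative.

```cpp
#include <cassert>
#include <cmath>

struct Vec3 {
    double x, y, z;
};

inline double Dot(const Vec3& a, const Vec3& b) { return a.x * b.x + a.y * b.y + a.z * b.z; }

// Distance cut without a square root: |a - b| < maxDistance  <=>  |a - b|^2 < maxDistance^2.
bool WithinDistance(const Vec3& a, const Vec3& b, double maxDistance) {
    const Vec3 d{a.x - b.x, a.y - b.y, a.z - b.z};
    return Dot(d, d) < maxDistance * maxDistance;
}

// Angular cut without acos or sqrt: tests cos(angle(u, v)) > cosCut, where
// cosCut = cos(maxAngle) is assumed to be computed once, not on every call.
bool CosOpeningAngleAbove(const Vec3& u, const Vec3& v, double cosCut) {
    const double dot = Dot(u, v);
    const double mag2Product = Dot(u, u) * Dot(v, v);
    assert(mag2Product > 0.0);
    const double rhs2 = cosCut * cosCut * mag2Product;  // (cosCut * |u| * |v|)^2
    if (cosCut >= 0.0)
        return dot > 0.0 && dot * dot > rhs2;
    // Negative cut: any non-negative dot passes; a negative dot passes if |dot| < |cosCut| |u| |v|.
    return dot >= 0.0 || dot * dot < rhs2;
}
```

Squaring both sides is only valid when their signs are known, which is why the sign of cosCut and of the dot product is handled before comparing squares.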
Impact of changes

Function                                                CPU time, 01/04/2011   CPU time, 15/04/2011*
ConeClusteringAlgorithm::GetGenericDistanceToHit        29.802 s               28.010 s
IsolatedHitMergingAlgorithm::GetDistanceToHit           32.070 s               15.310 s
ClusterHelper::GetTrackClusterDistance                  42.839 s               10.450 s
ClusterHelper::GetDistanceToClosestHit                  15.620 s               10.150 s
ConeClusteringAlgorithm::GetDistanceToHitInSameLayer    20.070 s                9.742 s
ClusterContact::HitDistanceComparison                   11.077 s                9.089 s
Cluster::GetCentroid                                     8.950 s                8.730 s
CaloHitHelper::GetDensityWeightContribution              6.739 s               (6.230 s)
CartesianVector::GetCosOpeningAngle                      0.190 s                5.540 s
ConeClusteringAlgorithm::FindHitsInSameLayer             5.399 s                5.158 s
CaloHitHelper::IsolationCountNearbyHits                  4.139 s                4.040 s
ClusterHelper::GetDistanceToClosestCentroid              3.661 s                3.019 s
CartesianVector::GetUnitVector                           0.070 s                2.430 s
ConeClusteringAlgorithm::GetConeApproachDistanceToHit    3.870 s                2.160 s
FragmentRemovalHelper::GetClusterContactDetails          2.591 s                1.870 s
ConeClusteringAlgorithm::FindHitsInPreviousLayers        2.360 s                1.691 s
CaloHitHelper::MipCountNearbyHits                        1.480 s                1.660 s
TrackClusterAssociationAlgorithm::Run                    4.279 s                1.600 s
CaloHitHelper::GetSurroundingEnergyContribution          1.781 s               (1.529 s)

Analysis of PandoraPFANew after the efficiency improvements. It is interesting to see how the load has been redistributed; there is a large overall decrease in CPU time.
* MarlinReco revision 2179, PandoraPFANew revision 1137
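As a worked check on the scale of the changes, the relative reductions for the two functions that improved most, and the growth factor for GetCosOpeningAngle, follow directly from the table values:

```latex
\begin{aligned}
\text{GetTrackClusterDistance:} &\quad \frac{42.839 - 10.450}{42.839} \approx 0.756 \\
\text{IsolatedHitMergingAlgorithm::GetDistanceToHit:} &\quad \frac{32.070 - 15.310}{32.070} \approx 0.523 \\
\text{GetCosOpeningAngle:} &\quad \frac{5.540}{0.190} \approx 29
\end{aligned}
```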
MarlinReco (A. Sailer)
Since the previous meeting, André’s examination of MarlinReco has focused on FullLDCTracking and, in particular, on the assignment of TPC hits to tracks.
MarlinReco (A. Sailer)
[Figure panels: MarlinReco revision 2161; MarlinReco revision 2162.]
Current status

Processor name          Seconds per event, 01/04/2011    Seconds per event, 15/04/2011
MarlinPandora
FullLDCTracking
LCIOOutputProcessor
LEPTrackingProcessor
SiliconTrackingCLIC
TPCDigiProcessor
KinkFinder
V0Finder
RecoMCTruthLinker
ILDCaloDigi
Total

The efficiency of the reconstruction software in the presence of background has been greatly improved. For Pandora, the “first pass” of efficiency improvements is declared complete. There are still some gains to be made, but further changes are becoming difficult.
Validation

Jet energy resolution    Ej = 45 GeV    100 GeV    250 GeV    500 GeV
Status at 01/04/         … ± …          … ± …      … ± …      … ± 0.06
Status at 15/04/         … ± …          … ± …      … ± …      … ± 0.06

The efficiency changes were carefully implemented to avoid affecting the physics output. We have confirmed that Pandora jet energy reconstruction performance is unaffected, and have also examined a number of low- and high-energy single-particle files to help confirm that particle ID performance is unaffected. Jacopo has performed a full validation of particle ID and reported good results.
All efficiency improvements are now in Ilcsoft v01-11 pre-release 04. Remember to use up-to-date steering files; in particular, there are changes to the PandoraSettings.xml file. There are no other changes for CLIC_ILD or CLIC_SiD.
We have hopefully saved many CPU cycles!