ESR 2 / ER 2 Testing Campaign Review
A. Crivellaro, Y. Verdie
Pose Parametrization
Pose: 6 DoFs of the rigid transform between a fixed world reference system and the camera reference system:
- 3x3 rotation matrix (3 DoFs)
- translation (3 DoFs)
Accuracy: we considered the following error measures:
- L2 norm of the rotation array [ ]
- L2 norm of the translation array [m]
- 3D distance between the predicted position of the box and its real position [m]
- 2D reprojection error [pixels]
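A minimal numpy sketch of these four measures, assuming poses are given as a 3x3 rotation `R` and a translation `t` in metres, and interpreting "L2 norm of the rotation array" as the norm of the difference between the two rotation matrices (an assumption); `pose_errors`, `model_pts`, and `K` are illustrative names, not the project's actual code:

```python
import numpy as np

def pose_errors(R_gt, t_gt, R_pred, t_pred, K, model_pts):
    # L2 norm of the difference between the rotation arrays
    rot_err = np.linalg.norm(R_gt - R_pred)
    # L2 norm of the translation difference [m]
    trans_err = np.linalg.norm(t_gt - t_pred)
    # 3D distance between predicted and real box position [m]:
    # mean distance over the transformed model points
    p_gt   = model_pts @ R_gt.T   + t_gt
    p_pred = model_pts @ R_pred.T + t_pred
    dist3d = np.linalg.norm(p_gt - p_pred, axis=1).mean()
    # 2D reprojection error [pixels]: project with intrinsics K
    def project(p):
        uvw = p @ K.T
        return uvw[:, :2] / uvw[:, 2:3]
    reproj = np.linalg.norm(project(p_gt) - project(p_pred), axis=1).mean()
    return rot_err, trans_err, dist3d, reproj
```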
Evaluation Metric: AUC
[Plot: proportion of frames below each error threshold, as a function of the threshold]
The Area Under Curve score is normalized to lie in [0,1].
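A small sketch of the score as described: the fraction of frames whose error falls below each threshold, integrated over thresholds and normalized to [0,1]. The threshold grid and `max_threshold` are assumptions for illustration:

```python
import numpy as np

def auc_score(errors, max_threshold):
    # Proportion of frames with error <= threshold, for each threshold,
    # integrated and normalized so a perfect tracker scores 1.0.
    errors = np.asarray(errors)
    thresholds = np.linspace(0.0, max_threshold, 1000)
    proportions = [(errors <= th).mean() for th in thresholds]
    return np.trapz(proportions, thresholds) / max_threshold
```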
Testing Pipeline
Run accuracy tests offline; the pose GT has to be known:
1. Take a video
2. Extract GT (marker-based tracking, or manual labeling of frames)
3. Run tracking for all frames and evaluate quantitative metrics
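A sketch of the offline evaluation loop under these steps; `tracker.track` and `metric` stand in for the actual tracker API and error measure, both assumptions:

```python
import cv2

def run_offline_test(video_path, gt_poses, tracker, metric):
    # gt_poses: one ground-truth (R, t) per frame, obtained from
    # marker-based tracking or manual labeling.
    cap = cv2.VideoCapture(video_path)
    errors = []
    for R_gt, t_gt in gt_poses:
        ok, frame = cap.read()
        if not ok:
            break
        R_pred, t_pred = tracker.track(frame)   # assumed tracker API
        errors.append(metric(R_gt, t_gt, R_pred, t_pred))
    cap.release()
    return errors
```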
Our Algorithm
IDEA: instead of using all image information, which can be ambiguous and misleading, we directly focus on exploiting a minimal amount of reliable information: object parts.
Detect object parts → compute the pose of each part → compute the object pose.
Our Algorithm
- Detect object parts: 2D real-time, robust detector (TILDE, CNN)
- Compute the "pose" of each part: learn a regressor to predict the projection of a stencil of 3D control points onto the detected patch, yielding 3D–2D correspondences
- Compute the object pose: solve an iterative PnP (Gauss–Newton) minimizing the reprojection error and the distance from a prior
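A minimal Gauss–Newton sketch of the last step, assuming the pose is a 6-vector (axis-angle rotation + translation) and using OpenCV's projection; `gn_pnp`, the prior weight `lam`, and the numerical Jacobian are illustrative assumptions, not the project's actual implementation:

```python
import numpy as np
import cv2

def gn_pnp(pts3d, pts2d, K, pose_prior, lam=1.0, iters=10):
    # Minimize: ||project(pose, pts3d) - pts2d||^2 + lam * ||pose - prior||^2
    # pts3d: (N,3) float, pts2d: (N,2) float, K: 3x3 intrinsics.
    def residual(pose):
        rvec, tvec = pose[:3], pose[3:]
        proj, _ = cv2.projectPoints(pts3d, rvec, tvec, K, None)
        r_reproj = (proj.reshape(-1, 2) - pts2d).ravel()
        r_prior = np.sqrt(lam) * (pose - pose_prior)
        return np.concatenate([r_reproj, r_prior])

    pose = pose_prior.astype(np.float64)
    for _ in range(iters):
        r = residual(pose)
        # numerical Jacobian (forward differences), one column per DoF
        eps = 1e-6
        J = np.stack([(residual(pose + eps * e) - r) / eps
                      for e in np.eye(6)], axis=1)
        pose -= np.linalg.lstsq(J, r, rcond=None)[0]  # Gauss-Newton step
    return pose
```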
Our Algorithm
Detect object parts → compute the pose of each part → compute the object pose.
10 pose priors + WRM segments → 10 pose hypotheses → score each hypothesis → final pose = best hypothesis.
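The multi-hypothesis step, sketched on top of the `gn_pnp` sketch above: one hypothesis per pose prior, each refined and scored, the best one kept. The `score` function (e.g. negative reprojection error, or agreement with WRM segments) is an assumption:

```python
import numpy as np

def estimate_pose(priors, pts3d, pts2d, K, score):
    # One refined hypothesis per prior (e.g. 10), best score wins.
    hypotheses = [gn_pnp(pts3d, pts2d, K, prior) for prior in priors]
    scores = [score(h) for h in hypotheses]
    return hypotheses[int(np.argmax(scores))]
```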
Quantitative Results
1. Quantitative results on offline videos were much better than online results → a lack of generalization in the learning.
2. We empirically validated some optimization choices (RANSAC detection selection, pose scoring, etc.).
3. WRM showed performance comparable to the much slower segment detector (LSD).
Since Then…
We introduced:
- better detections: CNN detector
- better virtual points: from 3 to 7 points, and a non-linear regressor
- better hypothesis scoring
Much better accuracy, at an increased computational cost (for now).
New Qualitative Results
Computational Time [ms]
[Table: per-stage timings (Detections, VP, Pose estimation, Segment Detection, Total) for the OLD and NEW pipelines]
Learning from Experience
Our current code has to support 144 configurations:
- Server – PTU
- WRM – LSD – Canny
- Stand-alone – TCP/IP – Ubitrack interface
- Marker – markerless
- 2 logging protocols (Ubitrack – JSON)
- Closed – Open Box
(2 × 3 × 3 × 2 × 2 × 2 = 144 combinations; see the sketch after this list.)
We need to reduce the number of configurations:
- Use WRM
- Only one between 'server' and 'PTU' architecture
- Only one interface supported (Ubitrack? Shared lib? TCP/IP?)
- Only one logging protocol
- Both marker and markerless will be supported
CERN server?
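A tiny sketch of where the 144 comes from, enumerating the option sets listed above (the axis names are illustrative):

```python
from itertools import product

axes = {
    "architecture": ["Server", "PTU"],
    "segments":     ["WRM", "LSD", "Canny"],
    "interface":    ["Stand-alone", "TCP/IP", "Ubitrack"],
    "tracking":     ["Marker", "Markerless"],
    "logging":      ["Ubitrack", "JSON"],
    "box":          ["Closed", "Open"],
}
configs = list(product(*axes.values()))
assert len(configs) == 144   # 2 * 3 * 3 * 2 * 2 * 2
```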
Learning from Experience
Divide and conquer: it was very useful to have pieces of code simulating all the software interacting with ours.
Don't trust Machine Learning: it is crucial to test in 'real life' situations ASAP.
HW can hide surprises: image artifacts and instabilities in the testing HW, not completely under our control (as far as we know).
"Experience is simply the name we give our mistakes." – O. Wilde
Learning from Experience: BEFORE the Final Testing Campaign
- Define testing HW and testing scenario ASAP (now?), for collecting learning data and testing robustness in 'real life' situations
- Run quantitative tests on the chosen platform
- Reduce computational time (≤ 200 ms)
- Get rid of HW constraints (GPGPU)
- Better exploit (more sophisticated) WRM information
Final Testing Campaign Plan
- Extensive (offline) quantitative tests on the improved visual pose estimation
- Online tests including sensor fusion and rendering; visual quality evaluation
- Integration and support of additional features (e.g. logging)
- Integration and profiling with the final version of WRM
Thank you