1
Visual Perception and Robotic Manipulation (Springer Tracts in Advanced Robotics), Chapter 3: Shape Recovery Using Robust Light Striping. Geoffrey Taylor and Lindsay Kleeman.
2
Contents Motivation for stereoscopic stripe ranging.
Benefits of our scanner. Validation/reconstruction framework. Image-based calibration technique. Experimental results. Conclusions and future work.
3
Motivation Allow robot to model and locate objects in the environment as first step in manipulation. Capture registered colour and depth data to aid intelligent decision making.
4
Conventional Scanner (figure: stripe generator, camera, scanned object, distances B and D, swept stripe, camera image). Triangulate from two independent measurements: image plane data and laser stripe position. The depth image is constructed by sweeping the stripe.
5
Difficulties Light stripe assumed to be the brightest feature:
Objects specially prepared (matte white paint); scans performed in low ambient light; high-contrast camera used (can't capture colour). Noise sources invalidate the brightness assumption: specular reflections, cross-talk between robots, and stripe-like textures in the environment. For service robots, we need a robust scanner that does not rely on the brightness assumption!
6
Related Work Robust scanners must validate stripe measurements
Robust single camera scanners: validation from motion (Nygårds et al., 1994); validation from modulation (Haverinen et al., 1998); two intersecting stripes (Nakano et al., 1988). Robust stereo camera scanners: independent sensors (Trucco et al., 1994); known scene structure (Magee et al., 1994). Existing methods suffer from assumed scene structure, acquisition delay, or lack of error recovery.
7
Stereo Scanner Stereoscopic light striping approach:
Validation through three redundant measurements: the measured stripe locations on the two stereo image planes, and the known angle of the light plane. Validation/reconstruction constraint: there must be some point on the known light plane that projects to the stereo measurements (within a threshold error) for those measurements to be valid. The reconstructed point is optimal with respect to measurement noise (uniform image plane error). System parameters can be calibrated from a scan of an arbitrary non-planar target.
8
Validation/Reconstruction
Validation/Reconstruction (figure): the laser plane, at a known position and angle, intersects the scanned surface at an unknown 3D point X (constrained to the light plane); given the light plane parameters, X projects through the left and right cameras (projection matrices LP and RP) onto the left and right image planes as the measurements Lx and Rx.
9
Validation/Reconstruction
Unknown reconstruction projects to the measurements: Lx = LPX, Rx = RPX. Find the reconstruction X that minimizes the image error E = d²(Lx, LPX) + d²(Rx, RPX), subject to the constraint TX = 0, i.e. X lies on the laser plane, where d(a, b) is the Euclidean distance between points a and b. If E < Eth then (Lx, Rx) are valid measurements and X is the optimal reconstruction. This constrained optimization has an analytical solution in the case of rectilinear stereo.
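For illustration, the same constrained minimization can be prototyped numerically. The sketch below is a hypothetical Python example rather than the analytical rectilinear-stereo solution referred to above: it assumes 3×4 projection matrices PL and PR, a laser plane given as a homogeneous 4-vector T, and a generic least-squares solver over a two-parameter coordinate system on the plane.

```python
# Hypothetical sketch (not the book's closed-form rectilinear-stereo solution):
# find the point X on the known laser plane that minimises the summed
# image-plane reprojection error for one candidate pair (xL, xR).
import numpy as np
from scipy.optimize import least_squares

def plane_basis(T):
    """Return a point on the plane n.x + d = 0 and two in-plane directions."""
    n, d = T[:3], T[3]
    origin = -d * n / np.dot(n, n)                 # closest point to the origin
    u = np.cross(n, [1.0, 0.0, 0.0])
    if np.linalg.norm(u) < 1e-6:                   # n is parallel to the x-axis
        u = np.cross(n, [0.0, 1.0, 0.0])
    u /= np.linalg.norm(u)
    v = np.cross(n, u)
    v /= np.linalg.norm(v)
    return origin, u, v

def project(P, X):
    """Pinhole projection of a 3D point X through a 3x4 projection matrix P."""
    x = P @ np.append(X, 1.0)
    return x[:2] / x[2]

def reconstruct_on_plane(xL, xR, PL, PR, T):
    """Return (X, E): the optimal on-plane point and its reprojection error."""
    origin, u, v = plane_basis(np.asarray(T, dtype=float))

    def residuals(ab):
        X = origin + ab[0] * u + ab[1] * v         # X stays on the laser plane
        return np.hstack([project(PL, X) - xL, project(PR, X) - xR])

    sol = least_squares(residuals, x0=[0.0, 0.0])
    X = origin + sol.x[0] * u + sol.x[1] * v
    return X, float(np.sum(sol.fun ** 2))

# A pair (xL, xR) is accepted as a valid stripe measurement when E < E_th.
```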
10
Calibration System parameters: p = (k1, k2, k3, x, z, B0, m, c). k1, k2, k3 relate to the laser position and camera baseline; x, z, B0 relate to the plane orientation; m, c relate the laser encoder count e to the stripe angle (angle = m·e + c). Take a scan of an arbitrary non-planar scene, initially assuming the laser is given by the brightest feature, and form the total reconstruction error Etot over all points: Etot = Σj Σi E(Lxij, Rxij, ej, p), summed over all frames j and scanlines i.
11
Calibration Find p that minimizes the total error (assuming Lxij, Rxij, ej are fixed): p* = arg min_p [Etot(p)], using Levenberg-Marquardt numerical minimization. Refinement steps: the above solution will be inaccurate due to incorrect correspondences caused by the brightness assumption, so use the initial p* to validate (Lxij, Rxij), reject invalid measurements, and recalculate p*. The solution also assumes no error in the encoder counts ej; this error is removed by iterative refinement of ej and p*.
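A minimal sketch of this outer calibration loop is shown below, assuming a hypothetical helper point_error(p, xL, xR, e) that builds the light plane from the parameters p and encoder count e and returns the validation error E for one candidate pair; SciPy's 'lm' solver stands in for the Levenberg-Marquardt minimization.

```python
# Sketch of the outer calibration loop. point_error(p, xL, xR, e) is a
# hypothetical helper: it builds the light plane from parameters p and encoder
# count e, then returns the validation error E for one candidate pair.
import numpy as np
from scipy.optimize import least_squares

def calibrate(pairs, p0, E_th, n_refine=3):
    """pairs: list of (xL, xR, e) stripe candidates from a non-planar scan."""
    def residuals(p, data):
        return np.array([point_error(p, xL, xR, e) for xL, xR, e in data])

    p = np.asarray(p0, dtype=float)
    for _ in range(n_refine):
        # Levenberg-Marquardt fit of all system parameters (a full scan gives
        # far more candidate points than parameters).
        p = least_squares(residuals, p, args=(pairs,), method='lm').x
        # Reject measurements invalidated by the current estimate: these are
        # the incorrect correspondences caused by the brightness assumption.
        pairs = [(xL, xR, e) for xL, xR, e in pairs
                 if point_error(p, xL, xR, e) < E_th]
    return p
```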
12
Implementation (figure): laser stripe generator, optical encoder, left camera and right camera.
13
Image processing pipeline (figure): raw stereo image → left and right fields → image difference → extract maxima → left and right candidates. When multiple candidates appear on a scan line, calculate the reconstruction error E for every candidate pair and choose the pair with the smallest E < Eth.
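A minimal sketch of that per-scanline matching rule, assuming a hypothetical error_fn that wraps the validation/reconstruction step above:

```python
# Minimal sketch of the per-scanline matching rule: evaluate every candidate
# pairing and keep the one with the smallest error that passes validation.
def match_scanline(left_cands, right_cands, error_fn, E_th):
    """error_fn(xL, xR) -> reconstruction error E for one candidate pair."""
    best = None
    for xL in left_cands:
        for xR in right_cands:
            E = error_fn(xL, xR)
            if E < E_th and (best is None or E < best[0]):
                best = (E, xL, xR)
    return best   # None when no pair satisfies the validation condition
```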
14
Scanning Process
15
Results Reflection/cross-talk mirror experiment: the mirror generates bright false stripe measurements.
16
Results Laser extracted as the brightest feature per line, versus all bright candidates extracted and matched using the validation condition.
17
More Results Office phone mirror scan results: brightest feature without validation, versus with validation.
18
Specular Objects Tin can scan with specular reflections
19
Specular Objects Tin can scan results: brightest feature without validation, versus with validation.
20
Depth Discontinuities
Depth discontinuities cause association ambiguity
21
Depth Discontinuities
Depth discontinuity scan results: without validation, versus with validation.
22
Conclusions We have developed a mechanism for eliminating:
sensor noise, cross-talk, and 'ghost' stripes (reflections, striped textures, etc.). Developed an image-based calibration technique requiring only an arbitrary non-planar target. Operation in ambient light allows registered range and colour to be captured by a single sensor. Experimental results validate the above techniques.
23
Future Directions Use multiple simultaneous stripes to increase the acquisition rate; multiple stripes can be disambiguated using the same framework that provides validation. Perform object segmentation, modelling and recognition on the scanned data to support grasp planning and manipulation on a service robot.
24
Visual Perception and Robotic Manipulation (Springer Tracts in Advanced Robotics), Chapter 4: 3D Object Modelling and Classification. Geoffrey Taylor and Lindsay Kleeman, Intelligent Robotics Research Centre (IRRC), Department of Electrical and Computer Systems Engineering, Monash University, Australia.
25
Contents Introduction and motivation.
Split-and-merge segmentation algorithm. New method for surface type classification based on Gaussian image and convexity analysis. Fitting geometric primitives. Experimental results. Conclusions.
26
Introduction Motivation: enable a humanoid robot to perform ad hoc tasks in a domestic or office environment. Flexibility in an unknown environment requires data-driven segmentation to support object classification. (Figure: Metalman, an upper-torso humanoid robot.)
27
Introduction Object modelling in robotic applications:
CAD models (Kragić, 2001); generalized cylinders (Rao et al., 1989); non-parametric (Müller & Wörn, 2000); geometric primitives (Yang & Kak, 1986). Many domestic objects can be adequately modelled with geometric primitives. Colour/range data is provided by the robust stereoscopic light stripe scanner (Taylor et al., 2002).
28
Segmentation Basic techniques:
Region Growing: iteratively grow seed segments. Split-and-Merge: find region boundaries. Clustering: transform and group points. Region growing requires accurate range data for fitting primitives to small seed regions. Split-and-Merge maintains large regions that can be robustly fitted to primitives.
29
Segmentation Raw range/colour data from stereoscopic light stripe camera. Calculate normal vector and surface type for each range element.
30
Segmentation Remove range discontinuities and creases. Fit primitives.
Compare best model to dominant surface type. Split poorly modelled regions by surface type and fit primitives again.
31
Segmentation Iteratively grow regions by adding unlabelled pixels that satisfy model. Merge regions using iterative boundary cost minimization to compensate for over-segmentation.
32
Segmentation Extract primitives and add texture using projected colour data. Use models for object classification, tracking and task planning
33
Surface Type Determine local shape of NxN element patch:
34
Classification methods
Conventional method: fit a surface, calculate the mean and Gaussian curvature, and classify based on curvature sign (> 0, < 0, = 0). Sensitive to noise (second-order derivatives required); an arbitrary approximating function introduces bias. Our novel method: based on convexity and principal curvatures; non-parametric (no approximating surface); robust to noise.
35
Classification Surface type is determined by the number of non-zero principal curvatures (zero, one or two) together with convexity (convex, concave or neither).
36
Principal Curvatures Determine number of principal curvatures from Gaussian image of surface patch. Surface representation Gaussian image
37
Principal Curvatures Spread of the normal vectors in the Gaussian image of a patch indicates non-zero principal curvature. (Figures: plane; ridge/valley; pit/peak/saddle.)
38
Principal Curvatures Align central normal to z-axis.
Measure the spread of the projected normals in a chosen direction using an MMSE fit, and optimize over that direction. This gives two solutions: the directions of maximum and minimum spread, with spread values emax and emin. Non-zero principal curvature is declared when emax > eth or emin > eth.
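One way to prototype this test, as a rough sketch only: align the central normal with the z-axis, project the patch normals onto the x-y plane, and use an eigen-decomposition of their scatter matrix as a stand-in for the MMSE spread fit described above.

```python
# Illustrative sketch: an eigen-decomposition of the projected-normal scatter
# is used here as a stand-in for the MMSE spread fit described above.
import numpy as np

def rotation_to_z(n):
    """Rotation matrix taking unit vector n onto the z-axis (Rodrigues form)."""
    z = np.array([0.0, 0.0, 1.0])
    v = np.cross(n, z)
    c, s = np.dot(n, z), np.linalg.norm(v)
    if s < 1e-9:
        return np.eye(3) if c > 0 else np.diag([1.0, -1.0, -1.0])
    vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
    return np.eye(3) + vx + vx @ vx * ((1 - c) / s ** 2)

def count_principal_curvatures(normals, centre_normal, e_th):
    """normals: (N, 3) unit normals of the patch; returns 0, 1 or 2."""
    R = rotation_to_z(centre_normal / np.linalg.norm(centre_normal))
    xy = (normals @ R.T)[:, :2]          # Gaussian image projected onto x-y
    scatter = xy.T @ xy / len(normals)
    e_min, e_max = np.sort(np.linalg.eigvalsh(scatter))
    # plane: no spread; ridge/valley: spread in one direction; pit/peak/saddle: two.
    return int(e_max > e_th) + int(e_min > e_th)
```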
39
Convexity (figures): convex and concave configurations of two neighbouring elements, showing their normals n0 and n1, the displacement d between them, and the vectors n1 × n0 and (n1 × n0) × d used in the test.
40
Convexity For each element in patch, calculate:
Let S = Ncv/Ncc, the ratio of convex to concave elements. Global convexity is given by the dominant local property: convex (peak, ridge) when S > Sth; concave (pit, valley) when S < 1/Sth; neither (plane, saddle) when 1/Sth < S < Sth.
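A minimal sketch of this decision rule, assuming per-element convex/concave labels have already been computed (e.g. from the construction in the previous slide):

```python
# Minimal sketch of the global convexity decision; per-element labels are
# assumed to be computed already (e.g. from the construction sketched above).
def global_convexity(labels, S_th):
    """labels: iterable of 'convex' / 'concave' / other per patch element."""
    N_cv = sum(1 for l in labels if l == 'convex')
    N_cc = sum(1 for l in labels if l == 'concave')
    S = N_cv / max(N_cc, 1)              # ratio of convex to concave elements
    if S > S_th:
        return 'convex'                  # peak or ridge
    if S < 1.0 / S_th:
        return 'concave'                 # pit or valley
    return 'neither'                     # plane or saddle
```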
41
Surface Type Summary (figure): raw 3D scan → principal curvatures and convexity → surface type.
42
Fitting Primitives Planes: principal component analysis. Spheres, cylinders, cones: minimize the distance to the fitted surface using Levenberg-Marquardt numerical optimization; an initial estimate of the parameters is required. Choose the model with minimum error, e < eth.
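A short sketch of the PCA plane fit: the plane normal is the direction of least variance of the region's points, and the residual gives the fit error compared against eth.

```python
# Sketch of the PCA plane fit: the normal is the direction of least variance
# of the region's points, and the RMS point-to-plane distance is the error.
import numpy as np

def fit_plane_pca(points):
    """points: (N, 3) array of range elements belonging to one region."""
    centroid = points.mean(axis=0)
    centred = points - centroid
    # The right singular vector with the smallest singular value is the normal.
    _, _, Vt = np.linalg.svd(centred, full_matrices=False)
    normal = Vt[-1]
    rms_error = np.sqrt(np.mean((centred @ normal) ** 2))
    return normal, centroid, rms_error
```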
43
Cylinder Estimation Estimate the cylinder axis from the Gaussian image. (Figures: cylindrical region and its axis; Gaussian image and the direction of minimum spread.)
44
Results Box, ball and cup: Raw colour/range scan
Discontinuities, surface type
45
Results Box, ball and cup: Region growing, merging
Extracted object models
46
Results Bowl, funnel and goblet: Raw colour/range scan
Discontinuities, surface type
47
Results Bowl, funnel and goblet: Region growing, merging
Extracted object models
48
Results Comparison with a curvature-based method: our non-parametric result versus Besl and Jain (1988).
49
Conclusions Split-and-merge segmentation using surface type and geometric primitives is capable of modelling a variety of domestic objects using planes, spheres, cylinders and cones. New surface type classifier based on principal curvatures and convexity provides greater robustness than curvature-based methods without additional computational cost.
50
Visual Perception and Robotic Manipulation (Springer Tracts in Advanced Robotics), Chapter 5: Multi-Cue 3D Model-Based Object Tracking. Geoffrey Taylor and Lindsay Kleeman, Intelligent Robotics Research Centre (IRRC), Department of Electrical and Computer Systems Engineering, Monash University, Australia.
51
Contents Motivation and Background Overview of proposed framework
Kalman filter Colour tracking Edge tracking Texture tracking Experimental results
52
Introduction Research aim: enable a humanoid robot to manipulate a priori unknown objects in an unstructured office or domestic environment. Previous results: visual servoing; robust 3D stripe scanning; 3D segmentation and object modelling. (Figure: Metalman.)
53
Why Object Tracking? Metalman uses visual servoing to execute manipulations: control signals are calculated from the observed relative pose of the gripper and object. Object tracking allows Metalman to: handle dynamic scenes; detect unstable grasps; detect motion from accidental collisions; and compensate for calibration errors in the kinematic and camera models. We ask the question: can't we just measure the object once and then move the hand to the desired location?
54
Why Multi-Cue? Individual cues only provide robust tracking under limited conditions: edges fail in low contrast and are distracted by texture; textures are not always available and are distracted by reflections; colour gives only partial pose. Fusion of multiple cues provides robust tracking in unpredictable conditions.
55
Multi-Cue Tracking Mainly applied to 2D feature-based tracking.
Sequential cue tracking: a selector (focus of attention) followed by a tracker; this can be extended to a multi-level selector/tracker framework (Toyama and Hager 1999). Cue integration: voting, fuzzy logic (Kragić and Christensen 2001); Bayesian fusion, probabilistic models; ICondensation (Isard and Blake 1998).
56
Proposed framework 3D model-based tracking: models are extracted using segmentation of range data from the stripe scanner. Colour (selector), edges and texture (trackers) are optimally fused in a Kalman filter framework. (Figures: colour + range scan; textured polygonal models.)
57
Kalman filter Optimally estimate the object state xk given the previous state xk-1 and new measurements yk. The system state comprises the pose and velocity screw: xk = [pk, vk]T. State prediction (constant velocity dynamics): p*k = pk-1 + vk-1·Δt, v*k = vk-1. State update: xk = x*k + Kk[yk − y*(x*k)]. A measurement function y*(x*k) is needed for each cue.
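A schematic sketch of this predict/update cycle, with simplifying assumptions: the pose screw is treated as a plain 6-vector (ignoring rotation parametrization details), and h()/H() are hypothetical measurement function and Jacobian supplied by each cue.

```python
# Schematic sketch of the filter cycle. Simplifications: the pose screw is a
# plain 6-vector (rotation parametrisation ignored); h and H are hypothetical
# measurement function and Jacobian supplied by each cue.
import numpy as np

def predict(p, v, P, Q, dt):
    """Constant-velocity prediction: p*_k = p_{k-1} + v_{k-1} dt, v*_k = v_{k-1}."""
    F = np.block([[np.eye(6), dt * np.eye(6)],
                  [np.zeros((6, 6)), np.eye(6)]])
    x = F @ np.hstack([p, v])
    return x[:6], x[6:], F @ P @ F.T + Q

def update(p, v, P, y, h, H, R):
    """EKF update x_k = x*_k + K_k (y_k - y*(x*_k)) for one cue's measurements."""
    x = np.hstack([p, v])
    Hk = H(x)                                  # measurement Jacobian at x*
    S = Hk @ P @ Hk.T + R
    K = P @ Hk.T @ np.linalg.inv(S)            # Kalman gain
    x = x + K @ (y - h(x))                     # innovation-weighted correction
    P = (np.eye(len(x)) - K @ Hk) @ P
    return x[:6], x[6:], P
```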
58
Measurements For each new frame, predict the object pose and project the model onto the image to define a region of interest (ROI): only process within the ROI to eliminate distractions and reduce computational expense. (Figure: captured frame, predicted pose and ROI.)
59
Colour Tracking Colour filter created from RGB histogram of texture
Image processing: apply the filter to the ROI and calculate the centroid of the largest connected blob. Measurement prediction: project the centroid of the model vertices, at the predicted pose, onto the image plane.
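A rough sketch of the colour measurement, assuming an RGB histogram hist built beforehand from the object texture (e.g. with cv2.calcHist, normalized to [0, 255]) and using OpenCV back-projection as a stand-in for the colour filter:

```python
# Rough sketch of the colour measurement. Assumes an RGB histogram 'hist'
# built beforehand from the object texture (cv2.calcHist) and normalised to
# [0, 255]; back-projection stands in for the colour filter.
import cv2
import numpy as np

def colour_centroid(roi_bgr, hist, min_prob=50):
    """Return the centroid (x, y) of the largest filtered blob in the ROI."""
    prob = cv2.calcBackProject([roi_bgr], [0, 1, 2], hist,
                               [0, 256, 0, 256, 0, 256], scale=1)
    _, mask = cv2.threshold(prob, min_prob, 255, cv2.THRESH_BINARY)
    n, labels, stats, centroids = cv2.connectedComponentsWithStats(
        mask.astype(np.uint8))
    if n <= 1:
        return None                       # no blob: colour cue unavailable
    largest = 1 + np.argmax(stats[1:, cv2.CC_STAT_AREA])   # label 0 = background
    return tuple(centroids[largest])
```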
60
Edge Tracking To avoid texture, only consider silhouette edges
Image processing: extract directional edge pixels (Sobel masks); combine with colour data to extract silhouette edges; match pixels to the projected model edge segments.
61
Edge Tracking Fit a line to the matched points of each segment and extract its angle and mean position. Measurement prediction: project the model vertices onto the image plane and, for each model edge, calculate the angle and the distance to the measured mean point.
62
Texture Tracking Textures are represented as 8×8 pixel templates with high spatial variation of intensity. Image processing: render the textured object in the predicted pose; apply a feature detector (Shi & Tomasi 1994); extract templates and match them to the captured frame by SSD.
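A rough sketch of this matching step, assuming rendered is the textured model rendered at the predicted pose and frame is the captured image, both greyscale; Shi-Tomasi corners select the high-variation templates and SSD template matching follows (the search-window size is an assumption):

```python
# Rough sketch of the texture matching step. Assumes 'rendered' is the model
# rendered at the predicted pose and 'frame' the captured image, both greyscale;
# the 8x8 template size and Shi-Tomasi detector follow the slide, the search
# window size is an assumption.
import cv2
import numpy as np

def match_texture_features(rendered, frame, n_features=30, tmpl=8, search=24):
    matches = []
    corners = cv2.goodFeaturesToTrack(rendered, n_features,
                                      qualityLevel=0.01, minDistance=10)
    if corners is None:
        return matches
    h = tmpl // 2
    for x, y in corners.reshape(-1, 2).astype(int):
        template = rendered[y - h:y + h, x - h:x + h]
        window = frame[y - search:y + search, x - search:x + search]
        if (template.shape != (tmpl, tmpl) or
                window.shape[0] < tmpl or window.shape[1] < tmpl):
            continue                                 # too close to the border
        ssd = cv2.matchTemplate(window, template, cv2.TM_SQDIFF)
        _, _, min_loc, _ = cv2.minMaxLoc(ssd)        # best (lowest) SSD score
        matches.append(((x, y),                      # feature in rendered view
                        (x - search + min_loc[0] + h,
                         y - search + min_loc[1] + h)))  # match in the frame
    return matches
```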
63
Texture Tracking Apply outlier rejection:
Consistent motion vectors; invertible matching. Calculate the 3D position of the texture features on the surface of the model. Measurement prediction: project the 3D surface features at the current pose onto the image plane.
64
Experimental Results Three tracking scenarios:
Poor visual conditions; occluding obstacles; rotation about an axis of symmetry. Off-line processing of a captured video sequence allows direct comparison of tracking performance using edges only, texture only, and multimodal fusion. The actual processing rate is about 15 frames/sec.
65
Poor Visual Conditions
Colour, texture and edge tracking
66
Poor Visual Conditions
Edges only Texture only
67
Occlusions Colour, texture and edge tracking
68
Occlusions Edges only Texture only
69
Occlusions Tracking precision
70
Symmetrical Objects Colour, texture and edge tracking
71
Symmetrical Objects Object orientation
72
Conclusions Fusion of multimodal visual features overcomes weaknesses in individual cues, and provides robust tracking where single cue tracking fails. The proposed framework is extensible; additional modalities can be fused provided a suitable measurement model is devised.
73
Open Issues Include additional modalities:
optical flow (motion); depth from stereo. Calculate measurement errors as part of feature extraction for the measurement covariance matrix. Modulate the size of the ROI to reflect the current state covariance, so the ROI automatically increases as visual conditions degrade and decreases under good conditions to increase processing speed.
74
Visual Perception and Robotic Manipulation (Springer Tracts in Advanced Robotics), Chapter 6: Hybrid Position-Based Visual Servoing. Geoffrey Taylor and Lindsay Kleeman, Intelligent Robotics Research Centre (IRRC), Department of Electrical and Computer Systems Engineering, Monash University, Australia.
75
Overview Motivation for hybrid visual servoing
Visual measurements and online calibration Kinematic measurements Implementation of controller and IEKF Experimental comparison of hybrid visual servoing with existing techniques
76
Motivation Manipulation tasks for a humanoid robot are characterized by: autonomous planning from internal models; arbitrarily large initial pose error; background clutter and occluding obstacles; cheap sensors, hence camera model errors; light, compliant limbs, hence kinematic calibration errors. (Figure: Metalman, an upper-torso humanoid hand-eye system.)
77
Visual Servoing Image-based visual servoing (IBVS):
Robust to calibration errors if the target image is known; depth of the target must be estimated; large pose error can cause an unpredictable trajectory. Position-based visual servoing (PBVS): allows 3D trajectory planning; sensitive to calibration errors; the end-effector may leave the field of view; linear approximations (affine cameras, etc.). Deng et al. (2002) suggest little difference between visual servoing schemes.
78
Conventional PBVS Endpoint open-loop (EOL):
The controller observes only the target; the end-effector pose is estimated using the kinematic model and a calibrated hand-eye transformation; not affected by occlusion of the end-effector. Endpoint closed-loop (ECL): the controller observes both the target and the end-effector; less sensitive to kinematic calibration errors, but fails when the end-effector is obscured; accuracy depends on the camera model and the 3D pose reconstruction method.
79
Proposed Scheme Hybrid position-based visual servoing using fusion of visual and kinematic measurements: visual measurements provide accurate positioning, while kinematic measurements provide robustness to occlusions and clutter. The end-effector pose is estimated from the fused measurements using an Iterated Extended Kalman Filter (IEKF). Additional state variables are included for on-line calibration of the camera and kinematic models. Hybrid PBVS has the benefits of both EOL and ECL control and the deficiencies of neither.
80
Coordinate Frames (figures): EOL, ECL and Hybrid configurations.
81
PBVS Controller Conventional approach (Hutchinson et al, 1999).
Control error (pose error): computed from WHE, which is estimated by visual/kinematic fusion in the IEKF, and driven to zero by a proportional velocity control signal.
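The exact control law is not reproduced here; as an illustration only, a generic proportional position-based law on a homogeneous pose-error transform might look like the following sketch.

```python
# Illustrative sketch only: a generic proportional position-based control law
# on a homogeneous pose-error transform H_err (desired pose relative to the
# current estimated pose). This is not the exact law from the slide.
import numpy as np
from scipy.spatial.transform import Rotation

def pbvs_velocity(H_err, gain=0.5):
    """Return a 6-vector velocity screw (v, w) driving the pose error to zero."""
    t_err = H_err[:3, 3]
    w_err = Rotation.from_matrix(H_err[:3, :3]).as_rotvec()   # axis-angle error
    return -gain * np.hstack([t_err, w_err])
```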
82
Implementation
83
Visual Measurements The gripper is tracked using active LED features, represented by an internal point model. (Figure: 3D gripper model points Gi, camera centre C, image-plane measurements gi.) The IEKF measurement model projects the gripper model points onto the image plane.
84
Camera Model Errors In a practical system, the baseline and verge angle may not be known precisely. (Figure: left and right camera centres and image planes; a baseline error, 2b versus 2b*, gives a scaled reconstruction, while a verge-angle error gives an affine reconstruction.)
85
Camera Model Errors How does scale error affect pose estimation?
Consider the case of translation only, by TE, and compare the predicted measurements with the actual measurements. The relationship between actual and estimated pose shows that the estimated pose for different objects in the same position, with the same scale error, is different!
86
Camera Model Errors Scale error will cause non-convergence of PBVS!
Although the estimated gripper and object frames align, the actual frames are not aligned.
87
Visual Measurements To remove model errors, the scale term is estimated by the IEKF using a modified measurement equation. The scale estimate requires four observed points, with at least one in each stereo field.
88
Kinematic Model Kinematic measurement from PUMA is BHE
Measurement prediction (for the IEKF): the hand-eye transformation BHW is treated as a dynamic bias and estimated in the IEKF. Estimating BHW requires a visual estimate of WHE, so BHW is dropped from the state vector if the gripper is obscured.
89
Kalman Filter The state vector contains position, velocity and calibration parameters; the measurement vector combines visual and kinematic measurements. Dynamic models: a constant velocity model for the pose and a static model for the calibration parameters. The initial state is taken from kinematic measurements.
90
Constraints Three points required for visual pose recovery
Stereo measurements are required for scale estimation; LED association requires multiple observed LEDs; estimation of BHW requires visual observations. Use a hierarchy of estimators (nL,R = number of observed points): nL,R < 3: EOL control, no estimation of K1 or BHW; nL > 3 xor nR > 3: hybrid control, no K1; nL,R > 3: hybrid control (visual + kinematic). Excluded state variables are discarded by setting the corresponding rows and columns of the Jacobian to zero.
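As a rough sketch of this gating logic (assuming the three-point visual pose recovery constraint above as the gate, and with 'freeze' naming the state variables whose Jacobian rows and columns are zeroed):

```python
# Rough sketch of the estimator hierarchy. Assumption: the gate is "at least
# three LEDs visible per camera", per the pose-recovery constraint above;
# 'freeze' lists state variables whose Jacobian rows/columns are zeroed.
def select_estimator(nL, nR, n_min=3):
    if nL < n_min and nR < n_min:
        return 'EOL', ['K1', 'B_H_W']          # kinematic-only control
    if nL < n_min or nR < n_min:
        return 'hybrid-mono', ['K1']           # no stereo, so no scale estimate
    return 'hybrid-stereo', []                 # full visual + kinematic fusion
```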
91
LED Measurement LED centroids are measured with a red colour filter. Measured and model LEDs are associated using a global matching procedure; robust global matching requires 3 LEDs. (Figure: predicted versus observed LEDs.)
92
Experimental Results Positioning experiment: align the midpoint between thumb and forefinger at coloured marker A, and align the thumb and forefinger on the line between A and B. Accuracy evaluation: translation error is the distance between the thumb/forefinger midpoint and A; orientation error is the angle between the line joining thumb and forefinger and the line joining A and B.
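These two error measures are straightforward to compute from 3D positions of the fingertips and the markers; a small sketch (taking the angle between the two lines as the acute angle):

```python
# Sketch of the two accuracy measures, given 3D positions of the thumb tip,
# forefinger tip and markers A and B; the line-to-line angle is taken as acute.
import numpy as np

def positioning_errors(thumb, forefinger, A, B):
    thumb, forefinger, A, B = map(np.asarray, (thumb, forefinger, A, B))
    midpoint = 0.5 * (thumb + forefinger)
    translation_error = np.linalg.norm(midpoint - A)          # distance to A
    finger_line = forefinger - thumb
    target_line = B - A
    cosang = abs(np.dot(finger_line, target_line) /
                 (np.linalg.norm(finger_line) * np.linalg.norm(target_line)))
    orientation_error = np.degrees(np.arccos(np.clip(cosang, 0.0, 1.0)))
    return translation_error, orientation_error
```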
93
Positioning Accuracy (figures, right camera only): hybrid controller, initial pose; hybrid controller, final pose.
94
Positioning Accuracy (figures, right camera only): ECL controller, final pose; EOL controller, final pose.
95
Positioning Accuracy Accuracy measured over 5 trials per controller.
96
Tracking Robustness Initial pose: gripper outside FOV (ECL control)
Gripper enters field of view (Hybrid control, stereo) Final pose: gripper obscured (Hybrid control, mono)
97
Tracking Robustness (figures): translational component of the pose error and estimated scale (camera calibration parameter) for EOL, hybrid mono and hybrid stereo control.
98
Baseline Error Error introduced in the calibrated baseline: the baseline is scaled by factors between 0.7 and 1.5. Hybrid PBVS performance in the presence of this error:
99
Verge Error Error introduced in the calibrated verge angle: offsets between –6 and +8 degrees. Hybrid PBVS performance in the presence of this error:
100
Servoing Task
101
Conclusions We have proposed a hybrid PBVS scheme to solve problems in real-world tasks: Kinematic measurements overcome occlusions Visual measurements improve accuracy and overcome calibration errors Experimental results verify the increased accuracy and robustness compared to conventional methods.
102
Visual Perception and Robotic Manipulation (Springer Tracts in Advanced Robotics), Chapter 7: System Integration and Experimental Results. Geoffrey Taylor and Lindsay Kleeman, Intelligent Robotics Research Centre (IRRC), Department of Electrical and Computer Systems Engineering, Monash University, Australia.
103
Overview Stereoscopic light stripe scanning
Object modelling and classification; multi-cue tracking (edges, texture, colour); visual servoing; real-world experimental manipulation tasks with an upper-torso humanoid robot.
104
Motivation To enable a humanoid robot to perform manipulation tasks in a domestic environment: a domestic helper for the elderly and disabled. Key challenges: ad hoc tasks with unknown objects; robustness to measurement noise/interference; robustness to calibration errors; interaction to resolve ambiguities; real-time operation.
105
Architecture
106
Light Stripe Scanning (figure: stripe generator, camera, scanned object, distances B and D). Triangulation-based depth measurement.
107
Stereo Stripe Scanner (figure: scanned object point X, laser diode, left and right cameras L and R with image-plane measurements xL and xR, angle θ, baseline 2b). Three independent measurements provide redundancy for validation.
108
Reflections/Cross Talk
109
Single Camera Result (figures): single camera scanner versus robust stereoscopic scanner.
110
3D Object Modelling Want to find objects with minimal prior knowledge. Use geometric primitives to represent objects; segment the 3D scan based on local surface shape. (Figure: surface type classification.)
111
Segmentation Fit plane, sphere, cylinder and cone to segments.
Merge segments to improve the fit of primitives. (Figures: raw scan; surface type classification; final segmentation; geometric models.)
112
Object Classification
Scene described by adjacency graph of primitives. Objects described by known sub-graphs.
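As an illustration of the sub-graph idea, the sketch below uses networkx with hypothetical primitive labels; the matcher is a stand-in for whatever graph matching the system actually uses.

```python
# Illustration of the sub-graph idea with hypothetical primitive labels;
# networkx's matcher is a stand-in for the system's actual graph matching.
import networkx as nx
from networkx.algorithms import isomorphism

# Scene graph: one node per fitted primitive, edges between adjacent primitives.
scene = nx.Graph()
scene.add_nodes_from([(0, {'type': 'plane'}),      # e.g. table top
                      (1, {'type': 'cylinder'}),   # e.g. cup body
                      (2, {'type': 'sphere'})])    # e.g. ball resting nearby
scene.add_edges_from([(0, 1), (0, 2)])

# Object model: a 'cup' described as a cylinder adjacent to a supporting plane.
cup = nx.Graph()
cup.add_nodes_from([('body', {'type': 'cylinder'}),
                    ('support', {'type': 'plane'})])
cup.add_edge('body', 'support')

matcher = isomorphism.GraphMatcher(
    scene, cup, node_match=isomorphism.categorical_node_match('type', None))
print(matcher.subgraph_is_isomorphic())            # True: a cup-like sub-graph exists
```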
113
Modeling Results Box, ball and cup: Raw colour/range scan
Textured polygonal models
114
Multi-Cue Tracking Individual cues are only robust under limited conditions: edges fail in low contrast and are distracted by texture; textures are not always available and are distracted by reflections; colour gives only partial pose. Fusion of multiple cues provides robust tracking in unpredictable conditions.
115
Tracking Framework 3D model-based tracking: models are built from light stripe range data. Colour (selector), edges and texture (trackers) are measured simultaneously in every frame. Measurements are fused in an extended Kalman filter: cues interact with the state through measurement models; individual cues need not recover the complete pose; extensible to any cues/cameras for which a measurement model exists.
116
Colour Cues Filter created from a colour histogram in the ROI: foreground colours are promoted in the histogram and background colours are suppressed. (Figures: captured image used to generate the filter; output of the resulting filter.)
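A rough sketch of building such a filter, assuming a ratio of foreground to background histograms approximates "promote foreground, suppress background" (a hue-saturation histogram is used here as an assumption, for compactness):

```python
# Rough sketch of building the filter: a ratio of foreground to background
# histograms approximates "promote foreground, suppress background". A
# hue-saturation histogram is used here as an assumption for compactness.
import cv2
import numpy as np

def build_colour_filter(image_bgr, fg_mask):
    """fg_mask: 8-bit mask of the object region (255 inside the ROI object)."""
    hsv = cv2.cvtColor(image_bgr, cv2.COLOR_BGR2HSV)
    bins, ranges = [32, 32], [0, 180, 0, 256]
    h_fg = cv2.calcHist([hsv], [0, 1], fg_mask, bins, ranges)
    h_bg = cv2.calcHist([hsv], [0, 1], cv2.bitwise_not(fg_mask), bins, ranges)
    # Colours frequent in the foreground score high, background colours low.
    ratio = np.clip(255.0 * h_fg / (h_bg + 1.0), 0, 255).astype(np.float32)
    return ratio   # apply to new frames with cv2.calcBackProject
```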
117
Edge Cues (figures): predicted projected edges → Sobel mask directional edges → combined with colour to get silhouette edges → fitted edges.
118
Texture Cues (figures): rendered prediction → feature detector → matched templates → outlier rejection → final matched features.
119
Tracking Result
120
Visual Servoing Position-based 3D visual servoing (IROS 2004).
Fusion of visual and kinematic measurements.
121
Visual Servoing 6D pose of hand estimated using extended Kalman filter with visual and kinematic measurements. State vector also includes hand-eye transformation and camera model parameters for calibration.
122
Grasping Task Grasp a yellow box without prior knowledge of objects in the scene.
123
Grasping Task
124
Pouring Task Pour the contents of a cup into a bowl.
125
Pouring Task
126
Smell Experiment Fusion of vision, smell and airflow sensing to locate and grasp a cup containing ethanol.
127
Summary Integration of stereoscopic light stripe sensing, geometric object modelling, multi-cue tracking and visual servoing allows robot to perform ad hoc tasks with unknown objects. Suggested directions for future research: Integrate tactile and force sensing Cooperative visual servoing of both arms Interact with objects to learn and refine models Verbal and gestural human-machine interaction