Automatic Image Collection of Objects with Similar Function by Learning Human Grasping Forms Shinya Morioka, Tadashi Matsuo, Yasuhiro Hiramoto, Nobutaka Shimada, Yoshiaki Shirai Ritsumeikan University
Outline Motivation Related Work Proposed Method Results Conclusions
Motivation(1/2) How to classify unknown objects into categories such as “for drinking”, “for cutting”... ? To learn this relation, many labeled images are required. for drinkingfor cutting but... Object Hand
Motivation(2/2) How to collect labeled images? Object features may be invisible ! We propose a method to estimate an object region and standard coordinate system based on the wrist. Extracted SURF features Labeled image normalized with wrist
Related work R. Filipovych, E. Ribeiro, “Recognizing Primitive Interactions by Exploring Actor-Object States”, Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.1-8(2008). They focus on estimation of a time sequence of states when one interact with an object. It is based on the relation between local appearances and interaction state. It is difficult to estimate an object region in a single image. We propose a method to estimate an object region and standard coordinate system based on the wrist.
Proposed Method 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Proposed Method(1/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Training a local estimator of wrist position wrist position (manual ) Local coordinate for SURF feature 1 Local coordinate for SURF feature2 Train Randomized Trees(RTs) with relative wrist positions for SURF features. B : block B w : the block including wrist position With trained RTs, we can calculate......
Proposed Method(2/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Estimating wrist position(1/2) SURF feature extraction Classification of features by SVM Reduced features classified as hand As preprocess, we remove features apparently from outside of a hand because they interfere the following voting process. Hand Background 2-Class SVM Teachers HandBackground Input : SURF descriptor Output : Hand or Background
Probability distribution by local estimators Likelihood as wrist position Estimated wrist position Reduced SURF features accumulate Estimating wrist position(2/2) + RTs... Find Maximum
RESULTS : estimated wrist position : wrist position
Proposed Method(3/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Likelihood as a hand part LowHigh Likelihood as a hand part If the j-th SURF feature originates from a hand, should be high around the estimated wrist position. We take as a likelihood as a hand part of the j-th SURF feature. Likelihood as a hand part is useful to estimate a hand region and an object region.
RESULTS : Likelihood as a hand part High Low
Object center High Low Wrist position K-means clustering Training a local estimator of object center We train RTs with represented by each local coordinate. (Here, we collect the vectors from images with simple background.) With trained RTs, we can calculate...
Proposed Method(4/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Probability distribution by local estimators Likelihood as object center Estimated object center Reduced SURF features accumulate Estimating object center + RTs... Find Maximum
RESULTS : estimated object center : Object center : wrist position
Generating wrist-object coordinate system wrist-object coordinate system Normalized image wrist position = (0,0) object center = (-1, 0)
Proposed Method(5/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
Training a estimator of object region on the wrist-object coordinate system One-Class SVMs are built for hand and object. SVM for Hand region SVM for Object region Input : Output :Whether is on an object or not Classification into hand, object, and background classes Input : Output :Whether is on a hand or not We collect teachers from the K-means clustering result of images with simple background
Proposed Method(6/6) 1.Training a local estimator of wrist position 2.Estimating wrist position 3.Training a local estimator of object position 4.Generating wrist-object coordinate system 5.Training a estimator of object region on the wrist-object coordinate system 6.Estimating an object region TrainingEstimating
TOTAL RESULTS (movie) HandObjectBackground
A cup used on training
Conclusion Future work Object recognition by learning the relation between an object and a posture of a hand grasping it. With the proposed method, a wrist can be found and an object center can be estimated from a set of the wrist and local features. The wrist and the object center make a wrist- object coordinate suitable for learning a shape of a grasped object.
Limitation Local estimators and SVMs must be generated for each category. Local estimator for a cup Local estimator for a scissors
Limitation A local estimator and a SVM must be trained with images known to be in a single category. Teachers for a cup Teachers for a scissors
Limitation Classification of unknown objects is not implemented. for drinking Unknown object image
Training a local estimator of wrist position 学習画像に教師信号 (手首位置)を与える SURF 特徴検出 画面座標から相対位置関係の SURF 特徴を基準とした 局所座標に変換する。 座標系上で SURF 特徴を入力すると手首位置確率分布 を出力する Randomized Trees (RTs ) を学習する。 wrist position (manual )
Background(2/2) How to collect labeled images? Object features may be invisible ! We use local features on a hand to estimate object regions and its relative position. Extracted SURF features Labeled image normalized with wrist
Training a local estimator of object position 高 低 手首位置 物体重心位置 物体重心 手首位置 SURF 特徴の位置と手らしさ から特 徴と手と物体にラベル付け。 手ラベルのついた SURF 特徴を入力すると物体 重心位置確率分布を出力する RTs を学習。 手首位置検出と の算出手と物体クラス分類局所座標系の設定
Generating wrist-object coordinate system SURF 特徴を RTs に入力すると、物体重心位置確率 分布が得られる。 手首位置と物体重心位置から手と物体の大まかな 領域が予測できる。 物体の位置・方向・スケールを正規化するために、 手首ー物体座標系を基準に、手・物体特徴を変換 する。 物体重心位置 投票結果物体重心位置検出結果手首ー物体座標系