SFNet: Learning Object-aware Semantic Correspondence

SFNet: Learning Object-aware Semantic Correspondence. Junghyup Lee¹, Dohyung Kim¹, Jean Ponce²,³, Bumsub Ham¹. ¹Yonsei University, ²DI ENS, ³INRIA. CVPR 2019 (Oral)

Background: Dense correspondence focuses on different views of the same scene or object (stereo matching, optical flow between adjacent frames, image warping/alignment, …). Establishing dense correspondences across images is one of the fundamental tasks in computer vision. Classical pipeline: keypoint detection → keypoint description → matching → outlier cleansing (RANSAC). Fig. 1: SIFT correspondences.
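The last step of this classical pipeline can be illustrated with a minimal RANSAC sketch in Python/NumPy. It assumes the putative matches are related by a 2 × 3 affine transform (real pipelines often fit a homography or fundamental matrix instead); the function names, the 3-pixel threshold, the iteration count and the synthetic data are illustrative choices, not taken from the slides.

```python
import numpy as np

def fit_affine(src, dst):
    """Least-squares 2x3 affine A such that dst ≈ src @ A[:, :2].T + A[:, 2]."""
    X = np.hstack([src, np.ones((len(src), 1))])       # (n, 3) homogeneous source points
    A_T, *_ = np.linalg.lstsq(X, dst, rcond=None)      # (3, 2)
    return A_T.T                                       # (2, 3)

def apply_affine(A, pts):
    return pts @ A[:, :2].T + A[:, 2]

def ransac_affine(src, dst, n_iters=500, thresh=3.0, seed=0):
    rng = np.random.default_rng(seed)
    best_inliers = np.zeros(len(src), dtype=bool)
    for _ in range(n_iters):
        idx = rng.choice(len(src), size=3, replace=False)   # minimal sample for an affine model
        A = fit_affine(src[idx], dst[idx])
        err = np.linalg.norm(apply_affine(A, src) - dst, axis=1)
        inliers = err < thresh
        if inliers.sum() > best_inliers.sum():               # keep the hypothesis with most support
            best_inliers = inliers
    A = fit_affine(src[best_inliers], dst[best_inliers])     # refit on all inliers
    return A, best_inliers

# Synthetic matches: 50 points under a known affine map, the first 10 corrupted.
rng = np.random.default_rng(1)
src = rng.uniform(0, 100, size=(50, 2))
A_true = np.array([[1.1, 0.2, 5.0], [-0.1, 0.9, -3.0]])
dst = apply_affine(A_true, src)
dst[:10] += rng.uniform(20, 40, size=(10, 2))               # gross outliers
A_est, inliers = ransac_affine(src, dst)
```

Sampling only three matches per hypothesis (the minimum for an affine model) keeps the chance of drawing an all-inlier sample high even with many outliers, and the final least-squares refit uses every inlier of the best hypothesis.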

Background: Semantic correspondence focuses on different instances of the same object or scene category. Semantic correspondence algorithms (e.g., SIFT Flow [30]) go one step further, finding a dense flow field between images depicting different instances of the same object or scene category. This is very challenging, especially in the presence of large changes in appearance or scene layout and background clutter. Hand-crafted features do not capture high-level semantics (e.g., appearance and shape variations) and are not robust to image-specific details (e.g., texture, background clutter, occlusion), so research remained stagnant until the deep learning revolution. Fig. 2: Semantic correspondences. Jonathan L. Long, Ning Zhang, and Trevor Darrell. Do convnets learn correspondence? In NIPS, 2014.

Related Methods: Semantic correspondence (deep learning methods). Training data is a major obstacle:
1. Pixel-level correspondence annotation is extremely labor-intensive and subjective.
2. Keypoint-level correspondence annotation is not sufficient for training a CNN.
…
Two families of approaches: the proposal approach and the geometric alignment approach. All nodes in a neighborhood have the same label. Nevertheless, steady progress has been made.

Related Methods: Semantic correspondence (deep learning methods). Proposal approach: generate proposals (Fast R-CNN) → match the proposals → convert the proposal matches into dense correspondence. Drawbacks: the matching result is not smooth, and the pipeline is somewhat complicated.

Related Methods: Semantic correspondence (deep learning methods). Geometric alignment approach: estimate the parameters of a global transformation (e.g., affine, homography, thin plate spline). Fig. 3: Pipeline of a typical geometric alignment approach; the input is two images and the output is the 6 parameters of an affine transformation. Pros: does not rely on annotation; the result is naturally smooth. Cons: sensitive to non-rigid deformation; does not really establish semantic correspondence.
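To make the "6 parameters" concrete, the sketch below (Python/NumPy) arranges them as a 2 × 3 matrix theta, maps every output pixel (in normalized [-1, 1] coordinates) through it, and samples the source image. The function names and the nearest-neighbour sampling are simplifications of our own; CNN-based alignment methods typically use differentiable bilinear sampling as in spatial transformer networks.

```python
import numpy as np

def affine_grid(theta, h, w):
    """For each output pixel (normalized to [-1, 1]), the (x, y) source location given by theta."""
    ys, xs = np.meshgrid(np.linspace(-1, 1, h), np.linspace(-1, 1, w), indexing="ij")
    coords = np.stack([xs, ys, np.ones_like(xs)], axis=-1)   # (h, w, 3) homogeneous coordinates
    return coords @ theta.T                                   # (h, w, 2)

def warp_nearest(img, grid):
    """Sample img at the grid locations with nearest-neighbour lookup, clamped to the border."""
    h_in, w_in = img.shape
    x = np.clip(np.rint((grid[..., 0] + 1) * (w_in - 1) / 2), 0, w_in - 1).astype(int)
    y = np.clip(np.rint((grid[..., 1] + 1) * (h_in - 1) / 2), 0, h_in - 1).astype(int)
    return img[y, x]

img = np.arange(16, dtype=float).reshape(4, 4)
identity = np.array([[1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])   # the 6 parameters, row by row
flip_x = np.array([[-1.0, 0.0, 0.0], [0.0, 1.0, 0.0]])    # horizontal mirror
same = warp_nearest(img, affine_grid(identity, 4, 4))
flipped = warp_nearest(img, affine_grid(flip_x, 4, 4))
```

Because one theta is shared by every pixel, the warp is smooth by construction, but it also cannot represent non-rigid, part-specific deformations, which is the limitation noted above.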

Proposed Method: Overview

Proposed Method: Feature extraction and matching. Features are extracted with ResNet (conv5) and passed through an adaptation layer (Conv, Batch-norm, ReLU), giving an h × w × d feature map for each of the source and target images. p: position in the source feature map; q: position in the target feature map.
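A minimal sketch of the matching step, assuming the correlation c_p(q) is the dot product of L2-normalized source and target features (i.e., cosine similarity). The 20 × 20 × 256 random feature maps and the function names are placeholders, not the actual SFNet features.

```python
import numpy as np

def l2_normalize(f, eps=1e-8):
    """Normalize each d-dimensional feature vector to unit length."""
    return f / (np.linalg.norm(f, axis=-1, keepdims=True) + eps)

def correlation(f_src, f_tgt):
    """c[p, q] = <f_src(p), f_tgt(q)> for every source position p and target position q."""
    d = f_src.shape[-1]
    fs = l2_normalize(f_src).reshape(-1, d)   # (h*w, d), one row per source position p
    ft = l2_normalize(f_tgt).reshape(-1, d)   # (h*w, d), one row per target position q
    return fs @ ft.T                          # row p is the correlation map c_p

rng = np.random.default_rng(0)
f_src = rng.normal(size=(20, 20, 256))        # stand-in for an h x w x d feature map
c = correlation(f_src, f_src)                 # matching an image with itself
```

Matching a feature map with itself is a quick sanity check: every position p should correlate most strongly with the same position q = p.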

Proposed Method: Kernel soft argmax layer. c_p: correlation map for position p, evaluated at each spatial position q of the target (a w × h map). We need to find the position with the maximum response.
Argmax: not differentiable, and it does not allow fine-grained (sub-pixel) localization at the image level.
Soft argmax: returns the expected position E = Σ_i p_i · i under the softmax distribution, which is inaccurate for multi-modal distributions. For example (Python/NumPy):

    import numpy as np
    data = np.array([0.1, 0.3, 0.6, 2.1, 0.55])
    probs = np.exp(data) / np.sum(np.exp(data))   # softmax
    # probs ≈ [0.0780, 0.0952, 0.1285, 0.5760, 0.1223]
    position = np.sum(probs * np.arange(len(data)))
    # position ≈ 2.5694, although the argmax is index 3

Proposed Method: Kernel soft argmax layer. Apply a Gaussian kernel to suppress responses away from the peak, so that the distribution has a single peak. c_p: correlation map for position p at each spatial position q (w × h). n_p: L2-normalized c_p. (Illustration: a w × h one-hot map that is 1 only at argmax(n_p).) k_p: a 2-D Gaussian kernel centered at argmax(n_p). β: a hyperparameter.
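The equations for this layer did not survive in the transcript. Under our reading of the definitions above, the layer computes m_p(q) = exp(β k_p(q) n_p(q)) / Σ_q' exp(β k_p(q') n_p(q')) and returns the expected position φ(p) = Σ_q q m_p(q). The NumPy sketch below implements that reading for a single source position p and compares it with a plain soft argmax; σ = 2, β = 50 and the function names are assumptions for illustration.

```python
import numpy as np

def _expected_xy(logits):
    """Softmax over all target positions q, then the expected (x, y) position."""
    h, w = logits.shape
    ys, xs = np.mgrid[0:h, 0:w]
    m = np.exp(logits - logits.max())          # numerically stable softmax
    m /= m.sum()
    return np.array([(m * xs).sum(), (m * ys).sum()])

def soft_argmax(c_p, beta=50.0):
    n_p = c_p / (np.linalg.norm(c_p) + 1e-8)   # L2-normalized correlation map
    return _expected_xy(beta * n_p)

def kernel_soft_argmax(c_p, beta=50.0, sigma=2.0):
    n_p = c_p / (np.linalg.norm(c_p) + 1e-8)
    h, w = n_p.shape
    ys, xs = np.mgrid[0:h, 0:w]
    y0, x0 = np.unravel_index(np.argmax(n_p), n_p.shape)   # peak of n_p
    k_p = np.exp(-((xs - x0) ** 2 + (ys - y0) ** 2) / (2 * sigma ** 2))  # Gaussian around the peak
    return _expected_xy(beta * k_p * n_p)

# Two competing peaks: (x, y) = (5, 5) with response 1.0 and (15, 15) with 0.98.
c_p = np.zeros((20, 20))
c_p[5, 5], c_p[15, 15] = 1.0, 0.98
plain = soft_argmax(c_p)           # pulled towards the second peak
kernel = kernel_soft_argmax(c_p)   # stays at the dominant peak
```

With two nearly equal peaks, the plain soft argmax lands between them (around (8.3, 8.3) here), while the Gaussian kernel suppresses the second peak, so the kernel soft argmax stays at the dominant one and still yields a sub-pixel, softmax-based output.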

Proposed Method: How to use the matching result. Consider a pixel located at (x, y) in a source image of 320 × 320 resolution. Assuming the feature map is 20 × 20 (a stride of 16), its position in the feature map is (int(x/16), int(y/16)). The argmax then gives the corresponding position in the target feature map, φ(x/16, y/16). Loss.
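The lookup described above can be sketched as follows, assuming the stride of 16 from the slide (320 / 20) and a flow field phi that stores matched (x, y) feature-map positions indexed as [row, column]. The names and the toy flow are hypothetical.

```python
import numpy as np

STRIDE = 16  # 320-pixel image / 20-cell feature map, as on the slide

def transfer_keypoint(x, y, phi):
    """Map a source keypoint (x, y) in image pixels to target image pixels.

    phi: (h, w, 2) array; phi[j, i] is the matched (x, y) position in the
    target feature map for source feature-map cell (i, j).
    """
    i, j = int(x / STRIDE), int(y / STRIDE)     # source feature-map cell
    qx, qy = phi[j, i]                          # matched target feature-map position
    return float(qx * STRIDE), float(qy * STRIDE)

# Toy flow: every source cell matches the target cell one column to the right.
ys, xs = np.mgrid[0:20, 0:20]
phi = np.stack([xs + 1, ys], axis=-1).astype(float)
print(transfer_keypoint(100, 50, phi))  # (112.0, 48.0)
```

Truncating with int() quantizes the keypoint to a 16-pixel cell; bilinearly upsampling the flow field to image resolution would avoid this, but that refinement is our suggestion, not something stated on the slide.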

Proposed Method: Mask consistency loss. This loss prevents matches from the foreground to the background. First, we define the flow field of the source image; then we reconstruct the source mask from the ground-truth target mask using this flow field (warping as in spatial transformer networks). Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. In NIPS, 2015.
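A minimal sketch of the mask consistency loss under stated assumptions: flow_src holds per-pixel (dx, dy) displacements from source to target, the target mask is warped back with bilinear sampling, and the loss is the mean squared difference from the source mask. The exact form of the loss in the paper may differ; all names are illustrative.

```python
import numpy as np

def bilinear_sample(img, x, y):
    """Sample img (h, w) at real-valued coordinates x, y, clamped to the border."""
    h, w = img.shape
    x = np.clip(x, 0, w - 1)
    y = np.clip(y, 0, h - 1)
    x0, y0 = np.floor(x).astype(int), np.floor(y).astype(int)
    x1, y1 = np.minimum(x0 + 1, w - 1), np.minimum(y0 + 1, h - 1)
    ax, ay = x - x0, y - y0                                   # interpolation weights
    top = (1 - ax) * img[y0, x0] + ax * img[y0, x1]
    bottom = (1 - ax) * img[y1, x0] + ax * img[y1, x1]
    return (1 - ay) * top + ay * bottom

def mask_consistency_loss(mask_src, mask_tgt, flow_src):
    """Warp the target mask back to the source frame and compare with the source mask."""
    h, w = mask_src.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    recon_src = bilinear_sample(mask_tgt, xs + flow_src[..., 0], ys + flow_src[..., 1])
    return np.mean((recon_src - mask_src) ** 2)

mask_src = np.zeros((10, 10)); mask_src[2:6, 2:6] = 1
mask_tgt = np.zeros((10, 10)); mask_tgt[2:6, 5:9] = 1        # same object, 3 px to the right
good = np.zeros((10, 10, 2)); good[..., 0] = 3.0              # correct flow: +3 px in x
print(mask_consistency_loss(mask_src, mask_tgt, good))                  # 0.0
print(mask_consistency_loss(mask_src, mask_tgt, np.zeros_like(good)))   # 0.24
```

A flow that sends foreground pixels onto the target background reconstructs zeros inside the source object and is penalized, which is how such a loss discourages foreground-to-background matches.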

Proposed Method: Flow consistency loss. (Figure: one-to-one matching vs. unstable, many-to-one matching between points a, b and x.) If the source and target flow fields are consistent with each other, corresponding flow vectors have the same magnitude and opposite directions. Tinghui Zhou, Philipp Krahenbuhl, Mathieu Aubry, Qixing Huang, and Alexei A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In CVPR, 2016.
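The consistency condition above (forward and backward flow vectors of equal magnitude and opposite direction) can be written as F_s(p) + F_t(p + F_s(p)) ≈ 0. The sketch below penalizes the norm of this residual over foreground source pixels; the masking, the nearest-neighbour lookup and the names are our assumptions, not the paper's exact formulation.

```python
import numpy as np

def flow_consistency_loss(flow_src, flow_tgt, mask_src):
    """Mean norm of F_s(p) + F_t(p + F_s(p)) over foreground source positions.

    flow_*: (h, w, 2) displacements (dx, dy); mask_src: (h, w) binary foreground mask.
    The target flow is looked up with nearest-neighbour sampling for brevity.
    """
    h, w, _ = flow_src.shape
    ys, xs = np.mgrid[0:h, 0:w]
    qx = np.clip(np.rint(xs + flow_src[..., 0]), 0, w - 1).astype(int)   # matched target x
    qy = np.clip(np.rint(ys + flow_src[..., 1]), 0, h - 1).astype(int)   # matched target y
    residual = flow_src + flow_tgt[qy, qx]          # zero for a consistent forward/backward pair
    err = np.linalg.norm(residual, axis=-1)
    return (err * mask_src).sum() / max(mask_src.sum(), 1)

fs = np.zeros((10, 10, 2)); fs[..., 0] = 3.0        # source -> target: +3 px in x
ft_good = -fs                                       # target -> source: -3 px in x
ft_bad = np.zeros_like(fs)                          # target flow ignores the source flow
mask = np.ones((10, 10))
print(flow_consistency_loss(fs, ft_good, mask))     # 0.0
print(flow_consistency_loss(fs, ft_bad, mask))      # 3.0
```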

Proposed Method: Smoothness loss. The matching (flow) within the foreground object should be smooth.
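One simple way to express this smoothness prior, assumed here for illustration, is an L1 penalty on differences between neighbouring flow vectors, counted only where both neighbours lie in the foreground mask. The paper's exact smoothness term (its order and norm) may differ, and the function name is hypothetical.

```python
import numpy as np

def smoothness_loss(flow, mask):
    """Mean L1 difference between horizontally/vertically adjacent foreground flow vectors."""
    dx = np.abs(flow[:, 1:] - flow[:, :-1]).sum(-1)   # (h, w-1) horizontal differences
    dy = np.abs(flow[1:, :] - flow[:-1, :]).sum(-1)   # (h-1, w) vertical differences
    mx = mask[:, 1:] * mask[:, :-1]                   # 1 where both horizontal neighbours are foreground
    my = mask[1:, :] * mask[:-1, :]                   # 1 where both vertical neighbours are foreground
    return ((dx * mx).sum() + (dy * my).sum()) / max(mx.sum() + my.sum(), 1)

flow = np.zeros((6, 6, 2)); flow[:, 3:, 0] = 2.0      # flow jumps between columns 2 and 3
full = np.ones((6, 6))
left = np.ones((6, 6)); left[:, 3:] = 0               # object ends exactly at the jump
print(smoothness_loss(flow, full))   # 0.2: jump inside the object is penalized
print(smoothness_loss(flow, left))   # 0.0: jump at the object boundary is not
```

Restricting the penalty to foreground pairs means flow discontinuities at the object boundary are not penalized, as the second call shows.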

Experiments: Datasets. WILLOW: 900 image pairs of 4 categories (keypoint and bounding-box annotations). PASCAL: 1351 image pairs of 20 categories (keypoint and bounding-box annotations). Caltech-101: annotated with object masks.

Experiments: Metric 1, PCK (probability of correct keypoint). The results compare geometric alignment approaches with proposal (flow field) approaches. Legend for the type of supervision: T = transformation parameters, P = image pairs depicting different instances, B = bounding boxes, M = masks.
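A minimal PCK sketch: a transferred keypoint counts as correct if it lies within α · max(h, w) of the ground truth, where h and w are the height and width of the object bounding box. α = 0.1 and the bounding-box normalization are common conventions assumed here; benchmarks differ (some normalize by image size instead), and the function name is illustrative.

```python
import numpy as np

def pck(pred, gt, bbox_hw, alpha=0.1):
    """Fraction of predicted keypoints within alpha * max(bbox height, width) of the ground truth.

    pred, gt: (n, 2) keypoint coordinates in the target image; bbox_hw: (height, width).
    """
    thresh = alpha * max(bbox_hw)
    dist = np.linalg.norm(np.asarray(pred, float) - np.asarray(gt, float), axis=1)
    return np.mean(dist <= thresh)

gt = [[10, 10], [50, 50], [90, 90]]
pred = [[12, 11], [50, 70], [90, 90]]   # errors of about 2.2, 20 and 0 pixels
score = pck(pred, gt, bbox_hw=(100, 80))  # threshold 10 px -> 2 of 3 correct
```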

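Experiments: Metric 2, LT-ACC / IoU. PCK only evaluates keypoint matching, but the matching result (a dense flow field) can be used to align entire images, so label transfer accuracy (LT-ACC) and the intersection-over-union (IoU) of transferred masks measure alignment quality over all pixels.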
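Per image, the two alignment metrics can be sketched as below, assuming the source label map (e.g., a mask) has already been warped into the target frame with the dense flow. How results are averaged over images or classes is benchmark-specific and not shown; the function names are illustrative.

```python
import numpy as np

def label_transfer_accuracy(transferred, gt):
    """Fraction of pixels whose transferred label equals the ground-truth label."""
    return np.mean(transferred == gt)

def iou(transferred, gt, label=1):
    """Intersection over union of the pixels carrying `label` in both maps."""
    a, b = transferred == label, gt == label
    union = np.logical_or(a, b).sum()
    return np.logical_and(a, b).sum() / union if union else 1.0

gt = np.zeros((4, 4), dtype=int); gt[:2, :2] = 1    # 2 x 2 foreground
tr = np.zeros((4, 4), dtype=int); tr[:2, :3] = 1    # transferred mask spills one column
acc = label_transfer_accuracy(tr, gt)   # 14 of 16 pixels agree
overlap = iou(tr, gt)                   # intersection 4, union 6
```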

Experiments: Ablation

Experiments: Examples of alignment

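Conclusion: We present a CNN model that learns semantic flow end-to-end, using binary foreground masks to learn pixel-to-pixel correspondences. Key components: the kernel soft argmax layer and the loss function design (mask consistency, flow consistency and smoothness losses).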