SFNet: Learning Object-aware Semantic Correspondence


1 SFNet: Learning Object-aware Semantic Correspondence
Junghyup Lee1, Dohyung Kim1, Jean Ponce2,3, Bumsub Ham1   1Yonsei University  2DI ENS  3INRIA   CVPR 2019 (Oral)

2 Background Dense correspondence: focuses on different views of the same scene/object. Examples: stereo matching, optical flow between adjacent frames, image warping/alignment. Establishing dense correspondences across images is one of the fundamental tasks in computer vision. Fig. 1: SIFT correspondences.

3 Background Dense correspondence: focuses on different views of the same scene/object (stereo matching, optical flow between adjacent frames, image warping/alignment). Classical pipeline: keypoint detection => keypoint description => matching => outlier cleansing (RANSAC). Establishing dense correspondences across images is one of the fundamental tasks in computer vision. Fig. 1: SIFT correspondences.

4 Background Semantic correspondence: focuses on different instances of the same object/scene category. Semantic correspondence algorithms (e.g., SIFT Flow [30]) go one step further, finding a dense flow field between images depicting different instances of the same object or scene category. This is very challenging, especially in the presence of large changes in appearance/scene layout and background clutter. Research remained stagnant until the deep learning revolution: hand-crafted features do not capture high-level semantics (e.g., appearance and shape variations) and are not robust to image-specific details (e.g., texture, background clutter, occlusion). Fig. 2: semantic correspondences. Jonathan L. Long, Ning Zhang, and Trevor Darrell. Do convnets learn correspondence? In NIPS, 2014.

5 Related Methods Semantic Correspondence (Deep Learning Methods)
Challenges for supervision: 1. Pixel-level correspondence annotation is extremely labor-intensive and subjective. 2. Keypoint-level correspondence annotation is not sufficient for training CNNs. Despite this, progress has been made along two main lines: proposal-based approaches and geometric alignment approaches.

6 Related Methods Semantic Correspondence (Deep Learning Methods)
Proposal-based approach: generate object proposals (e.g., with Fast R-CNN) => match proposals => convert the proposal matches into dense correspondences. Drawbacks: the matching result is not smooth, and the pipeline is somewhat complicated.

7 Related Methods Semantic Correspondence (Deep Learning Methods)
Geometric alignment approach: estimate the parameters of a global transformation (e.g., affine, homography, thin-plate spline). Fig. 3: pipeline of a typical geometric alignment approach; the input is two images, the output is the 6 parameters of an affine transformation. Pros: does not rely on correspondence annotations; the result is naturally smooth. Cons: sensitive to non-rigid deformation; does not really establish dense semantic correspondence.

8 Proposed Method Overview

9 Proposed Method Feature extraction and matching
Features are extracted with a ResNet backbone (conv5) and passed through an adaptation layer (Conv + Batch-Norm + ReLU), giving h x w x d feature maps. p: a position in the source feature map; q: a position in the target feature map.
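A minimal NumPy sketch of the matching step under these definitions (the names correlation_map, feat_src, feat_tgt are illustrative, not from the authors' code): each d-dimensional feature is L2-normalized and compared with every target feature by a dot product, giving a correlation value c(p, q) for every pair of positions.

import numpy as np

def correlation_map(feat_src, feat_tgt, eps=1e-8):
    # feat_src, feat_tgt: (h, w, d) feature maps from the adaptation layer
    h, w, d = feat_src.shape
    src = feat_src.reshape(-1, d)                                    # (h*w, d), one row per position p
    tgt = feat_tgt.reshape(-1, d)                                    # (h*w, d), one row per position q
    src = src / (np.linalg.norm(src, axis=1, keepdims=True) + eps)   # L2-normalize each feature
    tgt = tgt / (np.linalg.norm(tgt, axis=1, keepdims=True) + eps)
    return src @ tgt.T                                               # (h*w, h*w): c[p, q] = cosine similarity

# c_p for a single source position p is one row of this matrix, reshaped to (h, w).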

10 Proposed Method Kernel soft argmax layer
c_p: correlation map for source position p over all target positions q (size w x h). We have to find the position with the maximum response. Argmax: not differentiable, and it does not enable fine-grained localization at the image level. Soft argmax: takes the expectation E = sum_i p_i * i over the softmax of the scores, but it is not accurate for multi-modal distributions, e.g.:
import numpy as np
data = np.array([0.1, 0.3, 0.6, 2.1, 0.55])
data = np.exp(data) / np.sum(np.exp(data))            # softmax: array([0.078, 0.095, 0.129, 0.576, 0.122])
position = np.sum(data * np.array([0, 1, 2, 3, 4]))   # position ≈ 2.57, whereas argmax gives 3

11 Proposed Method Kernel soft argmax layer
Apply a Gaussian kernel to suppress the other modes, so that the distribution has a single peak. c_p: correlation map for source position p over all target positions q (size w x h). n_p: L2-normalized c_p. k_p: a 2-D Gaussian kernel of size w x h centered at argmax(n_p), i.e. a smoothed version of the one-hot map [0,0,0,0,...,0; 0,0,0,1,...,0; ...; 0,0,0,0,...,0]. β: the softmax temperature hyperparameter.
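A minimal NumPy sketch of the kernel soft argmax for a single source position (the kernel width sigma and the temperature beta below are illustrative assumptions, not the paper's values):

import numpy as np

def kernel_soft_argmax(c_p, beta=50.0, sigma=5.0):
    # c_p: (h, w) correlation map for one source position p
    h, w = c_p.shape
    n_p = c_p / (np.linalg.norm(c_p) + 1e-8)               # L2-normalized correlation map
    iy, ix = np.unravel_index(np.argmax(n_p), n_p.shape)   # hard argmax location
    ys, xs = np.mgrid[0:h, 0:w]
    k_p = np.exp(-((ys - iy) ** 2 + (xs - ix) ** 2) / (2 * sigma ** 2))  # Gaussian kernel around the argmax
    m = k_p * n_p                                          # suppress modes far from the argmax
    prob = np.exp(beta * m)                                # softmax with temperature beta
    prob /= prob.sum()
    # expected coordinates: differentiable and close to the hard argmax
    return np.sum(prob * xs), np.sum(prob * ys)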

12 Proposed Method How to use the matches / Loss
For a pixel located at (x, y) in a 320 x 320 source image, the corresponding position in the 20 x 20 feature map is (int(x/16), int(y/16)). The (kernel soft) argmax φ(x/16, y/16) then gives the corresponding position in the target feature map. Training is driven by the losses below.
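A minimal sketch of this coordinate bookkeeping (the downsampling factor 16 is from the slide; the name phi for the match function and the cell-center convention are assumptions):

def transfer_keypoint(x, y, phi, stride=16):
    # (x, y): pixel in the 320 x 320 source image
    px, py = int(x / stride), int(y / stride)   # cell in the 20 x 20 source feature map
    qx, qy = phi(px, py)                        # matched cell in the target feature map
    # map the matched cell back to image coordinates (cell center convention assumed)
    return (qx + 0.5) * stride, (qy + 0.5) * stride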

13 Proposed Method Mask consistency loss Prevents matches from crossing between foreground and background. First, we define the flow field from source to target; then we reconstruct the source mask by warping the ground-truth target mask with this flow (differentiable warping, as in spatial transformer networks) and compare it against the ground-truth source mask. Max Jaderberg, Karen Simonyan, Andrew Zisserman, and Koray Kavukcuoglu. Spatial transformer networks. In NIPS, 2015.
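A rough NumPy sketch of the idea (nearest-neighbor warping and a squared-error penalty are simplifications; the paper uses differentiable bilinear warping):

import numpy as np

def warp_mask(mask_tgt, flow_st):
    # mask_tgt: (h, w) binary ground-truth target mask
    # flow_st: (h, w, 2) source-to-target flow, flow_st[y, x] = (dx, dy)
    h, w = mask_tgt.shape
    warped = np.zeros_like(mask_tgt)
    for y in range(h):
        for x in range(w):
            tx = int(round(x + flow_st[y, x, 0]))   # matched target column
            ty = int(round(y + flow_st[y, x, 1]))   # matched target row
            if 0 <= tx < w and 0 <= ty < h:
                warped[y, x] = mask_tgt[ty, tx]     # reconstructed source mask
    return warped

# Mask consistency: penalize disagreement between the reconstructed and ground-truth source masks,
# e.g. loss = np.mean((warp_mask(mask_tgt, flow_st) - mask_src) ** 2)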

14 Proposed Method Flow consistency loss
Encourages stable one-to-one matching and penalizes unstable many-to-one matching: if the source-to-target and target-to-source flow fields are consistent with each other, corresponding flow vectors have the same magnitude and opposite directions. Tinghui Zhou, Philipp Krahenbuhl, Mathieu Aubry, Qixing Huang, and Alexei A. Efros. Learning dense correspondence via 3D-guided cycle consistency. In CVPR, 2016.
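A minimal NumPy sketch of this consistency check (restricting the loss to foreground pixels and using a squared norm are assumptions; the paper's exact formulation may differ):

import numpy as np

def flow_consistency_loss(flow_st, flow_ts, mask_src):
    # flow_st, flow_ts: (h, w, 2) source-to-target and target-to-source flows
    # mask_src: (h, w) binary source foreground mask
    h, w, _ = flow_st.shape
    loss, count = 0.0, 0
    for y in range(h):
        for x in range(w):
            if not mask_src[y, x]:
                continue
            tx = int(round(x + flow_st[y, x, 0]))
            ty = int(round(y + flow_st[y, x, 1]))
            if 0 <= tx < w and 0 <= ty < h:
                # for consistent flows, the backward flow at the matched point cancels the forward flow
                diff = flow_st[y, x] + flow_ts[ty, tx]
                loss += np.sum(diff ** 2)
                count += 1
    return loss / max(count, 1)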

15 Proposed Method Smoothness loss
The matching within the foreground object should be smooth.
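A minimal NumPy sketch of a first-order smoothness penalty on the flow, restricted to the foreground (a common formulation used here for illustration; the paper's exact term may differ):

import numpy as np

def smoothness_loss(flow, mask):
    # flow: (h, w, 2) flow field; mask: (h, w) binary foreground mask
    mask = mask.astype(float)
    dx = np.abs(flow[:, 1:] - flow[:, :-1])   # horizontal flow differences
    dy = np.abs(flow[1:, :] - flow[:-1, :])   # vertical flow differences
    mx = mask[:, 1:] * mask[:, :-1]           # both horizontal neighbors in the foreground
    my = mask[1:, :] * mask[:-1, :]           # both vertical neighbors in the foreground
    total = np.sum(dx * mx[..., None]) + np.sum(dy * my[..., None])
    return total / (np.sum(mx) + np.sum(my) + 1e-8)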

16 Experiments Datasets WILLOW: 900 image pairs from 4 categories (keypoint and bounding-box annotations). PASCAL: 1,351 image pairs from 20 categories (keypoint and bounding-box annotations). Caltech-101: annotated with object masks.

17 Experiments Metric 1. PCK (percentage of correct keypoints)
Supervision legend for the compared methods: T: transformation parameters; P: image pairs depicting different instances; B: bounding boxes; M: masks. The comparison covers geometric alignment approaches and proposal-based (flow-field) approaches.
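A minimal NumPy sketch of PCK under its usual definition (a prediction counts as correct if it falls within alpha times a reference length, e.g. the larger bounding-box side, of the ground truth; the exact normalization follows the benchmark):

import numpy as np

def pck(pred_kps, gt_kps, ref_size, alpha=0.1):
    # pred_kps, gt_kps: (n, 2) predicted and ground-truth keypoint coordinates
    # ref_size: normalization length, e.g. max(h, w) of the bounding box or image
    dist = np.linalg.norm(pred_kps - gt_kps, axis=1)
    return np.mean(dist <= alpha * ref_size)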

18 Experiments Metric 2. LT-ACC/IoU
PCK only measures keypoint matching; the matching results (dense flow) can also be used to align whole images, which LT-ACC and IoU evaluate using the Caltech-101 object masks.
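A minimal sketch of the IoU part of this evaluation (comparing a warped source mask against the target mask; LT-ACC similarly measures per-pixel label agreement):

import numpy as np

def mask_iou(warped_mask, gt_mask):
    # warped_mask, gt_mask: (h, w) binary masks
    inter = np.logical_and(warped_mask, gt_mask).sum()
    union = np.logical_or(warped_mask, gt_mask).sum()
    return inter / max(union, 1)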

19 Experiments Ablation

20 Experiments Examples of alignment

21 Conclusion Presents a CNN model for learning semantic flow end-to-end.
Uses binary foreground masks to learn pixel-to-pixel correspondences. Key ingredients: the kernel soft argmax layer and the loss design (mask consistency, flow consistency, smoothness).

