Learning to Compare Image Patches via Convolutional Neural Networks


Learning to Compare Image Patches via Convolutional Neural Networks Sergey Zagoruyko & Nikos Komodakis Presented by Ilan Schvartzman

Problem description - Matching two patches in the general case (with occlusions, differences in illumination, shading, etc.) is a non-trivial task. - It has a wide range of applications, such as stereo matching and registration, and is in many cases an essential preprocessing step. - Until now the problem was typically solved using hand-crafted descriptors such as the Scale-Invariant Feature Transform (SIFT).

SIFT Seeks interesting points that are easy to distinguish, then creates a local descriptor for each point that remains invariant to scaling, rotation and translation.

The goal of the article Learning a patch similarity function directly from data, without relying on any manually designed features.

Preliminary assumptions Only grayscale images were used (since this was the only labeled training set available). Patch size is 64×64 (with the exception of the SPP model described later). Three basic architectures were proposed.

Siamese / pseudo-Siamese architecture Similar to computing a descriptor per patch: each patch goes through its own branch. In the Siamese network the weights are forced to be identical in both branches; in the pseudo-Siamese they are not.
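The weight-sharing distinction can be illustrated with a minimal numpy sketch; the single linear-plus-ReLU "branch" below is a hypothetical stand-in for the paper's convolutional branch, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical descriptor branch: one shared linear layer + ReLU,
# standing in for the convolutional branch of the real network.
W = rng.standard_normal((128, 64 * 64)) * 0.01  # shared weights

def branch(patch, weights):
    """Map a 64x64 patch to a 128-d descriptor."""
    return np.maximum(weights @ patch.ravel(), 0.0)  # linear + ReLU

patch_a = rng.standard_normal((64, 64))
patch_b = rng.standard_normal((64, 64))

# Siamese: the SAME weights process both patches.
desc_a = branch(patch_a, W)
desc_b = branch(patch_b, W)

# Pseudo-Siamese: the second branch gets its own weights.
W2 = rng.standard_normal((128, 64 * 64)) * 0.01
desc_b_pseudo = branch(patch_b, W2)
```

With shared weights the two descriptors live in the same embedding space; with separate weights they generally do not, which is why the pseudo-Siamese variant needs its decision layers to reconcile them.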

Two channel Both patches are fed together as a single two-channel image. The structure is similar to a classification CNN. This is harder at test time, since the whole network must be re-run for every candidate pair (descriptors cannot be precomputed per patch).
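Forming the two-channel input is just a channel-wise stack of the two grayscale patches, as in this sketch:

```python
import numpy as np

patch_a = np.zeros((64, 64))  # first grayscale patch
patch_b = np.ones((64, 64))   # second grayscale patch

# Stack the pair into one 2-channel image, the input to the
# first convolutional layer of a 2ch-style network.
pair = np.stack([patch_a, patch_b], axis=0)  # shape (2, 64, 64)
```

From the first layer onward every filter sees both patches at once, which is the source of the architecture's accuracy and of its test-time cost.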

Deep networks The branch is built from stacks of 3×3 convolutional layers separated by ReLU activations, so a deeper CNN can be created. This accounts for the non-linearities in the data.
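A minimal numpy sketch of the idea (a single-filter valid 3×3 convolution with ReLU, not the paper's multi-filter layers): stacking two 3×3 layers gives each output unit a 5×5 receptive field while keeping the per-layer non-linearity.

```python
import numpy as np

def conv3x3(x, k):
    """Valid 3x3 convolution (no padding) followed by ReLU."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.maximum((x[i:i+3, j:j+3] * k).sum(), 0.0)
    return out

k = np.full((3, 3), 1 / 9)          # averaging kernel, for illustration
x = np.ones((64, 64))
y = conv3x3(conv3x3(x, k), k)       # two stacked 3x3 layers
# y has shape (60, 60); each unit now sees a 5x5 input region
```

Two 3×3 layers cover the same area as one 5×5 layer but with fewer weights and an extra ReLU in between, which is the usual argument for the deep variant.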

Central-surround two-stream network Two branches: one for the central part of the patch, one for the whole patch. The assumption is that the information in the center should be given greater significance.

Spatial Pyramid Pooling (SPP) The input patches can be of different sizes. The last layer's pooling windows are made proportional to the patch size, so the output dimension stays fixed regardless of the input.
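A minimal sketch of one pyramid level (assuming the feature map is at least `bins` wide in each dimension): the pooling windows scale with the input, so any input size maps to the same fixed-size output.

```python
import numpy as np

def spp_level(fmap, bins):
    """Max-pool a 2-D feature map into a fixed bins x bins grid,
    with pooling windows proportional to the input size."""
    h, w = fmap.shape
    out = np.empty((bins, bins))
    for i in range(bins):
        for j in range(bins):
            r0, r1 = i * h // bins, (i + 1) * h // bins
            c0, c1 = j * w // bins, (j + 1) * w // bins
            out[i, j] = fmap[r0:r1, c0:c1].max()
    return out

# Two differently sized inputs both pool to a 4x4 grid.
a = spp_level(np.ones((64, 64)), 4)
b = spp_level(np.ones((48, 96)), 4)
```

This is what lets an SPP model accept patches that are not 64×64 while keeping the fully connected layers unchanged.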

Learning All models are trained in a strongly supervised manner using a hinge-based loss term and squared l2-norm regularization: min_w (λ/2)·||w||² + Σ_i max(0, 1 − y_i · o_i^net), where w are the weights of the neural network, o_i^net is the network output for the i-th training sample, and y_i ∈ {1, −1} is the label (matching / non-matching). ASGD with constant learning rate 1.0, momentum 0.9 and weight decay λ = 0.0005 is used to train the models. Training is done in mini-batches of size 128.
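The objective can be computed directly from the definitions above; a minimal numpy sketch over one mini-batch (the example outputs and weights are illustrative, not from the paper):

```python
import numpy as np

def objective(w, outputs, labels, lam=0.0005):
    """Hinge loss over a mini-batch plus squared-L2 weight decay:
       lam/2 * ||w||^2 + sum_i max(0, 1 - y_i * o_i)."""
    hinge = np.maximum(0.0, 1.0 - labels * outputs).sum()
    return 0.5 * lam * np.dot(w, w) + hinge

w = np.array([1.0, -2.0])               # toy weight vector
outputs = np.array([0.9, -1.5, 0.2])    # network outputs o_i
labels = np.array([1, -1, -1])          # y_i in {1, -1}

# hinge terms: max(0, 1-0.9)=0.1, max(0, 1-1.5)=0, max(0, 1+0.2)=1.2
loss = objective(w, outputs, labels)    # 1.3 + 0.5*0.0005*5 = 1.30125
```

Correctly classified pairs with margin at least 1 (the second sample) contribute nothing; the misclassified third sample dominates the loss.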

Data Augmentation To enlarge the training set, each sample was flipped horizontally and vertically as well as rotated, yielding 8 variants of every sample.
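The 8 variants are the dihedral symmetries of the patch: 4 rotations of the original plus 4 rotations of its flip. A minimal numpy sketch:

```python
import numpy as np

def augment8(patch):
    """Return the 8 flip/rotation variants of a patch:
    4 rotations of the original + 4 rotations of its mirror."""
    variants = []
    for p in (patch, np.fliplr(patch)):
        for k in range(4):
            variants.append(np.rot90(p, k))
    return variants

patch = np.arange(16).reshape(4, 4)  # asymmetric, so all 8 differ
aug = augment8(patch)
```

Applying the same transform to both patches of a pair preserves the match/non-match label, so the training set grows 8-fold for free.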

Experiments Local image patches benchmark The evaluation uses a standard benchmark dataset [Image descriptor data] consisting of three subsets, Yosemite, Notre Dame, and Liberty, each of which contains more than 450,000 image patches (64×64 pixels) sampled around Difference-of-Gaussians feature points.

Experiments Local image patches benchmark - 2-channel-based architectures (e.g., 2ch, 2ch-deep, 2ch-2stream) exhibit the best performance among all models, indicating that it is important to jointly use information from both patches right from the first layer of the network. - 2ch-2stream outperformed the previous state-of-the-art model by a large margin; the difference with SIFT is even larger, with the current model giving a 6.65 times better score. - The Siamese models were also evaluated with their top decision layer replaced by the L2 (Euclidean) distance between the two convolutional descriptors; in this case the descriptors are L2-normalized before the distance is computed, and for pseudo-Siamese only one branch was used to extract descriptors. In this setting the two-stream network (siam-2stream-L2) computes better distances than the Siamese network (siam-L2), which, in turn, computes better distances than the pseudo-Siamese model (pseudo-siam-L2).
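The descriptor-distance evaluation reduces to a simple computation, sketched here in numpy for two hypothetical descriptor vectors:

```python
import numpy as np

def l2_descriptor_distance(desc_a, desc_b):
    """Euclidean distance between L2-normalized descriptors, as used
    when the top decision layer is replaced by a distance (siam-L2)."""
    a = desc_a / np.linalg.norm(desc_a)
    b = desc_b / np.linalg.norm(desc_b)
    return np.linalg.norm(a - b)

# Same direction -> distance 0; orthogonal -> sqrt(2).
d_same = l2_descriptor_distance(np.array([1.0, 0.0]), np.array([3.0, 0.0]))
d_orth = l2_descriptor_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because the descriptors can be computed once per patch and compared with this cheap distance, this mode trades some accuracy for much faster matching than the 2-channel models.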


Experiments Wide baseline stereo evaluation This evaluation uses the “fountain” dataset of Strecha et al. [Multi-view evaluation], which contains several image sequences with ground-truth homographies and laser-scanned depth maps. The goal is to show that a photometric cost computed with a neural network competes favourably against costs produced by a state-of-the-art hand-crafted feature descriptor, so DAISY was chosen for comparison.

Conclusions This article presents several ways of comparing patches using CNNs. Most of the architectures described show improved results compared to non-neural-network algorithms. The 2-channel-based ones were clearly superior in terms of accuracy, so it is worth investigating how to further accelerate the evaluation of these networks in the future. Regarding Siamese-based architectures, the 2-stream multi-resolution models turned out to be extremely strong, always providing a significant boost in performance and confirming the importance of multi-resolution information when comparing patches.