Learning to Compare Image Patches via Convolutional Neural Networks


Learning to Compare Image Patches via Convolutional Neural Networks Sergey Zagoruyko & Nikos Komodakis Presented by Ilan Schvartzman

Problem description - Matching two patches in the general case (with occlusions, differences in illumination, shading, etc.) is a non-trivial task. - It has a wide range of applications, such as stereo matching and registration, and is in many cases an essential preprocessing step. - Until now the problem was typically solved using hand-crafted descriptors such as the Scale-Invariant Feature Transform (SIFT).

SIFT Seeks interesting points that are easy to distinguish, then creates a local descriptor for each point that remains invariant to scaling, rotation and translation.

The goal of the article Learning a patch similarity function directly from data, without relying on any manually designed features.

Preliminary assumptions Only grayscale images were used (since this was the only labeled training set available). Patch size is 64×64 (with the exception of the SPP model described later). Three basic architectures were proposed.

Siamese / pseudo-Siamese architecture Similar to computing a descriptor per patch: each patch goes through its own branch. In the Siamese network the weights are forced to be identical in both branches; in the pseudo-Siamese they are not.
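The weight-sharing distinction can be illustrated with a minimal numpy sketch; the single linear-plus-ReLU "branch" below is a hypothetical stand-in for the paper's convolutional branch, not its actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical descriptor branch: one shared linear layer + ReLU,
# standing in for the convolutional branch of the real network.
W = rng.standard_normal((128, 64 * 64)) * 0.01  # shared weights

def branch(patch, weights):
    """Map a 64x64 patch to a 128-d descriptor."""
    return np.maximum(weights @ patch.ravel(), 0.0)  # linear + ReLU

patch_a = rng.standard_normal((64, 64))
patch_b = rng.standard_normal((64, 64))

# Siamese: the SAME weights process both patches.
desc_a = branch(patch_a, W)
desc_b = branch(patch_b, W)

# Pseudo-Siamese: the second branch gets its own weights.
W2 = rng.standard_normal((128, 64 * 64)) * 0.01
desc_b_pseudo = branch(patch_b, W2)
```

With shared weights the two descriptors live in the same embedding space; with separate weights they generally do not, which is why the pseudo-Siamese variant needs its decision layers to reconcile them.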

Two channel Both patches are fed together as a single two-channel image. The structure is similar to a classification CNN. This is harder at test time, since the whole network must be re-run for every candidate pair (descriptors cannot be precomputed per patch).
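Forming the two-channel input is just a channel-wise stack of the two grayscale patches, as in this sketch:

```python
import numpy as np

patch_a = np.zeros((64, 64))  # first grayscale patch
patch_b = np.ones((64, 64))   # second grayscale patch

# Stack the pair into one 2-channel image, the input to the
# first convolutional layer of a 2ch-style network.
pair = np.stack([patch_a, patch_b], axis=0)  # shape (2, 64, 64)
```

From the first layer onward every filter sees both patches at once, which is the source of the architecture's accuracy and of its test-time cost.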

Deep networks The branch is built from stacks of 3×3 convolutional layers separated by ReLU activations, so a deeper CNN can be created. This accounts for the non-linearities in the data.
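A minimal numpy sketch of the idea (a single-filter valid 3×3 convolution with ReLU, not the paper's multi-filter layers): stacking two 3×3 layers gives each output unit a 5×5 receptive field while keeping the per-layer non-linearity.

```python
import numpy as np

def conv3x3(x, k):
    """Valid 3x3 convolution (no padding) followed by ReLU."""
    h, w = x.shape
    out = np.zeros((h - 2, w - 2))
    for i in range(h - 2):
        for j in range(w - 2):
            out[i, j] = np.maximum((x[i:i+3, j:j+3] * k).sum(), 0.0)
    return out

k = np.full((3, 3), 1 / 9)          # averaging kernel, for illustration
x = np.ones((64, 64))
y = conv3x3(conv3x3(x, k), k)       # two stacked 3x3 layers
# y has shape (60, 60); each unit now sees a 5x5 input region
```

Two 3×3 layers cover the same area as one 5×5 layer but with fewer weights and an extra ReLU in between, which is the usual argument for the deep variant.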

Central-surround two-stream network Two branches: one for the central part of the patch, one for the whole patch. The assumption is that the information in the center should be given greater significance.

Spatial Pyramid Pooling (SPP) The input patches can be of different sizes. The last layer's pooling windows are made proportional to the patch size, so the output dimension stays fixed regardless of the input.
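A minimal sketch of one pyramid level (assuming the feature map is at least `bins` wide in each dimension): the pooling windows scale with the input, so any input size maps to the same fixed-size output.

```python
import numpy as np

def spp_level(fmap, bins):
    """Max-pool a 2-D feature map into a fixed bins x bins grid,
    with pooling windows proportional to the input size."""
    h, w = fmap.shape
    out = np.empty((bins, bins))
    for i in range(bins):
        for j in range(bins):
            r0, r1 = i * h // bins, (i + 1) * h // bins
            c0, c1 = j * w // bins, (j + 1) * w // bins
            out[i, j] = fmap[r0:r1, c0:c1].max()
    return out

# Two differently sized inputs both pool to a 4x4 grid.
a = spp_level(np.ones((64, 64)), 4)
b = spp_level(np.ones((48, 96)), 4)
```

This is what lets an SPP model accept patches that are not 64×64 while keeping the fully connected layers unchanged.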

Learning All models are trained in a strongly supervised manner using a hinge-based loss term and squared l2-norm regularization: min_w (λ/2)·||w||² + Σ_i max(0, 1 − y_i · o_i^net), where w are the weights of the neural network, o_i^net is the network output for the i-th training sample, and y_i ∈ {1, −1} is the label (matching / non-matching). ASGD with constant learning rate 1.0, momentum 0.9 and weight decay λ = 0.0005 is used to train the models. Training is done in mini-batches of size 128.
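The objective can be computed directly from the definitions above; a minimal numpy sketch over one mini-batch (the example outputs and weights are illustrative, not from the paper):

```python
import numpy as np

def objective(w, outputs, labels, lam=0.0005):
    """Hinge loss over a mini-batch plus squared-L2 weight decay:
       lam/2 * ||w||^2 + sum_i max(0, 1 - y_i * o_i)."""
    hinge = np.maximum(0.0, 1.0 - labels * outputs).sum()
    return 0.5 * lam * np.dot(w, w) + hinge

w = np.array([1.0, -2.0])               # toy weight vector
outputs = np.array([0.9, -1.5, 0.2])    # network outputs o_i
labels = np.array([1, -1, -1])          # y_i in {1, -1}

# hinge terms: max(0, 1-0.9)=0.1, max(0, 1-1.5)=0, max(0, 1+0.2)=1.2
loss = objective(w, outputs, labels)    # 1.3 + 0.5*0.0005*5 = 1.30125
```

Correctly classified pairs with margin at least 1 (the second sample) contribute nothing; the misclassified third sample dominates the loss.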

Data Augmentation To enlarge the training set, each sample was flipped horizontally and vertically as well as rotated, yielding 8 variants of every sample.
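The 8 variants are the dihedral symmetries of the patch: 4 rotations of the original plus 4 rotations of its flip. A minimal numpy sketch:

```python
import numpy as np

def augment8(patch):
    """Return the 8 flip/rotation variants of a patch:
    4 rotations of the original + 4 rotations of its mirror."""
    variants = []
    for p in (patch, np.fliplr(patch)):
        for k in range(4):
            variants.append(np.rot90(p, k))
    return variants

patch = np.arange(16).reshape(4, 4)  # asymmetric, so all 8 differ
aug = augment8(patch)
```

Applying the same transform to both patches of a pair preserves the match/non-match label, so the training set grows 8-fold for free.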

Experiments Local image patches benchmark The evaluation uses a standard benchmark dataset [Image descriptor data] consisting of three subsets, Yosemite, Notre Dame, and Liberty, each of which contains more than 450,000 image patches (64×64 pixels) sampled around Difference-of-Gaussians feature points.

Experiments Local image patches benchmark - 2-channel-based architectures (e.g., 2ch, 2ch-deep, 2ch-2stream) exhibit the best performance among all models, indicating that it is important to jointly use information from both patches right from the first layer of the network. - 2ch-2stream outperformed the previous state-of-the-art model by a large margin; the difference with SIFT is even larger, with the current model giving a 6.65 times better score. - The Siamese models were also evaluated with their top decision layer replaced by the L2 (Euclidean) distance between the two convolutional descriptors; in this case the descriptors are L2-normalized before the distance is computed, and for pseudo-Siamese only one branch was used to extract descriptors. In this setting the two-stream network (siam-2stream-L2) computes better distances than the Siamese network (siam-L2), which, in turn, computes better distances than the pseudo-Siamese model (pseudo-siam-L2).
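The descriptor-distance evaluation reduces to a simple computation, sketched here in numpy for two hypothetical descriptor vectors:

```python
import numpy as np

def l2_descriptor_distance(desc_a, desc_b):
    """Euclidean distance between L2-normalized descriptors, as used
    when the top decision layer is replaced by a distance (siam-L2)."""
    a = desc_a / np.linalg.norm(desc_a)
    b = desc_b / np.linalg.norm(desc_b)
    return np.linalg.norm(a - b)

# Same direction -> distance 0; orthogonal -> sqrt(2).
d_same = l2_descriptor_distance(np.array([1.0, 0.0]), np.array([3.0, 0.0]))
d_orth = l2_descriptor_distance(np.array([1.0, 0.0]), np.array([0.0, 1.0]))
```

Because the descriptors can be computed once per patch and compared with this cheap distance, this mode trades some accuracy for much faster matching than the 2-channel models.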


Experiments Wide baseline stereo evaluation This evaluation uses the “fountain” dataset of Strecha et al. [Multi-view evaluation], which contains several image sequences with ground-truth homographies and laser-scanned depth maps. The goal is to show that a photometric cost computed with a neural network competes favourably against costs produced by a state-of-the-art hand-crafted feature descriptor, so DAISY was chosen for comparison.

Conclusions This article presents several ways of comparing patches using CNNs. Most of the architectures described show improved results compared to non-neural-network algorithms. The 2-channel-based ones were clearly superior in terms of accuracy, so it is worth investigating how to further accelerate the evaluation of these networks in the future. Regarding Siamese-based architectures, the 2-stream multi-resolution models turned out to be extremely strong, always providing a significant boost in performance and confirming the importance of multi-resolution information when comparing patches.