‘Learning Image Matching by Simply Watching Video’


‘Learning Image Matching by Simply Watching Video’ *A presentation of the article: Gucan Long, Laurent Kneip, et al., ‘Learning Image Matching by Simply Watching Video’, ECCV 2016. Reporter: Xi Mo, Ph.D. Candidate at ITTC, University of Kansas. Date: Feb 20th, 2017

What to expect from this article:
1. Mechanism analysis
- Temporal coherency is naturally contained in real-world video sequences
- A CNN for frame interpolation can therefore be trained in an unsupervised manner
- Image matching is learned as a by-product
2. Unsupervised learning
- Train and apply a neural network for frame interpolation
- Invert the CNN to find the correspondences
3. Experiments and results
- The presented approach achieves surprisingly good performance, comparable to traditional, empirically designed methods

1.1 Matching by Inverting[2] a Deep Neural Network (MIND)
Dense matching can be regarded as a sub-problem of frame interpolation, via correspondence-based image warping: if dense inter-frame matches are available, the interpolated frame can be generated immediately. Conversely, a deep neural network trained for frame interpolation implicitly also acquires knowledge about dense image correspondences; the warping link is sketched below.
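To make the warping link concrete, here is a minimal NumPy sketch (not from the paper) of how the middle frame could be synthesized once dense matches are given; the `matches01` format and the linear-motion assumption are illustrative choices.

```python
import numpy as np

def interpolate_midframe(frame0, frame1, matches01):
    """Synthesize the in-between frame from dense matches (illustrative).
    frame0, frame1: (H, W, 3) images.
    matches01[y, x] = (y1, x1): the pixel of frame1 matched to pixel
    (y, x) of frame0 (hypothetical match format)."""
    frame0 = frame0.astype(np.float64)
    frame1 = frame1.astype(np.float64)
    H, W, _ = frame0.shape
    mid = np.zeros_like(frame0)
    count = np.zeros((H, W, 1))
    for y in range(H):
        for x in range(W):
            y1, x1 = int(matches01[y, x, 0]), int(matches01[y, x, 1])
            # Linear-motion assumption: the match passes through the
            # temporal midpoint at half the displacement.
            ym, xm = round((y + y1) / 2), round((x + x1) / 2)
            if 0 <= ym < H and 0 <= xm < W:
                mid[ym, xm] += 0.5 * (frame0[y, x] + frame1[y1, x1])
                count[ym, xm] += 1
    return mid / np.maximum(count, 1)  # average splats; unhit pixels stay 0
```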

1.2 Analysis by Synthesis[1]
Learning is described as the acquisition of a model that synthesizes the measurements; inference of the generating parameters then amounts to model inversion once correct synthesis is achieved. In this context, synthesis simply refers to frame interpolation:
- Input: two frames; ground truth: one interpolated output frame
- The network produces a sensitivity map for each interpolated pixel
- The two input pixels (one per input frame) with the maximum influence are considered a match

Supplement: Network Inversion, Traditional[3] vs. the Authors'
- Traditional definition: inverting a learned network means reconstructing the input from the output of an artificial neural network
- Authors' definition: back-propagation through a learned network in order to obtain the gradient map with respect to the input signals
- Sensitivity map: indicates how much each input pixel influences a particular output pixel, i.e. the gradients

2.1 Process of Unsupervised Learning
The interpolation function can be split up into h × w non-linear sub-functions f_ij, each producing the corresponding pixel (i, j) of the output image. For each output point (i, j), define the absolute gradients with respect to each of the two input images, evaluated at the concrete inputs: G1_ij = |∂f_ij/∂I1| and G2_ij = |∂f_ij/∂I2|. By computing these two gradient maps for each point of the output image and each time extracting the most responsible (maximum-gradient) input point, we obtain two lists of points; combining same-index elements yields the dense matches. A sketch of this step follows.
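The following PyTorch sketch illustrates this inversion step for a single output pixel (the paper's implementation used Caffe; the two-argument `net(I1, I2)` interface and the channel-summed gradients are assumptions made for illustration).

```python
import torch

def match_by_inversion(net, I1, I2, i, j):
    """Find the match induced by output pixel (i, j) of the interpolation.
    I1, I2: (1, 3, H, W) input frames; net: trained interpolation CNN."""
    I1 = I1.clone().requires_grad_(True)
    I2 = I2.clone().requires_grad_(True)
    out = net(I1, I2)                     # (1, 3, H, W) interpolated frame
    # Back-propagate from one output pixel (summed over color channels)
    # to obtain its gradient maps with respect to both input images.
    out[0, :, i, j].sum().backward()
    G1 = I1.grad.abs().sum(dim=1)[0]      # (H, W) sensitivity map on I1
    G2 = I2.grad.abs().sum(dim=1)[0]      # (H, W) sensitivity map on I2
    # The most responsible pixel in each input frame forms the match.
    p1 = divmod(G1.argmax().item(), G1.shape[1])   # (y, x) in frame 1
    p2 = divmod(G2.argmax().item(), G2.shape[1])   # (y, x) in frame 2
    return p1, p2
```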

2.2 Structure of the Deep Neural Network
Inspired by FlowNetSimple[4]:
- Consists of a convolutional part (encoder) and a deconvolutional part (decoder)
- The receptive field of all convolution filters is set to three[5], with a stride and a padding of one; [CONV -> PRELU] is duplicated three times to better model the non-linearity
- Fully convolutional (no fully connected layers), and thus able to be fed with images of different resolutions; see the block sketch below
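A sketch of one such encoder block under the stated hyper-parameters (3×3 kernels, stride 1, padding 1, [CONV -> PRELU] repeated three times); the layer widths, the channel-stacked two-frame input, and the pooling between blocks are assumptions, not values from the paper.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Duplicate [CONV -> PRELU] three times, as described above.
    layers = []
    for k in range(3):
        layers += [
            nn.Conv2d(in_ch if k == 0 else out_ch, out_ch,
                      kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
        ]
    return nn.Sequential(*layers)

# Fully convolutional, so any input resolution is accepted; the 6-channel
# input assumes the two RGB frames are stacked channel-wise.
encoder = nn.Sequential(
    conv_block(6, 32),
    nn.MaxPool2d(2),   # downsampling choice assumed, not from the paper
    conv_block(32, 64),
)
```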

3.1 Benchmarks
- KITTI: 56 image sequences with 16,951 frames in total (http://www.cvlibs.net/datasets/kitti/)
- Sintel: 63 video clips with 5,670 frames in total, taken from the movie (https://durian.blender.org/download/)

3.2 Configuration
- Training is performed using Caffe on two K40c GPUs
- The network weights are initialized by Xavier's[6] approach
- Optimization uses the Adam solver[7] with a fixed momentum of 0.9
- The initial learning rate is set to 1e-3 and then manually tuned down once the loss stops decreasing
- For training on the KITTI RAW data, images are scaled to 384 × 128; for training on the Sintel dataset, to 256 × 128
- The batch size is 16
- Training runs on KITTI RAW from scratch for about 20 epochs, then fine-tunes on the Sintel movie images for 15 epochs
- No over-fitting was observed during training; training was terminated after 5 days
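For readers who want to mirror these solver settings outside Caffe, a hypothetical PyTorch equivalent (the stand-in `net` and the plateau scheduler, which approximates the manual learning-rate tuning, are assumptions):

```python
import torch
import torch.nn as nn

net = nn.Conv2d(6, 3, kernel_size=3, padding=1)  # stand-in for the real model
# Adam with "momentum" 0.9 corresponds to beta1 = 0.9; initial lr = 1e-3.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999))
# Approximates "manually tuned down once the loss stops decreasing".
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)
```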

3.3 Generalization Ability of the Trained Model: Examples of frame interpolation

3.3 Generalization Ability of the Trained Model: Examples of image matching

3.4 Quantitative Performance of Image Matching
Matching performance on the KITTI 2012 flow training set. DeepM denotes Deep Matching. Metrics: Average Point Error (APE, the lower the better) and Accuracy@T (the higher the better). Bold numbers indicate best performance, underlined numbers 2nd best.

3.4 Quantitative Performance of Image Matching
Flow performance on the KITTI 2012 flow training set (non-occluded areas); out-x refers to the percentage of pixels where the flow estimate has an error above x pixels.
Flow performance on the MPI-Sintel flow training set; s0-10 is the APE for pixels with motions between 0 and 10 pixels, and similarly for s10-40 and s40+.

Thank you for your attention!

References for the Presentation
[1] Yildirim, I., Kulkarni, T., Freiwald, W., Tenenbaum, J.B.: Efficient and robust analysis-by-synthesis in vision: A computational framework, behavioral tests, and modeling neuronal representations. In: Annual Conference of the Cognitive Science Society (2015)
[2] Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV (2011)
[2] Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. arXiv preprint arXiv:1505.01596 (2015)
[3] Jensen, C., Reed, R.D., Marks, R.J., El-Sharkawi, M., Jung, J.B., Miyamoto, R.T., Anderson, G.M., Eggen, C.J., et al.: Inversion of feedforward neural networks: Algorithms and applications. Proceedings of the IEEE 87(9) (1999) 1536-1549
[4] Fischer, P., Dosovitskiy, A., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., Brox, T.: FlowNet: Learning optical flow with convolutional networks. arXiv preprint arXiv:1504.06852 (2015)
[5] Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
[6] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (2010) 249-256
[7] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)