‘Learning Image Matching by Simply Watching Video’


‘Learning Image Matching by Simply Watching Video’ *A presentation of the article: Gucan Long, Laurent Kneip, et al., ‘Learning Image Matching by Simply Watching Video’, ECCV 2016. Reporter: Xi Mo, Ph.D. Candidate at ITTC, University of Kansas. Date: Feb 20th, 2017

What to expect from this article:
1. Mechanism analysis
- Temporal coherency is naturally contained in real-world video sequences
- A CNN for frame interpolation can therefore be trained in an unsupervised manner
- Image matching is learned as a by-product
2. Unsupervised learning
- Train and apply a neural network for frame interpolation
- Invert the CNN to find the correspondences
3. Experiments and results
- The presented approach achieves surprisingly good performance, comparable to traditional, empirically designed methods

1.1 Matching by Inverting[2] a Deep Neural Network (MIND)
Dense matching can be regarded as a sub-problem of frame interpolation, via correspondence-based image warping: if dense inter-frame matches are available, the interpolated frame can be generated immediately. Conversely, a deep neural network trained for frame interpolation implicitly also acquires knowledge about dense image correspondences; the warping link is sketched below.
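To make the warping link concrete, here is a minimal NumPy sketch (not from the paper) of how the middle frame could be synthesized once dense matches are given; the `matches01` format and the linear-motion assumption are illustrative choices.

```python
import numpy as np

def interpolate_midframe(frame0, frame1, matches01):
    """Synthesize the in-between frame from dense matches (illustrative).
    frame0, frame1: (H, W, 3) images.
    matches01[y, x] = (y1, x1): the pixel of frame1 matched to pixel
    (y, x) of frame0 (hypothetical match format)."""
    frame0 = frame0.astype(np.float64)
    frame1 = frame1.astype(np.float64)
    H, W, _ = frame0.shape
    mid = np.zeros_like(frame0)
    count = np.zeros((H, W, 1))
    for y in range(H):
        for x in range(W):
            y1, x1 = int(matches01[y, x, 0]), int(matches01[y, x, 1])
            # Linear-motion assumption: the match passes through the
            # temporal midpoint at half the displacement.
            ym, xm = round((y + y1) / 2), round((x + x1) / 2)
            if 0 <= ym < H and 0 <= xm < W:
                mid[ym, xm] += 0.5 * (frame0[y, x] + frame1[y1, x1])
                count[ym, xm] += 1
    return mid / np.maximum(count, 1)  # average splats; unhit pixels stay 0
```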

1.2 Analysis by Synthesis[1]
Learning is described as the acquisition of a model that synthesizes the measurements; inference of the generating parameters then amounts to model inversion once correct synthesis is achieved. In this context, synthesis simply refers to frame interpolation:
- Input: two frames; ground truth: one interpolated output frame
- The network produces a sensitivity map for each interpolated pixel
- The two input pixels (one per input frame) with the maximum influence are considered a match

Supplement: Network Inversion, Traditional[3] vs. the Authors'
- Traditional definition: inverting a learned network means reconstructing the input from the output of an artificial neural network
- Authors' definition: back-propagation through a learned network in order to obtain the gradient map with respect to the input signals
- Sensitivity map: indicates how much each input pixel influences a particular output pixel, i.e. the gradients

2.1 Process of Unsupervised Learning
The interpolation function can be split up into h × w non-linear sub-functions f_ij, each producing the corresponding pixel (i, j) of the output image. For each output point (i, j), define the absolute gradients with respect to each of the two input images, evaluated at the concrete inputs: G1_ij = |∂f_ij/∂I1| and G2_ij = |∂f_ij/∂I2|. By computing these two gradient maps for each point of the output image and each time extracting the most responsible (maximum-gradient) input point, we obtain two lists of points; combining same-index elements yields the dense matches. A sketch of this step follows.
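The following PyTorch sketch illustrates this inversion step for a single output pixel (the paper's implementation used Caffe; the two-argument `net(I1, I2)` interface and the channel-summed gradients are assumptions made for illustration).

```python
import torch

def match_by_inversion(net, I1, I2, i, j):
    """Find the match induced by output pixel (i, j) of the interpolation.
    I1, I2: (1, 3, H, W) input frames; net: trained interpolation CNN."""
    I1 = I1.clone().requires_grad_(True)
    I2 = I2.clone().requires_grad_(True)
    out = net(I1, I2)                     # (1, 3, H, W) interpolated frame
    # Back-propagate from one output pixel (summed over color channels)
    # to obtain its gradient maps with respect to both input images.
    out[0, :, i, j].sum().backward()
    G1 = I1.grad.abs().sum(dim=1)[0]      # (H, W) sensitivity map on I1
    G2 = I2.grad.abs().sum(dim=1)[0]      # (H, W) sensitivity map on I2
    # The most responsible pixel in each input frame forms the match.
    p1 = divmod(G1.argmax().item(), G1.shape[1])   # (y, x) in frame 1
    p2 = divmod(G2.argmax().item(), G2.shape[1])   # (y, x) in frame 2
    return p1, p2
```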

2.2 Structure of the Deep Neural Network
Inspired by FlowNetSimple[4]:
- Consists of a convolutional part (encoder) and a deconvolutional part (decoder)
- The receptive field of all convolution filters is set to three[5], with a stride and a padding of one; [CONV -> PRELU] is duplicated three times to better model the non-linearity
- Fully convolutional (no fully connected layers), and thus able to be fed with images of different resolutions; see the block sketch below
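A sketch of one such encoder block under the stated hyper-parameters (3×3 kernels, stride 1, padding 1, [CONV -> PRELU] repeated three times); the layer widths, the channel-stacked two-frame input, and the pooling between blocks are assumptions, not values from the paper.

```python
import torch.nn as nn

def conv_block(in_ch, out_ch):
    # Duplicate [CONV -> PRELU] three times, as described above.
    layers = []
    for k in range(3):
        layers += [
            nn.Conv2d(in_ch if k == 0 else out_ch, out_ch,
                      kernel_size=3, stride=1, padding=1),
            nn.PReLU(),
        ]
    return nn.Sequential(*layers)

# Fully convolutional, so any input resolution is accepted; the 6-channel
# input assumes the two RGB frames are stacked channel-wise.
encoder = nn.Sequential(
    conv_block(6, 32),
    nn.MaxPool2d(2),   # downsampling choice assumed, not from the paper
    conv_block(32, 64),
)
```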

3.1 Benchmarks
- KITTI: 56 image sequences with 16,951 frames in total (http://www.cvlibs.net/datasets/kitti/)
- Sintel: 63 video clips with 5,670 frames in total, taken from the movie (https://durian.blender.org/download/)

3.2 Configuration
- Training is performed using Caffe on two K40c GPUs
- The network weights are initialized by Xavier's[6] approach
- Optimization uses the Adam solver[7] with a fixed momentum of 0.9
- The initial learning rate is set to 1e-3 and then manually tuned down once the loss stops decreasing
- For training on the KITTI RAW data, images are scaled to 384 × 128; for training on the Sintel dataset, to 256 × 128
- The batch size is 16
- Training runs on KITTI RAW from scratch for about 20 epochs, then fine-tunes on the Sintel movie images for 15 epochs
- No over-fitting was observed during training; training was terminated after 5 days
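For readers who want to mirror these solver settings outside Caffe, a hypothetical PyTorch equivalent (the stand-in `net` and the plateau scheduler, which approximates the manual learning-rate tuning, are assumptions):

```python
import torch
import torch.nn as nn

net = nn.Conv2d(6, 3, kernel_size=3, padding=1)  # stand-in for the real model
# Adam with "momentum" 0.9 corresponds to beta1 = 0.9; initial lr = 1e-3.
optimizer = torch.optim.Adam(net.parameters(), lr=1e-3, betas=(0.9, 0.999))
# Approximates "manually tuned down once the loss stops decreasing".
scheduler = torch.optim.lr_scheduler.ReduceLROnPlateau(
    optimizer, mode="min", factor=0.1, patience=5)
```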

3.3 Generalization Ability of the Trained Model: Examples of frame interpolation

3.3 Generalization Ability of the Trained Model: Examples of image matching

3.4 Quantitative Performance of Image Matching
Matching performance on the KITTI 2012 flow training set. DeepM denotes Deep Matching. Metrics: Average Point Error (APE, the lower the better) and Accuracy@T (the higher the better). Bold numbers indicate best performance, underlined numbers 2nd best.

3.4 Quantitative Performance of Image Matching
Flow performance on the KITTI 2012 flow training set (non-occluded areas); out-x refers to the percentage of pixels where the flow estimate has an error above x pixels.
Flow performance on the MPI-Sintel flow training set; s0-10 is the APE for pixels with motions between 0 and 10 pixels, and similarly for s10-40 and s40+.

Thank you for your attention!

References for the Presentation
[1] Yildirim, I., Kulkarni, T., Freiwald, W., Tenenbaum, J.B.: Efficient and robust analysis-by-synthesis in vision: A computational framework, behavioral tests, and modeling neuronal representations. In: Annual Conference of the Cognitive Science Society (2015)
[2] Zeiler, M., Taylor, G., Fergus, R.: Adaptive deconvolutional networks for mid and high level feature learning. In: ICCV (2011)
[2] Agrawal, P., Carreira, J., Malik, J.: Learning to see by moving. arXiv preprint arXiv:1505.01596 (2015)
[3] Jensen, C., Reed, R.D., Marks, R.J., El-Sharkawi, M., Jung, J.B., Miyamoto, R.T., Anderson, G.M., Eggen, C.J., et al.: Inversion of feedforward neural networks: Algorithms and applications. Proceedings of the IEEE 87(9) (1999) 1536-1549
[4] Fischer, P., Dosovitskiy, A., Ilg, E., Hausser, P., Hazirbas, C., Golkov, V., van der Smagt, P., Cremers, D., Brox, T.: FlowNet: Learning optical flow with convolutional networks. arXiv preprint arXiv:1504.06852 (2015)
[5] Chatfield, K., Simonyan, K., Vedaldi, A., Zisserman, A.: Return of the devil in the details: Delving deep into convolutional nets. arXiv preprint arXiv:1405.3531 (2014)
[6] Glorot, X., Bengio, Y.: Understanding the difficulty of training deep feedforward neural networks. In: International Conference on Artificial Intelligence and Statistics (2010) 249-256
[7] Kingma, D., Ba, J.: Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980 (2014)