Announcements: Project proposal due tomorrow. Altered time for OH tomorrow: 9:00-10:00 am. Please complete the mid-semester feedback.
Semantic Segmentation
The Task: label every pixel with its semantic category (e.g., person, grass, trees, motorbike, road).
Evaluation metric: semantic segmentation is pixel classification. Plain pixel accuracy? The classes are heavily unbalanced. Instead use Intersection over Union (IoU), averaged across classes and images, or per-class accuracy.
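As a concrete reference, here is a minimal numpy sketch of per-class IoU and mean IoU computed from integer label maps. The function names and the `ignore_label` convention are illustrative, not from any particular benchmark toolkit:

```python
import numpy as np

def per_class_iou(pred, gt, num_classes, ignore_label=255):
    """Per-class intersection-over-union for one image.

    pred, gt: integer label maps of the same shape.
    Pixels labeled `ignore_label` in gt are excluded.
    """
    valid = gt != ignore_label
    ious = np.full(num_classes, np.nan)
    for c in range(num_classes):
        pred_c = (pred == c) & valid
        gt_c = (gt == c) & valid
        union = np.logical_or(pred_c, gt_c).sum()
        if union > 0:
            ious[c] = np.logical_and(pred_c, gt_c).sum() / union
    return ious

def mean_iou(pred, gt, num_classes):
    # Average per-class IoUs, skipping classes absent from both
    # prediction and ground truth (NaN entries).
    return np.nanmean(per_class_iou(pred, gt, num_classes))
```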
Things vs Stuff. Things (person, cat, horse, etc.): constrained shape, individual instances with separate identity; may need to reason about whole objects. Stuff (road, grass, sky, etc.): amorphous, with no shape and no notion of instances; can be handled at the pixel level as "texture".
Challenges in data collection: precise localization is hard to annotate, and annotating every pixel means dealing with a heavy tail of rare classes. Common solution: annotate a few classes (often things) and mark the rest as "Other". Common datasets: PASCAL VOC 2012 (~1500 images, 20 categories), COCO (~100k images, 20 categories).
Pre-convnet semantic segmentation. Things: do object detection, then segment out the detected objects. Stuff: "texture classification": compute histograms of filter responses and classify local image patches.
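To make the "stuff" pipeline concrete, here is a rough sketch of the filter-response-histogram idea: convolve the image with a filter bank, describe each local patch by histograms of the responses, then classify the descriptors. The filter bank, patch size, and histogram range are placeholder choices, not a specific published pipeline:

```python
import numpy as np
from scipy.ndimage import convolve

def filter_response_histograms(image, filter_bank, patch=32, bins=16):
    """Describe each non-overlapping patch by histograms of filter responses.

    image: 2D grayscale array; filter_bank: list of 2D kernels.
    Assumes responses are roughly normalized to [-1, 1].
    Returns one concatenated histogram descriptor per patch.
    """
    responses = [convolve(image, f, mode="reflect") for f in filter_bank]
    H, W = image.shape
    descriptors = []
    for y in range(0, H - patch + 1, patch):
        for x in range(0, W - patch + 1, patch):
            hists = []
            for r in responses:
                window = r[y:y + patch, x:x + patch]
                h, _ = np.histogram(window, bins=bins, range=(-1.0, 1.0),
                                    density=True)
                hists.append(h)
            descriptors.append(np.concatenate(hists))
    return np.array(descriptors)
```

Each descriptor would then be fed to any classifier (e.g., nearest neighbors or an SVM) to label the patch as road, grass, sky, and so on.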
Semantic segmentation using convolutional networks: (figure) an h x w x 3 image is reduced by convolution and subsampling to an h/4 x w/4 feature map, which is then convolved with #classes 1x1 filters to give an h/4 x w/4 x #classes score map.
Semantic segmentation using convolutional networks: pass the image through convolution and subsampling layers; a final convolution with #classes outputs gives scores for the subsampled image; upsample the scores back to the original size.
Semantic segmentation using convolutional networks: (example output) person and bicycle regions segmented in the input image.
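A minimal PyTorch sketch of this pipeline, with a toy two-stage backbone standing in for a real network (the layer sizes are arbitrary):

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class TinyFCN(nn.Module):
    """Toy fully convolutional segmentation net: downsample by 4,
    predict class scores with a 1x1 conv, upsample back to input size."""

    def __init__(self, num_classes):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # h/2 x w/2
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True),
            nn.MaxPool2d(2),                      # h/4 x w/4
        )
        self.classifier = nn.Conv2d(64, num_classes, kernel_size=1)

    def forward(self, x):
        h, w = x.shape[-2:]
        scores = self.classifier(self.features(x))   # N x C x h/4 x w/4
        return F.interpolate(scores, size=(h, w),
                             mode="bilinear", align_corners=False)

# Usage: logits = TinyFCN(21)(torch.randn(1, 3, 224, 224))  # 1 x 21 x 224 x 224
```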
The resolution issue. Problem: we need fine details! Use a shallower network or earlier layers? Not very semantic. Remove subsampling? Then each output looks at only a small window.
Solution 1: Image pyramids. Run the network on the image at multiple scales: higher resolution gives finer detail but less context. Learning Hierarchical Features for Scene Labeling. Clement Farabet, Camille Couprie, Laurent Najman, Yann LeCun. In TPAMI, 2013.
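A sketch of the multi-scale idea, assuming `net` maps an image tensor to per-class logits; this simply averages predictions over scales rather than reproducing the exact architecture of the cited paper:

```python
import torch
import torch.nn.functional as F

def multiscale_predict(net, image, scales=(0.5, 1.0, 2.0)):
    """Average class scores from running `net` on rescaled copies of `image`.

    image: N x 3 x H x W tensor; net returns N x C x h x w logits.
    """
    H, W = image.shape[-2:]
    total = None
    for s in scales:
        scaled = F.interpolate(image, scale_factor=s, mode="bilinear",
                               align_corners=False)
        logits = net(scaled)
        # Bring every scale's prediction back to the original resolution.
        logits = F.interpolate(logits, size=(H, W), mode="bilinear",
                               align_corners=False)
        total = logits if total is None else total + logits
    return total / len(scales)
```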
Solution 2: Skip connections. Upsample the predictions from deep, coarse layers and combine them with predictions from earlier, higher-resolution layers.
Skip connections Fully convolutional networks for semantic segmentation. Evan Shelhamer, Jon Long, Trevor Darrell. In CVPR 2015
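A toy sketch in the spirit of the FCN skip connections: score maps are predicted from both a coarse and a finer layer, the coarse scores are upsampled, and the two are summed before the final upsampling. The backbone here is made up for illustration, not the VGG-based network of the paper:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class SkipFCN(nn.Module):
    """Toy FCN-style skip connection: combine coarse (1/8) and finer (1/4)
    score maps before upsampling to full resolution."""

    def __init__(self, num_classes):
        super().__init__()
        self.stage1 = nn.Sequential(        # -> 1/4 resolution
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.stage2 = nn.Sequential(        # -> 1/8 resolution
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(inplace=True), nn.MaxPool2d(2),
        )
        self.score_fine = nn.Conv2d(64, num_classes, 1)
        self.score_coarse = nn.Conv2d(128, num_classes, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        fine = self.stage1(x)                       # N x 64 x h/4 x w/4
        coarse = self.stage2(fine)                  # N x 128 x h/8 x w/8
        s = self.score_coarse(coarse)
        s = F.interpolate(s, size=fine.shape[-2:],  # upsample coarse scores
                          mode="bilinear", align_corners=False)
        s = s + self.score_fine(fine)               # skip connection (sum)
        return F.interpolate(s, size=(h, w), mode="bilinear", align_corners=False)
```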
Skip connections. Problem: early layers are not very semantic. Conversely, units in higher layers are semantic but poorly localized: a unit that fires on bicycle wheels tells you a wheel is present somewhere in the image, but not where, at what scale, or at what orientation. Visualizations from: M. Zeiler and R. Fergus. Visualizing and Understanding Convolutional Networks. In ECCV 2014.
Solution 3: Dilation. We need subsampling to let convolutional layers capture large regions with small filters. Can we do this without subsampling?
Solution 3: Dilation. Instead of subsampling by a factor of 2, dilate the filters by a factor of 2. Dilation can be seen as using a much larger filter with most entries set to 0, or equivalently as taking a small filter and "exploding"/"dilating" it. Not a panacea: without subsampling, the feature maps are much larger, which causes memory issues.
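A one-line illustration of dilation in PyTorch: a 3x3 filter with dilation 2 covers a 5x5 window while keeping the spatial resolution, so the receptive field grows without subsampling:

```python
import torch
import torch.nn as nn

# A 3x3 convolution with dilation=2 covers a 5x5 window (same number of
# weights, zeros "in between"); with padding=2 the spatial size is kept,
# so no subsampling is needed to grow the receptive field.
dilated = nn.Conv2d(64, 64, kernel_size=3, dilation=2, padding=2)

x = torch.randn(1, 64, 56, 56)
print(dilated(x).shape)   # torch.Size([1, 64, 56, 56]) -- resolution preserved
```

The flip side, as noted above, is that the feature maps stay at full resolution throughout, which is where the memory cost comes from.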
Solution 4: Conditional Random Fields. Idea: take the convolutional network prediction and sharpen it with a classic technique, the Conditional Random Field (CRF).
Fully Connected CRFs. Typically only adjacent pixels are connected: fewer connections make the model easier to optimize. Dense connectivity connects every pixel to every other pixel, which is intractable to optimize except when the pairwise potential takes a specific form. Efficient Inference in Fully Connected CRFs with Gaussian Edge Potentials. Philipp Krahenbuhl, Vladlen Koltun. In NIPS, 2011.
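For reference, the energy from the cited Krahenbuhl and Koltun paper: unary potentials from the classifier plus pairwise potentials that are a weighted sum of Gaussian kernels over pixel positions p and colors I. This specific Gaussian form is what makes efficient mean-field inference possible:

```latex
E(\mathbf{x}) = \sum_i \psi_u(x_i) + \sum_{i<j} \psi_p(x_i, x_j), \qquad
\psi_p(x_i, x_j) = \mu(x_i, x_j)\Big[
  w^{(1)} \exp\!\Big(-\tfrac{\|p_i - p_j\|^2}{2\theta_\alpha^2}
                     -\tfrac{\|I_i - I_j\|^2}{2\theta_\beta^2}\Big)
+ w^{(2)} \exp\!\Big(-\tfrac{\|p_i - p_j\|^2}{2\theta_\gamma^2}\Big)\Big]
```

Here mu is a label compatibility function (a Potts model in the simplest case); the first (appearance) kernel encourages nearby pixels with similar color to take the same label, and the second (smoothness) kernel removes small isolated regions.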
Fully Connected CRFs: (figure) qualitative comparison of the Grid CRF, the fully connected CRF, and the ground truth.
Putting it all together: convolutional network + fully connected CRF. For comparison, the best non-CNN approach reaches ~46.4%. Semantic Image Segmentation with Deep Convolutional Nets and Fully Connected CRFs. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan Yuille. In ICLR, 2015.
Other additions (mean IoU, %):
  VGG16 + Skip + Dilation: 65.8
  ResNet101: 68.7
  ResNet101 + Pyramid: 71.3
  ResNet101 + Pyramid + COCO: 74.9
DeepLab: Semantic Image Segmentation with Deep Convolutional Nets, Atrous Convolution, and Fully Connected CRFs. Liang-Chieh Chen, George Papandreou, Iasonas Kokkinos, Kevin Murphy, Alan Yuille. arXiv, 2016.
Image-to-image translation problems
Image-to-image translation problems: segmentation, optical flow estimation, depth estimation, normal estimation, boundary detection, …
Learning to segment images
The 3 core problems: Reconstruction, Reorganization (Grouping), and Recognition, with machine learning / convolutional networks as the common tool.
Revisiting contour detection
The BSDS Benchmark: 5 humans annotate boundaries; take the union. The algorithm assigns a "probability of boundary" to each pixel; thresholding gives a binary map.
The BSDS Benchmark: for a particular threshold, get the predicted boundary map and match predicted boundaries to the ground truth by constructing a bipartite graph and computing an optimal matching. Matched predicted boundaries count as correct. Precision = #matched predictions / #predictions; Recall = #matched GT / #GT.
The BSDS Benchmark: compute the precision-recall curve across thresholds. The F-measure is the harmonic mean of precision and recall, F = 2PR / (P + R); report the point with the maximum F-measure.
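A small numpy sketch of this final step, assuming the boundary matching has already produced per-threshold counts of matched predictions and matched ground-truth pixels (the argument names are illustrative):

```python
import numpy as np

def best_f_measure(matched_pred, n_pred, matched_gt, n_gt):
    """Precision-recall curve and best F-measure over thresholds.

    Each argument is an array indexed by threshold:
      matched_pred[t]: predicted boundary pixels matched to ground truth
      n_pred[t]:       total predicted boundary pixels
      matched_gt[t]:   ground-truth boundary pixels matched to a prediction
      n_gt[t]:         total ground-truth boundary pixels
    """
    precision = matched_pred / np.maximum(n_pred, 1)
    recall = matched_gt / np.maximum(n_gt, 1)
    f = 2 * precision * recall / np.maximum(precision + recall, 1e-12)
    best = np.argmax(f)
    return precision, recall, f[best]
```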
Learning to detect boundaries. Classic boundary detection has two steps: compute local gradients (brightness, color, texture), then combine information over the entire image using normalized cuts. The local gradients (e.g., texture gradients) involve complex hand-designed pipelines. Can we replace them with something better?
Learning to detect boundaries: learn to identify local boundary patterns.
Decision Trees and Random Forests: non-linear classifiers. (Toy example: a tree asking "Has four legs?", "Has fur?", "Lives in water?" to separate cat, zebra, snake, and fish.)
Decision trees for classification: leaf nodes store a distribution over labels; each test data point is routed to its leaf node and predicted from that distribution.
Decision trees for classification: grow the tree node by node. At each node, try candidate splits based on sampled features and pick the split that minimizes the entropy of the daughter nodes.
(Example: candidate splits "Shape?", "Color?", "Size?" evaluated on a few +/- labeled examples; the "Shape?" split separates square from not square.)
Decision trees for classification: stop training when the entropy drops below a threshold or the maximum depth is reached. Each leaf node then contains a subset of the training data points; compute the distribution of labels at each leaf.
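A minimal numpy sketch of entropy-based node splitting with a few randomly sampled candidates, as described above; the candidate-sampling scheme is a simplified stand-in for what a real implementation would do:

```python
import numpy as np

def entropy(labels):
    """Shannon entropy of a 1D array of class labels."""
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -(p * np.log2(p)).sum()

def best_split(X, y, n_candidates=10, rng=np.random):
    """Among a few random (feature, threshold) candidates, pick the one
    that minimizes the weighted entropy of the two daughter nodes."""
    best = None
    for _ in range(n_candidates):
        f = rng.randint(X.shape[1])              # random feature
        t = rng.choice(X[:, f])                  # random threshold from the data
        left = X[:, f] < t
        if left.all() or (~left).all():
            continue                             # degenerate split, skip
        score = (left.mean() * entropy(y[left]) +
                 (~left).mean() * entropy(y[~left]))
        if best is None or score < best[0]:
            best = (score, f, t)
    return best                                  # (weighted entropy, feature, threshold)
```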
Random forests: searching for the best possible decision at every node is bad, since it overfits. Instead, sample a few candidate splits at each node and construct an ensemble of trees: a random forest.
Random forests for boundary detection: this is not classification; the label is a boundary map. Two key questions: how do we split nodes during training, and how do we make predictions during testing?
Random forests for boundary detection. Splitting: cluster the boundary maps at each node into discrete classes, then split as in classification. Fast Edge Detection Using Structured Forests. P. Dollar, C. Lawrence Zitnick. In TPAMI, 2015.
Random forests for boundary detection. Prediction: each leaf node stores a set of boundary patches; simply output the average of the boundary patches at the leaf.
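A rough sketch of this prediction step; the tree interface (`.leaf(...)`) and the feature handling are illustrative stand-ins, not the actual Structured Forests code:

```python
import numpy as np

def predict_boundary_patch(trees, features):
    """Average the boundary patches stored at the leaves reached by `features`.

    trees: objects with a .leaf(features) method returning the boundary
    patch (2D array) stored at the reached leaf (an assumed interface).
    """
    patches = [t.leaf(features) for t in trees]
    return np.mean(patches, axis=0)

def predict_boundary_map(trees, image_features, patch=16, stride=2):
    """Slide a window over the image, predict a boundary patch at each
    location, and average overlapping predictions into a full boundary map."""
    H, W = image_features.shape[:2]
    out = np.zeros((H, W))
    counts = np.zeros((H, W))
    for y in range(0, H - patch + 1, stride):
        for x in range(0, W - patch + 1, stride):
            p = predict_boundary_patch(trees,
                                       image_features[y:y + patch, x:x + patch])
            out[y:y + patch, x:x + patch] += p
            counts[y:y + patch, x:x + patch] += 1
    return out / np.maximum(counts, 1)
```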
Convolutional network based edge detection. Deep supervision: skip connections, but with an additional loss at each layer.
Max F-measure:
  Structured Edges: 74.6
  HED without deep supervision: 77.1
  HED with deep supervision: 78.2
  Humans: 80
Holistically-Nested Edge Detection. Saining Xie, Zhuowen Tu. In ICCV, 2015.
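A toy sketch of deep supervision in the spirit of HED: side outputs from several stages, each upsampled to full resolution and given its own loss, plus a fused output. The backbone, channel counts, and loss weighting are made up for illustration, not the paper's VGG-based architecture:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DeeplySupervisedEdges(nn.Module):
    """Toy HED-style network: a side edge prediction (with its own loss)
    at each stage, plus a fused prediction."""

    def __init__(self):
        super().__init__()
        self.stage1 = nn.Sequential(nn.Conv2d(3, 16, 3, padding=1), nn.ReLU(inplace=True))
        self.stage2 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(16, 32, 3, padding=1), nn.ReLU(inplace=True))
        self.stage3 = nn.Sequential(nn.MaxPool2d(2),
                                    nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(inplace=True))
        self.side = nn.ModuleList([nn.Conv2d(c, 1, 1) for c in (16, 32, 64)])
        self.fuse = nn.Conv2d(3, 1, 1)

    def forward(self, x):
        h, w = x.shape[-2:]
        feats, f = [], x
        for stage in (self.stage1, self.stage2, self.stage3):
            f = stage(f)
            feats.append(f)
        # One side output per stage, each upsampled to full resolution.
        sides = [F.interpolate(s(f), size=(h, w), mode="bilinear", align_corners=False)
                 for s, f in zip(self.side, feats)]
        fused = self.fuse(torch.cat(sides, dim=1))
        return sides, fused

def deep_supervision_loss(sides, fused, target):
    """Deep supervision: a BCE loss on every side output plus the fused output."""
    losses = [F.binary_cross_entropy_with_logits(s, target) for s in sides + [fused]]
    return sum(losses)
```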
Why do convolutional networks work for edges? Structured forests look at small patches, while convnet receptive fields are much larger. Convnets are also pretrained on ImageNet, so perhaps they implicitly recognize objects?