Deep screen image crop and enhance
Week 8 (Aaron Ott, Amir Mazaheri)
Problem
We have taken a photo of an image, and we want to recover the original image. The network for this can be broken into 2 parts:
- Image Detector/Cropper
- Image Enhancer
Cropper
- Uses a frozen VGG-19 model to get a feature map
- Applies convolutions, normalizations, and activations
- Final dense layer creates a 6-number theta value for an affine transformation
- STN takes the input image and applies the affine transformation
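To make the role of the 6-number theta concrete, here is a minimal sketch of the grid-generation step of a spatial transformer in plain Python. It assumes theta is read row by row as the 2x3 matrix [[a, b, tx], [c, d, ty]] and that coordinates are normalized to [-1, 1]; both are common conventions, not details confirmed by the slides. The function name `affine_grid` is hypothetical, and the bilinear sampling step that would follow is omitted.

```python
def affine_grid(theta, height, width):
    """Return, for every output pixel, the source (x, y) it should sample.

    theta: 6 numbers [a, b, tx, c, d, ty], assumed to mean the 2x3 matrix
           [[a, b, tx], [c, d, ty]] acting on normalized coordinates.
    """
    a, b, tx, c, d, ty = theta
    grid = []
    for i in range(height):
        # Normalized y coordinate of output row i, in [-1, 1].
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            # Normalized x coordinate of output column j, in [-1, 1].
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            row.append((a * x + b * y + tx, c * x + d * y + ty))
        grid.append(row)
    return grid


# Identity theta samples the whole input image.
identity = [1.0, 0.0, 0.0, 0.0, 1.0, 0.0]
# Halving the scale samples only the central half of the input, i.e. a crop.
center_crop = [0.5, 0.0, 0.0, 0.0, 0.5, 0.0]

full = affine_grid(identity, 3, 3)
crop = affine_grid(center_crop, 3, 3)
```

With `center_crop`, the corners of the output map to (-0.5, -0.5) and (0.5, 0.5) in the input, which is how a predicted theta can act as a cropper.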
Enhancer
- Pretrained EDSR (trained on DIV2K): https://github.com/krasserm/super-resolution
- Modified form of ResNet
- Uses a modified residual block, which excludes batch normalization and the final ReLU layer
- 16 residual blocks
- Subpixel Conv2D layers for upscaling the image
- Scales the image 4x
Lim, Son, Kim, Nah, Lee. “Enhanced Deep Residual Networks for Single Image Super-Resolution”. 10 July 2017
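The subpixel upscaling step can be illustrated without a deep learning framework. The sketch below shows only the rearrangement (often called depth-to-space or pixel shuffle) that follows the convolution in a subpixel layer: a feature map of shape (H, W, C·r²) becomes (H·r, W·r, C). The channel ordering used here is one common convention and is an assumption, as is the hypothetical function name; the convolution itself is omitted.

```python
def depth_to_space(feature_map, r):
    """Rearrange an (H, W, C*r*r) nested list into (H*r, W*r, C).

    Assumed channel layout: input channel (dy*r + dx)*C + c goes to
    output pixel offset (dy, dx), output channel c.
    """
    h = len(feature_map)
    w = len(feature_map[0])
    depth = len(feature_map[0][0])
    c_out = depth // (r * r)
    out = [[[0] * c_out for _ in range(w * r)] for _ in range(h * r)]
    for y in range(h):
        for x in range(w):
            for dy in range(r):
                for dx in range(r):
                    for c in range(c_out):
                        src = (dy * r + dx) * c_out + c
                        out[y * r + dy][x * r + dx][c] = feature_map[y][x][src]
    return out


# One pixel with 4 channels becomes a 2x2 patch with 1 channel.
upscaled = depth_to_space([[[1, 2, 3, 4]]], 2)
```

Applying a 2x stage like this twice yields 4x upscaling overall; whether the pretrained model reaches 4x in one stage or two is not stated on the slide.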
Combined Cropper and Enhancer
Trained with 2 outputs and 2 loss functions:
- Trained Cropper on VGG + Cosine Proximity (Inception Loss)
- Trained Enhancer on VGG + MSE
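As a rough illustration of the two-output, two-loss setup, the sketch below computes a cosine proximity term and an MSE term on flat feature vectors and sums them. It assumes the vectors stand in for VGG features of the predicted and target images, that cosine proximity is the negated cosine similarity (so lower is better), and that the two terms are combined with weights; `w_crop` and `w_enh` and all function names are hypothetical, since the slides do not give the weighting.

```python
import math


def mse(a, b):
    """Mean squared error between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def cosine_proximity(a, b, eps=1e-12):
    """Negated cosine similarity: -1 for identical directions, 0 for orthogonal."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return -dot / max(norm_a * norm_b, eps)


def combined_loss(crop_pred, crop_true, enh_pred, enh_true,
                  w_crop=1.0, w_enh=1.0):
    """Weighted sum of the cropper and enhancer losses (weights are assumed)."""
    return (w_crop * cosine_proximity(crop_pred, crop_true)
            + w_enh * mse(enh_pred, enh_true))


# Perfect cropper features and perfect enhancer features give -1 + 0.
best = combined_loss([1.0, 2.0], [1.0, 2.0], [0.5, 0.5], [0.5, 0.5])
```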
Results
Metric\Model   Cropper   Cropper & Enhancer
PSNR           11.1903   16.2060
SSIM           0.4254    0.4909
MSE            0.0796    0.0281
MOS            2.6143    2.8857
Results (figure panels): Cropper & Enhancer, Input, Cropper, Actual
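For reference, two of the reported metrics can be sketched in a few lines. The code below computes MSE and PSNR for images given as flat lists of pixel values, assuming pixels are scaled to [0, 1]; that range is an assumption suggested by the size of the MSE values, not something the slides state. SSIM and MOS are not covered, and the function names are hypothetical.

```python
import math


def mse(img_a, img_b):
    """Mean squared error between two images given as flat pixel lists."""
    return sum((a - b) ** 2 for a, b in zip(img_a, img_b)) / len(img_a)


def psnr_from_mse(err, max_val=1.0):
    """PSNR in dB for a given MSE; max_val is the assumed peak pixel value."""
    if err == 0:
        return math.inf
    return 10.0 * math.log10(max_val ** 2 / err)


def psnr(img_a, img_b, max_val=1.0):
    """PSNR in dB between two images given as flat pixel lists."""
    return psnr_from_mse(mse(img_a, img_b), max_val)


# A uniform error of 0.1 per pixel gives MSE 0.01, i.e. about 20 dB.
example = psnr([0.0, 0.0, 0.0], [0.1, 0.1, 0.1])
```

Converting the table's mean MSE of 0.0796 directly gives about 10.99 dB rather than the reported 11.1903 dB. A gap like this is what one would expect if PSNR was averaged per image, though the slides do not say how the averages were taken.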
Synthetic Dataset
Problem: There is no existing dataset to use when solving this problem, and taking pictures takes too much time
Solution: Automatically generate images with various transformations over various backgrounds
Current problems:
- Sometimes image edges get cut out
- Difficult to get the full variety of possible images
- Doesn’t yet account for discoloration or image noise
- Dataset only includes birds
Sources: http://www.vision.caltech.edu/visipedia/CUB-200.html, http://places2.csail.mit.edu/download.html
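One way to approach the cut-off edges is to check the transformed corners before compositing. The sketch below samples a random rotation, scale, and position for a foreground image and rejects placements whose corners fall outside the background. It is an illustration only: the parameter ranges, the rejection strategy, and all function names are assumptions, and the actual generator may work differently.

```python
import math
import random


def transform_corners(img_w, img_h, angle, scale, cx, cy):
    """Corners of an img_w x img_h image rotated and scaled about its center,
    with that center placed at (cx, cy) on the background."""
    corners = [(-img_w / 2, -img_h / 2), (img_w / 2, -img_h / 2),
               (img_w / 2, img_h / 2), (-img_w / 2, img_h / 2)]
    cos_a, sin_a = math.cos(angle), math.sin(angle)
    return [(cx + scale * (x * cos_a - y * sin_a),
             cy + scale * (x * sin_a + y * cos_a)) for x, y in corners]


def fits(corners, bg_w, bg_h):
    """True if every corner lies inside the background bounds."""
    return all(0 <= x <= bg_w and 0 <= y <= bg_h for x, y in corners)


def sample_placement(img_w, img_h, bg_w, bg_h, rng, max_tries=100):
    """Rejection-sample a placement that keeps the whole image visible.

    Returns the four transformed corners, or None if no valid placement
    was found within max_tries. The sampling ranges are arbitrary choices.
    """
    for _ in range(max_tries):
        angle = rng.uniform(-math.pi / 6, math.pi / 6)
        scale = rng.uniform(0.3, 0.8)
        cx = rng.uniform(0, bg_w)
        cy = rng.uniform(0, bg_h)
        corners = transform_corners(img_w, img_h, angle, scale, cx, cy)
        if fits(corners, bg_w, bg_h):
            return corners
    return None


placement = sample_placement(64, 64, 256, 256, random.Random(0))
```

The returned corners also serve as ground truth for the cropper, since they describe exactly where the original image sits in the generated photo.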
Synthetic Dataset Results
Labels (models and figure panels): Original Cropper + Enhancer, Cropper w/ SD, Cropper w/ 2 SDs, Original Cropper, Input, Truth
* Note: Used a separate validation data set that none of the networks had been trained on.
Metric values, in the order they appear on the slide:
PSNR   12.5088   12.3735   12.6044   12.8537
SSIM   0.3366    0.3335    0.3450    0.3437
MSE    0.0609    0.0586    0.0578
Projective Transformation Issues
- It turns out the STN we were using cannot handle projective transformations (it doesn’t take a z axis into account in any of the equations)
- After searching through many implementations, we could not find an STN implementation that allowed for projective transformations
- Existing projective transformation functions don’t allow for passing gradients
- Workarounds?
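To show what the affine STN is missing, here is a minimal sketch of a projective grid generator in plain Python. It assumes an 8-parameter homography with the bottom-right entry fixed to 1 and normalized [-1, 1] coordinates; the function name and parameter order are hypothetical. The extra third row produces a per-pixel divisor w, which is the "z axis" term the affine equations lack.

```python
def projective_grid(params, height, width):
    """Source (x, y) for every output pixel under a 3x3 homography.

    params: 8 numbers [h11, h12, h13, h21, h22, h23, h31, h32];
            h33 is assumed fixed to 1.
    """
    h11, h12, h13, h21, h22, h23, h31, h32 = params
    grid = []
    for i in range(height):
        y = -1.0 + 2.0 * i / (height - 1) if height > 1 else 0.0
        row = []
        for j in range(width):
            x = -1.0 + 2.0 * j / (width - 1) if width > 1 else 0.0
            # Homogeneous coordinate; an affine transform always has w == 1.
            w = h31 * x + h32 * y + 1.0
            row.append(((h11 * x + h12 * y + h13) / w,
                        (h21 * x + h22 * y + h23) / w))
        grid.append(row)
    return grid


# With h31 = h32 = 0 the mapping reduces to the affine case.
affine_case = projective_grid([1, 0, 0, 0, 1, 0, 0, 0], 3, 3)
# A nonzero h31 compresses one side of the image, as in a tilted photo.
tilted = projective_grid([1, 0, 0, 0, 1, 0, 0.5, 0], 3, 3)
```

Every step here is a multiply, add, or divide, so the same computation written with an autodiff framework's tensor operations would be differentiable wherever w is nonzero. That is one possible workaround, not something the slides report having tried.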
New Objective: Can we give our network an input image containing multiple images, tell it which class of image to retrieve, and have it retrieve the correct image?
1 - Balloon
2 - Birdhouse
3 - French Bulldog
4 - Persian Cat
Other additions to our network:
- Attention module: identify the area of the photo where the specified image is
- Multiple Croppers: try to progressively crop the image to get better and better crops
Next Week
- Continue running experiments
- Get paper written