Cutout RL (reinforcement learning)

Presented by: Yonatan Zarecki, Ziv Izhar

Project's GOAL We set out to build a new kind of task for neural networks: segmenting an image using CutOut. Using reinforcement learning1 algorithms, we wanted to design a model that could drive the CutOut algorithm to create lines that are interpretable to humans, giving us some insight into how these networks operate. 1 https://en.wikipedia.org/wiki/Reinforcement_learning

GAME DEFINITION We start with 2 points: a point on the object and a point on the background.

GAME DEFINITION The points can move in 16 directions.

GAME DEFINITION The model chooses one step, and a point moves in the desired direction.

GAME DEFINITION The reward at each step is the intersection-over-union (IoU) of the resulting CutOut segmentation against the ground truth:

reward = (area of intersection) / (area of union)

[Figure: CutOut segmentation vs. ground truth]
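A minimal sketch of this reward, assuming the CutOut segmentation and the ground truth are binary numpy masks of the same shape (the function name is ours, for illustration):

```python
import numpy as np

def iou_reward(segmentation: np.ndarray, ground_truth: np.ndarray) -> float:
    """Intersection-over-union of two binary masks, in [0, 1]."""
    intersection = np.logical_and(segmentation, ground_truth).sum()
    union = np.logical_or(segmentation, ground_truth).sum()
    if union == 0:  # both masks empty: define the reward as 0
        return 0.0
    return float(intersection / union)
```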

GAME DEFINITION And the game continues... until we run out of steps.

GAME DEFINITION - overview The game starts with 2 points, one on the object and one in the background. At every step we choose one of 16 directions to move our current point in. We keep making steps in the image until we run out of steps; then the game is finished.
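A hypothetical sketch of this game loop (the real environment wraps the C++ CutOut code; run_cutout here is a stub, and the class name is ours; the start points and step length come from the slides below):

```python
import numpy as np

NUM_DIRECTIONS = 16
STEP_LENGTH = 8   # pixels per step (see the step-count slide later)
MAX_STEPS = 16

# 16 unit direction vectors spaced evenly around a circle
DIRECTIONS = [(np.cos(2 * np.pi * k / NUM_DIRECTIONS),
               np.sin(2 * np.pi * k / NUM_DIRECTIONS))
              for k in range(NUM_DIRECTIONS)]

def run_cutout(image, object_point, background_point):
    # stub for the real C++ CutOut call; assumed to return a binary mask
    return np.zeros(image.shape[:2], dtype=bool)

class CutoutGame:
    def __init__(self, image, ground_truth):
        self.image, self.ground_truth = image, ground_truth
        self.reset()

    def reset(self):
        self.object_point = [50, 50]      # starts on the object
        self.background_point = [0, 0]    # starts on the background
        self.steps_left = MAX_STEPS
        return self.image

    def step(self, action):
        # move the object point one step in the chosen direction
        # (a fuller version would also let the model pick which point moves)
        dx, dy = DIRECTIONS[action]
        self.object_point[0] += int(round(STEP_LENGTH * dx))
        self.object_point[1] += int(round(STEP_LENGTH * dy))
        self.steps_left -= 1
        segmentation = run_cutout(self.image, self.object_point,
                                  self.background_point)
        reward = iou_reward(segmentation, self.ground_truth)  # from the sketch above
        done = self.steps_left == 0
        return self.image, reward, done
```

An episode then just loops over step(action) until done, exactly as in the overview above.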

Proposed pipeline We opted to use a reinforcement-learning-based model for our problem, as it is very similar to games like Pong and Breakout. RL models worked very well on these games while operating on only the pixel data. [Images: Breakout and Pong]

Proposed pipeline First we feed the image into a CNN, and from its features we choose the desired direction out of the 16. Sounds simple, right? [Diagram: image → CNN → Dir 1 ... Dir 16]
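As an illustration only (the project's actual model came from a tensorpack/TensorFlow codebase, as described below; this PyTorch sketch just shows the shape of the idea): a small CNN mapping the input image to probabilities over the 16 directions.

```python
import torch
import torch.nn as nn

class PolicyCNN(nn.Module):
    """Maps an input image to a probability distribution over 16 directions."""
    def __init__(self, in_channels=3, num_directions=16):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(in_channels, 16, kernel_size=8, stride=4), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=4, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d((6, 6)),   # fixed-size features for any input
        )
        self.head = nn.Linear(32 * 6 * 6, num_directions)

    def forward(self, x):                   # x: (batch, C, H, W)
        z = self.features(x).flatten(start_dim=1)
        return torch.softmax(self.head(z), dim=-1)

# e.g. for a batch of one 128x128 RGB image:
# probs = PolicyCNN()(torch.randn(1, 3, 128, 128))  # -> shape (1, 16)
```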

Work throughout the Project

Looking for a baseline First we went looking for working implementations of RL algorithms for games similar to ours, ones that rely on CNNs and get their information from pixels. We started from the game Pong, famously solved by NN-based RL models. There is a list of implementations with the highest scores on OpenAI's gym website. However, none of them actually worked! The top model won only 33% of the time after 2 days of training.

Looking for a baseline (2) We continued searching for a working model to start with, and switched to looking for Breakout models. In the end, the last implementation1 we tried from the top 10 on the gym website was the first one to actually work! This implementation also worked on Pong, which was promising, so we set out to understand the code and use it for our problem! [Figures: Breakout mean-score graph; win-rate graph on Pong] 1 https://github.com/ppwwyyxx/tensorpack

Implementing our Problem We started implementing our problem environment. Using Elad's CutOut algorithm in C++ we could:
- Segment images based on lines
- Feed the network an updated version of the image with the new lines drawn over it
We also implemented a function for calculating the IoU, and started each "game" with a point on the object at (50, 50) and a point outside the object at (0, 0).

Training the model After fixing some small errors, we started training. We trained the model overnight and saw no progress :( (and after more hours of training we still didn't see any progress). After thinking over the results, we decided to try a different input to the network: instead of using only the image with the scribbles drawn on it, we opted to use 3 channels: the original image, the object line, and the background line.
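A sketch of this revised input, assuming the image is a single grayscale channel and the two scribbles are kept as binary masks (names and channel order are our assumptions):

```python
import numpy as np

def build_input(image, object_scribble, background_scribble):
    """image: (H, W) float array; scribbles: (H, W) binary masks.

    Returns a (3, H, W) array: image, object line, background line,
    ready for a channels-first CNN.
    """
    return np.stack([image.astype(np.float32),
                     object_scribble.astype(np.float32),
                     background_scribble.astype(np.float32)], axis=0)
```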

Training the model - Step count After visualizing the model's work on several images, we saw that the number of steps we allowed the model (16 steps with a length of 8 pixels each) was insufficient for it to create a meaningful scribble. We experimented with 16, 40, 60 and 100 steps, and only 60 and 100 seemed sufficient. After training with a larger step count we saw some small progress, but still not a very significant amount.

Training the model - Reward at end We saw that the model takes a very long time to train, but makes a relatively small number of epochs in the process. We measured the amount of time spent in each part of the system and got some interesting results: most of the time is spent calculating the reward function, which involves calling CutOut at every step. This resulted in very high CPU utilization (100% on all cores) and very low GPU utilization (2-3% at most).

Training the model - Reward at end (2) In order to increase the GPU utilization and let the model make more updates, we decided to try a reward-at-end approach, where the reward is 0 at all steps until the very end, where we run CutOut, similarly to games like chess. (We also made the epochs much larger.) We also switched the graph to show only the end-result IoU instead of the sum over all the steps. The results were disappointing: even though the model ran many more epochs and GPU utilization was high, the overall IoU reward was lower than before, and very low in general. We think this is because the model gets less information this way. [Figure: reward-at-end IoU over training hours]
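Building on the environment sketch from earlier, the reward-at-end variant of step would look roughly like this: the expensive CutOut call happens only on the final step, and every earlier step returns 0.

```python
def step(self, action):
    dx, dy = DIRECTIONS[action]
    self.object_point[0] += int(round(STEP_LENGTH * dx))
    self.object_point[1] += int(round(STEP_LENGTH * dy))
    self.steps_left -= 1
    done = self.steps_left == 0
    if done:
        # run CutOut once, at the end of the game only
        segmentation = run_cutout(self.image, self.object_point,
                                  self.background_point)
        reward = iou_reward(segmentation, self.ground_truth)
    else:
        reward = 0.0  # no CutOut call (and no CPU cost) mid-game
    return self.image, reward, done
```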

Training the model - Changing the learning rate As is commonly known, the learning rate is one of the most important hyper-parameters for any NN model, so we decided it would be important to play with it a bit. We thought a larger LR might help the model converge faster (as it starts from completely random weights). The IoU was much better than before; however, the qualitative results were not as good. [Figure: IoU over training hours with a larger LR]

Training the model - Changing the learning rate (2)

Training the model – Enlarging the epoch size We thought maybe the epoch size was too small, resulting in steps in the wrong direction. In order to estimate the gradient better, we enlarged the epoch size by multiples of 4, 8 and 16 and ran the training procedure. In this case the model ran so slowly that it didn't make many epochs at all, so we halted the experiments mid-way without seeing any significant improvements.

Training the model – Enlarging the step length Echoing the problems stated beforehand, CutOut takes most of our runtime. In another effort to reduce the time it takes to run, we wanted to keep the game short by enlarging the step length and reducing the number of steps. This idea indeed shortened our runtimes without any loss in overall score. [Images: small step length vs. large step length]

Training the model – Enlarging the step length (2) But even running the model for an entire week didn't seem to show an increase in performance over the established 0.8 score. Qualitative results even show that some games get stuck in a loop. [Figure: IoU over training hours with a larger step length]

Training the model – Enlarging the input image The images had been downscaled to 128x128, so we decided to try enlarging them to 256x256. This also didn't change the results significantly. [Figure: IoU over training hours with a larger input image]

Training the model At this stage we stopped experimenting with new ideas.

Future goals Taking over the world! We talked about this with Elad, and decided to end the project here. We need to:
- Write a final report going through all the experiments we've tried.
- Submit the code to GIP; it may be useful for future teams working on reinforcement learning tasks (as a basic model to start with).