DeepCount Mark Lenson
Applications for Counting: satellite images (cars, people); biology (cell counting); agriculture (plants, animals, bugs).
Previous Work: Counting through Density Estimation. Objects are counted by estimating object density [1]. This is done by optimizing coefficients W to obtain a density function value at each pixel.
Previous Work: Counting through Density Estimation (continued). The coefficients W are found by optimizing a loss function. A linear mapping transforms each pixel into a density value. No deep learning is involved. https://youtu.be/hgA2BkR1igo
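The density idea can be illustrated with a minimal sketch. It assumes, as is usual in density-based counting, that the object count is the sum of the per-pixel density values over the image. The feature vectors, the coefficients `w`, and the function names are hypothetical toy values chosen for illustration; they are not taken from [1] or from the slides.

```python
# Toy sketch of counting via density estimation.
# All numbers and names below are hypothetical, for illustration only.

def pixel_density(features, w):
    """Linear mapping: the density at one pixel is the dot product w . x_p."""
    return sum(wi * xi for wi, xi in zip(w, features))

def estimate_count(feature_map, w):
    """Sum the per-pixel densities over the whole image to get a count."""
    return sum(pixel_density(x, w) for row in feature_map for x in row)

# Hypothetical 2x3 image in which every pixel is described by 2 features.
feature_map = [
    [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]],
    [[1.0, 1.0], [0.0, 0.0], [0.0, 1.0]],
]
w = [0.5, 0.25]  # hypothetical learned coefficients

count = estimate_count(feature_map, w)
print(count)
```

In a real system the coefficients would be learned by minimizing a loss between estimated and annotated densities; here they are simply fixed by hand.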
Deep Learning: Deep learning is a type of machine learning that uses artificial neural networks with many layers. An artificial neural network is a biologically inspired network of nodes, connected by weights and arranged into layers. The network learns from a large dataset of training examples. Example: once trained on handwritten digits, such a network can predict new digits by recognizing patterns [2].
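A minimal sketch of "nodes, connected by weights, and arranged into layers" follows. It assumes sigmoid activations and uses hand-picked hypothetical weights; a real network would learn its weights from the training examples.

```python
import math

def sigmoid(z):
    """Squash a weighted sum into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

def layer_forward(inputs, weights, biases):
    """One layer: each node computes a weighted sum of its inputs plus a bias."""
    return [
        sigmoid(sum(w * x for w, x in zip(node_weights, inputs)) + b)
        for node_weights, b in zip(weights, biases)
    ]

def network_forward(inputs, layers):
    """Feed the inputs through every layer in order."""
    activations = inputs
    for weights, biases in layers:
        activations = layer_forward(activations, weights, biases)
    return activations

# Hypothetical weights: 2 inputs -> 3 hidden nodes -> 2 output nodes.
layers = [
    ([[0.5, -0.5], [0.25, 0.75], [-1.0, 1.0]], [0.0, 0.1, -0.1]),
    ([[1.0, -1.0, 0.5], [-0.5, 0.5, 1.0]], [0.0, 0.0]),
]

outputs = network_forward([1.0, 0.0], layers)
print(outputs)
```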
Alexnet: a convolutional neural network with hidden layers, used for image recognition. It has 5 convolutional layers, 2 fully connected layers, and 1 softmax output layer. [Figure: typical Alexnet structure [3]]
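The softmax output layer turns the network's raw scores into class probabilities, and the predicted class is the one with the highest probability. A minimal sketch with hypothetical scores for three classes:

```python
import math

def softmax(scores):
    """Convert raw scores into probabilities that sum to 1."""
    highest = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - highest) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

scores = [2.0, 1.0, 0.1]  # hypothetical raw outputs for three classes
probs = softmax(scores)
predicted_class = probs.index(max(probs))
print(probs, predicted_class)
```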
Process/Definitions: NetLogo was used to generate a dataset of two thousand images, each containing a random number of objects (shapes) from 1 to 10. The code ran in Jupyter Notebook, an interface for running Python. TensorFlow is a library for machine learning and training neural networks. TFLearn is a deep learning library built on top of TensorFlow that contains Alexnet. TensorBoard was used to analyze and inspect data from TensorFlow.
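The labels for such a dataset can be sketched as follows. This assumes counting is framed as 10-way classification, with a count of k mapped to class index k - 1 and one-hot encoded, which fits a softmax output but is not spelled out on the slides. The images themselves came from NetLogo; the random counts here only stand in for their labels.

```python
import random

NUM_IMAGES = 2000                  # dataset size stated on the slide
MIN_OBJECTS, MAX_OBJECTS = 1, 10   # object range stated on the slide
NUM_CLASSES = MAX_OBJECTS - MIN_OBJECTS + 1

def one_hot(count):
    """Encode an object count as a one-hot vector (assumed label format)."""
    if not MIN_OBJECTS <= count <= MAX_OBJECTS:
        raise ValueError("count out of range: %r" % count)
    vector = [0] * NUM_CLASSES
    vector[count - MIN_OBJECTS] = 1
    return vector

rng = random.Random(0)  # fixed seed so the sketch is reproducible
counts = [rng.randint(MIN_OBJECTS, MAX_OBJECTS) for _ in range(NUM_IMAGES)]
labels = [one_hot(c) for c in counts]
```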
Parameters Changed in Alexnet: The original Alexnet from the TFLearn demo reached an accuracy of ~29%. The filter size was changed from 11x11 to 3x3 and the stride from 5 to 2, so that more pixels are included in the convolution. Accuracy increased to ~80%.
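The effect of filter size and stride can be sketched with the standard formula for the output width of a convolution, floor((n + 2p - f) / s) + 1. The input width of 227 pixels and the zero padding are assumptions for illustration; the slides do not state the image size or padding mode.

```python
def conv_output_size(input_size, filter_size, stride, padding=0):
    """Output width of a convolution along one dimension."""
    return (input_size + 2 * padding - filter_size) // stride + 1

INPUT_SIZE = 227  # hypothetical image width, not given on the slides

original = conv_output_size(INPUT_SIZE, filter_size=11, stride=5)
modified = conv_output_size(INPUT_SIZE, filter_size=3, stride=2)
print(original, modified)  # 44 113
```

Under these assumptions the smaller filter and stride sample the image at more positions, so the resulting feature map is finer.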
Original Alexnet
New Alexnet http://99.58.99.245:18888/notebooks/notebooks/DeepCount.ipynb#
New Alexnet: when the new Alexnet was allowed to run for the full 500 epochs, accuracy exceeded 90%.
New Alexnet: accuracy for images with 1 to 10 objects. [Figure: Accuracy vs. Number of Objects in the Image]
Confusion matrix: shows which classifications were mistaken. Row index: the true count (what the network should have predicted). Column index: the count the network actually predicted. The fuzziness is possibly due to resolution error (two objects stacked on top of each other).
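A minimal sketch of how such a matrix can be built, with rows indexed by the true count and columns by the predicted count. The example counts are hypothetical toy data, not results from the experiment. The diagonal of each row also gives the accuracy for that number of objects.

```python
def confusion_matrix(true_counts, predicted_counts, num_classes=10):
    """Rows: true count. Columns: predicted count. Counts start at 1."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for true, predicted in zip(true_counts, predicted_counts):
        matrix[true - 1][predicted - 1] += 1
    return matrix

def per_class_accuracy(matrix):
    """Fraction of each row that lies on the diagonal (None for empty rows)."""
    accuracies = []
    for i, row in enumerate(matrix):
        total = sum(row)
        accuracies.append(row[i] / total if total else None)
    return accuracies

# Hypothetical toy data with counts from 1 to 3.
true_counts = [1, 2, 2, 3, 3, 3]
predicted_counts = [1, 2, 3, 3, 3, 2]

matrix = confusion_matrix(true_counts, predicted_counts, num_classes=3)
accuracy = per_class_accuracy(matrix)
```

Off-diagonal entries next to the diagonal correspond to predictions that are off by one object.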
How Alexnet compares to human performance. [Figure panels: verbal subitizing, children 6-16 years old [4]; Alexnet]
References
[1] "Learning to Count Objects in Images", Visual Geometry Group, University of Oxford
[2] http://neuralnetworksanddeeplearning.com/chap1.html
[3] https://www.saagie.com/blog/object-detection-part1
[4] https://www.researchgate.net/figure/273911999_fig2_FIGURE-2-Verbal-subitizing-accuracy-as-a-function-of-numerosity-TD-14-typically