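Image Recognition using Hierarchical Temporal Memory
Radoslav Škoviera
Ústav merania SAV (Institute of Measurement Science, Slovak Academy of Sciences)
Fakulta matematiky, fyziky a informatiky UK (Faculty of Mathematics, Physics and Informatics, Comenius University)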
Image Recognition
Applications: digital image databases, surveillance, industry, medicine
Tasks: object recognition, automatic annotation, content-based image search
Input: digital image
– Single object
– Scene (multiple objects: clutter, occlusion, merging)
Output: description of the input image
– Keywords, scene semantics, similar images
Subtasks: image segmentation, feature extraction, classification
Motivation
Image recognition
– Very easy for us humans (and other animals)
– Computers cannot yet do it quickly or accurately enough
A good motivation for AI researchers to develop bio-inspired models
Hierarchical Temporal Memory (HTM)
Developed by Jeff Hawkins and Dileep George (Numenta)
Hierarchical, tree-shaped network
Bio-inspired: based on a large-scale model of the neocortex
Consists of basic operational units called nodes
– Each node uses the same two-stage learning algorithm:
   1) Spatial learning (pooling)
   2) Temporal learning (pooling)
– Learning is performed layer by layer
– Nodes have receptive fields: each node except the top one sees only a portion of the input image
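The tree shape and the receptive fields can be made concrete with a small sketch. It assumes non-overlapping square receptive fields and a hypothetical fan-in of 2×2 child nodes per parent; neither value comes from the slides, and images are represented as plain lists of pixel rows.

```python
def receptive_fields(image, patch):
    """Split a 2-D image (list of pixel rows) into non-overlapping
    patch x patch blocks, one per bottom-layer node, in row-major order."""
    h, w = len(image), len(image[0])
    if h % patch or w % patch:
        raise ValueError("image size must be a multiple of the patch size")
    fields = []
    for top in range(0, h, patch):
        for left in range(0, w, patch):
            fields.append([row[left:left + patch] for row in image[top:top + patch]])
    return fields


def layer_grid_sizes(bottom_grid, fan_in=2):
    """Side length of the node grid in each layer, assuming every parent
    covers a fan_in x fan_in block of children (fan_in >= 2); the last
    entry is the single top node."""
    sizes = [bottom_grid]
    while sizes[-1] > 1:
        if sizes[-1] % fan_in:
            raise ValueError("grid side must be divisible by fan_in")
        sizes.append(sizes[-1] // fan_in)
    return sizes


# A 16 x 16 image with 4 x 4 receptive fields gives a 4 x 4 grid of bottom nodes.
image = [[r * 16 + c for c in range(16)] for r in range(16)]
fields = receptive_fields(image, 4)
print(len(fields))            # 16 bottom-layer nodes
print(layer_grid_sizes(4))    # [4, 2, 1]
```

Each bottom node sees only its own block, while the top node depends on the whole image through the layers below it.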
Spatial Learning
Observe common patterns in the input space (training images)
Group them into clusters of spatially similar patterns
Use only one representative of each cluster
– Generate a "codebook"
Result: reduction of the input space and of spatial noise
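One simple way to build such a codebook is leader-style clustering with a distance threshold: a training pattern becomes a new codebook entry only if it lies farther than the threshold from every entry stored so far. The sketch assumes Euclidean distance and a hypothetical `max_dist` parameter; the clustering actually used in this work may differ.

```python
import math


def learn_codebook(patterns, max_dist):
    """Leader clustering: keep a pattern as a new codebook entry only if it
    lies farther than max_dist from every entry already in the codebook."""
    codebook = []
    for p in patterns:
        if all(math.dist(p, c) > max_dist for c in codebook):
            codebook.append(list(p))
    return codebook


# Nearly identical patches collapse onto one representative.
patches = [[0, 0], [0.1, 0], [5, 5], [5, 5.2], [10, 0]]
print(learn_codebook(patches, max_dist=1.0))  # [[0, 0], [5, 5], [10, 0]]
```

A larger `max_dist` yields a smaller codebook and stronger noise suppression, at the cost of merging patterns that may matter.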
Temporal Learning
Uses time sequences to learn temporal correlations between spatial patterns
In each training step, the entries of the time-adjacency matrix (TAM) at the positions corresponding to co-occurring codebook patterns are increased according to the update function defined as follows:
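As a sketch, assume the simplest first-order rule: whenever codebook pattern `i` is immediately followed by pattern `j`, TAM[i][j] grows by one. The update function used in this work may weight co-occurrences differently. Temporal groups are then formed, as a hypothetical simplification, by linking patterns whose symmetric transition count reaches a threshold `min_count` and taking connected components.

```python
def learn_tam(sequence, n_patterns):
    """Time-adjacency matrix (TAM): tam[i][j] counts how often codebook
    pattern i was immediately followed by pattern j in the sequence."""
    tam = [[0] * n_patterns for _ in range(n_patterns)]
    for prev, curr in zip(sequence, sequence[1:]):
        tam[prev][curr] += 1
    return tam


def temporal_groups(tam, min_count):
    """Link patterns i, j when tam[i][j] + tam[j][i] >= min_count and
    return each connected component of that graph as one temporal group."""
    n = len(tam)
    groups, seen = [], set()
    for start in range(n):
        if start in seen:
            continue
        seen.add(start)
        group, stack = [], [start]
        while stack:
            i = stack.pop()
            group.append(i)
            for j in range(n):
                if j not in seen and tam[i][j] + tam[j][i] >= min_count:
                    seen.add(j)
                    stack.append(j)
        groups.append(sorted(group))
    return groups


# Codebook indices observed by one node while the input moves over time.
seq = [0, 1, 0, 1, 2, 3, 2, 3]
tam = learn_tam(seq, 4)
print(temporal_groups(tam, min_count=2))  # [[0, 1], [2, 3]]
```

Patterns that often follow each other end up in the same group, which is what lets a node respond identically to, for example, slightly shifted versions of the same edge.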
Inference & Classification
Uses a data flow similar to that of learning
Two stages of inference in each node:
– Spatial inference: find the closest pattern in the codebook
– Temporal inference: compute the membership in each temporal group
Classification
– HTM itself does not classify images; it only transforms the input space into another (hopefully more invariant) space
– An external classifier must be used
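Both inference stages can be sketched for a single node. Spatial inference picks the nearest codebook pattern. For temporal inference the sketch assumes a Gaussian similarity to every codebook pattern, with a hypothetical width `sigma`, and reports for each temporal group the largest similarity among its members. These are illustrative choices, not necessarily the rules used in this work; the resulting vector is what an external classifier would receive.

```python
import math


def spatial_inference(x, codebook):
    """Index of the codebook pattern closest to input x (Euclidean distance)."""
    return min(range(len(codebook)), key=lambda i: math.dist(x, codebook[i]))


def temporal_inference(x, codebook, groups, sigma=1.0):
    """Membership of x in each temporal group: Gaussian similarity to every
    codebook pattern, then the maximum over the patterns in each group."""
    sims = [math.exp(-math.dist(x, c) ** 2 / (2 * sigma ** 2)) for c in codebook]
    return [max(sims[i] for i in g) for g in groups]


codebook = [[0, 0], [1, 0], [10, 10], [11, 10]]
groups = [[0, 1], [2, 3]]
x = [0.9, 0.0]
print(spatial_inference(x, codebook))           # 1
print(temporal_inference(x, codebook, groups))  # first entry near 1, second near 0
```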
ATM Security
Semi-automatic fraud detection system for ATMs (automatic teller machines)
– Detection of masked individuals interacting with the ATM, using the ATM's camera; a masked face indicates possible illegal activity
Pilot system implemented and tested in an experimental environment
Kinect used as the input device
Kinect
RGB camera with a depth sensor, developed for the Xbox game console
– Provides a depth image of the scene and a "skeleton" when a person is detected in the scene
Experiment Setup
Face Image Segmentation using Kinect
Two image classes: normal and anomalous faces
ATM Security – Results
Image set inflated with translated, rotated and mirrored copies of the original images
A k-NN classifier in the input space was compared with two combinations: HTM + k-NN and HTM + SVM
Scenario 1: the whole data set was used
Scenario 2: translated images were excluded from the training set
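Inflating the image set can be sketched on plain 2-D pixel grids. The shift sizes and the 90-degree rotation are placeholders, since the actual offsets and angles are not stated; the hypothetical `include_translated` flag mirrors the difference between Scenario 1 and Scenario 2 when building the training set.

```python
def translate(img, dx, dy, fill=0):
    """Shift an image right by dx and down by dy, filling uncovered pixels."""
    h, w = len(img), len(img[0])
    return [[img[r - dy][c - dx] if 0 <= r - dy < h and 0 <= c - dx < w else fill
             for c in range(w)] for r in range(h)]


def rotate90(img):
    """Rotate an image 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]


def mirror(img):
    """Flip an image horizontally."""
    return [row[::-1] for row in img]


def inflate(images, shifts=((1, 0), (0, 1)), include_translated=True):
    """Return the originals plus rotated and mirrored copies, and optionally
    translated copies (left out for a Scenario 2 style training set)."""
    out = []
    for img in images:
        out.append(img)
        if include_translated:
            out.extend(translate(img, dx, dy) for dx, dy in shifts)
        out.append(rotate90(img))
        out.append(mirror(img))
    return out


face = [[1, 2], [3, 4]]
print(len(inflate([face])))                            # 5 (Scenario 1 style)
print(len(inflate([face], include_translated=False)))  # 3 (Scenario 2 style)
```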
New features and algorithms for the HTM
New temporal pooler
Images transformed to different image spaces
– Different image features
Various settings for the temporal pooler
SOM (self-organizing map) as the spatial pooler
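A self-organizing map can act as the spatial pooler: after training, its unit weight vectors serve as the codebook. The sketch assumes a 1-D map, a Gaussian neighborhood and linearly decaying learning rate and radius; map size, schedule and seed are hypothetical parameters, not values from this work.

```python
import math
import random


def train_som(patterns, n_units, epochs=20, lr0=0.5, radius0=None, seed=0):
    """Train a 1-D self-organizing map; the returned unit weight vectors can
    be used as a spatial pooler codebook."""
    rng = random.Random(seed)
    weights = [list(rng.choice(patterns)) for _ in range(n_units)]
    radius0 = radius0 if radius0 is not None else max(1.0, n_units / 2)
    total, step = epochs * len(patterns), 0
    for _ in range(epochs):
        order = list(patterns)
        rng.shuffle(order)
        for x in order:
            frac = step / total
            lr = lr0 * (1 - frac)                 # learning rate decays linearly
            radius = radius0 * (1 - frac) + 1e-3  # neighborhood shrinks over time
            bmu = min(range(n_units), key=lambda u: math.dist(x, weights[u]))
            for u in range(n_units):
                # Gaussian neighborhood on the 1-D map around the best-matching unit
                h = math.exp(-((u - bmu) ** 2) / (2 * radius ** 2))
                weights[u] = [w + lr * h * (xi - w) for w, xi in zip(weights[u], x)]
            step += 1
    return weights


# Noisy patterns from two clusters; the trained weights act as a 4-entry codebook.
rng = random.Random(1)
data = [[rng.gauss(0, 0.3), rng.gauss(0, 0.3)] for _ in range(30)]
data += [[rng.gauss(5, 0.3), rng.gauss(5, 0.3)] for _ in range(30)]
codebook = train_som(data, n_units=4)
print(codebook)
```

Unlike a fixed-threshold codebook, the SOM fixes the codebook size in advance and keeps neighboring units similar, so nearby codebook indices tend to represent similar patches.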
Testing of new image features
Dataset: selected images from Caltech 256
– 10 classes, 30 testing and 30 training images per class
Single-layer network
– With a 1-NN classifier as the top node
– Image features extracted from image patches corresponding to the receptive fields of the nodes
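A 1-NN top node is a nearest-neighbor classifier over the feature vectors produced by the layer below. A minimal sketch, assuming Euclidean distance and majority voting for k > 1; `k=1` gives the 1-NN case, and the same function covers the k-NN classifier from the ATM experiment. The class labels are placeholders.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, x, k=1):
    """Majority label among the k training vectors nearest to x (Euclidean);
    ties go to the label whose member appears first in distance order."""
    nearest = sorted(range(len(train_X)), key=lambda i: math.dist(x, train_X[i]))[:k]
    return Counter(train_y[i] for i in nearest).most_common(1)[0][0]


def accuracy(train_X, train_y, test_X, test_y, k=1):
    """Percentage of test vectors whose predicted label matches the true one."""
    hits = sum(knn_predict(train_X, train_y, x, k) == y for x, y in zip(test_X, test_y))
    return 100.0 * hits / len(test_y)


train_X = [[0, 0], [0, 1], [5, 5], [6, 5]]
train_y = ["class_1", "class_1", "class_2", "class_2"]
print(knn_predict(train_X, train_y, [0.2, 0.4]))                               # class_1
print(accuracy(train_X, train_y, [[0, 0.5], [6, 6]], ["class_1", "class_2"]))  # 100.0
```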
Results (% TE; columns s1 to s8 = window step size in pixels)
Image space          s1     s2     s4     s8
RGB          CA      42.87  41.61  40.86  38.00
RGB          med     42.50  41.33  41.00  38.17
Grey         CA      40.13  39.63  38.41  34.68
Grey         med     39.67  39.33  37.83  35.67
Canny        CA      40.35  42.33  43.66  43.55
Canny        med     40.50  41.83  43.00  n/a
Lab          CA      44.92  44.17  44.23  43.17
Lab          med     44.83  44.50  43.67  n/a
GLD          CA      45.95  46.01  46.43  46.10
GLD          med     46.00  46.12  46.17  46.00
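One plausible reading of the step-size columns, stated here only as an assumption, is that s1, s2, s4 and s8 give the number of pixels a window moves between consecutive frames of the training sequence. The sketch lists the window positions for a given step and shows how quickly their number falls as the step grows; the image and window sizes are hypothetical.

```python
def window_positions(size, window, step):
    """Top-left (row, col) offsets of a window of side `window` swept row by
    row across a square image of side `size`, moving `step` pixels at a time."""
    offsets = range(0, size - window + 1, step)
    return [(r, c) for r in offsets for c in offsets]


# Hypothetical 32 x 32 image and 16 x 16 window.
for step in (1, 2, 4, 8):
    print(step, len(window_positions(32, 16, step)))
# 1 289
# 2 81
# 4 25
# 8 9
```

Under this reading, a larger step produces a shorter and coarser training sequence for each node.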
Problems: background
Thank you for your attention