Understanding and Predicting Image Memorability at a Large Scale
Problem How can human visual memory be predicted? Unlike in visual classification, images that are memorable or forgettable do not even look alike.
Dataset As part of this work, the LaMem dataset was created: 60,000 images from diverse sources (AVA dataset, MIR Flickr, MIT 1003, NUSEF, SUN image popularity dataset, Abnormal objects dataset, aPascal dataset).
Let’s Play You will be shown a stream of images, each for 1 second. Just CLAP if you think you have seen the image before in this game.
Collecting memorability data – Visual Memory Game Each task lasted about 4.5 minutes and consisted of 186 images in total, divided into 66 targets, 30 fillers, and 12 vigilance repeats. Vigilance repeats are used to ensure that subjects are paying attention.
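The three counts above sum to 108 distinct images, not 186. One reading that matches the total is that every target and every vigilance image is shown twice while fillers are shown once. The sketch below checks that arithmetic and adds a simple attention filter based on vigilance repeats. The function names and the `min_rate` threshold are hypothetical, not taken from the slides.

```python
def stream_length(targets, fillers, vigilance, shows_per_repeat=2):
    """Number of image presentations in one game.

    Assumption: targets and vigilance images are each shown
    `shows_per_repeat` times; fillers are shown once.
    """
    return shows_per_repeat * (targets + vigilance) + fillers


def vigilance_hit_rate(responses):
    """Fraction of vigilance repeats the subject flagged as seen.

    `responses` is a list of booleans, one per vigilance repeat.
    """
    if not responses:
        return 0.0
    return sum(responses) / len(responses)


def is_attentive(responses, min_rate=0.5):
    """Keep a subject only if enough vigilance repeats were caught.

    `min_rate` is an illustrative threshold, not a value from the study.
    """
    return vigilance_hit_rate(responses) >= min_rate


if __name__ == "__main__":
    print(stream_length(targets=66, fillers=30, vigilance=12))
    print(is_attentive([True] * 10 + [False] * 2))
```

Under this reading, 2 × (66 + 12) + 30 gives the stated 186 presentations.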
Understanding Memorability Memorability scores are normalized to lie between 0 and 1.
[Figure panel labels: Flickr Fixation Flickr Affective dataset AVA dataset]
MemNet A pre-trained Hybrid CNN, trained on ILSVRC 2012 and the Places dataset, is fine-tuned to predict memorability. Memorability is a single real-valued output, and the last layer is a Euclidean loss layer used to fine-tune the network. For both HOG2X2 and CNN features, a linear Support Vector Regression machine is trained to predict memorability. False alarms (FA) are used to account for cases where people report remembering an image simply because it looks familiar, not because it is memorable. Human performance: 0.68
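A minimal sketch of two ideas from this slide: a memorability score that is penalised by false alarms, and a Euclidean loss on a single real-valued output. Both formulas are assumptions chosen for illustration. The slides do not give the exact false-alarm correction, and the 1/(2N) scaling of the loss is a common convention, not something stated here.

```python
def corrected_score(hits, false_alarms, views):
    """Hit rate penalised by false alarms, clipped to [0, 1].

    Assumed formula: (hits - false_alarms) / views.
    """
    if views <= 0:
        raise ValueError("views must be positive")
    raw = (hits - false_alarms) / views
    return min(1.0, max(0.0, raw))


def euclidean_loss(predicted, target):
    """Mean squared difference between predictions and targets, halved.

    Assumed scaling: sum((p - t)^2) / (2 * N).
    """
    if not predicted or len(predicted) != len(target):
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(predicted)
    return sum((p - t) ** 2 for p, t in zip(predicted, target)) / (2 * n)


if __name__ == "__main__":
    # An image recognised 8 times, falsely "recognised" twice, shown 10 times.
    print(corrected_score(hits=8, false_alarms=2, views=10))
    # Loss for two predicted memorability scores against ground truth.
    print(euclidean_loss([0.5, 0.8], [0.5, 0.6]))
```

Clipping keeps the corrected score inside the 0 to 1 range even when false alarms outnumber hits.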
Visualization From top to bottom, we find the neurons could be specializing for the following: people, busy images (lots of gradients), specific objects, buildings, and finally open scenes. This matches our intuition of what objects might make an image memorable.
The segmentations produced by neurons in conv5 that are strongly correlated, either positively or negatively, with memorability.
Memorability Maps To generate memorability maps, images are scaled up and MemNet is applied to overlapping regions of the image. This is done at multiple scales of the image, and the resulting memorability maps are averaged. The fully-connected layers fc6 and fc7 are converted to convolutional layers of size 1 × 1, making the network fully convolutional. http://memorability.csail.mit.edu/demo.html Slide Credit: Aditya Khosla
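The overlapping-region procedure can be sketched in plain Python. This is an illustration under stated assumptions, not the MemNet implementation: `score_fn` is a hypothetical stand-in for the network, the image is a 2D list of values, and varying the window size stands in for rescaling the image.

```python
def memorability_map(image, score_fn, window, stride):
    """Score overlapping square regions and average the scores per pixel."""
    height, width = len(image), len(image[0])
    total = [[0.0] * width for _ in range(height)]
    count = [[0] * width for _ in range(height)]
    for top in range(0, height - window + 1, stride):
        for left in range(0, width - window + 1, stride):
            patch = [row[left:left + window] for row in image[top:top + window]]
            score = score_fn(patch)
            # Spread the region's score over every pixel it covers.
            for y in range(top, top + window):
                for x in range(left, left + window):
                    total[y][x] += score
                    count[y][x] += 1
    # Pixels not covered by any region get 0.0.
    return [
        [total[y][x] / count[y][x] if count[y][x] else 0.0 for x in range(width)]
        for y in range(height)
    ]


def multiscale_map(image, score_fn, windows, stride):
    """Average the per-pixel maps obtained with several window sizes."""
    maps = [memorability_map(image, score_fn, w, stride) for w in windows]
    height, width = len(image), len(image[0])
    return [
        [sum(m[y][x] for m in maps) / len(maps) for x in range(width)]
        for y in range(height)
    ]


def mean_intensity(patch):
    """Toy scorer used only for the demo: mean value of the patch."""
    values = [v for row in patch for v in row]
    return sum(values) / len(values)


if __name__ == "__main__":
    image = [[1, 1, 0, 0], [1, 1, 0, 0], [0, 0, 0, 0], [0, 0, 0, 0]]
    for row in multiscale_map(image, mean_intensity, windows=[2, 4], stride=1):
        print(row)
```

Converting fc6 and fc7 to 1 × 1 convolutions lets the network score all regions in a single pass; the explicit loops here only make the averaging step visible.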
Use non-realistic photo renderings or cartoonization to emphasize/de-emphasize different parts of an image based on the memorability maps, and evaluate its impact on the memorability of an image.
Applications and Conclusion Automatically modifying the memorability of images. Application areas: advertising, gaming, education, social networking.