Textual Video Prediction
REU Student: Emily Cosgrove
Graduate Student: Amir Mazaheri
Professor: Dr. Shah
PSNR
PSNR (peak signal-to-noise ratio)
Most common measurement for video prediction
$\mathrm{PSNR} = 10 \log_{10}\left(\frac{MAX_I^2}{MSE}\right)$
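The PSNR formula can be sketched in a few lines of NumPy. This is a minimal illustration, not the evaluation code used for the results in these slides; the function name `psnr` and the default `max_val=255.0` (8-bit pixel range) are assumptions made for the example.

```python
import numpy as np

def psnr(pred, target, max_val=255.0):
    """Peak signal-to-noise ratio (in dB) between two frames of equal shape.

    max_val is the largest possible pixel value (MAX_I in the formula);
    255.0 is an assumed default for 8-bit images.
    """
    pred = np.asarray(pred, dtype=np.float64)
    target = np.asarray(target, dtype=np.float64)
    if pred.shape != target.shape:
        raise ValueError("frames must have the same shape")
    # Mean squared error over all pixels and channels.
    mse = np.mean((pred - target) ** 2)
    if mse == 0:
        # Identical frames: PSNR is unbounded.
        return float("inf")
    return 10.0 * np.log10(max_val ** 2 / mse)
```

For frames scaled to [0, 1], pass `max_val=1.0`. Higher values mean the predicted frame is closer to the ground truth.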
PredNet
We trained and tested PredNet on our data (movie dataset)
It was originally trained on the KITTI dataset
It predicts one frame
We are changing the code to predict multiple frames
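One generic way to get multiple frames out of a one-frame predictor is to feed each prediction back in as input. The sketch below illustrates that idea only; it is not the PredNet code or the change described above. The names `rollout`, `predict_next`, and `repeat_last_frame` are hypothetical, and the stand-in predictor simply repeats the last frame.

```python
import numpy as np

def rollout(predict_next, context_frames, num_steps):
    """Predict num_steps future frames by feeding predictions back as input.

    predict_next is any function (hypothetical here) that maps a stack of
    frames with shape (T, ...) to a single next frame.
    """
    frames = list(context_frames)
    predictions = []
    for _ in range(num_steps):
        next_frame = predict_next(np.stack(frames))
        predictions.append(next_frame)
        # Slide the window: drop the oldest frame, append the prediction.
        frames = frames[1:] + [next_frame]
    return np.stack(predictions)

def repeat_last_frame(frames):
    """Stand-in one-step predictor: returns the most recent frame unchanged."""
    return frames[-1]
```

Because each step consumes earlier predictions, errors can accumulate over the rollout, so per-frame PSNR is usually reported for each predicted step.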
PSNR Table

Method Name                     PSNR     Details
PredNet                         17.58    Predicts just one frame
                                N/A      Predicts multiple frames (working on the code)
ConvLSTM                        21.3     Predicts multiple frames
STN (prediction of tx and ty)   31.235   ConvLSTM + Text
Next steps
Compute the spatial attention
Possible usage of text
Copy pixels outside the attention area
Predict pixels inside the attention area
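The last two steps can be read as a masked blend: keep the previous frame where attention is off and use predicted pixels where it is on. The sketch below assumes the attention is available as a mask with values in [0, 1]; the function name `composite_prediction` and its arguments are hypothetical, and how the mask and the predicted pixels are produced is left open.

```python
import numpy as np

def composite_prediction(last_frame, predicted_frame, attention_mask):
    """Blend a predicted frame into the previous frame using an attention mask.

    attention_mask has shape (H, W), or the same shape as the frames, with
    values in [0, 1]: 1 marks the attention area, 0 marks everything else.
    """
    last_frame = np.asarray(last_frame, dtype=np.float64)
    predicted_frame = np.asarray(predicted_frame, dtype=np.float64)
    mask = np.asarray(attention_mask, dtype=np.float64)
    if mask.ndim == last_frame.ndim - 1:
        # Add a channel axis so an (H, W) mask broadcasts over (H, W, C) frames.
        mask = mask[..., None]
    # Inside the attention area take predicted pixels; outside, copy the old ones.
    return mask * predicted_frame + (1.0 - mask) * last_frame
```

A soft mask (values between 0 and 1) gives a smooth transition at the border of the attention area, while a binary mask gives a hard copy-or-predict split.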
[Figure: block diagram with components labeled Video, Spatial Attention, Generated Text, LSTM, and Background Video]