Textual Video Prediction
REU Student: Emily Cosgrove
Graduate Student: Amir Mazaheri
Professor: Dr. Shah
Preliminary Overview
- Deep Learning, CNNs, and RNNs
- Computer Vision and Natural Language Processing
- Generative Adversarial Networks (GANs)
- Video Prediction
- Missing Idea?
Problem Description
Goal: Use NLP and textual information to guide video prediction.
Possible Contribution: Enhanced video prediction, or a different kind of prediction that is conditioned on text.
Problem Description
Current Video Prediction Systems: Input Frames → GAN → Predicted Frames
Our System: Input Frames + Input Sentence → GAN → Predicted Frames
Tasks
Step 1: Study current methods for video prediction
- Learn how to set up and run the code for current methods
Step 2: Study the datasets that have been used for video prediction so far
- Possibly provide textual annotations for some of them
Step 3: Prepare our measurements
- How do we evaluate our results?
- Which other methods can we compare with?
Step 4: Formulate our solution
- Discuss ideas to solve the problem
Step 5: Implementation
- We will use Keras or TensorFlow to implement our ideas
Step 6: Baseline experiments
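Step 3 leaves the evaluation metric open. As one possibility, Mathieu et al. (see References) report peak signal-to-noise ratio (PSNR) between predicted and ground-truth frames. The sketch below is a minimal illustration, assuming grayscale frames stored as equal-sized nested lists of pixel values in [0, 255]; the helper names `psnr` and `mean_psnr` are hypothetical and not taken from any of the codebases above.

```python
import math


def psnr(predicted, target, max_value=255.0):
    """PSNR in dB between two equal-sized 2-D grayscale frames (hypothetical helper)."""
    if len(predicted) != len(target) or any(
        len(p_row) != len(t_row) for p_row, t_row in zip(predicted, target)
    ):
        raise ValueError("frames must have the same shape")
    # Mean squared error over every pixel of the frame.
    squared_errors = [
        (p - t) ** 2
        for p_row, t_row in zip(predicted, target)
        for p, t in zip(p_row, t_row)
    ]
    mse = sum(squared_errors) / len(squared_errors)
    if mse == 0:
        return math.inf  # identical frames: PSNR is unbounded
    # PSNR = 10 * log10(MAX^2 / MSE); higher means closer to the ground truth.
    return 10.0 * math.log10(max_value ** 2 / mse)


def mean_psnr(predicted_frames, target_frames, max_value=255.0):
    """Average per-frame PSNR over a predicted sequence (hypothetical helper)."""
    scores = [psnr(p, t, max_value) for p, t in zip(predicted_frames, target_frames)]
    return sum(scores) / len(scores)
```

For example, a 2×2 prediction that gets one pixel wrong by 255 has MSE = 255²/4, so its PSNR is 10·log10(4) ≈ 6.02 dB. A per-frame score like this makes it easy to compare against other methods at each future time step.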
Weekly Progress
- Introductory meetings with mentor
- Read papers related to the topic:
  - Generative Adversarial Nets (Goodfellow et al.)
  - Decomposing Motion and Content for Natural Video Sequence Prediction (Villegas et al.)
- Began Step 1
  - https://github.com/tensorflow/models/tree/master/video_prediction (the model we are currently working with)
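Research Paper: Generative Adversarial Nets
Authors: Goodfellow et al.
Generator vs. Discriminator
Generator input: random noise $z$
Loss Functions (minibatch of size $m$, real examples $x^{(i)}$, noise samples $z^{(i)}$)
Discriminator (updated by ascending its stochastic gradient):
$$\nabla_{\theta_d} \frac{1}{m} \sum_{i=1}^{m} \left[ \log D\left(x^{(i)}\right) + \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right) \right]$$
Generator (updated by descending its stochastic gradient):
$$\nabla_{\theta_g} \frac{1}{m} \sum_{i=1}^{m} \log\left(1 - D\left(G\left(z^{(i)}\right)\right)\right)$$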
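To make the two objectives concrete, the sketch below evaluates them on one minibatch from the discriminator's output probabilities. It assumes `d_real[i]` stands for D(x^(i)) and `d_fake[i]` for D(G(z^(i))), each strictly between 0 and 1; the function names are hypothetical, and the gradients with respect to θ_d and θ_g, which a framework such as TensorFlow would compute, are not shown.

```python
import math


def discriminator_objective(d_real, d_fake):
    """(1/m) * sum_i [log D(x_i) + log(1 - D(G(z_i)))]; the discriminator ascends this."""
    if len(d_real) != len(d_fake):
        raise ValueError("minibatches must have the same size m")
    m = len(d_real)
    return sum(math.log(dr) + math.log(1.0 - df) for dr, df in zip(d_real, d_fake)) / m


def generator_objective(d_fake):
    """(1/m) * sum_i log(1 - D(G(z_i))); the generator descends this."""
    m = len(d_fake)
    return sum(math.log(1.0 - df) for df in d_fake) / m
```

When the discriminator outputs 0.5 everywhere (it cannot tell real from generated samples), the discriminator objective equals 2·log(0.5) = −log 4. A more confident discriminator, e.g. D(x) = 0.9 and D(G(z)) = 0.1, scores higher, while the generator's objective drops as D(G(z)) rises toward 1.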
Next Week
- Continue Step 1
- Study the code for current methods
- Read and study the paper related to the code
- Preprocess the movie dataset
References
- Goodfellow, Ian, et al. "Generative adversarial nets." Advances in Neural Information Processing Systems. 2014.
- Mathieu, Michael, Camille Couprie, and Yann LeCun. "Deep multi-scale video prediction beyond mean square error." arXiv preprint arXiv:1511.05440 (2015).
- Villegas, Ruben, Jimei Yang, Seunghoon Hong, Xunyu Lin, and Honglak Lee. "Decomposing Motion and Content for Natural Video Sequence Prediction." International Conference on Learning Representations (ICLR). 2017.