Textual Video Prediction
REU Student: Emily Cosgrove
Graduate Student: Amir Mazaheri
Professor: Dr. Shah
Project Overview
Video prediction: construct future frames, given a sequence of input frames.
Goal: enhanced video prediction, using textual information (NLP) to guide the prediction.
Reference: Decomposing Motion and Content for Natural Video Sequence Prediction (Villegas et al.)
Problem overview
Current video prediction systems: Input Frames → GAN → Predicted Frames
Our system: Input Frames + Input Sentence → GAN → Predicted Frames
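The slide does not say how the input sentence is combined with the frames. One common option is to concatenate a sentence embedding with the visual features before they enter the generator. The sketch below assumes exactly that. The names `embed_sentence` and `generator_input` are hypothetical, and the hashed bag-of-words embedding is only a stand-in for a learned text encoder.

```python
import zlib


def embed_sentence(sentence: str, dim: int = 8) -> list[float]:
    """Hypothetical bag-of-words embedding (stand-in for a learned text encoder).

    Each token is hashed into one of `dim` buckets; bucket counts are divided
    by the number of tokens so the vector sums to 1 for a non-empty sentence.
    """
    vec = [0.0] * dim
    tokens = sentence.lower().split()
    for tok in tokens:
        # crc32 is stable across runs, so a word always maps to the same bucket.
        vec[zlib.crc32(tok.encode("utf-8")) % dim] += 1.0
    n = max(len(tokens), 1)
    return [v / n for v in vec]


def generator_input(frame_features: list[float], sentence: str) -> list[float]:
    """Assumed conditioning scheme: concatenate visual and textual features."""
    return frame_features + embed_sentence(sentence)


x = generator_input([0.1] * 16, "Someone opens the door")
print(len(x))  # 16 visual features + 8 text features
```

Concatenation keeps the generator architecture unchanged apart from a wider input layer. Other fusion schemes, such as attention over words, would also fit the diagram.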
Dataset
Large Scale Movie Description Challenge (LSMDC) dataset
128,000 annotated videos, each 2-20 seconds long
Each video is clipped into 1-second clips (about 30 frames), giving 150,000 video clips
We use the standard training/validation/test split from LSMDC: 90% of the data for training, 5% for validation, and 5% for testing
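The clipping step can be sketched as follows. This sketch assumes a frame rate of 30 fps, so that one second is about 30 frames, and non-overlapping clips in which any leftover shorter than one second is dropped. It also assumes that each clip inherits the split of its source video, which keeps frames from one video out of both training and test. The helper names and the video ids in the usage example are hypothetical. For 150,000 clips, a 90/5/5 split corresponds to roughly 135,000 training, 7,500 validation, and 7,500 test clips.

```python
FPS = 30  # assumption: ~30 frames per second, so a 1-second clip is ~30 frames


def one_second_clips(num_frames: int, fps: int = FPS) -> list[tuple[int, int]]:
    """(start, end) frame ranges of non-overlapping 1-second clips.

    A trailing remainder shorter than one second is dropped.
    """
    return [(start, start + fps) for start in range(0, num_frames - fps + 1, fps)]


def assign_clips(videos: dict[str, int], video_split: dict[str, str], fps: int = FPS):
    """Group clips by split.

    videos: video id -> number of frames.
    video_split: video id -> "train" | "val" | "test" (the predefined LSMDC split).
    Each clip inherits the split of the video it was cut from.
    """
    splits: dict[str, list] = {"train": [], "val": [], "test": []}
    for vid, num_frames in videos.items():
        for clip in one_second_clips(num_frames, fps):
            splits[video_split[vid]].append((vid, clip))
    return splits


# Hypothetical ids: a 2-second and a 3-second video.
s = assign_clips({"movie_a_0001": 60, "movie_b_0001": 90},
                 {"movie_a_0001": "train", "movie_b_0001": "test"})
print({k: len(v) for k, v in s.items()})
```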
MODEL
[Figure: Input Frames 1, 2, 3 each pass through a CNN and then an LSTM; a DE-CONV (deconvolution) layer decodes the LSTM output into a Reconstructed Frame. Predicted Frames 4, 5, 6 are produced by the same CNN → LSTM → DE-CONV chain, with each output feeding the next step.]
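The encoder-recurrent-decoder loop in the diagram can be illustrated with a toy, dependency-free sketch. It makes several assumptions. Linear layers stand in for the CNN and DE-CONV. The weights are random and untrained. Each predicted frame is fed back as the next input, which is one reading of the figure. `ToyPredictor` and its parameters are hypothetical names, and the sketch only shows how the data flows. It is not the project's implementation.

```python
import math
import random


def matvec(W, v):
    # Dense matrix-vector product; W is a list of rows.
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def rand_matrix(rows, cols, rng):
    return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


class ToyPredictor:
    """Hypothetical CNN -> LSTM -> DE-CONV stack with linear stand-ins."""

    def __init__(self, frame_dim, feat_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim
        self.enc = rand_matrix(feat_dim, frame_dim, rng)  # stand-in for the CNN
        # One weight matrix per LSTM gate, applied to [features, previous hidden].
        self.gates = {g: rand_matrix(hidden_dim, feat_dim + hidden_dim, rng)
                      for g in "ifog"}
        self.dec = rand_matrix(frame_dim, hidden_dim, rng)  # stand-in for DE-CONV

    def step(self, frame, h, c):
        x = [math.tanh(v) for v in matvec(self.enc, frame)]  # "CNN" features
        xh = x + h
        i = [sigmoid(v) for v in matvec(self.gates["i"], xh)]    # input gate
        f = [sigmoid(v) for v in matvec(self.gates["f"], xh)]    # forget gate
        o = [sigmoid(v) for v in matvec(self.gates["o"], xh)]    # output gate
        g = [math.tanh(v) for v in matvec(self.gates["g"], xh)]  # candidate cell
        c = [fv * cv + iv * gv for fv, cv, iv, gv in zip(f, c, i, g)]
        h = [ov * math.tanh(cv) for ov, cv in zip(o, c)]
        return matvec(self.dec, h), h, c  # "DE-CONV" output frame

    def predict(self, input_frames, num_future):
        if not input_frames:
            raise ValueError("need at least one input frame")
        h = [0.0] * self.hidden_dim
        c = [0.0] * self.hidden_dim
        # Encode the observed frames (1, 2, 3 in the figure).
        for frame in input_frames:
            out, h, c = self.step(frame, h, c)
        # `out` is now the prediction for the first future frame (4).
        predictions = [out]
        # Roll out the remaining frames (5, 6), feeding each prediction back in.
        for _ in range(num_future - 1):
            out, h, c = self.step(out, h, c)
            predictions.append(out)
        return predictions[:num_future]


frames = [[0.1 * k] * 16 for k in range(3)]  # three flattened 16-pixel "frames"
preds = ToyPredictor(frame_dim=16, feat_dim=8, hidden_dim=8).predict(frames, 3)
print(len(preds), len(preds[0]))
```

A real model would replace the linear stand-ins with convolution and transposed-convolution layers and train them against ground-truth future frames.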
Results
[Figure: loss (LOSS) plotted against iteration number (Iteration #)]
Next steps
Improve results
Implement an adversarial network
Add text