Textual Video Prediction

REU Student: Emily Cosgrove
Graduate Student: Amir Mazaheri
Professor: Dr. Shah

Project Overview
Video prediction: construct future frames given a sequence of input frames.
Goal: enhance video prediction by using textual information (NLP).
Prior work: Decomposing Motion and Content for Natural Video Sequence Prediction (Ruben Villegas et al.)

Problem overview
Current video prediction systems: Input Frames -> GAN -> Predicted Frames
Our system: Input Frames + Input Sentence -> GAN -> Predicted Frames

Dataset
Large Scale Movie Description Challenge (LSMDC) dataset
128,000 videos, each 2-20 seconds long, with annotations
We clipped each video into 1-second clips (about 30 frames each)
This gives 150,000 video clips
We use the standard training-validation-test split (used in LSMDC): 90% training, 5% validation, 5% testing
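The clipping and split proportions described above can be illustrated with a small sketch. It is not the project's actual preprocessing code: the function names `cut_into_clips` and `split_dataset` are hypothetical, the 30-frame clip length assumes roughly 30 frames per second, and the seeded random shuffle is only a stand-in for the predefined LSMDC split that the slides say was used.

```python
import random

FRAMES_PER_CLIP = 30  # assumption: about 30 frames per one-second clip


def cut_into_clips(num_frames, clip_len=FRAMES_PER_CLIP):
    """Return (start, end) frame ranges for each full clip; a short tail is dropped."""
    return [(s, s + clip_len) for s in range(0, num_frames - clip_len + 1, clip_len)]


def split_dataset(items, train_pct=90, val_pct=5, seed=0):
    """Shuffle items and split them into train/validation/test lists.

    The test set receives whatever remains after train and validation,
    which is 5% for the default 90/5 setting.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # illustrative stand-in for the official split
    n_train = len(items) * train_pct // 100
    n_val = len(items) * val_pct // 100
    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test
```

For example, `cut_into_clips(95)` yields three 30-frame ranges and discards the last 5 frames, and `split_dataset(range(1000))` returns lists of 900, 50, and 50 items.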

Model
[Figure: model architecture. Input frames 1, 2, 3 each pass through a CNN and an LSTM. DE-CONV layers decode the LSTM outputs into a reconstructed frame and the predicted frames 4, 5, 6.]
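The frame numbering in the model diagram (input frames 1, 2, 3 and predicted frames 4, 5, 6) suggests training samples made of three input frames followed by three target frames. The sketch below shows one way such pairs could be drawn from a clip. The function name `make_samples` and the sliding-window strategy are assumptions for illustration; the slides do not say how samples were actually drawn.

```python
def make_samples(clip, n_in=3, n_out=3):
    """Slide a window over a clip and return (input_frames, target_frames) pairs.

    `clip` is any sequence of frames; each pair holds n_in consecutive
    input frames and the n_out frames that immediately follow them.
    """
    window = n_in + n_out
    return [
        (clip[i:i + n_in], clip[i + n_in:i + window])
        for i in range(len(clip) - window + 1)
    ]
```

With a 30-frame clip, this produces 25 overlapping samples, the first of which pairs frames 1-3 with frames 4-6.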

Results
[Plot: loss vs. iteration number]

Next steps
Improve results
Implement adversarial network
Add text
