Self-Supervised Cross-View Action Synthesis

Kara Schatz
Advisor: Dr. Yogesh Rawat
UCF CRCV – REU, Summer 2019

Project Goal
Synthesize a video from an unseen view.
Given: a video of the same scene from a different viewpoint, and a single image from the desired viewpoint.

Motivation
Humans can do this easily. Can machines too?
Cross-view image synthesis has been done. Cross-view video synthesis has not.

Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°)
Panoptic: ~4000 training samples, ~500 testing samples, 100 cameras

Approach
[Diagram. Labels: Key Point Extraction, Transformation, viewpoint, Key-points, Estimated Keypoints, Consistency losses]
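The consistency losses named in the approach can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the presenter's network: key-points are treated as 3D points, the transformation is a plain rotation about the vertical axis by the angle between the two views, and the loss is a mean squared distance. The function names `rotate_y` and `consistency_loss` are hypothetical.

```python
import math

def rotate_y(points, degrees):
    """Rotate 3D points (x, y, z) about the vertical (y) axis."""
    t = math.radians(degrees)
    c, s = math.cos(t), math.sin(t)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def consistency_loss(estimated, extracted):
    """Mean squared distance between estimated and extracted key-points."""
    total = 0.0
    for p, q in zip(estimated, extracted):
        total += sum((a - b) ** 2 for a, b in zip(p, q))
    return total / len(estimated)

# Hypothetical key-points seen from view A (illustration only).
view_a = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
# Stand-in for key-points extracted from view B, 45 degrees away.
view_b = rotate_y(view_a, 45.0)

# Transforming A's key-points to B's viewpoint should match B's key-points.
estimated_b = rotate_y(view_a, 45.0)
loss = consistency_loss(estimated_b, view_b)
```

In this toy setup the loss is zero when the transformation matches the true view change and grows as the estimated key-points drift from the extracted ones. A learned transformation would replace `rotate_y`.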

Total Loss vs. Epochs
Dataset = NTU, batch size = 20, frame count = 16, skip rate = 2
[Plot: total loss vs. epochs for the old network and the new network]
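One possible reading of the frame settings above is sketched below. It assumes that "skip rate = 2" means taking every second frame and that "frame count = 16" is the clip length; the slides do not confirm either. The helper name `sample_frame_indices` and its `start` parameter are hypothetical.

```python
def sample_frame_indices(num_frames, frame_count=16, skip_rate=2, start=0):
    """Return the indices of `frame_count` frames, `skip_rate` apart."""
    last = start + (frame_count - 1) * skip_rate
    if last >= num_frames:
        raise ValueError("video too short for the requested clip")
    return list(range(start, last + 1, skip_rate))

# A hypothetical 60-frame video yields frames 0, 2, 4, ..., 30.
indices = sample_frame_indices(num_frames=60)
```

Under these assumptions a 16-frame clip spans 31 frames of the source video, so shorter videos would need padding or a smaller skip rate.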

Output Frames: NTU
[Figure: input, output, and ground-truth frames for Network 1 and Network 2]

Output Frames: Panoptic
[Figure: input, output, and ground-truth frames for Network 1 and Network 2]

Output Frames: NTU
[Figure: ground-truth and output frames, frame 1 through frame 16]

Next Step
Improve key-point prediction and transformation so that they capture the actions in the videos.
Reference: Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object landmarks through conditional image generation, 2018.
