Self-Supervised Cross-View Action Synthesis

1 Self-Supervised Cross-View Action Synthesis
Kara Schatz
Advisor: Dr. Yogesh Rawat
UCF CRCV – REU, Summer 2019

2 Project Goal Synthesize a video from an unseen view.

3 Project Goal Synthesize a video from an unseen view. Given: a video of the same scene from a different viewpoint, and a single image from the desired viewpoint.

4 Motivation

5 Motivation Humans can do this easily. Can machines too?

6 Motivation Humans can do this easily. Can machines too? Cross-view image synthesis has been done.

7 Motivation Humans can do this easily. Can machines too? Cross-view image synthesis has been done. Cross-view video synthesis has not.

8 Datasets

9 Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°)

10 Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°)
PANOPTIC: ~4000 training samples, ~500 testing samples, 100 cameras

11 Approach

12 Approach

13 Approach

14 Approach [Diagram labels: Key Point Extraction, Key-points]

15 Approach [Diagram labels: Key Point Extraction, Transformation, viewpoint, Key-points, Estimated Keypoints]

16 Approach [Diagram labels: Key Point Extraction, Transformation, viewpoint, Key-points, Estimated Keypoints, Consistency losses]
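The diagram on slide 16 labels consistency losses alongside key-points and estimated key-points, but the transcript does not state what form those losses take. As a minimal sketch only, assuming key-points are 2D (x, y) coordinates and assuming the loss is a mean squared distance between estimated and extracted key-points (both assumptions, not taken from the slides), such a term could look like this. The function name `consistency_loss` is hypothetical.

```python
def consistency_loss(estimated, target):
    """Mean squared Euclidean distance between two equal-length lists of (x, y) key-points.

    Assumption: this loss form is illustrative; the slides do not specify it.
    """
    if len(estimated) != len(target) or not estimated:
        raise ValueError("key-point lists must be non-empty and the same length")
    total = 0.0
    for (ex, ey), (tx, ty) in zip(estimated, target):
        # Squared distance between one estimated key-point and its counterpart.
        total += (ex - tx) ** 2 + (ey - ty) ** 2
    return total / len(estimated)


# Usage: key-points transformed into the target view vs. key-points
# extracted directly from the target view (toy values).
estimated = [(0.0, 0.0), (1.0, 1.0)]
extracted = [(0.0, 0.0), (1.0, 2.0)]
loss = consistency_loss(estimated, extracted)
```

A loss of zero would mean the transformed key-points coincide exactly with those extracted from the target view.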

17 Total Loss vs. Epochs [Plot: total loss vs. epochs for the old network and the new network. Dataset = NTU, batch size = 20, frame count = 16, skip rate = 2]
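The slide lists a frame count of 16 and a skip rate of 2 without defining them. One plausible reading, assumed here, is that each clip consists of 16 frames taken at every 2nd frame of the source video. A minimal sketch of that sampling follows; the function name `sample_frame_indices` and the `start` parameter are hypothetical.

```python
def sample_frame_indices(num_frames, frame_count=16, skip_rate=2, start=0):
    """Return the indices of `frame_count` frames spaced `skip_rate` apart.

    Assumption: "skip rate = 2" means taking every 2nd frame.
    """
    # Number of source frames covered by the clip, first to last inclusive.
    span = (frame_count - 1) * skip_rate + 1
    if start < 0 or start + span > num_frames:
        raise ValueError("video too short for the requested clip")
    return [start + i * skip_rate for i in range(frame_count)]


# Usage: a 40-frame video yields frames 0, 2, 4, ..., 30.
indices = sample_frame_indices(40)
```

Under this reading, a clip with these settings spans 31 consecutive source frames, so shorter videos would need padding or a smaller skip rate.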

18 Output Frames: NTU [Figure labels: Network 1; Input, Output, Ground Truth]

19 Output Frames: NTU [Figure labels: Network 1, Network 2; Input, Output, Ground Truth]

20 Output Frames: Panoptic [Figure labels: Network 1; Input, Output, Ground Truth]

21 Output Frames: Panoptic [Figure labels: Network 1, Network 2; Input, Output, Ground Truth]

22 Output Frames: NTU [Figure labels: Frame 1, Frame 2; Ground Truth, Output]

23 Output Frames: NTU [Figure labels: Frame 1, Frame 2, ..., Frame 15, Frame 16; Ground Truth, Output]

24 Output Frames: Panoptic [Figure labels: Ground Truth, Output]

25 Output Frames: Panoptic [Figure labels: frame sequence; Ground Truth, Output]

26 Next Step [Diagram labels: Key Point Extraction, Transformation, viewpoint, Key-points, Estimated Keypoints, Consistency losses]

27 Next Step Improve key-point prediction and transformation, with the aim of capturing the actions in the videos.
Reference: Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object landmarks through conditional image generation, 2018.

