Self-Supervised Cross-View Action Synthesis
Kara Schatz
Advisor: Dr. Yogesh Rawat
UCF CRCV – REU, Summer 2019
Project Goal
Synthesize a video from an unseen view.
Given: a video of the same scene from a different viewpoint, and a single image from the desired viewpoint.
Motivation
Humans can do this easily. Can machines too?
Cross-view image synthesis has been done. Cross-view video synthesis has not.
Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°)
Panoptic: ~4000 training samples, ~500 testing samples, 100 cameras
Approach
[Figure: architecture diagram, built up over several slides. Labeled components: Key Point Extraction, Transformation, viewpoint, Key-points, Estimated Key-points, Consistency losses.]
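
The slides name "consistency losses" between key-points and estimated key-points but do not define them. As a rough illustration only, the sketch below assumes 2D key-points and uses a mean squared distance; the function name `keypoint_consistency_loss` and the choice of distance are assumptions, not the presenter's formulation.

```python
def keypoint_consistency_loss(estimated, extracted):
    """Mean squared distance between two equal-length lists of (x, y) key-points.

    Assumption: the slides do not specify the loss; this is one plausible choice.
    """
    if not estimated or len(estimated) != len(extracted):
        raise ValueError("both key-point lists must be non-empty and the same length")
    total = 0.0
    for (ex, ey), (gx, gy) in zip(estimated, extracted):
        # Squared Euclidean distance between an estimated and an extracted key-point.
        total += (ex - gx) ** 2 + (ey - gy) ** 2
    return total / len(estimated)


# Hypothetical key-points: one matches exactly, one is off by 2 along y.
estimated = [(0.0, 0.0), (1.0, 1.0)]
extracted = [(0.0, 0.0), (1.0, 3.0)]
print(keypoint_consistency_loss(estimated, extracted))  # 2.0
```

A loss of this kind is zero only when the transformed key-points coincide with the key-points extracted from the target view.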
Total Loss vs. Epochs
Dataset = NTU, batch size = 20, frame count = 16, skip rate = 2
[Plot: total loss vs. epochs for the old network and the new network]
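
The settings above (frame count = 16, skip rate = 2) suggest that each training clip is a subsampled window of a longer video. A minimal sketch of that sampling follows, assuming "skip rate = 2" means keeping every second frame; the slides do not spell this out, and the helper name `sample_frame_indices` and its `start` parameter are hypothetical.

```python
def sample_frame_indices(num_video_frames, frame_count=16, skip_rate=2, start=0):
    """Return `frame_count` frame indices spaced `skip_rate` apart, beginning at `start`.

    Assumption: skip rate is interpreted as the stride between kept frames.
    """
    last = start + (frame_count - 1) * skip_rate
    if start < 0 or last >= num_video_frames:
        raise ValueError("video is too short for the requested clip")
    return [start + i * skip_rate for i in range(frame_count)]


# A hypothetical 60-frame video yields the even indices 0 through 30.
indices = sample_frame_indices(num_video_frames=60)
print(len(indices), indices[0], indices[-1])  # 16 0 30
```

Under this reading, a 16-frame clip spans 31 frames of the source video.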
Output Frames: NTU
[Figure: input, output, and ground-truth frames for Network 1 and Network 2]
Output Frames: Panoptic
[Figure: input, output, and ground-truth frames for Network 1 and Network 2]
Output Frames: NTU
[Figure: ground-truth and output frames for a 16-frame sequence, showing frames 1, 2, ..., 15, 16]
Output Frames: Panoptic
[Figure: ground-truth and output frame sequences]
Next Step
[Figure: the architecture diagram from the Approach slides, repeated]
Improve key-point prediction and transformation, with the aim of capturing the actions in the videos.
Reference: Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object landmarks through conditional image generation, 2018.