Self-Supervised Cross-View Action Synthesis

Presentation transcript:

Self-Supervised Cross-View Action Synthesis Kara Schatz Advisor: Dr. Yogesh Rawat UCF CRCV – REU, Summer 2019

Project Goal
Synthesize a video from an unseen view.
Given:
- a video of the same scene from a different viewpoint
- a single image from the desired viewpoint

Motivation
Humans can do this easily. Can machines too?
Cross-view image synthesis has been done. Cross-view video synthesis has not.

Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°)
Panoptic: ~4000 training samples, ~500 testing samples, 100 cameras

Approach
[Diagram. Recoverable labels: Key Point Extraction, Key-points, Transformation (with a viewpoint input), Estimated Key-points, Consistency losses.]
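The Approach slide names a transformation that takes a viewpoint, and consistency losses involving key-points and estimated key-points, but it does not say how either is computed. The following is only a toy sketch of the general idea under loud assumptions: key-points are treated as 3D coordinates, the transformation is replaced by a fixed rotation about the vertical axis (the real one is presumably learned), and the consistency loss is taken to be a mean squared distance. The names `rotate_about_vertical` and `consistency_loss` are hypothetical and do not come from the slides.

```python
import math

def rotate_about_vertical(points, angle_deg):
    """Rotate (x, y, z) points about the vertical y-axis by angle_deg degrees.

    ASSUMPTION: a fixed rotation stands in for the slide's 'Transformation'
    block, whose actual form is not given in the source.
    """
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in points]

def consistency_loss(estimated, target):
    """Mean squared distance between corresponding key-points.

    ASSUMPTION: the slides only say 'consistency losses'; MSE is one
    common choice, not necessarily the one used in the project.
    """
    if len(estimated) != len(target):
        raise ValueError("key-point sets must have the same length")
    total = sum((e - t) ** 2
                for p, q in zip(estimated, target)
                for e, t in zip(p, q))
    return total / len(estimated)

# Toy key-points "extracted" from the source view.
source_kp = [(1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.5, 1.0)]

# Stand-in for key-points extracted from the target view (+45 degrees).
target_kp = rotate_about_vertical(source_kp, 45.0)

# Key-points estimated by transforming the source key-points to the target view.
estimated_kp = rotate_about_vertical(source_kp, 45.0)

loss_matched = consistency_loss(estimated_kp, target_kp)   # transformation applied
loss_unmatched = consistency_loss(source_kp, target_kp)    # no transformation applied
```

In this toy setup the loss is zero when the estimated key-points agree with the target-view key-points and positive when the source key-points are compared without any transformation, which is the behaviour a consistency term is meant to reward.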

Total Loss vs. Epochs
[Plot comparing the old network and the new network.]
Dataset = NTU, batch size = 20, frame count = 16, skip rate = 2
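The training settings list a frame count of 16 and a skip rate of 2. The slides do not define "skip rate"; the sketch below assumes it means taking every second frame, so a 16-frame clip spans 31 frames of the original video. The function name `sample_frame_indices` and the `start` parameter are hypothetical, used here only for illustration.

```python
def sample_frame_indices(num_frames, frame_count=16, skip_rate=2, start=0):
    """Return frame_count indices, one every skip_rate frames, from start.

    ASSUMPTION: 'skip rate = 2' is read as a temporal stride of 2;
    the source does not spell this out.
    """
    span = (frame_count - 1) * skip_rate + 1  # frames covered by the clip
    if start < 0 or start + span > num_frames:
        raise ValueError("clip does not fit inside the video")
    return [start + i * skip_rate for i in range(frame_count)]

# Example: a 60-frame video, clip starting at the first frame.
indices = sample_frame_indices(num_frames=60)
```

Under this reading, a video must have at least 31 frames past the chosen start for a full clip to be sampled.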

Output Frames: NTU
[Images: input, output, and ground truth frames for Network 1 and Network 2.]

Output Frames: Panoptic
[Images: input, output, and ground truth frames for Network 1 and Network 2.]

Output Frames: NTU
[Images: ground truth and output for frames 1, 2, ..., 15, 16.]

Output Frames: Panoptic
[Images: ground truth and output frame sequences.]

Next Step
[Diagram repeated from the Approach slide.]
Improve key-point prediction and transformation, with the aim of capturing the actions in the videos.

Reference: Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object landmarks through conditional image generation, 2018.