Self-Supervised Cross-View Action Synthesis

Presentation transcript:

Self-Supervised Cross-View Action Synthesis Kara Schatz Advisor: Dr. Yogesh Rawat UCF CRCV – REU, Summer 2019

Project Goal: Synthesize a video from an unseen view. Given: (1) a video of the same scene from a different viewpoint, and (2) a single image from the desired viewpoint.

Motivation: Humans can do this easily. Can machines too? Cross-view image synthesis has been done; cross-view video synthesis has not.

Datasets
NTU: 13K+ training videos; 5K+ testing videos; 3 camera angles (-45°, 0°, +45°)
Panoptic: ~4000 training samples; ~500 testing samples; 100 cameras
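To make the cross-view setup concrete, here is a minimal sketch of how the three NTU camera angles listed on the slide could be paired into (input view, target view) combinations. The function name `cross_view_pairs` is hypothetical, and the slides do not say whether every ordered pair was actually used for training; that is an assumption made only for illustration.

```python
from itertools import permutations

# Camera angles listed for NTU on the slide, in degrees.
NTU_ANGLES = (-45, 0, 45)


def cross_view_pairs(angles):
    """Return every ordered (input_view, target_view) pair of distinct angles.

    Hypothetical helper: the slides do not describe how views were paired.
    """
    return list(permutations(angles, 2))


pairs = cross_view_pairs(NTU_ANGLES)
for input_view, target_view in pairs:
    print(f"input view {input_view:+d} deg -> target view {target_view:+d} deg")
```

With three cameras this gives six ordered pairs, since each view can serve as the input for either of the other two.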

Approach [Diagram. Labels: Key Point Extraction, Key-points, Transformation, viewpoint, Estimated Keypoints, Consistency losses.]
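The diagram labels mention consistency losses between key-points and estimated key-points, but the slides do not give their form. As a rough sketch only, one common choice would be the mean squared distance between each estimated key-point and the key-point extracted from the target view. The function name, the 2D `(x, y)` representation, and the squared-distance form are all assumptions, not the author's stated method.

```python
def keypoint_consistency_loss(estimated, extracted):
    """Mean squared Euclidean distance between matching 2D key-points.

    Assumed form of a consistency loss; the slides do not specify one.
    Both arguments are equal-length sequences of (x, y) pairs.
    """
    if len(estimated) != len(extracted) or len(estimated) == 0:
        raise ValueError("key-point lists must be non-empty and the same length")
    total = 0.0
    for (est_x, est_y), (ext_x, ext_y) in zip(estimated, extracted):
        total += (est_x - ext_x) ** 2 + (est_y - ext_y) ** 2
    return total / len(estimated)


# Toy example: the second key-point is off by 2 units along y.
loss = keypoint_consistency_loss([(0.0, 0.0), (1.0, 1.0)],
                                 [(0.0, 0.0), (1.0, 3.0)])
print(loss)
```

The loss is zero when the transformed key-points match the extracted ones exactly, and it grows with the squared offset otherwise.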

Total Loss vs. Epochs [Plot comparing the old network and the new network.] Dataset = NTU; batch size = 20; frame count = 16; skip rate = 2.
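The frame count and skip rate settings can be illustrated with a small sketch of clip sampling. It assumes that "skip rate = 2" means taking every second frame of the source video, which the slides do not state explicitly; `sample_frame_indices` and its `start` parameter are hypothetical names.

```python
def sample_frame_indices(num_video_frames, frame_count=16, skip_rate=2, start=0):
    """Return the indices of `frame_count` frames spaced `skip_rate` apart.

    Assumes skip_rate means the step between sampled frames (an assumption).
    """
    if frame_count < 1 or skip_rate < 1:
        raise ValueError("frame_count and skip_rate must be positive")
    # Number of source frames covered from the first to the last sampled frame.
    span = (frame_count - 1) * skip_rate + 1
    if start < 0 or start + span > num_video_frames:
        raise ValueError("video is too short for the requested clip")
    return list(range(start, start + span, skip_rate))


indices = sample_frame_indices(num_video_frames=60)
print(indices)
```

Under this reading, a 16-frame clip with skip rate 2 spans 31 consecutive frames of the source video.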

Output Frames: NTU [Images: Input, Output, and Ground Truth frames for Network 1 and Network 2.]

Output Frames: Panoptic [Images: Input, Output, and Ground Truth frames for Network 1 and Network 2.]

Output Frames: NTU [Images: Ground Truth and Output for Frames 1, 2, ..., 15, 16.]

Output Frames: Panoptic [Images: Ground Truth and Output frame sequences.]

Next Step [Approach diagram repeated.] Improve key-point prediction and transformation, with the hope of capturing the actions in the videos.

Reference: Tomas Jakab, Ankush Gupta, Hakan Bilen, and Andrea Vedaldi. Unsupervised learning of object landmarks through conditional image generation, 2018.