Self-Supervised Cross-View Action Synthesis

Presentation transcript:

Self-Supervised Cross-View Action Synthesis Kara Schatz Advisor: Dr. Yogesh Rawat UCF CRCV – REU, Summer 2019

Project Goal: Synthesize a video from an unseen view. To achieve this, our approach uses a video of the same scene from a different viewpoint as well as appearance conditioning from the desired viewpoint.

Approach: This diagram shows the approach we use to accomplish our goal. The overall idea is to use one network to learn the appearance of the desired view and another network to learn a representation of the 3D pose from a different view of the video. Both outputs are then fed into a video generator that reconstructs the video from the desired view. For training, we run the network on two different views and reconstruct both viewpoints. Once trained, the model needs only one view of the video and one frame of the desired view.
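The training scheme described above can be sketched as follows. This is only a minimal illustration of the data flow, not the project's actual code: the names `cross_view_loss`, `appearance_net`, `pose_net`, and `generator` are hypothetical, the mean squared error is an assumed reconstruction loss, using the first frame for appearance is an assumption, and the "networks" are toy arithmetic stand-ins so the example runs without a deep learning library.

```python
from typing import Callable, List

Frame = List[float]   # toy stand-in for an image: a flat list of pixel values
Video = List[Frame]   # a video is a list of frames


def mse(a: Video, b: Video) -> float:
    """Mean squared error between two equally shaped videos (assumed loss)."""
    diffs = [(pa - pb) ** 2 for fa, fb in zip(a, b) for pa, pb in zip(fa, fb)]
    return sum(diffs) / len(diffs)


def cross_view_loss(
    video_a: Video,
    video_b: Video,
    appearance_net: Callable[[Frame], float],
    pose_net: Callable[[Video], Video],
    generator: Callable[[float, Video], Video],
) -> float:
    """Reconstruct each view from the other view's motion plus its own appearance."""
    pose_a = pose_net(video_a)           # motion representation from view A
    pose_b = pose_net(video_b)           # motion representation from view B
    app_a = appearance_net(video_a[0])   # appearance from one frame of view A (assumed: first frame)
    app_b = appearance_net(video_b[0])   # appearance from one frame of view B
    recon_b = generator(app_b, pose_a)   # view B from A's motion + B's appearance
    recon_a = generator(app_a, pose_b)   # view A from B's motion + A's appearance
    return mse(recon_a, video_a) + mse(recon_b, video_b)


# Toy stand-ins for the three networks (hypothetical, for illustration only).
def toy_appearance(frame: Frame) -> float:
    return sum(frame) / len(frame)       # "appearance" = mean brightness


def toy_pose(video: Video) -> Video:
    base = sum(video[0]) / len(video[0])
    return [[p - base for p in frame] for frame in video]  # brightness-free motion


def toy_generator(appearance: float, pose: Video) -> Video:
    return [[appearance + d for d in frame] for frame in pose]


view_a = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
view_b = [[p + 0.5 for p in frame] for frame in view_a]  # same motion, brighter view
loss = cross_view_loss(view_a, view_b, toy_appearance, toy_pose, toy_generator)
```

In this toy setup the two views differ only in brightness, so the view-independent motion plus the target view's appearance is enough to reconstruct each view. At inference time only one direction would be needed: the pose from the available view and the appearance from a single frame of the desired view.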

Datasets
NTU: 13K+ training videos, 5K+ testing videos, 3 camera angles (-45°, 0°, +45°).
Panoptic: 3800 training samples, 500 testing samples, 100 cameras.
Panoptic has far fewer samples than NTU but many more viewpoints, so its training set is more diverse.
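To make the difference in viewpoint diversity concrete, the short calculation below counts how many distinct camera pairs each dataset could offer. It assumes that any two cameras may be paired as source and target view, which the slides do not state; the function name `view_pair_counts` is hypothetical.

```python
import math


def view_pair_counts(num_cameras: int) -> tuple:
    """Return (unordered pairs, ordered source/target pairs) of distinct cameras."""
    unordered = math.comb(num_cameras, 2)
    ordered = unordered * 2  # each pair can be used in both directions
    return unordered, ordered


ntu_pairs = view_pair_counts(3)         # 3 camera angles
panoptic_pairs = view_pair_counts(100)  # 100 cameras
```

Under that assumption, NTU yields 3 unordered camera pairs while Panoptic yields 4950, even though Panoptic has fewer samples overall.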

Total Loss vs. Epochs (plots for NTU and Panoptic). Batch size = 20, frame count = 16, skip rate = 2.
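The frame count and skip rate settings determine which frames of a video make up one training clip. The sketch below assumes that a skip rate of 2 means taking every second frame; the slides do not define the term, and the function name `sample_frame_indices` and the `start` parameter are hypothetical.

```python
from typing import List


def sample_frame_indices(
    num_frames: int, frame_count: int = 16, skip_rate: int = 2, start: int = 0
) -> List[int]:
    """Pick `frame_count` frame indices spaced `skip_rate` apart, beginning at `start`."""
    span = (frame_count - 1) * skip_rate + 1  # number of source frames the clip covers
    if start < 0 or start + span > num_frames:
        raise ValueError("video is too short for the requested clip")
    return list(range(start, start + span, skip_rate))


indices = sample_frame_indices(num_frames=100)
```

With these settings a 16-frame clip covers 31 consecutive source frames, so videos shorter than that would need padding or a smaller skip rate.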

Output Frames (NTU and Panoptic). People are frequently cropped out of the Panoptic frames. I think the difference from NTU is that the colors in Panoptic are so close to one another that they are hard to differentiate.

Modified Network (diagram). Diagram blocks: Key Point Extraction → Key-points → Transformation (with viewpoint input) → Estimated Keypoints. After that, I can start making changes to hopefully improve the model. I can change the network, the loss function, and the data input strategies to see how they affect performance.
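The slides do not say how the transformation block works; in the network it is presumably learned. As a purely geometric stand-in, the sketch below rotates 3D key-points about the vertical axis by a viewpoint angle, such as the 45° spacing between NTU cameras. The function name `rotate_keypoints_y`, the use of 3D coordinates, and the choice of rotation axis are all assumptions.

```python
import math
from typing import List, Tuple

Point3D = Tuple[float, float, float]


def rotate_keypoints_y(keypoints: List[Point3D], angle_deg: float) -> List[Point3D]:
    """Rotate (x, y, z) key-points about the vertical y axis by `angle_deg` degrees."""
    theta = math.radians(angle_deg)
    c, s = math.cos(theta), math.sin(theta)
    # Standard rotation about y: x' = c*x + s*z, y' = y, z' = -s*x + c*z
    return [(c * x + s * z, y, -s * x + c * z) for x, y, z in keypoints]


pose = [(1.0, 0.0, 0.0), (0.0, 1.5, 0.5)]    # two toy key-points
estimated = rotate_keypoints_y(pose, 45.0)   # key-points as seen 45 degrees away
restored = rotate_keypoints_y(estimated, -45.0)
```

Rotating by the opposite angle recovers the original key-points, which mirrors the idea of estimating the key-points of one view from those of another.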

Total Loss vs. Epochs, new network vs. old network. Dataset = NTU, batch size = 20, frame count = 16, skip rate = 2.

Total Loss vs. Epochs, new network vs. old network. Dataset = Panoptic, batch size = 20, frame count = 16, skip rate = 2.

Next Steps: reconstruction with the new network; fix dataset issues (missing data, cropping people out); using close cameras; modify network design. After that, I can start making changes to hopefully improve the model. I can change the network, the loss function, and the data input strategies to see how they affect performance.