Actor-Object Relation in Videos

Presentation transcript:

Actor-Object Relation in Videos
Volodymyr Bobyr and Aayushjungbahadur Rana

Task
Input: a video containing
  Actors: adult, child, dog
  Objects: toys, furniture, etc.
  Actions: “holding”, “in front”, “talking to”, etc.
Output: spatially and temporally pixel-perfect localization of actors, objects, and actions
Dataset: VidOR – 10,000 video clips

Approach
Convolutional encoder/decoder network:
  Encoder backbone: I3D pretrained on Kinetics
  Decoder: feature pyramid network with dilated convolutions and side connections
4 stages:
  1. Actor & object spatial segmentation
  2. Centroid detection
  3. Action spatial segmentation
  4. Temporal connection (postprocessing)
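The decoder relies on dilated convolutions, which widen the receptive field without adding weights. The following is a minimal 1-D NumPy sketch of that idea only. The function name `dilated_conv1d` and the toy inputs are illustrative assumptions and are not taken from the authors' network.

```python
import numpy as np

def dilated_conv1d(signal, kernel, dilation=1):
    """'Valid' 1-D cross-correlation with `dilation - 1` skipped samples between taps."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.asarray(kernel, dtype=float)
    # Number of input samples spanned by the kernel once the gaps are inserted.
    span = (len(kernel) - 1) * dilation + 1
    out_len = len(signal) - span + 1
    if out_len <= 0:
        raise ValueError("signal is shorter than the dilated kernel span")
    out = np.zeros(out_len)
    for tap, weight in enumerate(kernel):
        start = tap * dilation
        # Each tap contributes a shifted copy of the signal.
        out += weight * signal[start:start + out_len]
    return out

x = np.arange(10.0)
k = np.array([1.0, 1.0, 1.0])
dense = dilated_conv1d(x, k, dilation=1)    # sums 3 adjacent samples
dilated = dilated_conv1d(x, k, dilation=2)  # sums samples spaced 2 apart
```

With the same three weights, the dilated version covers five input samples per output instead of three.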

Details
Input: (n_frames, 224, 224, 3)
Output:
  Actor/object segmentation: (n_frames, 56, 56, 80)
  Centroid detection: (n_frames, 56, 56, 1)
  Action segmentation: (n_frames, 56, 56, 52)
Class imbalance:
  People: 56% of all objects
  Background: present in every video clip
  Solution: class weights
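The slide names class weights as the remedy for imbalance but does not say how they were computed. One common choice, assumed here, is inverse-frequency ("balanced") weighting. The helper name `balanced_class_weights` and the toy label volume below are hypothetical.

```python
import numpy as np

def balanced_class_weights(labels, n_classes):
    """Weight each class by total_pixels / (n_present_classes * class_pixels).

    Classes that never occur get weight 0. This is an assumed scheme;
    the slides only say "class weights".
    """
    counts = np.bincount(labels.ravel(), minlength=n_classes).astype(float)
    present = counts > 0
    weights = np.zeros(n_classes)
    weights[present] = counts.sum() / (present.sum() * counts[present])
    return weights

# Toy label volume shaped like the segmentation target before one-hot encoding.
n_frames, n_classes = 4, 80
labels = np.zeros((n_frames, 56, 56), dtype=np.int64)  # class 0: background
labels[:, :14, :] = 1                                  # class 1 fills a quarter of each frame

weights = balanced_class_weights(labels, n_classes)
pixel_weights = weights[labels]  # per-pixel weight map, shape (n_frames, 56, 56)
```

The rare class receives the larger weight, and the per-pixel weights average to 1, so the overall loss scale stays comparable to the unweighted case.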

Metrics
IoU: mean Intersection over Union among pixels in each frame
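A minimal sketch of a per-frame mean IoU follows. The slide does not specify how classes are averaged or how absent classes are treated. This version assumes IoU is averaged over the classes that appear in either the prediction or the ground truth, and then over frames. The function names are illustrative.

```python
import numpy as np

def frame_mean_iou(pred, target, n_classes):
    """Mean IoU over the classes present in one frame's prediction or ground truth."""
    ious = []
    for c in range(n_classes):
        p = pred == c
        t = target == c
        union = np.logical_or(p, t).sum()
        if union == 0:
            continue  # class absent from both maps: skip it (an assumption)
        ious.append(np.logical_and(p, t).sum() / union)
    return float(np.mean(ious)) if ious else float("nan")

def video_mean_iou(pred, target, n_classes):
    """Average the per-frame mean IoU over all frames of a clip."""
    per_frame = [frame_mean_iou(p, t, n_classes) for p, t in zip(pred, target)]
    return float(np.nanmean(per_frame))

# One 4x4 frame: ground truth has class 1 in the top two rows,
# the prediction only finds the top row.
target = np.zeros((1, 4, 4), dtype=np.int64)
target[0, :2, :] = 1
pred = np.zeros((1, 4, 4), dtype=np.int64)
pred[0, :1, :] = 1

score = video_mean_iou(pred, target, n_classes=2)
perfect = video_mean_iou(target, target, n_classes=2)
```

Here class 1 scores 4/8 and class 0 scores 8/12, so `score` is their mean, 7/12.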

Data Preparation & Output Example
[Figure panels: original image, augmented image, original centroids, augmented centroids, experimental segmentation output]
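Geometric augmentation has to be applied identically to the frames and to the centroid targets, or the labels stop matching the pixels. The slides do not state which augmentations were used. The sketch below assumes a horizontal flip, and the helper names are hypothetical.

```python
import numpy as np

def mask_centroid(mask):
    """Return the (row, col) centroid of a non-empty binary mask."""
    rows, cols = np.nonzero(mask)
    return rows.mean(), cols.mean()

def hflip_clip(frames, label_maps):
    """Flip frames (n, H, W, 3) and label maps (n, h, w, C) along the width axis."""
    return frames[:, :, ::-1, :], label_maps[:, :, ::-1, :]

# Toy object mask on the 56x56 output grid.
mask = np.zeros((56, 56), dtype=bool)
mask[8:13, 4:7] = True
row, col = mask_centroid(mask)

# Centroid target with one hot pixel per frame, matching (n_frames, 56, 56, 1).
centroid_map = np.zeros((2, 56, 56, 1))
centroid_map[:, int(round(row)), int(round(col)), 0] = 1.0
frames = np.zeros((2, 224, 224, 3))

flipped_frames, flipped_centroids = hflip_clip(frames, centroid_map)
flipped_row, flipped_col = mask_centroid(mask[:, ::-1])
```

After the flip, a centroid at column `c` on a grid of width `W` lands at column `W - 1 - c`, which is where the flipped centroid map places it.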

Experimental Results
In the past: binary cross-entropy loss

Experimental Results
Before: categorical cross-entropy loss

Experimental Results
Now: categorical cross-entropy + augmentation tweaks
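The results slides differ mainly in the loss. As a rough illustration of that difference, the sketch below computes both losses in NumPy on an output shaped like the actor/object segmentation head. Binary cross-entropy scores each of the 80 channels independently through a sigmoid. Categorical cross-entropy makes the channels compete through a softmax. The function names and the all-zero logits are assumptions for the example, not the authors' training code.

```python
import numpy as np

EPS = 1e-7  # clipping constant to avoid log(0)

def softmax(z):
    """Softmax over the last axis, shifted for numerical stability."""
    z = z - z.max(axis=-1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def binary_cross_entropy(logits, one_hot):
    """Mean per-channel sigmoid cross-entropy."""
    p = np.clip(1.0 / (1.0 + np.exp(-logits)), EPS, 1.0 - EPS)
    return float(-(one_hot * np.log(p) + (1.0 - one_hot) * np.log(1.0 - p)).mean())

def categorical_cross_entropy(logits, one_hot):
    """Mean per-pixel softmax cross-entropy."""
    p = np.clip(softmax(logits), EPS, 1.0)
    return float(-(one_hot * np.log(p)).sum(axis=-1).mean())

n_frames, n_classes = 2, 80
labels = np.zeros((n_frames, 56, 56), dtype=np.int64)  # every pixel is class 0
one_hot = np.eye(n_classes)[labels]                    # (n_frames, 56, 56, 80)
logits = np.zeros((n_frames, 56, 56, n_classes))       # an uninformed prediction

bce = binary_cross_entropy(logits, one_hot)
cce = categorical_cross_entropy(logits, one_hot)
```

For this uninformed prediction the binary loss is log 2 regardless of the number of classes, while the categorical loss is log 80, reflecting that it penalizes failing to pick the single correct class per pixel.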