Actor-Object Relation in Videos
Volodymyr Bobyr and Aayushjungbahadur Rana
Task
Input: a video containing:
- Actors: adult, child, dog
- Objects: toys, furniture, etc.
- Actions: "holding", "in front", "talking to", etc.
Output: spatial & temporal pixel-perfect localization of actors, objects, and actions
Dataset: VidOR – 10,000 video clips
Approach
Convolutional encoder/decoder network (sketched below):
- Encoder backbone: I3D pretrained on Kinetics
- Decoder: feature pyramid network with dilated convolutions and side connections
4 stages:
- Actor & object spatial segmentation
- Centroid detection
- Action spatial segmentation
- Temporal connection (postprocessing)
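A minimal Keras sketch of the multi-head design. The small 3D-conv encoder and single dilated-conv decoder block are stand-ins for the real I3D backbone and FPN decoder, which the slides do not detail; the head activations (softmax vs. sigmoid) are also assumptions.

```python
import tensorflow as tf
from tensorflow.keras import layers

def build_model(n_frames=8, n_actor_object=80, n_action=52):
    clip = layers.Input((n_frames, 224, 224, 3))
    # Stand-in encoder: two strided 3D convs take 224x224 down to 56x56.
    x = layers.Conv3D(32, 3, strides=(1, 2, 2), padding="same", activation="relu")(clip)
    x = layers.Conv3D(64, 3, strides=(1, 2, 2), padding="same", activation="relu")(x)
    # Stand-in decoder block with dilated convolutions.
    x = layers.Conv3D(64, 3, padding="same", dilation_rate=2, activation="relu")(x)
    # One 1x1x1 head per stage; output shapes match the "Details" slide.
    actor_object = layers.Conv3D(n_actor_object, 1, activation="softmax",
                                 name="actor_object")(x)   # (n_frames, 56, 56, 80)
    centroid = layers.Conv3D(1, 1, activation="sigmoid",
                             name="centroid")(x)           # (n_frames, 56, 56, 1)
    action = layers.Conv3D(n_action, 1, activation="sigmoid",
                           name="action")(x)               # (n_frames, 56, 56, 52)
    return tf.keras.Model(clip, [actor_object, centroid, action])

model = build_model()
model.summary()
```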
Details
Input: (n_frames, 224, 224, 3)
Output:
- Actor/object segmentation: (n_frames, 56, 56, 80)
- Centroid detection: (n_frames, 56, 56, 1)
- Action segmentation: (n_frames, 56, 56, 52)
Class imbalance:
- People: 56% of all objects
- Background: present in every video clip
- Solution: class weights (see the sketch below)
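A minimal sketch of pixel-wise class weighting. The inverse-frequency weighting scheme is an assumption; the slides only say "class weights", not how they were computed.

```python
import numpy as np
import tensorflow as tf

def inverse_frequency_weights(masks, n_classes=80):
    """masks: integer label maps of shape (n_samples, n_frames, 56, 56)."""
    counts = np.bincount(masks.ravel(), minlength=n_classes).astype(np.float64)
    # Rare classes get large weights; frequent ones (background, people) small.
    weights = counts.sum() / (n_classes * np.maximum(counts, 1))
    return tf.constant(weights, dtype=tf.float32)

def weighted_categorical_crossentropy(weights):
    def loss(y_true, y_pred):  # y_true one-hot: (..., 56, 56, n_classes)
        ce = -y_true * tf.math.log(tf.clip_by_value(y_pred, 1e-7, 1.0))
        # Per-class weights broadcast over the last (class) axis.
        return tf.reduce_sum(ce * weights, axis=-1)
    return loss
```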
Metrics
IoU: mean Intersection over Union over the pixels in each frame (see the sketch below)
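A minimal sketch of per-frame mean IoU, assuming integer label maps and that classes absent from both prediction and ground truth are skipped; the slides do not specify the exact averaging rules.

```python
import numpy as np

def mean_iou(pred, target, n_classes=80):
    """pred, target: integer label maps of shape (H, W) for one frame."""
    ious = []
    for c in range(n_classes):
        inter = np.logical_and(pred == c, target == c).sum()
        union = np.logical_or(pred == c, target == c).sum()
        if union > 0:  # skip classes absent from both maps
            ious.append(inter / union)
    return float(np.mean(ious)) if ious else 0.0
```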
Data Preparation & Output Example
[Figure: original vs. augmented image, original vs. augmented centroids, and an experimental segmentation output]
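A minimal sketch of a paired augmentation, assuming a random horizontal flip; the actual augmentations used are not listed on the slide. The point the figure makes is that the centroid map must be transformed together with the image.

```python
import numpy as np

def flip_pair(image, centroids, rng=np.random.default_rng()):
    """image: (224, 224, 3); centroids: (56, 56, 1) heatmap for one frame."""
    if rng.random() < 0.5:  # apply the same geometric transform to both
        image = image[:, ::-1]
        centroids = centroids[:, ::-1]
    return image, centroids
```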
Experimental Results
Loss progression:
- Initially: binary cross-entropy
- Later: categorical cross-entropy
- Now: categorical cross-entropy + augmentation tweaks (see the sketch below)
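A minimal sketch of the loss switch using the Keras built-ins, on a single toy pixel. Binary cross-entropy scores each channel as an independent yes/no decision, while categorical cross-entropy scores the full distribution over mutually exclusive classes, which fits the actor/object segmentation labels.

```python
import tensorflow as tf

y_true = tf.one_hot([2], depth=4)             # one pixel, class 2 of 4
y_pred = tf.constant([[0.1, 0.1, 0.7, 0.1]])  # predicted class probabilities

# Per-channel independent binary decisions.
bce = tf.keras.losses.BinaryCrossentropy()(y_true, y_pred)

# One distribution over mutually exclusive classes.
cce = tf.keras.losses.CategoricalCrossentropy()(y_true, y_pred)

print(float(bce), float(cce))
```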