Actor-Object Relation in Videos

Volodymyr Bobyr and Aayushjungbahadur Rana

Task
Input: A video with:
Actors: Adult, Child, Dog
Objects: toys, furniture, etc.
Actions: “holding”, “in front”, “talking to”, etc.
Output: Spatial and temporal pixel-perfect localization of actors, objects, and actions
Dataset: VidOR – 10,000 video clips

Approach
Convolutional encoder/decoder network:
Encoder backbone: I3D pretrained on Kinetics
Decoder: Feature pyramid network with dilated convolutions and side-connections
4 Stages:
1. Actor & Object spatial segmentation
2. Centroid Detection
3. Action spatial segmentation
4. Temporal connection – postprocessing
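The decoder's dilated convolutions can be illustrated with a small one-dimensional sketch. This is not the network's code: the function name `dilated_conv1d`, the toy signal, and the kernel are all hypothetical, chosen only to show how spacing the kernel taps apart widens the receptive field without adding weights.

```python
def dilated_conv1d(signal, kernel, dilation=1):
    """Valid (unpadded) 1-D cross-correlation with `dilation` steps between kernel taps."""
    # Number of input samples one output value looks at.
    span = (len(kernel) - 1) * dilation + 1
    out = []
    for start in range(len(signal) - span + 1):
        out.append(sum(k * signal[start + i * dilation] for i, k in enumerate(kernel)))
    return out


# Hypothetical toy data: a ramp and a simple difference kernel.
signal = [1, 2, 3, 4, 5, 6, 7]
kernel = [1, 0, -1]

dense = dilated_conv1d(signal, kernel, dilation=1)    # each output sees 3 samples
dilated = dilated_conv1d(signal, kernel, dilation=2)  # each output sees 5 samples
print(dense, dilated)
```

With the same three weights, the dilated version compares samples four positions apart instead of two, which is the property that lets a decoder cover more spatial context per layer.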

Details
Input: (n_frames, 224, 224, 3)
Output:
Actor/Object Segmentation: (n_frames, 56, 56, 80)
Centroid Detection: (n_frames, 56, 56, 1)
Action Segmentation: (n_frames, 56, 56, 52)
Class Imbalance:
People: 56% of all objects
Background: in every video clip
Solution: class weights
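The slides name class weights as the remedy for imbalance but do not say how the weights are computed. A minimal sketch, assuming the common inverse-frequency scheme (an assumption, not necessarily the authors' formula), with a hypothetical helper `inverse_frequency_weights` and made-up pixel labels:

```python
from collections import Counter


def inverse_frequency_weights(labels, n_classes):
    """Weight each class by total / (n_classes * count); absent classes get 0."""
    counts = Counter(labels)
    total = len(labels)
    weights = []
    for c in range(n_classes):
        n = counts.get(c, 0)
        weights.append(total / (n_classes * n) if n else 0.0)
    return weights


# Hypothetical flattened pixel labels: class 0 dominates, class 2 is rare.
labels = [0] * 6 + [1] * 3 + [2] * 1
weights = inverse_frequency_weights(labels, n_classes=3)
print(weights)
```

Rare classes receive larger weights, so errors on them contribute more to the loss than errors on a dominant class such as background.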

IoU Metrics
Mean Intersection over Union among pixels in each frame
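As a rough illustration of the metric, the sketch below computes a per-frame IoU and then averages over frames. The averaging scheme is an assumption (the slides do not specify it): IoU is averaged over the classes present in either the prediction or the ground truth of a frame. The function names and the tiny 2x2 label maps are hypothetical.

```python
def frame_iou(pred, truth, n_classes):
    """Mean IoU over the classes that appear in either label map of one frame."""
    ious = []
    for c in range(n_classes):
        inter = union = 0
        for p_row, t_row in zip(pred, truth):
            for p, t in zip(p_row, t_row):
                inter += (p == c and t == c)
                union += (p == c or t == c)
        if union:  # skip classes absent from both maps
            ious.append(inter / union)
    return sum(ious) / len(ious)


def mean_iou(pred_frames, truth_frames, n_classes):
    """Average the per-frame IoU over all frames of a clip."""
    scores = [frame_iou(p, t, n_classes) for p, t in zip(pred_frames, truth_frames)]
    return sum(scores) / len(scores)


# Hypothetical two-frame clip with 2x2 label maps and two classes.
truth_frames = [[[0, 0], [1, 1]], [[0, 0], [1, 1]]]
pred_frames = [[[0, 1], [1, 1]], [[0, 0], [1, 1]]]
score = mean_iou(pred_frames, truth_frames, n_classes=2)
print(score)
```

In the first frame one pixel is mislabeled, which lowers the IoU of both classes; the second frame matches exactly and scores 1.0.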

Data Preparation & Output Example
[Figure panels: Original image, Augmented image, Original centroids, Augmented centroids, Experimental segmentation output]

Experimental Results
In the past: Loss: Binary Cross-Entropy
Before: Loss: Categorical Cross-Entropy
Now: Categorical Cross-Entropy + Augmentation Tweaks
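To make the difference between the two losses concrete, here is a sketch of their textbook definitions for a single pixel. It is not the authors' training code, and the probability vector is a made-up example. Binary cross-entropy scores every class channel as an independent yes/no decision, while categorical cross-entropy scores only the probability assigned to the true class.

```python
import math


def categorical_cross_entropy(probs, true_class):
    """Negative log-probability of the true class (probs should sum to 1)."""
    return -math.log(probs[true_class])


def binary_cross_entropy(probs, true_class):
    """Mean per-channel binary cross-entropy against a one-hot target."""
    loss = 0.0
    for c, p in enumerate(probs):
        y = 1.0 if c == true_class else 0.0
        loss -= y * math.log(p) + (1.0 - y) * math.log(1.0 - p)
    return loss / len(probs)


# Hypothetical per-pixel class probabilities; the true class is index 0.
probs = [0.7, 0.2, 0.1]
cce = categorical_cross_entropy(probs, true_class=0)
bce = binary_cross_entropy(probs, true_class=0)
print(cce, bce)
```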
