Volodymyr Bobyr Supervised by Aayushjungbahadur Rana

Week 4

Goals – Week 4
Goal: Completed (YES / PARTIAL / NO)
Read 3 papers related to new project: YES
Develop a data loader for the network
Develop a running mockup of the network: PARTIAL
Measure baselines on VidOR dataset: NO

Overview
My Project: Overview, Plan, Data Loader, Challenges, Next Steps
Papers: Detecting & Recognizing Human-Object Interactions; Feature Pyramids

My Project
Actor-Object Relation for Action Detection
Dataset: VidOR
  Videos: (82 hrs)
  Relations: 52
  Object Classes: 80
Chance to participate in ACM’s Grand Challenge

My Project -- Plan
Structure: Encoder/Decoder Model
Three-Step Process:
  Step 1: Object localization & classification
    Encoder/decoder system
    Pixel-perfect
  Step 2: Action detection & localization
  Step 3: Relation detection: subject & object localization

My Project – Data Loader
Data Augmentation: Resize, Random Crop, Horizontal Flip
Challenges:
  Handling bounding boxes during augmentation
  Handling overlapping bounding boxes without depth perception
  Large dataset, so frames are currently extracted live; optimization is key
  Frames are in the process of being extracted to Newton
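The bounding-box challenge above can be illustrated with a minimal sketch: every geometric transform applied to a frame has to be applied to its boxes as well. The function names, the `(x_min, y_min, x_max, y_max)` box layout, and the `min_visible` threshold are assumptions made for illustration; they are not taken from the project's actual data loader.

```python
import random
from typing import Optional, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max) in pixels.
Box = Tuple[float, float, float, float]


def resize_box(box: Box, sx: float, sy: float) -> Box:
    """Scale a box by the same factors used to resize the frame."""
    x0, y0, x1, y1 = box
    return (x0 * sx, y0 * sy, x1 * sx, y1 * sy)


def hflip_box(box: Box, frame_width: float) -> Box:
    """Mirror a box horizontally; the left and right edges swap roles."""
    x0, y0, x1, y1 = box
    return (frame_width - x1, y0, frame_width - x0, y1)


def random_crop_window(width: int, height: int, crop_w: int, crop_h: int,
                       rng: random.Random) -> Box:
    """Pick a crop window of size crop_w x crop_h inside the frame."""
    x0 = rng.randint(0, width - crop_w)
    y0 = rng.randint(0, height - crop_h)
    return (x0, y0, x0 + crop_w, y0 + crop_h)


def crop_box(box: Box, crop: Box, min_visible: float = 0.3) -> Optional[Box]:
    """Clip a box to the crop window and shift it into crop coordinates.

    Returns None when less than `min_visible` of the original box area
    remains inside the crop (an assumed, tunable threshold).
    """
    x0, y0, x1, y1 = box
    cx0, cy0, cx1, cy1 = crop
    nx0, ny0 = max(x0, cx0), max(y0, cy0)
    nx1, ny1 = min(x1, cx1), min(y1, cy1)
    if nx1 <= nx0 or ny1 <= ny0:
        return None  # box lies entirely outside the crop
    area = (x1 - x0) * (y1 - y0)
    visible = (nx1 - nx0) * (ny1 - ny0)
    if area <= 0 or visible / area < min_visible:
        return None
    return (nx0 - cx0, ny0 - cy0, nx1 - cx0, ny1 - cy0)
```

For video clips, one crop window and one flip decision would normally be sampled per clip and reused for every frame, so that boxes stay consistent over time.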

My Project – Next Steps
Get a baseline on VidOR using I3D network
Pixel-wise classification & center detection
Get a basic decoder network running
Implement more complex ideas:
  Feature pyramids
  Atrous convolutions
  Action target proposals

Paper: Detecting and Recognizing Human-Object Interactions
Task: Detect bounding boxes around the person performing an action and the target object of that action
Builds on top of Faster-RCNN
Uses 3 branches to compose the final output
Gkioxari, Georgia, et al. “Detecting and Recognizing Human-Object Interactions.” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2018.

Network Structure
a) Object detection branch: same as Faster-RCNN
  Object classification
  Bounding-box regression
  Non-max suppression
b) Human-centric branch
  Predicts actions for each human detected in a)
  Predicts the mean offset position μ of the target
  Calculates the density over possible target locations
  Compatibility g of object with action
  σ: hyperparameter | b: ground truth
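A minimal sketch of the compatibility term g, assuming it is a Gaussian over the object's location relative to the human box, centered on the predicted mean μ with width σ. The function names, the relative box encoding, and the default value of `sigma` are illustrative assumptions, not values read from the slides.

```python
import math
from typing import Sequence, Tuple

# Assumed box layout: (x_min, y_min, x_max, y_max).
Box = Tuple[float, float, float, float]


def relative_encoding(human: Box, obj: Box) -> Tuple[float, float, float, float]:
    """Encode the object box relative to the human box.

    Assumed encoding: center offsets normalized by the human box size,
    plus log ratios of widths and heights.
    """
    hx, hy = (human[0] + human[2]) / 2, (human[1] + human[3]) / 2
    hw, hh = human[2] - human[0], human[3] - human[1]
    ox, oy = (obj[0] + obj[2]) / 2, (obj[1] + obj[3]) / 2
    ow, oh = obj[2] - obj[0], obj[3] - obj[1]
    return ((ox - hx) / hw, (oy - hy) / hh,
            math.log(ow / hw), math.log(oh / hh))


def compatibility(human: Box, obj: Box, mu: Sequence[float],
                  sigma: float = 0.3) -> float:
    """Gaussian compatibility g between an object box and the predicted target.

    `mu` is the predicted mean target location for one (human, action) pair,
    in the same relative encoding. `sigma` is a hyperparameter; 0.3 is only
    a placeholder here.
    """
    b = relative_encoding(human, obj)
    sq_dist = sum((bi - mi) ** 2 for bi, mi in zip(b, mu))
    return math.exp(-sq_dist / (2 * sigma ** 2))
```

Under this form, g equals 1 when the object sits exactly at the predicted location and decays toward 0 as it moves away, so distant objects contribute little to the final triplet score.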

Network Structure
c) Interaction branch
  Scores action based on human & object features
Total Loss: summation of losses from all branches
Final Computation:
Output: Triplets of (human, action, target)
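A hedged sketch of how the branch outputs could be combined into scored triplets, assuming the final score is the product of the human detection score, the object detection score, the action score, and the compatibility g, and that the interaction branch sums human and object logits before a sigmoid. All names and the candidate tuple layout below are hypothetical.

```python
import math
from typing import Dict, List, Optional, Tuple


def sigmoid(x: float) -> float:
    return 1.0 / (1.0 + math.exp(-x))


def interaction_score(human_logit: float, object_logit: float) -> float:
    """Action score from the interaction branch (assumed: logits summed, then sigmoid)."""
    return sigmoid(human_logit + object_logit)


def triplet_score(s_h: float, s_o: float, s_action: float, g: float) -> float:
    """Assumed final score for one (human, action, target) triplet."""
    return s_h * s_o * s_action * g


def best_target(s_h: float,
                candidates: List[Tuple[str, float, float, float]]
                ) -> Optional[Tuple[str, float]]:
    """Pick the highest-scoring target for one human and one action.

    Each candidate is (object_id, s_o, s_action, g).
    Returns (object_id, score), or None if there are no candidates.
    """
    scored = [(obj_id, triplet_score(s_h, s_o, s_a, g))
              for obj_id, s_o, s_a, g in candidates]
    return max(scored, key=lambda item: item[1]) if scored else None


def total_loss(branch_losses: Dict[str, float]) -> float:
    """Total loss as the plain sum of the per-branch losses."""
    return sum(branch_losses.values())
```

A usage example: `best_target(0.9, [("cup", 0.8, 0.5, 1.0), ("chair", 0.9, 0.5, 0.1)])` selects `"cup"`, because the chair's low compatibility outweighs its higher detection score.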

Feature Pyramids -- Paper
Combines low- and high-level features to make predictions
Makes multiple predictions per traversal (one at each pyramid level)
State-of-the-art results with little overhead
Lin, Tsung-Yi, et al. “Feature Pyramid Networks for Object Detection.” 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2017.

Feature Pyramids – Details
Components:
  Forward channel (bottom-up pathway)
  Backward channel (top-down pathway)
    x2 nearest-neighbor upscaling
    Number of channels is kept constant
  Lateral connections
    1x1 conv to bring down number of channels
    Merged with the upscaled map by element-wise addition
After merging:
  3x3 conv to reduce the aliasing effect of upscaling
  No activation: non-linearities were found to have only minor impact
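One merge step of the backward channel can be sketched in plain Python, assuming feature maps are nested lists indexed as `[channel][row][column]`. The function names are illustrative, and the final 3x3 convolution is left out to keep the sketch short.

```python
from typing import List

# Assumed layout: feature_map[channel][row][column]
FeatureMap = List[List[List[float]]]


def conv1x1(x: FeatureMap, weights: List[List[float]]) -> FeatureMap:
    """1x1 convolution: each output channel is a weighted sum of input channels.

    `weights[out_channel][in_channel]`; used here to bring the bottom-up map
    down to the channel count of the top-down map.
    """
    h, w = len(x[0]), len(x[0][0])
    return [
        [[sum(wk * x[k][i][j] for k, wk in enumerate(row)) for j in range(w)]
         for i in range(h)]
        for row in weights
    ]


def upsample2x_nearest(x: FeatureMap) -> FeatureMap:
    """Nearest-neighbor upscaling by 2: every value fills a 2x2 block."""
    out = []
    for channel in x:
        rows = []
        for row in channel:
            wide = [v for v in row for _ in range(2)]
            rows.append(wide)
            rows.append(list(wide))
        out.append(rows)
    return out


def merge(top_down: FeatureMap, bottom_up: FeatureMap,
          lateral_weights: List[List[float]]) -> FeatureMap:
    """Upscale the coarser map and add the 1x1-projected lateral map element-wise."""
    lateral = conv1x1(bottom_up, lateral_weights)
    up = upsample2x_nearest(top_down)
    return [
        [[a + b for a, b in zip(row_up, row_lat)]
         for row_up, row_lat in zip(ch_up, ch_lat)]
        for ch_up, ch_lat in zip(up, lateral)
    ]
```

Because the merge is an addition rather than a concatenation, the channel count stays constant across pyramid levels, which is why the lateral 1x1 convolution has to match the top-down map's channel count.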
