1 Human Detection under Partial Occlusions using Markov Logic Networks. Raghuraman Gopalan and William Schwartz, Center for Automation Research, University of Maryland, College Park.

Presentation transcript:

1 Human Detection under Partial Occlusions using Markov Logic Networks Raghuraman Gopalan and William Schwartz Center for Automation Research University of Maryland, College Park

2 Human Detection

3 Holistic window-based: Dalal and Triggs, CVPR 2005; Tuzel et al., CVPR 2007
Part-based: Wu and Nevatia, ICCV 2005; Mikolajczyk et al., ECCV 2004
Scene-related cues: Torralba et al., IJCV 2006

4 The occlusion challenge: body parts occluded by objects; a person occluded by another person.
* Probability of the presence of a human obtained from Schwartz et al., ICCV 2009

5 Related work
Bilattice-based logical reasoning: Shet et al., CVPR 2007
Integrating probabilities of human parts using first-order logic (FOL): Schwartz et al., ICB 2009

6 Our approach: Motivation
A data-driven, part-based method:
1. Probabilistic logical inference using Markov logic networks (MLN) [Domingos et al., Machine Learning, 2006]
2. Representing 'semantic context' between the detection probabilities of parts: within-window and between-windows, with and without occlusions

7 Our approach: An overview
Pipeline: multiple detection windows → part detector outputs → face detector outputs → instantiation of the MLN → inference → final result
Learned contextual rules are used to instantiate the MLN.
Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

8 Main questions
How to integrate the detectors' outputs to detect people under occlusion?
- Enforce consistency according to the spatial locations of the detectors → removal of false alarms.
- Exploit relations between persons to resolve inconsistencies → explain occlusions.
- Both using an MLN, which combines FOL and graphical models in a single representation → avoids contradictions.

9 Our approach: An overview
Pipeline: multiple detection windows → part detector outputs → face detector outputs → instantiation of the MLN → inference → final result
Learned contextual rules are used to instantiate the MLN.
Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

10 Part-based detectors
To handle human detection under occlusion, our original detector is split into parts, and an MLN is then used to integrate their outputs.
Detectors: original (full body), top, torso, legs, top-torso, torso-legs, top-legs.

11 Detector – An overview
Exploit more representative features to provide a richer set of descriptors and improve detection results: edges, textures, and color.
Consequences of the feature augmentation:
- an extremely high dimensional feature space (>170,000 features)
- the number of samples in the training dataset is smaller than the dimensionality
These characteristics prevent the use of classical machine learning techniques such as SVM, but make an ideal setting for Partial Least Squares (PLS)*.
* H. Wold, Partial Least Squares, Encyclopedia of Statistical Sciences, vol. 6 (1985)

12 Detector – Partial Least Squares (PLS)
PLS is a wide class of methods for modeling relations between sets of observations by means of latent variables. Although originally proposed as a regression technique, PLS can also be used as a class-aware dimensionality reduction tool.
By setting the dependent variable to a set of discrete values (class ids), we use PLS for dimensionality reduction followed by classification: the extracted feature vector is projected onto a set of latent vectors (estimated using PLS), and a classifier is then applied in the resulting low-dimensional subspace.

13 Detection using PLS
PLS models relations between the predictor variables in matrix X (n × p) and the response variable in vector y (n × 1), where n denotes the number of samples and p the number of features:
X = T P^T + E
y = U q^T + f
T and U are (n × h) matrices of the h extracted latent vectors, P (p × h) and q (1 × h) are the loading matrices, and E (n × p) and f (n × 1) are the residuals of X and y, respectively.
The NIPALS (nonlinear iterative partial least squares) method finds the set of weight vectors W (p × h) = {w_1, w_2, ..., w_h} such that
[cov(t_i, u_i)]^2 = max_{|w_i| = 1} [cov(X w_i, y)]^2
i.e., each latent vector captures maximum covariance between the predictors and the response.
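A minimal NIPALS sketch for the single-response case (PLS1) in NumPy, one of several equivalent formulations:

```python
import numpy as np

def nipals_pls1(X, y, h):
    """NIPALS for PLS1 (single response vector y).

    Returns weights W, scores T, loadings P, and response loadings q,
    so that (after centering) X ~ T @ P.T + E, and each weight vector
    maximizes the covariance between its score t_i = X w_i and y.
    """
    X = X - X.mean(axis=0)              # center predictors (copy)
    y = y - y.mean()                    # center response (copy)
    n, p = X.shape
    W = np.zeros((p, h)); T = np.zeros((n, h))
    P = np.zeros((p, h)); q = np.zeros(h)
    for i in range(h):
        w = X.T @ y
        w /= np.linalg.norm(w)          # weight vector, |w_i| = 1
        t = X @ w                       # latent score t_i
        P[:, i] = X.T @ t / (t @ t)     # X loading
        q[i] = y @ t / (t @ t)          # y loading
        X = X - np.outer(t, P[:, i])    # deflate X
        y = y - q[i] * t                # deflate y
        W[:, i] = w; T[:, i] = t
    return W, T, P, q
```

A side effect of the deflation step is that the extracted score vectors t_i are mutually orthogonal, which keeps the subsequent low-dimensional classifier well conditioned.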

14 Our approach: An overview
Pipeline: multiple detection windows → part detector outputs → face detector outputs → instantiation of the MLN → inference → final result
Learned contextual rules are used to instantiate the MLN.
Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

15 Context: Consistency between the detector outputs
Each detector acts in a specific region of the body, so one can check the outputs of detectors acting in the same spatial location for consistency; similar responses are expected.
Example: given that the top-torso detector outputs a high probability, the top and torso detectors need to output high probabilities as well, since they intersect the region covered by top-torso.
First-order logic rules:
topTorso(d1) ^ top(d1) ^ torso(d1) → person(d1) (consistent)
topTorso(d1) ^ (¬top(d1) v ¬torso(d1)) → ¬person(d1) (false alarm)
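As a hard-logic illustration of the two consistency rules (ignoring MLN weights), the detector probabilities can be thresholded into booleans; the thresholding itself is an assumption made for this sketch:

```python
def consistent_person(top, torso, top_torso):
    """Hard-logic version of the two consistency rules (no MLN weights):

        topTorso ^ top ^ torso       -> person        (consistent)
        topTorso ^ (!top v !torso)   -> not person    (false alarm)

    Inputs are booleans obtained by thresholding detector probabilities.
    """
    if top_torso and top and torso:
        return True        # spatially consistent detection
    if top_torso and (not top or not torso):
        return False       # inconsistent responses: likely a false alarm
    return None            # neither rule fires; undecided

# The combined top-torso detector fires but the torso detector does not:
print(consistent_person(top=True, torso=False, top_torso=True))
```

In the actual system these rules are soft MLN formulas over continuous detection probabilities, so a single inconsistent response lowers the posterior rather than vetoing the detection outright.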

16 Context: Understanding relationships between different windows
A low response from a detector might be caused by a second detection window (a person may be occluding another person, causing low responses of the detectors).
First-order logic rule:
intersect(d1,d2) ^ person(d1) ^ matching(d1,d2) → person(d2) ^ occluded(d2) ^ occludedby(d2,d1)
matching(d1,d2) is true if:
- d1 and d2 are persons and d1 and d2 intersect
- detectors at the visible parts of d2 have high responses
- detectors at the occluded parts of d2 have low responses, while detectors located at the corresponding positions of d1 have high responses
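The matching(d1,d2) predicate can be sketched over per-part detector probabilities; the part names, the dictionary representation, and the 0.5 threshold are illustrative assumptions:

```python
def matching(responses_d1, responses_d2, occluded_parts, thr=0.5):
    """Sketch of the matching(d1, d2) predicate.

    `responses_*` map part names to detector probabilities, and
    `occluded_parts` lists the parts of d2 covered by d1. Threshold
    `thr` is an illustrative assumption.
    """
    visible = [p for p in responses_d2 if p not in occluded_parts]
    # Detectors at visible parts of d2 must have high responses...
    if any(responses_d2[p] < thr for p in visible):
        return False
    # ...while detectors at occluded parts of d2 have low responses and
    # the detectors at the corresponding positions of d1 have high ones.
    return all(responses_d2[p] < thr and responses_d1[p] >= thr
               for p in occluded_parts)

d1 = {"top": 0.9, "torso": 0.8, "legs": 0.9}
d2 = {"top": 0.85, "torso": 0.7, "legs": 0.1}   # legs hidden behind d1
print(matching(d1, d2, occluded_parts=["legs"]))
```

When matching holds, the rule above lets the MLN explain d2's weak leg response as occlusion by d1 instead of rejecting d2 as a false alarm.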

17 Our approach: An overview
Pipeline: multiple detection windows → part detector outputs → face detector outputs → instantiation of the MLN → inference → final result
Learned contextual rules are used to instantiate the MLN.
Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

18 3. Inference using MLN* – The basic idea
A logical knowledge base (KB) is a set of hard constraints (F_i) on the set of possible worlds.
Let's make them soft constraints: when a world violates a formula, it becomes less probable, not impossible. Give each formula a weight w_i (higher weight → stronger constraint).
* Contents of the next three slides are partially adapted from the Markov Logic Networks tutorial by Domingos et al., ICML (2007).
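The soft-constraint idea can be illustrated by scoring worlds with the exponential of the summed weights of satisfied formulas; the two formulas and their weights below are toy assumptions:

```python
import math

def world_score(world, weighted_formulas):
    """Unnormalized MLN-style score: exp of the summed weights of the
    formulas that hold in `world`. A violated formula lowers the score
    instead of ruling the world out."""
    return math.exp(sum(w for f, w in weighted_formulas if f(world)))

formulas = [
    # consistency: topTorso => (top ^ torso), illustrative weight 2.0
    (lambda s: not s["topTorso"] or (s["top"] and s["torso"]), 2.0),
    # occluded => person, illustrative weight 1.5
    (lambda s: not s["occluded"] or s["person"], 1.5),
]

consistent = {"topTorso": True, "top": True, "torso": True,
              "person": True, "occluded": False}
violating = dict(consistent, torso=False)   # breaks the consistency rule

print(world_score(consistent, formulas) > world_score(violating, formulas))
```

The violating world keeps a nonzero score, exp(1.5) instead of exp(3.5), which is exactly the "less probable, not impossible" behavior that distinguishes an MLN from a hard first-order KB.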

19 MLN – At a Glance Logical language: First-order logic Probabilistic language: Markov networks  Syntax: First-order formulas with weights  Semantics: Templates for Markov net features Learning:  Parameters: Generative or discriminative  Structure: ILP with arbitrary clauses and MAP score Inference:  MAP: Weighted satisfiability  Marginal: MCMC with moves proposed by SAT solver  Partial grounding + Lazy inference / Lifted inference

20 MLN – Definition
A Markov Logic Network (MLN) is a set of pairs (F_i, w_i), where
- F_i is a formula in first-order logic
- w_i is a real number

21-23 Example: Humans & Occlusions
(Weighted first-order formulas shown on the slides; the graphics are not captured in the transcript.)

24 Example: Humans & Occlusions
Two constants: detection window 1 (D1) and detection window 2 (D2).

25 Example: Humans & Occlusions
One node for each grounding of each predicate in the MLN: Parts(D1), Human(D1), Parts(D2), Human(D2).

26 Example: Humans & Occlusions
Adding the occlusion predicate yields the further ground atoms Occlusion(D1,D1), Occlusion(D1,D2), Occlusion(D2,D1), and Occlusion(D2,D2).

27 Example: Humans & Occlusions
One feature for each grounding of each formula F_i in the MLN, with the corresponding weight w_i.
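Grounding the predicates over the two constants can be enumerated mechanically; the predicate arities below follow the node names on these slides:

```python
from itertools import product

constants = ["D1", "D2"]
predicates = {"Parts": 1, "Human": 1, "Occlusion": 2}   # name -> arity

# One ground atom (one node of the ground Markov network) per grounding
# of each predicate over the constants.
ground_atoms = [
    f"{name}({','.join(args)})"
    for name, arity in predicates.items()
    for args in product(constants, repeat=arity)
]
print(ground_atoms)
```

With two constants this produces the eight nodes shown on the slides: two groundings each for the unary predicates and four for the binary Occlusion predicate.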


30 Instantiation
An MLN is a template for ground Markov networks. The probability of a world x is
P(x) = (1/Z) exp( Σ_i w_i n_i(x) )
where w_i is the weight of formula F_i, n_i(x) is the number of true groundings of F_i in x, and Z is the normalizing partition function.
Learning of the weights and inference are performed using the open-source Alchemy system [Domingos et al., 2006].
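For a tiny ground network, P(x) can be computed exactly by enumerating all worlds; the single formula Parts(D1) => Human(D1) and its weight 1.2 are toy assumptions (real MLNs rely on approximate inference, e.g. MCMC in Alchemy):

```python
import math
from itertools import product

atoms = ["Parts(D1)", "Human(D1)"]
w = 1.2                       # illustrative weight for: Parts(D1) => Human(D1)

def n_true(world):
    # Number of true groundings of the (single) formula in this world.
    return int((not world["Parts(D1)"]) or world["Human(D1)"])

worlds = [dict(zip(atoms, vals)) for vals in product([False, True], repeat=2)]
scores = {tuple(x.values()): math.exp(w * n_true(x)) for x in worlds}
Z = sum(scores.values())                        # partition function
P = {k: v / Z for k, v in scores.items()}       # P(x) = exp(w * n(x)) / Z

# The only world violating the implication, (Parts=True, Human=False),
# is the least probable one, but its probability is not zero.
```

Enumeration is exponential in the number of ground atoms, which is why Alchemy resorts to weighted satisfiability and MCMC for the queries person(d1), occluded(d1), and occludedby(d1,d2).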

31 Our approach: An overview
Pipeline: multiple detection windows → part detector outputs → face detector outputs → instantiation of the MLN → inference → final result
Learned contextual rules are used to instantiate the MLN.
Queries: person(d1)? occluded(d1)? occludedby(d1,d2)?

Results

35 Comparisons
Dataset details: 200 images; 5 to 15 humans per image; ~35% of the humans occluded.

36 Comparisons

37 Comparisons

38 Conclusions
- A data-driven approach to detect humans under occlusions
- Modeling the semantic context of detector probabilities across spatial locations
- Probabilistic contextual inference using Markov logic networks
- Question of interest: integrating analytical models for occlusions and context with this data-driven method

39 Questions?