Improving Human Action Recognition using Score Distribution and Ranking
Minh Hoai Nguyen
Joint work with Andrew Zisserman

Presentation transcript:


Inherent Ambiguity: When does an action begin and end?

Precise Starting Moment?
- Hands are being extended?
- Hands are in contact?

When Does the Action End?
- Action extends over multiple shots
- Camera shows a third person in the middle

Action Location as Latent Information
[Diagram: video clip with latent location of action; consider subsequences; HandShake classifier; HandShake scores; Max; recognition score (in testing); update the classifier (in training)]
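The Max scheme on this slide can be sketched as follows. This is only an illustration under stated assumptions: the fixed window length, the stride, and `toy_classifier` (a stand-in for the HandShake classifier) are hypothetical and not taken from the slides.

```python
from typing import Callable, List, Sequence, Tuple


def sliding_windows(n_frames: int, length: int, stride: int = 1) -> List[Tuple[int, int]]:
    """Return (start, end) pairs, end exclusive, for fixed-length subsequences."""
    return [(s, s + length) for s in range(0, n_frames - length + 1, stride)]


def max_score(
    frames: Sequence[float],
    classifier: Callable[[Sequence[float]], float],
    length: int,
    stride: int = 1,
) -> Tuple[float, Tuple[int, int]]:
    """Score every subsequence and keep the best one.

    The arg-max window plays the role of the latent action location;
    the max score is the recognition score for the whole clip.
    """
    windows = sliding_windows(len(frames), length, stride)
    if not windows:
        raise ValueError("clip is shorter than the subsequence length")
    scored = [(classifier(frames[s:e]), (s, e)) for s, e in windows]
    return max(scored)


def toy_classifier(clip: Sequence[float]) -> float:
    """Hypothetical stand-in classifier: the mean of per-frame evidence."""
    return sum(clip) / len(clip)


# Hypothetical per-frame evidence; the action occupies frames 3 to 5.
frames = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0]
score, window = max_score(frames, toy_classifier, length=3)
```

In training, the selected window would be fed back to update the classifier, which is the usual latent-variable loop; that step is omitted here.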

Poor Performance of Max
[Table: Mean Average Precision (higher is better) for Whole and Max on Hollywood2 and TVHID; values not preserved in the transcript]
Possible reasons:
- The learned action classifier is far from perfect
- The output scores are noisy
- The maximum score is not robust
Action recognition is … a hard problem
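For readers unfamiliar with the metric, here is a minimal sketch of Mean Average Precision. It assumes the non-interpolated definition of average precision; benchmark toolkits sometimes use interpolated variants, and the slides do not say which one was used.

```python
from typing import List, Sequence, Tuple


def average_precision(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Non-interpolated AP: mean of precision@k over the ranks k of the positives."""
    order = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    hits = 0
    precisions: List[float] = []
    for rank, i in enumerate(order, start=1):
        if labels[i] == 1:
            hits += 1
            precisions.append(hits / rank)
    if not precisions:
        raise ValueError("at least one positive example is required")
    return sum(precisions) / len(precisions)


def mean_average_precision(per_class: Sequence[Tuple[Sequence[float], Sequence[int]]]) -> float:
    """Average the per-class AP values over all action classes."""
    return sum(average_precision(s, l) for s, l in per_class) / len(per_class)


# Hypothetical scores for four clips; clips 0 and 2 are positives.
ap = average_precision([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])
```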

Can We Use Mean Instead?
[Diagram: video clip with latent location of action; considered subsequences; HandShake classifier; HandShake scores; Mean]
On Hollywood2, Mean is generally better than Max
[Table: Whole, Max, and Mean on Hollywood2-Handshake; values not preserved in the transcript]
But not always

Another HandShake Example
- The proportion of HandShake is small
- For Whole and Mean, the Signal-to-Noise ratio is small
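A small worked example may make the dilution effect concrete. All numbers are hypothetical: a positive clip has 100 subsequences of which only 5 contain the handshake, and the scores are noise-free for simplicity.

```python
from typing import Sequence


def mean(xs: Sequence[float]) -> float:
    return sum(xs) / len(xs)


# Hypothetical subsequence scores: +2.0 where the action occurs, -1.0 elsewhere.
positive_clip = [2.0] * 5 + [-1.0] * 95   # action covers 5% of the clip
negative_clip = [-1.0] * 100              # no action at all

# Margin between the positive and the negative clip under each pooling rule.
mean_gap = mean(positive_clip) - mean(negative_clip)   # diluted by the 95 background scores
max_gap = max(positive_clip) - max(negative_clip)      # depends on the best score only
```

With these numbers the Mean margin is about 0.15 while the Max margin is 3.0. The example leaves out score noise, which is exactly what makes Max unreliable in practice, as noted on the earlier slide.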

Proposed Method: Use the Distribution
[Diagram: video clip with latent location of action; sampled subsequences; base HandShake classifier; HandShake scores; Sort; distribution-based classification; improved HandShake score]
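The pipeline in this diagram (sample, score, sort, combine) might look roughly like the sketch below. The uniform fixed-length sampling, the `toy_classifier` stand-in for the base classifier, and the example weights are all assumptions for illustration; in the method itself the weights are learned.

```python
import random
from typing import Callable, List, Sequence, Tuple


def sample_subsequences(
    n_frames: int, n_samples: int, length: int, rng: random.Random
) -> List[Tuple[int, int]]:
    """Sample fixed-length windows uniformly at random (assumed sampling scheme)."""
    starts = [rng.randint(0, n_frames - length) for _ in range(n_samples)]
    return [(s, s + length) for s in starts]


def score_distribution(
    frames: Sequence[float],
    classifier: Callable[[Sequence[float]], float],
    windows: Sequence[Tuple[int, int]],
) -> List[float]:
    """Score each sampled window with the base classifier, then sort descending."""
    return sorted((classifier(frames[s:e]) for s, e in windows), reverse=True)


def distribution_score(
    sorted_scores: Sequence[float], weights: Sequence[float], bias: float = 0.0
) -> float:
    """Weighted combination of the sorted scores: the improved clip-level score."""
    if len(sorted_scores) != len(weights):
        raise ValueError("one weight per sorted score is required")
    return sum(w * s for w, s in zip(weights, sorted_scores)) + bias


def toy_classifier(clip: Sequence[float]) -> float:
    """Hypothetical base classifier: the mean of per-frame evidence."""
    return sum(clip) / len(clip)


frames = [0.1, 0.0, 0.2, 0.9, 1.0, 0.8, 0.1, 0.0]
windows = sample_subsequences(len(frames), n_samples=4, length=3, rng=random.Random(0))
dist = score_distribution(frames, toy_classifier, windows)

# Hypothetical weights that emphasise the top-ranked scores.
improved = distribution_score(dist, [0.5, 0.3, 0.15, 0.05])
```

Sorting is what makes the weights meaningful: weight k always multiplies the k-th largest score, regardless of where in the clip that subsequence came from.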

Learning Formulation
[Equation labels: subsequence-score distribution; video label; weights; bias; hinge loss]
Weights for Distribution
- Emphasize the relative importance of classifier scores
Special cases:
- Case 1: equivalent to using Mean
- Case 2: equivalent to using Max
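The equation itself did not survive in the transcript. One plausible form, assembled from the labels on the slide (subsequence-score distribution, video label, weights, bias, hinge loss), is sketched below. The symbol names, the regularisation term with constant $\lambda$, and the number of sampled subsequences $m$ are assumptions, not taken from the slide.

```latex
% d_i in R^m : sorted (descending) subsequence scores of video i
% y_i in {-1, +1} : video label;  w in R^m : weights;  b : bias
\min_{\mathbf{w},\, b} \;\; \lambda \,\lVert \mathbf{w} \rVert^2
  \;+\; \sum_{i=1}^{n} \max\!\bigl(0,\; 1 - y_i\,(\mathbf{w}^\top \mathbf{d}_i + b)\bigr)

% Case 1 (Mean): w = (1/m, 1/m, ..., 1/m)  gives  w^T d_i = (1/m) \sum_k d_{ik}
% Case 2 (Max):  w = (1, 0, ..., 0)        gives  w^T d_i = d_{i1} = \max_k d_{ik}
```

Because the scores are sorted, uniform weights reproduce Mean and a single weight on the first entry reproduces Max; learned weights can fall anywhere in between.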

Controlled Experiments
- Synthetic video
- Random action location
Two controlled parameters:
- The action percentage
- The separation between non-action and action features
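A synthetic clip of this kind could be generated as in the sketch below. The slide does not describe the generator, so the one-dimensional Gaussian frame features, the unit variance, and the function name are assumptions made purely for illustration.

```python
import random
from typing import List, Tuple


def synthetic_video(
    n_frames: int, action_fraction: float, separation: float, rng: random.Random
) -> Tuple[List[float], Tuple[int, int]]:
    """Generate per-frame features with an action segment at a random location.

    action_fraction: share of the clip occupied by the action (0 < fraction <= 1).
    separation: shift of the action features relative to non-action features.
    """
    if not 0.0 < action_fraction <= 1.0:
        raise ValueError("action_fraction must be in (0, 1]")
    action_len = max(1, round(action_fraction * n_frames))
    start = rng.randint(0, n_frames - action_len)      # random action location
    frames = [rng.gauss(0.0, 1.0) for _ in range(n_frames)]  # non-action background
    for t in range(start, start + action_len):
        frames[t] += separation                         # action frames are shifted
    return frames, (start, start + action_len)


video, (begin, end) = synthetic_video(100, action_fraction=0.1, separation=2.0,
                                      rng=random.Random(0))
```

Sweeping `action_fraction` and `separation` over a grid gives the two controlled axes named on the slide.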


Hollywood2 – Progress over Time
[Chart: best published results; Mean Average Precision (higher is better); surviving label: 9.3%]

Hollywood2 – State-of-the-art Methods
[Chart: Mean Average Precision (higher is better); annotation: "same"]
- Dataset Introduction (STIP + scene context)
- Deep Learning features
- Mined compound features
- Dense Trajectory Descriptor (DTD)
- Improved DTD (better motion est.)
- DTD + saliency

Results on TVHI Dataset
[Chart: Mean Average Precision (higher is better)]

Weights for SSD classifiers

AnswerPhone Example 1

AnswerPhone Example 2

The End