Comparison of EET and Rank Pooling on UCF101 (split 1)

Presentation transcript:

Eigen-Evolution Dense Trajectory Descriptors
Yang Wang, Vinh Tran, Minh Hoai
Stony Brook University

Introduction

Question: How to encode a sequence of feature vectors?
Naïve approach: averaging. This ignores the temporal information of the sequence.
This paper: We propose a new method for pooling feature sequences. It encodes the temporal evolution of feature sequences in principal speed/directions.

Eigen-Evolution Pooling

View a sequence of feature vectors as an ordered set of 1D functions.
Decompose each function as a linear combination of basis functions.

The basis functions $\mathbf{G}^*$ can be found by optimizing the reconstruction error, where each $\mathbf{a}_i$ is one of the 1D functions of a sequence $\mathbf{F}$:

$$\mathbf{G}^* = \operatorname*{argmin}_{\mathbf{G}^T\mathbf{G}=\mathbf{I}} \sum_{\mathbf{F}} \sum_{i} \left\| \mathbf{G}\mathbf{G}^T\mathbf{a}_i - \mathbf{a}_i \right\|^2$$

$\mathbf{G}^*$ can be found using eigen decomposition of $\mathbf{B}$, the covariance matrix between time steps:

$$\mathbf{B} = \sum_{\mathbf{F}} \mathbf{F}^T\mathbf{F} = \sum_{i=1}^{L} \lambda_i \mathbf{e}_i \mathbf{e}_i^T, \qquad \lambda_1 \ge \cdots \ge \lambda_L$$

Eigen-Evolution Functions

[Figure: visualization of learned basis functions, for original feature sequences and for accumulated feature sequences.]

Eigen-Evolution Trajectory Descriptors

Deep-learning descriptors for trajectories (at multi-layers and multi-scales, video pooling).

[Figure: input video, feature maps, feature sequence, and trajectory descriptors, with an example trajectory spanning L frames. Average pooling of the feature sequence yields TDD; eigen-evolution pooling yields EET.]

Experiments

Datasets
Hollywood2: 12 actions, 1707 video clips
UCF101: 101 actions, 13320 video clips

Comparison of EET and Rank Pooling on UCF101 (split 1)

Rank    EET1    EET2    EET3    EET1+2    EET2+3    EET1+2+3
82.4    78.0    82.3    81.7    82.8      83.4      83.8

(EET1 through EET1+2+3 are the proposed descriptors.)

Comparison of EET and TDD on Hollywood2 and UCF101 (EET significantly outperforms TDD on both datasets)

Dataset             Feature Maps    TDD     EET     Improve
Hollywood2          Spatial         43.5    54.4    10.9
Hollywood2          Temporal        63.1    66.0    2.9
Hollywood2          2-Stream        64.7    68.7    4.0
UCF101 (split 1)    Spatial         77.5    84.4    6.9
UCF101 (split 1)    Temporal        77.9    81.0    3.1
UCF101 (split 1)    2-Stream        86.1    88.8    2.7

Comparison of EET and state-of-the-art action recognition methods

Hollywood2
Method          Mean AP (%)
2-stream TSN    *62.6
iDT             64.7
Non-Action      71.0
SSD + RCS       73.6
VideoDarwin     73.7
HRP + iDT       76.7
TDD             *68.4
TDD + iDT       *76.7
EET             74.5
EET + iDT       78.7

UCF101
Method              Accuracy (%)
iDT                 85.9
C3D + iDT           90.4
HRP + iDT           91.4
TSN                 94.2
I3D                 98.0
TDD                 90.3
TDD + iDT           91.5
EET                 91.8
EET + iDT           92.2
EET + iDT + TSN     94.5

New state-of-the-art on Hollywood2.

Acknowledgement: This project is partially supported by the National Science Foundation Award IIS-1566248 and Samsung Global Research Outreach.
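The transcript says the basis functions G* are obtained from an eigendecomposition of B, the covariance matrix between time steps. The NumPy sketch below illustrates that step; it is not the authors' implementation. It assumes each feature sequence F is a d x L array whose rows are the 1D functions over L time steps, and that the pooled descriptor is the matrix of projection coefficients F G. The function names and the toy data are hypothetical.

```python
import numpy as np


def learn_eigen_evolution_basis(sequences, k):
    """Return the top-k eigenvectors of B = sum_F F^T F.

    sequences: list of arrays of shape (d, L) (assumed layout: rows are
               1D functions of time, columns are time steps).
    Returns (G, eigvals) with G of shape (L, k), columns orthonormal,
    ordered by decreasing eigenvalue.
    """
    L = sequences[0].shape[1]
    B = np.zeros((L, L))
    for F in sequences:
        B += F.T @ F  # L x L covariance between time steps
    eigvals, eigvecs = np.linalg.eigh(B)  # eigh returns ascending order
    order = np.argsort(eigvals)[::-1][:k]  # keep the k largest
    return eigvecs[:, order], eigvals[order]


def eigen_evolution_pool(F, G):
    """Project every row of F onto the basis G (assumed pooling rule).

    Returns an array of shape (d, k); column j holds the coefficients
    for the j-th basis function.
    """
    return F @ G


def reconstruction_error(sequences, G):
    """Sum over sequences and rows of ||G G^T a_i - a_i||^2."""
    return sum(np.sum((F @ G @ G.T - F) ** 2) for F in sequences)


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Toy data: 20 sequences, d = 6 feature dimensions, L = 5 time steps.
    toy = [rng.standard_normal((6, 5)) for _ in range(20)]
    G, lam = learn_eigen_evolution_basis(toy, k=2)
    pooled = eigen_evolution_pool(toy[0], G)
    # Concatenate the per-basis coefficient vectors into one descriptor.
    descriptor = pooled.ravel(order="F")
    print(descriptor.shape, reconstruction_error(toy, G))
```

Under these assumptions, plain average pooling would correspond to `F.mean(axis=1)`, which discards the ordering of the time steps, whereas each column of `F @ G` weights the time steps by one learned basis function.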