Comparison of EET and Rank Pooling on UCF101 (split 1)


Eigen-Evolution Dense Trajectory Descriptors
Yang Wang, Vinh Tran, Minh Hoai
Stony Brook University

Introduction

Question: How to encode a sequence of feature vectors?
Naïve approach: averaging. This ignores the temporal information of the sequence.
This paper: We propose a new method for pooling feature sequences. It encodes the temporal evolution of feature sequences in principal speed/directions.

Eigen-Evolution Trajectory Descriptors

Deep-Learning Descriptors for Trajectories (at multi-layers and multi-scales, video pooling)

Experiments

Datasets
Hollywood2: 12 actions, 1707 video clips
UCF101: 101 actions, 13320 video clips

Comparison of EET and Rank Pooling on UCF101 (split 1)
Methods: $\mathbf{F}$, Rank, and the proposed descriptors EET1, EET2, EET3, EET1+2, EET2+3, EET1+2+3
Values: 82.4, 78.0, 82.3, 81.7, 82.8, 83.4, 83.8
(The transcript preserves eight column labels but only seven values, so the values are listed in their original order without column alignment.)

Comparison of EET and TDD on Hollywood2 and UCF101 (EET significantly outperforms TDD on both datasets)

Dataset            Feature Maps   TDD    EET    Improve
Hollywood2         Spatial        43.5   54.4   10.9
Hollywood2         Temporal       63.1   66.0    2.9
Hollywood2         2-Stream       64.7   68.7    4.0
UCF101 (split 1)   Spatial        77.5   84.4    6.9
UCF101 (split 1)   Temporal       77.9   81.0    3.1
UCF101 (split 1)   2-Stream       86.1   88.8    2.7

Comparison of EET and state-of-the-art action recognition methods

Hollywood2
Method          Mean AP (%)
2-stream TSN    *62.6
iDT             64.7
Non-Action      71.0
SSD + RCS       73.6
VideoDarwin     73.7
HRP + iDT       76.7
TDD             *68.4
TDD + iDT       *76.7
EET             74.5
EET + iDT       78.7

UCF101
Method             Accuracy (%)
iDT                85.9
C3D + iDT          90.4
HRP + iDT          91.4
TSN                94.2
I3D                98.0
TDD                90.3
TDD + iDT          91.5
EET                91.8
EET + iDT          92.2
EET + iDT + TSN    94.5

Eigen-Evolution Pooling

View a sequence of feature vectors as an ordered set of 1D functions.
Decompose each function as a linear combination of basis functions.

The basis functions $\mathbf{G}^*$ can be found by minimizing the reconstruction error:

$$\mathbf{G}^* = \operatorname*{argmin}_{\mathbf{G}^T\mathbf{G}=\mathbf{I}} \sum_{i} \left\| \mathbf{G}\mathbf{G}^T\mathbf{a}_i - \mathbf{a}_i \right\|^2$$

where the sum runs over the 1D functions $\mathbf{a}_i$ of the feature sequence $\mathbf{F}$.

$\mathbf{G}^*$ can be found using eigen decomposition of $\mathbf{B}$, the covariance matrix between time steps:

$$\mathbf{B} = \mathbf{F}^T\mathbf{F} = \sum_{i=1}^{L} \lambda_i \mathbf{e}_i \mathbf{e}_i^T, \quad \lambda_1 \ge \cdots \ge \lambda_L$$
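The eigen decomposition step of Eigen-Evolution Pooling can be sketched in a few lines of NumPy. This is a minimal illustration, not the authors' implementation. It assumes that $\mathbf{F}$ is stored as a d × L array whose i-th row is the 1D function $\mathbf{a}_i$ sampled at L time steps, and that the basis is computed from a single sequence for brevity; the poster does not say which data the basis is estimated from. The function names `eigen_evolution_basis` and `eigen_evolution_pool` are hypothetical.

```python
import numpy as np


def eigen_evolution_basis(F, k):
    """Return the top-k eigenvectors of B = F^T F and all eigenvalues.

    F is assumed to have shape (d, L): row i is the 1D function a_i
    sampled at L time steps (an assumed layout, not stated in the poster).
    """
    B = F.T @ F                                # (L, L) matrix between time steps
    eigvals, eigvecs = np.linalg.eigh(B)       # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]          # reorder so lambda_1 >= ... >= lambda_L
    eigvals = eigvals[order]
    G = eigvecs[:, order[:k]]                  # (L, k), columns are orthonormal
    return G, eigvals


def eigen_evolution_pool(F, G):
    """Project every 1D function onto the basis: one coefficient per basis function."""
    return F @ G                               # (d, k)


# Toy data: d = 6 feature dimensions observed over L = 15 time steps.
rng = np.random.default_rng(0)
d, L, k = 6, 15, 3
F = rng.normal(size=(d, L))

G, eigvals = eigen_evolution_basis(F, k)
coeffs = eigen_evolution_pool(F, G)

# Reconstruction error sum_i ||G G^T a_i - a_i||^2 for this basis.
reconstruction = coeffs @ G.T
error = np.sum((reconstruction - F) ** 2)
```

Under these assumptions, the reconstruction error of the top-k basis equals the sum of the discarded eigenvalues of $\mathbf{B}$, which is why keeping the eigenvectors with the largest eigenvalues minimizes the objective. Each column of `coeffs` is a d-dimensional vector associated with one basis function.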
[Figure: pipeline from Input Video to Feature Maps to Feature Sequence to Trajectory Descriptors, shown with an example trajectory spanning L frames. Average pooling of the feature sequence gives TDD; Eigen-Evolution Pooling with the Eigen-Evolution Functions gives EET. Dimension labels: H, W, T; h, w, T; d, L.]

[Figure: visualization of learned basis functions, for original feature sequences and for accumulated feature sequences.]

New state-of-the-art on Hollywood2.

Acknowledgement: This project is partially supported by the National Science Foundation Award IIS-1566248 and Samsung Global Research Outreach.