Eigen-Evolution Dense Trajectory Descriptors
Yang Wang, Vinh Tran, Minh Hoai
Stony Brook University

Introduction
Question: How to encode a sequence of feature vectors?
Naïve approach: averaging. This ignores the temporal information of the sequence.
This paper: We propose a new method for pooling feature sequences. It encodes the temporal evolution of feature sequences in principal speed/directions.

Experiments
Datasets
Hollywood2: 12 actions, 1707 video clips
UCF101: 101 actions, 13320 video clips

Comparison of EET and Rank Pooling on UCF101 (split 1). The EET columns are the proposed descriptors.
Rank    EET1    EET2    EET3    EET1+2    EET2+3    EET1+2+3
82.4    78.0    82.3    81.7    82.8      83.4      83.8

Comparison of EET and TDD on Hollywood2 and UCF101. EET significantly outperforms TDD on both datasets.
Dataset             Feature Maps    TDD     EET     Improvement
Hollywood2          Spatial         43.5    54.4    10.9
Hollywood2          Temporal        63.1    66.0    2.9
Hollywood2          2-Stream        64.7    68.7    4.0
UCF101 (split 1)    Spatial         77.5    84.4    6.9
UCF101 (split 1)    Temporal        77.9    81.0    3.1
UCF101 (split 1)    2-Stream        86.1    88.8    2.7

Comparison of EET and state-of-the-art action recognition methods (at multi-layers and multi-scales, video pooling)
Hollywood2
Method          Mean AP (%)
2-stream TSN    *62.6
iDT             64.7
Non-Action      71.0
SSD + RCS       73.6
VideoDarwin     73.7
HRP + iDT       76.7
TDD             *68.4
TDD + iDT       *76.7
EET             74.5
EET + iDT       78.7

UCF101
Method             Accuracy (%)
iDT                85.9
C3D + iDT          90.4
HRP + iDT          91.4
TSN                94.2
I3D                98.0
TDD                90.3
TDD + iDT          91.5
EET                91.8
EET + iDT          92.2
EET + iDT + TSN    94.5

Eigen-Evolution Trajectory Descriptors
Deep-learning descriptors for trajectories.

Eigen-Evolution Pooling
View a sequence of feature vectors as an ordered set of 1D functions (feature vectors to ordered set of 1D functions).
Decompose each function as a linear combination of basis functions.
The basis functions $\mathbf{G}^*$ can be found by minimizing the reconstruction error:
$$\mathbf{G}^* = \operatorname*{argmin}_{\mathbf{G}^T\mathbf{G} = \mathbf{I}} \sum_i \left\| \mathbf{G}\mathbf{G}^T\mathbf{a}_i - \mathbf{a}_i \right\|^2$$
where $\mathbf{a}_i$ is the $i$-th 1D function of the feature sequence $\mathbf{F}$.
$\mathbf{G}^*$ can be found using the eigendecomposition of $\mathbf{B}$, the covariance matrix between time steps:
$$\mathbf{B} = \mathbf{F}^T\mathbf{F} = \sum_{i=1}^{L} \lambda_i \mathbf{e}_i \mathbf{e}_i^T, \qquad \lambda_1 \geq \cdots \geq \lambda_L$$
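The eigendecomposition step of eigen-evolution pooling can be sketched in a few lines of NumPy. This is a minimal illustration under stated assumptions, not the authors' implementation: each feature sequence is assumed to be a d x L array F whose columns are the feature vectors at the L time steps (so each row is one 1D function a_i), the matrix B is assumed to be accumulated over a list of sequences, and the pooled descriptor is assumed to be the projection F G onto the top-k basis functions. The function names `learn_basis`, `eigen_evolution_pool` and `reconstruction_error` are hypothetical.

```python
import numpy as np


def learn_basis(sequences, k):
    """Top-k eigenvectors of B = sum over sequences of F^T F.

    Assumption: every F in `sequences` has shape (d, L), one column per
    time step. Returns G with shape (L, k) and all eigenvalues of B
    sorted in descending order.
    """
    L = sequences[0].shape[1]
    B = np.zeros((L, L))
    for F in sequences:
        B += F.T @ F  # L x L matrix relating time steps
    # eigh handles symmetric matrices and returns eigenvalues in ascending order
    eigvals, eigvecs = np.linalg.eigh(B)
    order = np.argsort(eigvals)[::-1]  # re-sort so that lambda_1 >= ... >= lambda_L
    return eigvecs[:, order[:k]], eigvals[order]


def eigen_evolution_pool(F, G):
    """Project each 1D function (row of F) onto the basis functions in G.

    Returns a (d, k) array; column j is the descriptor for the j-th basis
    function. The sign of each column is arbitrary because eigenvectors
    are only defined up to sign.
    """
    return F @ G


def reconstruction_error(F, G):
    """Sum over rows a_i of ||G G^T a_i - a_i||^2."""
    return float(np.sum((F @ G @ G.T - F) ** 2))


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    # Hypothetical data: 5 sequences, 16-dimensional features, 8 time steps
    sequences = [rng.standard_normal((16, 8)) for _ in range(5)]
    G, lam = learn_basis(sequences, k=3)
    descriptor = eigen_evolution_pool(sequences[0], G)
    print(descriptor.shape)
```

Because the columns of G are orthonormal eigenvectors of B, keeping the k eigenvectors with the largest eigenvalues minimizes the summed reconstruction error among all orthonormal bases of size k, which is the standard PCA argument applied along the time axis.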
[Figure: Pipeline from input video to feature maps, feature sequence, and trajectory descriptors, with an example trajectory spanning L frames. Average pooling of the feature sequence gives TDD; eigen-evolution pooling gives EET. Dimension labels in the figure: H, W, T; h, w, T; d, L.]

[Figure: Visualization of learned basis functions (eigen-evolution functions), for original feature sequences and for accumulated feature sequences.]

New state-of-the-art on Hollywood2.

Acknowledgement: This project is partially supported by the National Science Foundation Award IIS-1566248 and Samsung Global Research Outreach.