Part-based Visual Tracking with Online Latent Structural Learning
Rui Yao et al., CVPR 2013
CVLab, Jung Ilchae
Abstract
- Part-based tracking
- Online structural SVM training
- Two-stage training
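2.1 Representation
- $B_t$ = bounding box
- $b_t^i$ = $i$-th part box $\Rightarrow (c, r, h, w)$
- $y_t$ = bounding box offset
- $z_t^i$ = part box offset $\Rightarrow (\Delta c, \Delta r, \Delta w, \Delta h)$

$$\Phi(x_t, y, z) = \left[\phi_1(x_t, z^1),\ \phi_1(x_t, z^2),\ \dots,\ \phi_1(x_t, z^M),\ \phi_2(x_t, y),\ \phi_3(y, z^1),\ \phi_3(y, z^2),\ \dots,\ \phi_3(y, z^M)\right]$$

[Figure: bounding box $B_t$ containing part boxes $b_t^1$, $b_t^2$, $b_t^3$, $b_t^4$]

- $\phi_1(\cdot)$ = appearance model for a part box
- $\phi_2(\cdot)$ = appearance model for the bounding box
- $\phi_3(\cdot)$ = relation between the bounding box and a part box
- $\phi_1(\cdot)$, $\phi_2(\cdot)$ are computed from Haar-like masks of 6 types at 2 scales
- $\phi_3(\cdot) = (a, a^2)$, where $a$ = distance between $B_t$ and $b_t^i$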
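The concatenation above can be sketched in a few lines. This is only an illustration under assumptions not stated on the slide: boxes are tuples whose first two entries are the position $(c, r)$, the distance $a$ is taken as the Euclidean distance between those positions, and the appearance vectors $\phi_1$, $\phi_2$ are passed in precomputed (the Haar-like feature extraction is not shown). The names `phi3` and `joint_feature` are hypothetical.

```python
import math

def phi3(bounding_box, part_box):
    """Relation feature (a, a^2); a is the distance between the (c, r)
    positions of the two boxes (Euclidean distance is an assumption)."""
    a = math.hypot(bounding_box[0] - part_box[0], bounding_box[1] - part_box[1])
    return [a, a * a]

def joint_feature(part_appearance, box_appearance, bounding_box, part_boxes):
    """Concatenate [phi1 per part, phi2, phi3 per part] into one flat list."""
    features = []
    for phi1 in part_appearance:      # phi1(x_t, z^i), one vector per part
        features.extend(phi1)
    features.extend(box_appearance)   # phi2(x_t, y)
    for part_box in part_boxes:       # phi3(y, z^i), one pair per part
        features.extend(phi3(bounding_box, part_box))
    return features
```

With $M$ parts, the resulting vector holds $M$ part appearance blocks, one bounding box appearance block, and $2M$ relation entries.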
Framework
[Diagram: tracking loop alternating between finding the target with $w_t$, sampling, and training the S-SVM]
- Find the target by sampling near $B_{t-1}$ and $\{b_{t-1}^i\}_{i=1,\dots,M}$:
$$(y_t^*, z_t^*) = \arg\max_{y,z} \langle w_t, \Phi(x_t, y, z) \rangle$$
- Sample training data near the new target
- Train the structured SVM to maximize the target's score
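The target search step can be sketched as an exhaustive argmax over sampled candidates. This is a minimal illustration, not the authors' implementation: `feature_fn` is a hypothetical callable standing in for $\Phi(x_t, y, z)$, and the candidate list is assumed to have been sampled near the previous boxes already.

```python
def score(w, feature):
    """Linear score <w, Phi>."""
    return sum(wi * fi for wi, fi in zip(w, feature))

def find_target(w, candidates, feature_fn):
    """Return the (y, z) candidate with the highest score under w.

    candidates: iterable of (y, z) pairs sampled near the previous boxes.
    feature_fn: hypothetical callable returning Phi(x_t, y, z) as a list.
    """
    return max(candidates, key=lambda yz: score(w, feature_fn(yz[0], yz[1])))
```

For example, with `w = [1, -1]` and a toy `feature_fn = lambda y, z: [y, z]`, the candidate `(2, 1)` wins over `(0, 0)` and `(3, 5)`.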
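2.2 Latent Pegasos for training online

$$w_{t+1} = \arg\min_{w} \left\{ \frac{\lambda}{2} \|w\|^2 + \frac{1}{N} \sum_{i=1}^{N} \left[ \Delta(y_t, y_{t,i}) + \max_{z'} \langle w, \Phi(x_t, y_{t,i}, z') \rangle - \max_{z} \langle w, \Phi(x_t, y_t, z) \rangle \right]_+ \right\}$$

$$\Delta(y_t, y) = 1 - \frac{|(B_{t-1} + y_t) \cap (B_{t-1} + y)|}{|(B_{t-1} + y_t) \cup (B_{t-1} + y)|}, \qquad [a]_+ = \max(0, a)$$

Find $w$ by gradient descent, with $f(x, y, z; w) = \langle w, \Phi(x, y, z) \rangle$:

$$w_{t+1} \leftarrow (1 - \eta_t \lambda)\, w_t + \frac{\eta_t}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \max_{z'} f(x_t, y_{t,i}, z'; w_t) - \max_{z} f(x_t, y_t, z; w_t) + \Delta(y_t, y_{t,i}) > 0 \right] \delta\Phi_t(y_{t,i})$$

$$\nabla_t = \lambda w_t - \frac{1}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \max_{z'} f(x_t, y_{t,i}, z'; w_t) - \max_{z} f(x_t, y_t, z; w_t) + \Delta(y_t, y_{t,i}) > 0 \right] \delta\Phi_t(y_{t,i})$$

$$\text{s.t. } \bar{z} = \arg\max_{z} f(x_t, y_t, z; w_t), \qquad \bar{z}' = \arg\max_{z'} f(x_t, y_{t,i}, z'; w_t)$$

$$\delta\Phi_t(y_{t,i}) = \Phi(x_t, y_t, \bar{z}) - \Phi(x_t, y_{t,i}, \bar{z}')$$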
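A single update of this kind can be sketched as follows. This is a simplified illustration under stated assumptions: the latent maximizations over $z$ and $z'$ are assumed to have been done beforehand, so the function receives the resulting feature vectors directly. The name `pegasos_step` and its argument layout are hypothetical.

```python
def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pegasos_step(w, eta, lam, phi_true, samples):
    """One sub-gradient step on the latent structural hinge objective.

    phi_true: Phi(x_t, y_t, z_bar), features of the true box with its best parts.
    samples:  list of (phi_i, delta_i) with phi_i = Phi(x_t, y_{t,i}, z_bar')
              and delta_i = Delta(y_t, y_{t,i}).
    """
    n = len(samples)
    true_score = dot(w, phi_true)
    update = [0.0] * len(w)
    for phi_i, delta_i in samples:
        # Indicator term: only margin-violating samples contribute.
        if dot(w, phi_i) - true_score + delta_i > 0:
            for k in range(len(w)):
                update[k] += phi_true[k] - phi_i[k]   # delta Phi
    # w <- (1 - eta * lambda) * w + (eta / N) * sum of delta Phi
    return [(1.0 - eta * lam) * w[k] + (eta / n) * update[k] for k in range(len(w))]
```

Samples that already satisfy the margin leave only the shrinkage factor $(1 - \eta_t \lambda)$ acting on $w_t$.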
2.2 Latent Pegasos for training online
- The label cost $\Delta$ does not take the part boxes into account:
$$\Delta(y_t, y) = 1 - \frac{|(B_{t-1} + y_t) \cap (B_{t-1} + y)|}{|(B_{t-1} + y_t) \cup (B_{t-1} + y)|}$$
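The label cost is one minus the overlap ratio of the two shifted bounding boxes, which can be sketched as below. The box layout is an assumption for this example: `(c, r, w, h)` with `(c, r)` the top-left corner and positive width and height. The function names are hypothetical.

```python
def shift(box, offset):
    """Apply an offset (dc, dr, dw, dh) to a box (c, r, w, h)."""
    return tuple(b + o for b, o in zip(box, offset))

def intersection_area(a, b):
    # Boxes are (c, r, w, h) with (c, r) the top-left corner (an assumption).
    width = min(a[0] + a[2], b[0] + b[2]) - max(a[0], b[0])
    height = min(a[1] + a[3], b[1] + b[3]) - max(a[1], b[1])
    return max(0.0, width) * max(0.0, height)

def overlap_loss(prev_box, y_true, y):
    """Delta(y_t, y) = 1 - |A n B| / |A u B| for the two shifted boxes."""
    a, b = shift(prev_box, y_true), shift(prev_box, y)
    inter = intersection_area(a, b)
    union = a[2] * a[3] + b[2] * b[3] - inter
    return 1.0 - inter / union
```

Only the bounding box offsets `y_true` and `y` enter the computation, which is exactly the limitation noted on the slide: the part box offsets play no role in the cost.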
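3. Two-stage training

Stage 1. Update $\{u_{t+1}^i\}_{i=1,\dots,M}$ for the part boxes:

$$u_{t+1}^j = \arg\min_{u^j} \frac{\lambda}{2} \|u^j\|^2 + \frac{1}{N} \sum_{k=1}^{N} \left[ \Delta(z_t^j, z_{t,k}^j) + \langle u^j, \Phi(x_t, z_{t,k}^j) \rangle - \langle u^j, \Phi(x_t, z_t^j) \rangle \right]_+$$

Stage 2. Update $\{v_{t+1}^i\}_{i=0,\dots}$ for the bounding box:

$$v_{t+1} \leftarrow (1 - \eta_t \lambda)\, v_t + \frac{\eta_t}{N} \sum_{i=1}^{N} \mathbb{1}\left[ \max_{z'} f(x_t, y_{t,i}, z'; v_t) - \max_{z} f(x_t, y_t, z; v_t) + \Delta(y_t, y_{t,i}) > 0 \right] \delta\Phi_t(y_{t,i})$$

$$\delta\Phi_t(y_{t,i}) = \Phi(x_t, y_t, \bar{z}) - \Phi(x_t, y_{t,i}, \bar{z}')$$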
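The two stages can be sketched as two passes of the same hinge-style update, first per part and then for the bounding box. This is a rough illustration, not the authors' procedure: it assumes the best part offsets have already been inferred with the stage-1 weights, so the stage-2 features arrive precomputed, and the data layout of `part_data` and `box_data` is hypothetical.

```python
def hinge_step(weights, eta, lam, phi_true, samples):
    """Pegasos-style step; samples are (phi, loss) pairs."""
    n = len(samples)
    true_score = sum(a * b for a, b in zip(weights, phi_true))
    grad = [0.0] * len(weights)
    for phi, loss in samples:
        # Accumulate phi_true - phi for every margin-violating sample.
        if sum(a * b for a, b in zip(weights, phi)) - true_score + loss > 0:
            grad = [g + pt - p for g, pt, p in zip(grad, phi_true, phi)]
    return [(1.0 - eta * lam) * wk + (eta / n) * gk for wk, gk in zip(weights, grad)]

def two_stage_update(part_weights, box_weights, eta, lam, part_data, box_data):
    """Stage 1: update each u^j from its own part samples.
    Stage 2: update v from the bounding box samples.

    part_data[j] and box_data are (phi_true, samples) pairs.
    """
    new_parts = [hinge_step(u, eta, lam, phi_true, samples)
                 for u, (phi_true, samples) in zip(part_weights, part_data)]
    phi_true, samples = box_data
    new_box = hinge_step(box_weights, eta, lam, phi_true, samples)
    return new_parts, new_box
```

Because each part has its own loss in stage 1, the part weights receive a training signal that the bounding-box-only label cost cannot provide.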
Another problem: part box initialization
- "Tracking of a Non-Rigid Object via Patch-Based Dynamic Appearance Modeling and Adaptive Basin Hopping Monte Carlo Sampling" (CVPR 2009)
- This paper
- Is a sufficiently big part box advantageous?
3. Result
3. Experiment
Contribution
- Robust to partial occlusion and shape deformation
- Online learning of a latent SVM
- Two-stage training -> more accurate
Discussion
Problems of this paper:
- No accumulation of positive targets
- Restriction to a fixed-size bounding box
Problems of part-based tracking:
- Part initialization: location, size
- Relations between bounding box and part boxes
Feedback
My recent work: tracking with part graph matching
- Part box initialization
- Feature change: size, or others
- Definition of the relation between bounding box and part box