Mean transform, a tutorial


Mean transform, a tutorial. KH Wong. mean transform v.5a

Introduction
What is object tracking? Track an object in a video: the user gives an initial bounding box, and the tracker must find the bounding box that covers the target pattern in every frame of the video. It is difficult because of scale, orientation and location changes.

Motivation
Target tracking is useful in surveillance, security and virtual-reality applications. Examples: car tracking, human tracking.

Our method
We use sparse coding to modify mean-shift so that it handles location, scale and orientation changes. Each of these is a tracking space; we combine them together.
Ref: Zhe Zhang, Kin Hong Wong, "Pyramid-based Visual Tracking Using Sparsity Represented Mean Transform", IEEE International Conference on Computer Vision and Pattern Recognition (CVPR 2014), Columbus, Ohio, USA, June 24-27, 2014.

What has been achieved
A target tracking method that outperforms many other approaches. Test results and demo videos are shown.

Overview of the method and algorithm
Part 1: Theory and methods
- Sparse coding, an introduction
- Mean-shift: location, orientation, scale
- Histogram calculation
- Probability calculation (Bhattacharyya coefficient)
Part 2: Mean transform tracking algorithm

Part 1: Theory and methods

Sparse coding
A method to extract parameters that represent a pattern. It belongs to the family of L1 "compressed sensing" methods. Compressed sensing (also known as compressive sensing, compressive sampling, or sparse sampling) is a signal processing technique for efficiently acquiring and reconstructing a signal by finding solutions to underdetermined linear systems (http://en.wikipedia.org/wiki/Compressed_sensing). An alternative is PCA, which uses the L2 norm (http://en.wikipedia.org/wiki/Principal_component_analysis).
Ref: http://lear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf
http://www.math.hkbu.edu.hk/~ttang/UsefulCollections/compressed-sensing1.pdf
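To make the L1 idea concrete, here is a minimal numpy sketch (not the SPAMS code used in this work) of ISTA, a standard iterative algorithm for the lasso problem min ||y - Da||^2 + lam*||a||_1 that sparse coding relies on:

```python
import numpy as np

def ista(D, y, lam=0.01, n_iter=1000):
    """Solve min_a 0.5*||y - D a||^2 + lam*||a||_1 by iterative soft thresholding."""
    L = np.linalg.norm(D, 2) ** 2          # Lipschitz constant of the gradient
    a = np.zeros(D.shape[1])
    for _ in range(n_iter):
        z = a - D.T @ (D @ a - y) / L      # gradient step on the L2 data term
        a = np.sign(z) * np.maximum(np.abs(z) - lam / L, 0.0)  # shrink -> exact zeros
    return a
```

With a dictionary and a signal built from only a few atoms, the recovered coefficient vector is sparse; this is the same kind of solution mexLasso computes in the SPAMS toolbox.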

Sparse coding programs
SPAMS: http://spams-devel.gforge.inria.fr/
We used the following MATLAB commands: mexTrainDL (train the dictionary) and mexLasso (compute the sparse coefficients).

Dictionary (see http://lear.inrialpes.fr/people/mairal/resources/pdf/Grenoble2010.pdf)

Sparse coding operation
The user selects a window X and resizes it to 32x32 pixels. Collect many patches of X, each 16x16, by shifting the patch window. Use sparse coding to learn the dictionary D (256x40). You may treat D as a pool of features describing the image X; each patch is a feature. Each patch (the red box in the slide figure) is 16x16 pixels; these 16x16 patches are used for training D.

Sparse coding idea
Each patch has 16x16 = 256 pixels, hence 256 dimensions per feature, so D is m x p with m = 256 and p = 40. A face has many features: left eye, right eye, nose, chin, left ear, hair, and so on. In theory p could be as large as 256, since we have 256 patches for training, but we drop the unimportant patches (features) and keep only 40 here; this number was found by trial and error. So image X gives dictionary D (m x p).
If you are given a new unknown picture Y, the sparse coding algorithm calculates a coefficient vector a for Y, so that Y = D(m x p) * a(p x 1). The sparse index vector a (p x 1) describes the picture Y; it has many zeros and only highlights the important features. E.g. face measurement = left_eye*0.2 + right_eye*0.22 + nose*0.3 + left_ear*0.4 + ... If |a| is high, the window contains a face; otherwise it does not.
So when using sparse coding, we have two steps: (1) use an image X to obtain the dictionary D (mexTrainDL in tracker.m); (2) when a new image Y arrives, calculate the coefficient vector a (mexLasso).

Our dictionary finding procedure (around line 58 of tracker.m)
The object window X (32x32) is first selected by the user. We want to track the location, scale and orientation of X in subsequent images; i.e., to decide whether an unknown window contains X, we check whether it has the important features. From image X (resized to 32x32), represent X as a combination of patches (each patch is 16x16, the red boxes in the slide figure). This is achieved by training: (32-16)*(32-16) = 256 patches are available for training. Use mexTrainDL() of SPAMS (the open source library, http://spams-devel.gforge.inria.fr/) to obtain a 256x40 sparse dictionary matrix D. The value 40 was chosen by experiment; in theory you can choose up to 256, but using more than 40 does not increase accuracy.
X = object window. SPAMS: http://spams-devel.gforge.inria.fr/
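The patch-collection step can be sketched as follows (a numpy illustration, not the actual tracker.m code; the 16-shift convention is an assumption that reproduces the slide's (32-16)*(32-16) = 256 patch count):

```python
import numpy as np

def collect_patches(x, patch=16):
    """Slide a patch x patch window over the 32x32 object window X."""
    shifts = x.shape[0] - patch            # 16 shifts per axis, as on the slide
    cols = []
    for i in range(shifts):
        for j in range(shifts):
            cols.append(x[i:i + patch, j:j + patch].reshape(-1))  # linearize patch
    return np.stack(cols, axis=1)          # 256 x 256 training matrix for mexTrainDL
```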

The meaning of the dictionary
Each column of D is a sparse sample of linearized size 16*16 = 256, i.e. the same size as a patch but linearized into a 256x1 vector. We can consider the sparse code a as a parameter of X, i.e. Y(256x1) = D(256x40) * a(40x1). Y (size 256x1) is the reconstructed image patch (linearized; it can be reshaped back to a 16x16 image) that corresponds to the selected patch. The dictionary D has size 256x40, and the sparse index vector a (size 40x1) contains many zeros; that is why this is called sparse coding.
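The reconstruction Y = D*a can be illustrated numerically (random numbers stand in for a trained dictionary here):

```python
import numpy as np

rng = np.random.default_rng(0)
D = rng.standard_normal((256, 40))   # stand-in for the trained 256x40 dictionary
alpha = np.zeros(40)                 # sparse index vector: mostly zeros
alpha[[3, 17]] = [0.8, -0.5]         # only two active atoms
y = D @ alpha                        # 256x1 linearized reconstruction Y = D*a
patch = y.reshape(16, 16)            # reshape back to a 16x16 image patch
```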

Theory: Mean shift (see lecture notes)

Location transformation

Rotation transformation

Scale transformation

Histogram representation
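The overview slide mentions histogram calculation and the Bhattacharyya coefficient; a minimal numpy sketch of both follows (the tracker's actual binning scheme may differ):

```python
import numpy as np

def normalized_hist(img, bins=16):
    """Grayscale intensity histogram, normalized to a probability distribution."""
    h, _ = np.histogram(img, bins=bins, range=(0, 256))
    return h / h.sum()

def bhattacharyya(p, q):
    """Similarity of two normalized histograms; 1.0 means identical."""
    return float(np.sum(np.sqrt(p * q)))
```

A candidate window whose histogram has a higher Bhattacharyya coefficient against the target histogram is a better match.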

Trivial templates

Pyramid

Particle filter

Part 2: The mean-transform tracking algorithm

Mean-transform tracking algorithm (see tracker.m)
Dictionary formation: build the dictionary and trivial templates for handling lighting change.
For frame = 1 until the last frame:
  Iterate n times (n = 5 typical):
    Form pyramid
    Use particle filter
    Mean shift for location, scale, orientation
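The loop structure on this slide can be written as a skeleton; the helper names below are hypothetical placeholders (with trivial stub bodies so the sketch runs), not the actual functions in tracker.m:

```python
# Placeholder helpers so the skeleton executes; the real steps live in tracker.m.
def build_pyramid(frame):
    return [frame]                          # multi-scale image pyramid

def particle_filter(pyramid, box):
    return [box]                            # sample candidate boxes

def mean_shift_update(candidates, box):
    return candidates[0]                    # shift location/scale/orientation

def track(frames, box, n_iter=5):
    """Skeleton of the mean-transform tracking loop."""
    results = []
    for frame in frames:                    # for frame = 1 .. last frame
        for _ in range(n_iter):             # iterate n times (n = 5 typical)
            pyramid = build_pyramid(frame)
            candidates = particle_filter(pyramid, box)
            box = mean_shift_update(candidates, box)
        results.append(box)
    return results
```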


Trivial templates
Trivial templates are added to cope with over-exposed (lighting-changed) images: they absorb intensity changes that the learned dictionary cannot represent. D is 256x40, T is 256x16, and -T is also 256x16, so the final dictionary D' = [D, T, -T] is 256x72. T is mostly black with a small white square window; -T is mostly white with a small black square window.
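One way to realize such trivial templates (the exact template shape used in the paper's code is an assumption here; block-identity columns are a common choice) is to append T and -T to D:

```python
import numpy as np

def augment_dictionary(D, n_trivial=16):
    """Append T (256x16) and -T to D (256x40), giving D' of size 256x72."""
    d = D.shape[0]
    block = d // n_trivial                      # each template lights up one block
    T = np.zeros((d, n_trivial))
    for k in range(n_trivial):
        T[k * block:(k + 1) * block, k] = 1.0   # small white window on black
    return np.hstack([D, T, -T])                # final dictionary D'
```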

Location mean transform

Orientation mean transform

Scale mean transform