Histograms of Oriented Gradients for Human Detection

Navneet Dalal and Bill Triggs, French National Institute for Research in Computer Science and Control (INRIA), CVPR 2005

OpenCV implementation: peopledetect.cpp

Introduction
Challenge: human appearance is highly variable and people adopt a wide range of poses. People come in many shapes and postures, and cluttered surroundings, differing camera viewpoints and differing pedestrian scales make pedestrian detection a very difficult problem.
Histograms of Oriented Gradients (HOG) are feature descriptors used in computer vision and image processing for object detection.
Basic idea: local object appearance and shape can be characterized rather well by the distribution of local intensity gradients or edge directions.
HOG is similar to edge orientation histograms [4,5], SIFT descriptors [12] and shape contexts [1].
Normalized HOG descriptors provide excellent performance relative to other existing feature sets, including wavelets [17,22].

Dataset(1/2) 64x128

Dataset(2/2) INRIA negative images (64x128 samples)

An overview of our feature extraction and object detection chain
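The image is first divided into small, spatially connected regions that we call cells. For each cell, a histogram of gradient or edge orientations is accumulated over the pixels of the cell, and the combined histograms form the descriptor.
To reduce contrast differences between cells, the local histograms can be contrast-normalized over larger spatial regions called blocks: a measure of intensity is computed across the block and then used to normalize every cell in the block. This normalization gives better invariance to illumination changes and shadowing.
HOG has several advantages over other descriptors. First, because it operates on local cells, it is largely invariant to geometric and photometric transformations, which only become apparent over larger spatial regions. Second, the authors found experimentally that with coarse spatial sampling, fine orientation sampling and strong local photometric normalization, small limb movements can be ignored without affecting detection, provided the pedestrian keeps a roughly upright posture. HOG is therefore particularly well suited to detecting pedestrians in images.
The final stage of the chain is person / non-person classification.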

Implementation(1/7) Color / gamma normalization and gradient computation
Color / gamma normalization: grayscale, RGB and LAB color spaces were evaluated, optionally with power law (gamma) equalization. These normalizations have only a modest effect on performance, perhaps because the subsequent descriptor normalization achieves similar results, so in practice this step can be omitted.
Gradient computation: the masks tested were 1-D point derivatives (uncentred [-1, 1], centred [-1, 0, 1] and cubic-corrected [1, -8, 0, 8, -1]), 3*3 Sobel masks and 2*2 diagonal masks, each optionally preceded by Gaussian smoothing of scale σ. The 1-D centred mask at σ = 0 (no smoothing) works best: the simplest scheme turns out to be the best.
The most common approach applies the 1-D centred point discrete derivative mask in the horizontal direction, the vertical direction, or both; that is, the color or intensity data of the image is filtered with the kernel [-1, 0, 1] and its transpose. The more complex masks, such as the 3*3 Sobel and diagonal masks, performed worse in this pedestrian detection experiment. Gaussian smoothing before the derivative mask also made detection worse, because much of the useful image information comes from abrupt edges, which smoothing suppresses before the gradient is computed.
Reducing the gradient scale from 3 to 0 decreases false positives by a factor of 10.
DET (Detection Error Tradeoff): detector performance is compared using DET curves. The classifier is run on all test images, and the miss rate and the false positives per window (FPPW) are recorded at different classifier thresholds:
miss rate = FalseNeg / (TruePos + FalseNeg)
FPPW = FalsePos / (TrueNeg + FalsePos)
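The gradient step and the two DET quantities on this slide can be sketched in a few lines of plain Python. This is an illustrative sketch, not the authors' implementation: the function names are made up here, the image is assumed to be a list of rows of grey levels, and border pixels are handled by replicating the nearest edge pixel, which the slides do not specify.

```python
import math

def centered_gradient(image):
    """Filter a grey-level image with [-1, 0, 1] and its transpose.

    Returns (magnitude, orientation) grids of the same size as `image`.
    Orientation is unsigned, in degrees, folded into [0, 180).
    Border handling (edge replication) is an assumption of this sketch.
    """
    height, width = len(image), len(image[0])
    magnitude = [[0.0] * width for _ in range(height)]
    orientation = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            left = image[y][max(x - 1, 0)]
            right = image[y][min(x + 1, width - 1)]
            up = image[max(y - 1, 0)][x]
            down = image[min(y + 1, height - 1)][x]
            gx = right - left   # horizontal [-1, 0, 1] response
            gy = down - up      # vertical [-1, 0, 1]^T response
            magnitude[y][x] = math.hypot(gx, gy)
            orientation[y][x] = math.degrees(math.atan2(gy, gx)) % 180.0
    return magnitude, orientation

def miss_rate(true_pos, false_neg):
    """Miss rate = FalseNeg / (TruePos + FalseNeg)."""
    return false_neg / (true_pos + false_neg)

def fppw(false_pos, true_neg):
    """False positives per window = FalsePos / (TrueNeg + FalsePos)."""
    return false_pos / (true_neg + false_pos)
```

Evaluating `miss_rate` and `fppw` at a range of classifier thresholds gives the points of one DET curve.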

Implementation(2/7) Creating the orientation histograms
The third step builds a gradient orientation histogram for each cell. Every pixel in the cell casts a weighted vote for an orientation-based histogram channel, and the votes are accumulated over the cell.
Weight: the gradient magnitude itself, or some function of the magnitude (square, square root, clipped). The gradient magnitude itself generally produces the best results.
Cells can be rectangular or radial. The histogram channels are evenly spaced over 0°-180° (unsigned gradient) or 0°-360° (signed gradient).
Unsigned gradients used in conjunction with 9 histogram channels performed best in the authors' human detection experiments. For humans, the wide range of clothing and background colours presumably makes the signs of contrasts uninformative. However, note that including sign information does help substantially in some other object recognition tasks, e.g. cars, motorbikes.
Increasing the number of orientation bins from 4 to 9 decreases false positives by a factor of 10.
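As a rough illustration of the weighted vote, the sketch below accumulates a 9-bin unsigned histogram for one cell, with each pixel voting by its gradient magnitude. The function name and the flat-list inputs are assumptions of this sketch. Each pixel votes into a single bin (hard assignment), which is a simplification chosen here for brevity.

```python
def cell_histogram(magnitudes, orientations, n_bins=9):
    """Magnitude-weighted orientation histogram for one cell.

    magnitudes, orientations: flat lists with one entry per pixel in the
    cell; orientations are in degrees and are folded into [0, 180).
    Hard assignment to one bin per pixel is a simplification of this sketch.
    """
    bin_width = 180.0 / n_bins
    histogram = [0.0] * n_bins
    for magnitude, angle in zip(magnitudes, orientations):
        # The final modulo guards against rounding up to n_bins.
        index = int((angle % 180.0) // bin_width) % n_bins
        histogram[index] += magnitude  # vote weighted by gradient magnitude
    return histogram
```

With 9 bins over 0°-180°, each bin spans 20°, so a pixel at 25° with magnitude 2.0 adds 2.0 to the second bin.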

Implementation(3/7) Normalization and descriptor blocks
Gradient strengths vary over a wide range owing to local variations of illumination and foreground-background contrast, so they need to be normalized. The authors group cells into larger, spatially connected blocks and normalize each block separately. The HOG descriptor is then the vector of the histogram components of all cells in all blocks. Blocks overlap, so each cell contributes several times to the final descriptor.
Two main block geometries: rectangular R-HOG blocks and circular C-HOG blocks.
R-HOG blocks are roughly square grids described by 3 parameters: # of cells per block, # of pixels per cell, # of channels per cell histogram.
Optimal for pedestrian detection: 3x3 cell blocks of 6x6 pixel cells with 9 channels.
Gaussian spatial weight: the authors also found it important to apply a Gaussian spatial window to each block before accumulating the histograms, because it downweights the pixels near the edges of the block.
The block size is a trade-off between the need for local spatial invariance and the need for finer spatial resolution.
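To make the overlapping R-HOG layout concrete, here is a minimal sketch that slides a square block over a grid of per-cell histograms and concatenates the histograms inside each block. The function name, the nested-list grid and the one-cell default stride are assumptions of this sketch. Gaussian spatial weighting and block normalization are deliberately left out.

```python
def block_descriptors(cell_histograms, cells_per_block=3, stride=1):
    """Collect un-normalized R-HOG block vectors from a grid of cells.

    cell_histograms: list of rows, each row a list of per-cell histograms.
    cells_per_block: block side length in cells (3 gives 3x3 cell blocks).
    stride: block step in cells; a stride below cells_per_block makes
    blocks overlap, so interior cells appear in several blocks.
    """
    rows, cols = len(cell_histograms), len(cell_histograms[0])
    blocks = []
    for top in range(0, rows - cells_per_block + 1, stride):
        for left in range(0, cols - cells_per_block + 1, stride):
            vector = []
            for dy in range(cells_per_block):
                for dx in range(cells_per_block):
                    vector.extend(cell_histograms[top + dy][left + dx])
            blocks.append(vector)
    return blocks
```

For a 4x4 grid of 9-bin cells and 3x3 cell blocks, this returns 4 blocks of 81 values each, and the four central cells contribute to every block.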

Implementation(4/7) Normalization and descriptor blocks
C-HOG blocks come in two variants: one with a single complete central cell and one whose central cell is divided into angular sectors. The authors found that both variants give the same performance.
C-HOG blocks are described by 4 parameters: # of angular bins, # of radial bins, the radius of the center bin, and the expansion factor for the radius of additional radial bins.
Optimal for pedestrian detection: 4 angular bins, 2 radial bins, a center bin radius of 4 pixels and an expansion factor of 2. The Gaussian spatial weight that matters for R-HOG is not needed for C-HOG.
C-HOG looks similar to shape contexts, but each C-HOG cell holds a stack of orientation channels, whereas shape contexts use only a single edge presence count.
Block normalization schemes: the authors compared four schemes. Let $v$ be the unnormalized vector holding all the histograms of a given block, $\|v\|_k$ its k-norm for k = 1, 2, and $\epsilon$ a small constant.
L2-norm: $v \to v / \sqrt{\|v\|_2^2 + \epsilon^2}$
L2-Hys: L2-norm, clip (limit the maximum value of v to 0.2) and renormalize
L1-norm: $v \to v / (\|v\|_1 + \epsilon)$
L1-sqrt: $v \to \sqrt{v / (\|v\|_1 + \epsilon)}$
The authors found that L2-Hys, L2-norm and L1-sqrt all perform equally well, while L1-norm is slightly less reliable. All four schemes are a significant improvement over unnormalized data.
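The four block normalization schemes can be written down directly from their definitions in the Dalal and Triggs paper. The sketch below is an illustration under assumptions: the function names and the default value of the small constant `eps` are choices made here, and the block vector is a plain list of non-negative histogram values.

```python
import math

def l2_norm(v, eps=1e-5):
    """L2-norm: v / sqrt(||v||_2^2 + eps^2)."""
    norm = math.sqrt(sum(x * x for x in v) + eps * eps)
    return [x / norm for x in v]

def l2_hys(v, clip=0.2, eps=1e-5):
    """L2-Hys: L2-norm, clip each value at `clip`, then renormalize."""
    clipped = [min(x, clip) for x in l2_norm(v, eps)]
    return l2_norm(clipped, eps)

def l1_norm(v, eps=1e-5):
    """L1-norm: v / (||v||_1 + eps)."""
    norm = sum(abs(x) for x in v) + eps
    return [x / norm for x in v]

def l1_sqrt(v, eps=1e-5):
    """L1-sqrt: sqrt(v / (||v||_1 + eps)); assumes non-negative entries."""
    return [math.sqrt(x) for x in l1_norm(v, eps)]
```

The clipping step in `l2_hys` limits the influence of a single very strong gradient: a block dominated by one large bin ends up with that bin reduced relative to the others after renormalization.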

Implementation(5/7)

Implementation(6/7) R-HOG and C-HOG give near-perfect separation on the MIT database and have 1-2 orders of magnitude fewer false positives than other descriptors.

Implementation(7/7) Feed the descriptors into a recognition system: SVM classifier
The last step feeds the extracted HOG features into an SVM classifier. The Support Vector Machine is a binary classifier that looks for an optimal hyperplane as its decision function. The authors use the free SVMLight package with the HOG descriptors to find pedestrians in test images.
Close examination of Fig. 6(b,f) shows that the most important cells are the ones that typically contain major human contours (especially the head and shoulders and the feet) common in the training set. The detector cues mainly on the contrast of silhouette contours against the background, not on internal edges or on silhouette contours against the foreground. Similarly, Fig. 6(c,g) illustrate that gradients inside the person (especially vertical ones) typically count as negative cues, presumably because this suppresses false positives in which long vertical lines trigger vertical head and leg cells.
Overlapping blocks just outside the contour are most important.
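Once a linear SVM has been trained, classifying a window reduces to evaluating the hyperplane's decision function on its descriptor. The sketch below shows only that evaluation. The function names are hypothetical, and the weights and bias would come from a trained model such as one produced by SVMLight, whose training step is not shown.

```python
def svm_score(weights, bias, descriptor):
    """Linear decision function: w . x + b."""
    return sum(w * x for w, x in zip(weights, descriptor)) + bias

def is_person(weights, bias, descriptor, threshold=0.0):
    """Classify a window as person when its score exceeds the threshold.

    Sweeping `threshold` trades miss rate against false positives.
    """
    return svm_score(weights, bias, descriptor) > threshold

# Toy 3-dimensional example with made-up numbers; a real HOG descriptor
# for a 64x128 window is much longer.
toy_weights = [0.5, -1.0, 2.0]
toy_bias = -0.25
print(is_person(toy_weights, toy_bias, [1.0, 0.5, 0.25]))  # score 0.25
print(is_person(toy_weights, toy_bias, [0.0, 1.0, 0.0]))   # score -1.25
```

Positive weights correspond to cells whose gradients count as evidence for a person, and negative weights to cells that count against.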

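Summary
Histograms of edge orientations
[-1, 0, 1] gradient filter with no smoothing
8*8 pixel cell size => 8*16 cells
9 unsigned bins => 9-dimension vector per cell
R-HOG, 2*2 block size => 36-dimension vector per block
Gaussian spatial window with σ = 8 pixels
L2-Hys normalization, block overlap = 1/2
7*15 blocks => descriptor: 3780-dimension vector
In their experiments, the positive and negative training images are 64×128 pixels. Edge detection is first applied to every pixel of a training image, giving each pixel an edge orientation and an edge strength. The image is then divided into non-overlapping 8×8 cells, giving 8×16 cells (Figure 4). Because edge orientations that differ by 180 degrees can be treated as the same direction, each cell is divided into nine orientation bins over 0-180 degrees. Every pixel in the cell votes for the bin its orientation falls in, with a vote equal to its edge strength, so the nine bins can be represented as a 9-dimensional vector (Figure 5). Four adjacent cells form a block, and blocks may overlap. A block describes the local edge information at its position with the orientation bins of its 4 cells, a 36-dimensional vector that is normalized to unit length (Figure 6). Concatenating the 36-dimensional vectors of all 7x15 blocks gives a 3780-dimensional vector that contains both global and local information about the pedestrian. After HOG features have been extracted from all pedestrian and non-pedestrian training images, the result is a set of positive (pedestrian) and negative (non-pedestrian) examples, all 3780-dimensional vectors. For the detector, a linear support vector machine (LSVM) classifier learns, in the 3780-dimensional space, the hyperplane that best separates the positive examples from the negative ones.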
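The 3780 figure follows from the window, cell and block sizes listed above. The short sketch below reproduces the arithmetic. The function name and its parameter names are made up for illustration, and a half-block overlap is expressed as a block stride of one cell.

```python
def hog_descriptor_length(window_w=64, window_h=128, cell=8,
                          cells_per_block=2, block_stride_cells=1, n_bins=9):
    """Descriptor length for a dense grid of overlapping R-HOG blocks."""
    cells_x = window_w // cell                   # 64 / 8  = 8 cells across
    cells_y = window_h // cell                   # 128 / 8 = 16 cells down
    blocks_x = (cells_x - cells_per_block) // block_stride_cells + 1  # 7
    blocks_y = (cells_y - cells_per_block) // block_stride_cells + 1  # 15
    values_per_block = cells_per_block * cells_per_block * n_bins     # 36
    return blocks_x * blocks_y * values_per_block

print(hog_descriptor_length())  # 7 * 15 * 36 = 3780
```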

Conclusion We show experimentally that dense grids of Histograms of Oriented Gradient (HOG) descriptors significantly outperform existing feature sets for human detection. We study the influence of each stage of the computation on performance.
