Object Detection Creation from Scratch Samsung R&D Institute Ukraine


1 Object Detection Creation from Scratch Samsung R&D Institute Ukraine
Vitaliy Bulygin

2 Problem formulation and dataset
Problem: find bounding boxes for cars. Udacity dataset: 21000 images for train plus a test split. Do not use bounding boxes with 𝑆 < 0.5% of the image area.

3 Naive solution: sliding window
Slide rectangles with different aspect ratios and sizes over the image; feed each crop through a CNN (convolution layers, max pooling layers, fully connected layer) ending in a binary classifier: is it a car? Yes → (0,1), No → (1,0).

4 Naive solution: sliding window
Very slow! Every rectangle with a different aspect ratio and size must pass through the binary classifier: is it a car? Yes → (0,1), No → (1,0).

5 Several words about two-stage detectors
Two-stage detectors: the first stage generates proposals, the second stage is a classifier. Two-stage detectors are slower but more accurate than single-stage ones; however, the difference in accuracy became smaller in 2018.

6 Naive solution: location as output
(𝑥𝑐1, 𝑦𝑐1, 𝑤1, ℎ1), (𝑥𝑐2, 𝑦𝑐2, 𝑤2, ℎ2), … NN output size is 4⋅𝑁, where 𝑁 is the number of bounding boxes.

7 Naive solution: location as output
We do not know the number of objects in advance! (𝑥𝑐1, 𝑦𝑐1, 𝑤1, ℎ1), (𝑥𝑐2, 𝑦𝑐2, 𝑤2, ℎ2), … NN output size is 4⋅𝑁, where 𝑁 is the number of bounding boxes.

8 Output in the view of the Grid
Predict a rectangle and a class inside each cell of a grid 𝑁𝑥 × 𝑁𝑦. Ground truth (GT): 𝑌 = {𝑦𝑖,𝑗}, 𝑖 = 1…𝑁𝑥, 𝑗 = 1…𝑁𝑦, with 𝑦𝑖,𝑗 = (𝑝, 𝑥𝑐, 𝑦𝑐, 𝑤, ℎ): 𝑝 = 1 if the cell contains an object, 𝑝 = 0 if not; (𝑥𝑐, 𝑦𝑐) are the rectangle center coordinates, 𝑤 and ℎ its width and height.

9 Output in the view of the Grid( calculate it!)
𝑥, 𝑦, 𝑤, ℎ are in coordinates relative to the cell: 𝑥, 𝑦 ∈ [0,1], while 𝑤, ℎ can be > 1. If 𝑝 = 0, set 𝑥 = 𝑦 = 𝑤 = ℎ = 0. In the example: 𝑦𝑖,𝑗 = (0, 0, 0, 0, 0)ᵗ for (𝑖,𝑗) ≠ (1,0), (1,1); 𝑦1,0 = (1, 0.6, 0.6, 0.5, 0.4); 𝑦1,1 = (1, 0.6, 0.6, 0.5, 0.4). GitHub: data_generator.py -> convert_GT_to_YOLO(...)
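As a sketch, the grid encoding above could look like the following in pure Python. This is a simplified analog of convert_GT_to_YOLO from data_generator.py: the function name and the choice to assign each box only to the cell containing its center are assumptions (the slide example assigns one object to two cells).

```python
def encode_gt_to_grid(boxes, nx, ny):
    """Encode GT boxes (image-relative coords in [0,1]) into an
    nx x ny grid of (p, x, y, w, h) cells, YOLO-style.
    Simplified: each box goes to the single cell holding its center."""
    grid = [[(0.0, 0.0, 0.0, 0.0, 0.0) for _ in range(ny)] for _ in range(nx)]
    for (xc, yc, w, h) in boxes:
        i = min(int(xc * nx), nx - 1)  # cell column of the box center
        j = min(int(yc * ny), ny - 1)  # cell row of the box center
        x_cell = xc * nx - i           # center relative to the cell, in [0,1]
        y_cell = yc * ny - j
        # width/height in cell units (may exceed 1, as on the slide)
        grid[i][j] = (1.0, x_cell, y_cell, w * nx, h * ny)
    return grid
```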

10 ... Output in the view of the Grid (papers)
Predict a rectangle and a class inside each cell. Recent papers with a similar output: RFB Net: Songtao Liu et al., 2018; RefineDet: Shifeng Zhang et al., 2018; YOLOv3: Joseph Redmon et al., 2018; Pelee Net: Robert J. Wang et al., 2018; FSSD: Zuo-Xin Li et al., 2018; DSOD: Zhiqiang Shen et al., 2018; ...

11 Output in the view of the Grid (general case)
𝐶 – number of classes; 𝑁 – number of boxes per cell. The feature extractor predicts several boxes in each cell, with aspect ratios 1:1, 2:1, 1:2, 3:1, ...

12 Output in the view of the Grid (general case)
𝐶 – number of classes; 𝑁1 – number of boxes per cell. Predict several boxes in each cell with aspect ratios 1:1, 2:1, 1:2, 3:1, ... Class prediction: 𝑁1⋅𝐶 outputs; box prediction: 𝑁1⋅4 outputs. This branch predicts small objects.

13 Output in the view of the Grid (general case)
𝐶 – number of classes; 𝑁2 – number of boxes per cell for middle-size objects. The grid size is smaller here. Class prediction: 𝑁2⋅𝐶; box prediction: 𝑁2⋅4. Middle-size object prediction.

14 Output in the view of the Grid (general case)
𝐶 – number of classes; 𝑁3 – number of boxes per cell for large objects. Class prediction: 𝑁3⋅𝐶; box prediction: 𝑁3⋅4. Large object prediction.

15 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. We have an image dataset and GT rectangles; what do we need to transform the data into model input? data_preprocessing.py, data_generator.py

16 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. 𝑰𝑰. Feature extractor: model.py

17 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. 𝑰𝑰. Feature extractor: model.py. 𝑰𝑰𝑰. Model head (output): box1, box2, ...

18 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. 𝑰𝑰. Feature extractor. 𝑰𝑰𝑰. Model head (output): box1, box2, ... 𝑰𝑽. Loss function: 𝑳 = (1/𝑵obj) ⋅ Σ𝒊=1..𝑾⋅𝑯 𝜹𝒊^obj ⋅ (…); train.ipynb

19 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. 𝑰𝑰. Feature extractor. 𝑰𝑰𝑰. Model head (output): box1, box2, ... 𝑰𝑽. Loss function: 𝑳 = (1/𝑵obj) ⋅ Σ𝒊=1..𝑾⋅𝑯 𝜹𝒊^obj ⋅ (…). 𝑽. Postprocessing: filtering + NMS; data_postprocessing.py

20 Single stage object detector components
𝑰. Preprocessing: image normalization, augmentation, GT encoding, batch generator. 𝑰𝑰. Feature extractor. 𝑰𝑰𝑰. Model head (output): box1, box2, ... 𝑰𝑽. Loss function: 𝑳 = (1/𝑵obj) ⋅ Σ𝒊=1..𝑾⋅𝑯 𝜹𝒊^obj ⋅ (…). 𝑽. Postprocessing: filtering + NMS. 𝑽𝑰. Accuracy evaluation: precision-recall curve; evaluator.py

21 𝑰. Preprocessing : data augmentation
horizontal flip, vertical flip, zoom in/out, width and height shift, rotation in some range, shear, brightness shift, channel shift, hue change, saturation change, contrast change, gamma correction, histogram equalization
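One subtlety of detection augmentation is that the boxes must be transformed together with the image. A minimal pure-Python sketch for horizontal flip (the function name hflip and the row-major list-of-lists image representation are illustrative assumptions):

```python
def hflip(image, boxes):
    """Horizontal-flip augmentation: mirror the image left-right and
    mirror the box x-centers. image: row-major list of pixel rows;
    boxes: list of (xc, yc, w, h) in image-relative coordinates."""
    flipped = [row[::-1] for row in image]            # reverse each row
    new_boxes = [(1.0 - xc, yc, w, h) for (xc, yc, w, h) in boxes]
    return flipped, new_boxes
```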

22 𝑰. Preprocessing : data augmentation
data_preprocessing.py. Augmentation gives more than a 10% accuracy (mAP) improvement using only horizontal flip and width/height shift. Original vs. augmented.

23 𝑰. Preprocessing : data normalization
data_preprocessing.py. Augmented → normalized. Normalization can include: [0, 255] → (0, 1) or (−1, 1); mean subtraction; division by the standard deviation; rectangle coordinates → [0, 1] ⇒ independent from scale.
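A minimal sketch of the mean-subtraction / deviation-division step, mapping [0, 255] to roughly (−1, 1). The constants 127.5 are illustrative, not dataset statistics from data_preprocessing.py:

```python
def normalize_image(pixels, mean=127.5, std=127.5):
    """Normalize 8-bit pixel values: subtract the mean and divide by
    the deviation, so [0, 255] maps to about (-1, 1).
    pixels: row-major list of pixel rows."""
    return [[(p - mean) / std for p in row] for row in pixels]
```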

24 𝑰. Preprocessing: data generator data_preprocessing.py
__getitem__() returns batches of images and GT labels.

25 𝑰. Preprocessing: data generator data_preprocessing.py
__getitem__() generates a batch (𝑿, 𝒀): images 𝑿1 … 𝑿𝒏 are augmented and normalized; labels 𝒀1 … 𝒀𝒏 are grid outputs with cells (𝒑, 𝒙𝒄, 𝒚𝒄, 𝒘, 𝒉).
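The __getitem__() interface above can be sketched as a small batch-generator class. This is a pure-Python stand-in for the repo's data generator (class and attribute names are assumptions; a real version would augment, normalize, and grid-encode inside __getitem__):

```python
class BatchGenerator:
    """Minimal batch generator with the __getitem__ interface used by
    training loops: index i returns the i-th batch (X, Y)."""

    def __init__(self, samples, batch_size):
        self.samples = samples          # list of (image, label) pairs
        self.batch_size = batch_size

    def __len__(self):
        # number of batches, counting the final partial one
        return (len(self.samples) + self.batch_size - 1) // self.batch_size

    def __getitem__(self, idx):
        batch = self.samples[idx * self.batch_size:(idx + 1) * self.batch_size]
        X = [img for img, _ in batch]   # here: augment + normalize images
        Y = [lbl for _, lbl in batch]   # here: encode GT into the grid
        return X, Y
```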

26 𝑰𝑰. Feature extractor model.py It is not an optimal feature extractor!
Only 3×3 filters: 300×300×3 → 300×300×16 → 150×150×24 → 75×75×32 → 37×37×48 → 18×18×64 → 9×9×64 (convolution + ReLU, max pooling).

27 𝑰𝑰. Feature extractor model.py It is not an optimal feature extractor!
Only 3×3 filters: 300×300×3 → 300×300×16 → 150×150×24 → 75×75×32 → 37×37×48 → 18×18×64 → 9×9×64 → encoded bounding boxes. Why such an architecture? Why 9×9?
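The spatial sizes in the stack follow from 'same'-padded convolutions (which keep the size) and 2×2/stride-2 max pooling (which floor-halves it). A quick sketch to reproduce the 300 → 150 → 75 → 37 → 18 → 9 sequence (function name is illustrative):

```python
def feature_map_sizes(input_size=300, n_pools=5):
    """Spatial size after each 2x2/stride-2 max pooling; 'same'-padded
    convolutions do not change the size, so only pools matter."""
    sizes = [input_size]
    for _ in range(n_pools):
        sizes.append(sizes[-1] // 2)    # stride-2 pooling floor-halves
    return sizes
```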

28 𝑰𝑰. Feature extractor: effective receptive field
The effective receptive field is the area of the original image that can possibly influence the activation of a neuron. conv: 𝑟1_conv = 3 (3×3 convolution; 2×2 max pooling with stride = 2).

29 𝑰𝑰. Feature extractor: effective receptive field
The effective receptive field is the area of the original image that can possibly influence the activation of a neuron. conv: 𝑟1_conv = 3; pool: 𝑟1_pool = 4 (3×3 convolution; 2×2 max pooling with stride = 2).

30 𝑰𝑰. Feature extractor: effective receptive field
The effective receptive field is the area of the input image that the chosen feature looks at. conv: 𝑟1_conv = 3; pool: 𝑟1_pool = 4; conv: 𝑟2_conv = 8 (3×3 convolution; 2×2 max pooling with stride = 2).

31 𝑰𝑰. Feature extractor : effective receptive field
𝑟1_conv = 3, 𝑟2_conv = 8, 𝑟3_conv = 18

32 𝑰𝑰. Feature extractor : effective receptive field
𝑟1_conv = 3, 𝑟2_conv = 8, 𝑟3_conv = 18, 𝑟4_conv = 38, 𝑟5_conv = 78

33 𝑰𝑰. Feature extractor : effective receptive field
𝑟1_conv = 3, 𝑟2_conv = 8, 𝑟3_conv = 18, 𝑟4_conv = 38, 𝑟5_conv = 78, 𝑟6_conv = 158; final feature map 9×9
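The numbers 3, 8, 18, 38, 78, 158 follow the standard recurrence r_out = r_in + (k − 1)·jump, where jump is the product of all strides so far. A small sketch that reproduces them for the alternating conv/pool stack (function name is illustrative):

```python
def receptive_field(layers):
    """Receptive field after each layer.
    layers: list of (kernel, stride) pairs in network order.
    Uses r += (k - 1) * jump, then jump *= stride."""
    r, jump, out = 1, 1, []
    for k, s in layers:
        r += (k - 1) * jump
        jump *= s
        out.append(r)
    return out
```

For the slide's stack (3×3 conv, stride 1, alternating with 2×2/stride-2 pooling) the conv-layer entries come out as 3, 8, 18, 38, 78, 158.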

34 𝑰𝑰. Feature extractor : effective receptive field
𝑟1_conv = 3, 𝑟2_conv = 8, 𝑟3_conv = 18, 𝑟4_conv = 38, 𝑟5_conv = 78, 𝑟6_conv = 158; final feature map 9×9; 2⋅32 = 64

35 𝑰𝑰. Feature extractor : effective receptive field
The receptive field has to contain the object with a margin. At the 3rd conv (37×37×48) the receptive field is only 18×18: it cannot recognize the car position. 2⋅32 = 64; final feature map 9×9.

36 𝑰𝑰. Feature extractor : effective receptive field
With a very large receptive field it is hard to localize small objects. 2⋅32 = 64; final feature map 9×9.

37 𝑰𝑰𝑰. Model head model.py Feature extractor → Head (3×3 conv) → Output

38 𝑰𝑰𝑰. Model head … model.py Feature extractor → Head → Output
Possible improvement: an extra head for smaller objects.

39 𝑰𝑽. Loss function model.ipynb, function YOLO_loss(y_true, y_pred)
For a 'no object' cell we do not care about the 𝒙, 𝒚, 𝒘, 𝒉 of the prediction and compare only the confidence 𝒑 with 0; for an 'object' cell we compare the whole vector (𝒑, 𝒙, 𝒚, 𝒘, 𝒉):
𝑳 = (1/𝑵obj) ⋅ Σ over GT object cells ‖𝒚^GT − 𝒚^pred‖ + (1/𝑵no_obj) ⋅ Σ over GT no-object cells ‖𝒚0^GT − 𝒚0^pred‖
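A minimal pure-Python sketch of this loss over a flattened grid, assuming squared-error terms for ‖·‖ (the exact norm is not specified on the slide; this is not the tensor implementation from model.ipynb):

```python
def yolo_loss(y_true, y_pred):
    """Sketch of the loss on one flattened grid of (p, x, y, w, h) cells.
    Object cells contribute the full 5-vector error; no-object cells
    contribute only the confidence error, each averaged separately."""
    obj_terms, noobj_terms = [], []
    for t, q in zip(y_true, y_pred):
        if t[0] == 1:   # object cell: compare p, x, y, w, h
            obj_terms.append(sum((a - b) ** 2 for a, b in zip(t, q)))
        else:           # no-object cell: compare only confidence p
            noobj_terms.append((t[0] - q[0]) ** 2)
    loss = 0.0
    if obj_terms:
        loss += sum(obj_terms) / len(obj_terms)      # 1/N_obj term
    if noobj_terms:
        loss += sum(noobj_terms) / len(noobj_terms)  # 1/N_no_obj term
    return loss
```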

40 𝑰𝑽. Loss function : possible improvements
1. Hard negative mining: take into account only 3⋅𝑛 negatives, where 𝑛 is the number of positives. 2. Use binary classification + a cross-entropy loss. Probabilities for 𝒄 classes + 1 background (𝒑_car, 𝒑_no_car); coordinates: 𝒙, 𝒚, 𝒘, 𝒉.

41 𝑰𝑽. Loss function : possible improvements
1. Hard negative mining: take into account only 3⋅𝑛 negatives, where 𝑛 is the number of positives. 2. Use binary classification + a cross-entropy loss. 3. Use 𝑴 boxes (anchors) in each cell with different aspect ratios: the output becomes 𝑊×𝐻×𝑀×5; each cell stores, 𝑴 times, the probabilities for 𝒄 classes + 1 background plus the coordinates, e.g. box1 = (𝑝1, 𝑥1, 𝑦1, 𝑤1, ℎ1, 𝑐1, 𝑐2) for anchor1 and box2 = (𝑝2, 𝑥2, 𝑦2, 𝑤2, ℎ2, 𝑐1, 𝑐2) for anchor2.

42 𝑽. Postprocessing …
Prediction: 𝑊×𝐻 cells, each with (𝑝, 𝑥, 𝑦, 𝑤, ℎ) – confidence plus coordinates relative to the cell (𝑊×𝐻×5 in total). Convert: relative to cell → relative to image.

43 𝑽. Postprocessing … Filtering?
Prediction: 𝑊×𝐻 cells, each with (𝑝, 𝑥, 𝑦, 𝑤, ℎ) – confidence plus coordinates relative to the cell (𝑊×𝐻×5 in total). Convert: relative to cell → relative to image. Filtering?

44 𝑽. Postprocessing: Filtering
Threshold 𝑻 = 0.5, for example; 𝑁 is the number of rectangles with 𝑝 > 𝑇. From the 𝑊×𝐻 cells with (𝑝, 𝑥, 𝑦, 𝑤, ℎ) (confidence, relative-to-cell coordinates; 𝑊×𝐻×5 in total), convert relative to cell → relative to image and keep 𝑁 scores and 𝑁×4 rectangles.
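The filtering step itself is a one-liner; a sketch (function name is illustrative, boxes assumed already converted to image coordinates):

```python
def filter_predictions(cells, thr=0.5):
    """Keep only cells with confidence p > thr.
    cells: list of (p, x, y, w, h) tuples in image coordinates.
    Returns parallel lists of scores and rectangles."""
    kept = [c for c in cells if c[0] > thr]
    scores = [c[0] for c in kept]
    rects = [c[1:] for c in kept]
    return scores, rects
```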

45 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3, …, rect𝑁.

46 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3, …, rect𝑁. Compare the IOU of the 1st rectangle with the others.

47 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3, …, rect𝑁. Compare the IOU of the 1st rectangle with the others: 𝐼𝑂𝑈 > 0.5 (threshold) ⇒ throw the rectangle out.

48 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3 (0.82), …, rect𝑁. Compare the IOU of the 1st rectangle with the others: 𝐼𝑂𝑈 > 0.5 (threshold) ⇒ throw the rectangle out.

49 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3 (0.82), …, rect𝑁. Compare the IOU of the 1st rectangle with the others: 𝐼𝑂𝑈 = 0 ⇒ do nothing.

50 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3 (0.82), rect (0.77), …, rect𝑁. Compare the IOU of the 1st rectangle with the others: IOU = 0 with the chosen rectangle ⇒ do nothing!

51 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect2 (0.85), rect3 (0.82), rect (0.77), rect (0.75), …, rect𝑁. Compare the IOU of the 1st rectangle with the others: 𝐼𝑂𝑈 < 0.5 ⇒ do nothing.

52 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect (0.82), rect (0.77), rect (0.75), …, rect𝑁1. Compare the IOU of the 2nd rectangle with the others; 𝑁1 ≤ 𝑁 because we have thrown out rectangles.

53 𝑽. Postprocessing: non-maximum suppression
Rectangles sorted by confidence 𝑝: rect1 (0.9), rect (0.82), rect (0.77), rect (0.75), …, rect𝑁1. Compare the IOU of the 2nd rectangle with the others: 𝐼𝑂𝑈 > 0.5 (threshold) ⇒ throw the rectangle out.

54 𝑽. Postprocessing: non-maximum suppression
Remaining rectangles sorted by confidence 𝑝: rect1 (0.9), rect (0.82), rect (0.77), … In total, 𝐼𝑂𝑈 is calculated (𝑁−1) + (𝑁1−2) + (𝑁2−3) + … times.
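The greedy procedure above can be sketched in a few lines of pure Python (function names are illustrative; boxes are corner-format (x1, y1, x2, y2)):

```python
def iou(a, b):
    """Intersection-over-union of two boxes (x1, y1, x2, y2)."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    return inter / (area_a + area_b - inter) if inter else 0.0

def nms(boxes, scores, iou_thr=0.5):
    """Greedy non-maximum suppression: take the highest-scoring box,
    throw out every remaining box whose IOU with it exceeds iou_thr,
    then repeat with the next survivor. Returns kept indices."""
    order = sorted(range(len(boxes)), key=lambda i: -scores[i])
    keep = []
    while order:
        best = order.pop(0)
        keep.append(best)
        order = [i for i in order if iou(boxes[best], boxes[i]) <= iou_thr]
    return keep
```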

55 𝑽𝑰. Accuracy evaluation
𝒏 predicted rectangles, 𝒎 ground-truth rectangles. 𝐼𝑜𝑈 = |𝐴⋂𝐵| / |𝐴⋃𝐵|, where 𝐴 is the ground truth and 𝐵 is the detector result. What is the correspondence between GT and predicted boxes?

56 𝑽𝑰. Accuracy evaluation
Sorted array of 𝑰𝑶𝑼 > 𝑇 values between GT and predicted boxes (max length = 𝒏⋅𝒎): (𝑖𝑜𝑢1, 𝑖𝑜𝑢2, 𝑖𝑜𝑢3, …). evaluator.py -> sort_ious(gt_boxes, pred_boxes, iou_thr)

57 𝑽𝑰. Accuracy evaluation
Sorted array of 𝑰𝑶𝑼 > 𝑇 values between GT and predicted boxes (max length = 𝒏⋅𝒎): (𝑖𝑜𝑢1, 𝑖𝑜𝑢2, 𝑖𝑜𝑢3, …). If a GT–prediction pair appears for the first time, mark both as matched. evaluator.py -> get_single_image_results(gt_boxes, pred_boxes, iou_thr)

58 𝑽𝑰. Accuracy evaluation : true predicted
Sorted array of 𝑰𝑶𝑼 > 𝑇 values between GT and predicted boxes (max length = 𝒏⋅𝒎). True predicted = the number of matched GT–prediction pairs. evaluator.py -> get_single_image_results(gt_boxes, pred_boxes, iou_thr)

59 𝑽𝑰. Accuracy evaluation : precision, recall
𝑇 = 0.5, getTruePredicted. 𝒓𝒆𝒄𝒂𝒍𝒍 = true predicted / ground truth: what part of all GT objects is truly predicted. 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 = true predicted / predicted: what part of the predictions is true.

60 𝑽𝑰. Accuracy evaluation : precision, recall
𝑇 = 0.5, getTruePredicted. Left image: predicted = 1, ground truth = 2, true predicted = 1 ⇒ precision = 1, recall = 0.5. Right image: predicted = 4, ground truth = 3, true predicted = 1 ⇒ precision = 0.25, recall = 1/3. 𝒓𝒆𝒄𝒂𝒍𝒍 = true predicted / ground truth; 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 = true predicted / predicted.

61 𝑽𝑰. Accuracy evaluation : precision, recall
𝑇 = 0.5, getTruePredicted. Left image: predicted = 1, ground truth = 2, true predicted = 1. Right image: predicted = 4, ground truth = 3, true predicted = 1. 𝒓𝒆𝒄𝒂𝒍𝒍 = true predicted / ground truth; 𝒑𝒓𝒆𝒄𝒊𝒔𝒊𝒐𝒏 = true predicted / predicted. evaluator.py -> calc_precision_recall(img_results)
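The matching and precision/recall steps can be sketched together. This is a simplified analog of sort_ious + get_single_image_results + calc_precision_recall from evaluator.py (the function name and the (iou, gt_id, pred_id) pair format are assumptions):

```python
def match_and_score(iou_pairs, n_gt, n_pred):
    """Greedy matching over IoU pairs in descending order, then
    precision/recall. iou_pairs: (iou, gt_id, pred_id) tuples with
    iou already above the threshold T."""
    matched_gt, matched_pred = set(), set()
    for _, g, p in sorted(iou_pairs, reverse=True):
        if g not in matched_gt and p not in matched_pred:
            matched_gt.add(g)       # each GT box and each prediction
            matched_pred.add(p)     # can be matched at most once
    tp = len(matched_gt)            # "true predicted"
    precision = tp / n_pred if n_pred else 0.0
    recall = tp / n_gt if n_gt else 0.0
    return precision, recall
```

With the slide's right-image example (4 predictions, 3 GT boxes, 1 match) this yields precision 0.25 and recall 1/3.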

62 𝑽𝑰. Accuracy evaluation : confidence ↔precision, recall
Each predicted box has a confidence 𝒑. Sorted box confidences: (𝑝1, 𝑝2, …, 𝑝𝑖, …, 𝑝𝑁−1, 𝑝𝑁), e.g. 0.1, 0.2, 0.3, 0.4, 0.4.

63 𝑽𝑰. Accuracy evaluation : confidence ↔precision, recall
Sorted box confidences (𝑝1, 𝑝2, …, 𝑝𝑁). 𝐼. Take the boxes with 𝑝 ≥ 𝑝1, i.e. all boxes, and run calcPrecisionRecall for all images together: score 𝑝1 → precision 0.1, recall 0.9 (high recall, low precision).

64 𝑽𝑰. Accuracy evaluation : confidence ↔precision, recall
Sorted box confidences (𝑝1, 𝑝2, …, 𝑝𝑁). 𝐼. Take the boxes with 𝑝 ≥ 𝑝1, i.e. all boxes, and run calcPrecisionRecall for all images together: score 𝑝1 → precision 0.1, recall 0.9; score 𝑝2 → precision 0.2, recall 0.8.

65 𝑽𝑰. Accuracy evaluation : confidence ↔precision, recall
Sorted box confidences (𝑝1, 𝑝2, …, 𝑝𝑁). 𝐼. Take the boxes with 𝑝 ≥ 𝑝1, i.e. all boxes, and run calcPrecisionRecall for all images together: score 𝑝1 → precision 0.1, recall 0.9; score 𝑝2 → precision 0.2, recall 0.8; … score 𝑝𝑁 → high precision, low recall.

66 𝑽𝑰. Accuracy evaluation : confidence ↔precision, recall
Sorted box confidences (𝑝1, 𝑝2, …, 𝑝𝑁): score 𝑝1 → precision 0.1, recall 0.9; score 𝑝2 → precision 0.2, recall 0.8; … score 𝑝𝑁 → high precision, low recall. get_thr_prec_rec(…)

67 𝑽𝑰. Accuracy evaluation : average precision calculation
𝑇_IOU = 0.5; hundreds of (score, precision, recall) values for real data sets. Precision-recall curve.

68 𝑽𝑰. Accuracy evaluation : average precision calculation
𝑇_IOU = 0.5; hundreds of (score, precision, recall) values for real data sets. Precision-recall curve. 𝐴𝑃_𝑇 = (1/11) ⋅ Σ_{𝑟∈{0,0.1,…,1}} max_{𝑟̃: 𝑟̃≥𝑟} 𝑝(𝑟̃) over 11 recall thresholds.

69 𝑽𝑰. Accuracy evaluation : average precision calculation
𝑇_IOU = 0.5. max_{𝑟̃: 𝑟̃≥0.3} 𝑝(𝑟̃) is the interpolated precision at recall 0.3. Many mAP tutorials get this wrong: the "area under the curve" (AUC) is not the same! 𝐴𝑃_𝑇 = (1/11) ⋅ Σ_{𝑟∈{0,0.1,…,1}} max_{𝑟̃: 𝑟̃≥𝑟} 𝑝(𝑟̃) over 11 recall thresholds.

70 𝑽𝑰. Accuracy evaluation : average precision calculation
𝑇_IOU = 0.5. 𝐴𝑃_𝑇 = (1/11) ⋅ Σ_{𝑟∈{0,0.1,…,1}} max_{𝑟̃: 𝑟̃≥𝑟} 𝑝(𝑟̃). The Pascal VOC metric is 𝐴𝑃_0.5.
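The 11-point interpolated AP can be sketched directly from the formula (function name is illustrative; prec and rec are the parallel lists of precision/recall values produced by the confidence sweep):

```python
def average_precision_11pt(prec, rec):
    """Pascal VOC 11-point interpolated AP: for each recall threshold
    r in {0, 0.1, ..., 1.0}, take the maximum precision among points
    with recall >= r, then average over the 11 thresholds.
    Note: this is NOT the raw area under the PR curve (AUC)."""
    ap = 0.0
    for i in range(11):
        r = i / 10.0
        candidates = [p for p, rr in zip(prec, rec) if rr >= r]
        ap += max(candidates) if candidates else 0.0
    return ap / 11.0
```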

71 𝑽𝑰. Accuracy evaluation : average precision calculation
𝐴𝑃 = (1/10) ⋅ Σ_{𝑇∈{0.5,0.55,…,0.95}} 𝐴𝑃_𝑇. MS COCO reports the metrics 𝐴𝑃_0.5, 𝐴𝑃_0.75, and 𝐴𝑃.

72 𝑽𝑰. Accuracy evaluation : average precision calculation
𝐴𝑃 = (1/10) ⋅ Σ_{𝑇∈{0.5,0.55,…,0.95}} 𝐴𝑃_𝑇: MS COCO uses 10 𝐼𝑂𝑈 thresholds 0.5, 0.55, …, 0.95. Precision-recall curves for 𝑇_IOU = 0.5, 0.6, 0.7, 0.8, 0.9.

