Recognition of Traffic Lights in Live Video Streams on Mobile Devices. Jan Roters, Xiaoyi Jiang, Kai Rothaus. IEEE Transactions on CSVT, 2011.
Outline: Introduction; Problems; System Architecture (Identification, Classification, Video Analysis, Time-Based Verification); Experiment Results; Evaluations; Conclusion.
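Introduction: People with visual disabilities are limited in mobility. Earlier aids oriented pedestrians using zebra crossings at intersections, or used a portable PC with a digital camera and a pair of stereo earphones. This work presents a system for mobile devices that helps blind people cross roads.
Guide dogs are hard to train and very expensive, so many people have tried other ways to guide visually impaired pedestrians. Early approaches used a camera to detect the direction of a zebra crossing and tell the user which way to walk at an intersection, or put a portable PC in a backpack, captured the street scene with a digital camera, analyzed it, and told the user through a pair of earphones whether the light was red or green. Traffic lights rarely provide acoustic signals, and it is hard to convey the current light through tactile cues. Because mobile phones are now widespread and their cameras and processing power keep improving, the authors want to develop this kind of software directly on the phone.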
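Problems: program usage; real-world conditions; camera resolution; different appearances.
Implementing the system is hard for several reasons. Visually impaired users do not know where the traffic light is, so how can they aim the phone camera at it? The real world is complex. Camera resolution differs from phone to phone. Traffic lights look different in every country.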
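Problems: the scale of traffic lights; many traffic lights; occlusion; illumination; rotation.
Road widths vary, and on a wide road the traffic light on the far side appears very small and is hard to make out. Passing vehicles occlude the traffic light on the far side. Lighting conditions and viewing angle also vary.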
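Pedestrian Lights in Germany: installation; shape; color arrangement; circuitry; background.
To simplify the problem, the authors restrict the system to German traffic lights, which look as follows. Installation: pedestrian lights are mounted upright on the opposite side of the road. Shape: the housing is rectangular and comes in three variants, with two, three, or four lamps. Color arrangement: there is no yellow light, only red (do not walk) and green (walk), and the red lamp is always above the green one. Circuitry: red and green are never lit at the same time unless the light is broken. Background: the housing is black, the same as ours.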
Mobile Device & Databases: Nokia N95 with a 330 MHz ARM processor, 18 MB RAM, and 320×240 resolution; 2 publicly available databases; ground truth segmentation was made manually.
The N95 is very popular among visually impaired users because much software designed for them runs on it, such as screen readers, mobile reading, and shopping assistants.
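System Architecture: (1) Localization, (2) Classification, (3) Video Analysis, (4) Time-Based Verification.
Top path: works on a single image. Localization first finds all regions that may contain a traffic light, then classification determines which one is the real traffic light. Bottom path: video analysis takes the previous frame into account and uses motion estimation to predict where the traffic light should be. Finally, time-based verification compares the traffic lights found by the two paths to decide whether the detection is correct.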
1. Localization: to reduce the number of candidates that classification has to examine, localization applies several filters to discard unsuitable regions.
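Red and Green Color Filter (1/3): Analyze the data.
The ground truth is first marked by hand, and then the color distribution inside the marked red and green lights is analyzed. For red lights the colors fall along three directions: (1) gray, (2) black, and (3) red. A Gaussian mixture model represents the three directions. Only the red component is the actual signal, so only that distribution is kept.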
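Red and Green Color Filter (2/3): Design the filter rules (example: red traffic light). The Gaussian distribution of the red cluster is defined by its mean color $\mu = (0.48, 0.06, 0.07)$ and has three eigenvectors $v_1$, $v_2$, and $v_3$. A color $c = (r, g, b)$ is a red traffic light color when three threshold conditions hold, with $th_{red,1} = 0.20$, $th_{red,2} = 0.25$, and $th_{red,3} = 0.07$.
Next, the mean of the red cluster is computed and three conditions are used as the filter. $v_1$, $v_2$, and $v_3$ are the eigenvectors for the r, g, and b directions, respectively. The red intensity must exceed a threshold, the component along $v_2$ is limited around the gray diagonal, and the component along $v_3$ is confined to the low-intensity range, that is, the black part.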
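The slide text does not preserve the exact inequalities or the eigenvector values, so the following is only a minimal sketch of how such a projection-based color test could look. The mean and the three thresholds come from the slide; the axis-aligned eigenvectors and the form of the three conditions (a lower bound along $v_1$, symmetric bounds along $v_2$ and $v_3$) are assumptions made for illustration.

```python
from typing import Sequence, Tuple

Vec3 = Tuple[float, float, float]

MU_RED: Vec3 = (0.48, 0.06, 0.07)   # mean color of the red cluster (from the slide)
TH_RED: Vec3 = (0.20, 0.25, 0.07)   # th_red,1 .. th_red,3 (from the slide)

# ASSUMPTION: placeholder axes. The slide does not list the eigenvectors.
EIGVECS_RED: Tuple[Vec3, Vec3, Vec3] = (
    (1.0, 0.0, 0.0),
    (0.0, 1.0, 0.0),
    (0.0, 0.0, 1.0),
)


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def is_red_light_color(
    c: Vec3,
    mu: Vec3 = MU_RED,
    eigvecs: Tuple[Vec3, Vec3, Vec3] = EIGVECS_RED,
    th: Vec3 = TH_RED,
) -> bool:
    """Project (c - mu) onto each eigenvector and compare with the thresholds.

    ASSUMPTION: the form of the three inequalities is illustrative only.
    """
    d = [ci - mi for ci, mi in zip(c, mu)]
    p1, p2, p3 = (dot(d, v) for v in eigvecs)
    return p1 >= -th[0] and abs(p2) <= th[1] and abs(p3) <= th[2]
```

With these placeholder axes, a color close to the mean passes, while a dark gray or a color with a strong blue component is rejected.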
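Red and Green Color Filter (3/3): Optimize parameters. $10^4$ different parameter settings for each color; use 300 images to train; measure the quality of each setting by TP, FP, and FN, with $\text{Recall} = \frac{TP}{TP + FN}$ and $\text{Precision} = \frac{TP}{TP + FP}$.
Red and green together have 8 parameters to train (3 thresholds and 1 mean $\mu$ each). Each setting is run on the 300 images and its recall and precision are computed. For red lights, missing the light is more dangerous than a false detection: a false red only means the user skips this green phase and waits for the next one, whereas a missed red might lead the user to walk straight into the road. So the best precision is chosen subject to recall >= 75% (recall = 76%, precision = 89.5%). For green lights, a false detection is worse than a miss (FP is less acceptable than FN), because taking a red light for green is unforgivable. So the best recall is chosen subject to precision >= 98.5% (precision = 98.5%, recall = 85.0%).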
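The selection rule on this slide (maximize one metric subject to a lower bound on the other) can be sketched as follows. The `Result` record, the function names, and the counts in the usage example are hypothetical; only the recall and precision formulas and the two bounds (75% and 98.5%) come from the slide.

```python
from typing import Iterable, NamedTuple, Optional


class Result(NamedTuple):
    """Counts obtained by running one parameter setting on the training images."""
    name: str
    tp: int
    fp: int
    fn: int


def recall(r: Result) -> float:
    return r.tp / (r.tp + r.fn) if (r.tp + r.fn) else 0.0


def precision(r: Result) -> float:
    return r.tp / (r.tp + r.fp) if (r.tp + r.fp) else 0.0


def best_red(results: Iterable[Result], min_recall: float = 0.75) -> Optional[Result]:
    """Red: keep settings with recall >= bound, then take the highest precision."""
    ok = [r for r in results if recall(r) >= min_recall]
    return max(ok, key=precision, default=None)


def best_green(results: Iterable[Result], min_precision: float = 0.985) -> Optional[Result]:
    """Green: keep settings with precision >= bound, then take the highest recall."""
    ok = [r for r in results if precision(r) >= min_precision]
    return max(ok, key=recall, default=None)


# Hypothetical counts, for illustration only.
settings = [
    Result("a", tp=80, fp=20, fn=20),
    Result("b", tp=76, fp=9, fn=24),
    Result("c", tp=70, fp=1, fn=30),
    Result("d", tp=50, fp=0, fn=50),
]
```

Here `best_red(settings)` picks setting "b" and `best_green(settings)` picks setting "c", which shows how the two colors can end up with different operating points from the same pool of candidates.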
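Size/Circuitry Filter: assume the traffic light is 4 to 24 meters away; with a fixed camera focal length and the possible aspect ratios, filter out regions that are too small or too large; a vertical neighbor should not have a different color.
From these assumptions the traffic light width works out to roughly 2.5 to 15 pixels, and a height range follows in the same way. These two ranges are used to discard regions whose area is too far off.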
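The width range can be checked with a simple pinhole-camera relation, width_px = (focal length in pixels × real width) / distance. The sketch below assumes that model and assumes that the 15-pixel bound corresponds to the nearest distance of 4 m; the slide gives neither the focal length nor the physical width, so only their product is used.

```python
def projected_width_px(scale_px_m: float, distance_m: float) -> float:
    """Pinhole model: scale_px_m is focal_length_px * real_width_m."""
    return scale_px_m / distance_m


# ASSUMPTION: 15 px at 4 m fixes the product focal_length_px * real_width_m.
SCALE_PX_M = 15.0 * 4.0  # 60 pixel-meters


def width_in_range(width_px: float, near_m: float = 4.0, far_m: float = 24.0) -> bool:
    """Accept a candidate whose width fits a light between near_m and far_m."""
    smallest = projected_width_px(SCALE_PX_M, far_m)
    largest = projected_width_px(SCALE_PX_M, near_m)
    return smallest <= width_px <= largest
```

Under this model, the far limit of 24 m gives 60 / 24 = 2.5 pixels, which agrees with the lower bound quoted in the notes.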
Background Color Filter: inspect the region under a red light candidate or above a green light candidate. If there are no dark pixels within the search region, reject the candidate.
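A minimal sketch of this check on a grayscale image stored as nested lists. The search-region height (one candidate height) and the darkness threshold of 0.2 are assumptions, since the slide does not state them; the box convention `(x0, y0, x1, y1)` with exclusive upper bounds is also a choice made here.

```python
from typing import List, Optional, Tuple

Gray = List[List[float]]          # gray[y][x], intensity in [0, 1]
Box = Tuple[int, int, int, int]   # (x0, y0, x1, y1), upper bounds exclusive


def has_dark_background(
    gray: Gray,
    box: Box,
    color: str,
    search_h: Optional[int] = None,
    dark_th: float = 0.2,
) -> bool:
    """Return True if the search region next to the candidate holds a dark pixel.

    Red candidates are checked below the box, green candidates above it.
    ASSUMPTION: search_h defaults to the candidate height; dark_th is arbitrary.
    """
    x0, y0, x1, y1 = box
    if search_h is None:
        search_h = y1 - y0
    if color == "red":
        rows = range(y1, min(len(gray), y1 + search_h))
    elif color == "green":
        rows = range(max(0, y0 - search_h), y0)
    else:
        raise ValueError("color must be 'red' or 'green'")
    cols = range(max(0, x0), min(len(gray[0]), x1))
    return any(gray[y][x] < dark_th for y in rows for x in cols)
```

A candidate for which this returns False would be rejected by the filter.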
Validation of Localization: validate the localization results with 201 images.
        Optimal               Validation
        Recall   Precision    Recall   Precision
Red     76%      89.5%        71.8%    87%
Green   85%      98.5%        83.3%    92.6%
The error is high; why? Error = 33.7%.
2. Classification: TLC is the broadest; TLC has the smallest distance to the top of the image; no other traffic light has a similar height to TLC.
The preceding identification step finds many regions that may contain a traffic light. Classification picks out, among these candidates, the region that really contains the traffic light. The classification filter mainly relies on the three conditions above.
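One way to read the three conditions is as a joint test on the candidate list, sketched below. The slide does not say how the conditions are combined or what counts as a "similar height", so requiring all three at once, interpreting height as bounding-box height, and using a 20% tolerance are all assumptions; the `Candidate` type is hypothetical.

```python
from typing import NamedTuple, Optional, Sequence


class Candidate(NamedTuple):
    """Bounding box of a traffic light candidate, in pixels."""
    x: int
    y: int   # distance from the top of the image
    w: int
    h: int


def crucial_candidate(
    cands: Sequence[Candidate], height_tol: float = 0.2
) -> Optional[Candidate]:
    """Return the candidate that is broadest, topmost, and unique in height.

    ASSUMPTION: all three conditions must hold together; height_tol is arbitrary.
    """
    for c in cands:
        others = [o for o in cands if o is not c]
        broadest = all(c.w > o.w for o in others)
        topmost = all(c.y < o.y for o in others)
        unique_height = all(abs(c.h - o.h) > height_tol * c.h for o in others)
        if broadest and topmost and unique_height:
            return c
    return None
```

Returning `None` when no candidate satisfies every condition leaves the decision to later stages rather than forcing a choice.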
Performance of Classification: recall 86.3%; precision 97.4% (red), 98.1% (green).
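3. Video Analysis (1/2): temporary occlusion; falsified colors; contradictory scene; repeating results.
Why add video analysis? A traffic light is sometimes blocked by a vehicle, but in video the vehicle soon moves away, so the light can still be detected in the next few frames. The camera's automatic adjustment sometimes distorts the colors or brightness of the image; in video, moving the camera slightly gives it a chance to readjust. Sometimes several traffic lights appear, and changing the angle slightly may give a clearer view. Correct results usually repeat over a period of time, which gives further assurance that they are right.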
3. Video Analysis (2/2): find the motion vector between two frames; use a KLT tracker to track feature points; search only a small area around the crucial traffic light candidate (30 pixels in each direction); correlate the features using SAD (sum of absolute differences). [Figure: frames $t_{i-1}$ and $t_i$, with the crucial traffic light, candidate region, feature point, and search region marked.]
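The SAD correlation inside a bounded search window can be illustrated with plain exhaustive block matching. This is not a KLT tracker and is not taken from the paper; it only shows how a patch from the previous frame is compared against shifted positions in the current frame. The patch size and the function names are assumptions, while the default radius of 30 pixels follows the slide.

```python
from typing import List, Optional, Tuple

Image = List[List[int]]  # img[y][x], grayscale


def patch(img: Image, x: int, y: int, size: int) -> Image:
    """Extract a size-by-size patch whose top-left corner is (x, y)."""
    return [row[x:x + size] for row in img[y:y + size]]


def sad(a: Image, b: Image) -> int:
    """Sum of absolute differences between two equally sized patches."""
    return sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))


def best_match(
    prev: Image, curr: Image, x: int, y: int, size: int = 3, radius: int = 30
) -> Optional[Tuple[int, int, int]]:
    """Search curr within +/- radius of (x, y) for the patch taken from prev.

    Returns (sad_score, dx, dy) for the lowest score, or None if no shifted
    position fits inside the image.
    """
    ref = patch(prev, x, y, size)
    height, width = len(curr), len(curr[0])
    best: Optional[Tuple[int, int, int]] = None
    for dy in range(-radius, radius + 1):
        for dx in range(-radius, radius + 1):
            nx, ny = x + dx, y + dy
            if 0 <= nx <= width - size and 0 <= ny <= height - size:
                score = sad(ref, patch(curr, nx, ny, size))
                if best is None or score < best[0]:
                    best = (score, dx, dy)
    return best
```

The returned `(dx, dy)` plays the role of the motion vector between the two frames for that feature.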
4. Time-Based Verification: reduce false positive detections by comparing the 2 kinds of results, using a state queue with 4 scenarios: (1) Identification and video analysis are both successful and the locations match. (2) Identification and video analysis are both successful but the locations differ. (3) Video analysis succeeds but identification fails. (4) Video analysis fails but identification succeeds.
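The four scenarios can be told apart with a small helper like the one below. Representing each result as an optional `(x, y)` location and treating locations as matching when they lie within 5 pixels on each axis are assumptions; the slide does not define the matching criterion or what happens when both steps fail.

```python
from typing import Optional, Tuple

Point = Tuple[float, float]


def scenario(
    ident: Optional[Point], video: Optional[Point], tol: float = 5.0
) -> Optional[int]:
    """Map the two per-frame results to scenario 1..4 from the slide.

    None as an argument means that step failed. ASSUMPTION: tol is arbitrary.
    Returns None when both steps fail, a case the slide does not list.
    """
    if ident is not None and video is not None:
        close = abs(ident[0] - video[0]) <= tol and abs(ident[1] - video[1]) <= tol
        return 1 if close else 2
    if video is not None:
        return 3
    if ident is not None:
        return 4
    return None
```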
Experiment Results: $SQ_{size} = 10$ and $SQ_{min} = 5$; processes at least 5 frames per second; requires at least $SQ_{min}$ consecutive correct detections with the same color. A state switch completes within at most $SQ_{min}$ frames.
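A minimal sketch of a state queue using the two parameters from this slide. The class name, the use of `None` for a frame without a detection, and the rule of reporting a color only while the latest $SQ_{min}$ entries agree are assumptions; the slide does not describe how the remaining queue capacity is used.

```python
from collections import deque
from typing import Deque, Optional


class StateQueue:
    """Bounded history of per-frame colors with a consecutive-run check."""

    def __init__(self, size: int = 10, min_run: int = 5) -> None:
        self.history: Deque[Optional[str]] = deque(maxlen=size)
        self.min_run = min_run

    def push(self, color: Optional[str]) -> Optional[str]:
        """Record one frame result and return the confirmed color, if any.

        ASSUMPTION: a color is confirmed only while the latest min_run
        entries are all that same color; otherwise None is returned.
        """
        self.history.append(color)
        recent = list(self.history)[-self.min_run:]
        if (
            len(recent) == self.min_run
            and recent[0] is not None
            and len(set(recent)) == 1
        ):
            return recent[0]
        return None
```

With these defaults, the first four "green" frames yield no feedback and the fifth consecutive one confirms green; a single contradicting frame withdraws the confirmation until a new run of five builds up.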
Experiment Results, Fig. 14: repeated confirmation through the state queue raises accuracy and prevents false positives.
Evaluations, reliability: prevent false positive green light detections.
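Evaluations, interactivity: temporal analysis reduces the interactivity; feedback is normally given within 2 seconds.
Apart from sequence01 and sequence04, which behave oddly, the longest gap between two feedback frames in the other sequences never exceeds 38 frames. Sequence 01: no feedback is found even after waiting 40 to 80 seconds. The delay is long, but at least the user is not given wrong signal information, which is a compromise worth accepting.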
Conclusion: the system can be helpful for driver assistance systems; computational power on mobile devices is limited; the verification ideas can be improved.