ROLE OF OBJECT IDENTIFICATION IN SONIFICATION SYSTEM FOR VISUALLY IMPAIRED
Presented by Ranjan Bangalore Seetharama
AGENDA
Introduction
Hardware of NAVI System
Object Identification
Stereo Sound Generation
INTRODUCTION
The Navigation Assistance for Visually Impaired (NAVI) system consists of a single board processing system (SBPS), a vision sensor mounted on headgear, and stereo earphones. The vision sensor captures the scene in front of the blind user. The captured image is processed to identify the objects in it. Object identification is achieved by a real-time image processing methodology using fuzzy algorithms.
FUZZY ALGORITHMS
Traditional logic has only two possible outcomes, true or false. Fuzzy logic instead uses a graded scale with many intermediate values, such as a number between 0.0 and 1.0. (The range is the same as that of a probability, although a fuzzy value expresses how strongly something belongs to a category rather than how likely it is.) A fuzzy algorithm uses fuzzy logic to operate on its inputs and produce a result. Applications include control logic (controlling engine speed, for instance, where it is useful to have intermediate values between "full speed" and "full stop") and edge detection in images.
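To make the graded scale concrete, here is a minimal sketch of a fuzzy membership function for "bright" pixels, together with the common min/max/complement fuzzy operators. The function names and the thresholds 64 and 192 are illustrative assumptions and do not come from the NAVI system.

```python
def bright_membership(gray, low=64, high=192):
    """Degree (0.0 to 1.0) to which an 8-bit gray value counts as 'bright'.

    The thresholds `low` and `high` are assumed values for illustration.
    """
    if gray <= low:
        return 0.0
    if gray >= high:
        return 1.0
    # Linear ramp between the two thresholds.
    return (gray - low) / (high - low)


def fuzzy_and(a, b):
    """Fuzzy AND as the minimum of two membership degrees."""
    return min(a, b)


def fuzzy_or(a, b):
    """Fuzzy OR as the maximum of two membership degrees."""
    return max(a, b)


def fuzzy_not(a):
    """Fuzzy NOT as the complement of a membership degree."""
    return 1.0 - a
```

A mid-gray pixel (128) is "bright" to degree 0.5 here, where two-valued logic would have to call it either bright or not bright.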
The processed image is mapped onto stereo acoustic patterns and transferred to the stereo earphones in the system. The vOICe is one of the patented image sonification systems. It uses a video camera as the vision sensor, and dedicated hardware was constructed for image-to-sound conversion. The captured image is scanned from left to right with a sine wave as the sound generator. The top portion of the image is transformed into high-frequency tones and the bottom portion into low-frequency tones. The brightness of each pixel is transcoded into loudness.
The background fills more of the image frame than the objects do, so the sound produced from an unprocessed image contains more information about the background than about the objects. Most backgrounds are also light in color, so the sound they produce has a higher amplitude than that of the objects in the scene. Object identification is achieved using a clustering algorithm, and the identified objects are enhanced. For sound production, more importance is given to the objects in the environment than to the background. This enables the blind user to identify obstacles more easily.
HARDWARE OF NAVI SYSTEM
Navigation Assistance for Visually Impaired (NAVI)
The hardware model constructed for this vision substitution system has headgear mounted with the vision sensor, stereo earphones, and a Single Board Processing System (SBPS) carried in a vest specially designed for this application. The SBPS is placed in a pouch at the back of the vest.
Source: Fuzzy Learning Vector Quantization in Intelligent vision Recognition for Blind Navigation By R Nagarajan, Yaacob and Sainarayanan
OBJECT IDENTIFICATION
A digital video camera mounted on the headgear captures the scene in front of the blind user, and the image is processed in the SBPS in real time. The processed image is mapped to sound patterns. Because the processing is done in real time, processing time is a critical constraint.
OBJECT IDENTIFICATION
In the proposed vision substitution system, the nature of the object to be identified is undefined, uncertain and time varying. Among the important features the blind user needs from an image of the environment are the orientation and size of objects and obstacles. During sonification, the amplitude of the sound generated from the image depends directly on the pixel intensity. In a gray image, a white pixel has the maximum value of 255 and a black pixel the minimum value of zero.
Pixels of light color therefore produce sound of higher amplitude than darker pixels. If the image is converted to sound without any enhancement, the sound is difficult to understand, which was the major problem faced in early works. The main objective of this work is to suppress the background and enhance the object; for this, the gray levels of the object and the background have to be identified. The image used for processing is 32x32 pixels in size and has four gray levels, namely black (BL), white (WH), dark gray (DG) and light gray (LG).
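The reduction to four gray levels can be sketched as follows. The slides do not say how the levels are chosen, so the uniform thresholds at 64, 128 and 192 and the function names are assumptions made only for illustration.

```python
# Level names ordered from darkest to lightest.
LEVELS = ("BL", "DG", "LG", "WH")


def quantize_pixel(gray):
    """Map an 8-bit gray value (0..255) to one of the four level names.

    Assumes four equal-width bins of 64 gray values each.
    """
    index = min(gray // 64, 3)
    return LEVELS[index]


def quantize_image(image):
    """Apply quantize_pixel to every pixel of a 2D list of gray values."""
    return [[quantize_pixel(p) for p in row] for row in image]
```

For example, `quantize_image([[0, 255], [100, 130]])` labels the four pixels as BL, WH, DG and LG under these assumed thresholds.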
Feature extraction is the most critical part of image processing. The extracted features should represent the image with limited data. In this work each image has four feature vectors, one per gray level:
$X_{BL} = [X_1, X_2, X_3, X_4]$
$X_{DG} = [X_1, X_2, X_3, X_4]$
$X_{LG} = [X_1, X_2, X_3, X_4]$
$X_{WH} = [X_1, X_2, X_3, X_4]$
$X_1$ = the number of pixels of the respective gray level in the image; this is the histogram value for that gray level.
$X_2$ = the number of pixels of the respective gray level in the central area of the image. Generally, the object of interest is in the center of human vision.
$X_3$ = the pixel distribution gradient, calculated as the sum of the gradient values assigned to the pixel locations.
$X_4$ = the gray value of the pixel. Generally, most of the background in the real world is lighter in color than the objects.
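The four features can be computed for one gray level roughly as sketched below. The slides do not specify the size of the central area or the gradient values assigned to pixel locations, so the `center` margin and the `location_weight` function (which grows toward the image center) are hypothetical stand-ins.

```python
def location_weight(i, j, n):
    """Hypothetical gradient value for a pixel location: 1 at the border,
    increasing by 1 for each step toward the center of an n x n image."""
    return min(i, j, n - 1 - i, n - 1 - j) + 1


def extract_features(labels, level, gray_value, center=8):
    """Return [X1, X2, X3, X4] for one gray level.

    labels     : square 2D list of level names (e.g. "BL", "WH")
    level      : the level name to describe
    gray_value : gray value representing that level (X4)
    center     : assumed margin; the central area excludes this many
                 rows and columns on every side
    """
    n = len(labels)
    x1 = x2 = x3 = 0
    for i in range(n):
        for j in range(n):
            if labels[i][j] != level:
                continue
            x1 += 1  # histogram count for this level
            if center <= i < n - center and center <= j < n - center:
                x2 += 1  # count inside the central area
            x3 += location_weight(i, j, n)  # summed location gradient
    return [x1, x2, x3, gray_value]
```

With a 32x32 image and `center=8`, the central area is the middle 16x16 block; this choice is only an example.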
FLVQ: FUZZY LEARNING VECTOR QUANTIZATION
Artificial Neural Networks (ANNs) play a major role in pattern classification. An ANN has the ability to learn and is fault tolerant, which makes it a powerful tool for pattern recognition. One form of ANN is the Learning Vector Quantization (LVQ) network. The objective of the LVQ network is to identify the output node that is nearest to the input vector. The weights are updated by competitive learning.
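The nearest-node search and competitive update can be illustrated with a plain (non-fuzzy) LVQ1 step. This is a sketch of the basic LVQ idea only; the fuzzy variant used in the NAVI work is not described in these slides, and the learning rate and function names are assumptions.

```python
def nearest_prototype(prototypes, x):
    """Index of the prototype (output node weight vector) closest to x,
    using squared Euclidean distance."""
    def dist2(w):
        return sum((wi - xi) ** 2 for wi, xi in zip(w, x))
    return min(range(len(prototypes)), key=lambda k: dist2(prototypes[k]))


def lvq1_step(prototypes, classes, x, target, rate=0.1):
    """One competitive-learning update (LVQ1).

    Only the winning prototype changes: it moves toward x when its class
    matches `target`, and away from x otherwise. Returns the winner index.
    """
    k = nearest_prototype(prototypes, x)
    sign = 1.0 if classes[k] == target else -1.0
    prototypes[k] = [w + sign * rate * (xi - w)
                     for w, xi in zip(prototypes[k], x)]
    return k
```

In this setting the input `x` would be a feature vector such as `[X1, X2, X3, X4]` for one gray level, and the two classes would be "object" and "background".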
Let $G_o$ be the gray level classified to the object class by the FLVQ network, $G_b$ be the gray level classified to the background class by the FLVQ network, and $I$ be the preprocessed image.
For i, j = 1, 2, …, 32
    if I(i,j) == G_o then I(i,j) = K1
    if I(i,j) == G_b then I(i,j) = K2        (1)
End
I1 = I
where K1 and K2 are chosen scalar constants, K1 >> K2 and
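The substitution in (1) can be written as a short runnable sketch. The default values of `k1` and `k2` are assumptions chosen only to satisfy K1 >> K2; the slides do not give the actual constants.

```python
def suppress_background(image, g_o, g_b, k1=200, k2=10):
    """Return a copy of `image` in which pixels equal to the object gray
    level g_o become k1 and pixels equal to the background gray level g_b
    become k2. All other pixels are left unchanged.
    """
    result = []
    for row in image:
        new_row = []
        for pixel in row:
            if pixel == g_o:
                new_row.append(k1)
            elif pixel == g_b:
                # elif keeps a pixel that was just set to k1 from being
                # rewritten if k1 happens to equal g_b
                new_row.append(k2)
            else:
                new_row.append(pixel)
        result.append(new_row)
    return result
```

Because loudness follows pixel value during sonification, the large `k1` makes object pixels loud and the small `k2` makes background pixels quiet.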
SUPERIMPOSING AND NORMALIZATION
Use any edge detection algorithm to detect the edges in image $I$. Let the image of edges be $I_1$, and let $I_2$ be the background-suppressed image from the previous stage. $I_1$ and $I_2$ are superimposed to form a single image matrix, which is then normalized. The result is a normalized image that is background suppressed, object enhanced and edge predominated.
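One possible reading of this stage is sketched below. The slides leave the edge detector, the superimposition rule and the normalization open, so the neighbour-difference detector, the element-wise sum and the division by the peak value are all assumptions, as are the function names.

```python
def detect_edges(image):
    """Very simple edge map: 1 where a pixel differs from its right or
    lower neighbour, 0 elsewhere. Stands in for 'any edge detector'."""
    n_rows, n_cols = len(image), len(image[0])
    edges = [[0] * n_cols for _ in range(n_rows)]
    for i in range(n_rows):
        for j in range(n_cols):
            here = image[i][j]
            right = image[i][j + 1] if j + 1 < n_cols else here
            below = image[i + 1][j] if i + 1 < n_rows else here
            if here != right or here != below:
                edges[i][j] = 1
    return edges


def superimpose_and_normalize(edges, suppressed, edge_gain=1.0):
    """Add the edge map to the background-suppressed image (assumed rule)
    and scale the result so that its largest value is 1.0."""
    combined = [[s + edge_gain * e for s, e in zip(s_row, e_row)]
                for s_row, e_row in zip(suppressed, edges)]
    peak = max(max(row) for row in combined)
    if peak == 0:
        return combined  # all-zero image: nothing to scale
    return [[value / peak for value in row] for row in combined]
```

Here `edges` plays the role of the edge image and `suppressed` the role of the background-suppressed image; after normalization, edge pixels on the object carry the largest values.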
SONIFICATION
Sonification is defined as the transformation of data, in relation to perceived associations, into an acoustic signal for the purpose of facilitating communication or interpretation. The human auditory system can sense frequencies between 20 Hz and 20,000 Hz. From the literature and from experiments, it is observed that the system is most sensitive to frequencies between 20 Hz and 4000 Hz. This range is adopted in the proposed sonification module.
SONIFICATION
To create variations in pitch in the sonification module, the frequency of the sine wave is made inversely related to the pixel's row position within a column of the image pattern, so that the top rows of the image produce the highest tones. The loudness is made to depend directly on the pixel value of the processed image.
SONIFICATION
The processed image is sonified to stereo acoustic patterns. The image is mapped so that the image data corresponding to the left side of the blind user is transferred to the left earphone and the right half of the image data to the right earphone.
SONIFICATION
Let $f_o$ be the fundamental frequency of the sound generator, $G$ be a constant gain, and $F_D$ be the frequency difference between adjacent pixels in the vertical direction. The frequency corresponding to the $(i,j)$th pixel of the 32x32 image matrix is given by
$F_i = f_o + F_D$
where
$F_D = G f_o (32 - i)$; $i = 1, 2, 3, \ldots, 32$
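As a worked example of the row-to-frequency relation, the sketch below evaluates F_i for a given row. The values f_o = 100 Hz and G = 1 are illustrative assumptions, picked so that all 32 rows stay below 4000 Hz; the slides do not state the actual values.

```python
def row_frequency(i, f0, gain, rows=32):
    """F_i = f_o + G * f_o * (rows - i), for 1-based row index i.

    Row 1 is the top of the image and gets the highest frequency;
    the bottom row (i == rows) gets the fundamental frequency f_o.
    """
    return f0 + gain * f0 * (rows - i)


# Illustrative values only: f_o = 100 Hz, G = 1.
top = row_frequency(1, 100.0, 1.0)      # 100 + 1 * 100 * 31
bottom = row_frequency(32, 100.0, 1.0)  # 100 + 1 * 100 * 0
```

With these assumed values the top row maps to 3200 Hz and the bottom row to 100 Hz, with 100 Hz between adjacent rows.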
SONIFICATION
The generated sound pattern for column j of the image is denoted S(j). It is defined for t = 0 to D, where D depends on the total duration of the acoustic information for each column of the image, and $F_i$ is the frequency corresponding to row i.
SONIFICATION
For each column, the sine wave at each row's designed frequency is multiplied by the gray-scale value of the pixel in that row, and the products are summed to produce the sound pattern. Scanning is performed from the leftmost column towards the center and from the rightmost column towards the center. The sound pattern to the left earphone is SL = S(1) to S(n/2), appended from the left side. The sound pattern to the right earphone is SR = S(n) to S(n/2), appended from the right side, where n is the total number of columns. In our case n = 32.
FUTURE WORK
In this research, information regarding the depth of the object is not considered. An object is 'perceived' as bigger through the variation in the sound pattern as the blind user moves nearer to it.
REFERENCES
Fuzzy Learning Vector Quantization in Intelligent Vision Recognition for Blind Navigation, by R Nagarajan, Yaacob and Sainarayanan
Role of Object Identification in Sonification System for Visually Impaired, by R Nagarajan, Yaacob and Sainarayanan
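The column-wise summation and the stereo split can be sketched as below. This assumes each column's pattern is the sum over rows of pixel value times sin(2π F_i t), which is one reading of the verbal description. The fundamental frequency, gain, per-column duration and sample rate are illustrative assumptions. To keep both channels the same length, the right channel here stops at column n/2 + 1 instead of n/2, which is also an assumption.

```python
import math


def row_frequency(i, f0, gain, rows):
    """F_i = f_o + G * f_o * (rows - i) for 1-based row index i."""
    return f0 + gain * f0 * (rows - i)


def column_sound(image, j, f0=100.0, gain=1.0, duration=0.01, sample_rate=8000):
    """Samples of the sound pattern for column j (0-based) of `image`.

    Each sample is the sum over rows of pixel value * sin(2*pi*F_i*t).
    """
    rows = len(image)
    n_samples = round(duration * sample_rate)
    samples = []
    for k in range(n_samples):
        t = k / sample_rate
        total = 0.0
        for i in range(1, rows + 1):
            freq = row_frequency(i, f0, gain, rows)
            total += image[i - 1][j] * math.sin(2 * math.pi * freq * t)
        samples.append(total)
    return samples


def stereo_patterns(image, **options):
    """Return (left, right) sample lists for the two earphones.

    Left:  columns scanned from the leftmost column towards the center.
    Right: columns scanned from the rightmost column towards the center.
    """
    n = len(image[0])
    left, right = [], []
    for j in range(0, n // 2):              # S(1) .. S(n/2)
        left.extend(column_sound(image, j, **options))
    for j in range(n - 1, n // 2 - 1, -1):  # S(n) down to S(n/2 + 1)
        right.extend(column_sound(image, j, **options))
    return left, right
```

With a 32x32 image and the default values, each channel holds 16 columns of 80 samples, and the assumed sample rate of 8000 Hz is more than twice the highest row frequency of 3200 Hz.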