American Sign Language Pattern Recognition: An Embedded Design Approach
Wadner Joseph • James Haralambides, PhD

Dynamic Sign Alphabet Recognition

The FPGA implementation of pattern recognition goes through a few phases to receive an image and display the corresponding letter of the signer's alphabet on a monitor (the signer is the person who makes the sign in front of the camera). A Visual Basic program accesses a webcam and captures images. Pre-processing is required before any matching can succeed. The first step resizes the image to a uniform size (255x255). Next, edge detection is applied to produce a black image with white lines (pixels). Edge detection compares the intensity of vertically and horizontally neighboring pixels to determine whether an edge exists.

The outline is important here because it is used to compute the center of the signer's hand as the mean position of the white pixels: (cwx, cwy) = (sum x / N, sum y / N), where N is the number of white pixels. After the center is found, the white pixels are centered in the image by translating them by the offset T = (cwx - cx, cwy - cy) between the center of the white pixels (cwx, cwy) and the center of the image (cx, cy). Using polar coordinates about the center of the signer's hand (the angle of each pixel is found with the arctangent), the farthest pixel from the center is stored for every 5 degrees, giving 72 (360/5) distances that form a histogram.

A histogram for each letter in use is computed in advance and stored in block RAM on the FPGA; the histograms are kept in a text file compatible with the Spartan 3e. Pre-processing is now done. At run time, the signer's sign is captured and its histogram is sent to the FPGA over a serial cable. Using a modified correlation, a register subtracts the values of the stored histogram from the signer's histogram in each of the 72 positions, then shifts and subtracts again, producing for each stored letter a difference value, sum(Ai - Bi), that indicates how close it is to the given sign.
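The pre-processing and matching steps described above can be sketched in Python as follows. This is only an illustration under stated assumptions, not the poster's Visual Basic or FPGA code: the function names, the edge threshold of 40, and the use of absolute differences in the comparison are all hypothetical choices made for the example. Because distances are measured from the hand's center, the sketch does not perform the explicit translation step.

```python
import math

def edge_map(gray, threshold=40):
    """Mark a pixel as an edge (1) when its intensity differs from its right
    or lower neighbor by more than `threshold` (an assumed value)."""
    h, w = len(gray), len(gray[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(h - 1):
        for x in range(w - 1):
            dx = abs(gray[y][x] - gray[y][x + 1])   # horizontal neighbor
            dy = abs(gray[y][x] - gray[y + 1][x])   # vertical neighbor
            if max(dx, dy) > threshold:
                edges[y][x] = 1
    return edges

def centroid(edges):
    """Mean position (sum x / N, sum y / N) of the white pixels.
    Assumes the edge map contains at least one white pixel."""
    pts = [(x, y) for y, row in enumerate(edges)
           for x, v in enumerate(row) if v]
    n = len(pts)
    return sum(x for x, _ in pts) / n, sum(y for _, y in pts) / n

def radial_histogram(edges, bins=72):
    """For each 5-degree sector around the centroid, keep the distance of
    the farthest white pixel (72 values for 360/5 sectors)."""
    cx, cy = centroid(edges)
    step = 360 / bins
    hist = [0.0] * bins
    for y, row in enumerate(edges):
        for x, v in enumerate(row):
            if not v:
                continue
            angle = math.degrees(math.atan2(y - cy, x - cx)) % 360
            b = int(angle // step) % bins
            hist[b] = max(hist[b], math.hypot(x - cx, y - cy))
    return hist

def best_match(sample, stored):
    """Compare `sample` with every stored histogram at every circular shift
    and return (letter, shift, score) with the smallest summed absolute
    difference. Absolute differences are an assumption of this sketch."""
    n = len(sample)
    best = None
    for letter, ref in stored.items():
        for shift in range(n):
            score = sum(abs(sample[i] - ref[(i + shift) % n])
                        for i in range(n))
            if best is None or score < best[2]:
                best = (letter, shift, score)
    return best
```

In use, `radial_histogram(edge_map(image))` would be computed once per stored letter ahead of time and once for each captured sign, and `best_match` would then pick the stored letter with the lowest score.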
Once that is done, the corresponding image and letter are sent to the screen. A few environmental concerns are considered in this process, including the position of the video camera, lighting sensitivity, and background conditions.

Pattern Matching

Pattern matching consists of two stages: a learning stage, which is usually performed offline, and a matching stage. Matching recognizes features that are similar or identical to those of the item being matched. Many methods exist, and the choice depends on the type of application. Pattern matching is highly significant in forensic science, where it remains one of the prime factors in solving crimes. In stage one, fingerprints are stored in a government database; in stage two, once a crime is committed, whether a repeat offense or a first one, a match is run to find the associated person.

Abstract

This work is an effort to recognize a subset of characters of the American Sign Language (ASL) dynamically using programmable logic. Pattern recognition takes place on a Spartan 3e educational board using an embedded design approach. A camera connected to a computer and activated by software will capture a snapshot of the end user's hand sign placed in front of a uniform background. The image will go through a number of preprocessing steps, including edge detection, computation of its geometric center, outline normalization, and conversion to a byte stream. Following these transformations, a histogram will retain information about certain characteristics of the stored sign character. These characteristics relate to the location (distance from the geometric center) and orientation (angle of the pixel with respect to the horizontal) of the outline pixels produced by edge detection. Finally, the image is transmitted to the Spartan 3e FPGA board over an RS-232 serial port and stored in available block RAM.
Characters of the sign alphabet are preprocessed and stored in the FPGA board's block ROM to improve the execution time of the design. These characters have been captured and preprocessed following the same approach that is used on the dynamically captured sign. An iterative correlation process applied between the histogram of the captured image and the histograms of the stored sign characters yields an optimal match. A graphical representation of the resulting sign will be forwarded to an LCD monitor through the FPGA board's VGA port.

The Clock & UART

The clock operates at 25 MHz, which the Reader and the VGA require in order to perform the edge detection algorithm and display the image on the screen effectively. The UART includes a baud rate generator that produces a specific clock pulse for serial transmission timing between devices over an RS-232 serial cable. This sequential circuit needs a clock to synchronize its operations. Using the UART model, the computer sends one byte at a time; the FPGA receives the data one bit at a time and reassembles each byte for processing.

ROM Memory

The memory component is a single-port read-only memory (ROM). It stores 255 x 255 images using a word size of 3 bits (a color depth of 8 colors), along with the sample letters and the pre-processed histograms.

Introduction

American Sign Language (ASL) is a complete, complex language that employs signs made by moving the hands combined with facial expressions and postures of the body. It is the primary language of many North Americans who are deaf and is one of several communication options used by people who are deaf or hard-of-hearing. Spoken language consists of words produced by using the mouth and voice to make sounds. But for people who are hearing impaired, the sounds of speech are often not heard, and only a fraction of speech sounds can be seen on the lips.
Awareness of embedded approaches to ASL is growing, and they can develop in many directions, from embracing the language for learning to finding more ways for the hearing impaired to communicate. For a person who has no knowledge of ASL and cannot communicate with the hearing impaired, embedded solutions make communication easier and reduce discrimination. Vision is the most useful tool a deaf person has to communicate and receive information, and image-to-text recognition of sign alphabets brings better communication a step closer.

Block Diagram

[Block diagram, top-level stages: Initialize ROM, Pre-processing Histogram, FPGA Memory (ROM/RAM), Register Correlation, Display (see Figure 1-1). Capture path: Visual Basic, Capture Image w/ Webcam, Algorithm Create Histogram, ROM memory, Captured Histogram, RS-232 UART, Send/Receive.]

[Figure 1-1, labels: Block ROM 1 (Store Histogram), Block ROM 2 (Store Image), Block ROM 3 (Store Letter), RAM, Histogram Array, Register Correlation, Display, VGA Controller, VGA Cable, Monitor.]

[Figure: Circuit Schematic]

[Figure: VGA Pin Assignment]

The VGA Controller

The VGA controller displays the images on the screen and operates at 25 MHz. The frequency of the clock is a key factor in the design. The reader and memory components must be synchronized to the VGA component so that pixel retrieval, pixel processing, and pixel display stay in step.
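To illustrate why the 25 MHz clock matters to the VGA controller, the short Python sketch below relates a pixel clock to a display's refresh rate. It assumes the common 640x480 VGA mode with its standard porch and sync widths; the poster does not state which resolution or timing values the design uses, so these numbers and all names in the code are assumptions for illustration only.

```python
# Pixel clock taken from the text; everything else below is assumed.
PIXEL_CLOCK_HZ = 25_000_000

# Standard 640x480 VGA timing, in pixel clocks (horizontal) and lines (vertical).
H_TIMING = {"visible": 640, "front_porch": 16, "sync": 96, "back_porch": 48}
V_TIMING = {"visible": 480, "front_porch": 10, "sync": 2, "back_porch": 33}

def total(timing):
    """Total counts per line (or lines per frame), including blanking."""
    return sum(timing.values())

def refresh_rate_hz(pixel_clock_hz, h_timing, v_timing):
    """Frames per second when one pixel is emitted per clock cycle."""
    return pixel_clock_hz / (total(h_timing) * total(v_timing))

def sync_active(counter, timing):
    """True while a horizontal or vertical counter is inside its sync pulse."""
    start = timing["visible"] + timing["front_porch"]
    return start <= counter < start + timing["sync"]

def in_visible_area(h_count, v_count):
    """True when the counters address a pixel that is actually displayed,
    which is when a pixel would be fetched from memory."""
    return h_count < H_TIMING["visible"] and v_count < V_TIMING["visible"]

if __name__ == "__main__":
    rate = refresh_rate_hz(PIXEL_CLOCK_HZ, H_TIMING, V_TIMING)
    print(f"{total(H_TIMING)} clocks per line, {total(V_TIMING)} lines per frame")
    print(f"refresh rate at 25 MHz: {rate:.2f} Hz")
```

Under these assumptions one frame takes 800 x 525 pixel clocks, so a 25 MHz clock gives a refresh rate just under 60 Hz, and the horizontal and vertical counters tell the reader and memory components when a pixel must be ready.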