Personal Memory Assistant

Group 7 Authors: Scott Kyle CTE ’08, Erika Sanchez EE ’08, Meredith Skolnick CTE ’08
Advisor: Dr. Kenneth Laker, University of Pennsylvania, Dept. of Electrical and Systems Engineering

Abstract
Facial recognition and speaker verification systems are widely used in the security field, where they must be highly accurate to keep unauthorized users from accessing classified information. Their many potential commercial applications, however, remain largely untapped. It is often difficult to remember the name of a person encountered out of context or infrequently, a situation that can be embarrassing for the forgetful person and insulting to the person who is not remembered. The Personal Memory Assistant uses facial recognition and speaker identification to help avoid this situation. The user discreetly collects images and voice samples of the person to be identified. The facial recognition component analyzes the image to find the three closest facial matches in the system; the speaker identification component likewise finds the top two voice matches. The top-ranked IDs are then compared using an algorithm developed through testing. If the IDs match, a picture of the person and a personal profile are displayed to the user. If no match is made, the user has the option to add the subject to the database. Beyond identification, the system also supports searching for and updating entries in the database.

Facial Recognition System
The facial identification system is divided into two components: detection and recognition. Detection isolates the desired face in an image using the object detection algorithm from the Intel Open Computer Vision (OpenCV) library. This algorithm uses a trained cascade of boosted classifiers based on Haar-like features (spatial contrasts) to decide whether a given region of the image is a face (see the detection sketch after this section). A cache links the detection and recognition components: it stores sequentially detected faces with fault tolerance for false and missed detections in individual frames. The recognition component aligns the faces and masks the background before applying an eigenface algorithm, a combination of Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA). The eigenfaces are the principal sets of eigenvectors derived from covariance matrices computed from the differences between the captured face and the mean of an aligned set of stored faces for each person (see the eigenface sketch below).

Speaker Identification System
The speech wave goes through three major processing steps: preprocessing, feature extraction, and pattern matching. Preprocessing normalizes the amplitude of the entire voice sample so that signal amplitudes vary between -1 and 1. In feature extraction, a Fast Fourier Transform is performed on the signal and the spectral amplitudes are saved as the speaker’s unique feature set. Once appropriate features have been extracted from the speech signal, they are compared to the features of all saved signals; the pattern-matching method used by this system is the Nearest Neighbor algorithm with Euclidean distances. Signals processed by the system are either saved in the population database as voiceprints to be compared to future input, or added to an existing voiceprint. As more samples of a subject’s voice are saved, the system refines the voiceprint and identifies that subject more accurately. (A sketch of this pipeline also appears below.)
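The detection step described above maps naturally onto OpenCV's cascade classifier. Below is a minimal sketch, assuming a standard frontal-face cascade file (haarcascade_frontalface_default.xml) and the modern C++ API rather than the older IplImage-based C API that the project's class diagram suggests it actually used:

```cpp
// Sketch: Haar-cascade face detection with OpenCV's C++ API.
#include <opencv2/objdetect.hpp>
#include <opencv2/imgproc.hpp>
#include <vector>

std::vector<cv::Rect> detectFaces(const cv::Mat& frame) {
    // Cascade of boosted classifiers trained on Haar-like features.
    static cv::CascadeClassifier cascade("haarcascade_frontalface_default.xml");

    cv::Mat gray;
    cv::cvtColor(frame, gray, cv::COLOR_BGR2GRAY);
    cv::equalizeHist(gray, gray);  // improve contrast before classification

    std::vector<cv::Rect> faces;
    // Scan the image at multiple scales; each candidate must be confirmed
    // by at least 3 neighboring windows to suppress false positives.
    cascade.detectMultiScale(gray, faces, 1.1, 3, 0, cv::Size(30, 30));
    return faces;
}
```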
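The eigenface comparison can be sketched with OpenCV's built-in PCA. The component count (20), the row-vector layout, and plain Euclidean matching in the subspace are illustrative assumptions; the poster's LDA stage and per-person mean faces are omitted here for brevity:

```cpp
// Sketch: eigenface identification. Aligned, masked faces are flattened
// into row vectors; PCA keeps the top eigenvectors ("eigenfaces") of the
// covariance of the mean-subtracted training set.
#include <opencv2/core.hpp>
#include <limits>
#include <vector>

int identifyFace(const cv::Mat& trainRows,          // one flattened face per row
                 const std::vector<int>& labels,    // person ID for each row
                 const cv::Mat& probeRow) {         // flattened captured face
    cv::PCA pca(trainRows, cv::Mat(), cv::PCA::DATA_AS_ROW, 20);

    cv::Mat probeProj = pca.project(probeRow);
    int bestId = -1;
    double bestDist = std::numeric_limits<double>::max();
    for (int i = 0; i < trainRows.rows; ++i) {
        // Compare in the reduced eigenface subspace, not pixel space.
        double d = cv::norm(probeProj, pca.project(trainRows.row(i)), cv::NORM_L2);
        if (d < bestDist) { bestDist = d; bestId = labels[i]; }
    }
    return bestId;
}
```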
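A compact sketch of the three speaker-identification steps, assuming single-channel float samples and whole-utterance FFT magnitudes as the feature set (the poster does not specify framing or windowing details, so those are left out):

```cpp
// Sketch: normalize to [-1, 1], take FFT magnitudes as features, then
// match against stored voiceprints by nearest-neighbor Euclidean distance.
#include <opencv2/core.hpp>
#include <algorithm>
#include <cmath>
#include <limits>
#include <vector>

std::vector<float> extractFeatures(std::vector<float> samples) {
    // Preprocessing: scale so amplitudes span [-1, 1].
    float peak = 1e-9f;
    for (float s : samples) peak = std::max(peak, std::fabs(s));
    for (float& s : samples) s /= peak;

    // Feature extraction: magnitude spectrum via the FFT.
    cv::Mat signal(samples), spectrum;
    cv::dft(signal, spectrum, cv::DFT_COMPLEX_OUTPUT);
    std::vector<float> features;
    for (int i = 0; i < spectrum.rows; ++i) {
        cv::Vec2f c = spectrum.at<cv::Vec2f>(i);
        features.push_back(std::sqrt(c[0] * c[0] + c[1] * c[1]));
    }
    return features;
}

// Pattern matching: 1-nearest-neighbor over equal-length voiceprints.
int identifySpeaker(const std::vector<std::vector<float>>& voiceprints,
                    const std::vector<float>& probe) {
    int best = -1;
    double bestDist = std::numeric_limits<double>::max();
    for (size_t i = 0; i < voiceprints.size(); ++i) {
        double d = 0;
        for (size_t j = 0; j < probe.size(); ++j) {
            double diff = probe[j] - voiceprints[i][j];
            d += diff * diff;  // squared Euclidean distance
        }
        if (d < bestDist) { bestDist = d; best = static_cast<int>(i); }
    }
    return best;
}
```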
Overall Flow Chart
[Diagram: camera and microphone inputs feed the facial recognition and speaker identification components, which draw on the image, audio, and profile databases; the compare step branches on true/false to drive the display, entry form, update, delete, and search functions.]

Testing Process
More than 100 people of different races, genders, and ages were used to test the functionality of the system. A subject pool with demographics representative of the U.S. population was used to ensure uniform performance across groups. Each subject was entered into the database, and face and voice samples were then collected over three trials. All of the similarity measurements were stored and analyzed, and the comparison formula used in the recognition process was developed from these results (a hypothetical sketch of this fusion step follows the class overview below).

Software Flow Chart (class overview)
- api: API(FaceDetect*, Speaker*), ~API(), newEntry(), sampleVoice(), sampleFace(), clear(), reset(), name(int), test()
- speakerApp: main(String[]), begin(), IDFound(String), totTrain(), reset(), delete(String)
- FaceDetect: FaceDetect(int, int, int), ~FaceDetect(), sample(), erase(int), save(int), clear(), reset(), identify(), trialResults(int, int)
- speaker: Speaker(), ~Speaker(), identify(), save(int), erase(int), clear(), reset()
- Cache: userlist : UserList*; Cache(int), ~Cache(), add(IplImage*, CvPoint, int), save(int), identify(), average(), trialResults(int, int), tick(), clear()
- UserList: ids : vector, paths : vector; UserList(char*), ~UserList(), empty(), erase(int), add(int, string), reset()
- recognize: train(), identify(IplImage*)
- SpeakersIdentDb: SpeakersIdentDb(String), getIDByFilename(String), getNumberPerID(), connect(), query(), close()
- record: format : AudioFormat, line : TargetDataLine, sc : Scanner, fileName : String, samplesFolder : String; record(), stopRecord(), getName()
- Database: Database(char*), ~Database(), insert(vector), select(int), select(string), select(string, string), update(int, vector), erase(int), reset()
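The poster does not publish the comparison formula itself, so the following is only a plausible stand-in: accept an identification when one of the three face candidates coincides with one of the two voice candidates. The function and parameter names are hypothetical:

```cpp
// Hypothetical sketch of the ID-fusion step; the project's actual formula
// was derived empirically from the testing data and is not reproduced here.
#include <vector>

// Returns the matched ID, or -1 when the modalities disagree and the user
// should be offered the option to add the subject as a new database entry.
int fuseResults(const std::vector<int>& topFaceIds,    // 3 best face matches
                const std::vector<int>& topVoiceIds) { // 2 best voice matches
    for (int faceId : topFaceIds)
        for (int voiceId : topVoiceIds)
            if (faceId == voiceId)
                return faceId;  // both modalities rank this person highly
    return -1;  // no agreement: treat as unknown subject
}
```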