Perception: Vision, Sections 24.1-24.3; Speech, Section 24.7

Computer Vision §“the process by which descriptions of physical scenes are inferred from images of them.” -- S. Zucker §“produces from images of the external 3D world a description that is useful to the viewer and not cluttered by irrelevant information”

Typical Applications §Medical Image Analysis §Aerial Photo Interpretation §Material Handling §Inspection §Navigation

Multimedia Applications §Image compression §Video teleconferencing §Virtual classrooms

Image pixelation

Pixel values

How to recognize faces?

Problem Background §M training images §Each image is N x N pixels §Each image is normalized for face position, orientation, scale, and brightness §There are several pictures of each face l different “moods”

Your Task §Determine if the test image contains a face §If it contains a face, is it a face of a person in our database? §If it is a person in our database, which one? §Also, what is the probability that it is Jim?

Image Space §An N x N image can be thought of as a point in an N²-dimensional image space §Each pixel is a feature with a gray-scale value §Example: l 512 x 512 image l each pixel can be 0 (black) to 255 (white)
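
A minimal sketch of this view, assuming NumPy and a synthetic 512 x 512 grayscale image in place of a real photograph:

```python
import numpy as np

N = 512
# Stand-in image: in practice this would be a real grayscale photograph.
image = np.random.randint(0, 256, size=(N, N), dtype=np.uint8)

# Flattening turns the N x N image into a single point in N^2-dimensional space.
point = image.reshape(-1).astype(np.float64)
print(point.shape)  # (262144,) -- one coordinate (feature) per pixel
```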

Nearest Neighbor §The most likely match is the nearest neighbor §But that would take too much processing §Since all images are faces, they will have very high similarity

Face Space §Lower the dimensionality to both simplify storage and generalize the answer §Use eigenvectors to distill the 20 features that best distinguish faces §Make a 20-item array for each face that contains the values of those 20 features §Now each face can be stored in 20 words

The average face §Training images are I_1, I_2, ..., I_M §Average image is A = (1/M)(I_1 + I_2 + ... + I_M)

Weight of an image in each feature §For k = 1, ..., 20, compute the similarity between the input image I and the kth eigenvector E_k as the projection w_k = E_k · (I - A)
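
As a sketch, assuming the eigenfaces are stored as the rows of a NumPy array and that the similarity is the usual eigenface projection of the mean-subtracted image:

```python
import numpy as np

def weights(image_vec, avg_face, eigenfaces):
    """Compute w_k = E_k . (I - A) for every eigenface at once.

    image_vec, avg_face: flattened arrays of length N*N;
    eigenfaces: array of shape (20, N*N), one eigenface per row.
    """
    return eigenfaces @ (image_vec - avg_face)  # vector of 20 weights
```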

Image in Face Space §“Only” a 20-dimensional space §W = [w_1, w_2, ..., w_20], a column vector of weights that indicate the contribution of each of the 20 eigenfaces to I §Each image is projected from a point in high-dimensional image space to a point in face space §20 features * 32 bits = 640 bits per image

Reconstructing image I §I ≈ A + w_1 E_1 + w_2 E_2 + ... + w_20 E_20 §If we keep only M’ < M eigenfaces, we can only approximate I §Good enough for recognizing faces
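
Continuing the sketch above, reconstruction just adds the weighted eigenfaces back onto the average face:

```python
def reconstruct(avg_face, eigenfaces, w):
    """Approximate the original image: I ~ A + sum_k w_k * E_k.

    Inputs are NumPy arrays as in the weights() sketch above.
    """
    return avg_face + eigenfaces.T @ w  # flattened image of length N*N
```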

Picking the 20 Eigenfaces §Principal Component Analysis l (also called the Karhunen-Loève transform) §Create 20 images that maximize the information content of eigenspace §Normalize by subtracting the average face §Compute the covariance matrix, C §Find the eigenvectors of C that have the 20 largest eigenvalues
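
A sketch of that recipe, assuming the training images arrive as flattened NumPy rows; it uses the standard M x M “snapshot” trick (not stated on the slide) so the enormous N² x N² covariance matrix is never formed explicitly:

```python
import numpy as np

def top_eigenfaces(images, k=20):
    """Return the average face A and the k eigenvectors of the
    covariance matrix C with the largest eigenvalues."""
    X = np.asarray(images, dtype=np.float64)  # shape (M, N*N)
    avg = X.mean(axis=0)                      # the average face A
    D = X - avg                               # normalize: subtract A
    vals, vecs = np.linalg.eigh(D @ D.T)      # small M x M eigenproblem
    order = np.argsort(vals)[::-1][:k]        # indices of the k largest
    E = D.T @ vecs[:, order]                  # map back to image space
    E /= np.linalg.norm(E, axis=0)            # unit-length eigenfaces
    return avg, E.T                           # shape (k, N*N)
```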

Build a database of faces §Given a training set of face images, compute the eigenvectors with the 20 largest eigenvalues, E_1, E_2, ..., E_20 l Offline, because it is slow §For each face in the training set, compute its point in eigenspace, W = [w_1, w_2, ..., w_20] l Offline, because it is big
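
Putting the sketches together, the offline pipeline might look like this (training_images is a hypothetical list of flattened face images, and the functions are the sketches above):

```python
avg, E = top_eigenfaces(training_images)  # slow: done once, offline
database = np.stack([weights(img, avg, E) for img in training_images])
# database[i] is the point W = [w_1, ..., w_20] for training image i
```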

Categorizing a test face §Given a test image, I_test, project it into the 20-space by computing W_test §Find the closest face in the database to the test face: choose the k that minimizes ||W_test - W_k|| l where W_k is the point in face space associated with the kth person l || · || denotes the Euclidean distance in face space
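
A sketch of the nearest-neighbor lookup in the 20-dimensional face space:

```python
import numpy as np

def nearest_person(w_test, database):
    """Return the index and Euclidean distance of the closest stored face.

    database: array of shape (num_faces, 20); w_test: array of shape (20,).
    """
    dists = np.linalg.norm(database - w_test, axis=1)
    k = int(np.argmin(dists))
    return k, float(dists[k])
```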

Distance from face space §Find the distance of the test image from eigenspace: dffs = ||(I_test - A) - (w_1 E_1 + ... + w_20 E_20)||, the error left after reconstructing I_test from the 20 eigenfaces

Is this a face? §If dffs < threshold1 l then if d < threshold2 (d is the distance to the nearest neighbor in face space), the test image is a face that is very close to the nearest neighbor; classify it as that person l else the image is a face, but not one we recognize §else the image probably does not contain a face
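
The same rule as a sketch; the two threshold values are application-specific assumptions, not given on the slide:

```python
def classify(dffs, d, person, threshold1, threshold2):
    """dffs: distance from face space; d: distance to nearest neighbor."""
    if dffs < threshold1:        # close to face space, so it is a face
        if d < threshold2:       # close to a stored face, so recognize it
            return f"the face of person {person}"
        return "a face, but not one we recognize"
    return "probably not a face"
```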

Face Recognition Accuracy §Using 20-dimensional facespace resulted in about 95% correct classification on a database of 7500 images of 3000 people §If there are several images per person, the average W for that person helps improve accuracy

Edge Detection §Finding simple descriptions of objects in complex images l find edges l interrelate edges

Causes of edges §Depth discontinuity l one surface occludes another §Surface orientation discontinuity l the edge of a block §Reflectance discontinuity l texture or color changes §Illumination discontinuity l shadows

Examples of edges

Finding Edges §Image intensity along a line §First derivative of the intensity §Smoothed by convolving with a Gaussian
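
A minimal sketch of this pipeline in one step: convolving with the derivative of a Gaussian both smooths the intensity profile and differentiates it, and peaks in the absolute response mark edges (the sigma value is an illustrative choice):

```python
import numpy as np

def edge_response(intensity, sigma=2.0):
    """Filter a 1-D intensity profile with a derivative-of-Gaussian kernel."""
    radius = int(3 * sigma)                 # truncate the kernel at 3 sigma
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    g = np.exp(-x**2 / (2 * sigma**2))      # Gaussian (unnormalized)
    dg = -x / sigma**2 * g                  # its first derivative
    return np.convolve(intensity, dg, mode="same")
```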

Pixels on edges

Edges found

Human-Computer Interfaces §Handwriting recognition §Optical Character Recognition §Gesture recognition §Gaze tracking §Face recognition

Vision Conclusion §Machine Vision is so much fun, we have a full semester course in it §Current research in vision modeling is very active l More breakthroughs are needed

Speech Recognition Section 24.7

Speech recognition goal §Find a sequence of words that maximizes P(words | signal)

Signal Processing §“Toll quality” was the Bell Labs definition of digitized speech good enough for long-distance calls (“toll” calls) l Sampling rate: 8000 samples per second l Quantization factor: 8 bits per sample §Too much data to analyze to find utterances directly
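
Worked out, those settings give 8000 samples/s × 8 bits = 64 kbit/s, i.e. 8 KB/s, or about 480 KB per minute of speech; this is the roughly 500 KB/minute raw rate that vector quantization (below) improves on.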

Computational Linguistics §Human speech is limited to a repertoire of about 40 to 50 sounds, called phones §Our problem: l What speech sounds did the speaker utter? l What words did the speaker intend? l What meaning did the speaker intend?

Finding features

Vector Quantization §The 255 most common clusters of feature values are labeled C1, ..., C255 §Send only the 8-bit label §One byte per frame (roughly a 100-fold improvement over the ~500 KB/minute raw rate)
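
A sketch of the quantization step, assuming the 255-entry codebook has already been learned offline (e.g. by clustering training frames):

```python
import numpy as np

def quantize(frame_features, codebook):
    """Replace a frame's feature vector with the label of its nearest
    codebook entry.  codebook: shape (255, num_features)."""
    dists = np.linalg.norm(codebook - frame_features, axis=1)
    return int(np.argmin(dists))  # an 8-bit label: one byte per frame
```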

How to Wreck a Nice Beach §By Bayes’ rule, P(words | signal) = P(signal | words) P(words) / P(signal) §where P(signal) is a constant (it is the signal we received) §So we want the words that maximize P(signal | words) P(words)
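
As a sketch, with hypothetical acoustic_model and language_model scoring functions standing in for the real models:

```python
def best_words(candidates, acoustic_model, language_model):
    """Pick the word sequence maximizing P(signal | words) * P(words)."""
    return max(candidates, key=lambda w: acoustic_model(w) * language_model(w))
```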

Unigram Frequency §Word frequency §Even though his handwriting was sloppy, Woody Allen’s bank hold-up note probably should not have been interpreted as “I have a gub” l The word “gun” is common l The word “gub” is unlikely

Language model §Use the language model to compare l P(“wreck a nice beach”) l P(“recognize speech”) §Use naïve Bayes to assess the likelihood for each word that it will appear in this context

Bigram model §We want P(w_i | w_1, w_2, ..., w_{i-1}) l approximate it by P(w_i | w_{i-1}) §Easy to train l Simply count the number of times each word pair occurs l “I has” is unlikely, “I have” is likely l “an gun” is unlikely, “a gun” is likely
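
A minimal sketch of that training procedure and the resulting estimate (no smoothing for unseen pairs):

```python
from collections import Counter

def train_bigrams(sentences):
    """Count single words and adjacent word pairs in a list of sentences."""
    unigrams, bigrams = Counter(), Counter()
    for sentence in sentences:
        words = sentence.split()
        unigrams.update(words)
        bigrams.update(zip(words, words[1:]))
    return unigrams, bigrams

def p_bigram(w_prev, w, unigrams, bigrams):
    """Estimate P(w | w_prev) as count(w_prev, w) / count(w_prev)."""
    return bigrams[(w_prev, w)] / unigrams[w_prev] if unigrams[w_prev] else 0.0
```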

Trigram §Some trigrams are very common l only track the most common trigrams §Use a weighted sum of the unigram, bigram, and trigram estimates: P(w_i | w_{i-2}, w_{i-1}) ≈ λ1 P(w_i) + λ2 P(w_i | w_{i-1}) + λ3 P(w_i | w_{i-2}, w_{i-1})
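
A sketch of the weighted sum; p_uni, p_bi, and p_tri are hypothetical estimators (e.g. built from counts as above), and the lambda weights are illustrative assumptions:

```python
def p_interpolated(w2, w1, w, p_uni, p_bi, p_tri, lambdas=(0.1, 0.3, 0.6)):
    """lambda1*P(w) + lambda2*P(w | w1) + lambda3*P(w | w2, w1);
    the weights should sum to 1."""
    l1, l2, l3 = lambdas
    return l1 * p_uni(w) + l2 * p_bi(w1, w) + l3 * p_tri(w2, w1, w)
```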

Near the end of the semester l Time flies like an arrow l Fruit flies like a banana §It is currently hard to incorporate parts of speech and sentence grammar into the probability calculation l lots of ambiguity l but humans seem to do it

Conclusion §Speech recognition technology is changing very quickly §Highly parallel §Amenable to hardware implementations