Obtaining Data for Face Recognition from the Web. By Tal Blum. Advisor: Henry Schneiderman.


1 Obtaining Data for Face Recognition from the Web. By Tal Blum. Advisor: Henry Schneiderman

2 Sample Images

3 Overview
- System Purpose
- Collecting Data methods
- System Structure
- Problems
- Numbers & Statistics

4 System Purpose
Collecting face images from the WWW for:
- Data for face recognition purposes
- A system that people can submit images to and that tells them which celebrities they most resemble
Goal: to collect images of 1000 people, with at least 50 images for each

5 Collection vs. Web Collecting
- Cost
- Data size
- Aging
- Controlled setting
- Limited backgrounds, poses, lightings, etc.
- Duplicates
- Metadata
- Alignment
- Tagging errors
- Authorization

6 System Overview
Stages: Names Extraction -> Cleaning / Refinement / Remove duplicates -> Spidering -> Download -> Remove duplicates / Remove faceless -> Manual Tagging
Data passed between stages: HTML text -> Names Files -> Names Files -> URLs -> Images -> Face images

7 Names Extraction
- Sources:
  - Web directories
    - Types: actors, politicians, sports players, singers, ...
  - Informedia project
- Extract names from HTML
- Result: Names Files
- Cleaning
- Duplicates removed
- Refinement
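The cleaning and duplicate-removal steps are not described in detail on the slide. A minimal sketch of one plausible version, assuming that cleaning means collapsing whitespace and that two names are duplicates when they match case-insensitively (the function names `normalize_name` and `clean_names` are hypothetical):

```python
import re


def normalize_name(raw: str) -> str:
    """Collapse runs of whitespace and trim the ends of an extracted name."""
    return re.sub(r"\s+", " ", raw).strip()


def clean_names(raw_names):
    """Return names in first-seen order, dropping empty entries and
    case-insensitive repeats. Assumed behavior, not the authors' code."""
    seen = set()
    cleaned = []
    for raw in raw_names:
        name = normalize_name(raw)
        key = name.casefold()
        if name and key not in seen:
            seen.add(key)
            cleaned.append(name)
    return cleaned


names = clean_names(["George  Bush", "george bush", " ", "George Michael"])
```

This only catches trivial variants. Aliases such as "President George Bush" need the separate duplicate detection discussed later in the talk.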

8 Spidering
- 5 different image search engines: Altavista, Yahoo-news, Yahoo, Picsearch, Alltheweb
- Different interfaces
- Different result quality
- Limited availability
- Query refinement
- Quoted names
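Quoting a name asks a search engine to match it as an exact phrase rather than as separate words. A small sketch of building such a query URL follows; the base URL and the `q` parameter name are placeholders, since each of the five engines had its own interface:

```python
from urllib.parse import urlencode


def build_query_url(base_url: str, name: str, param: str = "q") -> str:
    """Wrap the name in double quotes and URL-encode it as a query string.

    base_url and param are hypothetical; a real engine defines its own.
    """
    quoted = f'"{name}"'
    return f"{base_url}?{urlencode({param: quoted})}"


url = build_query_url("https://images.example.com/search", "George Bush")
```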

9 Downloading
- Gets the URLs and downloads them
- Only about 2/3 of the URLs were downloaded
- Works in the background
http://news.bbc.co.uk/media/images/38378000/jpg/_38941_bushap150.jpg
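Since a sizable share of URLs fail, a downloader has to tolerate errors and keep going. The sketch below is one way to do that with a thread pool, not the original system's code; `fetch`, `download_all`, the worker count, and the timeout are all assumptions:

```python
from concurrent.futures import ThreadPoolExecutor
from urllib.request import urlopen


def fetch(url: str, timeout: float = 10.0) -> bytes:
    """Download one URL and return its raw bytes."""
    with urlopen(url, timeout=timeout) as response:
        return response.read()


def download_all(urls, fetch_fn=fetch, workers=8):
    """Fetch URLs concurrently; return (downloaded, failed).

    downloaded maps each successful URL to its bytes; failed lists the
    URLs whose fetch raised an exception (dead links, timeouts, ...).
    """
    def attempt(url):
        try:
            return url, fetch_fn(url)
        except Exception:
            return url, None

    downloaded, failed = {}, []
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map yields results in the same order as the input URLs.
        for url, data in pool.map(attempt, urls):
            if data is None:
                failed.append(url)
            else:
                downloaded[url] = data
    return downloaded, failed
```

Passing the fetch function as a parameter keeps the loop testable without network access.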

10 Remove duplicates / Remove faceless
- Uses simple heuristics to compare files
- Uses Schneiderman's face detection algorithm to find faces in the images
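The slide does not say which heuristics were used to compare files. As an illustration only, the sketch below treats two files as duplicates when their size and SHA-256 digest match; the function name `drop_duplicate_images` and the choice of hash are assumptions:

```python
import hashlib


def drop_duplicate_images(images):
    """Keep the first file of each distinct byte content.

    images maps a file name to its raw bytes. Files with identical size
    and SHA-256 digest are treated as duplicates (an assumed heuristic).
    """
    seen = set()
    unique = {}
    for name, data in images.items():
        key = (len(data), hashlib.sha256(data).hexdigest())
        if key not in seen:
            seen.add(key)
            unique[name] = data
    return unique


kept = drop_duplicate_images({"a.jpg": b"\xff\xd8one", "b.jpg": b"\xff\xd8one", "c.jpg": b"\xff\xd8two"})
```

A byte-level check like this misses resized or re-encoded copies of the same picture, which would need an image-level comparison.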

11 Manual Tagging
- Decide who is the person by that name
- Choose between several people in the image
- Add additional metadata such as age, race, gender, ...
- Problems: unrelated images and multiple people with the same name
- Possible classification errors
- Go over millions of images

12 Manual Tagging

13 Manual Tagging – Face extraction

14 Problems - Name Duplicates
- Example: George Bush, President George Bush, George W. Bush
- Another example: Wham (a band), George Michael

15 Problems - Name Duplicates
- Solution: detect duplicates on 3 levels
  - Names (automatic, manual)
  - URLs
  - By recognition errors
- Approaches
  - Semi-automatic
  - Fully automatic
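For the URL level, one conceivable check is that two name entries referring to the same person tend to return many of the same image URLs. The sketch below flags such pairs using Jaccard overlap; the measure, the 0.5 threshold, and the function names are illustrative assumptions rather than the method used in the project:

```python
from itertools import combinations


def url_overlap(urls_a, urls_b) -> float:
    """Jaccard overlap: shared URLs divided by all distinct URLs."""
    a, b = set(urls_a), set(urls_b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def candidate_duplicates(urls_by_name, threshold=0.5):
    """Return name pairs whose spidered URL sets overlap at or above threshold."""
    pairs = []
    for first, second in combinations(sorted(urls_by_name), 2):
        if url_overlap(urls_by_name[first], urls_by_name[second]) >= threshold:
            pairs.append((first, second))
    return pairs


pairs = candidate_duplicates({
    "George Bush": ["u1", "u2", "u3"],
    "President George Bush": ["u1", "u2", "u4"],
    "George Michael": ["u9"],
})
```

In a semi-automatic setup, flagged pairs would go to a person for confirmation instead of being merged directly.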

16 Numbers & Statistics
- We collected 36,000 people's names
- For each, we spidered up to 1000 URLs
- On average, only 1/3 of the URLs reach the manual stage
- So far we have run the system on 9500 people
- Total number of URLs: 1,500,000
- 1,000,000 image files totaling 60 GB
- An average of 157 URLs per person, or 182 per person not including people with no URLs
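The figures on this slide can be cross-checked with a little arithmetic. The sketch below uses only the numbers stated above and assumes that the totals are rounded and that 1 GB means 10^9 bytes, so the derived values are approximate:

```python
total_urls = 1_500_000
people_processed = 9_500
image_files = 1_000_000
total_bytes = 60 * 10**9  # 60 GB, assuming decimal gigabytes

# Average URLs per processed person; the slide reports 157.
avg_urls_per_person = total_urls / people_processed

# The slide's 182 URLs per person with at least one URL implies roughly
# this many people had any URLs at all.
people_with_urls = total_urls / 182
people_without_urls = people_processed - people_with_urls

# Mean size of a downloaded image file, in kilobytes.
avg_image_kb = total_bytes / image_files / 1000

# Share of URLs that produced a file; compare with "about 2/3" on the
# Downloading slide.
download_ratio = image_files / total_urls

print(f"{avg_urls_per_person:.1f} URLs per person")
print(f"about {people_with_urls:.0f} people with URLs, {people_without_urls:.0f} without")
print(f"{avg_image_kb:.0f} KB per image, download ratio {download_ratio:.2f}")
```

Under these assumptions, about 1,300 of the 9,500 processed names returned no URLs, and the average image file is about 60 KB.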

17 More Information
Contacts:
- Tal Blum, tblum@cmu.edu
- Henry Schneiderman, hws@cs.cmu.edu
Acknowledgement to David Fields

18 THE END

