Published by Margaret Hancock. Modified over 9 years ago.
1
Spatial Business Detection and Recognition from Images
Alexander Darino
2
Outline
– Project Overview
3
PROJECT OVERVIEW
– Previous Work
– Project Objective
– Anticipated End Result
– Project Pipeline
4
Previous Work: Where Am I?
[Diagram: Image; Where Am I?; Latitude, Longitude]
5
Project Objective
Given:
– Image
– Geolocation
Yield:
– Spatial Identification of Businesses in Image
– Addresses of Businesses in Image
– Information about Businesses in Image (e.g. Reviews, Categories, Phone Number, etc.)
7
Project Pipeline
[Diagram stages: Latitude, Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; Text Extraction; Detected Text; Business Name Matching; Business Identification; Business Spatial Detection]
8
Anticipated End Result
9
Obtaining a List of Candidate Businesses in Image via
BUSINESS SEARCHING
10
Business Searching
[Project pipeline diagram, with the Latitude/Longitude and Reverse Geocoding stages highlighted]
11
Business Searching
Business Search Services
– Google
– Yelp
– CityGrid (Supplier for Yellow Pages, Super Pages)
REST-based API
Results in JSON or XML format
Aggregate Results into Facade
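The "aggregate results into a facade" step could look roughly like the sketch below. It is an illustration only: the `Business` record, the `BusinessSearchFacade` class, and the two stub providers are hypothetical names, and no real Google, Yelp, or CityGrid API call is shown. Each provider is assumed to be a callable that takes a latitude and longitude and returns businesses.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Business:
    name: str
    address: str
    source: str  # which (hypothetical) provider returned this record


def _dedupe_key(business):
    # Assumption: name + address, case-insensitive, identifies a business.
    return (business.name.casefold().strip(), business.address.casefold().strip())


class BusinessSearchFacade:
    """Single entry point that hides the individual search services."""

    def __init__(self, providers):
        self.providers = providers

    def nearby(self, latitude, longitude):
        merged = {}
        for provider in self.providers:
            for business in provider(latitude, longitude):
                # Keep the first record seen for each business.
                merged.setdefault(_dedupe_key(business), business)
        return list(merged.values())


# Stub providers standing in for real REST clients (hypothetical).
def stub_provider_a(latitude, longitude):
    return [Business("Bruegger's Bagels", "Market Sq", "stub_a")]


def stub_provider_b(latitude, longitude):
    return [
        Business("BRUEGGER'S BAGELS", "Market Sq", "stub_b"),
        Business("Aiken George S Co", "218 Forbes Ave", "stub_b"),
    ]


facade = BusinessSearchFacade([stub_provider_a, stub_provider_b])
candidates = facade.nearby(40.4411, -80.0028)
print([b.name for b in candidates])
```

Because each service caps its result set, merging several capped lists is what raises the chance that the business of interest shows up among the candidates.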
12
{'businesses': [
  {'address1': '466 Haight St',
   'address2': '',
   'address3': '',
   'avg_rating': 4.0,
   'categories': [
     {'category_filter': 'danceclubs',
      'name': 'Dance Clubs',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=danceclubs'},
     {'category_filter': 'lounges',
      'name': 'Lounges',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=lounges'},
     {'category_filter': 'tradamerican',
      'name': 'American (Traditional)',
      'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=tradamerican'}],
   'city': 'San Francisco',
   'distance': 1.8780401945114136,
   'id': 'yyqwqfgn1ZmbQYNbl7s5sQ',
   'is_closed': False,
   'latitude': 37.772201000000003,
   'longitude': -122.42992599999999,
   'mobile_url': 'http://mobile.yelp.com/biz/yyqwqfgn1ZmbQYNbl7s5sQ',
   'name': 'Nickies',
   'nearby_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA',
   'neighborhoods': [
     {'name': 'Hayes Valley',
      'url': 'http://yelp.com/search?find_loc=Hayes+Valley%
13
Business Searching: Results
Latitude, Longitude: 40.441127247181797, -80.002821624487595
Denham & Company Salon Ullrich's Shoe Repairing Nicholas Coffee Co Bella Sera On the Square A & J Ribs Starbucks Coffee Jenny Lee Bakery Galardi's 30 Minute Cleaners Jimmy John's Gourmet Sandwiches Charley's Grilled Subs Fresh Corner Lagondola Pizzeria & Restaurant Camera Repair Service Inc Pittsburgh Cigar Bar Original Oyster House MixStirs 1902 Tavern Costanzo's Pittsburgh Silver Llc Graeme St Galardi's 30 Minute Cleaners Denham & Co Salon Bruegger's Bagel Bakery Nicholas Coffee Co Market Square Fat Tommy's Pizzeria Mixstirs Cafe Giggles Rycon Construction Inc Garbera, Dennis C, Dds - Emmert Dental Assoc Bella Sera on the Square Mancini's Bread Co Las Velas Ciao Baby Washington Reprographics Inc Highmark Life Insurance Co Fischer, Donald R, Md - Highmark Life Insurance Co Jimmy John's Lynx Energy Partners Inc Emmert Dental Assoc
14
Business Searching: Evaluation
Strengths
– Aggregated results almost always found the business of interest
Weaknesses
– Each API limits query result set size (this is why we aggregate)
– Only businesses listed
– Not all businesses listed
Limitations
– Dependent on well-populated, accurate business directories
– Tested on only 15 Pittsburgh images; result quality for rural areas is unknown
15
Obtaining names of Businesses in Image by
EXTRACTING IDENTIFYING TEXT
16
Extracting Identifying Text
[Project pipeline diagram]
17
Extracting Identifying Text: OCR
Used Two OCR APIs:
– GNU OCR (Ocrad)
– GOCR
OCR APIs highly sensitive to:
– Font (only works well with roman font)
– Perspective
– Scale
– Binarization Threshold
– Dark on Light vs. Light on Dark (inversion)
18
Extracting Identifying Text: OCR
OCR API evaluations
– Ocrad: could not yield any meaningful data across more than 200 scale/threshold/inversion combinations
– GOCR: produced good results across 10 scales, with and without inversion, using a threshold determined automatically by Otsu's method
98% of results are garbage!
Examples of GOCR output (next slides)
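For reference, Otsu's method picks the binarization threshold that maximizes the between-class variance of the two pixel groups. The sketch below is a plain-Python illustration on a flat list of 8-bit gray values, not the code used in the project; the `invert` flag is an assumed way to cover the light-on-dark case.

```python
def otsu_threshold(pixels):
    """Return t in 0..255; pixels <= t form one class, pixels > t the other."""
    histogram = [0] * 256
    for value in pixels:
        histogram[value] += 1

    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(histogram))

    best_t, best_variance = 0, -1.0
    weight_low, sum_low = 0, 0
    for t in range(256):
        weight_low += histogram[t]
        sum_low += t * histogram[t]
        weight_high = total - weight_low
        if weight_low == 0:
            continue
        if weight_high == 0:
            break
        mean_low = sum_low / weight_low
        mean_high = (sum_all - sum_low) / weight_high
        # Between-class variance (up to a constant factor).
        variance = weight_low * weight_high * (mean_low - mean_high) ** 2
        if variance > best_variance:
            best_t, best_variance = t, variance
    return best_t


def binarize(pixels, threshold, invert=False):
    """1 where the pixel is above the threshold; flipped when invert is set."""
    return [int((value > threshold) != invert) for value in pixels]


gray = [10] * 50 + [200] * 50          # synthetic dark/light image
t = otsu_threshold(gray)
dark_on_light = binarize(gray, t)
light_on_dark = binarize(gray, t, invert=True)
```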
19
Extracting Identifying Text: OCR
20
n. c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M. II.....c.....i.......l..J.t...llt...l SHA.P. It..tllt.........._. l...J y. _.c _.... _tt.._....t.._.r.........t.t_t.._.._.l.. J.r.r.I.
21
Extracting Identifying Text: OCR
22
u..........._nq......eoR.E.l.e...í....e...n....n....n.e.R.E...e....o. _....E.R.E.IKE........I.ltlO.........rE..o......E.....I.K.E.o..... J.n....c...E.R.E.I.E.......M..E.R.E...E...a J...Gu. ge..ge E.F.._.....E..gE.D... fUlI..lll.lll.IIi.l..Xl..
23
Extracting Identifying Text: OCR
24
..e_..w.. _......D.........u J.....J.................n......n..........n _..r.l_d..J.ec.m._..n.......J.n.._...tn.. ct..._.................D.u.v... e. n.... u.. Y.._ w. n. n....Jn.......G..o..r..._........J...m l.t..l.tt.l.._w....................._....l....t........j..i lI.i..
25
Extracting Identifying Text: OCR
26
__. ncu_.l..._..._J...ne......._n._..v.....ra......d_..._............. i..n..U ll REsT.unAN...r. c.....r...T t.rJll......m...c.....n..........J n. I..c...r.r ESTAU.ANT.r.O....c.cc.
Note: Even though "Tambellini" is a roman font, it is too stretched to be picked up by GOCR
27
OCR Evaluation
Strengths
– Applicable to expected input of orthogonal images
Weaknesses
– Only works well(-ish) for strictly roman font
Limitations
– Will perform poorly for artistic fonts and business signs
Conclusion
– By itself, OCR is not the best approach towards Business identification
– Reasons: poor recognition, franchises, perspective, etc.
28
Matching Identifying Text to Candidate Business Names via
BUSINESS NAME MATCHING
29
Business Name Matching
[Project pipeline diagram]
30
Business Name Matching
Given: Unreliable fragments of ‘detected text’
Yield: Matching Business Names
Process:
– Filter input: trimming, uselessness (< 2 letters)
– Fuzzy String Matching
– Voting Scheme: confidence of business appearing in image
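The filtering and fuzzy-matching steps might be sketched as follows. This is a hedged illustration: it uses Python's standard `difflib.SequenceMatcher` as the fuzzy matcher, which is an assumption (the slides do not say which matcher was used), and the helper names and the 0.6 cutoff are hypothetical.

```python
from difflib import SequenceMatcher


def clean_tokens(fragments, min_letters=2):
    """Trim OCR debris and drop fragments with fewer than min_letters letters."""
    tokens = []
    for fragment in fragments:
        token = fragment.strip(" ._")
        if sum(ch.isalpha() for ch in token) >= min_letters:
            tokens.append(token.upper())
    return tokens


def similarity(token, word):
    # Ratio in [0, 1]; 1.0 means the two strings are identical.
    return SequenceMatcher(None, token.upper(), word.upper()).ratio()


def match_names(tokens, business_names, cutoff=0.6):
    """Map each business name to the OCR tokens resembling one of its words."""
    matches = {}
    for name in business_names:
        words = name.split()
        if not words:
            continue
        for token in tokens:
            score = max(similarity(token, word) for word in words)
            if score >= cutoff:
                matches.setdefault(name, []).append((token, round(score, 2)))
    return matches


fragments = [".._RuEGGE..", "J", "SHA.P.", "__"]
tokens = clean_tokens(fragments)
matches = match_names(tokens, ["Bruegger's Bagels", "Nicholas Coffee Co"])
```

The voting step then turns these per-token matches into a per-business confidence.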
31
Business Name Detection
32
Business Name Matching
Developed Confidence Attribution Algorithm
– Confidence of OCR Token being Name Token
  Example: Confidence of “ESTUANT” representing “RESTAURANT”
  Point-based system
– Confidence of Name appearing in Image
  Sum of points of matching OCR Text
  Use logarithmically-normalized points to determine business inclusion threshold
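The slides do not give the scoring formulas, so the following is only one possible reading of the point-based scheme. Every formula here is an assumption: a token earns points equal to its similarity ratio times the length of the matched name word, a name's points are summed over its words, and points are normalized with `log1p` against the best-scoring name before applying a hypothetical 0.5 inclusion threshold.

```python
import math
from difflib import SequenceMatcher


def token_confidence(ocr_token, name_token):
    """Assumed measure of how well an OCR token represents a name token."""
    return SequenceMatcher(None, ocr_token.upper(), name_token.upper()).ratio()


def name_points(ocr_tokens, business_name, min_confidence=0.6):
    """Sum of points over the words of a name that some OCR token matches."""
    points = 0.0
    for word in business_name.split():
        best = max((token_confidence(t, word) for t in ocr_tokens), default=0.0)
        if best >= min_confidence:
            points += best * len(word)   # assumed: longer words earn more
    return points


def log_normalize(points_by_name):
    """Scale points to [0, 1] on a logarithmic scale (assumed normalization)."""
    top = max(points_by_name.values(), default=0.0)
    if top <= 0:
        return {name: 0.0 for name in points_by_name}
    return {n: math.log1p(p) / math.log1p(top) for n, p in points_by_name.items()}


def businesses_in_image(ocr_tokens, names, threshold=0.5):
    scores = log_normalize({n: name_points(ocr_tokens, n) for n in names})
    return [name for name, score in scores.items() if score >= threshold]


example = token_confidence("ESTUANT", "RESTAURANT")
included = businesses_in_image(
    ["ESTUANT", "RUEGGE"],
    ["Lagondola Pizzeria & Restaurant", "Bruegger's Bagels", "Nicholas Coffee Co"],
)
```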
33
Business Name Matching
35
Business Name Matching
37
Business Name Matching
38
Business Name Matching
39
Business Name Matching
Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.
40
Isolating Identified Images in Image via
SPATIAL BUSINESS IDENTIFICATION
41
Business Spatial Identification
[Project pipeline diagram, with Text Extraction labeled OCR and the Business Spatial Detection stage highlighted]
42
Business Spatial Identification
43
Business Spatial Identification
Aiken George S Co
Category: Food, Grocery
Address: 218 Forbes Ave, Pittsburgh, PA 15222
Phone: (412) 391-6358
Rating: 4.5/5 (2 Reviews)
44
Business Spatial Identification
45
Business Spatial Identification
46
Business Spatial Identification
Bruegger's Bagels
Category: Bagels
Address: Market Sq, Pittsburgh, PA 15222
Phone: (412) 281-2515
Rating: Not Rated
47
V0.1: EVALUATION
48
Current Approach
[Project pipeline diagram, with Text Extraction labeled OCR]
49
Weaknesses to Current Approach
[Project pipeline diagram, with Text Extraction labeled OCR]
50
Weaknesses to Current Approach
– Lots of Garbage
51
Weaknesses to Current Approach
– Fragmented Word Detection
52
Weaknesses to Current Approach
– Fails with non-orthogonal perspective
– Did I already mention lots of garbage?
53
Weaknesses to Current Approach
– Fails with non-roman text
– Not scale-invariant
54
ALTERNATIVES TO OCR
55
Alternative #1: Image Matching
[Diagram stages: Latitude, Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; Match to Storefront Image; Business Identification; Business Spatial Detection]
56
Alternative #1: Image Matching
57
Alternative #1: Evaluation
Weaknesses:
– Low Availability of Storefront Images (< 50% Avg)
  George Aiken area businesses with photos: 18/35
  Brueggers area businesses with photos: 22/40
  Tambellini area businesses with photos: 8/22
– Available Images too small (100 x 100)
– Computationally Expensive
Conclusion: Not a viable solution
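The "< 50% Avg" figure can be checked from the three counts on the slide. The short calculation below computes both the mean of the per-area rates and the pooled rate; which of the two the slide intended is not stated, so both are shown.

```python
# (businesses with photos, businesses in area), taken from the slide
counts = {
    "George Aiken": (18, 35),
    "Brueggers": (22, 40),
    "Tambellini": (8, 22),
}

per_area = {area: photos / total for area, (photos, total) in counts.items()}
mean_rate = sum(per_area.values()) / len(per_area)
pooled_rate = (
    sum(photos for photos, _ in counts.values())
    / sum(total for _, total in counts.values())
)

print(round(mean_rate, 3))    # 0.476
print(round(pooled_rate, 3))  # 0.495
```

Either way the availability is just under one half, even though two of the three areas individually exceed 50%.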
58
Alternative #2: Template Matching
Tambellini
59
Alternative #2: Template Matching
[Diagram stages: Latitude, Longitude; Geocoding / Reverse Geocoding; Nearby Businesses; Image; Render Templates of Business Names in Different Fonts; Template Images; Image Matching (e.g. SIFT, HAAR); Business Identification; Business Spatial Detection]
60
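Alternative #2: Template Matching
OCR vs. Alternative #2
– Scale: OCR is not scale invariant; Alternative #2 is scale invariant
– Search: OCR is an unbounded search; Alternative #2 is a bounded search
– Recognition: OCR gives fragmented recognition; Alternative #2 gives whole-word recognition
– Fonts: OCR handles roman-only fonts; Alternative #2 handles all fonts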
63
Subsequent Attempts
64
Alternative #3: Scene Text Recognition
State of the Art:
– STR ≠ OCR
– Far superior to our ‘naïve’ approaches to STR (i.e. OCR, Image matching, SIFT)
OCR only works for highly controlled environments
STR works for unconditioned environments
– Scale invariant
– Color/intensity invariant
– Lexicon-Assisted
65
Alternative: Scene Text Recognition
– No STR implementations readily available
– Have contacted several groups specialized in STR; they were unable to provide us with an implementation for research purposes
– Had to resort to implementing STR from scratch
66
The long and perilous journey of implementing
SCENE TEXT RECOGNITION
67
STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”
– Multiresolution-based potential characters detection
– Character/layout geometry and color properties analysis
– Local affine rectification
– Refined Detection
68
Candidate Text Detection via
MULTIRESOLUTION-BASED POTENTIAL CHARACTERS DETECTION
69
STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”
– Multiresolution-based potential characters detection
– Character/layout geometry and color properties analysis
– Local affine rectification
– Refined Detection
70
Multiresolution-based potential characters detection
Laplacian-of-Gaussian Edge Detection
Dice image/edges into Patches
– Combine patches with similar properties into regions
– Obtain bounding box of region as candidate text
– Properties include: Mean, Variance, Intensity
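A minimal sketch of the dicing and merging steps, assuming the Laplacian-of-Gaussian edge image has already been computed and is passed in as a list of rows of gray values. The function names, the tolerances, and the `min_mean` cutoff for ignoring empty patches are all hypothetical choices for illustration.

```python
from statistics import fmean, pvariance


def patch_stats(image, size):
    """Dice the image into size x size patches: {(row, col): (mean, variance)}."""
    stats = {}
    for top in range(0, len(image) - size + 1, size):
        for left in range(0, len(image[0]) - size + 1, size):
            values = [image[y][x]
                      for y in range(top, top + size)
                      for x in range(left, left + size)]
            stats[(top // size, left // size)] = (fmean(values), pvariance(values))
    return stats


def similar(a, b, mean_tol, var_tol):
    return abs(a[0] - b[0]) <= mean_tol and abs(a[1] - b[1]) <= var_tol


def merge_regions(stats, min_mean, mean_tol=20.0, var_tol=500.0):
    """Flood-fill adjacent, similar patches; return one bounding box per region.

    Boxes are (top, left, bottom, right) in patch coordinates.
    """
    unvisited = {key for key, (mean, _) in stats.items() if mean >= min_mean}
    boxes = []
    while unvisited:
        seed = unvisited.pop()
        stack, region = [seed], [seed]
        while stack:
            r, c = stack.pop()
            for neighbor in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
                if neighbor in unvisited and similar(
                        stats[(r, c)], stats[neighbor], mean_tol, var_tol):
                    unvisited.remove(neighbor)
                    stack.append(neighbor)
                    region.append(neighbor)
        rows = [r for r, _ in region]
        cols = [c for _, c in region]
        boxes.append((min(rows), min(cols), max(rows), max(cols)))
    return boxes


# Two bright blobs separated by a dark gap in a synthetic 4 x 12 edge image.
edges = [[255] * 4 + [0] * 4 + [255] * 4 for _ in range(4)]
candidate_boxes = sorted(merge_regions(patch_stats(edges, 2), min_mean=50))
```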
71
Multiresolution-based potential characters detection
77
Problems with Current Approach
– Too much “bleeding”
– Unstable edge data, because the location of an edge patch relative to the edge itself is unpredictable
78
New Approach
Each edge pixel gets an N x N edge patch (e.g. 3x3)
Edge patches overlap
– Tighter bounding boxes
– More region consistency
– More robust to resolution changes
– Able to use tighter thresholds
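One way to picture the per-pixel patches: center an N x N patch on every edge pixel and treat two edge pixels as part of the same region when their patches overlap. The grouping rule below is an assumption made for illustration (the slides do not state how overlapping patches were combined), and `group_edge_pixels` is a hypothetical name.

```python
def group_edge_pixels(edge_pixels, n=3):
    """Group (y, x) edge pixels whose centered n x n patches overlap.

    Returns one bounding box (top, left, bottom, right) per group, padded by
    the patch half-width so it encloses the patches themselves.
    """
    half = n // 2
    reach = 2 * half                 # patches overlap when |dy|, |dx| <= reach
    remaining = set(edge_pixels)
    boxes = []
    while remaining:
        stack = [remaining.pop()]
        ys, xs = [], []
        while stack:
            y, x = stack.pop()
            ys.append(y)
            xs.append(x)
            for dy in range(-reach, reach + 1):
                for dx in range(-reach, reach + 1):
                    neighbor = (y + dy, x + dx)
                    if neighbor in remaining:
                        remaining.remove(neighbor)
                        stack.append(neighbor)
        boxes.append((min(ys) - half, min(xs) - half,
                      max(ys) + half, max(xs) + half))
    return boxes


pixels = {(5, 5), (5, 6), (5, 8), (20, 20)}
boxes = sorted(group_edge_pixels(pixels, n=3))
```

Because every patch is anchored on an actual edge pixel, the box can extend at most half a patch beyond the edge, which is consistent with the tighter boxes listed above.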
79
New Approach
91
New Challenges!
92
Text Detection Problem #1
How do I know that two regions are close enough together that they might be part of the same character?
– Center of bounding box?
– Moment of regions?
– Nearest Neighbor?
– Connectedness?
All have severe weaknesses
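To make one of these weaknesses concrete, the sketch below compares two candidate measures on bounding boxes given as (top, left, bottom, right): the distance between box centers, and the gap between the nearest box edges (an assumed stand-in for the nearest-neighbor idea). The function names and example boxes are hypothetical.

```python
import math


def center_distance(a, b):
    """Distance between the centers of two (top, left, bottom, right) boxes."""
    ay, ax = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    by, bx = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ay - by, ax - bx)


def box_gap(a, b):
    """Shortest distance between the two boxes; 0 if they touch or overlap."""
    dy = max(b[0] - a[2], a[0] - b[2], 0)
    dx = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dy, dx)


wide_stroke = (0, 0, 10, 40)     # a long horizontal region
small_piece = (0, 42, 10, 44)    # a small region just to its right

print(center_distance(wide_stroke, small_piece))  # 23.0
print(box_gap(wide_stroke, small_piece))          # 2.0
```

The two regions are only 2 pixels apart, yet their centers are 23 pixels apart, so a center-based threshold depends heavily on region size. The gap measure has its own problem: it also links unrelated regions that merely sit next to each other.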
93
Text Detection Problem #2
How do I know that two characters are close enough to be considered a part of the same word?
Easier version of the last problem, but still hard!
94
CHARACTER/LAYOUT GEOMETRY AND COLOR PROPERTIES ANALYSIS
95
STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes”
– Multiresolution-based potential characters detection
– Character/layout geometry and color properties analysis
– Local affine rectification
– Refined Detection
96
Color Properties Analysis
– Implemented Gaussian Mixture Model (GMM) to obtain μ and σ of foreground/background for: R/G/B/H/I
– Calculated confidences that each component (R/G/B/H/I) can be used to recognize characters
97
Color Properties Analysis
Assumed Invariant: High contrast between foreground/background of characters in sign
Choose the channel (R/G/B/H/I) that is best suited for use with character recognition
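A rough sketch of the channel selection, under stated assumptions: each channel is a flat list of values, a two-component 1-D Gaussian mixture is fitted by plain EM to estimate μ and σ for foreground and background, and the confidence is taken to be the separation |μ1 − μ2| / (σ1 + σ2). That confidence formula is an assumption (the slides do not define it), and the function names are hypothetical.

```python
import math


def fit_two_gaussians(values, iterations=50, min_std=1.0):
    """EM for a 1-D, two-component GMM. Returns [(weight, mean, std), ...]."""
    low, high = min(values), max(values)
    means = [low, high]                        # start at the extremes
    stds = [max((high - low) / 4, min_std)] * 2
    weights = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [w * math.exp(-0.5 * ((v - m) / s) ** 2) / s
                 for w, m, s in zip(weights, means, stds)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and std of each component.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            means[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - means[k]) ** 2 for r, v in zip(resp, values)) / nk
            stds[k] = max(math.sqrt(var), min_std)
            weights[k] = nk / len(values)
    return list(zip(weights, means, stds))


def channel_confidence(values):
    """Assumed confidence: separation of the two means relative to their spread."""
    (_, m0, s0), (_, m1, s1) = fit_two_gaussians(values)
    return abs(m0 - m1) / (s0 + s1)


def best_channel(channels):
    return max(channels, key=lambda name: channel_confidence(channels[name]))


channels = {
    "high_contrast": [20, 22, 18, 21, 19] * 10 + [200, 205, 195, 198, 202] * 10,
    "low_contrast": [100, 102, 98, 101, 99] * 10 + [110, 112, 108, 111, 109] * 10,
}
chosen = best_channel(channels)
```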
98
Original
99
Green
100
Blue
101
Hue
102
Intensity
103
Mistake: This should only be done on individual characters, not words
104
Color Analysis: Evaluation
The channel with the highest confidence was observed to be the one best suited for OCR…
…Did I just say OCR? YES! (I did.)
105
A second shot at
OPTICAL CHARACTER RECOGNITION
107
Refined Detection
Generate alphabet templates in different fonts
Resize templates; Divide into grid
Apply several 2D Gabor filters to each grid patch
– Different orientations, frequencies, variances
– For each pixel, yields real/imaginary component of transformation
Feed data into Linear Discriminant Analysis
– Reduces features and forms classifier at same time
108
2D Gabor Filter
Convolution kernel: Gaussian × sine wave (a Gaussian envelope multiplied by a sinusoid), convolved with the image
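As an illustration of the filter, the sketch below builds a complex 2D Gabor kernel (isotropic Gaussian envelope times a complex sinusoid) and evaluates its response on a small patch, giving the real and imaginary components. The parameter names and values are hypothetical, and the isotropic envelope is a simplifying assumption.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel: Gaussian envelope x complex sinusoid along theta."""
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            along = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            carrier = cmath.exp(2j * math.pi * along / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(patch, kernel):
    """Response at the patch center: sum of pixel x kernel over the window.

    This is a correlation, which matches a convolution up to a kernel flip.
    """
    n = len(kernel)
    return sum(patch[y][x] * kernel[y][x] for y in range(n) for x in range(n))


size, wavelength, sigma = 5, 4.0, 2.0
half = size // 2
# Synthetic patch: vertical stripes varying along x with the same wavelength.
stripes = [[math.cos(2 * math.pi * (x - half) / wavelength) for x in range(size)]
           for _ in range(size)]

aligned = gabor_response(stripes, gabor_kernel(size, wavelength, 0.0, sigma))
crossed = gabor_response(stripes, gabor_kernel(size, wavelength, math.pi / 2, sigma))
print(abs(aligned) > abs(crossed))  # True
```

The filter tuned to the stripe orientation responds far more strongly than the one rotated by 90 degrees, which is why a bank of orientations and frequencies yields features that discriminate between character shapes.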
109
Live Demonstration
– Training
– Classification
110
Thank You!