Spatial Business Detection and Recognition from Images Alexander Darino.


1 Spatial Business Detection and Recognition from Images Alexander Darino

2 Outline Project Overview

3 PROJECT OVERVIEW Previous Work Project Objective Anticipated End Result Project Pipeline

4 Previous Work: Where Am I? Image → Where Am I? → Latitude, Longitude

5 Project Objective Given: – Image – Geolocation Yield: – Spatial identification of businesses in the image – Addresses of businesses in the image – Information about businesses in the image (e.g. reviews, categories, phone number)

7 Project Pipeline Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → Text Extraction → Detected Text; Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection

8 Anticipated End Result

9 Obtaining a List of Candidate Businesses in Image via BUSINESS SEARCHING

10 Business Searching (pipeline diagram from slide 7, with Latitude, Longitude and Reverse Geocoding highlighted)

11 Business Searching Business search services: – Google – Yelp – CityGrid (supplier for Yellow Pages, Super Pages). REST-based APIs; results in JSON or XML format; results aggregated behind a facade.
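The aggregation step on this slide can be pictured with a minimal sketch. Everything named here (`Business`, `BusinessSearchFacade`, `fake_service_a`, `fake_service_b`, the placeholder records) is hypothetical and stands in for the real REST calls, which the slides do not show. The only idea taken from the slide is that several directory services are queried and their results merged behind one interface.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List


@dataclass(frozen=True)
class Business:
    name: str
    address: str
    source: str  # which directory service returned this record


# A provider is any callable mapping (latitude, longitude) to business records.
Provider = Callable[[float, float], Iterable[Business]]


class BusinessSearchFacade:
    """Queries every provider and merges the results into one list."""

    def __init__(self, providers: List[Provider]) -> None:
        self.providers = providers

    def nearby(self, lat: float, lon: float) -> List[Business]:
        seen = set()
        merged = []
        for provider in self.providers:
            for biz in provider(lat, lon):
                # Assumed de-duplication rule: same name and address, ignoring case.
                key = (biz.name.casefold(), biz.address.casefold())
                if key not in seen:
                    seen.add(key)
                    merged.append(biz)
        return merged


# Hypothetical stand-ins for the real services; the records are placeholders.
def fake_service_a(lat: float, lon: float) -> List[Business]:
    return [Business("Example Cafe", "1 Main St", "a"),
            Business("Sample Bakery", "2 Main St", "a")]


def fake_service_b(lat: float, lon: float) -> List[Business]:
    return [Business("example cafe", "1 main st", "b"),
            Business("Demo Cleaners", "3 Main St", "b")]


facade = BusinessSearchFacade([fake_service_a, fake_service_b])
candidates = facade.nearby(40.44, -80.00)
```

Because each service caps its result set, merging in this way is what lets the combined list cover more businesses than any single query.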

12 {'businesses': [{'address1': '466 Haight St', 'address2': '', 'address3': '', 'avg_rating': 4.0, 'categories': [{'category_filter': 'danceclubs', 'name': 'Dance Clubs', 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=danceclubs'}, {'category_filter': 'lounges', 'name': 'Lounges', 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=lounges'}, {'category_filter': 'tradamerican', 'name': 'American (Traditional)', 'search_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA&cflt=tradamerican'}], 'city': 'San Francisco', 'distance': 1.8780401945114136, 'id': 'yyqwqfgn1ZmbQYNbl7s5sQ', 'is_closed': False, 'latitude': 37.772201000000003, 'longitude': -122.42992599999999, 'mobile_url': 'http://mobile.yelp.com/biz/yyqwqfgn1ZmbQYNbl7s5sQ', 'name': 'Nickies', 'nearby_url': 'http://yelp.com/search?find_loc=466+Haight+St%2C+San+Francisco%2C+CA', 'neighborhoods': [{'name': 'Hayes Valley', 'url': 'http://yelp.com/search?find_loc=Hayes+Valley%

13 Business Searching: Results Query location (latitude, longitude): 40.441127247181797, -80.002821624487595 Returned business names: Denham & Company Salon Ullrich's Shoe Repairing Nicholas Coffee Co Bella Sera On the Square A & J Ribs Starbucks Coffee Jenny Lee Bakery Galardi's 30 Minute Cleaners Jimmy John's Gourmet Sandwiches Charley's Grilled Subs Fresh Corner Lagondola Pizzeria & Restaurant Camera Repair Service Inc Pittsburgh Cigar Bar Original Oyster House MixStirs 1902 Tavern Costanzo's Pittsburgh Silver Llc Graeme St Galardi's 30 Minute Cleaners Denham & Co Salon Bruegger's Bagel Bakery Nicholas Coffee Co Market Square Fat Tommy's Pizzeria Mixstirs Cafe Giggles Rycon Construction Inc Garbera, Dennis C, Dds - Emmert Dental Assoc Bella Sera on the Square Mancini's Bread Co Las Velas Ciao Baby Washington Reprographics Inc Highmark Life Insurance Co Fischer, Donald R, Md - Highmark Life Insurance Co Jimmy John's Lynx Energy Partners Inc Emmert Dental Assoc

14 Business Searching: Evaluation Strengths – Aggregated results almost always contained the business of interest Weaknesses – Each API limits the size of its query result set, which is why we aggregate – Only listed businesses can be found – Not all businesses are listed Limitations – Dependent on well-populated, accurate business directories – Tested on only 15 Pittsburgh images; result quality for rural areas is unknown

15 Obtaining Names of Businesses in Image by EXTRACTING IDENTIFYING TEXT

16 Extracting Identifying Text (pipeline diagram from slide 7)

17 Extracting Identifying Text: OCR Used two OCR APIs: – GNU OCR (Ocrad) – GOCR. OCR APIs are highly sensitive to: – Font (works well only with roman fonts) – Perspective – Scale – Binarization threshold – Dark-on-light vs. light-on-dark text (inversion)

18 Extracting Identifying Text: OCR OCR API evaluations – Ocrad: could not yield any meaningful data across more than 200 scale/threshold/inversion combinations – GOCR: produced good results across 10 scales, with and without inversion, using a threshold determined automatically by Otsu's method. Even so, 98% of the results are garbage! Examples of GOCR output follow on the next slides.
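For readers unfamiliar with Otsu's method, here is a minimal pure-Python sketch of how a binarization threshold can be picked automatically. It is not the code used in the project: the function names `otsu_threshold` and `binarize` and the flat list-of-pixels input are assumptions for illustration. The method picks the gray level that maximizes the between-class variance of the two resulting pixel groups.

```python
def otsu_threshold(pixels):
    """Return the 8-bit level t that maximizes between-class variance.

    Pixels with value <= t form one class, pixels > t the other.
    """
    hist = [0] * 256
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_between = 0, -1.0
    w0 = 0    # number of pixels with value <= t
    sum0 = 0  # sum of those pixel values
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        if w0 == 0:
            continue
        w1 = total - w0
        if w1 == 0:
            break
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_between:
            best_between, best_t = between, t
    return best_t


def binarize(pixels, threshold, invert=False):
    """Map pixels to 0/1; invert=True swaps the roles of dark and light."""
    return [int((p > threshold) != invert) for p in pixels]


sample = [10] * 50 + [200] * 50
t = otsu_threshold(sample)
dark_on_light = binarize(sample, t)
light_on_dark = binarize(sample, t, invert=True)
```

Running the OCR engine on both `dark_on_light` and `light_on_dark` corresponds to the "with and without inversion" sweep mentioned on the slide.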

19 Extracting Identifying Text: OCR

20 n. c.......o.a...u..............oU..D.oa..e......_RuEGGE..KERy..J...w...........L........M. II.....c.....i.......l..J.t...llt...l SHA.P. It..tllt.........._. l...J y. _.c _.... _tt.._....t.._.r.........t.t_t.._.._.l.. J.r.r.I.

21 Extracting Identifying Text: OCR

22 u..........._nq......eoR.E.l.e...í....e...n....n....n.e.R.E...e....o. _....E.R.E.IKE........I.ltlO.........rE..o......E.....I.K.E.o..... J.n....c...E.R.E.I.E.......M..E.R.E...E...a J...Gu. ge..ge E.F.._.....E..gE.D... fUlI..lll.lll.IIi.l..Xl..

23 Extracting Identifying Text: OCR

24 ..e_..w.. _......D.........u J.....J.................n......n..........n _..r.l_d..J.ec.m._..n.......J.n.._...tn.. ct..._.................D.u.v... e. n.... u.. Y.._ w. n. n....Jn.......G..o..r..._........J...m l.t..l.tt.l.._w....................._....l....t........j..i lI.i..

25 Extracting Identifying Text: OCR

26 __. ncu_.l..._..._J...ne......._n._..v.....ra......d_..._............. i..n..U ll REsT.unAN...r. c.....r...T t.rJll......m...c.....n..........J n. I..c...r.r ESTAU.ANT.r.O....c.cc. Note: Even though the "Tambellini" sign is set in a roman font, the lettering is too stretched to be picked up by GOCR.

27 OCR Evaluation Strengths – Applicable to the expected input of orthogonal (head-on) images Weaknesses – Only works well(-ish) for strictly roman fonts Limitations – Will perform poorly for artistic fonts and business signs Conclusion – By itself, OCR is not the best approach to business identification. Reasons: poor recognition, franchises, perspective, etc.

28 Matching Identifying Text to Candidate Business Names via BUSINESS NAME MATCHING

29 Business Name Matching (pipeline diagram from slide 7)

30 Business Name Matching Given: unreliable fragments of ‘detected text’ Yield: matching business names Process: – Filter input: trim fragments and discard useless ones (fewer than 2 letters) – Fuzzy string matching – Voting scheme: confidence of each business appearing in the image
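The first two steps of this process (filtering and fuzzy matching) can be sketched as follows. This is an illustration under assumptions, not the project's code: it uses Python's standard `difflib.SequenceMatcher` as the fuzzy matcher, the 0.6 cutoff is arbitrary, and `clean_fragments` and `best_match` are hypothetical helper names. The OCR string comes from the GOCR output on slide 20.

```python
import difflib
import re


def clean_fragments(raw_ocr, min_letters=2):
    """Split OCR output on non-letters and drop fragments shorter than min_letters."""
    chunks = re.split(r"[^A-Za-z]+", raw_ocr)
    return [chunk.upper() for chunk in chunks if len(chunk) >= min_letters]


def best_match(fragment, business_names, cutoff=0.6):
    """Return (name, word, ratio) for the most similar word of any name, or None."""
    best = None
    for name in business_names:
        for word in re.findall(r"[A-Za-z]+", name.upper()):
            ratio = difflib.SequenceMatcher(None, fragment, word).ratio()
            if ratio >= cutoff and (best is None or ratio > best[2]):
                best = (name, word, ratio)
    return best


names = ["Bruegger's Bagel Bakery", "Starbucks Coffee", "Jenny Lee Bakery"]
fragments = clean_fragments("_RuEGGE..KERy")
match = best_match(fragments[0], names)
```

Matching word by word rather than against the whole name is a design choice of this sketch; it lets a fragment such as "RUEGGE" score highly against "BRUEGGER" even though most of the sign was not recognized.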

31 Business Name Detection

32 Business Name Matching Developed a confidence attribution algorithm: – Confidence of an OCR token being a name token (example: confidence of “ESTUANT” representing “RESTAURANT”); point-based system – Confidence of a name appearing in the image: sum of the points of the matching OCR text; logarithmically normalized points are used to determine the business inclusion threshold
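The slides do not give the actual point values or the normalization formula, so the following is only one plausible reading of the voting scheme. The assumptions, all mine: a token earns similarity ratio times word length in points, a business's raw score is the sum over OCR tokens, and scores are normalized as log(1 + points) / log(1 + best points). The names `token_points`, `business_confidences` and the 0.5 threshold are hypothetical, and the candidate list is illustrative.

```python
import difflib
import math
import re


def token_points(ocr_token, name_token, cutoff=0.6):
    """Assumed point rule: similarity ratio times word length, 0 below the cutoff."""
    ratio = difflib.SequenceMatcher(None, ocr_token, name_token).ratio()
    return ratio * len(name_token) if ratio >= cutoff else 0.0


def business_confidences(ocr_tokens, business_names):
    """Sum each token's best points per business, then log-normalize to [0, 1]."""
    raw = {}
    for name in business_names:
        words = re.findall(r"[A-Z]+", name.upper())
        raw[name] = sum(
            max((token_points(tok, w) for w in words), default=0.0)
            for tok in ocr_tokens
        )
    top = max(raw.values(), default=0.0)
    if top == 0.0:
        return {name: 0.0 for name in raw}
    return {name: math.log1p(points) / math.log1p(top) for name, points in raw.items()}


tokens = ["RUEGGE", "KERY", "ESTUANT"]
candidates = ["Bruegger's Bagel Bakery", "Starbucks Coffee", "Tambellini Restaurant"]
confidence = business_confidences(tokens, candidates)

threshold = 0.5  # assumed inclusion threshold
identified = [name for name, c in confidence.items() if c >= threshold]
```

The logarithm compresses large point totals, so a business supported by one long, well-matched word is not drowned out by another that collected many fragment matches.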

33 Business Name Matching

34

35 Business Name Matching

36

37 Business Name Matching

38 Business Name Matching

39 Business Name Matching Note: This originally did not appear because it did not exceed the confidence threshold. It now appears because it contributes to the Business Name Identification.

40 Isolating Identified Businesses in Image via SPATIAL BUSINESS IDENTIFICATION

41 Business Spatial Identification Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → OCR → Detected Text; Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection (highlighted)

42 Business Spatial Identification

43 Business Spatial Identification Aiken George S Co Category: Food, Grocery Address: 218 Forbes Ave, Pittsburgh, PA 15222 Phone: (412) 391-6358 Rating: 4.5/5 (2 Reviews)

44 Business Spatial Identification

45 Business Spatial Identification

46 Business Spatial Identification Bruegger's Bagels Category: Bagels Address: Market Sq, Pittsburgh, PA 15222 Phone: (412) 281-2515 Rating: Not Rated

47 V0.1: EVALUATION

48 Current Approach Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image → OCR → Detected Text; Nearby Businesses + Detected Text → Business Name Matching → Business Identification → Business Spatial Detection

49 Weaknesses to Current Approach (same pipeline diagram as slide 48)

50 Weaknesses to Current Approach Lots of Garbage

51 Weaknesses to Current Approach Fragmented Word Detection

52 Weaknesses to Current Approach Fails with non-orthogonal perspective. Did I already mention lots of garbage?

53 Weaknesses to Current Approach – Fails with non-roman text – Not scale-invariant

54 ALTERNATIVES TO OCR

55 Alternative #1: Image Matching Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses; Image + Nearby Businesses → Match to Storefront Image → Business Identification → Business Spatial Detection

56 Alternative #1: Image Matching

57 Alternative #1: Evaluation Weaknesses: – Low availability of storefront images (< 50% on average): George Aiken area businesses with photos: 18/35; Brueggers area businesses with photos: 22/40; Tambellini area businesses with photos: 8/22 – Available images too small (100 x 100) – Computationally expensive Conclusion: Not a viable solution
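The "< 50% on average" figure can be checked directly from the three counts on this slide. The short calculation below computes both the mean of the per-area ratios and the pooled ratio, since the slide does not say which average was meant; the variable names are mine.

```python
# (businesses with photos, businesses in area), as listed on the slide
areas = {
    "George Aiken": (18, 35),
    "Brueggers": (22, 40),
    "Tambellini": (8, 22),
}

per_area = {name: with_photo / total for name, (with_photo, total) in areas.items()}

# Average of the three per-area ratios
mean_of_ratios = sum(per_area.values()) / len(per_area)

# All photos over all businesses
pooled = sum(w for w, _ in areas.values()) / sum(t for _, t in areas.values())
```

Two of the three areas sit slightly above one half on their own, but the Tambellini area (8/22) pulls both averages just under 50%.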

58 Alternative #2: Template Matching Tambellini

59 Alternative #2: Template Matching Latitude, Longitude → Geocoding / Reverse Geocoding → Nearby Businesses → Render Templates of Business Names in Different Fonts → Template Images; Image + Template Images → Image Matching (e.g. SIFT, Haar) → Business Identification → Business Spatial Detection

60 Alternative #2: Template Matching OCR: – Not scale invariant – Unbounded search – Fragmented recognition – Roman-only fonts. Alternative #2: – Scale invariant – Bounded search – Whole-word recognition – All fonts

61

62

63 Subsequent Attempts

64 Alternative #3: Scene Text Recognition (STR) State of the art: – STR ≠ OCR – Far superior to our ‘naïve’ approaches to STR (i.e. OCR, image matching, SIFT). OCR only works in highly controlled environments; STR works in uncontrolled environments: – Scale invariant – Color/intensity invariant – Lexicon-assisted

65 Alternative: Scene Text Recognition No STR implementations were readily available. We contacted several groups specializing in STR, but they were unable to provide an implementation for research purposes. We had to resort to implementing STR from scratch.

66 The long and perilous journey of implementing SCENE TEXT RECOGNITION

67 STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” Stages: Multiresolution-based potential characters detection; Character/layout geometry and color properties analysis; Local affine rectification; Refined detection

68 Candidate Text Detection via MULTIRESOLUTION-BASED POTENTIAL CHARACTERS DETECTION

69 STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” Stages: Multiresolution-based potential characters detection; Character/layout geometry and color properties analysis; Local affine rectification; Refined detection

70 Multiresolution-based potential characters detection Laplacian-of-Gaussian edge detection. Dice image/edges into patches: – Combine patches with similar properties into regions – Obtain bounding box of each region as candidate text – Properties include: Mean, Variance, Intensity
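The dice-and-merge step can be sketched in pure Python as below. This is a simplified illustration, not the implementation from the talk: it assumes the edge response is already available as a 2-D list (the Laplacian-of-Gaussian step and the multiresolution loop are omitted), it compares only mean and variance, and the tolerances and the names `patch_stats` and `candidate_boxes` are hypothetical.

```python
from statistics import fmean, pvariance


def patch_stats(img, size):
    """Dice a 2-D list into size x size patches; map (row, col) to (mean, variance)."""
    stats = {}
    for r in range(0, len(img) - size + 1, size):
        for c in range(0, len(img[0]) - size + 1, size):
            vals = [img[y][x] for y in range(r, r + size) for x in range(c, c + size)]
            stats[(r // size, c // size)] = (fmean(vals), pvariance(vals))
    return stats


def candidate_boxes(img, size=2, min_mean=0.1, mean_tol=0.3, var_tol=0.3):
    """Merge adjacent patches with similar statistics; return pixel bounding boxes.

    Boxes are (top, left, bottom, right) with bottom/right exclusive.
    """
    # Ignore patches with almost no edge response.
    stats = {k: v for k, v in patch_stats(img, size).items() if v[0] >= min_mean}
    parent = {k: k for k in stats}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]
            k = parent[k]
        return k

    for (r, c), (m, v) in stats.items():
        for nb in ((r + 1, c), (r, c + 1)):
            if nb in stats:
                m2, v2 = stats[nb]
                if abs(m - m2) <= mean_tol and abs(v - v2) <= var_tol:
                    parent[find(nb)] = find((r, c))

    boxes = {}
    for (r, c) in stats:
        root = find((r, c))
        top, left, bottom, right = boxes.get(root, (r, c, r, c))
        boxes[root] = (min(top, r), min(left, c), max(bottom, r), max(right, c))
    return sorted((t * size, l * size, (b + 1) * size, (rt + 1) * size)
                  for t, l, b, rt in boxes.values())


edge_map = [
    [1, 1, 1, 1, 0, 0, 0, 0],
    [1, 1, 1, 1, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 1, 1],
    [0, 0, 0, 0, 0, 0, 1, 1],
]
boxes = candidate_boxes(edge_map)
```

Because the patch grid is fixed, an edge can fall anywhere inside a patch, which is one way to see why patch statistics become unstable on a fixed grid.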

71 Multiresolution-based potential characters detection

72

73

74

75

76

77 Problems with Current Approach – Too much “bleeding” – Unstable edge data, because the position of an edge patch relative to the edge itself is unpredictable

78 New Approach Each edge pixel gets its own N x N edge patch (e.g. 3x3), so edge patches overlap: – Tighter bounding boxes – More region consistency – More robust to resolution changes – Able to use tighter thresholds
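A minimal sketch of the per-pixel patch idea follows, under the assumption that patches which overlap or touch are merged into one region; the slides do not spell out the merging rule, and the function name `regions_from_edge_pixels` is hypothetical. The edge map is a binary 2-D list.

```python
def regions_from_edge_pixels(edges, n=3):
    """Stamp an n x n patch centred on every edge pixel and group the result.

    Returns inclusive bounding boxes (top, left, bottom, right), one per
    4-connected group of covered pixels.
    """
    h, w = len(edges), len(edges[0])
    half = n // 2

    # Mark every pixel covered by at least one patch.
    covered = [[False] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            if edges[y][x]:
                for yy in range(max(0, y - half), min(h, y + half + 1)):
                    for xx in range(max(0, x - half), min(w, x + half + 1)):
                        covered[yy][xx] = True

    # Flood-fill the covered mask to collect one bounding box per region.
    seen = [[False] * w for _ in range(h)]
    boxes = []
    for y in range(h):
        for x in range(w):
            if covered[y][x] and not seen[y][x]:
                seen[y][x] = True
                stack = [(y, x)]
                top, left, bottom, right = y, x, y, x
                while stack:
                    cy, cx = stack.pop()
                    top, bottom = min(top, cy), max(bottom, cy)
                    left, right = min(left, cx), max(right, cx)
                    for ny, nx in ((cy + 1, cx), (cy - 1, cx), (cy, cx + 1), (cy, cx - 1)):
                        if 0 <= ny < h and 0 <= nx < w and covered[ny][nx] and not seen[ny][nx]:
                            seen[ny][nx] = True
                            stack.append((ny, nx))
                boxes.append((top, left, bottom, right))
    return boxes


edges = [
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 1, 1, 0, 0, 0, 0, 1, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
    [0, 0, 0, 0, 0, 0, 0, 0, 0],
]
boxes = regions_from_edge_pixels(edges, n=3)
```

Since every patch is centred on an edge pixel, a box can extend at most n // 2 pixels beyond the edge it encloses, which is where the tighter bounding boxes come from.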

79 New Approach

80

81

82

83

84

85

86

87

88

89

90

91 New Challenges!

92 Text Detection Problem #1 How do I know that two regions are close enough together that they might be part of the same character? Candidate measures: – Center of bounding box? – Moment of regions? – Nearest neighbor? – Connectedness? All have severe weaknesses.
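To make the difficulty concrete, the sketch below compares two simple proximity measures on made-up boxes: the distance between bounding-box centers and the gap between box edges. The boxes (the dot and stem of an "i", and two adjacent tall letters) and the function names are illustrative assumptions, not data from the project.

```python
import math


def center_distance(a, b):
    """Distance between box centers; boxes are (top, left, bottom, right)."""
    ay, ax = (a[0] + a[2]) / 2, (a[1] + a[3]) / 2
    by, bx = (b[0] + b[2]) / 2, (b[1] + b[3]) / 2
    return math.hypot(ay - by, ax - bx)


def box_gap(a, b):
    """Smallest distance between box edges; 0 if the boxes touch or overlap."""
    dy = max(b[0] - a[2], a[0] - b[2], 0)
    dx = max(b[1] - a[3], a[1] - b[3], 0)
    return math.hypot(dy, dx)


# Parts of ONE character: the dot and the stem of an "i".
dot, stem = (0, 0, 2, 2), (4, 0, 20, 2)
# TWO separate characters: adjacent tall letters.
letter_1, letter_2 = (0, 0, 20, 2), (0, 6, 20, 8)

same_char_center = center_distance(dot, stem)
same_char_gap = box_gap(dot, stem)
two_chars_center = center_distance(letter_1, letter_2)
two_chars_gap = box_gap(letter_1, letter_2)
```

By center distance the two separate letters look closer together than the two parts of the single "i", while the edge gap ranks them the other way round. Any fixed threshold on one measure alone will therefore misgroup some shapes.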

93 Text Detection Problem #2 How do I know that two characters are close enough to be considered a part of the same word? Easier version of the last problem, but still hard!

94 CHARACTER/LAYOUT GEOMETRY AND COLOR PROPERTIES ANALYSIS

95 STR Implementation: “Automatic Detection and Recognition of Signs From Natural Scenes” Stages: Multiresolution-based potential characters detection; Character/layout geometry and color properties analysis; Local affine rectification; Refined detection

96 Color Properties Analysis Implemented a Gaussian Mixture Model (GMM) to obtain μ and σ of the foreground and background for each of the R, G, B, H and I channels. Calculated the confidence that each channel (R/G/B/H/I) can be used to recognize characters.

97 Color Properties Analysis Assumed invariant: high contrast between the foreground and background of characters in a sign. Choose the channel (R/G/B/H/I) that is best suited for character recognition.
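One way to picture the channel selection is the sketch below: a two-component, one-dimensional Gaussian mixture is fitted to each channel with a few EM iterations, and the channel whose two components are best separated is chosen. The confidence formula |μ₁ − μ₂| / (σ₁ + σ₂) is an assumption of this sketch, since the slides do not state how confidence was computed. The function names and the sample pixel values are likewise illustrative.

```python
import math


def fit_two_gaussians(values, iterations=50):
    """Fit a 2-component 1-D GMM with EM; return [(mu, sigma), (mu, sigma)]."""
    lo, hi = min(values), max(values)
    mu = [lo, hi]
    sigma = [max((hi - lo) / 4, 1e-3)] * 2
    weight = [0.5, 0.5]
    for _ in range(iterations):
        # E-step: responsibility of each component for each value.
        resp = []
        for v in values:
            p = [weight[k] * math.exp(-0.5 * ((v - mu[k]) / sigma[k]) ** 2) / sigma[k]
                 for k in range(2)]
            total = sum(p) or 1e-300
            resp.append([pk / total for pk in p])
        # M-step: re-estimate weight, mean and standard deviation.
        for k in range(2):
            nk = sum(r[k] for r in resp)
            if nk < 1e-9:
                continue
            mu[k] = sum(r[k] * v for r, v in zip(resp, values)) / nk
            var = sum(r[k] * (v - mu[k]) ** 2 for r, v in zip(resp, values)) / nk
            sigma[k] = max(math.sqrt(var), 1e-3)
            weight[k] = nk / len(values)
    return list(zip(mu, sigma))


def channel_confidence(values):
    """Assumed confidence: distance between the two means in units of spread."""
    (mu0, s0), (mu1, s1) = fit_two_gaussians(values)
    return abs(mu0 - mu1) / (s0 + s1)


def best_channel(channels):
    """Return the name of the channel with the highest confidence."""
    return max(channels, key=lambda name: channel_confidence(channels[name]))


# Made-up pixel samples: one channel with strong contrast, one with weak contrast.
channels = {
    "intensity": [20, 22, 24, 26, 28] * 4 + [200, 205, 210, 215, 220] * 4,
    "hue": [100, 102, 104, 106, 108] * 4 + [110, 112, 114, 116, 118] * 4,
}
chosen = best_channel(channels)
components = sorted(fit_two_gaussians(channels["intensity"]))
```

This relies on the high-contrast assumption stated on the slide: if foreground and background do not form two distinct modes in any channel, every confidence will be low and the choice is not meaningful.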

98 Original

99 Green

100 Blue

101 Hue

102 Intensity

103 Mistake: This should only be done on individual characters, not words

104 Color Analysis: Evaluation The channel with the highest confidence was observed to be the channel best suited for OCR… …Did I just say OCR? YES! (I did.)

105 A second shot at OPTICAL CHARACTER RECOGNITION

106

107 Refined Detection – Generate alphabet templates in different fonts – Resize templates; divide into a grid – Apply several 2D Gabor filters to each grid patch: different orientations, frequencies, variances; for each pixel, this yields the real and imaginary components of the transformation – Feed the data into Linear Discriminant Analysis, which reduces features and forms a classifier at the same time

108 2D Gabor Filter A sinusoidal wave multiplied by a Gaussian envelope; the resulting kernel is convolved with the image
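As a rough illustration of the filter described here, the sketch below builds a complex 2D Gabor kernel (Gaussian envelope times complex sinusoid) and evaluates its response on one small patch. It uses an isotropic Gaussian and computes the response as a plain inner product at the patch centre (a correlation, not a full convolution over an image). The parameter values and function names are assumptions, and the Linear Discriminant Analysis stage is not shown.

```python
import cmath
import math


def gabor_kernel(size, wavelength, theta, sigma):
    """Complex Gabor kernel of odd side length `size`.

    theta is the carrier orientation in radians, wavelength its period in
    pixels, sigma the standard deviation of the Gaussian envelope.
    """
    half = size // 2
    kernel = []
    for y in range(-half, half + 1):
        row = []
        for x in range(-half, half + 1):
            # Coordinate along the direction in which the sinusoid oscillates.
            xr = x * math.cos(theta) + y * math.sin(theta)
            envelope = math.exp(-(x * x + y * y) / (2 * sigma ** 2))
            carrier = cmath.exp(2j * math.pi * xr / wavelength)
            row.append(envelope * carrier)
        kernel.append(row)
    return kernel


def gabor_response(patch, kernel):
    """Inner product of a patch with the kernel; real and imaginary parts are features."""
    return sum(patch[r][c] * kernel[r][c]
               for r in range(len(kernel)) for c in range(len(kernel[0])))


# A 5x5 patch of vertical stripes with a period of 4 pixels.
patch = [[math.cos(2 * math.pi * (c - 2) / 4) for c in range(5)] for _ in range(5)]

horizontal_wave = gabor_kernel(5, wavelength=4, theta=0.0, sigma=2.0)
vertical_wave = gabor_kernel(5, wavelength=4, theta=math.pi / 2, sigma=2.0)

r_match = gabor_response(patch, horizontal_wave)
r_mismatch = gabor_response(patch, vertical_wave)
```

The kernel whose carrier oscillates across the stripes (`theta=0.0`) responds far more strongly than the one rotated by 90 degrees, which is why a bank of orientations and frequencies gives features that discriminate between stroke directions.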

109 Live Demonstration Training Classification

110 Thank You!

