Augmenting the Generalized Hough Transform to Enable the Mining of Petroglyphs Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, 1 Sang-Hee Lee Dept. Of Computer.


Augmenting the Generalized Hough Transform to Enable the Mining of Petroglyphs
Qiang Zhu, Xiaoyue Wang, Eamonn Keogh, Sang-Hee Lee¹
Dept. of Computer Science & Eng.; ¹Dept. of Anthropology
University of California, Riverside

Outline
- Motivation
- Approach
- Evaluation
- Conclusion

Motivation (1): Applications
Petroglyphs are one of the earliest expressions of abstract thinking. They provide a rich source of information about:
- climate change
- the existence of certain species
- patterns of human migration and interaction

Motivation (2): Difficulties
Progress in petroglyph research has been frustratingly slow:
- petroglyphs have an extraordinarily diverse and complex structure
- most matching algorithms cannot capture the similarity of petroglyphs
- those that can, even in limited cases, do not scale to large collections

Approach
- How should the raw data be preprocessed?
- How should the distance measure be defined?
- How can the search be sped up?

Preprocessing (1)
With rare exceptions, petroglyphs do not lend themselves to automatic extraction by segmentation algorithms. In the example shown, the border of the rock may be recognized as the edge of the petroglyph.

PetroAnnotator
Load the raw image into our human-computation tool.

PetroAnnotator (cont.)
Draw an approximate boundary around the object, then trace the shape.

Preprocessing (2): Downsampling
(A) Two overlaid skeleton traces (340 × 250) of the same image of a bighorn sheep. Less than 3.5% of the pixels from each image overlap.
(B) The same two images after downsampling (30 × 23). 75.6% of the pixels (shown in black) are common to both.
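The effect shown in (A) and (B) can be reproduced on a toy example. The sketch below is a minimal illustration and makes two assumptions.
- It uses a block-OR rule: a downsampled cell is black if any source pixel that maps to it is black.
- It uses a hypothetical `overlap_fraction` score.

The slides do not say which downsampling rule or overlap measure was actually used.

```python
def downsample(img, out_h, out_w):
    """Block-OR downsampling of a binary image given as a list of rows:
    an output cell is 1 if any input pixel mapping to it is 1 (assumed rule)."""
    in_h, in_w = len(img), len(img[0])
    out = [[0] * out_w for _ in range(out_h)]
    for r in range(in_h):
        for c in range(in_w):
            if img[r][c]:
                out[r * out_h // in_h][c * out_w // in_w] = 1
    return out

def overlap_fraction(a, b):
    """Hypothetical score: fraction of a's black pixels that are also black in b."""
    on_a = [(r, c) for r, row in enumerate(a) for c, v in enumerate(row) if v]
    if not on_a:
        return 0.0
    return sum(1 for r, c in on_a if b[r][c]) / len(on_a)

# Two traces of the same diagonal stroke, offset by one pixel.
a = [[1 if c == r else 0 for c in range(8)] for r in range(8)]
b = [[1 if c == r + 1 else 0 for c in range(8)] for r in range(8)]
print(overlap_fraction(a, b))                                    # 0.0
print(overlap_fraction(downsample(a, 2, 2), downsample(b, 2, 2)))  # 1.0
```

A one-pixel tracing jitter destroys all overlap at full resolution. At a coarse resolution, both traces fall into the same cells.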

Distance Measure: Why GHT?
- It makes essentially no assumptions about the data:
  - open or closed boundaries
  - connected or disconnected shapes
- It correctly captures similarity:
  - subjective similarity on unlabeled datasets and objective similarity on labeled datasets
- Its distance can be tightly lower bounded:
  - allowing very efficient searches in large datasets

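Classic GHT
The GHT is a useful method for detecting arbitrary two-dimensional shapes, here a query Q in a candidate image C.

(1) Find the “star pattern” R of the query.

(2) Superimpose R on the candidate image C and accumulate votes in the accumulator A.

(3) Find the peak of A, which gives the best placement of Q in C.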
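The three steps can be sketched for the translation-only case on sets of edge points. Two choices here are assumptions made for illustration:
- The reference point is the integer centroid of Q.
- Only translation is searched. The classic GHT can also search over rotation and scale.

```python
from collections import Counter

def star_pattern(query_pts):
    """Step 1: vectors from each query edge point to a reference point
    (here the integer centroid of Q, an illustrative choice)."""
    n = len(query_pts)
    ref_x = sum(x for x, _ in query_pts) // n
    ref_y = sum(y for _, y in query_pts) // n
    return [(ref_x - x, ref_y - y) for x, y in query_pts]

def ght_accumulate(query_pts, cand_pts):
    """Step 2: superimpose every star-pattern vector on every candidate edge point."""
    vectors = star_pattern(query_pts)
    acc = Counter()
    for cx, cy in cand_pts:
        for vx, vy in vectors:
            acc[(cx + vx, cy + vy)] += 1  # one vote for a possible reference location
    return acc

def ght_peak(query_pts, cand_pts):
    """Step 3: the accumulator cell with the most votes, and its vote count."""
    acc = ght_accumulate(query_pts, cand_pts)
    return max(acc.items(), key=lambda kv: kv[1])

# Toy example: three of Q's four edge points reappear in C, shifted by (5, 5).
Q = [(0, 0), (1, 0), (2, 0), (0, 1)]
C = [(5, 5), (6, 5), (7, 5), (9, 9)]
print(ght_peak(Q, C))  # ((5, 5), 3)
```

The cost is one vote per (query vector, candidate edge point) pair, plus a scan of the accumulator to find the peak.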

A Basic Distance Measure
The classic GHT does not explicitly encode a similarity measure, but we can simply define a GHT-based distance, the minimal unmatched edge points (MUE):
MUE = number of edge points in Q - maximal number of matched edge points
For our toy example, MUE = 4 - 3 = 1.
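The same definition can be written as a formula. It restates the definition above. Here A_{Q,C}(t) denotes the value of accumulator cell t after Q's star pattern is superimposed on C.

$$
\mathrm{MUE}(Q, C) = |Q| - \max_{t} A_{Q,C}(t)
$$

In the toy example, |Q| = 4 and the accumulator peak is 3, so MUE = 4 - 3 = 1. A value of MUE = 0 means every edge point of Q is matched.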

A New Cell-Incrementation Strategy
When can we obtain the value of a particular cell in the accumulator?
- In the classic GHT, only after all incrementation is finished.
- Is it possible to obtain the cell values one by one?
- To do so, we need to check every position that could increase the cell's value.
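Below is one reading of this idea, sketched for translation-only matching on edge-point sets (an assumption). Fix a cell, that is, a translation t of Q. A query point q can vote for that cell only if the candidate has an edge point at q + t. So the cell's value needs only |Q| set lookups instead of a full accumulation. `cell_value` is a hypothetical name.

```python
def cell_value(query_pts, cand_pts, offset):
    """Accumulator value for the translation `offset`, computed without filling
    the accumulator: count query points whose translated position is an edge point of C."""
    cand = set(cand_pts)  # in practice, build this set once per candidate
    dx, dy = offset
    return sum((x + dx, y + dy) in cand for x, y in query_pts)

Q = [(0, 0), (1, 0), (2, 0), (0, 1)]
C = [(5, 5), (6, 5), (7, 5), (9, 9)]
print(cell_value(Q, C, (5, 5)))  # 3
print(cell_value(Q, C, (0, 0)))  # 0
```

Because each cell is computed independently, cells can be visited in any order. The search can stop early once a bound shows that no remaining cell can beat the current best.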

Lower Bound
Compare the column signatures of Q and C. SigQ_x and SigC_x are the numbers of edge pixels in column x:
- column 1: Q needs 2 pixels in C, and C has 3
- column 2: Q needs 2, and C has 2
- column 3: Q needs 4, but C has only 2
- column 4: Q needs 2, and C has 2
- column 5: Q needs 2, and C has 3
Minimal missed points = 2. Column 3 of C can supply only 2 of the 4 pixels Q needs, so at least 2 of Q's edge points remain unmatched.
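Below is a sketch of the column-signature bound, assuming translation-only matching. Whatever the alignment, column x of Q can match at most as many pixels as the aligned column of C holds. The shortfall is summed over columns and minimized over horizontal shifts, and the result can never exceed MUE. The function names are hypothetical; row signatures give an analogous bound.

```python
def column_signature(points, width):
    """Number of edge pixels in each column x = 0 .. width-1."""
    sig = [0] * width
    for x, _ in points:
        sig[x] += 1
    return sig

def missed_at_shift(sig_q, sig_c, shift):
    """Q points that cannot be matched when Q's column x is aligned with C's column x + shift."""
    missed = 0
    for x, need in enumerate(sig_q):
        j = x + shift
        have = sig_c[j] if 0 <= j < len(sig_c) else 0  # columns outside C hold no pixels
        missed += max(0, need - have)
    return missed

def mue_lower_bound(sig_q, sig_c):
    """Minimum shortfall over every horizontal shift that overlaps the two images."""
    shifts = range(-(len(sig_q) - 1), len(sig_c))
    return min(missed_at_shift(sig_q, sig_c, s) for s in shifts)

print(mue_lower_bound([2, 2, 4, 2, 2], [3, 2, 2, 2, 3]))  # 2
```

Comparing two length-S signatures over about 2S shifts costs O(S²), independent of how many edge points the images contain.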

Time Complexity
(N_Q, N_C: numbers of edge points in Q and C; S: side length of the downsampled image)
- Classic GHT: O(N_Q × N_C + S²), since every query vector is superimposed on every edge point in the candidate image
- Lower-bound GHT: O(S²), since only one-dimensional signatures are compared
  - further reduced by early abandoning and shift ordering
  - one to two orders of magnitude speed-up
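Early abandoning and shift ordering can be combined when the bound is only used to decide whether a candidate can be skipped. How the two optimizations fit together here is an assumption, and `can_prune` is a hypothetical name. The sketch works as follows:
- Shifts are tried from 0 outward.
- Each shift's running sum is abandoned as soon as it reaches the best exact distance found so far.
- The candidate is pruned only if every shift is abandoned.

```python
def can_prune(sig_q, sig_c, best_so_far):
    """True if the signature lower bound is >= best_so_far,
    i.e. candidate C cannot beat the best match found so far."""
    # Shift ordering (assumed heuristic): try small shifts first, starting at 0.
    shifts = sorted(range(-(len(sig_q) - 1), len(sig_c)), key=abs)
    for s in shifts:
        missed = 0
        for x, need in enumerate(sig_q):
            j = x + s
            have = sig_c[j] if 0 <= j < len(sig_c) else 0
            missed += max(0, need - have)
            if missed >= best_so_far:
                break  # early abandon: this shift cannot bring the bound below best_so_far
        else:
            return False  # this shift keeps the bound below best_so_far: compute the exact distance
    return True

sig_q, sig_c = [2, 2, 4, 2, 2], [3, 2, 2, 2, 3]
print(can_prune(sig_q, sig_c, 2))  # True  (the bound is 2, so C cannot beat a best of 2)
print(can_prune(sig_q, sig_c, 3))  # False (C might beat a best of 3)
```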

Variants on the Basic Distance Measure
- Query by content
- Clustering
- Finding motifs

Evaluation
We performed three sets of experiments:
- evaluation of utility, on unlabeled data
- evaluation of accuracy, on labeled data
- evaluation of scalability, on synthetic data

Evaluation of Utility (1)
(Figure: a clustering of typical Southwestern USA petroglyphs: atlatls, anthropomorphs, bighorn sheep.)
(1) Our GHT-based distance measure correctly groups all seven pairs.
(2) The higher-level structure of the dendrogram also correctly groups similar petroglyphs.

Evaluation of Utility (2)
(Figure: panels a to h, labeled SC and WY.)

Evaluation of Utility (3)
Can our distance measure find meaningful motifs?
- 2,852 real petroglyphs
- 4,065,526 possible pairs
- 52 top motifs ( %) selected by the motif cutoff
(Figure: motif cutoff.)
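The pair count is the number of unordered pairs of distinct petroglyphs:

$$
\binom{2852}{2} = \frac{2852 \times 2851}{2} = 4{,}065{,}526
$$

A brute-force motif search over this collection must therefore evaluate over four million exact distances.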

Evaluation of Accuracy: Datasets
NicIcon dataset:
- 24,441 images
- 14 categories
- 33 volunteers
- 234×234 pixels
- writer-dependent (WD) and writer-independent (WI) tests
Farsi digits dataset:
- from 11,942 registration forms
- 60,000 digits for training
- 20,000 digits for testing
- 54×64 pixels (largest minimum bounding rectangle, MBR)

(1) Testing the Downsampling Size
(Figures: error rate (%) vs. resolution (R×R) of the downsampled images, for NicIcon (WI and WD) and for Farsi digits.)
In both datasets, the error rate of the one-nearest-neighbor test varies little once the resolution exceeds 10×10.

(2) Competitive Accuracy
NicIcon dataset:
- error rate of 4.78% for WD and 8.46% for WI
- the dataset creators tested on the online data using three classifiers; only one of them (DTWB) is better, but it is slower
Farsi digits dataset:
- error rate of 4.54%
- Borji et al. performed extensive empirical tests on this dataset; of the twenty reported error rates, the mean was 8.69%
- only four beat our approach, and they require setting at least six parameters

Evaluation of Scalability: Datasets
We made 8 synthetic petroglyph datasets:
- based on 22 classic petroglyphs
- duplicated by 10 volunteers on a tablet
- with a random polynomial transformation applied
- containing up to 1,280,000 objects
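The slides do not specify the polynomial transformation. As a purely hypothetical illustration, the sketch below perturbs each point of a traced shape with a random second-order polynomial in x and y. It assumes coordinates normalized to roughly [-1, 1], so that `scale` keeps the distortion small. The function name and parameters are invented for this example.

```python
import random

def random_polynomial_warp(points, scale=0.05, seed=None):
    """Hypothetical second-order polynomial warp of 2-D points.
    Each output coordinate is the input coordinate plus a random combination
    of x, y, x*y, x^2 and y^2, with coefficients drawn from [-scale, scale]."""
    rng = random.Random(seed)
    coeffs = [[rng.uniform(-scale, scale) for _ in range(5)] for _ in range(2)]
    warped = []
    for x, y in points:
        terms = (x, y, x * y, x * x, y * y)
        dx = sum(a * t for a, t in zip(coeffs[0], terms))
        dy = sum(b * t for b, t in zip(coeffs[1], terms))
        warped.append((x + dx, y + dy))
    return warped

stroke = [(-1.0, -1.0), (0.0, 0.0), (0.5, -0.5), (1.0, 1.0)]
variants = [random_polynomial_warp(stroke, seed=k) for k in range(3)]
```

Seeding each call makes every synthetic copy reproducible. With `scale=0.0`, the warp leaves the shape unchanged.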

(1) Querying by Content
Leave-one-out one-nearest-neighbor test, repeated 10 times on each dataset.
(Figures: maximum, average, and minimum prune rate (%), and search time as a percentage of brute-force time, vs. size of the synthetic petroglyph dataset, from 10K to 1,280K objects.)
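The leave-one-out test and the prune rate can be sketched generically. Candidates are visited in ascending order of a cheap lower bound, and the search stops as soon as the next bound cannot beat the best exact distance. Here the toy distance is Euclidean and the lower bound is the absolute x difference. These stand in for MUE and its signature bound, and the function names are hypothetical.

```python
import math

def nn_with_pruning(query, candidates, lower_bound, distance):
    """Exact 1-NN: scan candidates by ascending lower bound and stop once the
    bound reaches the best exact distance. Returns (index, distance, prune rate)."""
    scored = sorted((lower_bound(query, c), i) for i, c in enumerate(candidates))
    best_i, best_d, exact_calls = None, math.inf, 0
    for lb, i in scored:
        if lb >= best_d:
            break  # every remaining candidate has an even larger lower bound
        d = distance(query, candidates[i])
        exact_calls += 1
        if d < best_d:
            best_i, best_d = i, d
    return best_i, best_d, 1 - exact_calls / len(candidates)

def loo_1nn(items, labels, lower_bound, distance):
    """Leave-one-out 1-NN error rate and the per-query prune rates."""
    errors, prune_rates = 0, []
    for q in range(len(items)):
        others = [k for k in range(len(items)) if k != q]
        j, _, rate = nn_with_pruning(items[q], [items[k] for k in others], lower_bound, distance)
        errors += labels[others[j]] != labels[q]
        prune_rates.append(rate)
    return errors / len(items), prune_rates

items = [(0, 0), (0, 1), (10, 0), (10, 1)]
labels = ["a", "a", "b", "b"]
err, rates = loo_1nn(items, labels, lambda a, b: abs(a[0] - b[0]), math.dist)
print(err)  # 0.0
```

Because the search stops only when the bound is at least the best distance found, the pruning never changes the answer. It only reduces the number of exact distance calls.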

(2) Finding Motifs
- A brute-force algorithm requires time quadratic in the size of the dataset.
- By using the triangle inequality of our distance measure, we need to calculate only a tiny fraction of the exact distances.
- Even for the smallest dataset, our algorithm is 712 times faster and prunes 99.84% of the calculations.
(Figure: speed-up (times) vs. size of the synthetic petroglyph dataset, from 10K to 1,280K objects.)
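Triangle-inequality pruning can be illustrated with a single reference object. Using one reference is an assumption; the slides do not say how many are used. The sketch works for any metric distance and is not the authors' implementation.
- Sort the objects by their distance to the reference r.
- For any pair, |d(a, r) - d(b, r)| <= d(a, b) gives a lower bound.
- Once this bound reaches the best pair found so far, skip the rest of that row.

```python
import math

def find_motif(objects, distance):
    """Closest pair (top motif) with triangle-inequality pruning against one reference.
    Returns ((best_distance, i, j), number_of_exact_distance_calls)."""
    n = len(objects)
    ref = objects[0]  # assumption: the first object serves as the reference
    d_ref = [distance(ref, o) for o in objects]
    order = sorted(range(n), key=lambda k: d_ref[k])
    best = (math.inf, None, None)
    exact = n  # the n distances to the reference
    for a in range(n):
        for b in range(a + 1, n):
            i, j = order[a], order[b]
            # d_ref[j] - d_ref[i] <= d(i, j), and it only grows as b increases.
            if d_ref[j] - d_ref[i] >= best[0]:
                break
            d = distance(objects[i], objects[j])
            exact += 1
            if d < best[0]:
                best = (d, i, j)
    return best, exact

pts = [(0, 0), (5, 0), (5.5, 0), (20, 0), (40, 0)]
print(find_motif(pts, math.dist))  # ((0.5, 1, 2), 7)
```

Brute force would need all 10 pairwise distances here. The sketch computes 2 pair distances plus the 5 distances to the reference.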

Conclusion
In this work we considered, for the first time, the problem of mining large collections of rock art. We:
- introduced a novel distance measure
- found an efficiently computable, tight lower bound to this measure
- enabled effective mining of large data archives

Thanks for listening! All datasets and the code can be downloaded from:

Preprocessing
With rare exceptions, petroglyphs do not lend themselves to automatic extraction by segmentation algorithms. In the example shown, cracks in the rock are more “significant” than the actual edges.

Preprocessing: Existing Archives
There are several other rich sources of rock-art data to be mined, e.g., sketches by anthropologists.
(Figure: a sketch from a scanned book, shown downsampled, binarized, and thinned.)

Experiment testing the impact of noise: a single dot is randomly added.
(Figure panels: by Hausdorff distance; by GHT.)
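Below is a toy version of this experiment. It assumes edge-point sets and a translation-only MUE computed by brute force. One stray dot far from the shape dominates the Hausdorff distance, because Hausdorff takes a maximum over points. MUE behaves differently:
- It changes by at most one when a single dot is added to the query.
- It cannot increase when the dot is added to the candidate.

```python
import math

def hausdorff(a, b):
    """Symmetric Hausdorff distance between two non-empty point sets."""
    def directed(p, q):
        return max(min(math.dist(x, y) for y in q) for x in p)
    return max(directed(a, b), directed(b, a))

def mue(q_pts, c_pts):
    """Minimal unmatched edge points over integer translations (brute force, non-empty sets).
    Only translations that align at least one pair of points need to be tried."""
    c_set = set(c_pts)
    offsets = {(cx - qx, cy - qy) for qx, qy in q_pts for cx, cy in c_pts}
    best = max(sum((x + dx, y + dy) in c_set for x, y in q_pts) for dx, dy in offsets)
    return len(q_pts) - best

shape = [(0, 0), (1, 0), (2, 0), (3, 0)]
noisy = shape + [(50, 50)]  # one randomly placed dot
print(round(hausdorff(shape, noisy), 1))     # 68.6
print(mue(shape, noisy), mue(noisy, shape))  # 0 1
```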