Efficient Distribution-based Feature Search in Multi-field Datasets
Ohio State University (Shen)

Problem: How can we efficiently search for distribution-based (histogram-based) features in multi-field datasets without scanning the entire data space? How can we overcome the curse of dimensionality when searching for a feature defined by a joint histogram? How can we efficiently search for large features defined by local histograms over large spatial neighborhoods?

Solution: Utilize bitmap indexing to reduce the search space and apply a local voting scheme to construct local histograms in an inverse manner. Compare multiple bins between two histograms in one pass instead of comparing individual bins iteratively. Apply stratified sampling to quickly approximate an initial result and use a low-cost refinement to obtain the final result.

Problem: Distribution-based (histogram-based) features have been utilized in many volume analysis and visualization applications. However, local histogram computation and matching are difficult in multi-field datasets due to the high computational cost: (1) it is infeasible to scan the entire data space, computing and comparing a local histogram at every location; (2) the number of histogram bins increases exponentially with the number of fields (dimensions), which requires a large number of bin-by-bin comparisons; and (3) the computational cost is high when searching for a large feature, which is defined by a histogram over a large spatial neighborhood.

Proposed Solution: We utilize bitmap indexing to reduce the search space from the entire spatial domain to the voxels whose values fall into the user-defined value range, together with their neighboring voxels, and then apply the local deposit approach to construct local histograms in an inverse manner. For multi-field feature searches, we propose two complementary algorithms for accelerating local distribution searches. Both algorithms first approximate a search result and then use a low-cost refinement step to generate the final search result. The first approach is merged-bin comparison (MBC): instead of comparing individual bins iteratively, we compare multiple histogram bins between two histograms in one pass. Owing to a property of the distance measures, the approximate search result from MBC has no false negatives, so the refinement process only needs to remove false positives to generate the final result. The second approach is sampled-active-voxels (SAV), which uses stratified sampling to quickly generate an approximate initial result that is closer to the final result than simple random sampling would produce, so the cost of refinement is reduced.

Figure A shows an example of the local deposit procedure with the neighborhood size set to 3×3. In the left figure, the blue points represent the voxels within the user-defined value range. Each blue point deposits one count into the deposit buffer of its neighborhood. The number on each voxel represents the accumulated deposit count after the local deposit; voxels without numbers have a count of zero. The right figure shows the distance of bin k between the target histogram, with frequency 3, and the local histogram in the search region (red circles).
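To make the local deposit idea concrete, below is a minimal NumPy sketch of the inverse construction for a single histogram bin on a 2D field, assuming a square (2r+1)×(2r+1) neighborhood. Function and variable names are illustrative and not taken from the paper's implementation.

```python
import numpy as np

def local_deposit(field, value_range, radius=1):
    """Inverse local-histogram construction for one bin (illustrative sketch).

    Instead of scanning every neighborhood, each voxel whose value falls
    into `value_range` deposits one count to every position whose
    (2*radius+1)^2 neighborhood contains it.  The buffer then holds, for
    every position, the local frequency of the queried bin.
    """
    lo, hi = value_range
    deposit = np.zeros_like(field, dtype=np.int32)
    ny, nx = field.shape
    # Active voxels: values inside the user-defined range
    # (in the paper these are located with a bitmap index instead of a scan).
    for y, x in zip(*np.nonzero((field >= lo) & (field < hi))):
        y0, y1 = max(0, y - radius), min(ny, y + radius + 1)
        x0, x1 = max(0, x - radius), min(nx, x + radius + 1)
        deposit[y0:y1, x0:x1] += 1          # one count to every neighbor
    return deposit

# Toy 2D "field": the buffer gives, per position, how many neighborhood
# voxels fall into the value range [0.7, 1.0).
rng = np.random.default_rng(0)
field = rng.random((8, 8))
print(local_deposit(field, (0.7, 1.0), radius=1))
```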
Figure B shows an example of group selection for the MBC algorithm. The grid represents the 2D joint histogram of a target feature; the number in each cell is the value frequency of that bin, and bins of the same color are grouped together. We first group bins by target frequency (frequencies 5, 15, and 20 in this example) and then split any group with a large number of bins (the frequency-5 group here) into two groups by bin indices. Figure C illustrates the concept of stratified sampling, which draws samples evenly from each spatial partition and evenly from each bin.

A: An example of the local deposit procedure. B: An example of group selection for MBC. C: An example of stratified sampling.
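The sketch below illustrates the merged-bin comparison just described for Figure B, assuming an L1 histogram distance and a simple group-by-frequency rule; the grouping heuristic, threshold, and data are placeholders, not the paper's exact procedure.

```python
import numpy as np

def group_bins_by_frequency(target_hist):
    """Group bin indices that share the same target frequency (as in Figure B).
    Splitting an over-large group further by bin index is omitted for brevity."""
    groups = {}
    for i, f in enumerate(target_hist):
        groups.setdefault(int(f), []).append(i)
    return [np.asarray(g) for g in groups.values()]

def merged_l1_lower_bound(target_hist, local_hist, groups):
    """One-pass comparison of merged bins.

    For the L1 distance, |sum_G (a_i - b_i)| <= sum_G |a_i - b_i| for every
    group G, so the merged distance never exceeds the true distance and
    thresholding on it produces no false negatives."""
    return sum(abs(int(target_hist[g].sum()) - int(local_hist[g].sum()))
               for g in groups)

def mbc_search(target_hist, candidate_hists, threshold):
    groups = group_bins_by_frequency(target_hist)
    matches = []
    for idx, h in enumerate(candidate_hists):
        # Cheap merged-bin test first: candidates whose lower bound already
        # exceeds the threshold can never match and are safely skipped.
        if merged_l1_lower_bound(target_hist, h, groups) > threshold:
            continue
        # Low-cost refinement: exact bin-by-bin distance for the survivors.
        if np.abs(target_hist - h).sum() <= threshold:
            matches.append(idx)
    return matches

rng = np.random.default_rng(0)
target = np.array([5, 5, 5, 15, 20, 5, 15, 20, 5])
candidates = [np.clip(target + rng.integers(-3, 4, target.size), 0, None)
              for _ in range(5)]
print(mbc_search(target, candidates, threshold=12))
```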

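For the SAV side, a compact sketch of the stratified sampling idea is given below; the strata here are simply spatial block ids, whereas the paper stratifies by both spatial partition and histogram bin, so this is an assumption-laden simplification.

```python
import numpy as np

def stratified_sample(active_voxels, strata_ids, n_samples, seed=0):
    """Draw roughly the same number of samples from every stratum
    (e.g. a spatial partition or a histogram bin), instead of sampling
    the active voxels uniformly at random."""
    rng = np.random.default_rng(seed)
    strata = np.unique(strata_ids)
    per_stratum = max(1, n_samples // len(strata))
    picked = []
    for s in strata:
        members = active_voxels[strata_ids == s]
        take = min(per_stratum, len(members))
        picked.append(rng.choice(members, size=take, replace=False))
    return np.concatenate(picked)

# Toy example: 1000 active voxel ids spread over 8 spatial blocks.
voxels = np.arange(1000)
blocks = voxels // 125                      # stratum id = spatial partition
sample = stratified_sample(voxels, blocks, n_samples=80)
print(len(sample), np.bincount(blocks[sample]))   # about 10 samples per block
```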
Figure D shows the performance speedup gained by our MBC approach. The left plot shows the individual speedup for all test cases, where the x-axis is the number of bins reduced by MBC. The right plot shows the average performance gain after grouping the test cases by dataset and by the number of fields searched. Figure E shows the performance speedup gained by our SAV method using 20% sampling. The left plot shows all test cases, where the x-axis is the percentage of the total data points of the corresponding dataset. The right plot shows the average performance after grouping the test cases by dataset and by the number of fields. Figure F shows a usage scenario: applying local distribution searches to turbine flow stability analysis. (a) Selection of a stall-cell region (the red sphere) close to the blade tip of passage 21. (b) Joint distribution of pressure and entropy in the selected region. (c) Search result using only the pressure distribution. (d) Search result using only the entropy distribution. (e) Search result using the joint distribution of pressure and entropy.

D: The performance speedup gained by our MBC approach. E: The performance speedup gained by our SAV approach. F: Local distribution searches for turbine flow stability analysis.
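For context on the query in Figure F, the sketch below is a plain, unaccelerated reference version of a joint-distribution search: build the joint pressure-entropy histogram of a selected region, then scan for window positions with a similar local joint distribution. The field arrays, bin count, window size, and L1 threshold are made-up placeholders; the paper's bitmap indexing, local deposit, MBC, and SAV machinery replace this brute-force loop.

```python
import numpy as np

def joint_histogram(fields, bins):
    """Normalized 2D joint histogram of two co-located fields
    (e.g. pressure and entropy)."""
    h, _, _ = np.histogram2d(fields[0].ravel(), fields[1].ravel(),
                             bins=bins, range=[[0, 1], [0, 1]])
    return h / h.sum()

def search(fields, target, win, bins, threshold):
    """Reference (unaccelerated) search: slide a window over the domain and
    keep positions whose local joint histogram is close to the target."""
    ny, nx = fields[0].shape
    hits = []
    for y in range(ny - win + 1):
        for x in range(nx - win + 1):
            local = joint_histogram([f[y:y+win, x:x+win] for f in fields], bins)
            if np.abs(local - target).sum() <= threshold:
                hits.append((y, x))
    return hits

rng = np.random.default_rng(1)
pressure, entropy = rng.random((32, 32)), rng.random((32, 32))
# Target: joint pressure-entropy distribution of a user-selected region.
region = (slice(4, 12), slice(4, 12))
target = joint_histogram([pressure[region], entropy[region]], bins=8)
print(search([pressure, entropy], target, win=8, bins=8, threshold=0.8))
```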