Published by Aldous Preston. Modified over 8 years ago.
1
Indexing and Visualizing Multidimensional Data
I. Csabai, M. Trencséni, L. Dobos, G. Herczegh, P. Józsa, N. Purger
Eötvös University, Budapest
2
How to handle data
(Diagram labels: Flat files; Fortran/C/IDL; Plot; Database; SQL filtering; Fortran/C/IDL; Plot; Database with integrated processing; Visualization; More data)
3
The challenges
Database servers are not designed for complex processing
» SQL Server 2005 CLR integration allows running C# code inside the server
Multidimensional indexing is not integrated into database servers
» Data is multidimensional, disk storage is one-dimensional
» Memory algorithms have to be ported to the database
Visualizing more data than fits into memory
» Needs two-way interaction
4
The Magnitude Space
u g r i z
300 million points in 5+ dimensions
5
Why is magnitude space interesting?
(Diagram labels: GALAXY: early type, late type; LIGHT, SED; BROADBAND FILTERS; MAGNITUDE SPACE; REDSHIFT; PARAMETERS: age, dust, ...)
(Dimension labels: 3000 dimensions; 5 dimensions; 3-10 dimensions)
6
Spatial indexing
Similar to SkyServer HTM indexing … but in 5 dimensions
7
Quad-trees
32-tree in 5D
No need to store the structure
Number of nodes grows exponentially
Breaks down in high dimensions or if the data is highly non-uniformly distributed
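The point that a 32-tree needs no stored structure can be illustrated with a short sketch: the cell containing a point follows from its coordinates alone, by halving every axis at each level. The function names `cell_path` and `node_count`, and the choice to send points on a midpoint to the upper half, are assumptions made for this illustration and are not taken from the slides.

```python
def cell_path(point, lo, hi, depth):
    """Return the child index chosen at each level, from the root down.

    Every level halves all axes at once, so a node has 2**d children
    (32 in 5D). The index is a bit mask: bit `axis` is set when the
    point lies in the upper half of that axis.
    """
    lo, hi = list(lo), list(hi)
    path = []
    for _ in range(depth):
        child = 0
        for axis, x in enumerate(point):
            mid = 0.5 * (lo[axis] + hi[axis])
            if x >= mid:          # assumption: midpoint goes to the upper half
                child |= 1 << axis
                lo[axis] = mid
            else:
                hi[axis] = mid
        path.append(child)
    return path


def node_count(dims, depth):
    """Number of cells at a given depth: (2**dims)**depth."""
    return (2 ** dims) ** depth


if __name__ == "__main__":
    print(cell_path((0.75, 0.25), (0, 0), (1, 1), depth=2))
    print(node_count(5, 3))
```

The cell count at depth `k` in 5D is `32**k`, which is the exponential growth the slide refers to: most cells stay empty when the data is strongly clustered.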
8
K-d trees
Only one cut in each level
Store bounding boxes
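A minimal sketch of a k-d tree that makes one cut per level and stores a bounding box in every node. It is an in-memory illustration only: the median split, the cycling of the cut axis with depth, and the names `KdNode`, `build_kdtree` and `leaves` are assumptions, not the authors' database implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, ...]


@dataclass
class KdNode:
    box_min: Point                        # tight bounding box of the points below
    box_max: Point
    points: Optional[List[Point]] = None  # filled on leaves only
    axis: int = -1                        # cut axis (inner nodes only)
    cut: float = 0.0                      # cut position (inner nodes only)
    left: Optional["KdNode"] = None
    right: Optional["KdNode"] = None


def bounding_box(points):
    dims = len(points[0])
    return (tuple(min(p[a] for p in points) for a in range(dims)),
            tuple(max(p[a] for p in points) for a in range(dims)))


def build_kdtree(points, leaf_size=2, depth=0):
    """Split at the median along one axis per level, cycling through the axes."""
    box_min, box_max = bounding_box(points)
    if len(points) <= leaf_size:
        return KdNode(box_min, box_max, points=list(points))
    axis = depth % len(points[0])
    ordered = sorted(points, key=lambda p: p[axis])
    half = len(ordered) // 2
    node = KdNode(box_min, box_max, axis=axis, cut=ordered[half][axis])
    node.left = build_kdtree(ordered[:half], leaf_size, depth + 1)
    node.right = build_kdtree(ordered[half:], leaf_size, depth + 1)
    return node


def leaves(node):
    """Collect the leaf nodes from left to right."""
    if node.points is not None:
        return [node]
    return leaves(node.left) + leaves(node.right)
```

Because each level cuts along a single axis, a node has two children regardless of the dimension, and the median split keeps the tree balanced even for non-uniform data.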
9
Voronoi tessellation
Each point of a cell is closer to the cell's seed than to any other seed
The solution space for nearest-neighbor (NN) search
More spherical cells; 50 neighbors, 1000 vertices
Density estimation, clustering
Complex code, computationally intensive in higher dimensions
10
Complex queries
SkyServer log; a query from the 12 million:
petroMag_i > 17.5
and (petroMag_r > 15.5 or petroR50_r > 2)
and (petroMag_r > 0 and g > 0 and r > 0 and i > 0)
and ( (petroMag_r-extinction_r) -0.2)
and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r)) < 24.2) )
or ( (petroMag_r - extinction_r < 19.5)
and ( (dered_r - dered_i - (dered_g - dered_r)/4 - 0.18) > (0.45 - 4 * (dered_g - dered_r)) )
and ( (dered_g - dered_r) > (1.35 + 0.25 * (dered_r - dered_i)) ) )
and ( (petroMag_r - extinction_r + 2.5 * LOG10(2 * 3.1415 * petroR50_r * petroR50_r) ) < 23.3 ) )
Star/galaxy separation, QSO/LRG/photo-z targeting, search for rare objects
Linear combination of colors
Multidimensional polyhedra
Drop outliers
Find similar objects: k-nearest neighbor search (László: similar spectra)
11
Geometric Queries
First run the query against the index
Classify the cells as fully covered, fully outside, or intersected
Run the detailed SQL only on the intersected cells
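The classification step can be sketched for axis-aligned cells (for example k-d tree bounding boxes) and a query polyhedron written as a list of half-spaces `a · x <= b`. This representation and the name `classify_box` are assumptions made for illustration. The "outside" test is conservative: a box that misses the polyhedron without lying entirely beyond any single face is reported as intersected, which costs extra work in the detailed filter but does not change the result.

```python
def classify_box(box_min, box_max, halfspaces):
    """Classify an axis-aligned box against a convex polyhedron.

    `halfspaces` is a list of (a, b) pairs, each meaning a . x <= b.
    Returns "covered", "outside" or "intersected".
    """
    inside_all = True
    for a, b in halfspaces:
        # Smallest and largest value of a . x over the box: pick, per axis,
        # the box corner that minimises or maximises the term a_i * x_i.
        lo = sum(ai * (mn if ai > 0 else mx)
                 for ai, mn, mx in zip(a, box_min, box_max))
        hi = sum(ai * (mx if ai > 0 else mn)
                 for ai, mn, mx in zip(a, box_min, box_max))
        if lo > b:
            return "outside"      # the whole box violates this constraint
        if hi > b:
            inside_all = False    # some corner violates this constraint
    return "covered" if inside_all else "intersected"


if __name__ == "__main__":
    # Hypothetical cut in (g, r) space: g - r >= 0.5 and r <= 19.
    cut = [((-1.0, 1.0), -0.5), ((0.0, 1.0), 19.0)]
    print(classify_box((18.0, 16.0), (19.0, 17.0), cut))
    print(classify_box((17.0, 17.2), (18.0, 18.0), cut))
    print(classify_box((15.0, 20.0), (16.0, 21.0), cut))
```

Rows in covered cells can be returned without evaluating the predicate, outside cells are skipped, and only intersected cells need the row-by-row filter.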
12
Range query performance
13
Complex code in SQL/CLR
Spectrum Services
Composite, continuum and line fit, convolving filters and spectra, dereddening
Non-parametric estimation
Find k-nearest neighbors
Polynomial fit (AMD optimized LAPACK)
DR5: photometric redshift
Garching DR4: 'photometric' D_n(4000), Hδ_A, age, mass
14
Redshift estimation quality
Template fitting
K-nearest neighbor with k-d tree + local polynomial fit
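The second method can be sketched as follows: find the k training objects nearest to the query in magnitude space, fit a low-order polynomial to their known redshifts, and evaluate it at the query point. Two simplifications are assumptions of this sketch: a brute-force search stands in for the k-d tree, and the polynomial is first order (a local plane). The names `knn`, `solve` and `estimate_redshift` are hypothetical, and the fit fails with a division by zero if the neighbors are degenerate (for example, all on one line).

```python
import heapq


def knn(query, train_x, k):
    """Indices of the k training points closest to `query` (brute force)."""
    def dist2(i):
        return sum((a - b) ** 2 for a, b in zip(train_x[i], query))
    return heapq.nsmallest(k, range(len(train_x)), key=dist2)


def solve(A, y):
    """Solve the square system A x = y by Gaussian elimination with pivoting."""
    n = len(A)
    M = [row[:] + [rhs] for row, rhs in zip(A, y)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in reversed(range(n)):
        tail = sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (M[r][n] - tail) / M[r][r]
    return x


def estimate_redshift(query, train_x, train_z, k=8):
    """Local linear fit z ~ c0 + c . x over the k nearest neighbours."""
    idx = knn(query, train_x, k)
    rows = [[1.0] + list(train_x[i]) for i in idx]  # design matrix [1, x1..xd]
    zs = [train_z[i] for i in idx]
    m = len(rows[0])
    # Normal equations (X^T X) c = X^T z of the least-squares problem.
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(m)] for i in range(m)]
    xtz = [sum(r[i] * z for r, z in zip(rows, zs)) for i in range(m)]
    coef = solve(xtx, xtz)
    return coef[0] + sum(c * q for c, q in zip(coef[1:], query))
```

With a spatial index the neighbor search avoids scanning the whole training set, which is what makes this approach practical for hundreds of millions of objects.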
16
Visualization
ParaView
VTK, OpenGL, many filters already built
ODBC and Web service interfaces to fetch data from SQL Server
One-way: mouse action feedback is limited
17
New Adaptive Visualizer
Using managed DirectX
Graphical SQL: mouse actions are converted to queries and passed to SQL Server
LOD, zoom in and out
300M points, random subset, Voronoi, k-d tree visualization
Click-connect to SkyServer
Multi-resolution density maps
Multidimensional: quickly change axes
Brush select, select nearest neighbor
Interact with other VO data
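The "graphical SQL" idea can be sketched as a function that turns a brush rectangle on the current pair of axes into a parameterized range query. Everything specific here is an assumption made for illustration: the table name `MagPoints`, the function `brush_to_sql`, the ODBC-style `?` placeholders, and the use of `TOP` as a crude point budget. `TOP` without an ordering returns an arbitrary subset, not a random one, so a real level-of-detail scheme would need something more.

```python
ALLOWED_AXES = {"u", "g", "r", "i", "z"}   # the five magnitudes
TABLE = "MagPoints"                        # hypothetical table name


def brush_to_sql(x_axis, y_axis, x_range, y_range, max_points=10000):
    """Translate a brush rectangle into (sql, params) for a range query.

    Axis names are checked against a whitelist because column names
    cannot be passed as query parameters; the numeric bounds are.
    """
    for axis in (x_axis, y_axis):
        if axis not in ALLOWED_AXES:
            raise ValueError(f"unknown axis: {axis!r}")
    x_lo, x_hi = sorted(x_range)   # the drag may go in either direction
    y_lo, y_hi = sorted(y_range)
    sql = (
        f"SELECT TOP {int(max_points)} {x_axis}, {y_axis} FROM {TABLE} "
        f"WHERE {x_axis} BETWEEN ? AND ? AND {y_axis} BETWEEN ? AND ?"
    )
    return sql, (x_lo, x_hi, y_lo, y_hi)


if __name__ == "__main__":
    sql, params = brush_to_sql("g", "r", (19.0, 17.5), (16.0, 18.0), 500)
    print(sql)
    print(params)
```

Changing axes or zooming only changes the arguments, so each mouse action maps to one new query against the server.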
18
Adaptive visualization
Adaptively fetch data from the database
19
Summary
TRADITIONAL APPROACH: flat files, Fortran, C code
+ Complex manipulation of data
- Sequential slow access
SQL DATABASES: Oracle, MS SQL Server, …
+ Organize, efficiently access data
- Hard to implement complex algorithms
- Multidimensional indexing (OLAP) is limited to categorical data
MULTIDIMENSIONAL INDEXING: B-tree, R-tree, K-d tree, BSP-tree …
+ Many for low D, some for high D
+ Fast, tuned for various problems
- Implemented mostly as memory algorithms, maybe suboptimal in databases
VISUALIZATION: tools using OpenGL, DirectX
+ Fast
- Using files; some tools access a database, but not interactively
INTEGRATE: implement in SQL Server; use for astronomical data mining and for fast interactive visualization