NASA/IPAC Infrared Science Archive Tatiana Goldina, Loi Ly, Trey Roby, Xiuqin Wu
2003: 2MASS Point Source Catalog, 0.5 billion rows, > 100 columns
2013: AllWISE Source Catalog, 0.75 billion rows, > 300 columns
Gum31, AllWISE Source Catalog, 0.5-degree search. Data are selected in each of the three views (table, image overlay, plot).
Sky area: box with center …, side length 5400 arcsec (1.5°).

Catalog                            Rows     Columns (short form, default)   Disk space (ASCII IPAC Table)
AllWISE Source Catalog             30,000   47                              13 MB (9 B per cell)
COSMOS Cassata morphology Catalog  230,000  15                              62 MB (18 B per cell)
Spitzer Source List                250,000  148                             416 MB (11 B per cell)

The table view shows one page at a time, but the image overlay and the plot should cover all rows. How do we visualize this much data?
Points drawn on top of each other are:
- hard to distinguish
- hard to interpret
- candidates for aggregation
Plot area: 400 × 400 px (160,000 px²). Symbol size: 5 × 5 px (25 px²).
160,000 px² / 25 px² = 6,400 non-overlapping symbol positions at most.
…,000 catalog rows are plotted with only 5,960 distinct square symbols.
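The overplotting arithmetic can be checked with a short sketch. It assumes a linear mapping from data coordinates to a 400 × 400 px plot and snaps every point to a fixed grid of 5 × 5 px cells; real renderers place symbols at arbitrary pixel positions, so the cell count is only an approximation of how many symbols are actually distinguishable. The function names are hypothetical.

```python
def max_distinct_symbols(plot_w, plot_h, sym_w, sym_h):
    """Upper bound on non-overlapping symbols that fit in the plot area."""
    return (plot_w // sym_w) * (plot_h // sym_h)


def occupied_symbol_cells(points, x_range, y_range, plot_px=400, sym_px=5):
    """Count distinct sym_px x sym_px screen cells hit by the points.

    Assumes a square plot and that every point lies inside x_range/y_range.
    """
    (x0, x1), (y0, y1) = x_range, y_range
    n_cells = plot_px // sym_px
    cells = set()
    for x, y in points:
        # data coordinates -> pixel coordinates
        px = (x - x0) / (x1 - x0) * plot_px
        py = (y - y0) / (y1 - y0) * plot_px
        # pixel coordinates -> symbol-sized cell; clamp the upper edge
        cx = min(int(px // sym_px), n_cells - 1)
        cy = min(int(py // sym_px), n_cells - 1)
        cells.add((cx, cy))
    return len(cells)
```

However many rows are passed in, `occupied_symbol_cells` can never exceed `max_distinct_symbols(400, 400, 5, 5)`, which is 6,400.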
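Binning: a data aggregation technique
- Used in statistical packages (R) and by SDSS
- 2-D histogram; the shade represents the number of points in the bin
- Outlier-preserving: every non-empty bin is drawn, so isolated points stay visible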
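A minimal sketch of 2-D binning, assuming the bin origin and bin sizes are supplied by the caller. Storing only non-empty bins in a sparse map is what keeps outliers: a lone point far from the bulk still produces its own bin with a count of 1. The function name `bin_2d` is hypothetical.

```python
import math
from collections import Counter


def bin_2d(points, x_min, y_min, binsize_x, binsize_y):
    """Aggregate (x, y) points into a sparse 2-D histogram.

    Returns {(ix, iy): count}. Only non-empty bins are stored, so an
    isolated outlier still yields its own bin and stays visible.
    """
    counts = Counter()
    for x, y in points:
        ix = math.floor((x - x_min) / binsize_x)
        iy = math.floor((y - y_min) / binsize_y)
        counts[(ix, iy)] += 1
    return dict(counts)
```

For example, three points clustered near the origin and one at (9.9, 9.9) with unit bins give `{(0, 0): 3, (9, 9): 1}`: the outlier survives aggregation.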
Color-color diagram created from the AllWISE Source Catalog (1-degree cone search, Lockman Hole). 46,475 data points are represented by 1,598 bins.
Same diagram with a different shading scheme: a darker shade represents 3.1 times more points.
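The slide does not say how the 3.1 factor is applied. One plausible reading, used here purely as an assumption, is a logarithmic scale in which each step to a darker shade covers 3.1 times more points than the previous step. The function `shade_level` and the cap of 8 levels are hypothetical.

```python
import math


def shade_level(count, factor=3.1, n_levels=8):
    """Shade index for a bin (0 = lightest).

    Assumes a log scale: one step darker per `factor` times more points.
    Levels beyond n_levels - 1 are clamped to the darkest shade.
    """
    if count < 1:
        raise ValueError("only non-empty bins are shaded")
    level = int(math.log(count) / math.log(factor))
    return min(level, n_levels - 1)
```

With this mapping, bins holding 1 to 3 points share the lightest shade, 4 to 9 points get the next one, and so on.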
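$x{:}y$: plot aspect ratio; $N_{\text{bins}}$: maximum number of bins
$$N_x = \left\lfloor \sqrt{N_{\text{bins}} \cdot [x{:}y]} \right\rfloor, \qquad N_y = \left\lfloor \sqrt{N_{\text{bins}} / [x{:}y]} \right\rfloor$$
$$\text{binsize}_x = \frac{x_{\max} - x_{\min}}{N_x} + \text{pad}_x, \qquad \text{binsize}_y = \frac{y_{\max} - y_{\min}}{N_y} + \text{pad}_y$$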
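The bin-grid formulas translate directly into code. In this sketch the padding values are parameters because the slide does not give them; `bin_grid` is a hypothetical name. The `int()` casts mirror the `(int)` truncation on the slide, which equals the floor for these positive values, so $N_x N_y \le N_{\text{bins}}$ always holds.

```python
import math


def bin_grid(x_min, x_max, y_min, y_max, n_bins, aspect, pad_x=0.0, pad_y=0.0):
    """Bins per axis and bin sizes for a plot with the given aspect ratio.

    aspect is the plot width:height ratio ([x:y]); n_bins is the maximum
    number of bins. Returns (nx, ny, binsize_x, binsize_y).
    """
    nx = int(math.sqrt(n_bins * aspect))   # truncation, as (int) on the slide
    ny = int(math.sqrt(n_bins / aspect))
    binsize_x = (x_max - x_min) / nx + pad_x
    binsize_y = (y_max - y_min) / ny + pad_y
    return nx, ny, binsize_x, binsize_y
```

For a square plot and 10,000 bins this gives a 100 × 100 grid; for a 2:1 plot it gives 141 × 70 = 9,870 bins. Without padding, a point exactly at $x_{\max}$ maps to index $N_x$, one past the last bin; a small positive pad is one way to keep it inside.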
Server side:
- Reduces transferred data size
- Used for larger tables (> 30,000 rows)
Client side:
- Reduces rendered data size
- Common plot operations (zoom, select) do not require a server call
- Used for smaller tables (up to 30,000 rows)
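The split can be sketched as a row-count threshold plus a client-side zoom that re-bins rows already held in memory. Only the 30,000-row threshold comes from the slide; the function names and the choice to re-bin just the rows inside the zoom box are illustrative assumptions.

```python
import math
from collections import Counter

CLIENT_SIDE_MAX_ROWS = 30_000  # threshold quoted on the slide


def binning_side(row_count):
    """Where to aggregate: small tables are binned on the client."""
    return "client" if row_count <= CLIENT_SIDE_MAX_ROWS else "server"


def zoom_rebin(rows, x_lo, x_hi, y_lo, y_hi, nx, ny):
    """Client-side zoom: re-bin only the (x, y) rows inside the zoom box.

    All rows are already on the client, so no server call is needed.
    """
    bx = (x_hi - x_lo) / nx
    by = (y_hi - y_lo) / ny
    counts = Counter()
    for x, y in rows:
        if x_lo <= x <= x_hi and y_lo <= y <= y_hi:
            # clamp so points on the upper edge land in the last bin
            ix = min(math.floor((x - x_lo) / bx), nx - 1)
            iy = min(math.floor((y - y_lo) / by), ny - 1)
            counts[(ix, iy)] += 1
    return dict(counts)
```

On the server side, only the bins (not the rows) would cross the network, which is where the reduction in transferred data comes from.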
1. Retrieve data from the low-level query and data service.
2. Apply dynamic filters (those set on the current table).
3. Apply the current sort order.
4. Aggregate data for visualization.
Policies:
- Stream table processing: one row at a time.
- Cache intermediate results (on the server).
- Fix the plot aspect ratio (on the client).
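The four steps and the server-side policies can be sketched as a small pipeline. This is a hypothetical illustration, not the archive's implementation: rows are dicts, filters are a hashable tuple of `(column, lo, hi)` range filters, and the cache is an unbounded in-memory dict keyed by the table view. Filtering and binning stream one row at a time; sorting is the one step that has to see every filtered row.

```python
import math
from collections import Counter


class PlotPipeline:
    """Hypothetical pipeline: retrieve -> filter -> sort -> aggregate.

    The filtered, sorted table is cached per (filters, sort column) key,
    so re-binning the same table view skips steps 1-3.
    """

    def __init__(self, fetch_rows):
        self.fetch_rows = fetch_rows  # step 1: callable returning an iterator of dict rows
        self._cache = {}              # (filters, sort_col) -> list of rows

    def _table(self, filters, sort_col):
        key = (filters, sort_col)     # filters must be a tuple to be hashable
        if key not in self._cache:
            # step 2: range filters applied while streaming, one row at a time
            rows = (r for r in self.fetch_rows()
                    if all(lo <= r[col] <= hi for col, lo, hi in filters))
            # step 3: sorting materializes the filtered rows
            if sort_col is None:
                self._cache[key] = list(rows)
            else:
                self._cache[key] = sorted(rows, key=lambda r: r[sort_col])
        return self._cache[key]

    def aggregate(self, filters, sort_col, xcol, ycol, x_min, y_min, binsize_x, binsize_y):
        # step 4: stream the cached table into bins, one row at a time
        counts = Counter()
        for r in self._table(filters, sort_col):
            ix = math.floor((r[xcol] - x_min) / binsize_x)
            iy = math.floor((r[ycol] - y_min) / binsize_y)
            counts[(ix, iy)] += 1
        return dict(counts)
```

Asking for a second binning of the same filtered and sorted view (for example, after the client changes the bin size) reuses the cached table instead of querying the data service again.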
Filtering from the image overlay: how do we find the matching rows? The aggregation parameters must be preserved.
Aggregation parameters:
- X, Y: column names or expressions
- Minimum values: $x_{\min}$, $y_{\min}$
- Step sizes: $\text{binsize}_x$, $\text{binsize}_y$
For each aggregated value:
- Bin index
- Number of points
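With these parameters kept alongside the aggregated data, any full-table row can be mapped back to its bin, so a selection can cross between the binned plot and the full table in either direction. This sketch assumes plain column names (expressions would need an evaluator) and dict rows; `AggregationParams`, `rows_in_bins` and `bins_of_rows` are hypothetical names.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class AggregationParams:
    x: str            # X column name (expressions not supported in this sketch)
    y: str            # Y column name
    x_min: float
    y_min: float
    binsize_x: float
    binsize_y: float

    def bin_index(self, row):
        """Recompute a row's bin with the same formula used to aggregate."""
        return (math.floor((row[self.x] - self.x_min) / self.binsize_x),
                math.floor((row[self.y] - self.y_min) / self.binsize_y))


def rows_in_bins(rows, bins, params):
    """Indices of full-table rows that fall into the selected bins."""
    wanted = set(bins)
    return [i for i, row in enumerate(rows) if params.bin_index(row) in wanted]


def bins_of_rows(rows, row_ids, params):
    """Bins to highlight for rows selected elsewhere (e.g. the image overlay)."""
    return {params.bin_index(rows[i]) for i in row_ids}
```

The key design point is that `bin_index` must use exactly the same minimum values, step sizes and rounding as the original aggregation; otherwise rows near bin edges can land in a different bin than the one they were counted in.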
- Binning is an efficient aggregation technique.
- Use client-side binning for smaller tables.
- Preserve aggregation parameters to move between aggregated and full data.
- Process one row at a time and cache on the server.
- Fix the aspect ratio on the client.