Knowledge Extraction from Scientific Data. Roy Williams, California Institute of Technology. SDMIV, 24 October 2002, Edinburgh. (Diagram labels: KE Tools, Data.)
Scientific Data. Datacubes: N-dimensional array (spectrum, time-series, image, voxels, hyperspectral image); concentration; pattern matching; integration. Event Sets: often derived from pattern matching; a set of events is a table; integrating event sets; clustering.
Knowledge Extraction. Concentration: principal components; cluster/outlier finding. Datacube, event set. Pattern matching: from theory or from a training set. Integration: registration of datacubes; join / crossmatch of event sets.
Datacube: some stars from the DPOSS survey.
Datacube: an AVIRIS image of San Francisco Bay; nm in 224 bands; R. Green, JPL. Figure label: atmospheric absorption.
Concentrating Information, e.g. Principal Component Analysis: given a set of vectors, compute dot products (same as correlations); diagonalize; throw out weaker (noise) components.
Information concentration: Principal Component Analysis.
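The PCA steps on these slides (dot products, diagonalize, discard weak components) can be sketched in a few lines. This is a minimal illustration, not the method used in the talk: it is pure Python, the five 2-D vectors are made up, the dot products are taken after mean-centering (which gives the covariance matrix), and power iteration stands in for a full diagonalization, so only the strongest component is recovered.

```python
import math

def center(vectors):
    """Subtract the per-dimension mean from every vector."""
    n, dim = len(vectors), len(vectors[0])
    means = [sum(v[j] for v in vectors) / n for j in range(dim)]
    return [[v[j] - means[j] for j in range(dim)] for v in vectors]

def dot_product_matrix(centered):
    """Dot products between dimensions of centered data (the covariance matrix)."""
    n, dim = len(centered), len(centered[0])
    return [[sum(v[i] * v[j] for v in centered) / (n - 1) for j in range(dim)]
            for i in range(dim)]

def leading_component(matrix, iterations=200):
    """Power iteration: a stand-in for diagonalization that finds only the
    eigenvector with the largest eigenvalue."""
    dim = len(matrix)
    vec = [1.0 / math.sqrt(dim)] * dim
    for _ in range(iterations):
        nxt = [sum(matrix[i][j] * vec[j] for j in range(dim)) for i in range(dim)]
        norm = math.sqrt(sum(x * x for x in nxt))
        if norm == 0.0:
            break
        vec = [x / norm for x in nxt]
    # Rayleigh quotient gives the eigenvalue for the converged unit vector.
    eigenvalue = sum(vec[i] * sum(matrix[i][j] * vec[j] for j in range(dim))
                     for i in range(dim))
    return eigenvalue, vec

# Made-up vectors lying close to the diagonal, plus a little noise.
vectors = [(1.0, 1.1), (2.0, 1.9), (3.0, 3.2), (4.0, 3.9), (5.0, 5.1)]
centered = center(vectors)
cov = dot_product_matrix(centered)
eigenvalue, component = leading_component(cov)

# Keeping only the projection onto the strong component discards the noise.
scores = [sum(a * b for a, b in zip(v, component)) for v in centered]
```

For this toy data the strong component carries nearly all of the variance, so each 2-D vector is concentrated into a single score with little loss.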
Event Sets: created by pattern matching (from a known rule or from a training set) or by finding clusters.
Event Set = Table. Column metadata: name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f; name=ID, content=key, units=none, datatype=char.
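One way to picture an event set as a table with per-column metadata is the sketch below. The `Column` class, the `validate_row` helper, the mapping of `char` and `double` to Python types, and the sample rows are all hypothetical; only the metadata field names and values (name, content, units, datatype) come from the slide.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Column:
    """Metadata for one column of an event set (field names from the slide)."""
    name: str
    content: str
    units: str
    datatype: str

# Assumed mapping from the slide's datatype names to Python types.
PYTHON_TYPES = {"char": str, "double": float}

columns = [
    Column(name="ID", content="key", units="none", datatype="char"),
    Column(name="longitude", content="Earth coordinate", units="degrees",
           datatype="double"),
]

def validate_row(row, columns):
    """Return True if the row has a value of the declared type for every column."""
    return all(
        col.name in row and isinstance(row[col.name], PYTHON_TYPES[col.datatype])
        for col in columns
    )

# Made-up events, one dictionary per table row.
events = [
    {"ID": "ev001", "longitude": 12.5},
    {"ID": "ev002", "longitude": -118.1},
]
all_valid = all(validate_row(row, columns) for row in events)
```

Carrying units and content alongside each column is what lets a downstream service decide whether two event sets can be joined without a human reading the documentation.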
Gravitational Lenses (A. Szalay, Johns Hopkins): pattern matching finds events in datacubes.
Black hole collisions. LIGO: Laser Interferometer Gravitational-Wave Observatory.
Creating Event Sets: given a set of volcanoes, find a lot more volcanoes. Here we use Singular Value Decomposition. Supervised classification.
Multiparameter data: colour-colour-f_X/f_opt (Mike Watson, Leicester University). Figure panels: all sources; high, medium and low f_X/f_opt. Symbols: X-ray source counterparts; contours: all optical objects. Region labels: stellar, galaxy, compact galaxy, active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?
Integrating Datacubes: find a mapping from one domain to the other. Registration of DPOSS and Hubble Deep Field.
Datacube Registration: movement of ice inferred from registration.
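Registration means finding the mapping that best aligns one datacube with another. The sketch below shows the simplest possible case, assuming the mapping is a pure integer translation of a 1-D signal and scoring each candidate shift by an unnormalized cross-correlation. The function name and the two signals are illustrative; real image registration would also handle rotation, scale, and sub-pixel offsets.

```python
def best_shift(reference, target, max_shift):
    """Return the integer shift s that maximizes sum(reference[i] * target[i + s])."""
    def score(shift):
        total = 0.0
        for i, value in enumerate(reference):
            j = i + shift
            if 0 <= j < len(target):  # ignore samples that fall off the edge
                total += value * target[j]
        return total
    return max(range(-max_shift, max_shift + 1), key=score)

# Made-up signals: the same peak, displaced by three samples in the target.
reference = [0, 0, 1, 5, 1, 0, 0, 0, 0, 0]
target = [0, 0, 0, 0, 0, 1, 5, 1, 0, 0]

shift = best_shift(reference, target, max_shift=4)
```

If the two signals were taken at different times, the recovered shift is the inferred displacement, which is the idea behind reading ice movement off a registration.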
Integrating Event Sets: database join; fuzzy join, e.g. astronomical crossmatch; distributed join (does the Grid do databases?).
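A fuzzy join differs from an ordinary database join in that keys match when they are close rather than equal. For an astronomical crossmatch the key is sky position, so a minimal sketch pairs each source in one catalog with the nearest source in the other, provided it lies within an angular tolerance. The catalogs, identifiers, and the 2 arcsecond tolerance below are made up, and the brute-force double loop is only suitable for small tables; large catalogs need a spatial index.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula); inputs in degrees."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def crossmatch(catalog_a, catalog_b, tolerance_deg):
    """For each (id, ra, dec) in catalog_a, find the nearest catalog_b source
    within tolerance_deg. Returns a list of (id_a, id_b, separation_deg)."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= tolerance_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches

# Made-up catalogs: (identifier, right ascension, declination) in degrees.
catalog_a = [("A1", 10.0000, 20.0000), ("A2", 150.5000, -5.2500), ("A3", 300.0, 45.0)]
catalog_b = [("B1", 10.0002, 20.0001), ("B2", 150.4999, -5.2502), ("B3", 80.0, 0.0)]

matches = crossmatch(catalog_a, catalog_b, tolerance_deg=2.0 / 3600.0)
matched_pairs = [(a, b) for a, b, _ in matches]
```

Here A1 and A2 find counterparts while A3 does not, which is the inner-join behaviour; a left join would keep A3 with an empty match.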
Integration of Star Catalogs
Visualizing Event Sets: unsupervised clustering; stars in color-color space.
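As an illustration of unsupervised clustering in a color-color plane, the sketch below runs a basic k-means on made-up two-color points. k-means is chosen here only because it is short; the slide does not say which clustering method was used. The evenly spaced initial centroids are a simplification that makes the example deterministic.

```python
import math

def kmeans(points, k, iterations=20):
    """Basic k-means. Returns (labels, centroids) for a list of coordinate tuples."""
    step = len(points) // k
    # Simplified deterministic start: evenly spaced points from the input list.
    centroids = [points[i * step] for i in range(k)]
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                  for p in points]
        # Update step: each centroid moves to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids

# Made-up (color1, color2) values forming a bluer group and a redder group.
colors = [(0.20, 0.10), (0.30, 0.20), (0.25, 0.15),
          (1.40, 0.90), (1.50, 1.00), (1.45, 0.95)]

labels, centroids = kmeans(colors, k=2)
```

Points that sit far from every centroid are natural outlier candidates, which ties clustering back to the cluster/outlier finding mentioned earlier.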
A Grid of Services: human gets data; network of services; understood by human; further processing after format change; grid of pipes and engines; switches and actuators; data flow.
Example Grid of Services: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator. Flexible complex metadata AND broadband binary.
Computing Challenges: high-dimensional clustering & classification; visualization; outlier detection; visualization of points; database access to points; large distributed join.
Standards needed: bundling diverse objects together with code and references; referencing data resources on the Grid (local, remote, replicated, ...).
Problem Solving Environment: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator. Plumbing (big data) and electrical (control, metadata); web service and workflow; finding service classes/implementations by semantics; GUI / Executive / IO adapters / Algorithms.