Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology SDMIV 24 October 2002 Edinburgh KE ToolsS Data.

1 Knowledge Extraction from Scientific Data Roy Williams California Institute of Technology roy@caltech.edu SDMIV 24 October 2002 Edinburgh KE ToolsS Data

2 Scientific Data. Datacubes: N-dimensional array (spectrum, time-series, image, voxels, hyperspectral image); concentration; pattern matching; integration. Event Sets: often derived from pattern matching; a set of events is a table; integrating event sets; clustering.

3 Knowledge Extraction. Concentration: principal components, cluster/outlier finding. Datacube → Eventset. Pattern matching: from theory or from training set. Integration: registration of datacubes, join / crossmatch of eventsets.

4 Datacube Some stars from the DPOSS survey

5 Datacube: An AVIRIS image of San Francisco Bay, 400-2500 nm in 224 bands (R. Green, JPL). Figure label: atmospheric absorption.

6 Concentrating Information, e.g. Principal Component Analysis: given a set of vectors, compute dot products (same as correlations), diagonalize, throw out weaker (noise) components.
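The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the author's code: the function name `pca_concentrate`, the synthetic data, and the choice to centre the vectors before taking dot products are all assumptions made for the example.

```python
import numpy as np

def pca_concentrate(vectors, n_keep):
    """Project row vectors onto their n_keep strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X_centered = X - X.mean(axis=0)          # remove the mean of each coordinate
    # Dot products between coordinates: the covariance matrix of the data.
    cov = X_centered.T @ X_centered / (X.shape[0] - 1)
    # Diagonalize; eigh returns eigenvalues in ascending order for symmetric input.
    eigvals, eigvecs = np.linalg.eigh(cov)
    order = np.argsort(eigvals)[::-1]        # strongest components first
    components = eigvecs[:, order[:n_keep]]  # throw out the weaker components
    return X_centered @ components, eigvals[order]

rng = np.random.default_rng(0)
t = rng.normal(size=(500, 1))
# Hypothetical data: 3-D points lying close to a single line, plus small noise.
data = t @ np.array([[1.0, 2.0, -1.0]]) + 0.01 * rng.normal(size=(500, 3))
scores, spectrum = pca_concentrate(data, n_keep=1)
```

Here `spectrum` holds the eigenvalues in decreasing order; a sharp drop after the first few values suggests that the remaining components carry mostly noise.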

7 Information concentration: Principal Component Analysis

8 Event Sets Created by pattern matching from a known rule from a training set by finding clusters

9 Event Set = Table
Column 1: name=longitude, content=Earth coordinate, units=degrees, datatype=double, display=f6.2; values: 43.4, 87.2, 83.2
Column 2: name=ID, content=key, units=none, datatype=char; values: E3948547, E3948545, E3943766
Size annotations: 10^8 ? 10^3 ?
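One way to picture an event set as a table with per-column metadata is the small sketch below. The class names `Column` and `EventSet` are hypothetical, and the pairing of each longitude with an ID simply follows the order in which the values appear on the slide, which is an assumption.

```python
from dataclasses import dataclass, field

@dataclass
class Column:
    """One table column plus the metadata fields named on the slide."""
    name: str
    content: str
    units: str
    datatype: str
    display: str = ""
    values: list = field(default_factory=list)

@dataclass
class EventSet:
    """A set of events stored column-wise; each row is one event."""
    columns: list

    def rows(self):
        # Zip the column value lists together to recover one tuple per event.
        return list(zip(*(c.values for c in self.columns)))

longitude = Column("longitude", "Earth coordinate", "degrees", "double",
                   "f6.2", [43.4, 87.2, 83.2])
event_id = Column("ID", "key", "none", "char",
                  values=["E3948547", "E3948545", "E3943766"])
events = EventSet([longitude, event_id])
```

Keeping units and datatype next to the values lets a downstream service check that two columns are comparable before attempting a join.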

10 Gravitational Lenses A. Szalay, Johns Hopkins Pattern matching finds events in datacubes

11 Black hole collisions. LIGO: Laser Interferometer Gravitational-Wave Observatory

12 Creating Event Sets: given a set of volcanoes, find a lot more volcanoes. Here we use Singular Value Decomposition. Supervised Classification
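The slide does not say how the Singular Value Decomposition is applied. One plausible reading, sketched below under that assumption, is to learn a low-dimensional subspace from the known examples and accept a candidate when it lies close to that subspace. The names `fit_subspace` and `residual`, the synthetic 16-pixel "patches", and the use of positive examples only are illustrative choices, not the method from the talk.

```python
import numpy as np

def fit_subspace(training, k):
    """Return the mean and the top-k right singular vectors of the training set."""
    X = np.asarray(training, dtype=float)
    mean = X.mean(axis=0)
    # Rows of vt are orthonormal directions, sorted by decreasing singular value.
    _, _, vt = np.linalg.svd(X - mean, full_matrices=False)
    return mean, vt[:k]

def residual(candidate, mean, basis):
    """Distance from a candidate to the learned subspace (small means similar)."""
    d = np.asarray(candidate, dtype=float) - mean
    projection = basis.T @ (basis @ d)
    return float(np.linalg.norm(d - projection))

rng = np.random.default_rng(1)
# Hypothetical training patches: mixtures of two templates plus small noise.
templates = rng.normal(size=(2, 16))
training = rng.normal(size=(40, 2)) @ templates + 0.01 * rng.normal(size=(40, 16))
mean, basis = fit_subspace(training, k=2)

like_training = np.array([0.5, -1.0]) @ templates  # built from the same templates
unrelated = rng.normal(size=16)                    # random vector, no shared structure
```

A candidate would be added to the event set when its residual falls below a threshold chosen on held-out examples; here `like_training` has a much smaller residual than `unrelated`.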

13 Multiparameter data: colour-colour-f_X/f_opt (Mike Watson, Leicester University). [Figure: symbols mark X-ray source counterparts; contours show all optical objects. Panels: all sources; stellar; galaxy; compact galaxy; high, medium and low f_X/f_opt. Labelled populations: active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]

14 Integrating Datacubes Find a mapping from one domain to the other Registration of DPOSS and Hubble Deep Field
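As a minimal sketch of finding such a mapping, the example below estimates a pure integer translation between two images by phase correlation. Real registration between surveys generally also needs rotation, scale and distortion terms; the translation-only model, the function name `estimate_shift` and the random test image are assumptions for illustration.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Return (dy, dx) such that np.roll(reference, (dy, dx), axis=(0, 1)) matches moved."""
    f_ref = np.fft.fft2(reference)
    f_mov = np.fft.fft2(moved)
    cross_power = f_mov * np.conj(f_ref)
    # Keep only the phase, so the inverse transform peaks sharply at the shift.
    cross_power /= np.maximum(np.abs(cross_power), 1e-12)
    correlation = np.fft.ifft2(cross_power).real
    peak = np.unravel_index(np.argmax(correlation), correlation.shape)
    # Peaks past the midpoint correspond to negative shifts (circular wrap-around).
    return tuple(int(p) if p <= s // 2 else int(p) - s
                 for p, s in zip(peak, correlation.shape))

rng = np.random.default_rng(2)
reference = rng.normal(size=(32, 32))                 # hypothetical image
moved = np.roll(reference, (3, -5), axis=(0, 1))      # same image, shifted
shift = estimate_shift(reference, moved)
```

The recovered `shift` is the mapping from the reference pixel grid to the moved one; applying its inverse brings the two datacubes into a common frame.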

15 Datacube Registration Movement of ice inferred from registration

16 Integrating Event Sets: Database Join; Fuzzy Join, e.g. astronomical crossmatch; Distributed Join: does the Grid do databases?
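A fuzzy join differs from an ordinary database join in that rows match when their keys are close rather than equal. The sketch below crossmatches two small source lists by angular separation on the sky. The catalog contents, the 2 arcsecond radius and the function names are hypothetical, and the brute-force double loop is chosen for clarity; it would not scale to full survey catalogs.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(a))))

def crossmatch(catalog_a, catalog_b, radius_arcsec):
    """Pair each source in catalog_a with its nearest catalog_b source within the radius."""
    radius_deg = radius_arcsec / 3600.0
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_deg(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_deg and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            # Report the separation in arcseconds alongside the matched IDs.
            matches.append((id_a, best[0], best[1] * 3600.0))
    return matches

# Hypothetical catalogs: (id, right ascension in degrees, declination in degrees).
optical = [("O1", 150.0000, 2.0000), ("O2", 150.1000, 2.1000)]
infrared = [("I1", 150.0002, 2.0001), ("I2", 151.0000, 3.0000)]
matches = crossmatch(optical, infrared, radius_arcsec=2.0)
```

Only `O1` and `I1` fall within the radius of each other, so the join yields a single pair; sources without a counterpart are dropped, as in an inner join.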

17 Integration of Star Catalogs

18 Visualizing Event Sets Unsupervised clustering 50000 stars in color-color space

19 A Grid of Services Human gets Data Network of Services Understood by human Further processing after format change Grid of pipes and engines Switches and actuators data flow

20 Example Grid of Services Storage Service DPOSS Service Catalog Service User’s code Crossmatch Service 2MASS Service Query Check Service Query Estimator flexible complex metadata AND broadband binary

21 Computing Challenges High-dimensional Clustering & Classification Visualization Outlier Detection Visualization of 10^10 points Database access to 10^10 points Large Distributed Join

22 Standards needed Bundling diverse objects together with code and references Referencing data resources on the Grid local, remote, replicated,....

23 Problem Solving Environment Storage Service DPOSS Service Catalog Service User’s code Crossmatch Service 2MASS Service Query Check Service Query Estimator Plumbing (big data) and electrical (control, metadata) Web service and workflow Finding service classes/implementations by semantics GUI / Executive / IO adapters / Algorithms

