Knowledge Extraction from Scientific Data
Roy Williams, California Institute of Technology
SDMIV, 24 October 2002, Edinburgh

Scientific Data
Datacubes: N-dimensional array
– spectrum, time-series
– image, voxels, hyperspectral image
– operations: concentration, pattern matching, integration
Event Sets: often derived from pattern matching; a set of events is a table
– operations: integrating event sets, clustering

Knowledge Extraction
– Concentration: principal components, cluster/outlier finding
– Datacube -> Eventset: pattern matching, from theory or from a training set
– Integration: registration of datacubes, join / crossmatch of eventsets

Datacube Some stars from the DPOSS survey

Datacube
An AVIRIS image of San Francisco Bay, nm in 224 bands (R. Green, JPL)
Figure label: atmospheric absorption

Concentrating Information
e.g. Principal Component Analysis
– Given a set of vectors
– Compute dot products (same as correlations)
– Diagonalize
– Throw out weaker (noise) components
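The steps on this slide can be sketched in a few lines of NumPy. This is a minimal illustration, not the code used for the talk: the function name `pca_concentrate`, the parameter `n_keep`, and the choice to mean-centre the vectors before taking dot products are all assumptions made for the example.

```python
import numpy as np

def pca_concentrate(vectors, n_keep):
    """Project vectors onto their n_keep strongest principal components."""
    X = np.asarray(vectors, dtype=float)
    X = X - X.mean(axis=0)                # assumption: centre each feature first
    # Dot products between features (the covariance matrix, up to scaling)
    C = X.T @ X / (X.shape[0] - 1)
    # Diagonalize: eigh suits symmetric matrices, eigenvalues come back ascending
    eigvals, eigvecs = np.linalg.eigh(C)
    order = np.argsort(eigvals)[::-1]     # strongest component first
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    # Throw out the weaker (noise) components
    components = eigvecs[:, :n_keep]
    scores = X @ components
    return scores, components, eigvals

# Hypothetical data: 200 points along one direction in 3-D, plus small noise
rng = np.random.default_rng(0)
direction = np.array([1.0, 2.0, 2.0]) / 3.0
data = np.outer(rng.normal(size=200), direction) + 0.01 * rng.normal(size=(200, 3))
scores, components, eigvals = pca_concentrate(data, n_keep=1)
```

Here almost all of the variance ends up in the first eigenvalue, so the 3-D vectors concentrate into a single number per point.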

Information concentration: Principal Component Analysis

Event Sets
Created by pattern matching
– from a known rule
– from a training set
or by finding clusters

Event Set = Table
Column metadata:
– name=longitude content=Earth coordinate units=degrees datatype=double display=f
– name=ID content=key units=none datatype=char
[Table body not recoverable from the slide]
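One way to picture "event set = table" is a list of rows plus per-column metadata like that shown on the slide. The sketch below is only an illustration: the class names `Column` and `EventSet`, the method names, and the row values are hypothetical, and the `display` attribute is left out because its value is truncated on the slide.

```python
from dataclasses import dataclass, field

@dataclass(frozen=True)
class Column:
    """Metadata for one column, using the attribute names from the slide."""
    name: str
    content: str
    units: str
    datatype: str

@dataclass
class EventSet:
    """A table: column metadata plus one tuple per event."""
    columns: list
    rows: list = field(default_factory=list)

    def add_event(self, **values):
        names = [c.name for c in self.columns]
        if set(values) != set(names):
            raise ValueError(f"expected columns {names}, got {sorted(values)}")
        self.rows.append(tuple(values[n] for n in names))

    def column(self, name):
        index = [c.name for c in self.columns].index(name)
        return [row[index] for row in self.rows]

events = EventSet(columns=[
    Column(name="ID", content="key", units="none", datatype="char"),
    Column(name="longitude", content="Earth coordinate", units="degrees", datatype="double"),
])
# Hypothetical rows, invented for the example
events.add_event(ID="ev-001", longitude=-122.4)
events.add_event(ID="ev-002", longitude=-3.19)
```

Keeping units and content type next to the values is what lets a downstream service interpret a column without a human reading the documentation.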

Gravitational Lenses (A. Szalay, Johns Hopkins)
Pattern matching finds events in datacubes

Black hole collisions
LIGO: Laser Interferometer Gravitational-Wave Observatory

Creating Event Sets
– Given a set of volcanoes, find a lot more volcanoes
– Here we use Singular Value Decomposition
– Supervised classification

Multiparameter data: colour-colour-f_X/f_opt (Mike Watson, Leicester University)
[Figure: X-ray source counterparts (symbols) over all optical objects (contours). Panels: all sources; high, medium and low f_X/f_opt. Labelled classes: stellar, galaxy, compact galaxy, active dM stars, BLAGN, NELGs, possible hi-z quasar, F/G stars?, normal galaxies?]

Integrating Datacubes
– Find a mapping from one domain to the other
– Registration of DPOSS and Hubble Deep Field
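As a toy instance of finding a mapping between two datacubes, the sketch below recovers a pure integer translation between two images from the peak of their cross-correlation. Real survey registration, such as DPOSS against the Hubble Deep Field, would also involve rotation, scale and distortion. The function name `estimate_shift`, the periodic-boundary assumption and the random test image are all assumptions made for the example.

```python
import numpy as np

def estimate_shift(reference, moved):
    """Estimate the integer (row, col) shift that maps reference onto moved.

    Assumes a pure translation with periodic (wrap-around) boundaries.
    """
    ref = np.asarray(reference, dtype=float)
    mov = np.asarray(moved, dtype=float)
    ref = ref - ref.mean()
    mov = mov - mov.mean()
    # Circular cross-correlation computed via the FFT
    cross = np.fft.ifft2(np.fft.fft2(mov) * np.conj(np.fft.fft2(ref))).real
    peak = np.unravel_index(np.argmax(cross), cross.shape)
    # Peaks past the half-way point correspond to negative shifts
    return tuple(int(p) if p <= n // 2 else int(p) - n
                 for p, n in zip(peak, cross.shape))

# Hypothetical test image: random pixels, shifted by a known amount
rng = np.random.default_rng(0)
image = rng.random((32, 32))
shifted = np.roll(image, shift=(3, -5), axis=(0, 1))
recovered = estimate_shift(image, shifted)
```

With the shift known, the two images can be resampled onto a common pixel grid and compared pixel by pixel.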

Datacube Registration
Movement of ice inferred from registration

Integrating Event Sets
– Database join
– Fuzzy join, e.g. astronomical crossmatch
– Distributed join: does the Grid do databases?
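A fuzzy join differs from an ordinary database join in that rows match when their keys are close rather than equal. For an astronomical crossmatch, "close" means a small angular separation on the sky. The sketch below is a brute-force illustration in plain Python; the function names, the `(id, ra, dec)` tuple layout, the nearest-neighbour rule and the sample coordinates are assumptions for the example, and a production crossmatch would use a spatial index rather than comparing every pair.

```python
import math

def angular_separation_arcsec(ra1, dec1, ra2, dec2):
    """Great-circle separation in arcseconds (haversine formula, degrees in)."""
    ra1, dec1, ra2, dec2 = map(math.radians, (ra1, dec1, ra2, dec2))
    a = (math.sin((dec2 - dec1) / 2) ** 2
         + math.cos(dec1) * math.cos(dec2) * math.sin((ra2 - ra1) / 2) ** 2)
    return math.degrees(2 * math.asin(math.sqrt(min(1.0, a)))) * 3600.0

def crossmatch(catalog_a, catalog_b, radius_arcsec):
    """For each source in catalog_a, return its nearest catalog_b source within the radius."""
    matches = []
    for id_a, ra_a, dec_a in catalog_a:
        best = None
        for id_b, ra_b, dec_b in catalog_b:
            sep = angular_separation_arcsec(ra_a, dec_a, ra_b, dec_b)
            if sep <= radius_arcsec and (best is None or sep < best[1]):
                best = (id_b, sep)
        if best is not None:
            matches.append((id_a, best[0], best[1]))
    return matches

# Hypothetical catalogs: (id, right ascension, declination), both in degrees
catalog_a = [("a1", 10.0, 20.0), ("a2", 50.0, -30.0)]
catalog_b = [("b1", 10.0, 20.0 + 1.0 / 3600.0), ("b2", 200.0, 5.0)]
matches = crossmatch(catalog_a, catalog_b, radius_arcsec=2.0)
```

Only `a1` and `b1` pair up here, one arcsecond apart; `a2` has no counterpart within the radius and drops out, as in an inner join.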

Integration of Star Catalogs

Visualizing Event Sets
Unsupervised clustering: stars in color-color space
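The slide does not say which clustering method produced the figure. As one common possibility, the sketch below runs k-means on points in a two-colour plane. The choice of k-means, the farthest-point initialisation, the function names and the sample colours are all assumptions made for the example.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, n_iter=20):
    """Plain k-means with deterministic farthest-point initialisation."""
    centers = [points[0]]
    while len(centers) < k:
        # Next centre: the point farthest from every centre chosen so far
        centers.append(max(points,
                           key=lambda p: min(squared_distance(p, c) for c in centers)))
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assign each point to its nearest centre
        labels = [min(range(k), key=lambda j: squared_distance(p, centers[j]))
                  for p in points]
        # Move each centre to the mean of its members
        for j in range(k):
            members = [p for p, label in zip(points, labels) if label == j]
            if members:
                centers[j] = tuple(sum(c) / len(members) for c in zip(*members))
    return labels, centers

# Hypothetical colour-colour values for six stars, forming two groups
colours = [(0.10, 0.20), (0.20, 0.10), (0.15, 0.15),
           (1.00, 1.10), (1.10, 1.00), (1.05, 1.05)]
labels, centers = kmeans(colours, k=2)
```

Plotting each star with its label as the marker colour gives the kind of colour-colour view the slide describes.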

A Grid of Services
– Human gets data
– Network of services
– Understood by human
– Further processing after format change
– Grid of pipes and engines
– Switches and actuators
– Data flow

Example Grid of Services
Services: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator
Flexible complex metadata AND broadband binary

Computing Challenges
– High-dimensional clustering & classification
– Visualization
– Outlier detection
– Visualization of points
– Database access to points
– Large distributed join

Standards needed
– Bundling diverse objects together with code and references
– Referencing data resources on the Grid: local, remote, replicated, ...

Problem Solving Environment
Services: Storage Service, DPOSS Service, Catalog Service, User’s code, Crossmatch Service, 2MASS Service, Query Check Service, Query Estimator
– Plumbing (big data) and electrical (control, metadata)
– Web service and workflow
– Finding service classes/implementations by semantics
– GUI / Executive / IO adapters / Algorithms