1
A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D
2
ROADMAP: Background Information; Interesting Astronomy Data Mining Problems; What has / has not been done (Literature review); My project objectives; The problem of Alignment in astronomy catalogs; The Fundamental Plane; A case study for recreating the Fundamental Plane from astronomy catalogs; Experimental Results; Efforts towards building Web services
3
Background Information: Next-generation astronomy catalogs will contain data for most of the sky. Existing astronomy sky surveys include SDSS, 2MASS, FIRST, etc. They hold terabytes to petabytes of data: a data avalanche in astronomy. Getting useful information is like looking for a needle in a haystack. The National Virtual Observatory (NVO) has been set up to facilitate scientific discovery. There is an obvious need for Distributed Data Mining.
4
What kinds of Data Mining activities are astronomers interested in? Detection of transient objects such as supernovae (online transient object detection in real time). Obtaining statistics of variable and moving objects (model variability, refine existing models, fit models to irregularly sampled data). Parameterizing shapes of objects using rotationally invariant quantities. Efficient cluster and outlier detection. Supervised Data Mining problems (match objects detected in multiple bands, derive photometric redshifts).
5
What has / has not been done: Many efforts in centralized data mining (NVO, FMass, Class X, FIRST, etc.). Some grid mining (notably the GRIST project). Very few distributed data mining efforts, all in their preliminary stages ( http://www.cs.queensu.ca/home/mcconell/DDMAstro.html )
6
Objectives of this project: Alignment of catalogs (the Fundamental Plane problem). Implementation of algorithms for Distributed Data Mining on astronomy catalogs. Development of web services for the catalogs, and investigation into what needs to be done to integrate this into the NVO.
7
Alignment of Astronomy Catalogs: Cross matching is a nontrivial problem in itself. We assume cross matching happens offline and that there exists an indexing scheme by which catalogs know the exact cross-matched tuples.
8
Some interesting numbers: The current SDSS catalogs are 3.0 TB in size and contain about 180 million objects (as per Data Release 4). 2MASS has already observed 99% of the sky and reports 470,992,970 point sources and 1,647,599 extended sources. [Figure: portion of the sky observed by SDSS]
9
Problems: Cross matching is an inherently difficult problem for the astronomy catalogs. We assume data sets are cross matched and that this computation is done offline. This is a strong assumption and may often not be acceptable to astronomers.
10
A real-life cross matching exercise, problems encountered: Which catalogs to use? We tried several: SDSS, 2MASS, HyperLeda, and the CfA Redshift Catalog. Catalogs have different indexing schemes: more recent ones use HTM (Hierarchical Triangular Mesh), others use (ra, dec) or even names of objects. Some attributes are simply not available (SDSS has -9999 for most of its redshift values). Different catalogs observe different portions of the sky (SDSS covers only about 16% of the sky in the latest release, while 2MASS covers the entire sky), so select subsets to cross match wisely.
11
The successful cross matching: Chose a region of the sky between 0 and 15 degrees (dec) and 150 and 200 degrees (ra), observed by both SDSS and 2MASS. Used a web interface provided by SDSS to do the cross matching. Selected the K-band for obtaining redshift and surface brightness (astronomical significance). Case study: a centralized database of 1249 cross-matched objects. Attributes are size, surface brightness, and velocity dispersion. This does not really make a case for a distributed data mining scenario. Solution: try a larger subset of the data from both catalogs.
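The slide does not say how the SDSS web interface matches objects, so the following is only a minimal sketch of positional cross matching under stated assumptions: each catalog is a NumPy array of (ra, dec) pairs in degrees, the region bounds are the ones quoted above, and the 2 arcsecond match radius and all function names are hypothetical.

```python
import numpy as np

def in_region(ra, dec, ra_range=(150.0, 200.0), dec_range=(0.0, 15.0)):
    """Boolean mask for objects inside the (ra, dec) box, in degrees."""
    ra = np.asarray(ra, dtype=float)
    dec = np.asarray(dec, dtype=float)
    return ((ra >= ra_range[0]) & (ra <= ra_range[1]) &
            (dec >= dec_range[0]) & (dec <= dec_range[1]))

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees, using the haversine formula."""
    ra1, dec1, ra2, dec2 = map(np.radians, (ra1, dec1, ra2, dec2))
    a = (np.sin((dec2 - dec1) / 2) ** 2 +
         np.cos(dec1) * np.cos(dec2) * np.sin((ra2 - ra1) / 2) ** 2)
    return np.degrees(2 * np.arcsin(np.sqrt(np.clip(a, 0.0, 1.0))))

def cross_match(cat_a, cat_b, radius_arcsec=2.0):
    """Naive O(n*m) nearest-neighbour match; returns (i, j) index pairs.

    The match radius is an illustrative assumption, not a value from the slides.
    """
    pairs = []
    for i, (ra, dec) in enumerate(cat_a):
        sep = angular_separation_deg(ra, dec, cat_b[:, 0], cat_b[:, 1])
        j = int(np.argmin(sep))
        if sep[j] * 3600.0 <= radius_arcsec:
            pairs.append((i, j))
    return pairs

# Made-up coordinates, for illustration only.
cat_a = np.array([[160.0, 5.0], [180.0, 10.0], [210.0, 5.0]])
cat_b = np.array([[180.0001, 10.0001], [160.0, 5.0002], [100.0, 1.0]])

# Restrict both catalogs to the shared region before matching.
a = cat_a[in_region(cat_a[:, 0], cat_a[:, 1])]
b = cat_b[in_region(cat_b[:, 0], cat_b[:, 1])]
pairs = cross_match(a, b)
print(pairs)
```

The quadratic loop is only workable for small subsets; at catalog scale a spatial index such as the HTM scheme mentioned earlier would be the natural replacement.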
12
The Fundamental Plane: An interesting problem in astronomy is identifying correlations in high-dimensional spaces. For the class of elliptical and spiral galaxies, the observed features are radius, mean surface brightness, and central velocity dispersion. A two-dimensional plane exists in the observed space of 3D parameters, called THE FUNDAMENTAL PLANE.
13
An illustration of the Fundamental Plane
14
Experimental Results: The first PC captured 69.4193% of the variance. The second PC captured 12.1333% of the variance, for a combined 81.5526%. The astronomy literature suggests the 1st and 2nd PCs together should capture about 88% of the variance. This is a reasonably close recreation of the Fundamental Plane from two cross-matched data sets in the centralized setting.
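As a rough illustration of how such percentages are obtained, the sketch below computes the fraction of variance captured by each principal component from the eigenvalues of the sample covariance matrix. The data here is synthetic (three features generated from two latent factors plus noise), not the catalog data, and whether the original study standardized the attributes first is not stated, so this sketch only mean-centers them.

```python
import numpy as np

def explained_variance_ratio(X):
    """Fraction of total variance captured by each principal component."""
    Xc = X - X.mean(axis=0)                   # mean-center each attribute
    cov = Xc.T @ Xc / (X.shape[0] - 1)        # sample covariance matrix
    eigvals = np.linalg.eigvalsh(cov)[::-1]   # eigenvalues, largest first
    return eigvals / eigvals.sum()

# Synthetic stand-in: 1249 objects, 3 observed features lying close to a plane.
rng = np.random.default_rng(0)
n = 1249
latent = rng.normal(size=(n, 2))
mixing = np.array([[1.0, 0.5, -0.3],
                   [0.2, 1.0, 0.8]])
X = latent @ mixing + 0.1 * rng.normal(size=(n, 3))

ratio = explained_variance_ratio(X)
print("per-component ratio:", ratio)
print("first two PCs together:", ratio[:2].sum())
```

When the first two components carry most of the variance, the points lie close to a plane in the three-dimensional feature space, which is the sense in which the Fundamental Plane is "recreated".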
15
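Algorithm for Distributed Covariance Computation (A and B denote both the two sites and the n-row data matrices they hold): (1) A central coordination site S sends A and B a random number generation seed. (2) A and B each generate the same n × l random matrix R, where l << n. (3) A and B send R^T A and R^T B to S. (4) S computes (R^T A)^T (R^T B) / l as an estimate of A^T B; normalizing by l assumes the entries of R are i.i.d. with zero mean and unit variance.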
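The steps above can be sketched as follows. This is a minimal illustration under stated assumptions, not the presenter's implementation: the entries of R are drawn from a standard normal distribution (the slide does not name a distribution), each site mean-centers its own columns, the function names are hypothetical, and the data is synthetic. The sketch normalizes the product by l, since the expected value of R R^T is l times the identity for zero-mean, unit-variance entries.

```python
import numpy as np

def project(local_data, seed, l):
    """Run at a data site: rebuild R from the shared seed, return R^T @ data."""
    n = local_data.shape[0]
    R = np.random.default_rng(seed).standard_normal((n, l))  # n x l, assumed N(0, 1)
    return R.T @ local_data                                   # l x (local attributes)

def combine(proj_a, proj_b, l):
    """Run at the coordinator S: estimate A^T B from the two projections."""
    return proj_a.T @ proj_b / l

# Synthetic stand-in: 1249 aligned tuples, 2 attributes at site A, 1 at site B.
rng = np.random.default_rng(1)
n, l, seed = 1249, 300, 12345
z = rng.standard_normal((n, 3))
A = z[:, :2]
B = (0.8 * z[:, 0] + 0.6 * z[:, 2]).reshape(n, 1)

# Each site can center its own columns without seeing the other site's data.
A = A - A.mean(axis=0)
B = B - B.mean(axis=0)

cov_est = combine(project(A, seed, l), project(B, seed, l), l) / (n - 1)
cov_exact = A.T @ B / (n - 1)   # needs both data sets in one place
print("estimated cross-covariance:", cov_est.ravel())
print("exact cross-covariance:    ", cov_exact.ravel())
```

Each site ships l rows instead of n, so communication shrinks by a factor of about n / l, at the price of an estimation error that decreases as l grows.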
16
Experimental Results (Distributed Setting): Case study with 1249 cross-matched tuples at each of sites A and B; 2 attributes at site A and 1 attribute at site B
17
More results
18
Development of a Web Service: Architecture of the Proposed System [Diagram components: Client; Site A; Site B; Web Service for Distributed Covariance Computation; SOAP message]
19
Current Implementation: Uses Apache Axis (a SOAP engine, i.e. a framework for building SOAP processors such as clients and servers), Tomcat version 4.1, and SOAP version 1.2. Short demo. Further system development issues (use of SOAP with attachments).
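The slides do not show the service interface, so the following is only a sketch of what a SOAP 1.2 request to such a service might look like, built with Python's standard library rather than Axis. The service namespace, the operation name `computeCovariance`, and its `seed` and `projectionDim` parameters are all hypothetical; only the SOAP 1.2 envelope namespace is taken from the standard.

```python
import xml.etree.ElementTree as ET

SOAP12_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SVC_NS = "urn:example:distributed-covariance"          # hypothetical service namespace

def build_request(seed, l):
    """Build a SOAP 1.2 envelope for a hypothetical computeCovariance operation."""
    ET.register_namespace("env", SOAP12_NS)
    envelope = ET.Element(f"{{{SOAP12_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP12_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}computeCovariance")
    # Parameter names below are illustrative assumptions.
    ET.SubElement(op, f"{{{SVC_NS}}}seed").text = str(seed)
    ET.SubElement(op, f"{{{SVC_NS}}}projectionDim").text = str(l)
    return ET.tostring(envelope, encoding="unicode")

request_xml = build_request(seed=42, l=300)
print(request_xml)
```

Large projected matrices would not fit comfortably inside the XML body, which is presumably why SOAP with attachments is listed as a further development issue.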
20
QUESTIONS ?