A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D.


1 A Web service for Distributed Covariance Computation on Astronomy Catalogs Presented by Haimonti Dutta CMSC 691D

2 ROADMAP • Background Information • Interesting Astronomy Data Mining Problems • What has / has not been done (Literature review) • My project objectives • The problem of Alignment in astronomy catalogs • The Fundamental Plane • A case study for recreating the Fundamental Plane from astronomy catalogs • Experimental Results • Efforts towards building Web services

3 Background Information • Next-generation astronomy catalogs will contain data for most of the sky • Existing astronomy sky surveys: SDSS, 2MASS, FIRST, etc. • Terabytes and petabytes of data • Data avalanche in astronomy • Getting useful information is like looking for a needle in a haystack • The National Virtual Observatory (NVO) has been set up to facilitate scientific discovery • Obvious need for Distributed Data Mining

4 What kind of Data Mining activities are astronomers interested in? • Detection of transient objects such as supernovae (online transient object detection in real time) • Obtain statistics of variable and moving objects (model variability, refine existing models, fit models to irregularly sampled data) • Parameterize shapes of objects using rotationally invariant quantities • Efficient cluster and outlier detection • Supervised Data Mining problems (match objects detected in multiple bands, derive photometric redshifts)

5 What has / has not been done • Many efforts in centralized data mining (NVO, FMass, Class X, FIRST, etc.) • Some grid mining (notably the GRIST project) • Very few distributed data mining efforts, all in their preliminary stages ( http://www.cs.queensu.ca/home/mcconell/DDMAstro.html )

6 Objectives of this project • Alignment of catalogs (the Fundamental Plane problem) • Implementation of algorithms for Distributed Data Mining on astronomy catalogs • Development of web services for the catalogs / investigation into what needs to be done to integrate this into the NVO

7 Alignment of Astronomy Catalogs Cross matching is a non-trivial problem in itself. We assume cross matching happens offline and that there exists an indexing scheme by which catalogs know the exact cross-matched tuples.

8 Some interesting numbers • Size of current SDSS catalogs: 3.0 TB, containing about 180 million objects (as per Data Release 4) • 2MASS has already observed 99% of the sky and reports 470,992,970 point sources and 1,647,599 extended sources [Figure: Portion of the sky observed by SDSS]

9 Problems • Cross matching is an inherently difficult problem for astronomy catalogs • We assume data sets are cross matched and this computation is done offline • This is a strong assumption and may often be unacceptable to astronomers

10 A real-life cross matching exercise: problems encountered • Which catalogs to use? • We tried several: SDSS, 2MASS, HyperLeda, CfA RedShift Catalog • Catalogs have different indexing schemes – more recent ones use HTM (Hierarchical Triangular Mesh), others use (ra, dec) or even names of objects • Some attributes are simply not available (SDSS has -9999 for most of its redshift values) • Different catalogs observe different portions of the sky (SDSS covers only about 16% of the sky in the latest release while 2MASS covers the entire sky) – select subsets to cross match wisely!

11 The successful cross matching • Chose a region of the sky between 0 and 15 degrees (dec) and 150 and 200 degrees (ra), observed by both SDSS and 2MASS • Used a web interface provided by SDSS to do the cross matching • Selected the K-band for obtaining redshift and surface brightness (astronomical significance) Case Study • Centralized database of 1249 cross-matched objects • Attributes are size, surface brightness, velocity dispersion • Does not really make a case for a distributed data mining scenario! Solution: try a larger subset of the data from both catalogs
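The region selection and positional matching described on slides 10 and 11 can be illustrated with a small sketch. This is not the SDSS web interface used in the project; it is a brute-force stand-in, and the record layout (`id`, `ra`, `dec` keys), the function names and the 1 arcsecond tolerance are all assumptions made for illustration.

```python
import math

def angular_separation_deg(ra1, dec1, ra2, dec2):
    """Great-circle separation in degrees (haversine formula)."""
    r1, d1, r2, d2 = map(math.radians, (ra1, dec1, ra2, dec2))
    h = (math.sin((d2 - d1) / 2) ** 2
         + math.cos(d1) * math.cos(d2) * math.sin((r2 - r1) / 2) ** 2)
    return math.degrees(2 * math.asin(min(1.0, math.sqrt(h))))

def in_region(obj, ra_range=(150.0, 200.0), dec_range=(0.0, 15.0)):
    """True if the object lies in the sky region quoted on the slide."""
    return (ra_range[0] <= obj["ra"] <= ra_range[1]
            and dec_range[0] <= obj["dec"] <= dec_range[1])

def cross_match(cat_a, cat_b, tol_deg=1.0 / 3600):
    """Pair each object of cat_a with its nearest neighbour in cat_b,
    keeping the pair only if the separation is within tol_deg.
    The tolerance (1 arcsecond) is an assumed value."""
    b_in = [b for b in cat_b if in_region(b)]
    pairs = []
    for a in (a for a in cat_a if in_region(a)):
        best, best_sep = None, None
        for b in b_in:
            sep = angular_separation_deg(a["ra"], a["dec"], b["ra"], b["dec"])
            if best_sep is None or sep < best_sep:
                best, best_sep = b, sep
        if best is not None and best_sep <= tol_deg:
            pairs.append((a["id"], best["id"]))
    return pairs
```

The nested loop costs time proportional to the product of the catalog sizes, which is acceptable for a few thousand objects but not for full surveys; spatial indexes such as HTM exist to avoid exactly this.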

12 The Fundamental Plane • Interesting problem in astronomy: identify correlations in high-dimensional spaces • For the class of elliptical and spiral galaxies • Observed features – radius, mean surface brightness and central velocity dispersion • A two-dimensional plane, called the Fundamental Plane, exists in the observed 3D parameter space

13 An illustration of the Fundamental Plane

14 Experimental Results • First PC captured 69.4193% of the variance • Second PC captured 12.1333% of the variance (81.5526% together) • The astronomy literature suggests the 1st and 2nd PCs together should capture about 88% of the variance • Reasonably close recreation of the Fundamental Plane from two cross-matched data sets in the centralized setting
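For readers unfamiliar with how "variance captured by a PC" is obtained, the following is a minimal sketch: build the covariance matrix of the attributes, then take each leading eigenvalue as a fraction of the total variance (the trace). Power iteration with deflation is used here only to keep the example dependency-free; the slides do not say which eigen-solver the project used, and all function names are hypothetical.

```python
import math
import random

def covariance(X):
    """Sample covariance matrix (m x m) of an n x m data matrix."""
    n, m = len(X), len(X[0])
    mean = [sum(row[j] for row in X) / n for j in range(m)]
    return [[sum((r[p] - mean[p]) * (r[q] - mean[q]) for r in X) / (n - 1)
             for q in range(m)] for p in range(m)]

def top_eigenpair(C, iters=1000, seed=0):
    """Largest eigenvalue and eigenvector of a symmetric PSD matrix
    by power iteration from a seeded random start vector."""
    rng = random.Random(seed)
    m = len(C)
    v = [rng.uniform(0.5, 1.0) for _ in range(m)]
    norm = math.sqrt(sum(x * x for x in v))
    v = [x / norm for x in v]
    for _ in range(iters):
        w = [sum(C[p][q] * v[q] for q in range(m)) for p in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:  # nothing left to extract
            return 0.0, v
        v = [x / norm for x in w]
    lam = sum(v[p] * sum(C[p][q] * v[q] for q in range(m)) for p in range(m))
    return lam, v

def explained_variance_fractions(X, k):
    """Fraction of total variance captured by each of the first k PCs."""
    C = covariance(X)
    m = len(C)
    total = sum(C[i][i] for i in range(m))
    fractions = []
    for _ in range(k):
        lam, v = top_eigenpair(C)
        fractions.append(lam / total)
        # Deflate: remove the component just found before the next pass.
        C = [[C[p][q] - lam * v[p] * v[q] for q in range(m)] for p in range(m)]
    return fractions
```

If three attributes lay exactly on a plane, the first two fractions would sum to 1; real data scatters around the plane, so the sum falls short of 1, as it does in the figures reported on the slide.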

15 Algorithm for Distributed Covariance Computation • A central coordination site S sends A and B a random number generation seed • A and B each generate the same n × l random matrix R, where l << n • A and B send R^T A and R^T B to S • S computes (R^T A)^T (R^T B) / n
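The steps on slide 15 can be sketched in a few lines. The sketch assumes that A and B are n-row data matrices held at the two sites (rows are the cross-matched objects) and that the entries of R are drawn independently from a standard normal distribution; the slides do not state the distribution. Under that assumption the product is divided by l, which makes it an estimate of A^T B; the divisor n on the slide may reflect a different scaling of R or the extra 1/n of a covariance. Function names are hypothetical.

```python
import random

def random_matrix(seed, n, l):
    """n x l matrix of i.i.d. N(0, 1) entries, reproducible from the seed,
    so both sites can build the same R without exchanging it."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(l)] for _ in range(n)]

def project(R, X):
    """Site-side step: return R^T X (l x m) for an n x m data matrix X."""
    n, l, m = len(R), len(R[0]), len(X[0])
    return [[sum(R[i][k] * X[i][j] for i in range(n)) for j in range(m)]
            for k in range(l)]

def combine(PA, PB):
    """Coordinator step: (R^T A)^T (R^T B) / l, an estimate of A^T B."""
    l = len(PA)
    return [[sum(PA[k][p] * PB[k][q] for k in range(l)) / l
             for q in range(len(PB[0]))] for p in range(len(PA[0]))]

def estimate_cross_product(A, B, seed, l):
    """Simulate both sites and the coordinator in one process."""
    n = len(A)
    PA = project(random_matrix(seed, n, l), A)  # computed at site A
    PB = project(random_matrix(seed, n, l), B)  # computed at site B
    return combine(PA, PB)                      # computed at site S
```

Each site ships l rows instead of n, so communication shrinks when l << n, at the price of a noisier estimate; for mean-centered columns, dividing the result by n gives the cross-covariance between the attributes at A and those at B.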

16 Experimental Results – Distributed Setting Case Study • 1249 cross-matched objects at sites A and B • 2 attributes at site A and 1 attribute at site B

17 More results

18 Development of a Web Service Architecture of the Proposed System [Diagram labels: CLIENT; SITE A; SITE B; WEB SERVICE for Distributed Covariance Computation; SOAP message]

19 Current Implementation • Using Apache Axis (SOAP engine – a framework for building SOAP processors such as clients and servers) • Tomcat version 4.1 • SOAP version 1.2 • Short demo • Further system development issues (use of SOAP with Attachments)
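To give a feel for what a client might send to such a service, here is a sketch that builds a SOAP 1.2 request envelope. Only the envelope namespace comes from the SOAP 1.2 standard; the service namespace, the operation name `computeCovariance` and its two parameters are hypothetical, since the slides do not describe the actual message format. The project itself used Apache Axis on the Java side; Python is used here only for brevity.

```python
import xml.etree.ElementTree as ET

SOAP_NS = "http://www.w3.org/2003/05/soap-envelope"  # SOAP 1.2 envelope namespace
SVC_NS = "urn:example:distributed-covariance"        # hypothetical service namespace

def build_request(seed, projection_size):
    """Return a SOAP 1.2 envelope (as a string) asking the service to run
    the covariance computation with the given seed and projection size l.
    Operation and parameter names are illustrative assumptions."""
    ET.register_namespace("env", SOAP_NS)
    envelope = ET.Element(f"{{{SOAP_NS}}}Envelope")
    body = ET.SubElement(envelope, f"{{{SOAP_NS}}}Body")
    op = ET.SubElement(body, f"{{{SVC_NS}}}computeCovariance")
    ET.SubElement(op, f"{{{SVC_NS}}}seed").text = str(seed)
    ET.SubElement(op, f"{{{SVC_NS}}}projectionSize").text = str(projection_size)
    return ET.tostring(envelope, encoding="unicode")
```

The projected matrices themselves can be large, which is presumably why the slide lists SOAP with Attachments as a further issue: bulk numeric data travels more naturally as an attachment than inside the XML body.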

20 QUESTIONS ?

