Data Avalanche in Astronomy


DISTRIBUTED DATA MINING ON ASTRONOMY CATALOGS

Data Avalanche in Astronomy
Astronomy sky surveys (SDSS, 2MASS) observe galaxies, quasars, stars, and serendipity objects.
Raw data from the telescope is pre-processed.
Each object has hundreds of attributes.
National Virtual Observatory: develop an information technology infrastructure for enabling easy access to distributed astronomy catalogs.

Cross Matching: Alignment of Astronomy Catalogs

Catalog P
Tuple ID   Join Attribute (X)   A
P1         X1                   A1
P2         X2                   A2
P3         X3

Catalog Q
Tuple ID   Join Attribute (X)   B
Q1         X3                   B1
Q2         X2                   B2
Q3                              B4
Q4         X1                   B3

The Matched Catalog
Join Attribute (X)   A    B
X1                   A1   B3
X2                   A2   B2
                          B4
X3                        B1

Distributed PCA Algorithm
Data matrix: Site A holds an n x p matrix and Site B an n x q matrix, with p + q = m (the total number of attributes).
1. Normalize the data at the respective sites without any communication.
2. A central coordination site S sends A and B a random number generation seed.
3. A and B generate an l x n random matrix R (the elements of R are i.i.d. and chosen from any distribution with mean 0 and variance 1).
4. A sends RA and B sends RB to S.
5. Compute D = (RA)^T (RB) / l.
E[D] = E[A^T (R^T R) B / l] = A^T E[R^T R] B / l ≈ A^T B (Johnson-Lindenstrauss lemma)

The Fundamental Plane of Galaxies
Objective: finding correlations in high-dimensional spaces.
Domain knowledge: for the class of elliptical galaxies, observe the parameters Surface Brightness, Log (Velocity Dispersion), and Log (Radius).
A 2D plane exists in the observed space of parameters, called the Fundamental Plane.
[Figure labels: Mass / Luminosity / Radius]

Experimental Results
[Figure labels: Velocity Dispersion, Surface Brightness]

The Distributed Problem
2MASS: Mean Surface Brightness (Kmsb)
SDSS: Red Shift (rs), Angular Effective Radius (Iaer), Velocity Dispersion (vd)
Build a distributed Principal Component Analysis algorithm.

Assumptions:
1. Build the cross-matched table off-line.
2. Compute indices and send them to the sites.

The Virtual Table
Columns: Kmsb, Velocity Dispersion, (Angular Eff. Radius x Red Shift)

Work done by Haimonti Dutta, Chris Giannella, Kirk Borne, Ran Wolff and Hillol Kargupta
NSF Grants: IIS , IIS , IIS and NASA Grant NAS
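
The random-projection step of the Distributed PCA Algorithm can be illustrated with a small sketch. It is not the authors' implementation: the function names (`random_matrix`, `site_projection`, `estimate_cross_product`), the toy data, and the choice of a standard normal distribution for R are all assumptions made for illustration (the poster only requires mean 0 and variance 1). The per-site normalization step is omitted for brevity. Because the entries of R are i.i.d. with mean 0 and variance 1, E[R^T R] = l I, so the expectation of D equals A^T B and a single draw of D approximates it.

```python
import random

def random_matrix(rows, cols, seed):
    """rows x cols matrix of i.i.d. N(0, 1) entries (Gaussian is an assumption)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(cols)] for _ in range(rows)]

def transpose(X):
    return [list(col) for col in zip(*X)]

def matmul(X, Y):
    """Plain-Python product of X (a x b) and Y (b x c)."""
    cols_Y = list(zip(*Y))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_Y] for row in X]

def site_projection(local_data, l, seed):
    """Run at each site: rebuild R from the shared seed and return R * local_data."""
    n = len(local_data)              # number of cross-matched objects
    R = random_matrix(l, n, seed)    # identical at both sites, since the seed is shared
    return matmul(R, local_data)

def estimate_cross_product(RA, RB):
    """Run at the coordinator S: D = (RA)^T (RB) / l."""
    l = len(RA)
    D = matmul(transpose(RA), RB)
    return [[v / l for v in row] for row in D]

# Hypothetical toy data: n = 20 matched objects, 1 attribute at site A, 2 at site B.
data_rng = random.Random(0)
A = [[data_rng.gauss(0.0, 1.0)] for _ in range(20)]
B = [[2.0 * a[0] + data_rng.gauss(0.0, 0.1), data_rng.gauss(0.0, 1.0)] for a in A]

# l is chosen large here only to make the demo estimate tight.
seed, l = 42, 4000
RA = site_projection(A, l, seed)     # sent by site A to S
RB = site_projection(B, l, seed)     # sent by site B to S
D = estimate_cross_product(RA, RB)   # estimate of A^T B computed at S
exact = matmul(transpose(A), B)      # needs both raw tables; shown for comparison only
print("estimate:", D[0])
print("exact:   ", exact[0])
```

In this sketch the coordinator only ever sees the projected matrices RA and RB, never the raw columns. Each site sends l rows instead of n, so a smaller l would reduce communication at the price of a noisier estimate.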

