Download presentation
Presentation is loading. Please wait.
Published byBethany Wilkins Modified over 9 years ago
1
Ames Research Center NAS Division 1 Experience With NASA’s Grid Miner Thomas H. Hinke NASA Ames Research Center Moffett Field, California, USA
2
Ames Research Center NAS Division 2 Outline Why use the grid for data mining? Overview of Grid Miner Experience adapting existing stand-along miner to grid A recent application of the Grid Miner
3
Ames Research Center NAS Division 3 Grid Provides Computational Power Grid couples needed computational power to data –NASA has a large volume of data stored in its distributed archives E.g., In the Earth Science area, the Earth Observing System Data and Information System (EOSDIS) holds large volume of data at multiple archives –Data archives are not designed to support user processing –Grids, coupled to archives, could provide such a computational capability for users
4
Ames Research Center NAS Division 4 Grid Provides Re-Usable Functions Grid-provided functions do not have to be re-implemented for each new mining system –Single sign-on security –Ability to execute jobs at multiple remote sites –Ability to securely move data between sites –Broker to determine best place to execute mining job –Job manager to control mining jobs Mining system developers do not have to re-implement common grid services Mining system developers can focus on the mining applications and not the issues associated with distributed processing
5
Ames Research Center NAS Division 5 Grid Will Provide Re-usable Services In the future, Grid/Web services will provide the ability to create reusable services that can facilitate the development of data mining systems –Builds on the web services work from the e- commerce area Service interface is defined through WSDL (Web Services Description Language) Standard access protocol is SOAP (Simple Object Access Protocol)
6
Ames Research Center NAS Division 6 Grid Services: A Foundation for Grid Mining Global Grid Forum working groups on –Open Grid Services Architecture (OGSA) standard under development to specify a grid-enabled web services architecture. See “Physiology of the Grid: An Open Grid Services Architecture for Distributed Systems Integration” –Open Grid Services Infrastructure (OGSI) standard has been released. Specifies common interfaces that all grid services should support.
7
Ames Research Center NAS Division 7 Grid Mining and OGSA/OGSI An OGSA/OGSI compliant mining service could be build Mining applications could be built by re- using capabilities provided by existing grid services.
8
Ames Research Center NAS Division 8 Outline Why use the grid for data mining? Overview of Grid Miner Experience adapting existing stand-along miner to grid A recent application of the Grid Miner
9
Ames Research Center NAS Division 9 Grid Miner Developed as one of the early applications on the IPG –Helped debug the IPG –Provided basis for satisfying a major IPG milestones IPG is NASA implementation of Globus-based Grid Provides basis for what could be an on-going Grid Mining Service
10
Grid Miner Operations Preprocessed Data Preprocessed Data Translated Data Patterns/ Models Patterns/ Models Results Output GIF Images HDF-EOS HDF Raster Images HDF SDS Polygons (ASCII, DXF) SSM/I MSFC Brightness Temp TIFF Images Others... PreprocessingAnalysis Clustering K Means Isodata Maximum Pattern Recognition Bayes Classifier Min. Dist. Classifier Image Analysis Boundary Detection Cooccurrence Matrix Dilation and Erosion Histogram Operations Polygon Circumscript Spatial Filtering Texture Operations Genetic Algorithms Neural Networks Others... Selection and Sampling Subsetting Subsampling Select by Value Coincidence Search Grid Manipulation Grid Creation Bin Aggregate Bin Select Grid Aggregate Grid Select Find Holes Image Processing Cropping Inversion Thresholding Others... Input HDF HDF-EOS GIF PIP-2 SSM/I Pathfinder SSM/I TDR SSM/I NESDIS Lvl 1B SSM/I MSFC Brightness Temp US Rain Landsat ASCII Grass Vectors (ASCII Text) Intergraph Raster Others... Figure thanks to Information and Technology Laboratory at the University of Alabama in Huntsville
11
Ames Research Center NAS Division 11 Mining on the Grid Grid Mining Agent IPG Processor Satellite Data Archive X Satellite Data Archive Y Grid Mining Agent IPG Processor Grid Mining Agent IPG Processor
12
Ames Research Center NAS Division 12 Grid Miner Architecture Grid Mining Agent IPG Processor Mining Database Daemon Control Database IPG Processor Grid Mining Agent IPG Processor Mining Operations Repository IPG Processor Data Archive X Satellite Data Archive Y Miner Confiig Server IPG Processor
13
Ames Research Center NAS Division 13 Example: Mining for Mesoscale Convective Systems Image shows results from mining SSM/I data
14
Ames Research Center NAS Division 14 Outline Why use the grid for data mining? Overview of Grid Miner Experience adapting existing stand-along miner to grid A recent application of the Grid Miner
15
Ames Research Center NAS Division 15 Starting Point for Grid Miner Grid Miner reused code from object-oriented ADaM data mining system –Developed under NASA grant at the University of Alabama in Huntsville, USA –Implemented in C++ as stand-alone, objected-oriented mining system Runs on NT, IRIX, Linux –Has been used to support research personnel at the Global Hydrology and Climate Center and a few other sites. Object-oriented nature of ADaM provided excellent base for enhancements to transform ADaM into Grid Miner
16
Ames Research Center NAS Division 16 Transforming Stand-Alone Data Miner into Grid Miner Original stand-alone miner had 459 C++ classes. Had to make small modifications to ADaM –Modified 5 existing classes –Added 3 new classes Grid commands added for –Staging miner agent to remote sites –Moving data to mining processor
17
Ames Research Center NAS Division 17 Staging Data Mining Agent to Remote Processor globusrun -w -r target_processor '&(executable=$(GLOBUSRUN_GASS_U RL)# path_to_agent)(arguments=arg1 arg2 … argN)(minMemory=500)'
18
Ames Research Center NAS Division 18 Moving Data to be Mined gsincftpget remote_processor local_directory remote_file
19
Ames Research Center NAS Division 19 Outline Why use the grid for data mining? Overview of Grid Miner Experience adapting existing stand-along miner to grid A recent application of the Grid Miner
20
Ames Research Center NAS Division 20 Demonstrate Grid Support for Interdisciplinary Earth Science Research Goal: Combine data from two distinctly different instruments (stored on two different grid-connected mass storage systems) to produce new insights by looking at data covering the same time and place across data from the two different instruments. Approach: –Use Grid Miner to mine TMI data for mesoscale convective systems. –Generate feature index (convex hull polygon) for all mesoscale convective systems found. –Transmit polygons in form of XML document to subsetter. –Subset CERES SSF data that corresponds to mesoscale convective systems discovered by Grid Miner
21
Ames Research Center NAS Division 21 Desired Processing Pattern Data Archive Network Subsetting Data Mining NASA Atmospheric Sciences Data Center Data Archived at NASA Ames Grid Processing To User Ideally Grid Miner would use grid resources co-located with the data, but if not, could use available remote grid resources
22
The Details Data Cache Grid Processor CERES Data Archive TMI Data on Mass Store Storage Resource Broker Grid Mining Agent IPG Processor Subsetter Grid Processor LaRC Atmospheric Sciences Data Center IPG MSCP (2) sends mining plan and (5) retrieves Feature Index (MSCP) Miner-Subsetter Control Program Grid Processor Miner Executables IPG Processor IPG Broker IPG Processor (1) Broker Selects IPG Resource for Mining (3) MSCP transfers GridAgent to Mining Site using Job Manager (not shown) (4) GridAgent Transfers Mining Ops (6) MSCP Starts Subsetter on Feature Index (5)
23
Ames Research Center NAS Division 23 Example of Data Being Mined 230 MB contained in 15 orbit files for one day of TMI (TRMM [Tropical Rainfall Measuring Mission] Microwave Imager) data Much higher resolution data exists with significantly higher volume.
24
Mining and Subsetting Results Grid Miner produced XML (Extensible Markup Language) document of polygons that circumscribe mesoscale convective systems. (MCSs) The following shows a portion of the XML description for two of the 64 vertices that comprise the convex hull polygon produced for the third MCS found by the miner in TMI data for April 1, 1998: 2450904.754815 1998-04-01 GMT 06:06:56 2083.126221 2 64 -2.26 -178.28 -2.08 -178.38. Grid Miner produced view of area mined using TMI data. CERES SSF footprints for April 1, 1998 hour 6 corresponding to the third MCS found by Grid Miner Convex Hull: 3 with 9 Footprint: 115804 Footprint: 215805 Footprint: 316090 Footprint: 416091 Footprint: 516094 Footprint: 616376 Footprint: 716377 Footprint: 816381 Footprint: 916382
25
Ames Research Center NAS Division 25 Current Status Currently works on the IPG as a prototype system User documentation underway Data archives need to be grid-enabled –Connected to the grid –Provide controlled access to data on tertiary storage E.g., by using a system such as the Storage Resource Broker that was developed at the San Diego Super Computer Center Some earlier-adopter users need to be found to begin using the Grid Miner –Willing to code any new operations needed for their applications –Willing to work with system with prototype-level documentation
Similar presentations
© 2025 SlidePlayer.com. Inc.
All rights reserved.