Published by Eugenia Walsh. Modified over 8 years ago.
1
KDD Services at the Goddard DAAC
Robert Mack
NASA/Goddard Space Flight Center Distributed Active Archive Center
07/07/2016
2
Introduction
- As data volumes get larger, the proportion of data that can be distributed to users decreases
- User communities express concern about the ability to manage the data explosion on their end
- Many of these users apply Knowledge Discovery in Databases (KDD) techniques to large volumes of data (up to 1 TB) received from the GES DAAC
- Rapid advances in computer power (CPU) are enabling increases in data processing that are outpacing tape drive performance and network capacity
3
Volume Distributed and Archived
4
Mitigation
- Migrate more data mining and mining preparation activities into the data center in order to reduce the data volume that needs to be distributed
- Offers the user a more useful and manageable product
5
TRMM Data Mining
- Archive Data Mining “Campaigns”
  - Schedule a wholesale retrieval of specific data products and offer users the opportunity to extract information from the data being retrieved
  - First campaign: November 2000 to January 2001
    - 6 mining algorithms
    - 3 years of TRMM data
    - 3.31 TB to 450 GB (7.4 to 1)
  - Second campaign: September 2001 to present
    - 8 mining algorithms
    - 4+ years of TRMM data
    - 4 TB to 300 GB (13 to 1)
- Subscription Data Mining
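The reduction ratios quoted for the two campaigns can be checked with a few lines of arithmetic. This is only a sketch: the function name `reduction_ratio` is made up for illustration, and it assumes decimal units (1 TB = 1000 GB), which is the reading that reproduces the 7.4 to 1 figure.

```python
GB_PER_TB = 1000  # assumption: decimal units, not stated on the slide


def reduction_ratio(input_gb: float, output_gb: float) -> float:
    """Return how many GB of input were read per GB of mined output."""
    if output_gb <= 0:
        raise ValueError("output volume must be positive")
    return input_gb / output_gb


# Volumes taken from the slide: (input in GB, output in GB)
campaigns = {
    "first campaign": (3.31 * GB_PER_TB, 450),
    "second campaign": (4 * GB_PER_TB, 300),
}

for name, (input_gb, output_gb) in campaigns.items():
    print(f"{name}: {reduction_ratio(input_gb, output_gb):.1f} to 1")
```

With these inputs the first campaign comes out at about 7.4 to 1 and the second at about 13.3 to 1, which the slide rounds to 13 to 1.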
6
Simple, Scalable Script-based Science Processor (S4P)
- Data driven: when data show up, they are automatically processed
- Work is executed at stations in response to work orders
- Output work orders are sent to downstream stations
- Or, a straightforward implementation of a data flow diagram
[Diagram: Station 1, Station 2, and Station 3 linked by work orders]
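The station and work-order idea can be illustrated with a toy, in-memory model. This is not S4P code and does not reflect its actual interfaces; the `Station` class, `run_pipeline`, the handler functions, and the file-name suffixes are all hypothetical, chosen only to show work orders flowing from one station to the next.

```python
from collections import deque
from typing import Callable, Dict, List, Tuple

WorkOrder = Dict[str, str]
# A handler processes one work order and returns (downstream station, new order) pairs.
Handler = Callable[[WorkOrder], List[Tuple[str, WorkOrder]]]


class Station:
    """Hypothetical station: an inbox of work orders plus a handler."""

    def __init__(self, name: str, handler: Handler) -> None:
        self.name = name
        self.handler = handler
        self.inbox: deque = deque()


def run_pipeline(stations: Dict[str, Station]) -> List[str]:
    """Process work orders until every inbox is empty; return a processing log."""
    log: List[str] = []
    progressed = True
    while progressed:
        progressed = False
        for station in stations.values():
            while station.inbox:
                order = station.inbox.popleft()
                log.append(f"{station.name}:{order['file']}")
                # Output work orders go to the named downstream stations.
                for target, out_order in station.handler(order):
                    stations[target].inbox.append(out_order)
                progressed = True
    return log


def build_demo() -> Dict[str, Station]:
    """Three stations chained as in a simple data flow diagram."""
    return {
        "station1": Station("station1", lambda o: [("station2", {"file": o["file"] + ".sub"})]),
        "station2": Station("station2", lambda o: [("station3", {"file": o["file"] + ".mined"})]),
        "station3": Station("station3", lambda o: []),  # end of the flow
    }


stations = build_demo()
stations["station1"].inbox.append({"file": "granule_A"})  # data "shows up"
demo_log = run_pipeline(stations)
print(demo_log)
```

Nothing is scheduled explicitly in this sketch: a station does work only because a work order arrived in its inbox, which is the data-driven behavior the slide describes.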
7
KDD System Architecture
8
Mining Algorithm Code Conventions
- Languages supported: C, Fortran and IDL
- Non-interactive/command-line
- Input/output filenames
- Empty data files
- Null-output KDD product files
9
Mining Algorithm Code Integration
- Code analysis
- Unit test
- Benchmarks
  - Execution time
  - Computer resource use
  - Output data volumes
- Integrate into KDD subsystem
- Process 1 day's worth of data and wait for user approval
- Process 1 month's worth of data and wait for user approval
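Two of the benchmark quantities listed above, execution time and output data volume, can be measured with a small wrapper around a command-line algorithm. This is a sketch under assumptions, not the DAAC's benchmarking procedure: the `benchmark` function and its return keys are invented for illustration, the "algorithm" is a stand-in Python one-liner, and computer resource use (CPU, memory) is left out because measuring it portably needs platform-specific tools.

```python
import subprocess
import sys
import tempfile
import time
from pathlib import Path
from typing import Dict, List


def benchmark(cmd: List[str], output_dir: Path) -> Dict[str, float]:
    """Run a non-interactive command and report wall time and output volume."""
    start = time.perf_counter()
    completed = subprocess.run(cmd, check=False)
    elapsed = time.perf_counter() - start
    # Output data volume: total size of all files left in the output directory.
    output_bytes = sum(p.stat().st_size for p in output_dir.rglob("*") if p.is_file())
    return {
        "returncode": completed.returncode,
        "seconds": elapsed,
        "output_bytes": output_bytes,
    }


def run_demo() -> Dict[str, float]:
    """Benchmark a stand-in 'algorithm' that writes a 1024-byte product file."""
    with tempfile.TemporaryDirectory() as tmp:
        out_dir = Path(tmp)
        out_file = out_dir / "product.dat"
        stand_in = "import sys\nwith open(sys.argv[1], 'wb') as f:\n    f.write(b'x' * 1024)\n"
        return benchmark([sys.executable, "-c", stand_in, str(out_file)], out_dir)


if __name__ == "__main__":
    print(run_demo())
```

Running the same wrapper on one day and then one month of input would give the figures a user needs before approving each step.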
10
Data Mining Algorithms
- Fire Algorithm by L. Giglio & J. Kendall (SSAI)
  - Looks for fires in VIRS visible-infrared radiance
- Coincidence Subsetting by C. Kummerow (CSU)
  - Extracts surface rainfall from PR rain product coincident with rain gauge data
- Content-Based Subsetting by E. Anagnostou (UCONN)
  - 4 algorithms extract information for 8 regions from microwave, precipitation radar, and multi-instrument rainfall products
11
VIRS Active-Fire Product
- Statistical summary product
  - 0.25° (or coarser) spatial resolution
  - Monthly (or longer) temporal resolution
- Normalized (important!)
- Unique (not total) fire pixels per month
  - Accounts for multiple fire observations due to repeated overpasses and extensive overlapping of pixels at edge of scan
- Non-biomass-burning high temperature sources separated
  - Volcanoes, gas flares
- Diurnal burning cycle parameters (possible new data layer)
  - Time, amplitude, and width of peak
  - Experimental (under development)
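The difference between unique and total fire pixels can be shown with a small gridding sketch. This is not the Giglio and Kendall algorithm: the function name, the way a "pixel" is identified (snapping coordinates to a hypothetical 0.02° footprint), and the sample coordinates are all assumptions made for illustration. Only the 0.25° cell size comes from the slide.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Set, Tuple

CELL_DEG = 0.25   # grid cell size from the slide
PIXEL_DEG = 0.02  # hypothetical pixel footprint, used only to de-duplicate

Cell = Tuple[int, int]


def unique_fire_counts(
    detections: Iterable[Tuple[float, float]],
) -> Dict[Cell, Tuple[int, int]]:
    """Map each grid cell to (unique fire pixels, total fire detections)."""
    pixels: Dict[Cell, Set[Cell]] = defaultdict(set)
    totals: Dict[Cell, int] = defaultdict(int)
    for lat, lon in detections:
        cell = (math.floor(lat / CELL_DEG), math.floor(lon / CELL_DEG))
        pixel = (math.floor(lat / PIXEL_DEG), math.floor(lon / PIXEL_DEG))
        pixels[cell].add(pixel)  # repeated observations of a pixel collapse here
        totals[cell] += 1
    return {cell: (len(pixels[cell]), totals[cell]) for cell in totals}


# One month of made-up detections (lat, lon); the first location is seen twice,
# as would happen with repeated overpasses or overlapping scan edges.
month = [(10.01, 20.01), (10.01, 20.01), (10.05, 20.05), (10.30, 20.01)]
counts = unique_fire_counts(month)
print(counts)
```

In the sample month, the cell containing the repeated location reports 2 unique pixels from 3 detections, which is the kind of double counting the unique-pixel statistic is meant to remove.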
12
VIRS Active-Fire Product: Example
[Figure: Unique Fire Pix, September 2000]
13
Future Directions
- The third TRMM data mining campaign is tentatively scheduled to start June 2002
- Global Merge IR data mining will begin this month
- Provide the capability to mine MODIS data from the ECS data pool
14
Near-line Archive Data Mining (NADM)
- Web-based GUI
- Interactively execute algorithm code on selected data in the ECS data pool
- Set up ongoing mining subscriptions for data as it becomes available in the ECS data pool
- Pick up the generated products via FTP
- Develop, test and execute data mining algorithms without the data center's intervention (Spring 2002)
- Use predefined data mining algorithms to mine data from the pool (Spring 2002)
15
NADM Architecture
[Architecture diagram. Labels: Web Client, Web Server, Master, Subscribe, Monitor (On/Off), Trigger, Cleanup, Subscriptions, Data Pool, Disk Pool, DB; per-user areas /user1, /user2, ... /userN, each with Trigger, Queue, Algorithm, Export]