KDD Services at the Goddard DAAC
Robert Mack, NASA/Goddard Space Flight Center Distributed Active Archive Center
07/07/2016

2 Introduction
• As data volumes grow, the proportion of the data that can be distributed to users decreases.
• User communities are concerned about their ability to manage the data explosion on their end.
• Many of these users apply Knowledge Discovery in Databases (KDD) techniques to large volumes of data (up to 1 TB) received from the GES DAAC.
• Rapid advances in computing power (CPU) are enabling growth in data processing that outpaces tape drive performance and network capacity.

3 Volume Distributed and Archived

4 Mitigation
• Migrate more data mining and mining-preparation activities into the data center to reduce the data volume that must be distributed.
• This offers users a more useful and manageable product.

5 TRMM Data Mining
• Archive data mining "campaigns"
  – Schedule a wholesale retrieval of specific data products and offer users the opportunity to extract information from the data as it is retrieved.
• First campaign: November 2000 to January 2001
  – 6 mining algorithms
  – 3 years of TRMM data
  – 3.31 TB reduced to 450 GB (7.4 to 1)
• Second campaign: September 2001 to present
  – 8 mining algorithms
  – 4+ years of TRMM data
  – 4 TB reduced to 300 GB (13 to 1)
• Subscription data mining
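The reduction ratios quoted for the two campaigns are the volume read from the archive divided by the volume actually distributed. A small Python check, assuming decimal units (1 TB = 1000 GB); the function name is illustrative only:

```python
def reduction_ratio(input_gb: float, output_gb: float) -> float:
    """GB of archive data read per GB of mined product distributed."""
    if output_gb <= 0:
        raise ValueError("output volume must be positive")
    return input_gb / output_gb


# Campaign volumes from the slide, in GB (assumes 1 TB = 1000 GB).
campaigns = {
    "First campaign": (3310.0, 450.0),
    "Second campaign": (4000.0, 300.0),
}
for name, (in_gb, out_gb) in campaigns.items():
    print(f"{name}: {reduction_ratio(in_gb, out_gb):.1f} to 1")
```

The second ratio works out to about 13.3, which the slide rounds to 13 to 1.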

6 Simple, Scalable Script-based Science Processor (S4P)
• Data driven: when data show up, they are automatically processed.
• Work is executed at stations in response to work orders.
• Output work orders are sent to downstream stations.
(Diagram: Station 1, Station 2 and Station 3 passing work orders downstream.)
• In other words, S4P is a straightforward implementation of a data flow diagram.
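The data-driven station model can be sketched in a few lines of Python. This is a hypothetical in-memory illustration of the idea, not S4P itself: the `Station`, `WorkOrder` and `run` names, the job types and the file name are all invented for the example, and a real deployment would pass work orders between separate processes rather than Python queues.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List


@dataclass
class WorkOrder:
    job_type: str  # hypothetical: names the kind of work requested
    payload: str   # hypothetical: e.g. the data file to process


@dataclass
class Station:
    name: str
    # Turns one incoming work order into zero or more output work orders.
    handler: Callable[[WorkOrder], List[WorkOrder]]
    inbox: Deque[WorkOrder] = field(default_factory=deque)


def run(stations: Dict[str, Station], routes: Dict[str, str]) -> List[str]:
    """Process work orders until every station's inbox is empty.

    routes maps an output job_type to the station that receives it;
    job types with no route are final and go no further.
    """
    log: List[str] = []
    busy = True
    while busy:
        busy = False
        for station in stations.values():
            while station.inbox:
                busy = True
                order = station.inbox.popleft()
                log.append(f"{station.name}: {order.job_type} {order.payload}")
                for out in station.handler(order):
                    target = routes.get(out.job_type)
                    if target is not None:
                        stations[target].inbox.append(out)
    return log


stations = {
    "ingest": Station("ingest", lambda wo: [WorkOrder("MINE", wo.payload)]),
    "mine": Station("mine", lambda wo: [WorkOrder("EXPORT", wo.payload + ".kdd")]),
    "export": Station("export", lambda wo: []),
}
routes = {"MINE": "mine", "EXPORT": "export"}

# Data arriving is what triggers work: drop an order into the first station.
stations["ingest"].inbox.append(WorkOrder("INGEST", "granule_001.hdf"))
for line in run(stations, routes):
    print(line)
```

Each station only knows how to turn one work order into the next, and the routing table is the data flow diagram, which is what keeps the design simple to extend.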

7 KDD System Architecture

8 Mining Algorithm Code Conventions
• Languages supported
  – C, Fortran and IDL
• Non-interactive, command-line execution
• Input/output filenames
• Empty data files
• Null-output KDD product files
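Taken together, these conventions describe a program that runs unattended, takes its file names from the command line, and copes with empty input and empty output. Supported languages were C, Fortran and IDL; the Python sketch below only illustrates the calling pattern. It assumes one plausible reading of the last two bullets (an empty input file is not an error, and a product file is still written when nothing is found), and the threshold "mining" step is hypothetical.

```python
import sys
from pathlib import Path
from typing import List

THRESHOLD = 300.0  # hypothetical detection threshold for the toy mining step


def mine(values: List[float]) -> List[float]:
    """Toy stand-in for a real mining algorithm: keep values above THRESHOLD."""
    return [v for v in values if v > THRESHOLD]


def main(argv: List[str]) -> int:
    # Non-interactive: all inputs come from the command line, no prompts.
    if len(argv) != 3:
        print(f"usage: {argv[0]} INPUT_FILE OUTPUT_FILE", file=sys.stderr)
        return 2
    in_path, out_path = Path(argv[1]), Path(argv[2])
    # An empty input file is not an error; it simply yields no values.
    values = [float(tok) for tok in in_path.read_text().split()]
    hits = mine(values)
    # Always write the product file, even with no records ("null output"),
    # so a run that found nothing can be told apart from a failed run.
    out_path.write_text("".join(f"{v}\n" for v in hits))
    return 0
```

A standalone script would end with `if __name__ == "__main__": sys.exit(main(sys.argv))`, so the data center can launch it with nothing but two file names.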

9 Mining Algorithm Code Integration
• Code analysis
• Unit test
• Benchmarks
  – Execution time
  – Computer resource use
  – Output data volumes
• Integrate into the KDD subsystem
• Process one day's worth of data and wait for user approval
• Process one month's worth of data and wait for user approval
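The benchmark figures feed the one-day and one-month approval steps: a measured one-day run can be scaled up to estimate what a month or a full campaign will cost before committing to it. A minimal sketch, assuming cost grows linearly with the number of days processed; the `Benchmark` and `project` names and the measurements are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Benchmark:
    cpu_hours: float  # execution time
    output_gb: float  # product volume written


def project(one_day: Benchmark, days: int) -> Benchmark:
    """Scale a one-day benchmark linearly to `days` of input.

    Linear scaling is an assumption: it ignores start-up overhead and
    day-to-day variation in how much the algorithm finds.
    """
    return Benchmark(one_day.cpu_hours * days, one_day.output_gb * days)


one_day = Benchmark(cpu_hours=0.5, output_gb=0.2)  # hypothetical measurements
for label, days in [("one day", 1), ("one month", 30), ("three years", 3 * 365)]:
    est = project(one_day, days)
    print(f"{label}: about {est.cpu_hours:.1f} CPU hours, {est.output_gb:.1f} GB")
```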

10 Data Mining Algorithms
• Fire algorithm by L. Giglio & J. Kendall (SSAI)
  – Looks for fires in VIRS visible/infrared radiances
• Coincidence subsetting by C. Kummerow (CSU)
  – Extracts surface rainfall from the PR rain product coincident with rain gauge data
• Content-based subsetting by E. Anagnostou (UCONN)
  – 4 algorithms extract information for 8 regions from microwave, precipitation radar, and multi-instrument rainfall products
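Coincidence subsetting keeps only the satellite pixels that fall near a ground reference. The sketch below is not Kummerow's algorithm; it assumes a purely spatial match (PR pixels within a hypothetical 5 km of a gauge, by great-circle distance) and ignores the time matching a real gauge comparison would need. Field names, units and sample points are invented.

```python
import math
from typing import List, NamedTuple, Tuple

EARTH_RADIUS_KM = 6371.0


class PrPixel(NamedTuple):
    lat: float
    lon: float
    surface_rain: float  # assumed units: mm/h


class Gauge(NamedTuple):
    gauge_id: str
    lat: float
    lon: float


def distance_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance between two points (haversine formula)."""
    p1, p2 = math.radians(lat1), math.radians(lat2)
    dp = p2 - p1
    dl = math.radians(lon2 - lon1)
    a = math.sin(dp / 2) ** 2 + math.cos(p1) * math.cos(p2) * math.sin(dl / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(min(1.0, math.sqrt(a)))


def coincident(pixels: List[PrPixel], gauges: List[Gauge],
               max_km: float = 5.0) -> List[Tuple[str, PrPixel]]:
    """Keep only PR pixels within max_km of some gauge, paired with its ID."""
    return [(g.gauge_id, p) for g in gauges for p in pixels
            if distance_km(p.lat, p.lon, g.lat, g.lon) <= max_km]


pixels = [PrPixel(-1.00, 36.00, 2.5), PrPixel(10.0, 36.0, 0.0)]
gauges = [Gauge("G1", -1.02, 36.01)]
for gauge_id, pix in coincident(pixels, gauges):
    print(gauge_id, pix.surface_rain)
```

Only the matched pixels leave the data center, which is where the volume reduction comes from.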

11 VIRS Active-Fire Product
• Statistical summary product
• 0.25° (or coarser) spatial resolution
• Monthly (or longer) temporal resolution
• Normalized (important!)
  – Unique (not total) fire pixels per month
  – Accounts for multiple fire observations due to repeated overpasses and extensive overlapping of pixels at the edge of the scan
• Non-biomass-burning high-temperature sources separated
  – Volcanoes, gas flares
• Diurnal burning cycle parameters (possible new data layer)
  – Time, amplitude, and width of peak
  – Experimental (under development)
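Normalization means a fire location seen on several overpasses, or in overlapping edge-of-scan pixels, is counted once per month. One simple way to do this, sketched below under stated assumptions, is to snap each detection to a fine location bin (a hypothetical 0.02° stand-in for the pixel footprint), discard duplicates, then count unique bins per 0.25° cell. This is an illustration only, not the Giglio & Kendall method.

```python
import math
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CELL_DEG = 0.25   # output grid spacing from the slide
PIXEL_DEG = 0.02  # hypothetical size of one "unique pixel" location bin


def bin_index(value: float, size: float) -> int:
    """Index of the bin of width `size` that contains `value`."""
    return math.floor(value / size)


def unique_fire_counts(
    detections: Iterable[Tuple[float, float]]
) -> Dict[Tuple[int, int], int]:
    """Count unique fire locations per 0.25-degree cell for one month.

    detections holds the (lat, lon) of every fire detection in the month,
    repeats from overlapping passes and edge-of-scan pixels included.
    """
    # Collapse repeated observations of the same location into one entry.
    unique = {(bin_index(lat, PIXEL_DEG), bin_index(lon, PIXEL_DEG))
              for lat, lon in detections}
    counts: Dict[Tuple[int, int], int] = defaultdict(int)
    for ilat, ilon in unique:
        # Assign each unique location to a coarse cell using its bin centre.
        lat_c = (ilat + 0.5) * PIXEL_DEG
        lon_c = (ilon + 0.5) * PIXEL_DEG
        counts[(bin_index(lat_c, CELL_DEG), bin_index(lon_c, CELL_DEG))] += 1
    return dict(counts)


detections = [(-10.101, 20.111), (-10.105, 20.115),
              (-10.101, 20.111), (-10.301, 20.111)]
print(unique_fire_counts(detections))  # four detections, two unique locations
```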

12 VIRS Active-Fire Product: Example
(Figure: unique fire pixels, September 2000)

13 Future Directions
• The third TRMM data mining campaign is tentatively scheduled to start in June 2002.
• Global Merge IR data mining will begin this month.
• Provide the capability to mine MODIS data from the ECS data pool.

14 Near-line Archive Data Mining (NADM)
• Web-based graphical interface
• Interactively execute algorithm code on selected data in the ECS data pool
• Set up ongoing mining subscriptions for data as they become available in the ECS data pool
• Pick up the generated products via FTP
• Develop, test, and execute data mining algorithms without the data center's intervention (Spring 2002)
• Use predefined data mining algorithms to mine data from the pool (Spring 2002)
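A mining subscription amounts to a standing rule that is checked whenever new data arrive in the pool. A minimal matching sketch, assuming a subscription is keyed only on a product name and a start date; the product, user and algorithm names are hypothetical.

```python
from dataclasses import dataclass
from datetime import date
from typing import List


@dataclass(frozen=True)
class Granule:
    short_name: str  # product identifier (hypothetical values below)
    start: date      # date the granule's data begin


@dataclass(frozen=True)
class Subscription:
    user: str
    short_name: str  # which product the user wants mined
    algorithm: str   # hypothetical name of the user's mining program
    first_day: date  # mine only data on or after this date


def match(granule: Granule, subs: List[Subscription]) -> List[Subscription]:
    """Return the subscriptions a newly arrived granule should trigger."""
    return [s for s in subs
            if s.short_name == granule.short_name and granule.start >= s.first_day]


subs = [
    Subscription("user1", "PRODUCT_A", "fire_finder", date(2002, 1, 1)),
    Subscription("user2", "PRODUCT_B", "rain_subset", date(2002, 1, 1)),
    Subscription("user3", "PRODUCT_A", "cloud_stats", date(2002, 6, 1)),
]
new = Granule("PRODUCT_A", date(2002, 3, 15))
print([s.user for s in match(new, subs)])
```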

15 NADM Architecture
(Diagram components: Web Client; Web Server; Master (On/Off); Monitor, Subscribe, Trigger and Cleanup processes; Subscriptions; Data Pool (Disk Pool and DB); per-user areas /user1, /user2 ... /userN, each with its own Trigger Queue, Algorithm and Export.)
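The diagram gives each user area its own trigger queue, algorithm and export stage. The Python below models that arrangement in memory; `UserArea`, `trigger`, `drain` and `cleanup` are hypothetical names, and treating cleanup as removal of products the user has already picked up is an assumption about what the Cleanup component does.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, Dict, List, Set


@dataclass
class UserArea:
    """Hypothetical model of one /userN area: trigger queue, algorithm, export."""
    algorithm: Callable[[str], str]  # maps an input file name to a product name
    trigger_queue: Deque[str] = field(default_factory=deque)
    export: List[str] = field(default_factory=list)


def trigger(areas: Dict[str, UserArea], user: str, granule: str) -> None:
    """Queue a data pool file for one user's algorithm."""
    areas[user].trigger_queue.append(granule)


def drain(area: UserArea) -> None:
    """Run the user's algorithm on each queued file; stage results for FTP pickup."""
    while area.trigger_queue:
        area.export.append(area.algorithm(area.trigger_queue.popleft()))


def cleanup(area: UserArea, picked_up: Set[str]) -> None:
    """Drop exported products the user has already retrieved."""
    area.export = [p for p in area.export if p not in picked_up]


areas = {
    "/user1": UserArea(lambda f: f + ".fires"),
    "/user2": UserArea(lambda f: f + ".subset"),
}
for user in areas:
    trigger(areas, user, "granule_001.hdf")
for area in areas.values():
    drain(area)
cleanup(areas["/user1"], {"granule_001.hdf.fires"})
print({user: area.export for user, area in areas.items()})
```

One reason to keep a separate queue per user is isolation: a slow or failing algorithm in /user1 does not hold up products for /user2.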

