Advisor: Professor Fukuda Student: Jason Woodring A Multi-Agent distributed approach to analyzing large climate data sets.

1 Advisor: Professor Fukuda Student: Jason Woodring A Multi-Agent distributed approach to analyzing large climate data sets.

2 World climate is changing. Temperatures are rising, and it's our fault. We need to simulate the future changes.

3 The Impact Local sea level rise. Decreased snowpack. Glaciers in decline. Stream flow peaks earlier in the year. Winter floods. Summer droughts. Ocean acidification.

4 What's being done about it? Simulated world climate data is produced using many different models. Climate scientists analyze this data in different ways to try to predict extreme weather events.

5 Climate Analysis Large climate science facilities may have dedicated computing resources to analyze these climate models. Small to medium-sized labs are often resource-constrained and create custom tools.

6 Climate Science Programming Climate scientists typically use domain-specific scripting languages (CDO / NCL) to analyze climate models produced by different scientific agencies. These environments can be difficult for the average person to install and work with.

7 Time of Emergence (ToE) When average conditions consistently exceed some threshold, extreme weather events are more likely to happen. ToE is an essential calculation for predicting climate behavior.

8 UWCA Web-based application. Performs Time of Emergence calculations. Parameterizes input. Uses the UW-320 Linux lab computing cluster. Uses MASS Agents that travel across MASS Places / computing nodes to analyze NetCDF data.

9 Workflow Based 1. Select the ToE variable to calculate. 2. Select the climate model(s) to use. 3. Select parameters for the ToE calculations. 4. Wait, and view job status. 5. Download files when done.

10 MASS Library Designed for this type of simulation. Abstracts difficult parallel / distributed programming. Brings the programmer closer to the data they are working with, rather than to the boilerplate code that is usually necessary for these types of simulations.

11 Input Data NetCDF Software [4] A set of libraries and self-describing, machine-independent data formats that support the creation, access, and sharing of array-oriented scientific data. Free software is available. Many software packages and APIs exist for working with NetCDF from different environments.

12 Provenance (to come from…) Records the input climate model files used. Tracks all calculation steps and the files produced between them. Tracks the parameters used in calculations.

13 Implementation Used NetBeans and Glassfish. Enterprise Web Application Maven Archetype project. Uses MASS Places and Agents.

14 Architecture – Hardware View

15 Architecture – SAVN Notation

16 Class Diagram

17 Infrastructure Setup Deployed on Juno. Port 8080 opened. Large shared NFS storage for climate models. NetBeans project folder set up on the file share.

18 GUI Application Header Job Status Viewer Job Creator

19 Algorithms Architecture Uses the Template pattern, which provides a skeleton for an algorithm while deferring some steps to subclasses. Follows the Hollywood Principle: "Don't call us, we'll call you."
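As an illustration of the Template pattern named on this slide, here is a minimal Java sketch. It is not the UWCA class hierarchy: the class names (AnalysisTemplate, MeanAnalysis) and the steps (validate, compute, report) are hypothetical and chosen only to show the base class calling into the subclass.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical names throughout; this is not the UWCA class hierarchy.
abstract class AnalysisTemplate {
    private final List<String> trace = new ArrayList<>();

    // Template method: the base class owns the order of the steps and calls
    // into the subclass ("Don't call us, we'll call you").
    public final double run(double[] input) {
        trace.add("validate");
        validate(input);
        trace.add("compute");
        double result = compute(input);
        trace.add("report");
        report(result);
        return result;
    }

    // Shared step with a default implementation.
    protected void validate(double[] input) {
        if (input == null || input.length == 0) {
            throw new IllegalArgumentException("input must not be empty");
        }
    }

    // Step deferred to subclasses.
    protected abstract double compute(double[] input);

    // Optional hook; subclasses may override it.
    protected void report(double result) { }

    public List<String> trace() {
        return trace;
    }

    // Small demo entry point.
    public static void main(String[] args) {
        AnalysisTemplate analysis = new MeanAnalysis();
        System.out.println(analysis.run(new double[] {1.0, 2.0, 3.0}));
        System.out.println(analysis.trace());
    }
}

// One concrete algorithm: averages its input.
class MeanAnalysis extends AnalysisTemplate {
    @Override
    protected double compute(double[] input) {
        double sum = 0.0;
        for (double v : input) {
            sum += v;
        }
        return sum / input.length;
    }
}
```

Because run is final, a subclass can change what a step does but not the order in which the steps execute.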

20 ToE calculation 5-step calculation: 1. Find days over threshold. 2. Find historical tolerances. 3. Find climatology. 4. Least squares regression. 5. Find ToE.

21 Step 1: Find days over Threshold Use the temperature threshold parameter to find days over threshold. Each block of 365 daily temperature values collapses into one integer days-over-threshold count, stored at that year's index.
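The collapse described in Step 1 can be sketched in plain Java for a single grid cell. This is an illustrative sketch, not the code from the presentation: it does not use MASS or NetCDF, the class and method names are hypothetical, and it assumes "over" means strictly greater than the threshold and that a trailing partial year is ignored.

```java
import java.util.Arrays;

// Hypothetical helper for one grid cell; not the presentation's Step 1 code.
final class DaysOverThreshold {
    static final int DAYS_PER_YEAR = 365;

    // Collapses each block of 365 daily temperatures into one count per year.
    // Assumption: "over" means strictly greater than the threshold.
    static int[] countPerYear(double[] dailyTemps, double threshold) {
        int years = dailyTemps.length / DAYS_PER_YEAR; // trailing partial year is ignored
        int[] counts = new int[years];
        for (int year = 0; year < years; year++) {
            for (int day = 0; day < DAYS_PER_YEAR; day++) {
                if (dailyTemps[year * DAYS_PER_YEAR + day] > threshold) {
                    counts[year]++;
                }
            }
        }
        return counts;
    }

    public static void main(String[] args) {
        double[] temps = new double[2 * DAYS_PER_YEAR];
        Arrays.fill(temps, 20.0);
        temps[0] = 35.0;   // year index 0
        temps[10] = 36.0;  // year index 0
        temps[400] = 37.0; // year index 1
        System.out.println(Arrays.toString(countPerYear(temps, 30.0)));
    }
}
```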

22 Step 1: Code

23 Step 2: Find Historical Tolerances Agents spawn at year 1950 and travel to year 1999, gathering days-over-threshold values. A user parameter specifies the Min / Max %, and the tolerance values are calculated.
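The slide does not say how the Min / Max % parameters are turned into tolerance values. One plausible reading, assumed here purely for illustration, is that they are percentiles of the yearly days-over-threshold counts gathered for 1950 to 1999. The sketch below uses the nearest-rank percentile definition; the class and method names are hypothetical.

```java
import java.util.Arrays;

// Hypothetical helper; the percentile interpretation is an assumption,
// not something the presentation states.
final class HistoricalTolerance {
    // Nearest-rank percentile of the yearly counts, with pct in (0, 100].
    static double percentile(int[] yearlyCounts, double pct) {
        if (yearlyCounts.length == 0) {
            throw new IllegalArgumentException("no values");
        }
        int[] sorted = yearlyCounts.clone();
        Arrays.sort(sorted);
        int rank = (int) Math.ceil(pct * sorted.length / 100.0);
        int index = Math.max(0, Math.min(sorted.length - 1, rank - 1));
        return sorted[index];
    }

    // Returns {min tolerance, max tolerance} for the historical window.
    static double[] tolerances(int[] yearlyCounts, double minPct, double maxPct) {
        return new double[] {
            percentile(yearlyCounts, minPct),
            percentile(yearlyCounts, maxPct)
        };
    }

    public static void main(String[] args) {
        int[] historical = {5, 1, 9, 3};
        System.out.println(Arrays.toString(tolerances(historical, 25.0, 75.0)));
    }
}
```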

24 Step 2: Code

25 Step 3: Find Climatology Agents move to year 1980 and travel along the z axis until 2010, gathering days-over-threshold values. The climatology is the average of the values collected.
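For one grid cell, the climatology average can be sketched as below. This is a plain-Java illustration rather than the MASS agent code: the names are hypothetical, the yearly counts are assumed to sit in an array whose first element corresponds to firstYear, and the 1980 to 2010 window is assumed to include both end years.

```java
// Hypothetical helper for one grid cell; not the presentation's Step 3 code.
final class Climatology {
    // Mean of the yearly counts for fromYear..toYear, both inclusive (an assumption).
    // daysOverByYear[0] holds the count for firstYear.
    static double mean(int[] daysOverByYear, int firstYear, int fromYear, int toYear) {
        if (fromYear < firstYear || toYear < fromYear
                || toYear - firstYear >= daysOverByYear.length) {
            throw new IllegalArgumentException("year window outside the data");
        }
        long sum = 0;
        for (int year = fromYear; year <= toYear; year++) {
            sum += daysOverByYear[year - firstYear];
        }
        return (double) sum / (toYear - fromYear + 1);
    }

    public static void main(String[] args) {
        int[] counts = new int[61]; // years 1950..2010
        for (int i = 0; i < counts.length; i++) {
            counts[i] = i;
        }
        System.out.println(mean(counts, 1950, 1980, 2010));
    }
}
```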

26 Step 3: Code

27 Step 4: Least Squares Regression All agents move to year 2006 and travel all the way down to year 2099, gathering days-over-threshold values. For each x, y column of values, a slope and confidence intervals are calculated.
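A per-column fit of this kind can be sketched with ordinary least squares. The sketch below is an assumption about the form of the calculation, not the presentation's code: the class name is hypothetical, and the confidence interval uses the standard error of the slope with a caller-supplied multiplier (for example 1.96 as a normal approximation to 95%), since the slides do not state which interval was used.

```java
// Hypothetical helper for one x, y column; not the presentation's Step 4 code.
final class TrendFit {
    final double slope;
    final double intercept;
    final double slopeStdErr;

    private TrendFit(double slope, double intercept, double slopeStdErr) {
        this.slope = slope;
        this.intercept = intercept;
        this.slopeStdErr = slopeStdErr;
    }

    // Ordinary least squares fit of y = intercept + slope * x.
    static TrendFit fit(double[] x, double[] y) {
        int n = x.length;
        if (n < 3 || y.length != n) {
            throw new IllegalArgumentException("need at least 3 paired points");
        }
        double meanX = 0.0, meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxx = 0.0, sxy = 0.0;
        for (int i = 0; i < n; i++) {
            sxx += (x[i] - meanX) * (x[i] - meanX);
            sxy += (x[i] - meanX) * (y[i] - meanY);
        }
        if (sxx == 0.0) {
            throw new IllegalArgumentException("x values are all identical");
        }
        double slope = sxy / sxx;
        double intercept = meanY - slope * meanX;
        double sse = 0.0; // sum of squared residuals
        for (int i = 0; i < n; i++) {
            double residual = y[i] - (intercept + slope * x[i]);
            sse += residual * residual;
        }
        double stdErr = Math.sqrt(sse / (n - 2) / sxx);
        return new TrendFit(slope, intercept, stdErr);
    }

    // Interval slope +/- multiplier * standard error; the multiplier is the caller's choice.
    double[] slopeInterval(double multiplier) {
        return new double[] {slope - multiplier * slopeStdErr, slope + multiplier * slopeStdErr};
    }

    public static void main(String[] args) {
        TrendFit fit = TrendFit.fit(new double[] {0, 1, 2, 3}, new double[] {0, 1, 1, 2});
        double[] ci = fit.slopeInterval(1.96);
        System.out.println(fit.slope + " [" + ci[0] + ", " + ci[1] + "]");
    }
}
```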

28 Step 4: Code

29 Step 4 results in 3 artifacts: three 2-dimensional arrays. These array values represent year 2001.

30 Step 5: Find ToE The arrays from step 4 get expanded by 200 years

31 Step 5: Code

32 ToE Found! Start at year 2001 and go down each latitude, longitude column to find the year in which the value exceeds the min or max values.
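The search down one column can be sketched as follows. The slides do not show how the Step 4 arrays are expanded, so this sketch assumes a simple linear extrapolation from the year-2001 value using the fitted slope. The class name, the parameters, and the NOT_EMERGED sentinel are all hypothetical, and "exceeds" is assumed to mean strictly above the max or strictly below the min.

```java
// Hypothetical helper for one latitude, longitude cell; not the presentation's Step 5 code.
final class TimeOfEmergence {
    // Sentinel for cells whose projection never leaves the tolerance band (an assumption).
    static final int NOT_EMERGED = -1;

    // Walks forward year by year from startYear and returns the first year whose
    // projected value falls outside [min, max].
    // Assumption: the projection is linear, valueAtStart + slopePerYear * offset.
    static int find(double valueAtStart, double slopePerYear,
                    double min, double max, int startYear, int horizonYears) {
        for (int offset = 0; offset <= horizonYears; offset++) {
            double projected = valueAtStart + slopePerYear * offset;
            if (projected > max || projected < min) {
                return startYear + offset;
            }
        }
        return NOT_EMERGED;
    }

    public static void main(String[] args) {
        // 10 days over threshold in 2001, rising by half a day per year, band [5, 20].
        System.out.println(find(10.0, 0.5, 5.0, 20.0, 2001, 200));
    }
}
```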

33 Final ToE output Three 2-dimensional array NetCDF files, with each cell containing a year value representing the ToE.

34 Performance Best timing: 1 node, 8 threads, with the current algorithm.

35 Algorithm challenges Reading in the 22 GB of data is time consuming. Significant heap considerations. Had to experiment with many different methods of reading in the data. The current method reads in all data at the master node and distributes it over the network to the slave nodes. Debugging large data sets. Debugging remote nodes.

36 Provenance File Displays all times for the steps of the job, which input files were used, which output files were produced, which steps were executed for the job, and which parameters were used for the calculations.
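The kinds of information listed for the provenance file could be held in a small record like the one sketched here. This is only an illustration: the class, its methods, the text layout, and the example file names are hypothetical and do not reproduce the UWCA provenance format.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical provenance record; field names and text layout are illustrative only.
final class ProvenanceRecord {
    private final List<String> inputFiles = new ArrayList<>();
    private final List<String> outputFiles = new ArrayList<>();
    private final Map<String, String> parameters = new LinkedHashMap<>();
    private final Map<String, Long> stepMillis = new LinkedHashMap<>();

    void addInput(String file) { inputFiles.add(file); }
    void addOutput(String file) { outputFiles.add(file); }
    void addParameter(String name, String value) { parameters.put(name, value); }
    void addStep(String name, long elapsedMillis) { stepMillis.put(name, elapsedMillis); }

    // Renders inputs, parameters, steps (in execution order), then outputs.
    String render() {
        StringBuilder sb = new StringBuilder();
        for (String file : inputFiles) {
            sb.append("input: ").append(file).append('\n');
        }
        for (Map.Entry<String, String> p : parameters.entrySet()) {
            sb.append("parameter: ").append(p.getKey()).append(" = ").append(p.getValue()).append('\n');
        }
        for (Map.Entry<String, Long> s : stepMillis.entrySet()) {
            sb.append("step: ").append(s.getKey()).append(" (").append(s.getValue()).append(" ms)\n");
        }
        for (String file : outputFiles) {
            sb.append("output: ").append(file).append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        ProvenanceRecord record = new ProvenanceRecord();
        record.addInput("model_a.nc");          // made-up file name
        record.addParameter("threshold", "90"); // made-up parameter
        record.addStep("days over threshold", 1200);
        record.addOutput("toe.nc");             // made-up file name
        System.out.print(record.render());
    }
}
```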

37 Usability Analysis Questionnaire Learning how to use the tool was quite simple. Adaptability of the tool was rated low, as it requires software developer skill sets. The amount of provenance and detail was rated highly. The climate researcher appreciated the workflow logging qualities of the provenance file, which provides information on what steps, files, and parameters were used. Because it is a web tool, various interested parties, including funding agencies, can access it to perform climate science analysis.

38 Data Visualization The web application features file downloading. Users can use their software of choice for visualization. Some free options are the Panoply viewer and ncBrowse.

39 Next Steps Experiment with the Places reading algorithm. Add test cases for main classes. Add new climate models. More issues are recorded in Redmine for ongoing work.

40 References
[1] Fukuda, M., Stiber, M., Salathe, E., & Kim, W. (2013). CDS&E: Small: Multi-Agent-Based Parallelization of Scientific Data Analysis and Simulation.
[2] Michalakes, J., Dudhia, J., Henderson, T., Klemp, J., Skamarock, W., & Wang, W. (n.d.). The Weather Research and Forecast Model: Software Architecture and Performance.
[3] Fukuda, M. (2010). MASS: Parallel-Computing Library for Multi-Agent Spatial Simulation.
[4] Yasutake, B., Simonson, N., Asuncion, H., Fukuda, M., & Salathe, E. (2014). Supporting Provenance in Climate Research.
[5] Zender, C., & Mangalam, H. (2007). Scaling Properties of Common Statistical Operators for Gridded Datasets. International Journal of High Performance Computing Applications, 21.
[6] Dalton, M., Mote, P., & Snover, A. (2013). Climate Change in the Northwest: Implications for Our Landscapes, Waters, and Communities. Island Press.
[7] Salathe, E., Hamlet, A., & Mass, C. (2014). Estimates of 21st Century Flood Risks in the Pacific Northwest.
[8] Climate Change Impacts and Adaptation in Washington State. (2013). Climate Impacts Group, University of Washington.
[9] Climate Change. (2014, January 1). Retrieved July 11, 2014, from http://cses.washington.edu/cig/pnwc/cc.shtml
[10] NetCDF FAQ. (n.d.). Retrieved July 12, 2014, from www.unidata.ucar.edu/software/netcdf/docs/faq.html
[11] Software for Manipulating or Displaying NetCDF Data. (n.d.). Retrieved July 13, 2014, from www.unidata.ucar.edu/software/netcdf/software.html
[12] Muir, L., Brown, J., Risbey, J., & Wijffels, S. (n.d.). Determining the time of emergence of the climate change signal at regional scales. The Center for Australian Weather and Climate Research, Hobart, Australia.
[13] Hawkins, E., & Sutton, R. (2012). Time of emergence of climate signals. American Geophysical Union.
[14] Keller, K., Joos, F., & Raible, C. (2014). Time of emergence of trends in the ocean biogeochemistry.
[15] Time of Emergence of Climate Change Signals in the Puget Sound Basin Quality Assurance Project Plan. (2013). Climate Impacts Group, University of Washington.
[16] Mahlstein, I., Knutti, R., Solomon, S., & Portmann, R. (2011). Early onset of significant local warming in low latitude countries.
[17] Maraun, D. (2013). When will trends in European mean and heavy daily precipitation emerge?
[18] Ho, C., Hawkins, E., & Shaffrey, L. (2012). Statistical decadal predictions for sea surface temperatures: A benchmark for dynamical GCM predictions.
[19] Capalbo, S., Eigenbrode, S., Glick, P., Littell, J., Raymondi, R., & Reeder, S. (2014). NORTHWEST. In Climate Change Impacts in the United States.

