A High-Throughput Computational Approach to Environmental Health Study Based on CyberGIS Xun Shi 1, Anand Padmanabhan 2, and Shaowen Wang 2 1 Department.

Presentation transcript:

A High-Throughput Computational Approach to Environmental Health Study Based on CyberGIS. Xun Shi (1), Anand Padmanabhan (2), and Shaowen Wang (2). (1) Department of Geography, Dartmouth College; (2) Department of Geography and Geographic Information Science, National Center for Supercomputing Applications (NCSA), University of Illinois at Urbana-Champaign. September 2013

Basic functionality of CyberGIS. Accessibility: making GIS capabilities accessible to a large number of users for research and education through the online cyberGIS Gateway. Computational capability: embedding geospatial software capabilities into advanced cyberinfrastructure environments. Interoperability: managing heterogeneous and distributed resources and services through GISolve middleware.

A computational approach to spatial epidemiology: Disaggregate polygon-level location data using restricted and controlled Monte Carlo (RCMC). Calculate local statistics, e.g., the intensity of disease occurrence using kernel ratio estimation (KRE). Estimate the statistical significance of the intensity using unrestricted and controlled Monte Carlo (UCMC).

Disaggregate polygon-level location data [Figure: an example polygon with 23 births with defects among 1202 births; legend: birth with defect(s), normal birth; background population shaded from high to low]

Restricted and Controlled Monte Carlo (RCMC) for Disaggregation: Assign polygon-level addresses to random locations. The randomization is restricted by the smallest polygon to which a polygon-level address belongs. The randomization is controlled by the detailed background data. The randomization is repeated many times (Monte Carlo).
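
The RCMC idea described on this slide can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' implementation: the function name `rcmc_disaggregate`, the pixel list, and the population weights are all hypothetical, and the background data are assumed to be a raster whose pixels inside the polygon carry non-negative weights.

```python
import random

def rcmc_disaggregate(polygon_pixels, weights, n_cases, n_iterations, seed=0):
    """Place n_cases polygon-level records at random pixels inside one polygon.

    polygon_pixels: (row, col) pixels inside the polygon (the restriction).
    weights: background value, e.g. population, per pixel (the control).
    Returns one list of sampled pixel locations per Monte Carlo iteration.
    """
    rng = random.Random(seed)
    realizations = []
    for _ in range(n_iterations):
        # Pixels with a larger background value are proportionally more likely.
        realizations.append(rng.choices(polygon_pixels, weights=weights, k=n_cases))
    return realizations

# Hypothetical toy polygon of four pixels; pixel (0, 1) is unpopulated.
pixels = [(0, 0), (0, 1), (1, 0), (1, 1)]
population = [10, 0, 30, 60]
runs = rcmc_disaggregate(pixels, population, n_cases=23, n_iterations=100)
```

Each entry of `runs` is one possible set of precise locations for the 23 records. Running a location-based analysis on every realization, rather than on a single one, is what lets the spatial uncertainty be evaluated.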

Advantages of RCMC: Allows analyses designed for individual/precise locations to be conducted. Maximizes the utilization of available spatial information. Explicitly evaluates the spatial uncertainty caused by the imprecision in the data.

Kernel ratio estimation (KRE) for Estimating Local Disease Intensity: Essentially, calculate the ratio between cases and cohort for each and every location. [Figure legend: birth with defect(s); normal birth]
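
As a rough illustration of the ratio described on this slide, the sketch below computes a fixed-bandwidth kernel ratio at one site. The quadratic distance-decay function, the names `kernel_weight` and `kre_fixed`, and the toy coordinates are assumptions made for the example; the slides do not specify which kernel function the authors use.

```python
import math

def kernel_weight(distance, bandwidth):
    """Quadratic distance decay; zero at and beyond the bandwidth (assumed form)."""
    if distance >= bandwidth:
        return 0.0
    u = distance / bandwidth
    return 1.0 - u * u

def kre_fixed(site, cases, cohort, bandwidth):
    """Kernel-weighted cases divided by kernel-weighted cohort around one site."""
    num = sum(kernel_weight(math.dist(site, p), bandwidth) for p in cases)
    den = sum(kernel_weight(math.dist(site, p), bandwidth) for p in cohort)
    return num / den if den > 0 else None  # None marks "no cohort nearby"

# Hypothetical toy data: the cohort includes the single case at (0, 0).
cases = [(0, 0)]
cohort = [(0, 0), (1, 0), (0, 1), (3, 3)]
near = kre_fixed((0, 0), cases, cohort, bandwidth=2)
far = kre_fixed((10, 10), cases, cohort, bandwidth=2)
```

Repeating the call for every raster pixel yields an intensity surface. Returning `None` where no cohort falls inside the bandwidth avoids a division by zero in sparsely populated areas.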

Settings of KRE: fixed bandwidth vs. adaptive bandwidth; site-side kernel vs. case-side kernel

Types of KRE: site-side fixed bandwidth; case-side fixed bandwidth; site-side adaptive bandwidth; case-side adaptive bandwidth

Unrestricted and Controlled Monte Carlo (UCMC) for Estimating Statistical Significance [Diagram: RCMC feeds KRE and UCMC feeds KRE; the two sets of KRE results are compared to give a P-value]
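
One common way to turn such a comparison into a P-value is a rank-based Monte Carlo test, sketched below. This is an assumption about the comparison step rather than a description of the authors' code: the observed statistic at a pixel is ranked against the values from the simulated layers, and the function names and toy values are hypothetical.

```python
def monte_carlo_p_value(observed, simulated):
    """Rank-based P-value: share of values at least as large as the observed one."""
    n_as_extreme = sum(1 for s in simulated if s >= observed)
    # The observed value counts as one member of the reference set.
    return (n_as_extreme + 1) / (len(simulated) + 1)

def p_value_layer(observed_layer, simulated_layers):
    """Apply the comparison pixel by pixel; layers are equal-length flat lists."""
    return [
        monte_carlo_p_value(obs, [layer[i] for layer in simulated_layers])
        for i, obs in enumerate(observed_layer)
    ]

# Hypothetical toy input: two pixels, 99 simulated layers with values 0..98.
simulated_layers = [[i, i] for i in range(99)]
p_values = p_value_layer([95.5, 10.5], simulated_layers)
```

With 99 simulations the smallest attainable P-value under this scheme is 1/100, which may be why a count such as 99 is convenient for the simulation step.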

Epidemiological confounding factors [Table: count and rate by age group, separately for males and females; age groups begin above 0, 29, 39, 49, 54, 59, 64, 69, and 74; upper bounds and cell values were lost in extraction; totals: 3498 (males), 2969 (females)]

[Figure panels: mean P-value; standard deviation of P-value; hot spots]

RCMC-UCMC-based Simulated Case-Control Study for Detecting Disease-Environment Association: case locations from RCMC; control locations from UCMC; environmental exposure

Spatial variation in disease-environment association: a map of P-values [Figure: map with P-value legend]

Computational Demand I: Number of local statistic computing (e.g. KRE) iterations in RCMC and UCMC. Scenario: stratification is needed for addressing confounding factors; case data are at the polygon level; cohort data are at the polygon level; detailed background data are available. RCMC iterations: No. of strata X No. of iterations for cases X No. of iterations for cohort, e.g. 2 X 100 X 100 = 20,000. UCMC iterations: No. of strata X No. of iterations for simulation X No. of iterations for cohort, e.g. 2 X 99 X 100 = 19,800.

Computational Demand II: Number of layer-on-layer comparisons for estimating P-value. No. of iterations for cases X No. of iterations for simulation X No. of iterations for cohort, e.g. 100 X 99 X 100 = 990,000.
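
The counts quoted under Computational Demand I and II can be reproduced with a few lines of arithmetic. The variable names below are illustrative; the values are the example settings given on the slides.

```python
strata = 2                  # e.g. strata needed to address confounding factors
case_iterations = 100       # RCMC realizations of case locations
cohort_iterations = 100     # RCMC realizations of cohort locations
simulation_iterations = 99  # UCMC simulations

# Demand I: local-statistic (e.g. KRE) runs.
rcmc_runs = strata * case_iterations * cohort_iterations
ucmc_runs = strata * simulation_iterations * cohort_iterations

# Demand II: layer-on-layer comparisons for the P-value.
comparisons = case_iterations * simulation_iterations * cohort_iterations

print(rcmc_runs, ucmc_runs, comparisons)
```

Because every factor multiplies the others, doubling any single iteration count doubles the comparison workload, which is the main reason the approach calls for high-throughput computing.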

Computational Demand III: Pixel-wise statistic computing. No. of pixels that are not “nodata” pixels, e.g. about 3 million in a 1652 X 2912 raster. Major operations, using case-side adaptive bandwidth KRE as an example: expand the kernel in a spinning way; accumulate the distance-decayed kernel value for each case encountered; accumulate the cohort value; check if the threshold is met.
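
The per-pixel operations listed above can be illustrated with a simplified adaptive-bandwidth search. This sketch makes several assumptions that are not in the slides: it grows square rings around the pixel instead of following a true spiral, it uses a quadratic distance decay, it sets the bandwidth to one pixel beyond the final ring, and it does not attempt to reproduce the site-side versus case-side distinction. The function name and toy rasters are hypothetical.

```python
import math

def adaptive_kre_at_pixel(row, col, cases, cohort, threshold, max_radius):
    """Grow a window around (row, col) until it holds `threshold` cohort members.

    cases, cohort: 2D lists of counts per pixel with identical shape.
    Returns (ratio, radius), or (None, max_radius) if the threshold is never met.
    """
    n_rows, n_cols = len(cohort), len(cohort[0])
    cohort_total = 0.0
    hits = []  # (distance, case count) for every case pixel encountered
    for radius in range(max_radius + 1):
        for r in range(row - radius, row + radius + 1):
            for c in range(col - radius, col + radius + 1):
                # Skip pixels already visited in smaller rings or off the raster.
                if max(abs(r - row), abs(c - col)) != radius:
                    continue
                if not (0 <= r < n_rows and 0 <= c < n_cols):
                    continue
                cohort_total += cohort[r][c]
                if cases[r][c]:
                    hits.append((math.hypot(r - row, c - col), cases[r][c]))
        if cohort_total >= threshold:
            bandwidth = radius + 1  # assumed: kernel reaches zero just past the ring
            weighted = sum(n * max(0.0, 1 - (d / bandwidth) ** 2) for d, n in hits)
            return weighted / cohort_total, radius
    return None, max_radius

# Hypothetical 5 x 5 toy rasters: 10 cohort members per pixel, three cases.
cohort = [[10] * 5 for _ in range(5)]
cases = [[0] * 5 for _ in range(5)]
cases[2][2], cases[2][3] = 1, 2
ratio, radius = adaptive_kre_at_pixel(2, 2, cases, cohort, threshold=50, max_radius=3)
unmet = adaptive_kre_at_pixel(2, 2, cases, cohort, threshold=10000, max_radius=3)
```

The search radius, and therefore the cost per pixel, depends on local cohort density: sparsely populated pixels need larger windows. With about 3 million valid pixels per layer, this variable cost is repeated for every layer.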

Computational Demand IV: Memory. Number of raster layers generated during the process: No. of RCMC iterations + No. of UCMC iterations + No. of parallel comparisons, e.g. 20,000 + 19,800 + 10,000 = 49,800. Memory: size of data type X No. of columns X No. of rows X No. of raster layers, e.g. 4 bytes X 1652 X 2912 X 49,800 = 550 gigabytes.
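
The memory formula can be checked directly. The sketch below applies it to the full 1652 X 2912 raster with 4-byte values and 49,800 layers, using decimal gigabytes. Taken literally, the product comes to roughly 958 GB, so the 550-gigabyte figure on the slide presumably rests on an assumption not stated in this transcript (for example, storing fewer pixels or fewer layers at once).

```python
bytes_per_value = 4
columns, rows = 1652, 2912
layers = 49_800

# Size of one full raster layer, then of all layers held at once.
bytes_per_layer = bytes_per_value * columns * rows
total_bytes = bytes_per_layer * layers

print(f"{bytes_per_layer / 1e6:.1f} MB per layer")  # decimal megabytes
print(f"{total_bytes / 1e9:.0f} GB for all layers")  # decimal gigabytes
```

Either figure is far beyond the 32 GB of RAM on a single workstation, so layers have to be streamed to disk or distributed across machines.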

On an HP Z800 Workstation (2 Xeon CPUs at 3.07 GHz, 32 GB RAM): Mapping birth defects for New Hampshire; 1400 birth defect cases for ,000 births for age categories; 220 town polygons; 100-m resolution female population raster (1652 x 2912); 100 RCMC iterations for cases; 100 RCMC iterations for cohort; 99 UCMC iterations. Run time: 40 hours.

Migrating to cyberGIS. Set up infrastructure: create a new repository in the CyberGIS SVN and establish a development environment. Define the application interface using GISolve Open Service APIs. Build and deploy the code on cyberinfrastructure resources from SVN. Publish the application. Test application execution.

Computation Management through GISolve Open Service APIs. Compress the input into a single zip file and make it available at a Web-accessible location. Inputs to the program include files for point cases, zone cases, cohort, background, the zone file, and associated settings needed by the application. The URL of the zip file is the single parameter to the Open Service APIs. Code execution and input/output data are put into a computation sandbox. Simply run php job-submit.php and the GISolve middleware will take care of the rest.
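
The packaging step described above (one zip file holding all inputs) could be scripted as follows. This is a minimal sketch using only the Python standard library; the file names are placeholders invented for the example, and publishing the archive at a URL and running `php job-submit.php` are not shown because their details are not given in the slides.

```python
import tempfile
import zipfile
from pathlib import Path

def package_inputs(input_dir, zip_path):
    """Bundle every file under input_dir into one zip archive; return its members.

    zip_path should lie outside input_dir so the archive does not include itself.
    """
    input_dir = Path(input_dir)
    with zipfile.ZipFile(zip_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in sorted(input_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to input_dir so the archive unpacks cleanly.
                archive.write(path, arcname=path.relative_to(input_dir).as_posix())
    with zipfile.ZipFile(zip_path) as archive:
        return archive.namelist()

with tempfile.TemporaryDirectory() as tmp:
    inputs = Path(tmp) / "inputs"
    inputs.mkdir()
    # Hypothetical input file names, for illustration only.
    for name in ("point_cases.csv", "cohort.csv", "settings.txt"):
        (inputs / name).write_text("placeholder\n")
    members = package_inputs(inputs, Path(tmp) / "job_input.zip")
```

Keeping the whole job input in one archive matches the single-parameter design of the interface: the service only needs the archive's URL.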

Parallel computing through CIGI local cluster and XSEDE. The original MFC (Windows) code was extracted and adapted to run in the Linux environment. Application code has been checked into the CyberGIS SVN for co-development and deployment on a CIGI local cluster and XSEDE. A set of parallel and distributed computing strategies was developed based on a spatial computational domain construct. Computational performance of these strategies is being optimized.
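
The slides do not describe the parallel strategies in detail. As a generic illustration only, and not the authors' method, the sketch below splits a raster into bands of consecutive rows so that each worker receives a similar number of valid (non-"nodata") pixels. The function name and the toy row counts are hypothetical.

```python
def split_rows_by_load(valid_per_row, n_workers):
    """Cut consecutive rows into bands holding similar numbers of valid pixels.

    valid_per_row: count of non-"nodata" pixels in each raster row.
    Returns a list of (start_row, end_row) pairs, end exclusive.
    """
    total = sum(valid_per_row)
    bands, start, running = [], 0, 0
    for row, count in enumerate(valid_per_row):
        running += count
        # Close the band once its cumulative share of the workload is reached.
        target = total * (len(bands) + 1) / n_workers
        if running >= target and len(bands) < n_workers - 1:
            bands.append((start, row + 1))
            start = row + 1
    bands.append((start, len(valid_per_row)))
    return bands

# Hypothetical raster with empty margins and 10 valid pixels per interior row.
valid_per_row = [0, 0, 10, 10, 10, 10, 0, 0]
bands = split_rows_by_load(valid_per_row, n_workers=2)
```

Splitting by workload rather than by row count matters when large parts of the raster are "nodata", since equal-sized bands would then leave some workers nearly idle. The last band can be empty when there are fewer loaded rows than workers.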

Ongoing … Accessibility: making GIS capabilities accessible to a large number of users for research and education through the online cyberGIS Gateway. Computational capability: embedding geospatial software capabilities into advanced cyberinfrastructure environments. Interoperability: managing heterogeneous and distributed resources and services through GISolve middleware.

Designing and constructing a secure data transport protocol and tunnel …

Acknowledgements: National Science Foundation - OCI XSEDE SES; NIH P20RO18787; NIH P20ES and EPA RD; Dartmouth Neukom/IQBS CompX Faculty Grant

Thanks! Questions …