Ohio State University Department of Computer Science and Engineering 1 Cyberinfrastructure for Coastal Forecasting and Change Analysis Gagan Agrawal Hakan.

Presentation transcript:

Ohio State University Department of Computer Science and Engineering
Cyberinfrastructure for Coastal Forecasting and Change Analysis
Gagan Agrawal, Hakan Ferhatosmanoglu, Xutong Niu, Ron Li, Keith Bedford
The Ohio State University

Context
New Award from Office of Cyberinfrastructure (OCI)
  – Under Cyberinfrastructure for Environmental Observatories Program
  – September 2006 – August 2009, total amount $1,400,000
Involves 2 Computer Scientists and 2 Environmental Scientists
  – G. Agrawal (PI): Grid Middleware
  – H. Ferhatosmanoglu: Databases
  – K. Bedford: Great Lakes Now/Forecasting
  – R. Li: Coastal Erosion Analysis

Coastal Forecasting and Change Detection (Lake Erie)

Project Premise
Limitations of Current Environmental Observation Systems
  – Tightly coupled systems
    » No reuse of algorithms
    » Very hard to experiment with new algorithms
  – Closely tied to existing resources
Our claim
  – Emerging trends towards web services and grid services can help

Challenges
Existing Grid Middleware Systems have not considered
  – Processing of Streaming Data
  – Data Integration Issues
The applications involved need techniques for multi-modal data fusion, query planning, and data mining
  – Need to implement them as grid or web services

Proposed Infrastructure and Collaboration

Application Details: Great Lakes Now/Forecasting
GLOS: Great Lakes Observing System
  – Co-designer/project manager: K. Bedford, a co-PI on this project
  – Collaboration with NOAA
Limitations: Hard-wired
  – Cannot incorporate new streams or algorithms
Create an Implementation using our Middleware for Streaming Data

Application Details: Coastal Erosion Prediction and Analysis
Focus: Erosion along Lake Erie Shore
  – Serious problem
  – Substantial Economic Losses
Prediction requires data from
  – Variety of Satellites
  – In-situ sensors
  – Historical Records
Challenges
  – Analyzing distributed data
  – Data Integration/Fusion

Middleware Developed at Ohio State
Automatic Data Virtualization Framework
  – Enabling processing and integration of data in low-level formats
GATES (Grid-based AdapTive Execution on Streams)
  – Processing of distributed data streams
FREERIDE-G (FRamework for Rapid Implementation of Datamining Engines in Grid)
  – Supporting scalable data analysis on remote data

Automatic Data Virtualization: Motivation
Access mechanisms for remote repositories
  – Complex low-level formats make accessing and processing of data difficult
  – Main desired functionality
    » Ability to select, download, and process a subset of data
Sensor Data
  – Again, low-level data
  – Need to convert formats
  – Need a flexible architecture

Data Virtualization
An abstract view of data
[Diagram labels: dataset; Data Service; Data Virtualization]
By Global Grid Forum’s DAIS working group:
  – A Data Virtualization describes an abstract view of data.
  – A Data Service implements the mechanism to access and process data through the Data Virtualization.

Our Approach: Automatic Data Virtualization
Automatically create data services
  – A new application of compiler technology
A metadata descriptor describes the layout of data on a repository
An abstract view is exposed to the users
Two implementations:
  – Relational/SQL-based
  – XML/XQuery-based
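The idea of a metadata descriptor plus an abstract, table-like view can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the `Descriptor` class, the `scan` and `select` helpers, the record layout, and the sample values are all hypothetical and are not the descriptor language or the generated data services of the actual framework.

```python
import struct
from dataclasses import dataclass
from typing import Callable, Dict, Iterator, List, Sequence, Tuple


@dataclass(frozen=True)
class Descriptor:
    """Hypothetical metadata descriptor for a file of fixed-size binary records."""
    fields: Tuple[str, ...]   # column names exposed in the abstract view
    record_format: str        # struct format of one record, e.g. "<iff"

    @property
    def record_size(self) -> int:
        return struct.calcsize(self.record_format)


def scan(raw: bytes, desc: Descriptor) -> Iterator[Dict[str, float]]:
    """Expose the low-level bytes as a stream of rows (the abstract view)."""
    for values in struct.iter_unpack(desc.record_format, raw):
        yield dict(zip(desc.fields, values))


def select(raw: bytes, desc: Descriptor, columns: Sequence[str],
           where: Callable[[Dict[str, float]], bool]) -> List[tuple]:
    """Rough analogue of: SELECT columns FROM dataset WHERE where(row)."""
    return [tuple(row[c] for c in columns) for row in scan(raw, desc) if where(row)]


# Made-up sample records: (station id, water level in m, wind speed in m/s).
DESC = Descriptor(fields=("station", "level", "wind"), record_format="<iff")
RAW = b"".join(struct.pack(DESC.record_format, *rec)
               for rec in [(1, 0.5, 3.0), (2, 1.5, 7.0), (3, 2.5, 9.0)])

# The user works with column names only and never sees byte offsets.
RESULT = select(RAW, DESC, ["station", "level"], lambda row: row["level"] > 1.0)
```

Here the descriptor is written by hand and interpreted at run time; the slides instead describe generating the data service automatically with compiler technology, which this sketch does not attempt.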

Streaming Data Model
Continuous data arrival and processing
Emerging model for data processing
  – Sources that produce data continuously: sensors, long-running simulations
  – Critical in Environmental Observatories
Active topic in many computer science communities
  – Databases
  – Data Mining
  – Networking
  – …

Need for a Grid-Based Stream Processing Middleware
Application developers interested in data stream processing
  – Would like to have abstracted away
    » Grid standards and interfaces
    » Adaptation function
  – Would like to focus on algorithms only
GATES is a middleware for
  – Grid-based
  – Self-adapting
Data Stream Processing

Adaptation for Real-time Processing
Analysis on streaming data is approximate
Accuracy and execution rate trade-off can be captured by certain parameters (Adaptation parameters)
  – Sampling Rate
  – Size of summary structure
Application developers can expose these parameters and a range of values
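A minimal sketch of what exposing an adaptation parameter with a range of values might look like is given below. The slides do not describe how GATES chooses values, so the `AdaptationParameter` class, the `adapt` policy, and its 80%/20% load thresholds are assumptions made purely for illustration.

```python
from dataclasses import dataclass


@dataclass
class AdaptationParameter:
    """A tunable value plus the range the application developer allows."""
    name: str
    low: float
    high: float
    value: float

    def adjust(self, factor: float) -> float:
        """Scale the current value by `factor`, clamped to [low, high]."""
        self.value = min(self.high, max(self.low, self.value * factor))
        return self.value


def adapt(param: AdaptationParameter, queue_length: int, capacity: int,
          step: float = 0.5) -> float:
    """Hypothetical policy: trade accuracy for speed when the input queue fills up.

    Above 80% load the value is reduced (less work per item, lower accuracy);
    below 20% load it is increased (more work per item, higher accuracy).
    """
    load = queue_length / capacity
    if load > 0.8:
        return param.adjust(1.0 - step)
    if load < 0.2:
        return param.adjust(1.0 + step)
    return param.value


# The developer exposes a sampling rate that may range from 10% to 100%.
sampling_rate = AdaptationParameter("sampling_rate", low=0.1, high=1.0, value=1.0)
```

Calling `adapt(sampling_rate, queue_length=90, capacity=100)` would halve the rate under this policy, and repeated overload would drive it down to the exposed lower bound but never below it.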

FREERIDE-G: Supporting Distributed Data-Intensive Science
[Diagram labels: Data Repository Cluster; Compute Cluster; User; ?]

Challenges for Application Development
Analysis of large amounts of disk-resident data
Incorporating parallel processing into analysis
Processing needs to be independent of other elements and easy to specify
Coordination of storage, network and computing resources required
Transparency of data retrieval, staging and caching is desired

FREERIDE-G Goals
Support High-End Processing
  – Enable efficient processing of large-scale data mining computations
Ease Use of Parallel Configurations
  – Support shared and distributed memory parallelization starting from a common high-level interface
Hide Details of Data Movement and Caching
  – Data staging and caching (when feasible/appropriate) need to be transparent to the application developer
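One way to picture a common high-level interface that hides parallelization is a reduction-style API, where the developer supplies only a per-chunk function and a function that merges partial results. The slides do not specify the FREERIDE-G interface, so `parallel_reduce`, `local_stats`, `merge_stats`, and the sample chunks below are hypothetical names and data chosen for this sketch.

```python
from concurrent.futures import ThreadPoolExecutor
from functools import reduce
from typing import Callable, Sequence, Tuple, TypeVar

Chunk = TypeVar("Chunk")
Partial = TypeVar("Partial")


def parallel_reduce(chunks: Sequence[Chunk],
                    local_reduce: Callable[[Chunk], Partial],
                    combine: Callable[[Partial, Partial], Partial],
                    workers: int = 4) -> Partial:
    """Run `local_reduce` on every chunk concurrently, then merge the partials.

    The caller never deals with threads, data placement, or scheduling.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(local_reduce, chunks))
    return reduce(combine, partials)


def local_stats(chunk: Sequence[float]) -> Tuple[int, float]:
    """Per-chunk work: number of readings and their sum."""
    return len(chunk), sum(chunk)


def merge_stats(a: Tuple[int, float], b: Tuple[int, float]) -> Tuple[int, float]:
    """Merge two partial results; must be associative for the reduce to be valid."""
    return a[0] + b[0], a[1] + b[1]


# Made-up readings, already split into chunks as a repository might deliver them.
CHUNKS = [[1.0, 2.0], [3.0], [4.0, 5.0, 6.0]]
COUNT, TOTAL = parallel_reduce(CHUNKS, local_stats, merge_stats)
MEAN = TOTAL / COUNT
```

Because the application code is limited to `local_stats` and `merge_stats`, the executor could be swapped (for example, for `concurrent.futures.ProcessPoolExecutor` or a distributed back end) without changing the analysis logic, which is the property the goals above ask for.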

Data Analysis Services
Multi-modal Multi-Sensor Data Integration
  – Built on our Data Virtualization Framework
Query Planning Service
  – Feature Extraction: Integration with Grid Metadata Catalogs
Remote Mining of Spatio-Temporal Data
  – Built using FREERIDE-G
Mining algorithms for Data Streams
  – Built using GATES

Recap

Looking For
Feedback on our approach
Synergy with other efforts
Lessons learnt by others