Network and Grid Computing
Computational Geoinformatics Workshop, May 3, 2004
Geoffrey Fox and Andrea Donnellan

Solid Earth Science Questions
From NASA's Solid Earth Science Working Group report, Living on a Restless Planet (Nov.):
1. What is the nature of deformation at plate boundaries and what are the implications for earthquake hazards?
2. How do tectonics and climate interact to shape the Earth's surface and create natural hazards?
3. What are the interactions among ice masses, oceans, and the solid Earth and their implications for sea level change?
4. How do magmatic systems evolve and under what conditions do volcanoes erupt?
5. What are the dynamics of the mantle and crust and how does the Earth's surface respond?
6. What are the dynamics of the Earth's magnetic field and its interactions with the Earth system?

The Solid Earth is: Complex, Nonlinear, and Self-Organizing
Relevant questions that computational technologies can help answer:
1. How can the study of strongly correlated solid earth systems be enabled by space-based data sets?
2. What can numerical simulations reveal about the physical processes that characterize these systems?
3. How do interactions in these systems lead to space-time correlations and patterns?
4. What are the important feedback loops that mode-lock the system behavior?
5. How do processes on a multiplicity of different scales interact to produce the emergent structures that are observed?
6. Do the strong correlations allow the capability to forecast the system behavior in any sense?
SESWG fed into the NASA ESE Computational Technology Requirements Workshop, May 2002.

Characteristics of Computing for Solid Earth Science
- Widely distributed heterogeneous datasets
- Multiplicity of time and spatial scales
- Decomposable problems requiring interoperability for full models
- Distributed models and expertise
- Enabled by Grids and Networks

Objectives
- IT approaches: integrate multiple scales into computer simulations.
- Web services: simplified access to data, simulation codes, and flow between simulations of varying types.

What are Grids Good for?
They are "Internet Scale Distributed Computing" and support the linking of globally distributed entities in the e-Science concept:
- Computers
- Data from repositories and sensors
- People
Early Grids focused on metacomputing (linking computers together), but more recently e-Science has highlighted the integration of data and the building of communities.
Grid technology naturally builds Problem Solving Environments.

Some Relevant Grid/Framework Projects
- QuakeSim and the Solid Earth Research Virtual Observatory SERVOGrid (JPL ...)
- GEON: Cyberinfrastructure for the Geosciences (San Diego, Missouri, USGS ...)
- CME: Community Modeling Environment from SCEC
- CIG: Computational Infrastructure for Geodynamics
- Geoframework.org (Caltech/VPAC)
- ESMF: Earth System Modeling Framework (NASA)
- NERCGrid: Natural Environment Research Council UK e-Science
- Earth Systems Grid in the DoE Science Grid

Earth Science Computing (figure contrasting capability and capacity computing)

Earth Science Data

(Diagram: large scale parallel computers, metacomputing, Grid analysis and visualization, large disks.)
- Capability, spreading a single large problem over multiple supercomputers: NO
- Capacity, seamless access to multiple computers: YES

Geoscience Research and Education Grids (architecture diagram)
- SERVOGrid research side: databases, repositories, federated databases, sensors, streaming data, field trip data, research simulations, analysis and visualization, data filter services, discovery services, and a portal.
- Education side: customization services carrying results from research to education, an Education Grid, and a computer farm.

More General Material on Grids
- Grids today are built in terms of Web Services, a technology designed to support enterprise software and e-Business:
  - Provides wonderful support tools
  - Provides a new software engineering model supporting interoperability
- Grids do not compete with parallel computing:
  - They leave MPI untouched, so your parallel codes run as fast as they used to.
  - Grids handle control, management, and metadata management, where higher latency (around 10 milliseconds, a thousand times worse than MPI) is acceptable.
- The Global Grid Forum, W3C, and OASIS set the relevant standards and support the community.

Grid Computing Environments (layered diagram): user services and portal services sit above application services and system services (the "Core" Grid), with application metadata pointing to the actual application; these layers rest on middleware, databases, and raw (HPC) resources.

Grids provide:
- A "Service Oriented Architecture" supporting distributed programs in a scalable fashion with clean software engineering
- A "multi-tier" architecture supporting seamless access, with brokers mediating access to diverse computers and data sources
- "Workflow" integrating different distributed services in a single application
- Event services to notify computers and people of issues (earthquake struck, job completed)
- Easy support of parameter searches and other pleasingly parallel applications with many related non-communicating jobs (see the sketch below)
- Security (Web Services), database access (OGSA-DAI), collaboration (Access Grid, GlobalMMCS)
- File, data, and metadata management
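For a flavor of the pleasingly parallel case, here is a minimal sketch assuming a hypothetical HTTP job-submission endpoint and JSON job description (not the actual SERVOGrid API): each parameter value becomes an independent, non-communicating job.

```python
# Minimal sketch (not SERVOGrid code): a "pleasingly parallel" parameter search
# expressed as many independent job submissions to a hypothetical HTTP
# job-submission service. The endpoint URL and the JSON job-description
# fields are illustrative assumptions, not a real API.
import json
import urllib.request

JOB_SERVICE = "http://grid.example.org/jobs"  # hypothetical endpoint

def submit_job(friction_coefficient):
    """Submit one simulation run; each job is independent of the others."""
    job = {
        "executable": "virtual_california",      # assumed code name
        "arguments": ["--friction", str(friction_coefficient)],
    }
    request = urllib.request.Request(
        JOB_SERVICE,
        data=json.dumps(job).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)["job_id"]     # assumed response field

if __name__ == "__main__":
    # Sweep a single parameter; no inter-job communication is required,
    # so the Grid can schedule each run on any available resource.
    job_ids = [submit_job(mu) for mu in (0.2, 0.4, 0.6, 0.8)]
    print("submitted:", job_ids)
```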

Web Services
- Web services are the fundamental pieces of distributed Service Oriented Architectures.
- We should define lots of useful services that are remotely available.
  - Archival data access services supporting queries, real-time sensor access, and mesh generation all seem to be popular choices.
- Web services have two important parts:
  - Distributed services
  - Client applications
- These two pieces are decoupled: one can build clients to remote services without caring about the programming language implementation of the remote service (Java, C++, Python). A sketch of such a client follows.
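As an illustration of this decoupling (the endpoint, namespace, and operation name below are assumptions, not a real SERVO service), a Python client posts a SOAP request to an archival data-access service that could itself be implemented in Java, C++, or Python:

```python
# Minimal sketch of a language-neutral Web service client using only the
# Python standard library. The endpoint, namespace, and operation are
# hypothetical; a real deployment would follow the service's WSDL.
import urllib.request

ENDPOINT = "http://servo.example.org/services/FaultData"  # hypothetical

soap_request = """<?xml version="1.0" encoding="UTF-8"?>
<soap:Envelope xmlns:soap="http://schemas.xmlsoap.org/soap/envelope/">
  <soap:Body>
    <getFaults xmlns="http://servo.example.org/faultdata">
      <region>Southern California</region>
    </getFaults>
  </soap:Body>
</soap:Envelope>"""

request = urllib.request.Request(
    ENDPOINT,
    data=soap_request.encode("utf-8"),
    headers={"Content-Type": "text/xml; charset=utf-8",
             "SOAPAction": ""},   # some SOAP services expect this header
)
with urllib.request.urlopen(request) as response:
    print(response.read().decode("utf-8"))  # raw XML reply from the service
```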

Web Services, Continued
- Clients can be built in any number of styles:
  - We build portlet clients: ubiquitous and easy to combine.
  - One can build fancier GUI client applications.
  - You can even embed Web service client stubs (library routines) in your application code, so that your code can make direct calls to remote data sources.
- Regardless of the client one builds, the services are the same in all cases:
  - My portal and your application code may each use the same service to talk to the same database.
- So we need to concentrate on services and let clients bloom as they may:
  - Client applications (portals, GUIs, etc.) will have a much shorter lifecycle than service interface definitions, if we do our job correctly.
  - Client applications that are locked into particular services, or that use proprietary data formats and wire protocols, are at risk.
  - Use the WSRF and JSR-168 portlet standards.

Data Deluged Science
- During the HPCC Initiative we worried about data in the form of parallel I/O or MPI-IO, but we did not consider it as an enabler of new algorithms and new ways of computing.
- Data assimilation was not central to HPCC.
- DoE ASCI (Stockpile Stewardship) was set up precisely because it did not want, and did not have, test data!
- Now particle physics will get 100 petabytes from the CERN LHC.
  - Nuclear physics (Jefferson Lab) is in the same situation.
  - It will use ~30,000 CPUs simultaneously, 24x7.
- Weather, climate, solid earth (EarthScope)
- Bioinformatics curated databases
- Virtual Observatory and SkyServer in astronomy
- Environmental sensor nets

Data Deluged Science Computing Paradigm (diagram): Data leads to Information leads to Ideas, via simulation, models, assimilation, reasoning, and datamining, spanning computational science and informatics.

Data Deluged Science Computing Architecture (diagram): distributed data filters massage data for simulation; an HPC simulation is coupled through OGSA-DAI Grid services and Grid data assimilation to other Grid and Web services for analysis, control, and visualization.

Some Questions for Data Deluged Science
- A new trade-off: how to split funds between sensors and simulation engines.
- There has been no systematic study of how best to represent data deluged sciences without known equations at the resolution of interest.
- Data assimilation is very relevant.
- What is the relationship to "just" interpolating data and then extrapolating a little? (A toy contrast follows.)
- Role of uncertainty analysis: everything (equations, model, data) is uncertain!
- Relationship of data mining and simulation.
- Growing interest in data curation and provenance.
- Role of Cellular Automata (CA), Potts models, and neural networks, which are "fundamental equation free" approaches.
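A toy contrast between interpolation and assimilation (purely illustrative, not any project's method): a scalar Kalman-style update weights a model forecast and an observation by their uncertainties, rather than simply passing a curve through the data.

```python
# Toy illustration of data assimilation with uncertainty: a scalar
# Kalman-style update combines a model forecast and an observation,
# weighted by their variances.
def assimilate(forecast, forecast_var, observation, observation_var):
    """Return the analysis value and its variance."""
    gain = forecast_var / (forecast_var + observation_var)
    analysis = forecast + gain * (observation - forecast)
    analysis_var = (1.0 - gain) * forecast_var
    return analysis, analysis_var

# Example: an uncertain model forecast of surface displacement (mm) is
# corrected by a more precise GPS observation.
print(assimilate(forecast=12.0, forecast_var=4.0,
                 observation=10.0, observation_var=1.0))
# -> (10.4, 0.8): the analysis lies nearer the observation and is less
#    uncertain than either input.
```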

Recommendations of NASA's Computational Technologies Workshop (May 2002)
1. Create a Solid Earth Research Virtual Observatory (SERVO)
   - Numerous distributed heterogeneous real-time datasets
   - Seamless access to large distributed volumes of data
   - Data handling and archiving as part of the framework
   - Tools for visualization, datamining, pattern recognition, and data fusion
2. Develop a Solid Earth Science Problem Solving Environment (PSE)
   - Addresses the NASA-specific challenges of multiscale modeling
   - Model and algorithm development and testing, visualization, and data assimilation
   - Scalable to workstations or supercomputers depending on the size of the problem
   - Numerical libraries existing within a compatible framework
3. Improve the Computational Environment
   - PetaFLOP computers with terabytes of RAM
   - Distributed and cluster computers for decomposable problems
   - Development of Grid technologies

SERVOGrid Requirements
- Seamless access to data repositories and large-scale computers
- Integration of multiple data sources (sensors, databases, file systems) with the analysis system
  - Including filtered OGSA-DAI (Grid database access)
- Rich metadata generation and access, with a SERVOGrid-specific schema extending OpenGIS (geography as a Web service) standards and using the Semantic Grid (an illustrative sketch of such a record follows)
- Portals with a component model for user interfaces and web control of all capabilities
- Collaboration to support world-wide work
- Basic Grid tools: workflow and notification
- Not metacomputing
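As a purely illustrative sketch of such a metadata record (the element names and the SERVO namespace are assumptions made for this example; only the GML namespace is the real OpenGIS one), a fault might be described along these lines:

```python
# Illustrative sketch only: a minimal fault metadata record in a hypothetical
# GML-flavoured schema. The SERVO namespace, element names, and values are
# assumptions; the actual SERVOGrid schema is defined by the project.
import xml.etree.ElementTree as ET

GML = "http://www.opengis.net/gml"            # real OpenGIS GML namespace
SERVO = "http://servo.example.org/schema"     # hypothetical namespace

fault = ET.Element(f"{{{SERVO}}}Fault", attrib={"name": "ExampleFault"})
geometry = ET.SubElement(fault, f"{{{GML}}}LineString")
coords = ET.SubElement(geometry, f"{{{GML}}}coordinates")
coords.text = "-118.55,34.21 -118.48,34.31"   # lon,lat pairs of the trace
ET.SubElement(fault, f"{{{SERVO}}}dipDegrees").text = "40"
ET.SubElement(fault, f"{{{SERVO}}}slipRateMMPerYear").text = "1.5"

print(ET.tostring(fault, encoding="unicode"))
```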

Solid Earth Research Virtual Observatory (SERVO) (tiered architecture diagram)
- Tier 0/1 archives at Goddard, JPL, and Ames feed Tier 2 centers, Tier 3 institutes, and Tier 4 workstations and other portals over Mbits/sec network links.
- Distributed heterogeneous real-time datasets from observation downlinks and archives; data cache of ~TBytes/day; roughly 1 PB per year data rate in 2010.
- Fully functional problem solving environment with plug-and-play composition of parallel programs from algorithmic modules; approximately 100 model codes.
- On-demand downloads of 100 GB in 5 minutes; 10^6 volume elements rendered in real time; program-to-program communication in milliseconds; 100 TeraFLOPs sustained.

Virtual Observatory Project Timeline (capabilities; NASA CT Workshop, May 2002)
- Architecture and technology approach; decomposition into services with requirements
- Prototype cooperative federated database service integrating 5 datasets of 10 TB each
- Prototype data analysis service
- Prototype modeling service capable of integrating 5 modules
- Prototype visualization service at 1920x1080 pixels and 120 frames per second
- Scaled to 100 sites
- Solid Earth Research Virtual Observatory (SERVO): on-demand downloads of 100 GB files from 40 TB datasets within 5 minutes; uniform access to 1000 archive sites with volumes from 1 TB to 1 PB

Problem Solving Environment Project Timeline (capabilities; NASA CT Workshop, May 2002)
- Isolated platform-dependent code fragments
- Prototype PSE front end (portal) integrating 10 local and remote services
- Extend the PSE to include a 20-user collaboratory with shared windows
- Seamless access to high-performance computers linking remote processes over Gb data channels
- Integrated visualization service with volumetric rendering
- Fully functional PSE used to develop models as building blocks for simulations
- Program-to-program communication in milliseconds using staging, streaming, and advanced cache replication; integrated with SERVO
- Plug-and-play composition of parallel programs from algorithmic modules
- Plug-and-play composition of sequential programs from algorithmic modules

Computational Environment Timeline (capabilities; NASA CT Workshop, May 2002)
- 100s of GigaFLOPs, 40 GB RAM, 1 Gb/s network bandwidth
- ~100 model codes with parallel scaled efficiency of 50%
- ~10^4 PetaFLOPs throughput per subfield per year
- ~100 TeraFLOPs sustained capability per model
- ~10^6 volume elements rendered in real time
- Access to a mixture of platforms, from low-cost clusters (20-100) to supercomputers with massive memory and thousands of processors
(Note: this slide appears inconsistent with slide 8.)

Solid Earth Research Virtual Observatory (iSERVO)
- Web-services (portal) based Problem Solving Environment (PSE)
- Couples data with simulation, pattern recognition software, and visualization software
- Enables investigators to seamlessly merge multiple data sets and models, and create new queries
Data:
- Space-based observational data
- Ground-based sensor data (GPS, seismicity)
- Simulation data
- Published/historical fault measurements
Analysis software:
- Earthquake fault modeling
- Lithospheric modeling
- Pattern recognition software

Philosophy
- Store simulated and observed data.
- Archive simulation data with the original simulation code and analysis tools.
- Access heterogeneous distributed data through cooperative federated databases.
- Couple distributed data sources, applications, and hardware resources through an XML-based Web Services framework.
- Users access the services (and thus distributed resources) through Web browser-based Problem Solving Environment clients.
- The Web services approach defines standard, programming-language-independent application programming interfaces, so non-browser client applications may also be built.

SERVOGrid Basics
- Under development in collaboration with researchers at JPL, UC Davis, USC, and Brown University.
- Geoscientists develop simulation codes, analysis tools, and visualization tools.
- We need a way to bind distributed codes, tools, and data sets.
- We need a way to deliver them to a larger audience:
  - Instead of downloading and installing the code, use it as a remote service.

SERVOGrid Application Descriptions
Codes range from simple "rough estimate" codes to parallel, high-performance applications:
- Disloc: handles multiple arbitrarily dipping dislocations (faults) in an elastic half-space.
- Simplex: inverts surface geodetic displacements for fault parameters using simulated annealing downhill residual minimization.
- GeoFEST: three-dimensional viscoelastic finite element model for calculating nodal displacements and tractions. Allows for realistic fault geometry and characteristics, material properties, and body forces.
- Virtual California: program to simulate interactions between vertical strike-slip faults using an elastic layer over a viscoelastic half-space.
- RDAHMM: time series analysis program based on Hidden Markov Modeling. Produces feature vectors and probabilities for transitioning from one class to another.
- PARK: boundary element program to calculate fault slip velocity history based on fault frictional properties; a model for unstable slip on a single earthquake fault.
- Also: preprocessors, mesh generators, and visualization tools (RIVA, GMT).
(A sketch of typical fault-segment parameters follows.)
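The fault-oriented codes above share a family of elastic half-space dislocation parameters in the style of Okada (1985). The sketch below is an assumption for illustration, not any code's actual input format, and the numeric values are rough illustrative figures only.

```python
# Sketch (assumed for illustration) of the kind of fault description such
# dislocation codes consume: standard elastic half-space parameters.
from dataclasses import dataclass

@dataclass
class FaultSegment:
    lon: float            # degrees, reference point of the fault trace
    lat: float            # degrees
    depth_km: float       # depth to the top of the fault plane
    strike_deg: float     # clockwise from north
    dip_deg: float        # from horizontal
    length_km: float      # along strike
    width_km: float       # down dip
    strike_slip_m: float  # sign convention assumed for illustration
    dip_slip_m: float     # sign convention assumed for illustration

# Rough illustrative values, not a published source model.
example_segment = FaultSegment(lon=-118.55, lat=34.21, depth_km=5.0,
                               strike_deg=122.0, dip_deg=40.0,
                               length_km=18.0, width_km=21.0,
                               strike_slip_m=0.0, dip_slip_m=1.3)
```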

iSERVO Web Services
- Job submission: supports remote batch and shell invocations.
  - Used to execute simulation codes (the VC suite, GeoFEST, etc.), mesh generation (Akira/Apollo), and visualization packages (RIVA, GMT).
- File management:
  - Uploading, downloading, and backend crossloading (i.e., moving files between remote servers)
  - Remote copies, renames, etc.
- Job monitoring
- Apache Ant-based remote service orchestration
  - For coupling related sequences of remote actions, such as RIVA movie generation.
- Database services: support SQL queries.
- Data services: support interactions with XML-based fault and surface observation data.
  - For simulation-generated faults (i.e., from Simplex)
  - An XML data model is being adopted for common formats, with translation services to "legacy" formats.
  - Migrating to Geography Markup Language (GML) descriptions.
(A sketch of the submit/monitor/fetch pattern follows.)
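A minimal sketch of the submit, monitor, and fetch pattern these services provide, assuming hypothetical URLs, JSON fields, and a polling protocol (the real iSERVO services are described by their own WSDL interfaces):

```python
# Hypothetical client sketch for the job submission, job monitoring, and
# file management services listed above. URLs and field names are assumed.
import json
import time
import urllib.request

BASE = "http://servo.example.org"   # hypothetical service host

def call(path, payload=None):
    """Send an optional JSON payload to a hypothetical endpoint, return JSON."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    headers = {"Content-Type": "application/json"} if data else {}
    request = urllib.request.Request(BASE + path, data=data, headers=headers)
    with urllib.request.urlopen(request) as response:
        return json.load(response)

# 1. Job submission service: run GeoFEST on an already uploaded mesh.
job = call("/jobs", {"code": "GeoFEST", "input": "example.mesh"})

# 2. Job monitoring service: poll until the run finishes.
while call("/jobs/" + job["id"])["state"] not in ("DONE", "FAILED"):
    time.sleep(30)

# 3. File management service: download the displacement output.
urllib.request.urlretrieve(BASE + "/files/" + job["id"] + "/displacements.out",
                           "displacements.out")
```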

Some Conclusions
Grids facilitate support for:
- International collaborations
- Integration of computing with distributed data repositories and real-time sensors
- Web services from a variety of fields (e.g., map services from OpenGIS)
- Seamless access to multiple networked compute resources, including computational steering
- Software infrastructure for Problem Solving Environments