QuakeSim: Grid Computing, Web Services, and Portals for Earthquake Science Marlon Pierce Community Grids Lab Indiana University.


1 QuakeSim: Grid Computing, Web Services, and Portals for Earthquake Science Marlon Pierce Community Grids Lab Indiana University

2 Acknowledgements
• Prof. Geoffrey Fox, CGL Director
• Many external collaborators: Andrea Donnellan and team (JPL), Yehuda Bock and team (Scripps/UCSD), Neil Devadason, John Buechler, and David Coats (POLIS)
• Dr. Yili Gong
• Graduate Students
    • Choonhan Youn (now with GEON project)*
    • Galip Aydin*
    • Harshawardhan Gadgil
    • Mehmet S. Aktas
    • Ahmet Sayar
    • Zhigang Qi
    • Zao Liu
    • Jong Youl Choi

3 Grids and Cyberinfrastructure
• Cyberinfrastructure is a term coined by the National Science Foundation in the famous “Atkins Report”.
    • http://www.nsf.gov/od/oci/reports/toc.jsp
    • Prof. Dan Atkins (UM) is now the head of NSF’s Office of Cyberinfrastructure.
• Roughly synonymous with
    • eScience (UK)
    • Grid Computing (DOE and NSF)
    • Global Information Grid (DOD), etc.

4 What Is CI, Really?
• Computing, Data Storage, Networking
    • NSF TeraGrid (www.teragrid.org)
    • Open Science Grid (www.opensciencegrid.org)
    • Many international equivalents
• Middleware
    • Globus: multi-institutional security, job management, file transfer, data management, system monitoring
    • Condor: cycle-scavenging and job scheduling
    • And many others: see, for example, the TeraGrid’s Common TeraGrid Software Stack, the OSG’s Virtual Data Toolkit, and the NMI Grids Center for composite releases.
• Science Gateways (like QuakeSim)
• Useful Online Services
    • NIH’s PubMed, PubChem
• Most Grids are built these days with Web Services and follow Service Oriented Architecture principles.

5 QuakeSim Project Requirements and Architecture
Contributions from Choonhan Youn, Ahmet Sayar, Galip Aydin, Harsh Gadgil, and collaborators’ codes

6 Science Gateways
• QuakeSim is an example of a science gateway.
    • Google “TeraGrid Science Gateways” for other examples.
• Combines a Web portal and Web services to access online data sources and connect them to geophysical applications running on computing resources.

7 QuakeSim Applications and Their Data
• Pattern Informatics (UC-Davis)
    • Earthquake forecasting code; uses seismic archives as input.
• Regularized Deterministic Annealing Hidden Markov Model (RDAHMM) (JPL)
    • Time series analysis code; can be applied to GPS and seismic archives.
    • Identifies signal components (possibly associated with underlying physical causes) with no fixed parameters.
• GeoFEST (JPL/CalTech)
    • Finite element code for detailed modeling of fault stresses and seismic displacements; uses fault models as input.

8 Data Requirements
• QuakeTables Fault Database
    • QuakeSim’s fault repository for California
    • Compatible with GeoFEST, Disloc, VC
• GPS data sources and formats (RDAHMM and others)
    • JPL: ftp://sideshow.jpl.nasa.gov/pub/mbh
    • SOPAC: ftp://garner.ucsd.edu/pub/timeseries
    • USGS: http://pasadena.wr.usgs.gov/scign/Analysis/plotdata/
• Seismic event data (RDAHMM and others)
    • SCSN: http://www.scec.org/ftp/catalogs/SCSN
    • SCEDC: http://www.scecd.scec.org/ftp/catalogs/SCEC_DC
    • Dinger-Shearer: http://www.scecdc.org/ftp/catalogs/dinger-shearer/dinger-shearer.catalog
    • Hauksson: http://www.scecdc.scec.org/ftp/catalogs/hauksson/Socal

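9 [Figure: architecture diagram. A browser interface (JSP + client stubs) is reached over HTTP(S) and communicates via SOAP/HTTP with WSDL-described services on three hosts: Host 1 (WFS), a DB service using JDBC to a database; Host 2 (Grid), job submission/monitoring and file services over the operating and queuing systems; Host 3 (WMS), a visualization or map service.]
My “octopus” diagram, from the archives.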

10 GIS Services as a Data Grid
• We decided that the Data Grid components of SERVO are best implemented using standard GIS services.
    • Use Open Geospatial Consortium (OGC) standards.
    • Maximize reusability in future QuakeSim projects.
    • Provide downloadable GIS software to the community as a side effect of QuakeSim research.
• We implemented two cornerstone standards:
    • Web Feature Service (WFS): a data service for storing abstract map features; supports queries. Used for faults, GPS, and seismic records.
    • Web Map Service (WMS): generates interactive maps from WFSs and other WMSs.
• We built these as Web Services.
    • WSDL and SOAP: programming interfaces and messaging formats.
    • You can work with the data and map services through programming APIs as well as browser interfaces.
• See www.crisisgrid.org.
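As a rough illustration of the kind of query a WFS answers, the sketch below builds an OGC WFS 1.0.0 GetFeature request in the standard key-value-pair (HTTP GET) encoding. It is a sketch under assumptions: the QuakeSim services themselves were exposed through WSDL/SOAP, and the endpoint URL and the feature type name `quaketables:fault` are hypothetical placeholders.

```python
from typing import Optional, Tuple
from urllib.parse import urlencode

# (min_lon, min_lat, max_lon, max_lat) in EPSG:4326
BBox = Tuple[float, float, float, float]


def wfs_get_feature_url(endpoint: str, type_name: str, bbox: BBox,
                        max_features: Optional[int] = None) -> str:
    """Build an OGC WFS 1.0.0 GetFeature request URL (KVP encoding)."""
    params = {
        "SERVICE": "WFS",
        "VERSION": "1.0.0",
        "REQUEST": "GetFeature",
        "TYPENAME": type_name,
        # WFS 1.0.0 with EPSG:4326 uses lon/lat (x, y) ordering.
        "BBOX": ",".join(str(v) for v in bbox),
    }
    if max_features is not None:
        params["MAXFEATURES"] = str(max_features)
    return endpoint + "?" + urlencode(params)


# Hypothetical endpoint and feature type; a box around the Los Angeles area.
url = wfs_get_feature_url("http://example.org/wfs", "quaketables:fault",
                          (-119.0, 33.5, -117.5, 34.5), max_features=10)
print(url)
```

The same query semantics (feature type plus spatial filter) apply whether the request travels as a URL or inside a SOAP message.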

11 Plotting Google satellite maps with QuakeTables fault overlays for Los Angeles.

12 Pattern Informatics
• This has been our simplest “proving ground” example.
• Integrates (streaming) WFS, WMS, WS-Context, and HPSearch’s WSProxy services (which wrap the PI executable and helper format-conversion services).
• This is basically a linear workflow.
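A linear workflow means each service’s output becomes the next service’s input. A minimal sketch of that idea, with local Python functions standing in for the remote WFS, format-conversion, and PI services; all function names and the sample events are hypothetical, and the actual flow was scripted in HPSearch rather than written like this.

```python
from typing import Any, Callable, List, Tuple

# A step is a (name, callable) pair; each callable consumes the previous output.
Step = Tuple[str, Callable[[Any], Any]]


def run_linear_workflow(steps: List[Step], initial: Any) -> Tuple[Any, List[str]]:
    """Run steps in order, feeding each result to the next; return result and trace."""
    data, trace = initial, []
    for name, service in steps:
        data = service(data)
        trace.append(name)
    return data, trace


# Hypothetical local stand-ins for the remote services named on the slide.
def query_seismic_catalog(region: str) -> List[dict]:
    """Stands in for a (streaming) WFS query; returns fixed sample events."""
    return [{"lat": 34.0, "lon": -118.2, "mag": 4.1},
            {"lat": 34.1, "lon": -118.3, "mag": 2.5}]


def convert_for_pi(events: List[dict]) -> List[Tuple[float, float]]:
    """Stands in for a format-conversion helper (here: keep events with M >= 3.0)."""
    return [(e["lat"], e["lon"]) for e in events if e["mag"] >= 3.0]


def run_pattern_informatics(points: List[Tuple[float, float]]) -> dict:
    """Stands in for the wrapped PI executable; just echoes its input."""
    return {"hotspots": points}


workflow: List[Step] = [
    ("wfs-query", query_seismic_catalog),
    ("convert", convert_for_pi),
    ("pattern-informatics", run_pattern_informatics),
]
result, trace = run_linear_workflow(workflow, "southern-california")
```

Because each step only sees the previous step’s output, any stand-in can be swapped for a remote call without the others changing.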

13 Whole earth seismic catalog plotted on NASA map server. Combines streaming feature server and map server. Pattern informatics results combined with Feature and Map servers can be used to forecast areas of increased earthquake probability.

14 Data Flow or Event Flow?
• The octopus slide implies a sequential data flow between applications on distributed hosts.
    • Usually called “scientific workflow” in the CI community.
    • See http://vtcpc.isi.edu/wiki/ for an overview and the players.
    • See www.hpsearch.org for our work on using JavaScript as a workflow language.
• This is not MPI or parallel programming. It’s more like a stone-age mash-up.
    • Services don’t need to know much about each other.
    • They don’t have to be from the same providers.
    • Loosely coupled.
    • Transfer data (or URL pointers) as needed.
• Event flow and traditional message passing are better suited for closely coupled applications.
    • See, for example, DOE’s CCA project and NASA’s Earth System Modeling Framework (ESMF).

15 Portlet Development We use JSR 168 portlets to build sharable portal plugins.

16 Portlets: Portal Components
• Web portals are essentially websites with logins.
    • Personalization, content control, etc., derive from this.
• Java portals are based on a standard component/container model.
    • Components are called portlets.
    • JSR 168 is the standard.
    • Many TeraGrid and other science gateways use this standard.

17 Portlet Summary
Portlet | Summary
RDAHMM | Set up and run RDAHMM, query the Scripps GRWS GPS Service, maintain persistent user sessions.
ST_Filter | Similar to the RDAHMM portlet; ST_Filter has much more input.
Station Monitor | Shows GPS stations on a Google Map, displays the last 10 minutes of data.
Real Time RDAHMM | Displays RDAHMM results for the last 10 minutes of GPS data in a Google Map.
Seismic Archive Query Portlet | Google Map portlet that shows seismic events based on your query.
Fault Query Portlet | Allows you to query the QuakeTables fault database for information on faults.

18 RDAHMM Portlet: Main Navigation

19 RDAHMM Project Set Up

20 RDAHMM GRWS Query Interface

21 RDAHMM Results Page

22 Real Time RDAHMM Portlet

23 Station Monitor Portlet

24 ST_Filter Portlets

25 Managing Real Time GPS Data Slides from Galip Aydin

26 California Real Time Network
[Figure: map of GPS stations. Continuous GPS Stations (CGPS) are depicted as triangles while the Real-Time stations are represented as circles. Image is obtained from SOPAC GPS Explorer at http://sopac.ucsd.edu/projects/realtime]

Network data rates by message format

CRTN GPS Site Positions (9 Stations):
Time     | RYO       | ASCII     | GML
1 second | 1.5 KB    | 4.03 KB   | 48.7 KB
1 hour   | 5.31 MB   | 14.18 MB  | 171.31 MB
1 day    | 127.44 MB | 340.38 MB | 4.01 GB
1 month  | 3.8 GB    | 9.97 GB   | 123.3 GB
1 year   | 45.8 GB   | 119.67 GB | 1.41 TB

Entire SCIGN Network (250 stations):
1 year   | 1.23 TB   | 16.18 TB  | 160 TB

• How does one manage all the data generated by the 85 stations?
• How can you get just the data you want?
• Note this is fundamentally different from traditional request/response style Web Services.
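The table is essentially a linear extrapolation of a per-second message size. A minimal sketch of that arithmetic for the RYO column, assuming volume grows linearly with time and with station count, 1 KB = 1024 bytes, a 30-day month, and a 365-day year; the function names are hypothetical.

```python
# Extrapolate a per-second message size to longer periods and a bigger network.
# Assumptions: volume grows linearly with time and station count; 1 KB = 1024 B.

PERIODS = {
    "1 second": 1,
    "1 hour": 3_600,
    "1 day": 86_400,
    "1 month (30 days)": 30 * 86_400,
    "1 year (365 days)": 365 * 86_400,
}


def projected_bytes(bytes_per_second: float, seconds: int,
                    station_factor: float = 1.0) -> float:
    """Bytes accumulated over `seconds`, scaled by a station-count ratio."""
    return bytes_per_second * seconds * station_factor


def human(nbytes: float) -> str:
    """Format a byte count with binary (1024-based) units."""
    for unit in ("B", "KB", "MB", "GB", "TB"):
        if nbytes < 1024 or unit == "TB":
            return f"{nbytes:.2f} {unit}"
        nbytes /= 1024
    raise AssertionError("unreachable")


RYO_PER_SECOND = 1.5 * 1024  # 1.5 KB/s for the 9 CRTN stations (from the table)

for label, secs in PERIODS.items():
    print(f"{label:>18}: {human(projected_bytes(RYO_PER_SECOND, secs))}")

year = PERIODS["1 year (365 days)"]
print("250 stations, 1 year:", human(projected_bytes(RYO_PER_SECOND, year, 250 / 9)))
```

Under these assumptions the results land close to, but not exactly on, the slide’s RYO figures (about 5.27 MB per hour rather than 5.31 MB), so the slide presumably used a slightly different per-second size or unit convention.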

27 Processing Real-Time GPS Streams
[Figure: A Complete Sensor Message Processing Path, including a data analysis application. Raw data from the Scripps RTD Server arrives on RYO ports 7010, 7011, and 7012 and is published by ryo2nb to an NB Server. The converters ryo2ascii, ascii2gml, and ascii2pos and the Single Station Displacement Filter, Station Health Filter, and RDAHMM Filter exchange messages on the topics /SOPAC/GPS/CRTN01/RYO, /SOPAC/GPS/CRTN01/ASCII, /SOPAC/GPS/CRTN01/POS, and /SOPAC/GPS/CRTN01/DSME. A shorter path (ryo2nb, ryo2ascii, ascii2pos, Single Station RDAHMM Filter) is repeated under the label “GPS Networks”.]
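In this processing path, each converter or filter subscribes to one topic and republishes its output on the next. A minimal in-process sketch of that pattern, assuming a toy broker class in place of NaradaBrokering (whose actual API is not shown here); the topic names come from the figure, while the converter functions and the message format are hypothetical.

```python
from collections import defaultdict
from typing import Any, Callable, DefaultDict, List

Callback = Callable[[Any], None]


class ToyBroker:
    """In-process topic broker; a stand-in, not the NaradaBrokering API."""

    def __init__(self) -> None:
        self._subs: DefaultDict[str, List[Callback]] = defaultdict(list)

    def subscribe(self, topic: str, callback: Callback) -> None:
        self._subs[topic].append(callback)

    def publish(self, topic: str, message: Any) -> None:
        for callback in list(self._subs[topic]):
            callback(message)


def attach_filter(broker: ToyBroker, in_topic: str, out_topic: str,
                  transform: Callable[[Any], Any]) -> None:
    """Subscribe to in_topic, transform each message, republish on out_topic."""
    broker.subscribe(in_topic, lambda msg: broker.publish(out_topic, transform(msg)))


def parse_position(line: str) -> tuple:
    """Hypothetical ASCII record: '<site> <x> <y> <z>' -> (site, x, y, z)."""
    site, x, y, z = line.split()
    return site, float(x), float(y), float(z)


BASE = "/SOPAC/GPS/CRTN01"
broker = ToyBroker()
# Hypothetical stand-ins for ryo2ascii and ascii2pos. For simplicity the "raw"
# message here is ASCII bytes; the slide's RYO format is a different encoding.
attach_filter(broker, BASE + "/RYO", BASE + "/ASCII", lambda raw: raw.decode("ascii"))
attach_filter(broker, BASE + "/ASCII", BASE + "/POS", parse_position)

positions: List[tuple] = []
broker.subscribe(BASE + "/POS", positions.append)
broker.publish(BASE + "/RYO", b"SITE1 -2493304.1 -4655215.6 3565497.3")
```

Adding another analysis step (for example a health or displacement filter) is just one more `attach_filter` call on an existing topic; the upstream converters do not change.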

28 Application Integration with Real-Time Filters
• Station Monitor
    • Filter records real-time positions for 10 minutes and calculates position changes.
    • Graph Plotter Application creates a visual representation of the positions.
• RDAHMM
    • Filter records real-time positions for 10 minutes and invokes the RDAHMM application, which determines state changes in the XYZ signal.
    • Graph Plotter Application creates a visual representation of the RDAHMM output.
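Both filters keep a rolling 10-minute buffer of positions. A minimal sketch of such a buffer, assuming timestamped XYZ positions arrive in time order; the class and method names are hypothetical, and an RDAHMM filter would hand the buffered window to the RDAHMM executable instead of simply differencing the endpoints as done here.

```python
from collections import deque
from typing import Deque, Optional, Tuple

Position = Tuple[float, float, float]  # (x, y, z), e.g. in meters


class WindowedDisplacementFilter:
    """Keep the last `window_s` seconds of positions and report the net change."""

    def __init__(self, window_s: float = 600.0) -> None:  # 600 s = 10 minutes
        self.window_s = window_s
        self._buf: Deque[Tuple[float, Position]] = deque()

    def add(self, t: float, pos: Position) -> None:
        """Append a sample at time t (seconds) and drop samples older than the window."""
        self._buf.append((t, pos))
        while t - self._buf[0][0] > self.window_s:
            self._buf.popleft()

    def __len__(self) -> int:
        return len(self._buf)

    def displacement(self) -> Optional[Position]:
        """Last position minus first position in the window, or None if < 2 samples."""
        if len(self._buf) < 2:
            return None
        (_, first), (_, last) = self._buf[0], self._buf[-1]
        return (last[0] - first[0], last[1] - first[1], last[2] - first[2])


f = WindowedDisplacementFilter(window_s=600)
f.add(0, (0.0, 0.0, 0.0))
f.add(300, (1.0, 0.0, 0.0))
f.add(700, (3.0, 0.5, 0.0))  # the t=0 sample is now older than 10 minutes
print(len(f), f.displacement())
```

A `deque` makes both appending new samples and evicting the oldest ones constant-time operations, which suits a filter that runs once per incoming message.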

29 2 – Multiple Publishers Test
• We add more GPS networks by running more publishers.
• The results show that 1000 publishers can be supported with no performance loss. The 1000-publisher ceiling is an operating system limit.
[Figure: test setup with an NB Server, RYO Publishers 1 through n, a RYO To ASCII Converter, a Simple Filter, and topics 1A, 1B, 2, ..., n.]

30 4 – Multiple Brokers Test
• NaradaBrokering allows creation of broker networks.
• We create a two-broker network.
    • Messages published to the first broker can be received from the second broker.
• We take timings on each broker.
• We connect 750 clients to each broker and run for 24 hours. We chose 750 clients to stay well below the saturation limit.
• The results show that the performance is very good and similar to the single-broker test.
[Figure: test setup with NB Server 1 and NB Server 2, a RYO Publisher, a RYO To ASCII Converter, topics 1A and 1B, Simple Filters 1 to 750 on the first broker, and Simple Filters 751 to 1500 on the second.]
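In a broker network, a message published at one broker is forwarded so that clients attached to another broker also receive it, and each client can time the delivery. A minimal in-process sketch of that idea, assuming a hypothetical `LinkedBroker` class rather than NaradaBrokering’s real routing (which is more elaborate); it uses three clients per broker instead of 750 and says nothing about the measured performance.

```python
import time
from collections import defaultdict
from typing import Any, Callable, DefaultDict, Dict, List, Optional


class LinkedBroker:
    """Toy broker that forwards publications to linked peers (not NaradaBrokering).

    Assumes an acyclic (tree-shaped) broker network: a message is never sent
    back to the broker it came from, but cycles are not otherwise detected.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.peers: List["LinkedBroker"] = []
        self.subs: DefaultDict[str, List[Callable[[Any], None]]] = defaultdict(list)

    def link(self, other: "LinkedBroker") -> None:
        self.peers.append(other)
        other.peers.append(self)

    def subscribe(self, topic: str, callback: Callable[[Any], None]) -> None:
        self.subs[topic].append(callback)

    def publish(self, topic: str, msg: Any,
                came_from: Optional["LinkedBroker"] = None) -> None:
        for callback in self.subs[topic]:
            callback(msg)  # deliver to local clients
        for peer in self.peers:
            if peer is not came_from:
                peer.publish(topic, msg, came_from=self)  # forward across the link


broker1, broker2 = LinkedBroker("broker1"), LinkedBroker("broker2")
broker1.link(broker2)

latencies: Dict[str, List[float]] = {"broker1": [], "broker2": []}
CLIENTS_PER_BROKER = 3  # the test on the slide used 750


def make_client(broker_name: str) -> Callable[[dict], None]:
    """Each client records the time elapsed since the message was published."""
    return lambda msg: latencies[broker_name].append(time.perf_counter() - msg["sent"])


for b in (broker1, broker2):
    for _ in range(CLIENTS_PER_BROKER):
        b.subscribe("Topic 1B", make_client(b.name))

broker1.publish("Topic 1B", {"sent": time.perf_counter(), "payload": "position record"})
```

Keeping latencies per broker mirrors the slide’s approach of taking timings on each broker separately.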

31 Supporting Geographical Information Systems Slides courtesy of Zao Liu

32 Integrating Map Servers
• Geographical Information Systems combine online dynamic maps and databases.
    • Many GIS software packages exist.
• GIS servers around the state of Indiana:
    • ESRI ArcIMS and ArcMap Server (Marion, Vanderburgh, Hancock, Kosciusko, Huntington, Tippecanoe)
    • Autodesk MapGuide (Hamilton, Hendricks, Monroe, Wayne)
    • WTH Mapserver™ Web Mapping Application (Fulton, Cass, Daviess, City of Huntingburg), based on several Open Source projects (Minnesota Map Server)
• Challenge: make 17 different county map servers from different companies work together.
    • There are 92 counties in Indiana, so potentially 92 different map servers.

33 Considerations
• We assume heterogeneity in GIS map and feature servers.
    • GIS services are organized bottom-up rather than top-down.
    • Local city governments, 92 different county governments, multiple Indiana state agencies, inter-state (Ohio, Kentucky) considerations, federal government data providers (Hazus).
• We must find a way to federate existing services.
    • We must reconcile ESRI, Autodesk, OGC, Google Maps, and other technical approaches.
    • We must try to take advantage of Google, ESRI, etc., rather than compete.
• We must have good performance and interactivity.
    • Servers must respond quickly; launching queries to 20 different map servers is very inefficient.
    • Clients should have the simplicity and interactivity of Google Maps and similar AJAX-style applications.

34 Caching and Tiling Maps
• Federation through caching:
    • WMS and WFS resources are queried and results are stored on the cache servers.
    • WMS images are stored as tiles.
        • These can be assembled into new images on demand (cf. Google Maps).
        • Projections and styling can be reconciled.
        • We can store multiple layers this way.
    • We build adapters that can work with ESRI and OGC products and tailor them to specific counties.
• Serving images as tiles:
    • Client programs obtain images directly from our tile server.
        • That is, they don’t go back to the original WMS for every request.
    • Similar approaches can be used to mediate WFS requests.
    • This works with Google Maps-based clients.
    • The tile server can re-cache and tile on demand if tile sections are missing.
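The slides do not say which tiling scheme the tile server used. Since it serves Google Maps-based clients, the sketch below assumes the standard Web Mercator (“slippy map”) tile indexing that Google Maps popularized, plus a cache that contacts the original map server only on a miss. The fetch function, layer name, and bounding box are hypothetical, and reprojection and image stitching are omitted.

```python
import math
from typing import Callable, Dict, List, Tuple

TileKey = Tuple[int, int, int]  # (zoom, x, y)


def lonlat_to_tile(lon: float, lat: float, zoom: int) -> Tuple[int, int]:
    """Tile column/row containing a point in the Web Mercator 'slippy map' scheme."""
    n = 2 ** zoom
    x = int((lon + 180.0) / 360.0 * n)
    y = int((1.0 - math.asinh(math.tan(math.radians(lat))) / math.pi) / 2.0 * n)
    return min(max(x, 0), n - 1), min(max(y, 0), n - 1)  # clamp to the valid range


def tiles_for_bbox(min_lon: float, min_lat: float, max_lon: float, max_lat: float,
                   zoom: int) -> List[TileKey]:
    """All tiles covering a bounding box (rows grow southward in this scheme)."""
    x0, y0 = lonlat_to_tile(min_lon, max_lat, zoom)  # north-west corner
    x1, y1 = lonlat_to_tile(max_lon, min_lat, zoom)  # south-east corner
    return [(zoom, x, y) for x in range(x0, x1 + 1) for y in range(y0, y1 + 1)]


class TileCache:
    """Serve tiles from a local store; call the original map server only on a miss."""

    def __init__(self, fetch: Callable[[str, int, int, int], bytes]) -> None:
        self._fetch = fetch
        self._store: Dict[Tuple[str, int, int, int], bytes] = {}

    def get(self, layer: str, zoom: int, x: int, y: int) -> bytes:
        key = (layer, zoom, x, y)
        if key not in self._store:
            self._store[key] = self._fetch(layer, zoom, x, y)
        return self._store[key]


fetch_count = 0


def fake_fetch(layer: str, zoom: int, x: int, y: int) -> bytes:
    """Hypothetical stand-in for an adapter call to a county map server."""
    global fetch_count
    fetch_count += 1
    return f"{layer}/{zoom}/{x}/{y}".encode()


cache = TileCache(fake_fetch)
indy = tiles_for_bbox(-86.33, 39.63, -85.94, 39.93, zoom=10)  # around Indianapolis
for _ in range(2):  # the second pass is served entirely from the cache
    for key in indy:
        cache.get("parcels", *key)
```

Keying the cache by (layer, zoom, x, y) is what lets multiple layers and zoom levels from different county servers live side by side in one store.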

35 [Figure: tile-server architecture. Components: Browser + Google Map API; Tile Server; Cache Server; Adapter; Google Maps Server; Cass County Map Server (OGC Web Map Server); Hamilton County Map Server (AutoDesk); Marion County Map Server (ESRI ArcIMS).]
• The browser client fetches image tiles for the bounding box using the Google Map API.
• The Tile Server requests map tiles at all zoom levels with all layers. These are converted to a uniform projection, indexed, and stored. Overlapping images are combined.
• We must provide adapters for each map server type.
• The cache server fulfills Google Map calls with cached tiles that fill the requested bounding box.
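The architecture needs one adapter per map server type behind a common interface. A minimal sketch of that design, assuming a hypothetical `MapServerAdapter` interface; only the OGC case is filled in, using the standard WMS 1.1.1 GetMap parameters, while the endpoint URL, layer name, and bounding box are placeholders.

```python
from abc import ABC, abstractmethod
from typing import Sequence, Tuple
from urllib.parse import urlencode

BBox = Tuple[float, float, float, float]  # (min_lon, min_lat, max_lon, max_lat)


class MapServerAdapter(ABC):
    """Common interface the cache server would call for every county server type."""

    @abstractmethod
    def map_request_url(self, layers: Sequence[str], bbox: BBox,
                        width: int = 256, height: int = 256) -> str:
        """Return a URL that fetches one map image covering bbox."""


class OgcWmsAdapter(MapServerAdapter):
    """Adapter for an OGC WMS 1.1.1 server (GetMap in key-value-pair form)."""

    def __init__(self, endpoint: str, srs: str = "EPSG:4326",
                 image_format: str = "image/png") -> None:
        self.endpoint = endpoint
        self.srs = srs
        self.image_format = image_format

    def map_request_url(self, layers: Sequence[str], bbox: BBox,
                        width: int = 256, height: int = 256) -> str:
        params = {
            "SERVICE": "WMS",
            "VERSION": "1.1.1",
            "REQUEST": "GetMap",
            "LAYERS": ",".join(layers),
            "STYLES": "",  # empty = default style for each layer
            "SRS": self.srs,
            "BBOX": ",".join(f"{v:g}" for v in bbox),  # lon/lat order in WMS 1.1.1
            "WIDTH": width,
            "HEIGHT": height,
            "FORMAT": self.image_format,
        }
        return f"{self.endpoint}?{urlencode(params)}"


# ESRI ArcIMS and Autodesk MapGuide adapters would implement the same interface
# using each vendor's own request protocol (not shown here).
cass = OgcWmsAdapter("http://example.org/cass/wms")  # hypothetical endpoint
url = cass.map_request_url(["parcels"], (-86.5, 40.6, -86.1, 40.9))
```

The cache server only ever talks to `MapServerAdapter`, so supporting a new county’s software means writing one new adapter class rather than changing the tiling code.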

36 Map Server Example Marion and Hancock county parcel plots and IDs are overlaid on IU aerial photographic images that are accessed by this mashup using Google Map APIs. We cache and tile all the images from several different map servers. (Marion and Hancock actually use different commercial software.)

37 Final Thoughts

38 It’s the Data, Stupid
• Grids have been distracted by complicated security issues.
    • Accounts, allocations, authentication, etc., on supercomputers.
    • This assumes a lot of people actually want to do this.
• But arguably most people really want access to data and results, not computers.
    • Ex: PubChem has properties on 12 million drug-like molecules online, which can be browsed for free.
    • The Grid security model is equivalent to actually giving you a key to the lab.
• My suggestion: leave the Grid to the experts and try to think of as many online data services as can be created using results from TeraGrid resources.
• Challenge: use all of the TeraGrid, NASA, Open Science Grid, China National Grid, etc., to opportunistically perform these calculations.
    • Why not? The infrastructure is there.

39 Multiple Grid Job Execution

40 Web 2.0?
• QuakeSim and many similar science gateways have the generally correct approach...
    • Web Services, online components.
• ...but arguably the details need to be changed.
    • We have been following the Enterprise model (IBM, HP, MS, Sun).
        • JSR 168, WSRP, WSDL, SOAP, WS-*
    • Maybe it is time to switch to the Internet model.
        • Google Desktop, Netvibes start page
        • Programmable Web, mash-ups, AJAX, REST, etc.

41 More Information
• mpierce@cs.indiana.edu
• www.crisisgrid.org
• www.quakesim.org (being updated)

42 The End
http://www.tryscience.org/grid/master/master.html

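43 [Figure: Web map client architecture. Components: WFS + Seismic Rec. (WSDL); WFS + State Bounds (WSDL); WMS + OnEarth or Google Maps (“REST”); Aggregating WMS with stubs; Web Map Client with stubs. Connections are labeled WSDL, SOAP, and HTTP.]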

44 Tying It All Together: HPSearch
• HPSearch is an engine for orchestrating distributed Web Service interactions.
    • It uses an event system and supports both file transfers and data streams.
    • “HPSearch” is a legacy name.
• HPSearch flows can be scripted with JavaScript.
    • The HPSearch engine binds the flow to a particular set of remote services and executes the script.
• HPSearch engines are Web Services; they can be distributed and interoperate for load balancing.
    • Boss/Worker model
• ProxyWebService: a wrapper class that adds notification and streaming support to a Web Service.
• More info: http://www.hpsearch.org
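The ProxyWebService idea of wrapping an existing service to add notification can be sketched as a wrapper that emits lifecycle events around each call. This is a hypothetical `NotifyingProxy` class, not HPSearch’s actual ProxyWebService; it covers only notification (no streaming), and the wrapped service is Python’s built-in `sorted` as a trivial stand-in.

```python
from typing import Any, Callable, List, Tuple

Listener = Callable[[str, str], None]  # (service name, event)


class NotifyingProxy:
    """Wrap a service callable and emit 'started'/'completed'/'failed' events."""

    def __init__(self, name: str, service: Callable[..., Any]) -> None:
        self.name = name
        self._service = service
        self._listeners: List[Listener] = []

    def add_listener(self, listener: Listener) -> None:
        self._listeners.append(listener)

    def _notify(self, event: str) -> None:
        for listener in self._listeners:
            listener(self.name, event)

    def __call__(self, *args: Any, **kwargs: Any) -> Any:
        self._notify("started")
        try:
            result = self._service(*args, **kwargs)
        except Exception:
            self._notify("failed")
            raise  # the caller still sees the original error
        self._notify("completed")
        return result


events: List[Tuple[str, str]] = []
proxy = NotifyingProxy("sort-service", sorted)  # trivial stand-in service
proxy.add_listener(lambda name, event: events.append((name, event)))
result = proxy([3, 1, 2])
```

Because the wrapped service is unchanged, an orchestration engine can track progress through the events without the service knowing anything about the workflow.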


46 SensorGrid Architecture
• Major components:
    • Real-Time filters
    • Publish-Subscribe System
    • Information Service
• Filters can be run as Web Services to create workflows.
• Filter chains can be deployed for complex processing.
• Streaming messaging provides high-performance transfer options.

