1
Metadata Management on the SCEC PetaSHA Project: Helping Users Describe, Discover, Understand, and Use Simulation Data in a Large-scale Scientific Collaboration
David Okaya (Univ. Southern California), Ewa Deelman (ISI), Phil Maechling (SCEC), Mona Wong-Barnum (SDSC), Tom Jordan (USC/SCEC), David Meyers (SCEC)
AGU, December 14, 2007
Southern California Earthquake Center
2
Outline
Southern California Earthquake Center (SCEC) and the cyberinfrastructure PetaSHA earthquake hazards project.
PetaSHA computer-based estimations of ground shaking: "Platforms".
Role of metadata in automated & manual workflow within Platforms.
Lessons learned - what works and where we need assistance from computer scientists.
3
Southern California: A Natural Laboratory for Earthquake Hazard & Risk Analysis
Complex network of over 300 active faults: high hazard
Large urban population: high risk
Southern California Earthquake Center (SCEC) coordinates a major program of earthquake research: system-level studies of hazard and risk
4
Southern California Earthquake Center
Involves 500+ scientists at 55 institutions worldwide
Focuses on earthquake system science using Southern California as a natural laboratory
Translates basic research into practical products for earthquake risk reduction
SCEC Collaboratory
–Grid-enabled Community Modeling Environment (CME) developed under NSF’s ITR Program
–Partnership with IT organizations in physics-based seismic hazard analysis
5
SCEC - Computer Science - Information Technology Collaboration
Cyberinfrastructure layering of the SCEC Collaboratory
–Vertical integration of hardware, software, and wetware into a cyberinfrastructure for earthquake scientists and consumers of research products.
–Across-the-Internet: High performance computing (Terascale, Petascale), Grid services, storage and digital libraries, visualization, portals, validated and optimized scientific codes, scientific workflow technologies.
6
SCEC Focus Groups
[Figure labels: Sedimentary Basins; 1857 rupture; San Andreas fault]
7
SCEC/CME Computational Platforms
Vertically integrated computational configurations (hardware + software + wetware) for physics-based seismic hazard analysis
Platform Attributes:
–System-level scale range
–High-performance hardware
–IT/geoscience collaboration
–Validated software framework
–Workflow management tools
–Well-defined interface
Numerical simulations of ground shaking by an earthquake (platforms in order of increased complexity):
EarthWorks gateway: Simulate one earthquake through earth volume.
Broadband: Simulate one earthquake through earth volume including high frequency near-surface shaking (e.g., under buildings).
CyberShake: Simulate hundreds to thousands of variations of earthquakes and statistically calculate earthquake probabilities.
[Diagram labels: standard computing; multiple computing; capacity & data-intensive computing]
8
Platform Metadata: Workflow Provenance and Scientific Content
Traditional:
–domain scientists do on own.
–text, embedded in file headers.
Defined:
–history.
–produced by codes, appended to flat file.
Optimized:
–history and more:
–produced by codes.
–upstream metadata used by downstream codes to determine run; eliminates hardcodes.
Metadata tier by platform: EarthWorks gateway: Optimized; Broadband: Defined; CyberShake: Traditional.
9
Earthworks Gateway
Linear workflow with choices; Optimized metadata.
scientific workflow management
[Figure labels: Dolan et al. (2003); 3D velocities of seismic waves]
10
Earthworks Gateway
Linear workflow with choices; Optimized metadata.
scientific workflow management
Two types of metadata
–workflow metadata: resources, provenance.
–scientific content metadata: describes products and can be used by scientific codes.
11
Earthworks Gateway
Linear workflow with choices; Optimized metadata.
scientific content metadata: describes products and can be used by scientific codes.
simulation_codeauthor=Rob_Graves
simulation_codename=emod3d
#set_region...
region_origin_definition=lat_long
region_latlong_ellipsoid=WGS-84
region_UTM_zone=11
region_origin_latitude=34.00000
region_origin_longitude=-118.00000
region_lengtheast_m=30000.0
region_lengthnorth_m=30000.0
region_depth_shallow=0.0
region_depth_deep=17000.0
region_velocitymodel=SCEC_CVM3.0
#set_simulation_seismic_times.
simulation_tmax=5.000
simulation_dt=0.0050
simulation_timesamples=1001
#define_earthquake_location.
eq_latitude=34.05300
eq_longitude=-117.90000
eq_depth_km=2.0000
eq_depth_m=2000.0
eq_Mw=5.00
source_type=PT_DCOUPLE
source_wavetype=triangle
#set_mesh_info...
mesh_dx=100.00
mesh_dy=100.00
mesh_dz=100.00
mesh_nx=301
mesh_ny=301
mesh_nz=171
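Because this metadata is plain key=value text, a downstream code can read it rather than carry hardcoded values. The following is a minimal Python sketch, not the project's actual tooling. The names `parse_metadata` and `expected_counts` are hypothetical, and the relations used for the cross-check (samples = tmax/dt + 1, nodes = length/spacing + 1) are assumptions inferred from the values in the listing.

```python
# Hypothetical helper names; the key names come from the slide's metadata listing.
SAMPLE = """\
#set_region...
region_lengtheast_m=30000.0
region_lengthnorth_m=30000.0
region_depth_shallow=0.0
region_depth_deep=17000.0
#set_simulation_seismic_times.
simulation_tmax=5.000
simulation_dt=0.0050
simulation_timesamples=1001
#set_mesh_info...
mesh_dx=100.00
mesh_dy=100.00
mesh_dz=100.00
mesh_nx=301
mesh_ny=301
mesh_nz=171
"""


def parse_metadata(text):
    """Parse flat 'key=value' lines into a dict; '#' lines are treated as comments."""
    meta = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # split on the first '=' only
        if not sep:
            raise ValueError(f"not a key=value line: {raw!r}")
        meta[key.strip()] = value.strip()
    return meta


def expected_counts(meta):
    """Derive sample and node counts from extents and spacings (assumed relations)."""
    def num(key):
        return float(meta[key])

    depth = num("region_depth_deep") - num("region_depth_shallow")
    return {
        "simulation_timesamples": round(num("simulation_tmax") / num("simulation_dt")) + 1,
        "mesh_nx": round(num("region_lengtheast_m") / num("mesh_dx")) + 1,
        "mesh_ny": round(num("region_lengthnorth_m") / num("mesh_dy")) + 1,
        "mesh_nz": round(depth / num("mesh_dz")) + 1,
    }


meta = parse_metadata(SAMPLE)
derived = expected_counts(meta)
# Collect any key whose stored value disagrees with the derived value.
mismatches = {k: (meta[k], v) for k, v in derived.items() if int(meta[k]) != v}
print(mismatches)
```

A check like this lets a downstream code reject an inconsistent metadata file before starting a long run.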
12
Broadband Platform
Parallel workflow with choices; Defined metadata.
User choice of earthquake and earth volume
–earthquake description file
–library: earth volumes, location info.
create EQ. choice of code: 1. Stanford 2. UCSB 3. URSCorp
Low frequency simulation (< 1 Hz): choice of codes.
High frequency simulation (> 1 Hz): choice of codes.
Combine low & high frequency: filter, time shift, sum, etc.
Broadband seismograms (0-10 Hz)
[Diagram label: workflows]
13
Broadband Platform
Parallel workflow with choices; Defined metadata.
Mixed type of metadata
–workflow history & limited scientific content metadata: what codes were run, input files.
14
Broadband Platform
Parallel workflow with choices; Defined metadata.
#Starting a Metadata File for the Broadband Platform
workflow_name=wf_1_urs_urs_urs
workflow_name=urs_genslip
urs_genslip.00003_velmod=indata/6904311/nga_rock1.v1d
urs_genslip.00002_srcfile=indata/6904311/rg_hd4-eq.src
urs_genslip.00001_version=2.3
urs_genslip.00004_genslip=$BIN/genslip-v2.3 read_erf=0 outfile=tmpdata/6904311/tmp_slip stype=urs mag=7.00 nx=128 ny=128 dx=0.281 dy=0.219 dtop=0.000 strike=0 dip=45 rake=90 elon=-118.0000 elat=34.0000 ns=1 nh=1 shypo=10.800 dhypo=22.400 stretch_kcorner=1 dt=0.0250 velfile=indata/6904311/nga_rock1.v1d seed=9
urs_jbrun.00015_wcc_resamp_ardbt=$BIN/wcc_resamp_arbdt newdt=0.025000 infile=tmpdata/6904311/s178.ver outfile=tmpdata/6904311/s178.ver inbin=0 outbin=0
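Keys such as `urs_genslip.00003_velmod` appear to combine a code name, a sequence number, and a field name. That reading is an assumption drawn from the listing, not a documented format. Under that assumption, a short Python sketch can rebuild the per-code history in sequence order; `group_by_code` and `KEY_PATTERN` are hypothetical names.

```python
import re
from collections import defaultdict

# Assumed key layout: <code>.<sequence>_<field>. This is inferred from the
# slide's listing, not from any documented Broadband Platform format.
KEY_PATTERN = re.compile(r"^(?P<code>\w+)\.(?P<seq>\d+)_(?P<field>\w+)$")

LINES = [
    "#Starting a Metadata File for the Broadband Platform",
    "workflow_name=wf_1_urs_urs_urs",
    "urs_genslip.00003_velmod=indata/6904311/nga_rock1.v1d",
    "urs_genslip.00002_srcfile=indata/6904311/rg_hd4-eq.src",
    "urs_genslip.00001_version=2.3",
]


def group_by_code(lines):
    """Return {code: [(seq, field, value), ...]} sorted by sequence number."""
    grouped = defaultdict(list)
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition("=")  # values may themselves contain '='
        match = KEY_PATTERN.match(key)
        if not sep or match is None:
            continue  # un-prefixed entries such as workflow_name are skipped here
        grouped[match["code"]].append((int(match["seq"]), match["field"], value))
    return {code: sorted(entries) for code, entries in grouped.items()}


history = group_by_code(LINES)
for seq, field, value in history["urs_genslip"]:
    print(seq, field, value)
```

Splitting on the first `=` only keeps command-line entries intact, since their values carry further `name=value` arguments.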
15
CyberShake Platform
Ewa Deelman, ISI
Thousands of runs; Traditional metadata (coherent set not formalized).
ERF (Earthquake Rupture Forecast): general earthquake description.
Rupture Generator: variations of the earthquake.
SGT (Strain Green's Tensor) Generator: numerical simulation of earth response.
GM (Ground Motion) Simulation: makes ground shaking of each EQ variation.
Hazard Curve Calculator: probability of exceeding a specific level of ground shaking.
Simulates ground motions for potential fault ruptures within 200 km of each site.
~12,700 sources in SoCal from USGS 2002 ERF.
Extends ERF to multiple hypocenters and slip models for each source.
~100,000 ground motion simulations for each site.
16
CyberShake Platform
Thousands of runs; Traditional metadata (coherent set not formalized).
Domain-scientist structured metadata:
–metadata packed into name of files (thousands of them): earthquake.rupture.variation#
–some metadata stored in datafile header blocks.
–other metadata describing end products stored in a database.
–primary knowledge resides with operator who runs codes; not easily searchable.
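Metadata packed into file names can be made searchable by parsing the names into fields. The slide gives only the pattern `earthquake.rupture.variation#`, so the sketch below assumes three dot-separated integer fields. The file names, `RunId`, `parse_run_name`, and `build_index` are all made up for illustration.

```python
from collections import defaultdict
from typing import NamedTuple


class RunId(NamedTuple):
    earthquake: int
    rupture: int
    variation: int


def parse_run_name(name):
    """Split a name of the assumed form '<earthquake>.<rupture>.<variation>'."""
    parts = name.split(".")
    if len(parts) != 3:
        raise ValueError(f"expected three dot-separated fields: {name!r}")
    return RunId(*(int(p) for p in parts))


def build_index(names):
    """Map each earthquake ID to its sorted (rupture, variation) pairs."""
    index = defaultdict(list)
    for name in names:
        run = parse_run_name(name)
        index[run.earthquake].append((run.rupture, run.variation))
    return {eq: sorted(pairs) for eq, pairs in index.items()}


# Made-up file names, for illustration only.
names = ["12.3.1", "12.3.0", "12.4.0", "87.0.5"]
index = build_index(names)
print(index[12])
```

An index like this moves knowledge out of the operator's head and into a structure that other users can query.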
17
Lessons Learned
Two tiers of Metadata: (workflow) provenance & scientific content.
–The tiers rarely cross over, except when choices exist and scientific content indicates the choice.
Upstream-downstream codes can communicate via metadata.
–"one code's output is another code's input". True for data & metadata.
–allows construction of dynamic workflows. Choice of different codes when available. NO HARDCODED PARAMETER VALUES.
Domain scientists need metadata tools and strategies of metadata structure and naming conventions.
–tools must be usable from codes written in scientific languages such as C, Fortran.
–need to be used early on, before alternative approaches spring up on their own.
How: Metadata and hence codes tethered to database or free-form via flat text or XML?
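The point about upstream and downstream codes communicating via metadata can be sketched in Python as follows. This is an illustration under stated assumptions, not the platforms' implementation. The stage functions, the `history` and `resample_ratio` keys, and the `"ucsb"` value are hypothetical. Only the `stype`, `dt`, and `newdt` keys and the `urs` value appear in the Broadband listing.

```python
# Placeholder stages: function names and the 'history' key are hypothetical.

def genslip_urs(meta):
    """Stand-in for one rupture generator; records itself in the metadata."""
    return {**meta, "history": meta.get("history", "") + "genslip_urs;"}


def genslip_ucsb(meta):
    """Stand-in for an alternative rupture generator."""
    return {**meta, "history": meta.get("history", "") + "genslip_ucsb;"}


# Scientific content metadata ('stype') selects which code runs.
GENERATORS = {"urs": genslip_urs, "ucsb": genslip_ucsb}


def resample(meta):
    """Downstream stage: reads dt and newdt from metadata instead of hardcoding."""
    ratio = float(meta["newdt"]) / float(meta["dt"])
    out = {**meta, "dt": meta["newdt"], "resample_ratio": f"{ratio:g}"}
    out["history"] = meta.get("history", "") + "resample;"
    return out


def run_workflow(meta):
    """Chain stages so each one's output metadata is the next one's input."""
    stype = meta.get("stype")
    if stype not in GENERATORS:
        raise ValueError(f"no generator registered for stype={stype!r}")
    return resample(GENERATORS[stype](meta))


result = run_workflow({"stype": "urs", "dt": "0.0250", "newdt": "0.025000"})
print(result["history"])
```

Swapping in a different generator then means changing one metadata value, with no edits to the downstream stage.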