Published by Emory Goodwin. Modified over 8 years ago.
1
Servicing Seismic and Oil Reservoir Simulation Data through Grid Data Services
Sivaramakrishnan Narayanan, Tahsin Kurc, Umit Catalyurek and Joel Saltz
Multiscale Computing Lab
Biomedical Informatics Department
The Ohio State University
http://www.bmi.osu.edu
http://www.multiscalecomputing.org
2
VLDB-DMG'05
Joel Saltz, Gagan Agrawal, Umit Catalyurek, Shannon Hastings, Vijay S Kumar, Tahsin Kurc, Steve Langella, Scott Oster, Tony Pan, Benjamin Rutt, Narayanan Sivaramakrishnan, Li Weng, Michael Zhang
Multiscale Computing Lab
http://www.multiscalecomputing.org
3
Implementing effective oil and gas production
Analysis: production rates, bypass oil, net present value
Workflow: run new reservoir simulations
Data: seismic, well pressures, reservoir simulations
Diagram labels linking analysis, workflow and data:
–Generate requests for new simulations, new seismic studies
–Obtain initial, boundary conditions, input parameters for simulations
–Store and index simulation results
–Summary data from datasets
–Spatio-temporal queries
Simulate multiple realizations of multiple geostatistical models and production strategies
Evaluate geologic uncertainty and production strategies simultaneously
Enable on-demand exploration and comparison of multiple scenarios
–Integration of a robust, Grid-based computational and data handling infrastructure
–Distributed databases of reservoir and geophysical data
–Storage and computing resources at multiple institutions
4
Characteristics and Issues
Spatio-temporal datasets
–Simulations carried out/data captured on 3D meshes over many time steps
–Multiple data attributes per data point (gas pressure, oil saturation, seismic traces, etc.)
Very large datasets
–Tens of gigabytes to 100+ TB of data
Lots of simulation runs
–Up to thousands of runs for a study are possible
Data can be stored in a distributed collection of files
Distributed datasets
–Data may be captured at multiple locations by multiple groups
–Simulations are carried out at multiple sites
Common operations: subsetting, filtering, interpolations, projections, comparisons, frequency counts
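Three of the common operations listed above (subsetting, filtering, frequency counts) can be illustrated on a handful of in-memory records. This is only a minimal sketch: the record layout `(time_step, x, y, z, oil_saturation)`, the sample values, and the function names are assumptions made for illustration, not taken from the slides.

```python
from collections import Counter

# Hypothetical records: (time_step, x, y, z, oil_saturation)
records = [
    (0, 0, 0, 0, 0.82),
    (0, 1, 0, 0, 0.35),
    (1, 0, 0, 0, 0.79),
    (1, 1, 0, 0, 0.31),
    (1, 1, 1, 0, 0.64),
]


def subset(rows, t_range, x_range):
    """Spatio-temporal subsetting: keep rows inside inclusive time and x ranges."""
    (t0, t1), (x0, x1) = t_range, x_range
    return [r for r in rows if t0 <= r[0] <= t1 and x0 <= r[1] <= x1]


def filter_by_saturation(rows, threshold):
    """Filtering: keep rows whose oil saturation exceeds the threshold."""
    return [r for r in rows if r[4] > threshold]


def frequency_counts(rows, bin_width=0.25):
    """Frequency counts: histogram of oil saturation with fixed-width bins."""
    return Counter(int(r[4] / bin_width) for r in rows)
```

On real datasets of the sizes mentioned above, these loops would run next to the data rather than on a client, which is the motivation for the middleware described in the following slides.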
5
Data Management, Access and Integration
Tracking of metadata associated with data
–Metadata defining simulation parameters, mesh description, files associated with simulations, etc.
–Metadata defining seismic measurements (location, year, files storing data, etc.)
Support for data subsetting and filtering on file-based, distributed datasets
Support for on-demand data product generation
–Track metadata associated with data analysis workflows
Grid data services and distributed querying
–Make data and data products available through Grid service interfaces
6
Data Virtualization
Application developers generally prefer storing data in files
Support high-level queries on multi-dimensional distributed datasets
Many possible data abstractions, query interfaces
–Grid virtualized object-relational database or XML database
–Grid virtualized objects with user-defined methods invoked to access and process data
Our Approach
Support a basic SQL Select query with a virtual relational table view or a virtual XML database view
A lightweight layer on top of datasets
–Runtime middleware carries out query execution, query planning
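The idea of a virtual relational table over file-based data can be sketched in a few lines. The example below is an illustration under stated assumptions, not STORM's implementation: the binary record layout (`time_step`, `cell_id`, `pressure`), the file names, and the helper functions are all hypothetical.

```python
import os
import struct
import tempfile

# Assumed fixed-size record layout: int time_step, int cell_id, float pressure
RECORD = struct.Struct("<iif")
COLUMNS = ("time_step", "cell_id", "pressure")


def write_file(path, rows):
    """Write rows to a flat binary file, as a simulation code might."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(RECORD.pack(*row))


def virtual_table(paths):
    """Yield every record in the files as a dict, one row at a time."""
    for path in paths:
        with open(path, "rb") as f:
            while True:
                chunk = f.read(RECORD.size)
                if len(chunk) < RECORD.size:
                    break
                yield dict(zip(COLUMNS, RECORD.unpack(chunk)))


def select(paths, columns, where):
    """Rough equivalent of SELECT columns FROM files WHERE predicate."""
    return [tuple(row[c] for c in columns)
            for row in virtual_table(paths) if where(row)]


def demo():
    with tempfile.TemporaryDirectory() as d:
        a = os.path.join(d, "run1.bin")
        b = os.path.join(d, "run2.bin")
        write_file(a, [(0, 1, 10.0), (1, 1, 12.5)])
        write_file(b, [(0, 2, 9.0), (1, 2, 14.0)])
        return select([a, b], ("cell_id", "pressure"),
                      lambda r: r["pressure"] > 11.0)
```

The files stay in their native format; only the thin reader layer knows the record layout, which is the sense in which the table is virtual.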
7
Middleware Support
Data Virtualization: STORM
–Large data querying capabilities, layered on DataCutter
–Distributed data virtualization
–Indexing, Subsetting, Data Cluster/Decluster, Parallel Data Transfer
Data Analysis/Processing Workflows: DataCutter
–Component Framework for Combined Task/Data Parallelism
–Filtering/Program coupling Service: Distributed C++ component framework
–On-demand data product generation
Distributed Metadata and Data Management: Mobius
–Create, manage, version data definitions
–Management of metadata and data instances
–Data integration
Grid Data Services (OGSA-DAI)
–Defines services and interfaces that can be used by clients to specify operations on data resources and data
8
Data Management, Access, Integration
Grid-level data services via OGSA-DAI
Management of data definitions and metadata, XML virtualization via Mobius
Object-relational virtualization and subsetting of file-based datasets via STORM
On-demand data product generation via DataCutter
STORM, Mobius, DataCutter support data operations on heterogeneous collections of storage and compute clusters
[Layer diagram: Schema Management (Mobius); Data Product Generation (DataCutter); SQL Virtualization of Files (STORM); XML Virtualization, Metadata Management (Mobius); OGSA-DAI; Grid Protocols]
9
Data Management, Access, and Integration
[Diagram: three sites holding Seismic Data, Simulation Data, and Seismic/Simulation Data. Each site runs a Grid-data Service (OGSA-DAI) over Data Product Generation (DataCutter), SQL Virtualization of Files (STORM), and XML Virtualization, Metadata Management (Mobius). Schema Management (Mobius) also appears in the diagram. The sites are connected through Grid Service Protocols.]
10
Data Querying and Processing
[Diagram: Seismic Data feeds Geostatistics Model 1, Model 2, …, Model n, each with m realizations. Production Strategies: Well Pattern 1, Well Pattern 2, …, Well Pattern p. Both feed Reservoir Simulations.]
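The diagram suggests why a study can reach thousands of runs: n geostatistical models, m realizations each, and p well patterns. A minimal sketch of enumerating the runs follows, assuming every realization is paired with every well pattern (the slide does not state this explicitly); the function and field names are hypothetical.

```python
from itertools import product


def enumerate_runs(n_models, m_realizations, p_well_patterns):
    """List one simulation run per (model, realization, well pattern) triple."""
    return [
        {"model": g, "realization": r, "well_pattern": w}
        for g, r, w in product(
            range(1, n_models + 1),
            range(1, m_realizations + 1),
            range(1, p_well_patterns + 1),
        )
    ]
```

Under that assumption the number of runs is n × m × p, so for example 10 models, 20 realizations and 5 well patterns already give 1,000 simulations.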
11
STORM
Support efficient selection of the data of interest from distributed scientific datasets and transfer of data from storage clusters to compute clusters
Data Subsetting Model
–Virtual Tables
–Select Queries
–Distributed Arrays
Query form (the placeholders after SELECT, inside WHERE, and inside ComputeAttribute are missing from this transcript and are shown as …):
SELECT …
FROM Dataset-1, Dataset-2, …, Dataset-n
WHERE … AND …
GROUP-BY-PROCESSOR ComputeAttribute(…)
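The shape of such a query (select from several datasets, apply a predicate, then group the selected records by destination processor) can be sketched as follows. This is an illustration only: it assumes that `ComputeAttribute` maps a record to an integer that determines its destination processor, and the record fields, predicate, and function names are hypothetical rather than STORM's API.

```python
def run_query(datasets, where, compute_attribute, num_processors):
    """Select records matching `where` and partition them across processors."""
    partitions = [[] for _ in range(num_processors)]
    for dataset in datasets:
        for record in dataset:
            if where(record):
                # Assumed semantics: the computed attribute picks the processor.
                target = compute_attribute(record) % num_processors
                partitions[target].append(record)
    return partitions


# Two hypothetical datasets of mesh points with a pressure attribute.
dataset_1 = [{"x": 0, "p": 5.0}, {"x": 1, "p": 20.0}, {"x": 2, "p": 30.0}]
dataset_2 = [{"x": 3, "p": 40.0}, {"x": 4, "p": 1.0}]

partitions = run_query(
    [dataset_1, dataset_2],
    where=lambda r: r["p"] > 10.0,
    compute_attribute=lambda r: r["x"],
    num_processors=2,
)
```

Here the three selected points are split by the parity of `x`; a real partitioning function would more likely follow the distribution of the array on the compute cluster.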
12
STORM Services
Query
Meta-data
Indexing
Data Source
Filtering
Partition Generation
Data Mover
13
Grid Data Resource
The Grid has emerged as an integrated infrastructure for distributed computation
The OGSA-DAI initiative aims to deliver high-level data management functionality for the Grid
–Defines services and interfaces that can be used by clients to specify operations on data resources and data
OGSA-DAI services can be configured to expose a specific database management system
To be a Grid Data Service (GDS), a service must accept perform documents and return results
–How a service interprets perform documents is left open
–Perform documents traditionally wrap SQL queries
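The contract described above (accept a document that wraps a query, return results) can be mimicked with a toy service. The sketch below is not OGSA-DAI and does not use its schema: the `<perform>`, `<sqlQuery>`, `<result>` and `<row>` element names are invented for illustration, and SQLite stands in for the underlying data resource.

```python
import sqlite3
import xml.etree.ElementTree as ET


def perform(connection, document):
    """Run the SQL wrapped in a (hypothetical) perform document, return XML."""
    root = ET.fromstring(document)
    query = root.findtext("sqlQuery")
    if query is None:
        raise ValueError("document has no <sqlQuery> element")
    cursor = connection.execute(query)
    names = [column[0] for column in cursor.description]
    result = ET.Element("result")
    for row in cursor:
        row_element = ET.SubElement(result, "row")
        for name, value in zip(names, row):
            ET.SubElement(row_element, name).text = str(value)
    return ET.tostring(result, encoding="unicode")


def demo():
    connection = sqlite3.connect(":memory:")
    connection.execute("CREATE TABLE wells (id INTEGER, pressure REAL)")
    connection.executemany("INSERT INTO wells VALUES (?, ?)",
                           [(1, 90.0), (2, 150.0)])
    document = ("<perform><sqlQuery>"
                "SELECT id, pressure FROM wells WHERE pressure > 100"
                "</sqlQuery></perform>")
    return perform(connection, document)
```

Because the interpretation of the document is left to the service, the same interface could hand the query to a file-based engine instead of a relational database.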
14
STORM Data Resource
[Diagram components: Extractor, Filter, Data Mover, Storm Daemon, JDBC Driver, GDS, STORM instance, Data Resource]
15
Experimental Setup
All nodes running Linux
Gigabit switch

Cluster  Nodes  Processors                Memory  Storage
mob      8      Dual 1.4 GHz AMD Opteron  8 GB    1.5 TB local disk
Xio      16     2 Xeon 2.4 GHz            4 GB    7.3 TB FAStT600 disk array

Dataset        Attributes  Record Size  Records (millions)  Dataset (GB)  Cluster, Num nodes
Oil Reservoir  21          84 bytes     3,840               315           Mob, 03
Seismic        16          4240 bytes   247                 1,056         Xio, 16
TXm            6           24 bytes     X                   24 * X / 1M   Mob, 01
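As a rough cross-check of the dataset figures, record size times record count should approximate the dataset size. The sketch below reads the flattened table as 84-byte records × 3,840 million for the oil reservoir dataset and 4240-byte records × 247 million for the seismic dataset, and assumes decimal gigabytes; the function name is hypothetical.

```python
def dataset_size_gb(record_size_bytes, records_millions):
    """Approximate dataset size in decimal gigabytes (1 GB = 1e9 bytes)."""
    return record_size_bytes * records_millions * 1_000_000 / 1e9


oil_reservoir_gb = dataset_size_gb(84, 3840)   # slide lists 315 GB
seismic_gb = dataset_size_gb(4240, 247)        # slide lists 1,056 GB
```

The products come out near 323 GB and 1,047 GB, within a few percent of the listed sizes; the slide does not say how its figures were rounded or which gigabyte convention it uses.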
16
STORM Results
Seismic datasets: 10-25 GB per file, about 30-35 TB of data
17
Comparison with MySQL - 1
Varying table size
Per-tuple cost is lower
18
Comparison with MySQL - 2
Varying query size
Also compare them as data resources
19
Oil Reservoir Data Results
Improvements due to:
–Treating records as arrays of bytes
–Combining results at the client
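One way to read these two improvements is that servers ship selected records as raw byte buffers and the client concatenates the buffers before decoding them once. The following is a sketch of that reading only, not the measured implementation; the record layout (`cell_id`, `oil_saturation`) and function names are assumptions.

```python
import struct

# Assumed fixed-size record layout: int cell_id, float oil_saturation
RECORD = struct.Struct("<if")


def server_result(rows):
    """Stand-in for a server: return its matching records as one byte buffer."""
    return b"".join(RECORD.pack(*row) for row in rows)


def combine_at_client(buffers):
    """Concatenate raw buffers from all servers, then decode in a single pass."""
    combined = b"".join(buffers)
    return list(RECORD.iter_unpack(combined))
```

For example, `combine_at_client([server_result([(1, 0.5)]), server_result([(2, 0.25), (3, 0.75)])])` yields the three records in server order without any per-field parsing on the servers.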
20
Seismic Data Results
96 x 11 GB files on 16 nodes
21
Conclusions
Overview of work related to large-scale scientific data management at the Multiscale Computing Lab
Exposed STORM as a Grid Data Service
–Results on use case: oil reservoir management
For more information, or to download STORM, DataCutter, and Mobius: http://www.multiscalecomputing.org or http://www.bmi.osu.edu