A DISTRIBUTED WORKFLOW DATABASE DESIGNED FOR COREWALL APPLICATIONS
Bill Kamp, Limnological Research Center
New Geoscience Applications, 5/19/05


1 A DISTRIBUTED WORKFLOW DATABASE DESIGNED FOR COREWALL APPLICATIONS
Bill Kamp, Limnological Research Center, Univ of Minnesota

2 The Corewall

3 Overview
• The data required for a core interpretation session can be very large.
• An individual IODP core's data can be in the 10 to 100 gigabyte range.
• To compound this problem, many users will be interpreting at locations with slow internet connections.
• Finally, users may be interpreting data from databases that are often designed as read-only archives, not to hold the 'works in progress' of investigators.
• Our goal is to provide a very smart clipboard.

4 The Data Requirements Demand a Database
• Workflow oriented
• Large throughput
• Internet aware
• Accept all data types
• Locally and remotely connect to Geowall
• Integrate with legacy tools
• And most importantly: transparent
  – Little or no CWD work by the researcher
• Automatic, automatic, automatic

5 Legacy Tools
• Core Log Integration Platform from Lamont-Doherty Earth Observatory (LDEO)
  – Splicer: Provides interactive depth-shifting of multiple holes of core data to build composite sections
  – Sagan: Allows the composite sections output by Splicer to be mapped to their true stratigraphic depths, unifying core and log records

6 Sample Plot

7 Interfaces
• We will provide interfaces that enable the CWD (Computer Workflow Database) to retrieve user-selected data from established databases such as JANUS, LacCore Vault, dbSEABED, and PaleoStrat.
• We hope to also pull data through emerging portals such as CHRONOS.
• The result is fast cached access to multiple data sources.

8 Features
• The CWD captures the results of analyses and interpretations.
• As the workflow is captured, it can be accessed by other collaborators locally or remotely.
• In a high-bandwidth environment, such as a core lab or a university office, a group of collaborators could track one another's work as they work on the same cores.
• In a low-bandwidth environment, we will cache the data locally upon first access.
• In a zero-bandwidth environment, the CWD can be copied to a portable mass storage device: all pointers are relative to the location of the CWD.
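
The zero-bandwidth case works because pointers are stored relative to the CWD location. A minimal sketch of that idea follows; the function name `resolve_pointer` and the `bincache/` directory name are hypothetical and are not taken from the CWD itself.

```python
from pathlib import PurePosixPath


def resolve_pointer(cwd_root: str, relative_pointer: str) -> PurePosixPath:
    """Resolve a stored pointer against wherever the CWD currently lives.

    Hypothetical helper: the real CWD pointer layout is not described
    in the slides, only that pointers are relative to the CWD location.
    """
    pointer = PurePosixPath(relative_pointer)
    if pointer.is_absolute():
        # An absolute pointer would break when the CWD is copied elsewhere.
        raise ValueError("pointers must be relative to the CWD root")
    return PurePosixPath(cwd_root) / pointer


# The same stored pointer works on the server and on a portable drive.
stored = "bincache/1.195.logo.bmp"
print(resolve_pointer("/srv/cwd", stored))
print(resolve_pointer("/media/usb/cwd", stored))
```

Because only the root changes, copying the whole CWD directory to a portable device needs no rewrite of the stored pointers.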

9 Coordinate Systems
• Co-registration across coordinate systems, e.g. wire length, geologic boundary, and/or geologic age.
• We use the standard algorithms from Sagan and Splicer for this purpose.
• We intend to take advantage of existing technologies such as the Storage Resource Broker and Meta-data Catalog [SRBMDC] to facilitate the locating of replicated data sets.
• We will use SESAR identifiers to uniquely and automatically identify the sample, the author, and the experiment when the data is loaded.
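
To illustrate what co-registration between two coordinate systems involves, here is a minimal sketch that maps a depth to an age by piecewise-linear interpolation between tie points. This is a generic stand-in, not the Sagan or Splicer algorithm, and the function name, tie-point format, and sample values are assumptions made for illustration.

```python
from bisect import bisect_right
from typing import List, Tuple


def depth_to_age(depth: float, tie_points: List[Tuple[float, float]]) -> float:
    """Map a depth to an age using (depth, age) tie points.

    Assumes tie points are sorted by strictly increasing depth.
    Generic linear interpolation, NOT the Sagan/Splicer method.
    """
    if len(tie_points) < 2:
        raise ValueError("need at least two tie points")
    depths = [d for d, _ in tie_points]
    if not depths[0] <= depth <= depths[-1]:
        raise ValueError("depth lies outside the tie-point range")
    # Index of the first tie point strictly deeper than `depth`.
    i = bisect_right(depths, depth)
    if i == len(depths):
        i -= 1  # depth equals the deepest tie point
    (d0, a0), (d1, a1) = tie_points[i - 1], tie_points[i]
    return a0 + (a1 - a0) * (depth - d0) / (d1 - d0)


# Illustrative tie points: (depth in metres, age in thousands of years).
ties = [(0.0, 0.0), (10.0, 100.0), (20.0, 400.0)]
print(depth_to_age(5.0, ties))
print(depth_to_age(15.0, ties))
```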

10 Database Design
• The paradigm for the metadata is:
  – Author
  – Experiment
  – Raw Data
  – Presentation
• Data type is missing: we support all MIME data types
  – XML and text stored in the database
  – All other data stored in the Bin Cache
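
The metadata paradigm and the storage rule above can be sketched as a small record type plus a routing function. The field names, the `MetadataRecord` class, and the exact MIME checks are assumptions; the slides only state that XML and text go in the database and everything else goes in the Bin Cache.

```python
from dataclasses import dataclass


@dataclass
class MetadataRecord:
    """Hypothetical record following Author / Experiment / Raw Data / Presentation."""
    author: str
    experiment: str
    raw_data: str       # name of the raw data item
    presentation: str   # e.g. the generated HTML page for the item
    mime_type: str


def storage_target(mime_type: str) -> str:
    """Return 'database' for XML and text, 'bin_cache' for everything else."""
    # Drop parameters such as "; charset=utf-8" before comparing.
    main = mime_type.split(";")[0].strip().lower()
    if main.startswith("text/") or main == "application/xml":
        return "database"
    return "bin_cache"


record = MetadataRecord(
    author="example author",
    experiment="example experiment",
    raw_data="readme.txt",
    presentation="readme.txt.html",
    mime_type="text/plain",
)
print(storage_target(record.mime_type))   # text is kept in the database
print(storage_target("image/bmp"))        # images go to the Bin Cache
```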

11 The Data Diagram

12 Caches
• Uploading requires a caching system
  – Upload Cache, accessed
      • Directly
      • FTP
      • HTTP upload
  – Archive Cache: All data is stored in raw form in an archive that is permanent
  – Staging: A temporary holding place for data while it is examined and transformed
  – Bin Cache: The location of the binary data managed by the database
• The complete uploading process, including automatic recognition of the data type, is available as a single script, called ForceUpload.
  – It is the best way when you have multiple data sets of the same data type.
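
The flow from the Upload Cache through the Archive Cache and Staging can be sketched as follows. This is not the ForceUpload script, whose internals the slides do not show; the function name `ingest`, the directory arguments, and the use of file extensions for type recognition are all assumptions.

```python
import mimetypes
import shutil
from pathlib import Path


def ingest(upload: Path, archive_dir: Path, staging_dir: Path) -> dict:
    """Copy an uploaded file to the archive and staging areas and guess its type.

    Hypothetical sketch of the caching steps, not the real ForceUpload.
    """
    archive_dir.mkdir(parents=True, exist_ok=True)
    staging_dir.mkdir(parents=True, exist_ok=True)

    # Archive Cache: keep an untouched raw copy permanently.
    archived = Path(shutil.copy2(upload, archive_dir / upload.name))
    # Staging: a working copy that can be examined and transformed.
    staged = Path(shutil.copy2(upload, staging_dir / upload.name))

    # Type recognition here is by file extension only (an assumption).
    mime_type, _encoding = mimetypes.guess_type(upload.name)
    return {
        "archived": archived,
        "staged": staged,
        "mime_type": mime_type or "application/octet-stream",
    }
```

Calling `ingest` in a loop over a directory would cover the case of many data sets of the same type; moving binary data from Staging into the Bin Cache is left out of this sketch.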

13 Data Access
• All raw data is available via URLs.
• The author has the option of refining the automatically generated presentation, i.e. the HTML page that shows the data.
• Presentations can be dynamically built using database data. Tools are provided.
• If data is not local, it is transferred to the local bin cache, and the CWD is updated.
• If you are not on the internet, you need to bring with you the database (small) and the bin cache.
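
The "fetch if not local, then update the CWD" behaviour can be sketched as below. The `pointers` dictionary stands in for the CWD's pointer table, and `open_data` and its `fetch` parameter are hypothetical names; in practice `fetch` could be something like `lambda url: urllib.request.urlopen(url).read()`.

```python
from pathlib import Path
from typing import Callable, Dict


def open_data(
    name: str,
    url: str,
    bin_cache: Path,
    pointers: Dict[str, str],
    fetch: Callable[[str], bytes],
) -> bytes:
    """Return the data for `name`, fetching it into the bin cache on first access."""
    local = bin_cache / name
    if not local.exists():
        # First access: transfer the data to the local bin cache.
        bin_cache.mkdir(parents=True, exist_ok=True)
        local.write_bytes(fetch(url))
    # Record a pointer relative to the bin cache (stand-in for updating the CWD).
    pointers[name] = name
    return local.read_bytes()
```

Passing the download function in as a parameter keeps the sketch usable offline: once the file is in the bin cache, `fetch` is never called again.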

14 Sample Presentations
• 9.134.readme.txt.html
• 9.137.cwilocs.zip.html
• 1.195.logo.bmp.html
• 1.148.kamp_1218c_021x_07.jpg.html
• 1.7.MOLE-JUAN03-1A.Geotek.and.L-a-b.data.xls.html
• 7.122.GLAD4-HVT03-4B-9H-1.BMP.html
• 7.123.GLAD4-HVT03-4C-1H-1.BMP.html
• 7.93.GLAD4-HVT03-4B-1H-1.BMP.html

15 Replication
• The database is replicated to multiple sites on the internet automatically via TCP/IP. This is a MySQL feature.
• The URL of the data is sent to the replicated database.
• If, upon first access, the data is not local, it is fetched to the bin cache via a URL, and the pointers in the local CWD are updated.
• Currently we have a parent-child relationship: all data is first uploaded to the main CWD.
• When we complete the integration of SESAR identifiers, the design will support peer-to-peer relationships.

16 Database Access
• Data uploaded via a web site
• Data pulled out of the CWD via Corewall
• Data will automatically cross-load to other DBs such as CHRONOS when there is a meta-data match
• The latter will be enforced via XSLTs

17 Current State
• Test versions are on the web:
  – Currently at http://www.iagp.net/LRC/LrcVault
  – Soon to be at http://burnout.geo.umn.edu
  – Documented at http://mm/html/iagp/LRC/LrcVault/
• Currently holds 10 GByte of test data

