5/19/05 New Geoscience Applications 1 A DISTRIBUTED WORKFLOW DATABASE DESIGNED FOR COREWALL APPLICATIONS Bill Kamp, Limnological Research Center, Univ of Minnesota

Presentation transcript:

Slide 1: A DISTRIBUTED WORKFLOW DATABASE DESIGNED FOR COREWALL APPLICATIONS
Bill Kamp, Limnological Research Center, Univ of Minnesota
New Geoscience Applications, 5/19/05

Slide 2: The Corewall

Slide 3: Overview
• The data required for a core interpretation session can be very large.
• An individual IODP core's data can be in the 10 to 100 gigabyte range.
• To compound this problem, many users will be interpreting at locations with slow internet connections.
• Finally, users may be interpreting data from databases that are often designed as read-only archives and not designed to hold investigators' works in progress.
• Our goal is to provide a very smart clipboard.

Slide 4: The Data Requirements Demand a Database
• Workflow oriented
• Large throughput
• Internet aware
• Accepts all data types
• Connects to Geowall locally and remotely
• Integrates with legacy tools
• And most importantly, transparent
  – Little or no CWD work by the researcher
• Automatic, automatic, automatic

Slide 5: Legacy Tools
• Core Log Integration Platform from Lamont-Doherty Earth Observatory (LDEO)
  – Splicer: Provides interactive depth-shifting of multiple holes of core data to build composite sections
  – Sagan: Allows the composite sections output by Splicer to be mapped to their true stratigraphic depths, unifying core and log records

Slide 6: Sample Plot

Slide 7: Interfaces
• We will provide interfaces that enable the CWD (Computer Workflow Database) to retrieve user-selected data from established databases such as JANUS, LacCore Vault, dbSEABED, and PaleoStrat.
• We also hope to pull data through emerging portals such as CHRONOS.
• The result is fast cached access to multiple data sources.

Slide 8: Features
• The CWD captures the results of analyses and interpretations.
• As the workflow is captured, it can be accessed by other collaborators locally or remotely.
• In a high-bandwidth environment, such as a core lab or a university office, a group of collaborators could track one another's work as they work on the same cores.
• In a low-bandwidth environment, we will cache the data locally upon first access.
• In a zero-bandwidth environment, the CWD can be copied to a portable mass storage device: all pointers are relative to the location of the CWD.
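The relative-pointer idea in the last bullet can be sketched as follows. This is a minimal illustration and not the CWD's actual code; the function names and the directory layout (`bin_cache/...` under a CWD root) are assumptions made for the example.

```python
from pathlib import PurePosixPath


def to_relative(cwd_root: PurePosixPath, data_file: PurePosixPath) -> str:
    """Return a pointer to data_file expressed relative to the CWD root.

    Hypothetical helper: the stored pointer never contains the absolute
    location of the CWD, so the whole tree can be moved or copied.
    """
    return data_file.relative_to(cwd_root).as_posix()


def resolve_pointer(cwd_root: PurePosixPath, pointer: str) -> PurePosixPath:
    """Rebuild a usable path from a stored pointer and the current CWD root."""
    return cwd_root / pointer


# The same stored pointer works after the CWD is copied to a portable disk.
pointer = to_relative(PurePosixPath("/srv/cwd"),
                      PurePosixPath("/srv/cwd/bin_cache/core.jpg"))
on_usb = resolve_pointer(PurePosixPath("/media/usb/cwd"), pointer)
```

Because only the root changes between machines, nothing stored in the database needs rewriting when the CWD is copied.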

Slide 9: Coordinate Systems
• Co-registration across coordinate systems, e.g. wire length, geologic boundary, and/or geologic age.
• We use the standard algorithms from SAGAN and SPLICER for this purpose.
• We intend to take advantage of existing technologies such as the Storage Resource Broker and Meta-data Catalog [SRBMDC] to facilitate locating replicated datasets.
• We will use SESAR identifiers to uniquely and automatically identify the sample, the author, and the experiment when the data is loaded.
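To make co-registration concrete, here is a minimal sketch that maps a value from one coordinate system to another by linear interpolation between tie points. This is a generic technique chosen for illustration; it is not claimed to be the algorithm used by SAGAN or SPLICER, and the function name and tie-point values are hypothetical.

```python
from bisect import bisect_right


def map_coordinate(x, tie_points):
    """Map x from a source coordinate to a target coordinate.

    tie_points is a list of (source, target) pairs sorted by source value,
    for example (depth in metres, age in thousands of years). Values between
    tie points are linearly interpolated; values outside the range raise.
    """
    sources = [source for source, _ in tie_points]
    if not sources[0] <= x <= sources[-1]:
        raise ValueError("value lies outside the tie-point range")
    i = bisect_right(sources, x) - 1
    if i == len(tie_points) - 1:          # x equals the last tie point
        return tie_points[-1][1]
    (x0, y0), (x1, y1) = tie_points[i], tie_points[i + 1]
    return y0 + (y1 - y0) * (x - x0) / (x1 - x0)


# Hypothetical tie points: depth (m) -> age (ka)
ties = [(0.0, 0.0), (10.0, 5.0), (20.0, 25.0)]
age = map_coordinate(15.0, ties)
```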

Slide 10: Database Design
• The paradigm for the metadata is:
  – Author
  – Experiment
  – Raw Data
  – Presentation
• Data type is absent from the paradigm: we support all MIME data types
  – XML and text are stored in the database
  – All other data is stored in the Bin Cache
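The metadata paradigm and the storage rule can be sketched as a small data structure. The class name, field names, and the exact test for "XML and text" are assumptions for illustration; the slides do not give the actual schema.

```python
from dataclasses import dataclass


@dataclass
class CwdRecord:
    """One item of work, following the Author/Experiment/Raw Data/Presentation
    paradigm. Field names are hypothetical."""
    author: str
    experiment: str
    raw_data_pointer: str   # where the raw data lives
    presentation: str       # e.g. the HTML page that shows the data
    mime_type: str          # any MIME type is accepted


def storage_target(mime_type: str) -> str:
    """Decide where the payload goes: XML and text in the database,
    everything else in the bin cache (assumed reading of the slide)."""
    mt = mime_type.lower()
    is_xml = mt in ("application/xml", "text/xml") or mt.endswith("+xml")
    if mt.startswith("text/") or is_xml:
        return "database"
    return "bin_cache"


record = CwdRecord("kamp", "1218c", "bin_cache/core.jpg",
                   "core.jpg.html", "image/jpeg")
target = storage_target(record.mime_type)
```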

Slide 11: The Data Diagram

Slide 12: Caches
• Uploading requires a caching system
  – Upload Cache, accessed
    • Directly
    • FTP
    • HTTP upload
  – Archive Cache: All data is stored in raw form in an archive that is permanent
  – Staging: A temporary holding place for data while it is examined and transformed
  – Bin Cache: The location of the binary data managed by the database
• The complete uploading process, including automatic recognition of the data type, is available as a single script called ForceUpload.
  – It is the best approach when you have multiple data sets of the same data type.
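The flow through the caches can be sketched roughly as below. This is not the ForceUpload script, whose internals are not shown in the slides; the function name `ingest`, the directory names, and the use of Python's `mimetypes` module for type recognition are all assumptions for illustration.

```python
import mimetypes
import shutil
from pathlib import Path


def ingest(upload_path: Path, root: Path) -> dict:
    """Move one file from the upload cache through archive, staging, and
    (for non-text data) the bin cache. Hypothetical sketch only."""
    name = upload_path.name
    # Automatic recognition of the data type, here guessed from the file name.
    mime, _ = mimetypes.guess_type(name)
    mime = mime or "application/octet-stream"

    archive, staging, bin_cache = (root / d for d in ("archive", "staging", "bin_cache"))
    for directory in (archive, staging, bin_cache):
        directory.mkdir(parents=True, exist_ok=True)

    shutil.copy2(upload_path, archive / name)      # permanent raw copy
    staged = staging / name
    shutil.move(str(upload_path), str(staged))     # temporary holding place
    # Examination and transformation of the staged file would happen here.

    if mime.startswith("text/") or mime.endswith("xml"):
        text = staged.read_text(encoding="utf-8")  # text goes into the database
        staged.unlink()
        return {"name": name, "mime": mime, "stored_in": "database", "text": text}

    shutil.move(str(staged), str(bin_cache / name))
    return {"name": name, "mime": mime, "stored_in": "bin_cache",
            "pointer": f"bin_cache/{name}"}
```

Calling `ingest` in a loop over a directory of files of the same type mirrors the batch case the slide mentions.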

Slide 13: Data Access
• All raw data is available via URLs.
• The author has the option of refining the automatically generated presentation, i.e. the HTML page that shows the data.
• Presentations can be dynamically built using database data. Tools are provided.
• If data is not local, it is transferred to the local bin cache, and the CWD is updated.
• If you are not on the internet, you need to bring the database (small) and the bin cache with you.
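Fetching on first access can be sketched as follows. The record layout (a dict with `url` and `local_path` keys) and the function name are assumptions for illustration; the download step uses the standard-library `urllib.request.urlretrieve` and can be swapped out, for example in tests.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def ensure_local(record: dict, bin_cache: Path,
                 fetch=urllib.request.urlretrieve) -> Path:
    """Return a local path for the record's data, downloading it into the
    bin cache only if it is not already there. Hypothetical sketch."""
    local = record.get("local_path")
    if local and (bin_cache / local).exists():
        return bin_cache / local              # already cached, no transfer

    name = Path(urlparse(record["url"]).path).name
    bin_cache.mkdir(parents=True, exist_ok=True)
    target = bin_cache / name
    fetch(record["url"], str(target))         # transfer to the local bin cache
    record["local_path"] = name               # update the pointer in the CWD
    return target
```

The stored pointer is relative to the bin cache, so a record updated this way stays valid if the database and bin cache are later carried offline together.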

Slide 14: Sample Presentations
• readme.txt.html
• cwilocs.zip.html
• logo.bmp.html
• kamp_1218c_021x_07.jpg.html
• 1.7.MOLE-JUAN03-1A.Geotek.and.L-a-b.data.xls.html
• GLAD4-HVT03-4B-9H-1.BMP.html
• GLAD4-HVT03-4C-1H-1.BMP.html
• 7.93.GLAD4-HVT03-4B-1H-1.BMP.html

Slide 15: Replication
• The database is replicated to multiple sites on the internet automatically via TCP/IP. This is a MySQL feature.
• The URL of the data is sent to the replicated database.
• On first access, if the data is not local, it is fetched to the bin cache via a URL, and the pointers in the local CWD are updated.
• Currently we have a parent-child relationship: all data is first uploaded to the main CWD.
• When we complete the integration of SESAR identifiers, the design will support peer-to-peer relationships.

Slide 16: Database Access
• Data is uploaded via a web site
• Data is pulled out of the CWD via Corewall
• Data will automatically cross-load to other databases such as CHRONOS when there is a metadata match
• The latter will be enforced via XSLTs

Slide 17: Current State
• Test versions are on the web:
• Currently at
• Soon to be at
• Documented at
• Currently holds 10 GB of test data