Federated Databases for the Geosciences CSIG July 21, 2005 Douglas S. Greer.

Presentation transcript:

Federated Databases for the Geosciences CSIG July 21, 2005 Douglas S. Greer

Overview
Database Federation Primer
  – Basic concepts and principles
  – DB2 Information Integrator
The CHRONOS Federated Database
  – Integration of 7 independently developed geoscience databases

Top-Level View of a Federated Database
[Diagram: Applications, Federated Database, Data Source A, Data Source B, Data Source C, Data Source D]

Federated DB Data Sources
Geographically Distributed Data Sources
Heterogeneous Data Sources
  – Relational Databases (most common)
  – Non-relational Sources
  – Web Pages / Web Services
  – Flat Files

Federated Databases
May or may not actually contain data
A federated database can create Global Views that define data in a uniform way across the data sources
Applications can then access data through the global view using the standardized SQL schema

IBM DB2 Information Integrator
Provides a framework for strategic information integration to help applications access, manipulate and integrate diverse and distributed data sources across multiple servers in real time
Can access structured and unstructured data, including relational databases such as Oracle, MySQL, PostgreSQL and MS SQL Server

Connecting to the Remote Database
Step 1: Create WRAPPER
  – Mechanism that the federated server uses to communicate with a data source
  – Identifies “Driver” code
Step 2: Identify SERVER
  – Identifies the connection to a data source
  – Specifies which WRAPPER to use
  – Directly or indirectly specifies the server name, server type, version, database name and special parameters

Connecting to the Remote Database (continued)
Step 3: Specify USER MAPPING
  – Maps between a federated database user and an authorized user (account and password) of a data source
Step 4: Define NICKNAMES
  – Pointer to a table or view in a data source
  – Creates a binding between a local name and the data source name and hides the associated metadata details
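The four steps map onto four DDL statements issued on the federated server. The following is a minimal sketch that only assembles the statement text. It assumes an Oracle source reached through a wrapper named NET8, and every server, user, schema and table name in it is hypothetical. The options each wrapper accepts differ by source type and product version, so the exact clauses should be checked against the DB2 documentation before use.

```python
def federation_ddl(wrapper, server, server_type, version, node,
                   local_user, remote_user, remote_password,
                   nickname, remote_schema, remote_table):
    """Return the four federation statements in the order the slides list them."""
    return [
        # Step 1: register the wrapper (the "driver" for one kind of source).
        f"CREATE WRAPPER {wrapper}",
        # Step 2: describe one remote server reached through that wrapper.
        f"CREATE SERVER {server} TYPE {server_type} VERSION '{version}' "
        f"WRAPPER {wrapper} OPTIONS (NODE '{node}')",
        # Step 3: map a federated user to an account on the remote source.
        f"CREATE USER MAPPING FOR {local_user} SERVER {server} "
        f"OPTIONS (REMOTE_AUTHID '{remote_user}', REMOTE_PASSWORD '{remote_password}')",
        # Step 4: bind a local name to one remote table or view.
        f'CREATE NICKNAME {nickname} FOR {server}."{remote_schema}"."{remote_table}"',
    ]


# All values below are hypothetical placeholders, not the CHRONOS settings.
statements = federation_ddl(
    wrapper="NET8", server="JANUS_SRV", server_type="ORACLE", version="9",
    node="janus_node", local_user="CHRONOS", remote_user="reader",
    remote_password="secret", nickname="JANUS.HOLE",
    remote_schema="JANUS", remote_table="HOLE",
)
for statement in statements:
    print(statement + ";")
```

Once a nickname exists, queries on the federated server can refer to it like a local table, which is what the global views later in the talk rely on.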

A Simple Federated View
CREATE VIEW (View Name) AS
  SELECT (Database #1 SQL Command)
  UNION
  SELECT (Database #2 SQL Command)
  UNION
  SELECT (Database #3 SQL Command)

Identifying Data Sources
CREATE VIEW (View Name) AS
  SELECT 'PALEOSTRAT' AS db_name, genus_id AS genus …
  FROM PSTRAT.tbl_taxonomy …
  UNION
  SELECT 'PALEOBIOLOGY' AS db_name, genus_name AS genus …
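The pattern of tagging each branch of the UNION with a literal source name can be tried locally. This is a minimal sketch that uses SQLite from the Python standard library in place of DB2, with two attached in-memory databases standing in for remote sources. The table pbdb.taxa, the view name taxa_global and the sample genera are assumptions made for illustration; only PSTRAT.tbl_taxonomy, genus_id and genus_name come from the slide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Two attached in-memory databases play the role of remote data sources.
conn.execute("ATTACH DATABASE ':memory:' AS pstrat")
conn.execute("ATTACH DATABASE ':memory:' AS pbdb")

# Each source uses its own column name for the genus.
conn.execute("CREATE TABLE pstrat.tbl_taxonomy (genus_id TEXT)")
conn.execute("CREATE TABLE pbdb.taxa (genus_name TEXT)")  # hypothetical table
conn.execute("INSERT INTO pstrat.tbl_taxonomy VALUES ('Globigerina')")
conn.execute("INSERT INTO pbdb.taxa VALUES ('Orbulina')")

# SQLite only allows TEMP views to span attached databases.
conn.execute("""
    CREATE TEMP VIEW taxa_global AS
    SELECT 'PALEOSTRAT' AS db_name, genus_id AS genus FROM pstrat.tbl_taxonomy
    UNION
    SELECT 'PALEOBIOLOGY' AS db_name, genus_name AS genus FROM pbdb.taxa
""")

rows = conn.execute(
    "SELECT db_name, genus FROM taxa_global ORDER BY db_name"
).fetchall()
print(rows)
```

Because db_name differs between branches, UNION never merges rows from different sources, so every row in the view can be traced back to the database it came from.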

Materialized Views
Federated databases normally do not store data locally. Data from remote sites is fetched as needed.
Materialized Views create a local copy of a Global-View.
  – Advantage: faster access
  – Disadvantages: data may be stale; refreshes required
Several of the CHRONOS Global-Views have versions that use materialized views to increase performance
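The trade-off between speed and staleness can be seen in a small local experiment. This is a minimal sketch using SQLite, which has no materialized views of its own, so an ordinary table rebuilt from the view plays that role. The names remote_taxa, taxa_view, taxa_materialized and refresh are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE remote_taxa (genus TEXT)")  # stand-in for a remote table
conn.execute("INSERT INTO remote_taxa VALUES ('Globigerina')")
conn.execute("CREATE VIEW taxa_view AS SELECT genus FROM remote_taxa")


def refresh(connection):
    """Rebuild the local copy from the live view."""
    connection.execute("DROP TABLE IF EXISTS taxa_materialized")
    connection.execute(
        "CREATE TABLE taxa_materialized AS SELECT genus FROM taxa_view"
    )


def row_count(connection, name):
    """Count rows in a table or view (name is trusted, not user input)."""
    return connection.execute(f"SELECT COUNT(*) FROM {name}").fetchone()[0]


refresh(conn)

# The source changes after the copy was taken, so the copy is now stale.
conn.execute("INSERT INTO remote_taxa VALUES ('Orbulina')")
stale = (row_count(conn, "taxa_view"), row_count(conn, "taxa_materialized"))

# A refresh brings the local copy back in line with the source.
refresh(conn)
fresh = (row_count(conn, "taxa_view"), row_count(conn, "taxa_materialized"))

print(stale, fresh)
```

Reads against taxa_materialized never touch the source, which is where the speed comes from, and also why a refresh schedule is needed.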

CHRONOS Project
Create a dynamic, interactive and time-calibrated framework for Earth history
Network of chronostratigraphy databases
Online stratigraphic record
Visualization and analytical tools
Develop a better understanding of fundamental Earth processes through time

CHRONOS Federated Databases
The following databases are all part of the CHRONOS Federated Database at SDSC, based on IBM’s DB2 Information Integrator
  – Neptune
  – PaleoStrat
  – PaleoBiology
  – Janus
  – TimeScale
  – FAUNMAP
  – MIOMAP

Neptune Database
Developed at ETH Zürich and currently hosted by Iowa State University
Contains microfossil occurrences reported in DSDP and ODP samples
PostgreSQL based
Contains four basic types of data: fossil records, taxonomy, age models and biogeography data
Schema contains approximately 20 tables with hundreds of thousands of taxonomic occurrences

PaleoStrat Database
Developed at Boise State University in collaboration with CHRONOS
Designed to support geoscience tools with broad applicability
Contains sedimentary, paleontologic and stratigraphic data
MS SQL Server based
Approximately 120 tables with thousands of taxonomic occurrences
Data from other databases currently being loaded

PaleoBiology Database
Hosted by the National Center for Ecological Analysis and Synthesis (NCEAS) at the University of California at Santa Barbara
Contains collection-based occurrence and taxonomic information about marine and terrestrial animals and plants
MySQL based
16 tables with hundreds of thousands of taxonomic occurrences

Janus Database
Database for the Integrated Ocean Drilling Program (IODP), hosted at Texas A&M University
Contains numerous types of ocean drilling data collected by United States, Japanese and European ships
Oracle based
Approximately 580 tables with millions of taxonomic occurrences

TimeScale Database
Contains data and information from the 2004 Global Time Scale of the International Commission on Stratigraphy and 19 other time scales
Supports web service conversion tools
PostgreSQL based
Approximately 25 tables with thousands of data records

FAUNMAP Database
Hosted by Illinois State Museum
Contains information about the historical distribution of mammal species in the United States
MySQL based
Approximately 30 tables with tens of thousands of data records

MIOMAP Database
Hosted by University of California, Berkeley
Contains comprehensive spatial and temporal analysis of Miocene mammal taxa for the Western United States
MySQL based
Thousands of records in a relatively small number of tables

The Taxa Global-View
Simple view to list taxa in all of the databases
CHRONOS Taxa
  – Database Name
  – Table_Name
  – Taxon_ID
  – Genus
  – Species

Taxa Global View Example

Conop9 Application
Developed by Peter M. Sadler, Dept. of Earth Sciences, Univ. of California Riverside
Correlates stratigraphic sections by minimizing the number of inconsistencies in the order of first and last occurrences of fossils between sections
Originally developed for flat files, then adapted to CHRONOS DB2/II global-views

CONOP9 Data Correlation

Conop9 Global View
Developed for the Conop9 Application
The Conop9 SDSC global-view provides a much larger collection of data than that available in the older flat file system
The CHRONOS global-view presents exactly the data needed by Conop9 but uses different SQL statements for each database: this involves joins across four tables in Neptune, seven tables in PaleoStrat and five tables in Janus

Conop9 Global-View Attributes
CHRONOS Conop Global View Fields
  – Database Name
  – Genus
  – Species
  – Taxon_id: used to create Conop9 input tables
  – Hole_id: the stratigraphic section the record comes from
  – LAD: Last Appearance Datum, the newest observation of this taxon for this hole
  – FAD: First Appearance Datum, the oldest observation of this taxon for this hole
  – LAD and FAD are the result of an SQL computation
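The slides do not show the SQL that computes LAD and FAD. One plausible shape for it is a grouped aggregate over occurrence records, sketched here with SQLite. The table occurrences, the column age_ma (age in millions of years) and the sample values are assumptions; the real views join several tables per database and may work from sample depth rather than age.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE occurrences (hole_id TEXT, taxon_id INTEGER, age_ma REAL)"
)
conn.executemany(
    "INSERT INTO occurrences VALUES (?, ?, ?)",
    [
        ("H1", 1, 5.2), ("H1", 1, 3.1), ("H1", 1, 4.0),
        ("H1", 2, 10.5),
        ("H2", 1, 6.0), ("H2", 1, 2.5),
    ],
)

# With ages in Ma, the oldest observation has the largest age (FAD)
# and the newest observation has the smallest age (LAD).
datums = conn.execute("""
    SELECT hole_id, taxon_id, MAX(age_ma) AS fad, MIN(age_ma) AS lad
    FROM occurrences
    GROUP BY hole_id, taxon_id
    ORDER BY hole_id, taxon_id
""").fetchall()

for row in datums:
    print(row)
```

A taxon seen only once in a hole, such as taxon 2 in H1, gets the same value for FAD and LAD.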

Conop9 Global View Example

Age-Depth Plot

Age/Depth Plot Global-Views
Uniform Global-View of hole location for ADP application
Surprisingly, there are significant differences between databases
CHRONOS Hole_Summary
  – Database Name
  – Hole_ID
  – Latitude
  – Longitude

Age/Depth Plot Views
Uniform Global-View for Hole/Taxa Description for ADP application
CHRONOS Hole_Desc
  – Database Name
  – Hole_ID
  – Elevation
  – Meters_of_Section
  – Taxa_Count
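Two of the Hole_Desc fields look like per-hole summaries. The sketch below shows how such fields could be derived with SQLite, under two loudly stated assumptions that the slides do not confirm: that Taxa_Count is the number of distinct taxa observed in a hole, and that Meters_of_Section is the span between the shallowest and deepest sample. The table samples and its columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE samples (hole_id TEXT, taxon_id INTEGER, depth_m REAL)"
)
conn.executemany(
    "INSERT INTO samples VALUES (?, ?, ?)",
    [
        ("H1", 1, 10.0), ("H1", 2, 25.0), ("H1", 1, 40.0),
        ("H2", 3, 5.0), ("H2", 3, 15.0),
    ],
)

# Assumed definitions: distinct taxa per hole, and sampled depth span per hole.
hole_desc = conn.execute("""
    SELECT hole_id,
           MAX(depth_m) - MIN(depth_m) AS meters_of_section,
           COUNT(DISTINCT taxon_id)    AS taxa_count
    FROM samples
    GROUP BY hole_id
    ORDER BY hole_id
""").fetchall()

print(hole_desc)
```

In a federated setting, each source database would need its own version of this query behind the shared column names.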

Age/Depth Global View Example

Questions ?