OGSA Data Architecture


OGSA Data Architecture
Dave Berry, NeSC
Andrew Grimshaw, U. Virginia

Process overview

[Diagram: flow of work from use cases to detailed definition]
- Stages: Use Cases; Capabilities; Architecture
- Activities: Query Services; Replication Services; Remote File Access
- Documents: OGSA Use Cases (v2.0, GGF10) (v??, GGF??); OGSA v1.0, GGF11 (GWD-I); OGSA v?? (GWD-R)
- Groups: Data Design Team (OGSA-WG), created GGF10; DAIS-WG?; OREP-WG?; GFS-WG?

Notes: The initial group shows the flow of work from use cases to detailed definition. The second through fourth groups show the documents describing the first three stages. Black font means published documents; grey font means still to be written. The fifth and sixth groups show the groups working on the various activities. The WGs have a question mark in case they don't sign up to this.

Use Cases
- OGSA Use Cases document
  - Service-Based Distributed Query Processing using OGSA and OGSA-DAI
- Informal presentations (to be written up)
  - Business Intelligence & Customer Data
  - Data Grid Provisioning Data to Cluster-Based Analytical Application
  - Physics Analysis
  - Medical Imaging
  - Data Distribution
  - Background Data Capture & Processing
  - Data Warehouse Processing
  - MIS Reporting, Analysis and Interpretation
  - OLTP – Sales Order Entry

Business Intelligence & Customer Data
- Company wants real-time integrated view of customer buying behavior
- Data resides in various distributed CRM & ERP systems
- Grid allows developers and apps to access and integrate customer data sources together in real time, across many distributed databases

[Diagram labels: Customer Order Information; Oracle; SAP; DB2; Siebel; SQL; Stored Procedure; XSLT; XML; Results XML; Data Grid; Customer Support; Web-based Dashboard (Identifies Likely Buyers of New Product); VP Marketing]

Notes: Here's an example of how a company can use Avaki Data Grid to integrate data from multiple systems. In this case, a large, distributed insurance company has customer, policy, and claims data in a number of individual systems that support different product lines. Marketing specialists and other business managers need integrated views of this information that will help them enhance the company's marketing strategy, such as all the different policies held by a single customer, all the different customers from a single geographic location, and so on.

This diagram shows two systems that have customer data. In this case, Avaki is used as a neutral integration layer that can take data out of both the Oracle and DB2 databases that are supporting two different applications, and also some data out of a file server, and combine all this data for use by a marketing dashboard and a business intelligence application. XSL transformations are used to provide the data in a particular XML format that is used by the application. In the data grid, an architect can specify the sequence of transformations and updates that ultimately create the views of the data required by the business analysts. In this case, no intermediate data mart or data warehouse is needed.

By deploying a data grid, the company has eliminated the need for users to access remote databases directly. Instead, users sign onto their local systems and are automatically able to access data via the data grid. This saves significant time for the DBAs who manage each database; they no longer have to manage remote user access, and need only specify who should have access to the data objects that represent their data. Data is not moved and does not leave their control.

The IT organization, which thought it might have to create a data warehouse, now has a low-overhead infrastructure for making data available to users and applications for analysis. With this infrastructure in place, data owners and developers can meet requests for additional data more quickly. Through the data flow definitions and cache configurations, architects can "dial" the freshness of the data to meet business requirements. Static reference data does not need to be updated at all, while customer data is updated frequently enough to show recent selling trends. As a result, the organization has fresher, more accurate data from which to make important business decisions.
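The integration described in these notes can be illustrated with a minimal sketch. It is not Avaki's API: two in-memory SQLite databases stand in for the Oracle and DB2 systems, the `policies` table and the customer names are made up, and the XML is built directly with the standard library rather than through XSLT.

```python
import sqlite3
import xml.etree.ElementTree as ET


def make_source(rows):
    """Create an in-memory database standing in for one backend system."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE policies (customer TEXT, policy TEXT)")
    conn.executemany("INSERT INTO policies VALUES (?, ?)", rows)
    return conn


def integrated_view(sources):
    """Merge policy rows from every source into one XML tree, grouped by customer."""
    by_customer = {}
    for name, conn in sources.items():
        query = "SELECT customer, policy FROM policies ORDER BY customer, policy"
        for customer, policy in conn.execute(query):
            by_customer.setdefault(customer, []).append((name, policy))
    root = ET.Element("customers")
    for customer in sorted(by_customer):
        node = ET.SubElement(root, "customer", name=customer)
        for source, policy in by_customer[customer]:
            ET.SubElement(node, "policy", source=source).text = policy
    return root


# Hypothetical data: the same customer appears in both backend systems.
sources = {
    "oracle": make_source([("Ann Lee", "auto"), ("Bo Chen", "home")]),
    "db2": make_source([("Ann Lee", "life")]),
}
view = integrated_view(sources)
print(ET.tostring(view, encoding="unicode"))
```

The source databases are only read, never copied, which mirrors the point in the notes that data stays under its owners' control.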

Data Grid Provisioning Data to Cluster-Based Analytical Application
- Company has centralized HPC cluster running compute-intensive applications
- Source data for analyses distributed among 3 global sites, one of them an external partner
- Highly manual data-sharing processes increase costs/errors, and hinder time-to-results
- Grid enables secure, automatic provisioning of remote data to HPC cluster, feeding CPUs more data faster

[Diagram labels: R&D (West Coast); Engineering (East Coast); QA/Testing Outsourcer (India); Headquarters (Illinois); Data Grid; Forward Proxy; Data Caches of Remote Data; Analytical Applications; Centralized Compute Cluster]
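The "Data Caches of Remote Data" element in this slide can be sketched as a read-through cache. This is an illustration only: the class name, the `max_age` parameter, and the file key are assumptions, and a real grid cache would also handle authentication, eviction, and partial reads.

```python
import time


class RemoteDataCache:
    """Read-through cache: serve local copies, refetch entries older than max_age seconds."""

    def __init__(self, fetch, max_age, clock=time.monotonic):
        self._fetch = fetch      # callable that retrieves data from the remote site
        self._max_age = max_age
        self._clock = clock      # injectable so the example does not need to sleep
        self._entries = {}       # key -> (fetched_at, value)

    def get(self, key):
        now = self._clock()
        entry = self._entries.get(key)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]      # fresh enough: no remote access
        value = self._fetch(key)
        self._entries[key] = (now, value)
        return value


fetch_log = []


def fetch_from_remote_site(key):
    """Stand-in for a transfer from a remote site; records each remote access."""
    fetch_log.append(key)
    return f"contents of {key}"


fake_time = [0.0]
cache = RemoteDataCache(fetch_from_remote_site, max_age=60, clock=lambda: fake_time[0])
cache.get("qa/run-17.dat")   # first access goes to the remote site
cache.get("qa/run-17.dat")   # second access is served from the cache
fake_time[0] = 120.0
cache.get("qa/run-17.dat")   # entry is now stale, so it is fetched again
```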

Physics Analysis

Analyzing HEP data involves:
- A (group of) researcher(s) with an algorithm
- A set of selection criteria on metadata to identify the data to be analyzed
- Metadata catalogs: identify a dataset based on a metadata query
- Data is stored in files. The user navigates in a logical namespace, like a local filesystem
- The algorithm may need to access files based on the calculation, so the dataset that the analysis runs on is not always fully determined by the metadata query
- Might need to access data that is initially remote (co-locating data and computation is not always possible as a preparatory step)
- Large number of data files to be managed (10^12)
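Selecting a dataset by metadata query, as described in this slide, might look like the following sketch. The catalog class, the metadata keys (`run`, `trigger`), and the logical file names are all hypothetical; a production catalog would be a remote service rather than an in-memory dictionary.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataCatalog:
    """Maps logical file names (LFNs) to application metadata."""

    entries: dict = field(default_factory=dict)

    def register(self, lfn, **metadata):
        self.entries[lfn] = metadata

    def query(self, **criteria):
        """Return the sorted LFNs whose metadata matches every criterion."""
        return sorted(
            lfn
            for lfn, meta in self.entries.items()
            if all(meta.get(key) == value for key, value in criteria.items())
        )


catalog = MetadataCatalog()
catalog.register("/grid/hep/run1/events-001", run=1, trigger="muon")
catalog.register("/grid/hep/run1/events-002", run=1, trigger="electron")
catalog.register("/grid/hep/run2/events-001", run=2, trigger="muon")

# The query yields only the initial dataset; as the slide notes, the
# algorithm may still open further files while it runs.
dataset = catalog.query(trigger="muon")
```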

Medical Imaging

Diagnosing based on sensitive patient data
- Users: a (group of) doctor(s)
- Retrieve an image, run algorithm, examine result and write diagnosis, maybe re-run another algorithm

Secure Data Retrieval
- Patient data is sensitive, needs to be kept anonymous at all times
- Site admins are not trustworthy: strip or encrypt patient data from image
- Image in database or secure data store ready for retrieval
- Replication of data not always allowed

High security needs
- Strong authorization
- Fine-grained access control mechanisms
- Leaking patient information results in prosecution
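The "strip or encrypt patient data" requirement can be sketched as follows. The field names and the key are hypothetical, and this shows only the stripping step plus a keyed pseudonym; it is not a complete anonymisation or access-control scheme.

```python
import hashlib
import hmac

# Hypothetical names of the identifying fields attached to an image.
IDENTIFYING_FIELDS = {"patient_name", "birth_date", "address"}


def pseudonymise(record, secret_key):
    """Return a copy of the image metadata without identifying fields.

    A keyed hash replaces the patient ID, so images of the same patient can
    still be linked by someone without the key, but not traced to a person.
    """
    cleaned = {
        key: value
        for key, value in record.items()
        if key not in IDENTIFYING_FIELDS and key != "patient_id"
    }
    digest = hmac.new(secret_key, record["patient_id"].encode(), hashlib.sha256)
    cleaned["pseudonym"] = digest.hexdigest()
    return cleaned


record = {
    "patient_id": "P-0042",
    "patient_name": "Jane Example",
    "birth_date": "1970-01-01",
    "modality": "MRI",
}
cleaned = pseudonymise(record, b"key-held-outside-the-storage-site")
```

Keeping the key away from the storage site reflects the slide's assumption that site administrators are not trusted.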

Trigger-based Data Distribution
- Users: a (group of) scientist(s)
- Have automatic delivery of data at many sites based on some criteria
- Trigger may be:
  - An event in the local Store, Catalog, Monitor, …
  - Cron-like events
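One way to picture trigger-based distribution is a registry of rules keyed by event source. The sketch below is an assumption-laden illustration: the class, the event dictionary layout, and the site names are invented, and it only computes which deliveries an event would trigger rather than moving any data.

```python
class TriggerRegistry:
    """Maps event sources to delivery rules."""

    def __init__(self):
        self._rules = {}  # source name -> list of (predicate, destination sites)

    def add_rule(self, source, predicate, sites):
        self._rules.setdefault(source, []).append((predicate, list(sites)))

    def on_event(self, source, event):
        """Return the (site, file) deliveries that this event triggers."""
        deliveries = []
        for predicate, sites in self._rules.get(source, []):
            if predicate(event):
                deliveries.extend((site, event["file"]) for site in sites)
        return deliveries


registry = TriggerRegistry()
# Hypothetical criterion: deliver every newly catalogued calibration file to two sites.
registry.add_rule(
    "catalog",
    lambda event: event["file"].endswith(".calib"),
    ["site-a", "site-b"],
)

triggered = registry.on_event("catalog", {"file": "detector-7.calib"})
ignored = registry.on_event("catalog", {"file": "detector-7.raw"})
```

A cron-like trigger would fit the same shape by emitting events from a timer instead of from the store or catalog.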

Background Data Capture & Processing

[Diagram labels: Raw & Existing Data; Processing; Data Sources; Reference Data; Data Streaming; Archive; Multi-stage Processing; Processed Data; Staging; Bulkload of raw data; Audit Trail; Temporary Storage & staging]

Data Warehouse Processing

[Diagram labels: Reference Data; Data Warehouse; Staged Data; Summarised Data; Local/remote replication; Operational systems; Local/remote extraction; Deliver & load; Insert; Extract; Validation; Merging; Transformation; Detailed Data; Archive; Aged Data]

MIS Reporting, Analysis and Interpretation

[Diagram labels: Specification & result review; Reports; Temporary Data; Data Manipulation; Extraction and integration; Summarised Data; Data in operational systems; Remote Data; Detailed Data; Drill down to detail; Data Warehouse]

OLTP – Sales Order Entry

[Diagram: order entry flow]
- Steps: Receive order from customer; Enter new/update customer information; Create/modify order; Enter/modify item lines; Submit order; Delivery notification
- Checks: Validate address; Validate product & price; Check inventory; Adjust inventory; Check customer credit rating/validate card
- Other labels: Order Entry; Distributed Transaction; Local data; External data

Characteristics:
- Sub-second responses
- 1000's of concurrent users and processes
- Field-level validations
- Small insert/update transactions
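The "check inventory, adjust inventory, submit order" steps form the kind of small insert/update transaction this slide refers to. The sketch below assumes a single local SQLite database with made-up `inventory` and `orders` tables, so it shows atomicity only; a transaction spanning local and external data would need a distributed commit protocol, which is not shown.

```python
import sqlite3


def submit_order(conn, item, quantity):
    """Check inventory, adjust it, and record the order as one transaction."""
    with conn:  # commits on success, rolls back if an exception escapes
        row = conn.execute(
            "SELECT stock FROM inventory WHERE item = ?", (item,)
        ).fetchone()
        if row is None:
            raise ValueError(f"unknown product: {item}")
        if row[0] < quantity:
            raise ValueError(f"insufficient stock for {item}")
        conn.execute(
            "UPDATE inventory SET stock = stock - ? WHERE item = ?", (quantity, item)
        )
        conn.execute(
            "INSERT INTO orders (item, quantity) VALUES (?, ?)", (item, quantity)
        )


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE inventory (item TEXT PRIMARY KEY, stock INTEGER)")
conn.execute("CREATE TABLE orders (item TEXT, quantity INTEGER)")
conn.execute("INSERT INTO inventory VALUES ('widget', 10)")
conn.commit()

submit_order(conn, "widget", 3)
```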

OGSA v1.0 Capabilities
- Types of data resource
- Data virtualisation
- Functional capabilities

Types of Data Resource
- Flat files
- Streams
- DBMS: Relational, XML, OO
- Catalogues
- Derivations

Data Virtualisation
- Abstraction
- Federation
- Transformation
- etc.

[Diagram labels: Client; Client API; Data Service Implementation; Data Service API; “Primitive” Data Sources; Other Data Services]
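The layering in this slide, where a data service is built over primitive sources and other data services, can be sketched with a common interface. The `DataService` protocol and the three classes are illustrative names, not OGSA-defined interfaces.

```python
from typing import Callable, Iterable, Iterator, Protocol


class DataService(Protocol):
    """The one interface a client sees, whatever lies underneath."""

    def read(self) -> Iterable[dict]: ...


class ListSource:
    """A 'primitive' data source: rows held in memory."""

    def __init__(self, rows: Iterable[dict]):
        self._rows = list(rows)

    def read(self) -> Iterator[dict]:
        return iter(self._rows)


class Transformed:
    """Transformation: applies a function to every row of another service."""

    def __init__(self, inner: DataService, fn: Callable[[dict], dict]):
        self._inner = inner
        self._fn = fn

    def read(self) -> Iterator[dict]:
        return (self._fn(row) for row in self._inner.read())


class Federated:
    """Federation: presents several services as a single one."""

    def __init__(self, *services: DataService):
        self._services = services

    def read(self) -> Iterator[dict]:
        for service in self._services:
            yield from service.read()


east = ListSource([{"name": "ann"}])
west = ListSource([{"name": "bo"}])
view = Transformed(Federated(east, west), lambda row: {"name": row["name"].upper()})
rows = list(view.read())
```

Because every layer exposes the same `read` method, a client cannot tell whether it is talking to a primitive source or to a composition of services.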

Functional Capabilities (1)
- Virtualisation, Transparency
  - Layered interfaces (for “under the hood” access)
  - Interface to “legacy” APIs
- Data Management
  - Transfer, caching, replication, …
- Queries
  - SQL, XQuery, Regexp, …
  - Synchronous / asynchronous
  - Deliver results to client or third-party
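The query options listed here (synchronous or asynchronous, results delivered to the client or to a third party) can be illustrated with a regular-expression query over lines of text. The function names and the sample lines are assumptions made for the sketch.

```python
import re
from concurrent.futures import ThreadPoolExecutor


def run_query(pattern, lines):
    """Synchronous query: return the lines that match a regular expression."""
    return [line for line in lines if re.search(pattern, line)]


def submit_query(executor, pattern, lines, deliver_to):
    """Asynchronous query: hand the results to deliver_to when they are ready.

    deliver_to may belong to the client or to a third party. The returned
    future resolves to the number of results delivered.
    """

    def task():
        results = run_query(pattern, lines)
        deliver_to(results)
        return len(results)

    return executor.submit(task)


lines = ["run=1 ok", "run=2 failed", "run=3 ok"]

# Synchronous: the caller waits and receives the results directly.
sync_results = run_query(r"ok$", lines)

# Asynchronous: results go to a third party's inbox instead of the caller.
third_party_inbox = []
with ThreadPoolExecutor(max_workers=1) as executor:
    future = submit_query(executor, r"failed", lines, third_party_inbox.append)
    delivered_count = future.result()
```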

Functional Capabilities (2)
- Transformation
- Update
- Security mapping
- Data resource configuration
- Metadata management
- Provenance

Architecture

Key non-functional properties?
- Scalability in several dimensions, e.g. large data sets, large number of data sets, size of data flows
- Support for multiple/variable levels of coherency for replicated/federated data
- Composability
  - minimising unnecessary movement of data

Notes: These are taken from the most recent design team telcon.

Three Layer Architecture

[Diagram labels: ETL; Data Catalog; Access; Profiling & Quality; XML Mapping; Integration; Development Tools; Provision; Metadata Registry; Distributed Query; DATA GRID; Additional & 3rd Party Data Integration Capabilities; Core “Data Service Layer”]

EGEE Data Service Interfaces (1)
- Storage Element
  - SRM interface (GSM-WG): manage a storage resource; space reservation; put and retrieve files using various protocols
  - POSIX-like file I/O: support for most POSIX-compliant features; abstraction over existing MSS I/O mechanisms
- File Catalog: management of the logical namespace
- Replica Catalog: tracking of file replicas
- Metadata Catalog: application metadata
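The division of labour between the File Catalog and the Replica Catalog can be sketched as two small lookup tables. This is not the EGEE interface: the method names, the `guid-N` identifiers, and the `srm://` URLs on example hosts are placeholders chosen for illustration.

```python
class FileCatalog:
    """Logical namespace: maps logical file names (LFNs) to file identifiers."""

    def __init__(self):
        self._ids = {}

    def create(self, lfn):
        if lfn in self._ids:
            raise FileExistsError(lfn)
        self._ids[lfn] = f"guid-{len(self._ids) + 1}"  # placeholder identifier scheme
        return self._ids[lfn]

    def lookup(self, lfn):
        return self._ids[lfn]

    def list_dir(self, directory):
        """List the LFNs under a directory of the logical namespace."""
        prefix = directory.rstrip("/") + "/"
        return sorted(lfn for lfn in self._ids if lfn.startswith(prefix))


class ReplicaCatalog:
    """Tracks the physical copies of each file identifier."""

    def __init__(self):
        self._replicas = {}

    def add(self, file_id, url):
        self._replicas.setdefault(file_id, []).append(url)

    def replicas(self, file_id):
        return list(self._replicas.get(file_id, []))


files = FileCatalog()
replicas = ReplicaCatalog()

file_id = files.create("/grid/hep/run1/events-001")
replicas.add(file_id, "srm://se1.example.org/data/0001")
replicas.add(file_id, "srm://se2.example.org/data/0001")

# A client resolves the logical name first, then chooses among the replicas.
locations = replicas.replicas(files.lookup("/grid/hep/run1/events-001"))
```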

EGEE Data Service Interfaces (2)
- Data Catalog: added functionality by orchestration of the 3 catalogs (providing transaction safety)
- File Transfer Service: reliable transfer of files between two sites; pre- and post-processing hooks
- File Placement Service: transfer and register files; orchestrates the File Transfer and Data Catalog services
- Data Scheduling Service: event-based data transfer, using the File Placement Service
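The File Placement Service's role of orchestrating transfer and registration, with pre- and post-processing hooks around it, might be sketched as below. The class and its callables are hypothetical stand-ins for the real services; the one design point shown is that a file is registered only after its transfer succeeds.

```python
class FilePlacementService:
    """Transfers a file, then registers the new copy in the catalog."""

    def __init__(self, transfer, register, pre_hooks=(), post_hooks=()):
        self._transfer = transfer    # callable(source_url, dest_url); raises on failure
        self._register = register    # callable(lfn, dest_url)
        self._pre_hooks = list(pre_hooks)
        self._post_hooks = list(post_hooks)

    def place(self, lfn, source_url, dest_url):
        for hook in self._pre_hooks:
            hook(lfn, source_url, dest_url)
        self._transfer(source_url, dest_url)  # an exception here skips registration
        self._register(lfn, dest_url)
        for hook in self._post_hooks:
            hook(lfn, source_url, dest_url)


catalog = {}  # lfn -> list of registered replica URLs
log = []


def fake_transfer(source_url, dest_url):
    """Stand-in for the File Transfer Service; fails for one made-up host."""
    if "unreachable" in dest_url:
        raise ConnectionError(dest_url)
    log.append(("transfer", dest_url))


service = FilePlacementService(
    transfer=fake_transfer,
    register=lambda lfn, url: catalog.setdefault(lfn, []).append(url),
    pre_hooks=[lambda lfn, src, dst: log.append(("pre", lfn))],
    post_hooks=[lambda lfn, src, dst: log.append(("post", lfn))],
)

service.place("/grid/a", "srm://site1.example.org/a", "srm://site2.example.org/a")
try:
    service.place("/grid/b", "srm://site1.example.org/b", "srm://unreachable.example.org/b")
except ConnectionError:
    pass  # the failed transfer leaves the catalog untouched
```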

Questions?