A Metadata Catalog Service for Data Intensive Applications Presented by Chin-Yi Tsai.


2 Outline
• Introduction
• The Role of Metadata Services in Grid Data Management
• Requirements for the Metadata Service
• Components of a Metadata Service
• MCS: A Metadata Catalog Service for Grids
• Application Experiences
• Scalability of the MCS
• Summary

3 Data-intensive applications
• Experimental analyses
• Simulation in scientific disciplines
• Massive datasets are shared by a community of hundreds or thousands of researchers.
• Purpose
  - To manage these large data sets efficiently
  - Metadata, or descriptive information about the data, needs to be managed

4 High Level Diagram of the Metadata Catalog Architecture
[Figure labels: Client Application; Standard interface; Metadata Catalog Service; Web Server; Database Connectivity; Metadata Database (MySQL)]

5 Introduction
• Metadata is information that describes data.
• Metadata Catalog Service
  - Design of a Metadata Catalog Service (MCS) that provides a mechanism for storing and accessing descriptive metadata and allows users to query for data items based on desired attributes.
  - Accurate identification of desired data items is essential for correct analysis of experimental and simulation results.

6 Introduction (cont’d)
• There are various types of metadata.
  - Replication metadata
  - Metadata that describes the contents of data items
  - Metadata that relates to the physical characteristics of data objects, such as size and access permissions
• Distinguish between logical file metadata and physical file metadata.
[Figure labels: logical file metadata; physical file; query]

7 A usage scenario of the Metadata Catalog Service
[Figure labels: Client Application; Physical Storage System; Replica Location Service; Metadata Catalog Service; MCS Web Server; MCS Database; Replica Index Node; Local Replica Cat]
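
A usage scenario of this kind can be sketched as a two-step lookup: the client first asks the metadata catalog for the logical file names that match some attributes, then asks the replica location service where physical copies of each name are stored. The Python sketch below uses in-memory dictionaries as stand-ins for both services. All function names, attribute names, and URLs are hypothetical and are not part of the MCS or Replica Location Service interfaces.

```python
# Hypothetical in-memory stand-ins for the two services (not the real MCS/RLS APIs).
METADATA_CATALOG = {
    "run42.dat": {"data_type": "simulation", "year": 2003},
    "run43.dat": {"data_type": "experiment", "year": 2003},
}
REPLICA_LOCATIONS = {
    "run42.dat": [
        "gsiftp://site-a.example.org/data/run42.dat",
        "gsiftp://site-b.example.org/data/run42.dat",
    ],
    "run43.dat": ["gsiftp://site-a.example.org/data/run43.dat"],
}


def query_metadata_catalog(**wanted):
    """Step 1: return logical names whose attributes match every requested value."""
    return sorted(
        name
        for name, attrs in METADATA_CATALOG.items()
        if all(attrs.get(key) == value for key, value in wanted.items())
    )


def locate_replicas(logical_name):
    """Step 2: map one logical name to its physical replica locations."""
    return REPLICA_LOCATIONS.get(logical_name, [])


def discover(**wanted):
    """Combine both steps: attributes -> logical names -> physical locations."""
    return {name: locate_replicas(name) for name in query_metadata_catalog(**wanted)}


if __name__ == "__main__":
    print(discover(data_type="simulation"))
```

Keeping the two lookups separate mirrors the split between logical file metadata and physical file metadata: the catalog never needs to know where a file is stored.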

8 Metadata types
• Physical Metadata: information about the characteristics of data on the physical storage system
• Domain-Independent Metadata: metadata that applies regardless of the application domain or virtual organization in which the data sets are created and shared
• Domain-Specific Metadata, Virtual Organization Metadata, and User Metadata: metadata specific to an application domain, a virtual organization, or a particular user

9 The Role of Metadata Services in Grid Data Management
• Metadata Services are services that maintain mappings between logical name attributes for data items and other descriptive metadata attributes, and that respond to queries about those mappings.
• Metadata Services play a key role in the publication, discovery, and access of data sets.

10 Publication
• Publication is the process by which data sets and their associated attributes are stored and made accessible to a user community.
  - Domain-independent, domain-dependent, and virtual organization metadata attributes
  - Data sets can then be discovered and accessed according to these attributes
• Some members of the community may use the Metadata Service to annotate the data sets with their own observations using user attributes and make these annotations available to a controlled subset of the community.

11 Discovery and Access
• Discovery is the process of identifying data items of interest to the user.
[Figure labels: Client Application; Physical Storage System; Replica Location Service; Metadata Catalog Service; MCS Web Server; MCS Database; Replica Index Node; Local Replica Cat]

12 Requirements for the Metadata Service
• The Metadata Service must provide a mechanism for associating logical name attributes with domain-independent metadata attributes.
• The Metadata Service must support queries on its contents.
• The Metadata Service must implement policies regarding the consistency guarantees, authentication, authorization, and auditing capabilities provided by the service.

13 Requirements for the Metadata Service (cont’d)
• The Metadata Service may support the ability to aggregate metadata into collections or views by associating aggregation attributes with logical name attributes.
• The Metadata Service should provide the ability to store attributes that record the transformations applied to a dataset.
• The Metadata Service should provide good performance and scalability.

14 Components of a Metadata Service
• A data model that includes mechanisms for aggregation of metadata mappings
• A standard schema for domain-independent metadata attributes with extensibility for additional user-defined attributes
• A set of standard service behaviors
• Query mechanisms for accessing the database
• A set of standard interfaces and APIs for storing and accessing metadata
• A set of policies for consistency, access control and authorization, and auditing

15 MCS: A Metadata Catalog Service for Grids (design and implementation)
• The MCS data model
• MCS schema
• MCS service implementation
• MCS query mechanisms and APIs
• MCS policies

16 The MCS Data Model
• The most basic item in the MCS data model is the logical file.
• Logical collections are user-defined aggregations that can consist of zero or more logical files and/or other logical collections.
• A logical file may belong to at most one logical collection.
• Logical views are another type of aggregation that can consist of zero or more logical files, collections, and/or other logical views.
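
The data model described on this slide can be sketched with three small Python classes. This is an illustrative model only: the class and method names are hypothetical, and the sketch assumes that views place no limit on how many views an item may appear in, which the slide does not state explicitly.

```python
class LogicalFile:
    """The basic item of the data model."""

    def __init__(self, name):
        self.name = name
        self.collection = None  # a logical file has at most one parent collection


class LogicalCollection:
    """User-defined aggregation of logical files and/or other collections."""

    def __init__(self, name):
        self.name = name
        self.members = []

    def add(self, item):
        if not isinstance(item, (LogicalFile, LogicalCollection)):
            raise TypeError("a collection holds logical files or collections")
        if isinstance(item, LogicalFile):
            # Enforce the "at most one logical collection" rule for files.
            if item.collection is not None and item.collection is not self:
                raise ValueError(
                    f"{item.name} already belongs to {item.collection.name}"
                )
            item.collection = self
        if item not in self.members:
            self.members.append(item)


class LogicalView:
    """Aggregation of logical files, collections, and/or other views."""

    def __init__(self, name):
        self.name = name
        self.members = []

    def add(self, item):
        if not isinstance(item, (LogicalFile, LogicalCollection, LogicalView)):
            raise TypeError("a view holds logical files, collections, or views")
        # Assumption: no exclusivity rule for views, so only duplicates are skipped.
        if item not in self.members:
            self.members.append(item)
```

With these classes, adding the same file to a second collection raises `ValueError`, while adding it to several views succeeds.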

17 The MCS Data Model
[Figure labels: Logical file; Logical collection; Logical view]

18 MCS Schema
• Logical file metadata
  - Main attributes of a logical file
• Logical collection metadata
  - User-defined associations of logical files
• Logical view metadata
  - User-defined aggregation of logical files, logical collections, or other logical views
• Authorization information
  - Is associated with both individual logical files and logical collections
• User information
• Audit metadata
• User-defined metadata
• Annotation attributes
• Creation history
• External catalog metadata

Logical file metadata

Field name         Type          Remarks   Description
Data_id            Integer       Non null  The data identifier
Logical_name       Varchar(250)  Non null  The logical file name
Version            Integer                 The version of the data
Data_type          Varchar(250)            The type of data
Collection_id      Integer
Container_id       Integer
Container Service  Varchar(250)
Is_valid           Integer       Non null
Creator_Dn         Varchar(250)  Non null
Last_Modifier_Dn   Varchar(250)
Create_Time        Date/Time     Non null
Last_Modify_Time   Date/Time
Master_Copy        Varchar(250)
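
The logical file fields listed above map naturally onto a relational table. The sketch below is a minimal illustration using Python's built-in `sqlite3` module as a stand-in for the MySQL database named in the slides. The table name `logical_file`, the underscore in `container_service`, and the primary key on `data_id` are assumptions, not details taken from the MCS schema.

```python
import sqlite3

# Column names and types follow the slide's field list; the table name,
# "container_service" spelling, and PRIMARY KEY are assumptions.
SCHEMA = """
CREATE TABLE logical_file (
    data_id           INTEGER      NOT NULL PRIMARY KEY,
    logical_name      VARCHAR(250) NOT NULL,
    version           INTEGER,
    data_type         VARCHAR(250),
    collection_id     INTEGER,
    container_id      INTEGER,
    container_service VARCHAR(250),
    is_valid          INTEGER      NOT NULL,
    creator_dn        VARCHAR(250) NOT NULL,
    last_modifier_dn  VARCHAR(250),
    create_time       DATETIME     NOT NULL,
    last_modify_time  DATETIME,
    master_copy       VARCHAR(250)
)
"""

conn = sqlite3.connect(":memory:")
conn.execute(SCHEMA)

# Insert one example row (all values are made up for illustration).
conn.execute(
    "INSERT INTO logical_file "
    "(data_id, logical_name, version, data_type, is_valid, creator_dn, create_time) "
    "VALUES (?, ?, ?, ?, ?, ?, ?)",
    (1, "run42.dat", 1, "simulation", 1, "/O=Grid/CN=Example User",
     "2003-01-01 00:00:00"),
)

# Query logical files by a static attribute.
rows = conn.execute(
    "SELECT logical_name FROM logical_file WHERE data_type = ?", ("simulation",)
).fetchall()
print(rows)
```

The `NOT NULL` constraints correspond to the "Non null" remarks in the table, so an insert that omits `logical_name` is rejected by the database.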

Logical collection metadata

Logical view metadata

Authorization information

• User information
  - About writers or modifiers of the logical files in the database
• Audit metadata
  - Record information about actions that can be performed on the Metadata Service
• User-defined metadata
  - Different application domains have their own metadata schemas
• Annotation attributes
  - Comments
• Creation history
  - Information about how data items are generated
• External catalog metadata
  - Use this information to further query the external catalog

24 MCS Service Implementation
[Figure: Overview of the Implementation. Labels: Application Program (Main() { mcsClient( ); mcsCreate( x ); }); MCS Client; SOAP Engine; MCS Server; MySQL Database]

25 MCS Query Mechanisms and APIs
• The client API provides the following operations:
  - Querying the catalog for logical objects based on object attributes
  - Querying the static attributes of a logical object
  - Querying the user-defined attributes of a logical object
  - Querying the contents of a logical view or a logical collection
  - Creating a logical file, collection, or view
  - Modifying the attributes of a logical object
  - Deleting a logical file, view, or collection
  - Annotating a logical object
  - Adding logical objects to a view
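
Several of the operations listed on this slide can be illustrated with a small in-memory catalog. This is a sketch under stated assumptions, not the MCS client API: the class name `InMemoryCatalog` and every method name are hypothetical, and the real service is reached through a web service interface rather than a local object.

```python
class InMemoryCatalog:
    """Hypothetical stand-in for a metadata catalog client."""

    def __init__(self):
        self._objects = {}  # logical name -> record

    def create(self, name, kind="file", **static_attrs):
        """Create a logical file, collection, or view."""
        if name in self._objects:
            raise KeyError(f"{name} already exists")
        self._objects[name] = {
            "kind": kind,  # "file", "collection", or "view"
            "static": dict(static_attrs),
            "user": {},
            "annotations": [],
        }

    def modify(self, name, **static_attrs):
        """Modify the static attributes of a logical object."""
        self._objects[name]["static"].update(static_attrs)

    def set_user_attribute(self, name, key, value):
        """Attach a user-defined attribute to a logical object."""
        self._objects[name]["user"][key] = value

    def annotate(self, name, text):
        """Annotate a logical object with a free-text comment."""
        self._objects[name]["annotations"].append(text)

    def delete(self, name):
        """Delete a logical object."""
        del self._objects[name]

    def static_attributes(self, name):
        return dict(self._objects[name]["static"])

    def user_attributes(self, name):
        return dict(self._objects[name]["user"])

    def query(self, **wanted):
        """Return names whose static or user-defined attributes match all pairs."""
        matches = []
        for name, record in self._objects.items():
            merged = {**record["static"], **record["user"]}
            if all(merged.get(key) == value for key, value in wanted.items()):
                matches.append(name)
        return sorted(matches)
```

A caller would create an object, attach attributes, and then query on any combination of them, for example `catalog.query(data_type="simulation", detector="H1")`, where both attribute names are made up for illustration.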

26 MCS Policies
• The MCS provides authentication and authorization capabilities on the logical file and logical collection attributes in MCS
• The MCS provides auditing metadata
  - Creation information
  - Log
• To support other services
  - Such as replica managers that maintain consistency among data items
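
The combination of authorization and auditing described on this slide can be sketched as a permission check that records every attempted action. The permission table, the distinguished names, and the function names below are all hypothetical. The slides do not describe how MCS stores permissions or what an audit record contains.

```python
from datetime import datetime, timezone

# Hypothetical permission table: (logical name, user DN) -> allowed actions.
PERMISSIONS = {
    ("run42.dat", "/CN=alice"): {"read", "write"},
    ("run42.dat", "/CN=bob"): {"read"},
}
AUDIT_LOG = []  # one record per attempted action


def authorized(user_dn, logical_name, action):
    """Return True if the user may perform the action on the logical object."""
    return action in PERMISSIONS.get((logical_name, user_dn), set())


def perform(user_dn, logical_name, action):
    """Check authorization, record the attempt, and raise if it is denied."""
    allowed = authorized(user_dn, logical_name, action)
    AUDIT_LOG.append(
        {
            "time": datetime.now(timezone.utc).isoformat(),
            "user": user_dn,
            "object": logical_name,
            "action": action,
            "allowed": allowed,
        }
    )
    if not allowed:
        raise PermissionError(f"{user_dn} may not {action} {logical_name}")
    return True
```

Logging the attempt before raising means that denied requests also leave an audit record, which is one possible design choice rather than documented MCS behavior.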

27 Application Experiences
• Integrating MCS into the software used by two applications:
  - The Pegasus/LIGO Application
  - The Earth System Grid Application

28 The Pegasus/LIGO Application
• Pegasus is used to map complex application workflows onto the available Grid resources.
• Pegasus uses MCS to discover existing application data products.
• Pegasus uses the MCS and the Replica Location Service.
• MCS only stores logical file names.
• Attributes that describe these data products, including the type of the data and the duration of data measurements, are stored in the MCS.
• 23 user-defined attributes
[Figure labels: Client Application; Physical Storage System; Replica Location Service; Metadata Catalog Service; MCS Web Server; MCS Database; Replica Index Node; Local Replica Cat]

29 The Earth System Grid Application
• The MCS is one component in an ESG testbed.
• ESG scientists use the MCS to discover and query for ESG files based on metadata attributes.

30 Scalability of the MCS
[Table columns: Database size; Logical collection; Logical file; User defined. Cell values are garbled in the transcript: 100, ,000, ,000,]
• Add and query operations

31 Scalability of the MCS With web interface Web service overhead

32 Scalability of the MCS

33 Scalability of the MCS

34 Scalability of the MCS

35 Scalability of the MCS

36 Scalability of the MCS

37 Scalability of the MCS

38 Summary
• The design and implementation of an MCS
  - Store, access, and query
• To make the service more extensible and to provide a more general query model
• Use of other database backend technologies