
Polish Infrastructure for Supporting Computational Science in the European Research Space QoS provisioning for data-oriented applications in PL-Grid D.


1 Polish Infrastructure for Supporting Computational Science in the European Research Space
QoS provisioning for data-oriented applications in PL-Grid
D. Król, B. Kryza, K. Skalkowski, D. Nikolow, R. Slota, and J. Kitowski
ACC Cyfronet AGH, Kraków, Poland
Institute of Computer Science AGH-UST, Kraków, Poland
Cracow Grid Workshop 2010, Kraków, 11-13.10.2010

2 Agenda
1. Research and implementation goals
2. Data-intensive applications
3. Non-functional requirements in data management
4. FiVO/QStorMan toolkit
5. Architecture overview
6. Main use case
7. Implementation status
8. Conclusions

3 Research and implementation goals
The main objective of the presented research is to automate the knowledge-based creation and management of a Virtual Organization (VO), especially in the data storage area, using the following concepts:
- allowing users to explicitly define non-functional requirements for storage devices,
- exploiting a VO knowledge base extended with descriptions of storage elements,
- exploiting information from storage monitoring systems and the VO knowledge base to find the most suitable storage device compliant with the defined requirements.

4 Data-intensive applications
- Generate gigabytes (or more) of data per day.
- Produce different types of data, which require different types of storage.
- Make heavy use of read/write operations.
- Their run time depends more on storage access time and transfer speed than on computation time.
Examples (from Wikipedia):
- The LHC experiments produce about 15 PB/year, i.e. ~41 TB/day or roughly 0.5 GB/s on average.
- The German Climate Computing Center (DKRZ) has a storage capacity of 60 petabytes of climate data.
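The LHC figures can be checked by converting the yearly volume into average daily and per-second rates. This minimal sketch assumes decimal (SI) units and a 365-day year; it yields yearly averages only, not peak rates.

```python
# Convert a yearly data volume into average daily and per-second rates.
# Assumption: decimal (SI) units, i.e. 1 PB = 10**15 B, 1 TB = 10**12 B, 1 GB = 10**9 B.
SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_YEAR = 365


def average_rates(petabytes_per_year: float) -> tuple[float, float]:
    """Return (TB per day, GB per second) for a volume given in PB/year."""
    bytes_per_year = petabytes_per_year * 10**15
    tb_per_day = bytes_per_year / DAYS_PER_YEAR / 10**12
    gb_per_second = bytes_per_year / (DAYS_PER_YEAR * SECONDS_PER_DAY) / 10**9
    return tb_per_day, gb_per_second


if __name__ == "__main__":
    tb_day, gb_s = average_rates(15)
    print(f"{tb_day:.1f} TB/day, {gb_s:.2f} GB/s")  # prints "41.1 TB/day, 0.48 GB/s"
```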

5 Non-functional requirements in data management
- Data-intensive applications may have different requirements, e.g. important data should be replicated.
- The abstraction of storage elements prevents users from influencing the actual location of their data.
- Data should be distributed among the available storage elements according to the defined requirements.
- For each storage element, there must be a way to check to what degree it fulfills the requirements.

6 FiVO/QStorMan toolkit
- On the user side: a programming library (libSES), which provides functions for managing files in a distributed storage environment.
- On the server side:
  - a service (the Storage Element Selection service, SES), which finds the most suitable storage element according to the defined requirements and the current workload,
  - a knowledge base (GOM), which stores the configuration of the storage environment along with the non-functional requirements defined by users,
  - a monitoring system (SMED), which monitors storage resources and provides current or average values of various QoS parameters,
  - a portal where users can define non-functional requirements for the storage environment.

7 Architecture overview

8 Main use case
[Diagram: User, Portal, GOM, SMED, Monitoring system, Application, SES library, storage elements (SE ...) of a distributed storage system]
- The user defines non-functional requirements in the Portal, which stores the requirements definition in GOM.
- The Application performs a classical "write" operation to a concrete SE; the SES library intercepts the operation.
- The requirements for the application are retrieved from GOM along with the configuration of the Lustre installation.
- Actual storage system parameter values are obtained from SMED, which relies on monitoring information and on configuration information of the Lustre installation.
- The "write" operation is directed to the most suitable SE.
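The flow above can be condensed into a small, self-contained sketch. It is hypothetical throughout: `KnowledgeBase` merely plays GOM's role, `Monitoring` plays SMED's role, and `intercept_write` stands in for what the SES library and service do together. For simplicity it scores an SE by how many requirements it meets and treats every attribute as "higher is better"; none of these names or rules come from the slides.

```python
"""Hypothetical sketch of the main use case: an application's write is
intercepted and redirected to the most suitable storage element (SE).
Class and function names are illustrative, not the FiVO/QStorMan APIs."""
from dataclasses import dataclass, field


@dataclass
class KnowledgeBase:
    """Plays GOM's role: requirements per application plus the list of SEs."""
    requirements: dict = field(default_factory=dict)
    storage_elements: list = field(default_factory=list)

    def store_requirements(self, app: str, reqs: dict) -> None:
        """What the portal does when a user saves a requirements definition."""
        self.requirements[app] = reqs


@dataclass
class Monitoring:
    """Plays SMED's role: current parameter values for each SE."""
    values: dict = field(default_factory=dict)

    def current(self, se: str) -> dict:
        return self.values.get(se, {})


def requirements_met(reqs: dict, measured: dict) -> int:
    """Count satisfied requirements, treating every attribute as 'higher is better'."""
    return sum(1 for name, need in reqs.items() if measured.get(name, 0.0) >= need)


def intercept_write(app: str, requested_se: str,
                    kb: KnowledgeBase, mon: Monitoring) -> str:
    """Return the SE that the application's write should actually go to."""
    reqs = kb.requirements.get(app)
    if not reqs or not kb.storage_elements:
        return requested_se  # nothing to optimize: keep the classical write
    # max() keeps the first SE in list order when several meet equally many.
    return max(kb.storage_elements,
               key=lambda se: requirements_met(reqs, mon.current(se)))
```

An application without stored requirements keeps its original target, which mirrors the "classical write" path in the diagram.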

9 Implementation status – the user side (libSES)
- The library is implemented as a C++ shared library.
- Communication with the server side is implemented with the libcurl library, following the REST model.
- For file storage we use the Lustre file system and its pool mechanism.
- The most important part of the library API is:
  - createFile(fileName : char*, policy : StoragePolicy*) : int
  - openFile(fileName : char*) : int
  - closeFile(fileName : char*) : void
  - changeStoragePolicy(fileName : char*, policy : StoragePolicy*) : void
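The slide gives only the API signatures. The hypothetical Python mirror below (snake_case versions of the four calls) shows how they might fit together. It assumes that each storage element corresponds to a Lustre OST pool and that a new file is placed with `lfs setstripe --pool`, which is a real Lustre command, although the slides do not say how libSES itself places files. The `select_pool` callback stands in for a request to the SES service, and the file descriptors are fake. Commands are recorded rather than executed so the sketch runs without Lustre.

```python
"""Hypothetical Python mirror of the libSES calls listed on the slide.

Assumptions (not stated on the slides): every storage element is a Lustre
OST pool, a new file is placed with `lfs setstripe --pool`, and the
`select_pool` callback stands in for a request to the SES service."""
from typing import Callable

Policy = dict  # stand-in for the StoragePolicy class


class SESClient:
    def __init__(self, select_pool: Callable[[Policy], str]) -> None:
        self.select_pool = select_pool
        self.policies: dict[str, Policy] = {}
        self.open_fds: dict[str, int] = {}
        self.commands: list[list[str]] = []  # commands that would be run
        self._next_fd = 3  # fake descriptors, for illustration only

    def create_file(self, file_name: str, policy: Policy) -> int:
        """createFile: choose a pool for the policy, create the file there, open it."""
        pool = self.select_pool(policy)
        # `lfs setstripe` on a new path creates an empty file with that layout.
        self.commands.append(["lfs", "setstripe", "--pool", pool, file_name])
        self.policies[file_name] = policy
        return self.open_file(file_name)

    def open_file(self, file_name: str) -> int:
        """openFile: return a (fake) descriptor for the file."""
        if file_name not in self.open_fds:
            self.open_fds[file_name] = self._next_fd
            self._next_fd += 1
        return self.open_fds[file_name]

    def close_file(self, file_name: str) -> None:
        """closeFile: forget the descriptor."""
        self.open_fds.pop(file_name, None)

    def change_storage_policy(self, file_name: str, policy: Policy) -> None:
        """changeStoragePolicy: only the stored policy is updated here;
        moving already written data is not shown."""
        self.policies[file_name] = policy
```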

10 Implementation status – Storage Element Selection service
- The finder service is implemented in Python.
- The function that determines similarity is as follows: [formula not recoverable from the transcript; it is defined piecewise, with an "if" case and an "else" case, in terms of the current value of attribute i and the required value of attribute i]
- Communication with the monitoring system is implemented following the REST model.
- Integration with the VO knowledge base is implemented with a SOAP-based web service.
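Because the original formula did not survive, the sketch below uses a substitute scoring rule purely for illustration; it is not the authors' similarity function. For each required attribute it assumes "higher is better", scores 1 when the current value meets the required value and current/required otherwise, then averages the per-attribute scores. The storage element with the highest similarity is selected. The attribute names in the usage are hypothetical.

```python
"""Illustrative similarity-based SE selection (a substitute rule, not the
original SES formula). Every attribute is assumed to be 'higher is better'."""


def attribute_score(current: float, required: float) -> float:
    """1.0 if the requirement is met, otherwise the fraction achieved."""
    if required <= 0 or current >= required:
        return 1.0
    return max(current, 0.0) / required


def similarity(current: dict[str, float], required: dict[str, float]) -> float:
    """Mean per-attribute score; a missing measurement counts as 0."""
    if not required:
        return 1.0
    scores = [attribute_score(current.get(name, 0.0), need)
              for name, need in required.items()]
    return sum(scores) / len(scores)


def best_storage_element(measurements: dict[str, dict[str, float]],
                         required: dict[str, float]) -> str:
    """Name of the SE whose current values are most similar to the requirements."""
    return max(measurements, key=lambda se: similarity(measurements[se], required))
```

For example, with requirements `{"capacity": 100, "write_rate": 50}`, an SE offering 500 and 20 scores (1.0 + 0.4) / 2 = 0.7, while one offering 80 and 60 scores (0.8 + 1.0) / 2 = 0.9 and is chosen. Partial credit lets the service still pick the closest SE when none meets every requirement.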

11 Implementation status – GOM
- Allows the use of multiple available storage and reasoning mechanisms.
- Provides several interfaces for querying and modifying the managed ontologies.
- The communication protocols currently supported by GOM include Java RMI and SOAP.
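GOM's own interfaces are not shown on the slides. As a rough illustration of keeping storage element descriptions in ontological (subject, predicate, object) form and querying them by pattern, here is a tiny in-memory triple store. The class, the predicate `hasDeviceType`, and the individuals in the usage are hypothetical, and unlike GOM it performs no reasoning.

```python
"""Tiny in-memory triple store; an illustration only, not GOM's API."""
from typing import Optional

Triple = tuple[str, str, str]


class TripleStore:
    def __init__(self) -> None:
        self.triples: set[Triple] = set()

    def add(self, s: str, p: str, o: str) -> None:
        """Assert a fact, e.g. ('pool_fast', 'hasDeviceType', 'DiskArray')."""
        self.triples.add((s, p, o))

    def remove(self, s: str, p: str, o: str) -> None:
        """Retract a fact; removing an absent triple is a no-op."""
        self.triples.discard((s, p, o))

    def query(self, s: Optional[str] = None, p: Optional[str] = None,
              o: Optional[str] = None) -> list[Triple]:
        """Return matching triples in sorted order; None acts as a wildcard."""
        return sorted(t for t in self.triples
                      if (s is None or t[0] == s)
                      and (p is None or t[1] == p)
                      and (o is None or t[2] == o))
```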

12 Implementation status – SMED
- The SMED architecture is based on an Enterprise Service Bus.
- The current version of the system supports monitoring of:
  - local system hard drives,
  - disk arrays,
  - hierarchical storage management systems,
  - distributed file systems, e.g. Lustre.
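SMED provides current or average values of QoS parameters. A minimal sketch of deriving both from a stream of samples, assuming a fixed-size window per (storage element, parameter) pair, is shown below. The class name, the window size, and the parameter names are hypothetical, and the sketch says nothing about SMED's ESB-based implementation.

```python
from collections import defaultdict, deque


class MetricWindow:
    """Keeps the last `size` samples per (storage element, parameter) pair."""

    def __init__(self, size: int = 5) -> None:
        # Each key gets its own bounded deque; old samples drop out automatically.
        self.samples = defaultdict(lambda: deque(maxlen=size))

    def record(self, se: str, parameter: str, value: float) -> None:
        self.samples[(se, parameter)].append(value)

    def current(self, se: str, parameter: str) -> float:
        """Most recent sample; raises KeyError if nothing was recorded."""
        window = self.samples.get((se, parameter))  # .get() does not create keys
        if not window:
            raise KeyError((se, parameter))
        return window[-1]

    def average(self, se: str, parameter: str) -> float:
        """Mean over the window, i.e. over the last monitoring series."""
        window = self.samples.get((se, parameter))
        if not window:
            raise KeyError((se, parameter))
        return sum(window) / len(window)
```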

13 Implementation status – non-functional requirements
The current implementation of the StoragePolicy class includes:
- preferredDeviceType – a device type preferred by the user, e.g. a fast writable device or a highly available device,
- capacity – the required free storage space,
- averageReadTransferRate – the mean read transfer rate from the last monitoring series,
- averageWriteTransferRate – the mean write transfer rate from the last monitoring series,
- throughput – the numerical value of the required throughput,
- throughputLevel – an abstract level of required throughput, e.g. LOW, MEDIUM or HIGH.
Users can specify only those aspects of a policy that are important from their point of view.
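The "choose only what matters" behaviour maps naturally onto optional fields. The following is a hypothetical Python mirror of the StoragePolicy fields (snake_case versions of the names above); the units and the convention that `None` means "unconstrained" are assumptions, not taken from libSES.

```python
"""Hypothetical Python mirror of the StoragePolicy fields on the slide.
Units and the None-means-unconstrained convention are assumptions."""
from dataclasses import dataclass, fields
from enum import Enum
from typing import Optional


class ThroughputLevel(Enum):
    LOW = "LOW"
    MEDIUM = "MEDIUM"
    HIGH = "HIGH"


@dataclass
class StoragePolicy:
    # None means "not important to this user": unset aspects do not constrain selection.
    preferred_device_type: Optional[str] = None          # preferredDeviceType
    capacity: Optional[float] = None                     # required free space (GB assumed)
    average_read_transfer_rate: Optional[float] = None   # MB/s assumed
    average_write_transfer_rate: Optional[float] = None  # MB/s assumed
    throughput: Optional[float] = None                   # numeric required throughput
    throughput_level: Optional[ThroughputLevel] = None   # abstract LOW/MEDIUM/HIGH

    def specified(self) -> dict:
        """Return only the aspects the user actually set."""
        return {f.name: getattr(self, f.name) for f in fields(self)
                if getattr(self, f.name) is not None}
```

A selection service would then compare only `policy.specified()` against monitored values, so an empty policy imposes no constraints at all.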

14 Conclusions
- The goal of the presented research is to develop new approaches to storage management in the Grid environment.
- Explicit definitions of non-functional requirements are necessary for data-intensive applications.
- Based on the given requirements, the solution finds the most suitable storage element within a distributed file system, or the server on which a grid job should be scheduled.
- The presented solution is easy to use (a standard C++ shared library), to extend (storage elements are described in ontological form) and to understand (a clean algorithm for finding a storage element).

15 Do you want to know more? www.plgrid.pl

