Polish Infrastructure for Supporting Computational Science in the European Research Space Policy Driven Data Management in PL-Grid Virtual Organizations.


1 Polish Infrastructure for Supporting Computational Science in the European Research Space
Policy Driven Data Management in PL-Grid Virtual Organizations
Dariusz Król, Darin Nikolow, Włodzimierz Funika, Renata Słota, Jacek Kitowski
ACC Cyfronet AGH, ul. Nawojki 11, 30-950, Kraków, Poland
INGRID 2010, Poznań, 12-14.05.2010

2 Agenda
1. Goals of the PL-Grid project
2. Research and implementation goals
3. Non-functional requirements in data management
4. Description of the proposed solution
5. Architecture overview
6. Sample use cases
7. Implementation status
8. Future work
9. Conclusions

3 PL-Grid
- Polish national grid initiative (2009-2011)
- The main goal is to provide the Polish scientific community with an IT infrastructure based on the Grid
- Extend the available computing resources by approximately 215 Tflops of computing power and 2500 TB of storage capacity
- Develop grid-oriented applications and tools in the VO and data management areas, e.g. FiVO - Grid Virtual Organization Semantic Framework

4 Research and implementation goals
- Allow users to define non-functional requirements for storage devices explicitly (more on the next slide)
- Extend the VO knowledge base with descriptions of storage elements
- Exploit information from storage monitoring systems and the VO knowledge base to find the most suitable storage device compliant with the defined requirements
- Integrate the developed solution with the PL-Grid infrastructure, e.g. the Lustre file system
- Keep the solution easy to use and extend

5 Non-functional requirements in data management
- Data-intensive applications may have different requirements, e.g. important data should be replicated
- Abstraction of storage elements prevents users from influencing the actual location of data
- Data should be distributed among the available storage elements according to the defined requirements
- A way is needed to check how well each storage element fulfills the requirements (its fulfillment ratio)

6 Proposed solution
- On the user side: a programming library which provides functions for managing files in a distributed storage environment
- On the server side: a service which finds the most suitable storage element according to the defined requirements and the current workload
- The user specifies a storage policy declaratively in the application code, but the actual location is determined at runtime
- The whole computation is done on the server side, and the identifier of a concrete storage element is returned to the user side

7 Architecture overview

8 Use case 1 – user-level requirements
1. A 'StoragePolicy' instance is created in the user application along with a 'LustreManager' instance.
2. The 'createFile(fileName, policy)' function of the 'LustreManager' instance is called.
3. The 'LustreManager' instance sends a request to the finder service to find the most suitable storage element which meets the provided requirements.
4. The finder service retrieves information about available storage elements from a VO knowledge base.
5. The finder service sends a request to a monitoring system for the current values of the attributes contained in the storage policy object.
6. The fulfilling function is computed with each available storage device as an argument.
7. Information about suitable storage elements is returned to the user side.
8. A file is created on the most suitable storage element, and a file descriptor is returned to the user application, or an error code if something went wrong.
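Steps 4 to 7 of this use case can be sketched as a small selection routine. This is only an illustration under stated assumptions: the types `Attributes` and `StorageElement`, the functions `fulfillmentRatio` and `findMostSuitable`, and above all the scoring rule (the fraction of required attributes whose current value reaches the required value) are hypothetical. The actual fulfilling function used by the finder service is not preserved in this transcript, and the real finder service is written in Python, not C++.

```cpp
#include <algorithm>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical data model; the slides do not show the finder's own types.
using Attributes = std::map<std::string, double>;  // attribute name -> value

struct StorageElement {
    std::string id;      // identifier returned to the user side
    Attributes current;  // current attribute values from the monitoring system
};

// ASSUMPTION: score = fraction of required attributes whose current value is
// at least the required value. This is NOT the formula from the presentation.
double fulfillmentRatio(const Attributes& required, const StorageElement& element) {
    if (required.empty()) return 1.0;
    int met = 0;
    for (const auto& entry : required) {
        auto it = element.current.find(entry.first);
        if (it != element.current.end() && it->second >= entry.second) ++met;
    }
    return static_cast<double>(met) / static_cast<double>(required.size());
}

// Scores every available element and returns the id of the best one,
// or std::nullopt when no storage element is available.
std::optional<std::string> findMostSuitable(const Attributes& required,
                                            const std::vector<StorageElement>& elements) {
    if (elements.empty()) return std::nullopt;
    auto best = std::max_element(
        elements.begin(), elements.end(),
        [&required](const StorageElement& a, const StorageElement& b) {
            return fulfillmentRatio(required, a) < fulfillmentRatio(required, b);
        });
    return best->id;
}
```

With two elements of which only one meets both a capacity and a throughput requirement, `findMostSuitable` returns the id of that element; on a tie it returns the first of the equally scored elements.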

9 Use case 2 – VO-level requirements
1. The application uses a standard programming library to create a new file.
2. This request is intercepted at the filesystem level and delegated to the storage-management library.
3. The library sends a request to find the most suitable storage element for this user.
4. The finder service retrieves information about the default storage policy for the VO to which the user belongs.
5. The rest of the use case follows the previous one:
6. The finder service retrieves information about available storage elements from a VO knowledge base.
7. The finder service sends a request to a monitoring system for the current values of the attributes contained in the storage policy object.
8. The fulfilling function is computed with each available storage device as an argument.
9. Information about suitable storage elements is returned to the user side.
10. A file is created on the most suitable storage element, and a file descriptor is returned to the user application, or an error code if something went wrong.
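The difference between the two use cases comes down to where the policy comes from: an explicit policy from the application, or the default policy of the user's VO. A minimal sketch of that lookup follows. The names `Policy`, `VoKnowledgeBase` and `resolvePolicy` are hypothetical stand-ins; the real VO knowledge base is reached through a SOAP-based web service and the slides do not show its interface.

```cpp
#include <map>
#include <optional>
#include <string>

// Simplified stand-in for a storage policy (hypothetical fields).
struct Policy {
    double capacityTB;
    double throughputMBs;
};

// Hypothetical in-memory stand-in for the VO knowledge base.
struct VoKnowledgeBase {
    std::map<std::string, std::string> userToVo;    // user name -> VO name
    std::map<std::string, Policy> voDefaultPolicy;  // VO name -> default policy
};

// Use case 1: an explicit user policy is used as given.
// Use case 2: no user policy, so the default policy of the user's VO applies.
// Returns std::nullopt if the user's VO or its default policy is unknown.
std::optional<Policy> resolvePolicy(const std::optional<Policy>& userPolicy,
                                    const std::string& user,
                                    const VoKnowledgeBase& kb) {
    if (userPolicy) return userPolicy;
    auto vo = kb.userToVo.find(user);
    if (vo == kb.userToVo.end()) return std::nullopt;
    auto policy = kb.voDefaultPolicy.find(vo->second);
    if (policy == kb.voDefaultPolicy.end()) return std::nullopt;
    return policy->second;
}
```

Once a policy is resolved, the remaining steps are the same in both use cases.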

10 Implementation status – the user side
- The library is implemented as a shared library in the C++ language.
- Communication with the server side is implemented with the libcurl library (exploiting the REST model).
- For file storage we use the Lustre filesystem and its pool mechanism.
- The most important part of the library API is:
  - createFile(fileName : char*, policy : StoragePolicy*) : int
  - openFile(fileName : char*) : int
  - closeFile(fileName : char*) : void
  - changeStoragePolicy(fileName : char*, policy : StoragePolicy*) : void
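To make the four API calls concrete, here is a sketch of an in-memory test double with the same method names. It is not the library's implementation: `InMemoryManager` and the placeholder `StoragePolicy` struct are hypothetical, no Lustre or finder service is involved, `const char*` is used where the slide lists `char*`, and returning a negative value on error is an assumption (the slides only mention "an error code").

```cpp
#include <map>
#include <string>

// Placeholder for the library's StoragePolicy class (hypothetical field).
struct StoragePolicy {
    double capacityTB = 0.0;
};

// In-memory stand-in mirroring the four API calls listed on the slide.
class InMemoryManager {
private:
    struct Entry {
        int descriptor;
        StoragePolicy policy;
        bool open;
    };
    std::map<std::string, Entry> files_;
    int nextDescriptor_ = 3;  // arbitrary first descriptor

    std::map<std::string, Entry>::iterator find(const char* fileName) {
        return fileName ? files_.find(fileName) : files_.end();
    }

public:
    // Returns a descriptor >= 0, or -1 on error (ASSUMED error convention).
    int createFile(const char* fileName, StoragePolicy* policy) {
        if (fileName == nullptr || policy == nullptr) return -1;
        if (files_.count(fileName) != 0) return -1;  // file already exists
        files_.emplace(fileName, Entry{nextDescriptor_, *policy, true});
        return nextDescriptor_++;
    }

    int openFile(const char* fileName) {
        auto it = find(fileName);
        if (it == files_.end()) return -1;
        it->second.open = true;
        return it->second.descriptor;
    }

    void closeFile(const char* fileName) {
        auto it = find(fileName);
        if (it != files_.end()) it->second.open = false;
    }

    void changeStoragePolicy(const char* fileName, StoragePolicy* policy) {
        auto it = find(fileName);
        if (it != files_.end() && policy != nullptr) it->second.policy = *policy;
    }
};
```

A stand-in like this lets application code that calls the API be exercised without a Lustre installation; in the real library, createFile would additionally contact the finder service to choose the storage element.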

11 Implementation status – the server side
- The finder service is implemented in Python.
- Communication with the monitoring system is implemented with the REST model.
- Integration with the VO knowledge base is implemented with a SOAP-based web service.
- The function that determines similarity is as follows:
  [Formula not preserved in this transcript. The surviving fragments indicate a piecewise (if / else) definition expressed in terms of the current value of an attribute i and the required value of an attribute i.]

12 Implementation status – non-functional requirements
The current implementation of the StoragePolicy class includes:
- preferredDeviceType – a device type preferred by the user, e.g. a fast writable device or a highly available device
- capacity – free storage space required
- averageReadTransferRate – mean read transfer rate over the last monitoring series
- averageWriteTransferRate – mean write transfer rate over the last monitoring series
- currentReadTransferRate – mean read transfer rate from the last measurement
- currentWriteTransferRate – mean write transfer rate from the last measurement
- throughput – numerical value of the required throughput
- throughputLevel – required throughput level on a device, e.g. LOW, MEDIUM or HIGH; it will be mapped onto the numerical value
The user can choose only those aspects of a policy which are important from their point of view.
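One way to model "only those aspects which are important" is to make every attribute optional. The sketch below assumes C++17 and uses hypothetical names (`StoragePolicySketch`, `toThroughputMBs`, `requiredThroughput`). The numeric values assigned to LOW, MEDIUM and HIGH are invented for illustration, and the rule that an explicit throughput value takes precedence over a level is also an assumption; the slides state only that the level is mapped onto a numerical value.

```cpp
#include <optional>
#include <string>

enum class ThroughputLevel { LOW, MEDIUM, HIGH };

// ASSUMPTION: illustrative MB/s values; the real mapping is not given in the slides.
double toThroughputMBs(ThroughputLevel level) {
    switch (level) {
        case ThroughputLevel::LOW:    return 10.0;
        case ThroughputLevel::MEDIUM: return 50.0;
        case ThroughputLevel::HIGH:   return 100.0;
    }
    return 0.0;  // not reached for valid enum values
}

// Hypothetical mirror of the StoragePolicy attributes; unset means "don't care".
struct StoragePolicySketch {
    std::optional<std::string> preferredDeviceType;
    std::optional<double> capacityTB;
    std::optional<double> averageReadTransferRate;
    std::optional<double> averageWriteTransferRate;
    std::optional<double> currentReadTransferRate;
    std::optional<double> currentWriteTransferRate;
    std::optional<double> throughputMBs;
    std::optional<ThroughputLevel> throughputLevel;

    // Numeric throughput requirement: an explicit value wins (ASSUMED rule),
    // otherwise the level is mapped to a number, otherwise there is none.
    std::optional<double> requiredThroughput() const {
        if (throughputMBs) return throughputMBs;
        if (throughputLevel) return toThroughputMBs(*throughputLevel);
        return std::nullopt;
    }
};
```

A policy with no throughput fields set yields no throughput requirement, so the finder would simply skip that attribute when scoring storage elements.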

13 Implementation status – usage example
    LustreManager manager;
    StoragePolicyFactory factory;

    // creating a new policy object with default parameters for the given device type
    StoragePolicy policy = factory.createPolicyForDeviceType(HIGHAVAILABLEDEVICE);

    // setting additional parameters
    policy.setCapacity(1.1);                    // [TB]
    policy.setThroughput(80.0);                 // [MB/s]
    policy.setAverageWriteTransferRate(100.0);  // [MB/s]

    // creating a new file
    int descriptor = manager.createFile("test_file", &policy);

14 Future work
- Support for multiple Lustre installations (geographically distributed)
- Support for different storage systems (e.g. dCache)
- Performance tests on the PL-Grid testbed (not available yet)
- New algorithms for determining similarity
- Implementation of the second presented use case (the one with a default storage policy defined at the VO level)

15 Conclusions
- One goal of the PL-Grid project is to develop new approaches to storage management in the Grid environment
- Explicit definitions of non-functional requirements are necessary in data-intensive applications
- The presented solution is easy to use (standard C++ shared library), extend (storage elements are described in an ontological form) and understand (a clean algorithm for finding a storage element)
- There is much to be done

16 Do you want to know more? www.plgrid.pl

