Information System Evolution
Enabling Grids for E-sciencE (EGEE-III), INFSO-RI-222667

[Diagram: BDII update process. Labels: Provider, Plugin, LDIF, Merge, New LDIF, Query, LDIF DIFF, Update LDIF, LDAP_ADD, LDAP_MODIFY, LDAP]

The information system is a mission-critical component of the EGEE production infrastructure. It provides the detailed information about Grid services that is required to discover, select and use them during Grid-related activities such as job and data management. Its components are found throughout the infrastructure and are especially sensitive to information volume and query rate. It must therefore be ensured that the current components can meet the scalability requirements that come with the growth of the infrastructure. An improved Berkeley Database Information Index (BDII) [1] architecture is presented that has the potential to meet these future requirements.

Changes in the information system were monitored by recording the entries modified during each BDII update. Over a period of 9 days the changes for 1932 update cycles were recorded, which corresponds to approximately one update cycle every 7 minutes (9 days is 12,960 minutes, or about 6.7 minutes per cycle). The poster's graph shows the number of changes per cycle. The average number of entries modified per update cycle corresponds to 21.8% of the total number of entries. A further investigation determined how often each attribute type changed: 97.8% of the changes are confined to 14 attributes, which is only 4% of the attributes in use. In the current implementation all entries are transported and updated during each cycle, which is inefficient.

The new architecture for the BDII consists of a standard LDAP database that is updated by an external process. The update process obtains LDIF from a number of sources and merges them. It then compares the merged result with the contents of the database and creates an LDIF file of the differences, which is used to update the database.
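The merge, compare and update steps described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the BDII v5 code: entries are modelled as plain dictionaries rather than parsed LDIF, the function names `merge_sources` and `diff` are hypothetical, and the DNs in the example are made up. The attribute names are taken from the poster.

```python
from typing import Dict, List, Set, Tuple

Entry = Dict[str, Set[str]]    # attribute name -> set of values
Directory = Dict[str, Entry]   # DN -> entry
Change = Tuple[str, str, Entry]


def merge_sources(sources: List[Directory]) -> Directory:
    """Merge the entries produced by several sources into one directory image."""
    merged: Directory = {}
    for source in sources:
        for dn, attrs in source.items():
            entry = merged.setdefault(dn, {})
            for name, values in attrs.items():
                entry.setdefault(name, set()).update(values)
    return merged


def diff(current: Directory, new: Directory) -> List[Change]:
    """Return the (operation, dn, attributes) changes that turn `current` into `new`."""
    changes: List[Change] = []
    for dn in sorted(new):
        if dn not in current:
            changes.append(("add", dn, new[dn]))
            continue
        # Keep only attributes whose values differ from the database copy.
        changed = {n: v for n, v in new[dn].items() if current[dn].get(n) != v}
        # An empty value set marks an attribute that disappeared from the entry.
        changed.update({n: set() for n in current[dn] if n not in new[dn]})
        if changed:
            changes.append(("modify", dn, changed))
    for dn in sorted(current):
        if dn not in new:
            changes.append(("delete", dn, {}))
    return changes


# Hypothetical data: what the database holds now, and what two providers report.
database = {
    "GlueCEUniqueID=ce1,o=grid": {"GlueCEStateFreeCpus": {"10"},
                                  "GlueCEInfoTotalCpus": {"64"}},
    "GlueCEUniqueID=old,o=grid": {"GlueCEInfoTotalCpus": {"8"}},
}
provider_a = {"GlueCEUniqueID=ce1,o=grid": {"GlueCEStateFreeCpus": {"4"},
                                            "GlueCEInfoTotalCpus": {"64"}}}
provider_b = {"GlueCEUniqueID=ce2,o=grid": {"GlueCEInfoTotalCpus": {"128"}}}

changes = diff(database, merge_sources([provider_a, provider_b]))
for operation, dn, attrs in changes:
    print(operation, dn, attrs)
```

Only the one attribute that changed on `ce1` ends up in the change list, which is the point of the approach: the unchanged `GlueCEInfoTotalCpus` value is neither transported nor rewritten.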
The aim of this approach is to reduce complexity within the BDII and to speed up the update cycle, so that more data can be handled in a given time period. The gain in efficiency is visible in the poster's graph of the one-minute load average before and after upgrading from BDII v4 to BDII v5.

Because information is now inserted into the resource BDIIs as modifications to the database, a number of possibilities open up. One is to use LDAP replication mechanisms to propagate these changes automatically to the higher levels of the system. This would be an option for the site-level BDIIs and would reduce the latency between the update of the resource BDII and the site-level BDII. Where the Freedom of Choice for Resources (FCR) [4] mechanism is used, however, LDAP replication technologies may not be applicable. To improve efficiency in that case, a compressed content exchange mechanism could be employed, or the FCR mechanism may need to be re-evaluated.

The Glue [2] information model version 2.0 is an official recommendation from the Open Grid Forum [3]. It consolidates over 4 years of production experience with the Glue 1.x series. A common information model is required to facilitate interoperation between Grid infrastructures, and defining version 2.0 in an open forum will increase its adoption by other infrastructures. Migrating the EGEE information system from Glue 1.3 to 2.0 will occur in three stages. First, the information system will be updated to support both versions. Second, the information providers will be updated to produce both 1.3 and 2.0 information. Finally, applications can start migrating from version 1.3 to 2.0. Glue 1.3 information will only be removed once applications have migrated to version 2.0.
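Returning to the compressed content exchange suggested for the FCR case, the sketch below shows why compression is attractive for this kind of data. It is an assumption-laden illustration, not a description of any BDII feature: the choice of `zlib`, the helper names `pack_ldif` and `unpack_ldif`, and the generated sample entries are all hypothetical.

```python
import zlib


def pack_ldif(ldif_text: str) -> bytes:
    """Compress LDIF text for transfer (zlib is an assumed choice of codec)."""
    return zlib.compress(ldif_text.encode("utf-8"), 9)


def unpack_ldif(payload: bytes) -> str:
    """Restore the original LDIF text on the receiving side."""
    return zlib.decompress(payload).decode("utf-8")


# Hypothetical LDIF: many entries sharing the same attribute names.
sample = "".join(
    f"dn: GlueCEUniqueID=ce{i},o=grid\n"
    f"GlueCEStateFreeCpus: {i}\n"
    f"GlueCEStateRunningJobs: {i * 2}\n\n"
    for i in range(200)
)

packed = pack_ldif(sample)
print(len(sample.encode("utf-8")), "bytes raw,", len(packed), "bytes compressed")
```

LDIF repeats the same attribute names and DN suffixes in every entry, so a general-purpose compressor removes much of that redundancy before the content crosses the network.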
[Diagram: GLUE 2.0 main entities. Entities: User Domain, Admin Domain, Resource, Manager, Share, Endpoint, Activity, Access Policy, Mapping Policy, Service. Relations: Negotiates Share with, Provides, Manages, Runs, Defined on, Contacts, Maps User to, Has]

Attribute                           Share of changes
GlueCEStateFreeJobSlots             19.36%
GlueCEStateEstimatedResponseTime    12.50%
GlueCEStateWorstResponseTime        11.79%
GlueCEStateFreeCpus                  9.52%
GlueCEStateTotalJobs                 9.41%
GlueCEStateRunningJobs               7.90%
GlueSAStateAvailableSpace            6.57%
GlueCEStateWaitingJobs               6.37%
GlueSAStateUsedSpace                 5.38%
GlueCEInfoTotalCpus                  4.67%
GlueSAFreeOnlineSize                 1.37%
GlueSAUsedOnlineSize                 1.34%
GlueCEPolicyAssignedJobSlots         0.90%
GlueServiceStartTime                 0.71%

The growth graph shows that the rate at which new sites join the infrastructure is slowing, while the growth in the number of cores and in jobs per day is accelerating. Assuming a growth rate of 50 sites per year, by 2015 there could potentially be 550 sites. Each new site would contribute more fundamental services, users and resources. Assuming an exponential growth rate for the number of cores and computing activities (jobs), by 2015 the number of cores in the EGEE infrastructure could reach 500,000 and the number of jobs per day could reach 2 million.

Poster panel titles: Overview; BDII v5; Improved Performance!; Future Directions; GLUE 2.0; Infrastructure Growth; Investigation into the frequency of changes.
Figure captions: One-minute load average before and after upgrading (log scale); The growth of the number of sites, cores and jobs per day.

References: [1] [2] [3] [4]

Authors: M. W. Schulz and L. Field, CERN-IT