DataGrid Kimmo Soikkeli Ilkka Sormunen
What is DataGrid?
- DataGrid is a project that aims to enable access to geographically distributed computing power and storage facilities belonging to different institutions.
- This will provide resources to process huge amounts of data coming from scientific experiments.
- Project started (led by CERN)
- Project is funded by the European Union.
Problems that DataGrid tries to solve
- Different institutions may use different computing and storage systems and will also have local security rules.
- Researchers need to access all of the resources in a uniform, transparent and easy way.
- To use resources effectively, the user needs effective and dependable information systems that allow automatic resource discovery and allocation.
DataGrid Applications
- High Energy Physics (HEP), led by CERN (Switzerland)
- Biology and Medical Image processing, led by CNRS (France)
- Earth Observations (EO), led by the ESA/ESRIN (Italy)
Virtual Organization
- Institutions and individuals that belong to the same community and work on the same scientific problems would greatly benefit from pooling their resources.
- Virtual Organization: a concept formulated to describe distributed communities that are willing to share their resources in order to achieve common goals.
Work Packages
- 12 Work Packages
  - WP 1: Work Scheduling
  - WP 2: Data Management
- 4 Working Groups
  - Testbed
  - Application
  - Middleware
  - Infrastructure
Middleware
- Grid software is often called middleware because it is mid-level software that provides services to users and to applications.
- The DataGrid project is developing new Grid middleware based on the Globus toolkit.
The DataGrid Testbed
- Testbed: made up of one or more sites. Each site contains a certain number of machines, each one playing a different role.
- First DataGrid Testbed released in mid-November
- New software modules have been developed and used to set up a large European testbed that is now fully operational.
- The Resource Broker: module that receives users' requests and queries the Information Index to find suitable resources.
- The Information Index: keeps information about the available resources.
- The Replica Manager: coordinates file replication across the testbed from one Storage Element to another.
- The Replica Catalog: keeps information about file replicas.
- The Computing Element: module that receives job requests and delivers them to the Worker Nodes, which perform the real work.
- The Worker Node: module installed on the machines that process input data.
- The Storage Element: module installed on the machines that provide storage space to the testbed.
- The User Interface: module that allows users to access all the DataGrid services.
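The interplay between the Resource Broker, the Information Index and the Replica Catalog can be illustrated with a small Python sketch. This is not the DataGrid implementation: the class fields (`free_cpus`, `close_storage`), the ranking rule (prefer a Computing Element close to a replica of the input file, then the one with the most free CPUs) and all site and file names are assumptions made for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ComputingElement:
    name: str
    free_cpus: int        # hypothetical attribute published by the site
    close_storage: str    # hypothetical name of the nearby Storage Element


@dataclass
class InformationIndex:
    """Keeps information about the available resources."""
    elements: list = field(default_factory=list)

    def query(self, min_cpus):
        # Return a new list of Computing Elements with enough free CPUs.
        return [ce for ce in self.elements if ce.free_cpus >= min_cpus]


@dataclass
class ReplicaCatalog:
    """Maps a logical file name to the Storage Elements holding a replica."""
    replicas: dict = field(default_factory=dict)

    def locations(self, logical_name):
        return self.replicas.get(logical_name, set())


def broker_select(index, catalog, min_cpus, input_file):
    """Toy Resource Broker: query the index, then rank the candidates."""
    candidates = index.query(min_cpus)
    if not candidates:
        return None
    holders = catalog.locations(input_file)
    # Candidates close to a replica sort first, then those with more free CPUs.
    candidates.sort(key=lambda ce: (ce.close_storage not in holders, -ce.free_cpus))
    return candidates[0]


index = InformationIndex([
    ComputingElement("ce-a", free_cpus=8, close_storage="se-a"),
    ComputingElement("ce-b", free_cpus=2, close_storage="se-b"),
    ComputingElement("ce-c", free_cpus=4, close_storage="se-c"),
])
catalog = ReplicaCatalog({"lfn:run42.dat": {"se-c"}})

chosen = broker_select(index, catalog, min_cpus=4, input_file="lfn:run42.dat")
print(chosen.name)
```

In this toy setup the broker picks `ce-c`: it has fewer free CPUs than `ce-a`, but it sits next to the only Storage Element that holds the requested file.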
Submitting jobs
- The user specifies their requirements in a file using the Job Description Language (JDL). Example: "myjob.jdl"
- The user creates a proxy certificate by issuing the command: "grid-proxy-init"
- The user submits their job by issuing the command: "dg-job-submit myjob.jdl"
- The Resource Broker reads the user's requirements, finds suitable resources and locates the input data files.
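The slides do not show the contents of "myjob.jdl". As a rough illustration, the Python sketch below renders a dictionary as JDL-style `Name = "value";` statements. The attribute names (`Executable`, `StdOutput`, `StdError`, `OutputSandbox`) and their values are assumptions based on commonly seen JDL examples, not taken from this presentation, and `render_jdl` is a hypothetical helper.

```python
def render_jdl(attributes):
    """Render a dict as JDL-style text, one `Name = value;` statement per line."""
    def fmt(value):
        # Lists become {"a", "b"}; everything else becomes a quoted string.
        if isinstance(value, (list, tuple)):
            return "{" + ", ".join(fmt(v) for v in value) + "}"
        return '"' + str(value) + '"'

    lines = [f"{name} = {fmt(value)};" for name, value in attributes.items()]
    return "\n".join(lines) + "\n"


# Assumed attribute names and values, for illustration only.
job = {
    "Executable": "/bin/hostname",
    "StdOutput": "std.out",
    "StdError": "std.err",
    "OutputSandbox": ["std.out", "std.err"],
}

jdl_text = render_jdl(job)
print(jdl_text)
```

The resulting text could be saved as "myjob.jdl" and passed to the submission command.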
Submitting jobs
- The Resource Broker submits the job to the selected Computing Element. Each submitted job is assigned a unique identifier.
- The user can query the status of their job by issuing the command: "dg-job-status JobId"
- When the status of the job is "Output Ready", the user can retrieve the output by issuing the command: "dg-job-get-output JobId"
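The submit, poll and retrieve sequence can be scripted. The Python sketch below is a minimal illustration under several assumptions: a valid proxy already exists (created with "grid-proxy-init"), the status command prints text containing "Output Ready" once the job is done, and the caller supplies a `parse_job_id` function, because the slides do not show what the submit command prints. The fake runner, the job identifier "job-0001" and the intermediate status strings are placeholders so that the example runs without any Grid software installed.

```python
import subprocess
import time


def run_command(argv):
    """Run a command and return its standard output as text."""
    return subprocess.run(argv, capture_output=True, text=True, check=True).stdout


def submit_and_fetch(jdl_path, parse_job_id, run=run_command,
                     poll_seconds=30, max_polls=100):
    """Submit a job, poll its status, and retrieve the output when ready."""
    job_id = parse_job_id(run(["dg-job-submit", jdl_path]))
    for _ in range(max_polls):
        # Assumption: the status text contains "Output Ready" when finished.
        if "Output Ready" in run(["dg-job-status", job_id]):
            return run(["dg-job-get-output", job_id])
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} not ready after {max_polls} polls")


# Stand-in for the real commands: records each call and returns canned output.
calls = []
statuses = iter(["Scheduled", "Running", "Output Ready"])  # placeholder states


def fake_run(argv):
    calls.append(argv)
    if argv[0] == "dg-job-submit":
        return "job-0001\n"
    if argv[0] == "dg-job-status":
        return next(statuses)
    return "output retrieved"


result = submit_and_fetch("myjob.jdl", parse_job_id=str.strip,
                          run=fake_run, poll_seconds=0)
print(result)
```

Passing the runner as a parameter keeps the control flow testable. On a real User Interface machine the default `run_command` would be used instead of `fake_run`.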
Web interfaces, simulation tools and demonstrators
- Map Center: web-based monitoring tool
- Genius: web-based GUI for job submission
- OptorSim
- DataGrid Demonstrator