Job submission architectures in the Grid environment
Masamichi Ando, M1 student, Taura Lab., Department of Information Science and Technology
Background (1): Emerging scientific fields such as astronomy, high-energy physics and genomic science require large computing power.
Background (2): We can obtain large computing power by connecting many computers through a network. This gives fast computational resources, fast access to large quantities of data, and access to physically distant data. However, large-scale distributed computing (the Grid) still has many problems. This talk focuses on job submission.
Contents: 1. Grid environment; 2. How single sign-on is realized; 3. Towards rapid job submission; 4. Grid beyond firewalls; 5. Conclusion
1. Grid environment: Because of its characteristics, a Grid environment requires specific architectural support, such as secure authentication and authorization.
Grid: The Grid is large-scale (continental- or national-scale) distributed computing that spans different administrative domains. Example: CERN's LHC, a global particle-physics project that will start in 2006 and generate petabyte-scale data every year.
User population is large and dynamic: a system with only one "master" does not scale. [Figure: a user, node1 to node5, and a single server]
Resource pool is large and dynamic: a system in which a crash in one part affects the whole does not scale. [Figure: a crash in one part of the network]
A computation acquires and releases resources dynamically, so single sign-on is needed: the user should be able to authenticate once and then compute without further authentication. [Figure: the user reaches computing resources without further authentication]
Communication support: some applications require specific communication mechanisms, such as unicast and multicast, low-level communication connections (e.g., TCP), and dynamic connections for dynamic resources and users.
Authentication and security: each resource is subject to its own local security policy, and an individual user is associated with a different local name space in each administrative domain.
About job submission: we require single sign-on, rapid and scalable job submission, and the ability for more nodes to participate in the Grid.
2. How single sign-on is realized: a survey of GSI (Grid Security Infrastructure), developed as part of the Globus project
Globus Toolkit (de facto standard): the Globus Toolkit is a collection of services for Grid computing. One of them is GSI (Grid Security Infrastructure), which provides single sign-on and other security mechanisms.
USER PROXY: Definition: a session manager process given permission to act on behalf of a user for a limited period of time. Advantage: the user can achieve single sign-on by generating a user proxy before computing.
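A minimal Python sketch of the user-proxy idea, under stated assumptions: the class `UserProxy` and its methods are hypothetical and are not the GSI or Globus API. Real GSI proxies are short-lived X.509 proxy certificates; the sketch models only the two properties on the slide, a limited lifetime and acting for the user without a further sign-on.

```python
import time
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class UserProxy:
    """Hypothetical user proxy: acts for `user` until `lifetime_s` seconds
    after `created_at`, so the user signs on only once."""
    user: str
    lifetime_s: float
    created_at: float = field(default_factory=time.time)

    def is_valid(self, now: Optional[float] = None) -> bool:
        # The proxy holds its permission only for a limited period of time.
        now = time.time() if now is None else now
        return now < self.created_at + self.lifetime_s

    def authenticate_to(self, resource: str, now: Optional[float] = None) -> str:
        # The proxy, not the user, authenticates to each resource.
        if not self.is_valid(now):
            raise PermissionError("proxy expired: the user must sign on again")
        return f"{self.user}@{resource}"


# Sign on once (create the proxy), then reach several sites without prompts.
proxy = UserProxy("ando", lifetime_s=12 * 3600)
sessions = [proxy.authenticate_to(site) for site in ("siteA", "siteB", "siteC")]
```

The lifetime check is what limits the damage if a proxy is stolen: after it expires, the user has to sign on again.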
RESOURCE PROXY: Definition: an agent that represents a resource and serves as the interface between the Grid security architecture and the local security architecture.
Resource allocation protocol [Figure: the user, the user proxy process, resource proxies at Sites A, B and C, and child processes]
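The protocol can be sketched as follows, assuming a hypothetical `ResourceProxy` that holds a per-site table from a global Grid identity to a local account name. The table, the identity string and the function `submit` are illustrative assumptions, not the Globus interfaces. The sketch shows one global identity being mapped to a different local name space at each site.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class ResourceProxy:
    """Hypothetical resource proxy for one site: the interface between the
    Grid identity and the site's local security policy and name space."""
    site: str
    local_accounts: Dict[str, str]  # global Grid identity -> local account

    def allocate(self, global_id: str, command: str) -> str:
        local = self.local_accounts.get(global_id)
        if local is None:
            # Local policy decides: identities without a mapping are refused.
            raise PermissionError(f"{global_id} is not authorized at {self.site}")
        # A real site would now start a child process running `command` as `local`.
        return f"{self.site}: run {command!r} as {local}"


def submit(global_id: str, command: str, proxies: List[ResourceProxy]) -> List[str]:
    """The user proxy contacts each resource proxy on the user's behalf."""
    return [rp.allocate(global_id, command) for rp in proxies]


user = "/O=Grid/CN=Ando"  # hypothetical global identity
sites = [
    ResourceProxy("Site A", {user: "ando"}),
    ResourceProxy("Site B", {user: "m-ando"}),
    ResourceProxy("Site C", {user: "guest07"}),
]
results = submit(user, "simulate", sites)
```

Keeping the mapping inside each resource proxy means every site can keep its own accounts and policy while the user only ever presents one Grid identity.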
3. Towards rapid job submission: a survey of gfpmd (Gfarm Process Management Daemon), developed by Iwasaki
gfpmd: gfpmd is developed as part of Gfarm (Grid Data Farm). The Gfarm architecture is designed for global petascale data-intensive computing. Gfarm uses GSI for communication.
Overhead of authentication: when GSI is used for authentication and a naive method is used to start a job, the start-up time is proportional to the number of nodes. Starting a job that consists of thousands of processes is expected to take several thousand seconds. gfpmd aims to shorten this overhead.
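A back-of-the-envelope model of this overhead, with hypothetical costs: `t_auth` is an assumed per-node GSI authentication time and `t_msg` an assumed per-hop message time. Forwarding the job start down a tree in the pre-connected case is an assumption for illustration, not a documented detail of gfpmd.

```python
def naive_start_time(n_nodes: int, t_auth: float) -> float:
    """Naive start-up: the client performs one GSI authentication per node,
    one after another, so time grows linearly with n_nodes."""
    return n_nodes * t_auth


def tree_depth(n_nodes: int, fanout: int = 2) -> int:
    """Smallest d with fanout**d >= n_nodes (integer arithmetic, no log)."""
    depth, reach = 0, 1
    while reach < n_nodes:
        reach *= fanout
        depth += 1
    return depth


def preconnected_start_time(n_nodes: int, t_msg: float, fanout: int = 2) -> float:
    """Assumption: daemons authenticated each other beforehand (connect
    before computing), so a job start is only forwarded down a tree,
    costing one message time per level."""
    return tree_depth(n_nodes, fanout) * t_msg
```

With an assumed 2 s per authentication, `naive_start_time(2000, 2.0)` gives 4000 seconds, the order of magnitude on the slide, while a binary tree over the same 2000 nodes has depth 11, so the pre-connected start grows with the logarithm of the node count rather than linearly.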
Connect before computing (GSI authentication with host credentials) [Figure: gfpmd daemons on Nodes A to D and the user, linked by GSI authentication]
Ring-connection structure (1) [Figure: a crash occurs]
Ring-connection structure (2)
Ring-connection structure (3)
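The slides show the ring being repaired after a crash but do not spell out the rule. A minimal sketch, assuming each daemon knows only its successor and that the crashed daemon's predecessor reconnects to the crashed daemon's successor; `repair_ring` and `ring_order` are hypothetical helpers.

```python
from typing import Dict, List


def repair_ring(successor: Dict[str, str], crashed: str) -> Dict[str, str]:
    """Remove `crashed` from a ring given as node -> successor; its
    predecessor takes over the crashed node's successor."""
    pred = next(n for n, s in successor.items() if s == crashed)
    repaired = {n: s for n, s in successor.items() if n != crashed}
    repaired[pred] = successor[crashed]  # bridge the gap left by the crash
    return repaired


def ring_order(successor: Dict[str, str], start: str) -> List[str]:
    """Walk the ring once, starting from `start`."""
    order, node = [start], successor[start]
    while node != start:
        order.append(node)
        node = successor[node]
    return order


ring = {"A": "B", "B": "C", "C": "D", "D": "A"}
after_crash = repair_ring(ring, "C")
```

Only one connection changes (B now connects to D), so a crash stays local instead of affecting the whole system.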
An I/O tree is built in parallel for each job.
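One way to build such a tree, assuming a complete tree with a fixed fan-out rooted at the first node of the job. This layout is an illustrative choice and not necessarily gfpmd's. Because every parent can open connections to all its children at the same time, building the tree takes a number of rounds equal to its depth rather than to the number of nodes.

```python
from typing import Dict, List, Sequence


def build_io_tree(nodes: Sequence[str], fanout: int = 2) -> Dict[str, str]:
    """Parent of each node in a complete tree rooted at nodes[0]
    (heap layout: the parent of index i is (i - 1) // fanout)."""
    return {nodes[i]: nodes[(i - 1) // fanout] for i in range(1, len(nodes))}


def path_to_root(parent: Dict[str, str], node: str) -> List[str]:
    """Nodes that relay output from `node` up to the root."""
    path = [node]
    while path[-1] in parent:
        path.append(parent[path[-1]])
    return path


tree = build_io_tree(["A", "B", "C", "D", "E"])
# Parallel build rounds = depth of the deepest (last) node.
rounds = len(path_to_root(tree, "E")) - 1
```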
Evaluation: gfpmd was examined with a small job. [Plot: time in seconds versus number of nodes]
4. Grid beyond firewalls: a survey of VPG (Virtual Private Grid), developed by Kaneda
Restrictions: VPG is designed to automatically work around administrative restrictions and to utilize machines without changing those restrictions. [Figure: Nodes A, B and C; a node inside a subnet cannot be connected to directly, so the user must first log on to a gateway]
VPG: VPG provides a shell that supports nicknaming (giving a unique name to each host, independent of DNS names), job submission to any nicknamed host, redirection from/to a file on any host, and pipes between commands executed on any host.
VPG architecture [Figure: Nodes A, B and C across the Internet and a LAN, with private and global IP addresses; where a direct connection cannot be made, vpgd daemons keep a bi-directional connection]
Using SSH port forwarding: vpgd uses SSH port forwarding with an empty passphrase to reach a node that cannot be connected to directly. [Figure: Nodes A and B in a LAN and Node C outside it, with private and global IP addresses, each running vpgd]
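A sketch of how such a tunnel could be started, assuming OpenSSH. The host names, port numbers and key path are hypothetical, and this is not VPG's actual code; it only assembles the `ssh` command line. `-N`, `-L`, `-i` and `-o BatchMode=yes` are standard OpenSSH options, and a key with an empty passphrase is what lets `BatchMode` bring the tunnel up without any prompt.

```python
from typing import List


def port_forward_command(gateway: str, local_port: int, target: str,
                         target_port: int, key_file: str) -> List[str]:
    """Build (but do not run) an OpenSSH command that forwards
    localhost:local_port to target:target_port through gateway."""
    return [
        "ssh",
        "-N",                   # no remote command; forwarding only
        "-o", "BatchMode=yes",  # never prompt; relies on a passphrase-less key
        "-i", key_file,         # assumed: private key with an empty passphrase
        "-L", f"{local_port}:{target}:{target_port}",
        gateway,
    ]


# Hypothetical names: reach a daemon on nodeC (port 7000) via gateway nodeB.
cmd = port_forward_command("nodeB.example.org", 10000, "nodeC", 7000,
                           "/home/user/.ssh/id_vpg")
```

The list could be passed to `subprocess.Popen(cmd)`; a local daemon would then connect to `localhost:10000` to reach the remote one. The trade-off of an empty passphrase is that anyone who obtains the key file can use it.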
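VPG nicknaming: Node B is in LAN X with a private IP address and no DNS name; its nickname is "sky". Node C is in LAN Y with the same private IP address as Node B and no DNS name; its nickname is "marine". Node A has the global IP address "xxx.xxx.xxx.xxx" under *.u-tokyo.ac.jp; its nickname is "earth". Thanks to the nicknames, jobs can be sent separately to Node B and Node C even though they share a private IP address and have no DNS names.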
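A minimal nickname table in the spirit of this slide, assuming each host record stores its nickname, its network and an address that may collide across LANs. The classes `Host` and `NicknameTable` and the private address `192.168.0.1` are hypothetical. The sketch shows why lookup by nickname is unambiguous while lookup by private IP is not.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Host:
    nickname: str
    network: str  # "global" or the name of a private LAN
    address: str  # private addresses may repeat across LANs


class NicknameTable:
    """Hypothetical VPG-style table: hosts are named by nickname, not by DNS or IP."""

    def __init__(self) -> None:
        self._hosts: Dict[str, Host] = {}

    def register(self, host: Host) -> None:
        if host.nickname in self._hosts:
            raise ValueError(f"nickname {host.nickname!r} is already taken")
        self._hosts[host.nickname] = host

    def resolve(self, nickname: str) -> Host:
        return self._hosts[nickname]  # KeyError for unknown nicknames

    def nicknames_at(self, address: str) -> List[str]:
        return sorted(n for n, h in self._hosts.items() if h.address == address)


table = NicknameTable()
table.register(Host("earth", "global", "xxx.xxx.xxx.xxx"))
table.register(Host("sky", "LAN X", "192.168.0.1"))
table.register(Host("marine", "LAN Y", "192.168.0.1"))
```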
Spanning-tree connection [Figure: connections form a spanning tree from the home node, using normal ssh]
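A spanning tree like this can be obtained with a breadth-first search from the home node over a "can open an ssh connection to" graph. This is a sketch under that assumption, not VPG's documented algorithm. The graph is hypothetical and reuses the nicknames "earth", "sky" and "marine". A job for any host is then relayed along the tree path from the home node.

```python
from collections import deque
from typing import Dict, List, Optional


def spanning_tree(home: str, can_connect: Dict[str, List[str]]) -> Dict[str, Optional[str]]:
    """Breadth-first search from the home node; each reached host records
    the host that opened the (ssh) connection to it."""
    parent: Dict[str, Optional[str]] = {home: None}
    queue = deque([home])
    while queue:
        node = queue.popleft()
        for nxt in can_connect.get(node, []):
            if nxt not in parent:  # first connection wins, so there are no cycles
                parent[nxt] = node
                queue.append(nxt)
    return parent


def route(parent: Dict[str, Optional[str]], target: str) -> List[str]:
    """Tree path from the home node down to `target`."""
    path: List[str] = []
    node: Optional[str] = target
    while node is not None:
        path.append(node)
        node = parent[node]
    return path[::-1]


# Hypothetical reachability: only "earth" can reach the private hosts.
links = {"home": ["earth"], "earth": ["sky", "marine"], "sky": ["earth"]}
vpg_tree = spanning_tree("home", links)
```

A tree uses exactly one connection per host, which keeps the number of ssh sessions linear in the number of hosts.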
Evaluation: VPG was compared with other tools by submitting a small job. [Plot: time in seconds]
5. Conclusion: We introduced the Grid environment and three architectures for job submission: a single sign-on architecture using the user proxy, a rapid job submission architecture using gfpmd, and an architecture for utilizing machines beyond firewalls using vpgd.