Published by Gervais Gray. Modified over 9 years ago.
1
BaBar MC production
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), a lot of computers; EDG testbed (NIKHEF); jobs and results]
The simple question: How can we run BaBar software on EDG grid sites?
2
Introduction of Parrot
[Diagram: Parrot; Chirp; BaBar MC production software; Farm @ VU (Amsterdam University), a lot of computers; EDG testbed (NIKHEF); jobs and results]
We need transparent access to the Objectivity Database (requires local file access)
3
Parrot functionality
[Diagram: BaBar MC production; the Parrot Virtual File System (POSIX interface, ptrace trap); local cache]
Protocols: HTTP, FTP, RFIO, NeST, Chirp
Servers: HTTP server, FTP server, RFIO server, NeST server, Chirp server, Condor proxy, Condor shadow
I/O modes: whole file I/O (get/put); partial file I/O (open, close, read, write, lseek); secure remote RPC
Annotations: integration with Castor; traditional I/O services; allocation and management; full UNIX semantics; integration with Condor; not yet x509; optimize
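As a rough illustration of how a virtual file system can map ordinary POSIX-style paths onto the protocols listed on this slide, the sketch below splits a path of the form /protocol/host/rest and falls back to local access otherwise. The `route` function, the `Route` type, the path layout, and the host name are assumptions made for this example; they are not taken from Parrot's implementation.

```python
from typing import NamedTuple, Optional

# Protocol names as listed on the slide (lower-cased for path matching).
DRIVERS = {"http", "ftp", "rfio", "nest", "chirp"}


class Route(NamedTuple):
    driver: str            # "local" or one of DRIVERS
    host: Optional[str]    # remote host, None for local paths
    path: str              # path as seen by the selected driver


def route(path: str) -> Route:
    """Pick a driver for a path, assuming a /protocol/host/rest layout."""
    parts = path.split("/", 3)  # ["", protocol, host, rest]
    if len(parts) >= 3 and parts[0] == "" and parts[1] in DRIVERS and parts[2]:
        rest = "/" + parts[3] if len(parts) == 4 else "/"
        return Route(parts[1], parts[2], rest)
    # Anything else is treated as an ordinary local file.
    return Route("local", None, path)


# Hypothetical host name, for illustration only.
remote = route("/chirp/server.example.org/data/db.boot")
local = route("/tmp/scratch.dat")
```

The application keeps issuing plain open/read/write calls on such paths; only the routing layer needs to know which protocol serves them.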
4
The introduction of GCB
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; Condor-G; EDG testbed (NIKHEF), private network, a lot of computers; GCB relay; Parrot; Chirp; NFS; jobs and results]
5
GCB functionality
[Diagram: GCB server; central manager; A, B, P; private network; persistent connection; relay; two NAT boxes]
6
The introduction of GlideIn
PBS job manager
– 72 hour jobs
– Can’t wait for queues
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; Condor-G job; GlideIn; queue, batch job; EDG testbed (NIKHEF), private network, a lot of computers; GCB relay; Parrot; Chirp; NFS; jobs and results]
7
GlideIn functionality
8
Overview of complete setup
PBS job manager
– 72 hour jobs
– Can’t wait for queues
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; Condor-G job; GlideIn; queue, batch job; EDG testbed (NIKHEF), private network, a lot of computers; GCB relay; Parrot; Chirp; NFS; jobs and results]
9
Leave only the components
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS]
10
The interesting dependencies
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS]
Different MDS scheme
Objectivity database
– LOCK server sockets
– NFS problems
– UID / hostname checks
NAT box
– Dropping UDP packets
– Timeout 2 minutes: inactive sockets, inactive file I/O
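A connection that stays idle longer than the NAT box's 2 minute timeout is dropped, so one common way to cope is to send some traffic well before the timeout expires. The sketch below only shows the timing decision; the function names and the safety factor of 0.5 are assumptions chosen for illustration, not values from the presentation.

```python
NAT_TIMEOUT_S = 120.0  # "Timeout 2 minutes" from the slide


def heartbeat_interval(timeout_s: float = NAT_TIMEOUT_S, safety: float = 0.5) -> float:
    """Return how long a socket may stay idle before a keep-alive is sent.

    `safety` is the fraction of the timeout we are willing to use up;
    0.5 is an arbitrary illustrative choice.
    """
    if not 0 < safety < 1:
        raise ValueError("safety must be between 0 and 1")
    return timeout_s * safety


def heartbeat_due(last_activity_s: float, now_s: float,
                  timeout_s: float = NAT_TIMEOUT_S, safety: float = 0.5) -> bool:
    """True if the connection has been idle long enough to need a keep-alive."""
    return now_s - last_activity_s >= heartbeat_interval(timeout_s, safety)
```

A caller would record the time of the last send or receive on each socket and write a small no-op message whenever `heartbeat_due` returns true.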
11
Consequences
Different MDS scheme
– Implemented EDG scheme for GlideIn
Objectivity
– A lot of debugging
– Made Parrot mimic hostname and uid
– Tricked Objectivity to use standard NFS libraries
Aggressive NAT box
– Changed GCB to use TCP instead of UDP
– Used Parrot to keep sockets alive
– Parrot recovers File I/O when TCP connection is lost
We are the first to run Objectivity cross-domain
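The idea of recovering file I/O after a lost TCP connection can be sketched as follows: remember the current offset, and when a read fails, reopen the remote file and seek back before retrying. This is a minimal sketch under assumptions; `RecoveringReader`, `open_remote`, and `FlakyFile` are hypothetical names for this example and do not describe how Parrot implements recovery.

```python
import io
from typing import BinaryIO, Callable


class RecoveringReader:
    """Read a remote file, reopening and seeking back after a dropped connection."""

    def __init__(self, open_remote: Callable[[], BinaryIO], max_retries: int = 3):
        self._open = open_remote        # hypothetical factory that (re)connects
        self._max_retries = max_retries
        self._fh = open_remote()
        self._offset = 0                # bytes successfully delivered so far

    def read(self, size: int) -> bytes:
        for attempt in range(self._max_retries + 1):
            try:
                data = self._fh.read(size)
                self._offset += len(data)
                return data
            except ConnectionError:
                if attempt == self._max_retries:
                    raise
                # Reconnect and resume exactly where the last good read ended.
                self._fh = self._open()
                self._fh.seek(self._offset)
        raise AssertionError("unreachable")


class FlakyFile(io.BytesIO):
    """Stand-in for a remote file: fails once when a read starts at offset 4."""

    drops_left = 1

    def read(self, size=-1):
        if FlakyFile.drops_left > 0 and self.tell() == 4:
            FlakyFile.drops_left -= 1
            raise ConnectionError("connection dropped")
        return super().read(size)
```

Because the retry happens below the read call, the application sees an uninterrupted byte stream and does not need to be rewritten.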
12
Performance
[Plot: time (minutes, 500 to 3000) versus events (500 to 2000), comparing production on local machine with production on EDG testbed]
– Application initializes 10 times slower
– Production 3 times slower
13
Possible improvements
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS]
Parrot: Caching
– On per directory basis
– Requires debugging
Create more sophisticated tool to acquire resources
– Resource planning, distribution, etc.
– Maybe something fancy already exists?
14
Move chirp servers to private nodes
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS]
Use Condor/GCB machinery for chirp server
– Solves security issues
– Allows chirp server to be on private nodes
– Requires new chirp-condor implementation
15
Move GCB to head node
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS]
Move GCB to same machine as Central Manager
– Solution required for port conflicts
– Temporary solution: Move CM to a private node
16
Use EDG data storage
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; GCB; Parrot; Chirp; NFS; EDG data storage]
Write events to EDG data storage (gsiFTP)
– Requires debugging
17
Use more sites
[Diagram: BaBar MC production software; Farm @ VU (Amsterdam University), some computers; PBS job manager, queue; GlideIn; EDG testbed (NIKHEF), private network, a lot of computers; other testbed, private network, a lot of computers; GCB; Parrot; Chirp; NFS; EDG data storage]
Let GCB manage several private networks at the same time
– Requires solution for conflicting private addresses
18
Conclusions
It works
– BaBar MC production runs successfully on NIKHEF EDG testbed
– All this experimental software actually works when used together
It looks easy
– Our GRID setup is complicated, but…
– Parrot hides problems related to local file access
– GCB hides problems related to network configurations
– GlideIn hides complications with resource gathering
– The user can just submit his/her jobs to a local batch system
There is some work to do
– Performance could be better: initialization 10 times slower, production 3 times slower
– Caching and (semi-) local event storage should improve this
– Usability could be improved: GlideIn should have a tool to acquire resources; several improvements proposed for GCB/Parrot
The improvements are done at the level of the “grid” tools
– The user benefits without rewriting code