The HDF Group
Milestone 5.1: Initial POSIX Function Shipping Demonstration
Jerome Soumagne, Quincey Koziol
09/24/2013
© 2013 The HDF Group

Overview – Mercury

Mercury "Function Shipper": an RPC layer that supports
- Non-blocking transfers
- Large data arguments (with RMA)
- Native transport protocols of HPC systems

Mercury serves as a basis for higher-level frameworks that need to operate on, store, or access data remotely:
- HDF5 IOD virtual object plugin
- IOFSL (I/O forwarding scalability layer)
- Storage systems
- Analysis frameworks

Overview – Mercury

- Already largely presented in previous milestones
- No major modification of Mercury was needed for this deliverable to support POSIX calls
- Mercury is still being improved:
  - Performance tuning on an InfiniBand cluster
  - Support for additional network transports is being added (TCP / ibverbs / SSM)
- The paper submitted at the end of Q4 has been accepted and is being presented at IEEE Cluster 2013:
  J. Soumagne, D. Kimpe, J. Zounmevo, M. Chaarawi, Q. Koziol, A. Afsahi, and R. Ross, "Mercury: Enabling Remote Procedure Call for High-Performance Computing", IEEE International Conference on Cluster Computing, Sep 2013

Fast Forward Stack – Function Shipping

[Diagram: HDF5 API, VOL, Native (H5), VFL, IOD VOL, Mercury (Client) ↔ Network ↔ Mercury (Server), IOD VOL]

POSIX Function Shipping (Example)

[Diagram: HDF5 API, VOL, Native (H5), IOD VOL, VFL, sec2, POSIX I/O, Mercury (Client) ↔ Network ↔ Mercury (Server), File System]

Mercury POSIX

Supports POSIX I/O routines through Mercury:
- A completely separate package built on top of Mercury, called Mercury POSIX (lightweight library + server)

Design keys:
- Support for 32/64-bit platforms and large files
- No modification of the original source code that uses POSIX I/O (e.g., the HDF5 sec2 driver)
- Redirects I/O to a Mercury server through dynamic linking
- Can use all the transports available through Mercury (although MPI dynamic connection is not very flexible and not always available)
- The code supporting each POSIX routine is generated automatically inside Mercury POSIX using Boost preprocessor macros
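Redirection through dynamic linking can be illustrated with the standard `dlsym(RTLD_NEXT, ...)` interposition idiom. This is a minimal sketch of that general mechanism, assuming a 64-bit Linux/glibc system. It is not the actual Mercury POSIX code. The wrapper only counts calls (the counter `interposed_lseek_calls` is hypothetical) and then falls through to the C library's `lseek`. A function shipper would instead forward the call to a remote server at that point.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE          /* exposes RTLD_NEXT in <dlfcn.h> (glibc) */
#endif
#include <assert.h>
#include <dlfcn.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical counter: a real shipper would forward the call to a
 * remote server here instead of only counting it. */
static int interposed_lseek_calls = 0;

/* Same name and signature as the libc routine, so the dynamic linker
 * binds callers to this definition ahead of the one in libc. */
off_t lseek(int fd, off_t offset, int whence)
{
    static off_t (*real_lseek)(int, off_t, int) = NULL;

    if (real_lseek == NULL) {
        /* Look up the next "lseek" after this object in link order (libc). */
        real_lseek = (off_t (*)(int, off_t, int)) dlsym(RTLD_NEXT, "lseek");
        assert(real_lseek != NULL);
    }
    interposed_lseek_calls++;
    return real_lseek(fd, offset, whence);
}
```

If you compile this as a shared object (e.g., `cc -shared -fPIC -o libshim.so shim.c -ldl`, with hypothetical file names) and name it in `LD_PRELOAD`, the wrapper takes precedence over the libc definition in every dynamically linked program. This is why callers need no source changes. C library headers may map a routine to a differently named symbol. For example, the demo log shows `__fxstat64` where the tool presumably called `fstat`. A shim therefore has to wrap those symbol names as well.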

Mercury POSIX – Stub Generation

- Most routines are generated with a one-line macro, built on top of the existing Mercury/Boost macros
- However, routines with variable arguments require a few extra lines to create encoding/decoding routines that check argument flags, etc.
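`open(const char *path, int oflag, ...)` shows the extra work variable-argument routines need. Its third argument (the mode) is only read when `O_CREAT` is set. The sketch below assumes a hypothetical wire layout: flags, a one-byte presence flag, then the mode only when present. Names such as `open_args_t` and `encode_open_args` are also hypothetical, and the path argument is omitted for brevity. This is not the encoding Mercury POSIX uses.

```c
#include <assert.h>
#include <fcntl.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef struct {
    int32_t  flags;     /* oflag as passed by the caller */
    uint8_t  has_mode;  /* 1 if a mode argument was supplied */
    uint32_t mode;      /* only meaningful when has_mode == 1 */
} open_args_t;

/* Collect the optional argument the way open(2) does: only with O_CREAT. */
static open_args_t collect_open_args(int flags, ...)
{
    open_args_t a = { flags, 0, 0 };
    if (flags & O_CREAT) {
        va_list ap;
        va_start(ap, flags);
        a.mode = (uint32_t) va_arg(ap, int);  /* mode_t arrives promoted */
        va_end(ap);
        a.has_mode = 1;
    }
    return a;
}

/* Encode into buf; the mode is only put on the wire when flagged. */
static size_t encode_open_args(const open_args_t *a, unsigned char *buf)
{
    size_t pos = 0;
    memcpy(buf + pos, &a->flags, sizeof a->flags); pos += sizeof a->flags;
    buf[pos++] = a->has_mode;
    if (a->has_mode) {
        memcpy(buf + pos, &a->mode, sizeof a->mode); pos += sizeof a->mode;
    }
    return pos;
}

/* Decode, checking the presence flag before reading the mode. */
static open_args_t decode_open_args(const unsigned char *buf)
{
    open_args_t a = { 0, 0, 0 };
    size_t pos = 0;
    memcpy(&a.flags, buf + pos, sizeof a.flags); pos += sizeof a.flags;
    a.has_mode = buf[pos++];
    if (a.has_mode)
        memcpy(&a.mode, buf + pos, sizeof a.mode);
    return a;
}

/* Encode then decode through a local buffer, as a client/server pair would. */
static open_args_t roundtrip_open_args(open_args_t a)
{
    unsigned char buf[sizeof(int32_t) + 1 + sizeof(uint32_t)];
    size_t len = encode_open_args(&a, buf);
    assert(len == (a.has_mode ? sizeof buf : sizeof buf - sizeof(uint32_t)));
    return decode_open_args(buf);
}
```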

Mercury POSIX – Stub Generation

Two main macros:

/* Non-bulk routines */
MERCURY_POSIX_GEN_STUB(func_name, ret_type, in_types, out_types)

/* Bulk routines */
MERCURY_POSIX_GEN_BULK_STUB(func_name, ret_type, in_types, out_types,
        bulk_read) /* 1/0 if reading/writing bulk data */

Mercury POSIX – Stub Generation

Example, showing the results of the following macro:

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated input structure:

typedef struct {
    hg_int32_t in_param_0;
    hg_off_t   in_param_1;
    hg_int32_t in_param_2;
} lseek_in_t;
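The one-line macro works because the argument types arrive as a list that can be expanded several times. Mercury POSIX does this with Boost preprocessor macros. As a swapped-in technique that needs no Boost headers, the sketch below uses a plain C "X-macro". It derives the same `lseek_in_t` layout plus two other values from a single list. The names `LSEEK_IN_PARAMS`, `LSEEK_NUM_IN`, and `LSEEK_IN_WIRE_SIZE` are hypothetical, and `int32_t`/`int64_t` stand in for `hg_int32_t`/`hg_off_t`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical argument list for off_t lseek(int fildes, off_t offset,
 * int whence). Each X(type, index) entry is expanded differently by
 * whichever macro is passed in as X. */
#define LSEEK_IN_PARAMS(X) \
    X(int32_t, 0)          \
    X(int64_t, 1)          \
    X(int32_t, 2)

/* Expansion 1: one struct member per argument, named in_param_<index>. */
#define DECLARE_MEMBER(type, i) type in_param_##i;
typedef struct {
    LSEEK_IN_PARAMS(DECLARE_MEMBER)
} lseek_in_t;

/* Expansion 2: number of input arguments (0 + 1 + 1 + 1). */
#define COUNT_ONE(type, i) + 1
enum { LSEEK_NUM_IN = 0 LSEEK_IN_PARAMS(COUNT_ONE) };

/* Expansion 3: bytes needed to encode the arguments back to back. */
#define ADD_SIZE(type, i) + sizeof(type)
enum { LSEEK_IN_WIRE_SIZE = 0 LSEEK_IN_PARAMS(ADD_SIZE) };
```

Adding or changing an argument only touches `LSEEK_IN_PARAMS`, so the struct, the count, and the size cannot drift apart. Boost.Preprocessor provides the same kind of iteration over `(a)(b)(c)`-style sequences, like the one passed to `MERCURY_POSIX_GEN_STUB`, for example through `BOOST_PP_SEQ_FOR_EACH_I`.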

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated proc routine for the input structure:

static __inline__ int
hg_proc_lseek_in_t(hg_proc_t proc, void *data)
{
    lseek_in_t *struct_data = (lseek_in_t *) data;
    int ret;

    ret = hg_proc_hg_int32_t(proc, &struct_data->in_param_0);
    if (ret != HG_SUCCESS) return ret;
    ret = hg_proc_hg_off_t(proc, &struct_data->in_param_1);
    if (ret != HG_SUCCESS) return ret;
    ret = hg_proc_hg_int32_t(proc, &struct_data->in_param_2);

    return ret;
}
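A single proc routine serves both directions. The same call sequence encodes the structure on the sender and decodes it on the receiver, depending on the state of the `hg_proc_t`. The sketch below reproduces that idea with a hypothetical `mini_proc_t` cursor over a byte buffer. It is not Mercury's implementation. It also copies values in host byte order, so it assumes both ends share the same architecture.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

typedef enum { PROC_ENCODE, PROC_DECODE } proc_op_t;

/* Hypothetical stand-in for hg_proc_t: a direction plus a buffer cursor. */
typedef struct {
    proc_op_t      op;
    unsigned char *buf;
    size_t         size;
    size_t         pos;   /* invariant: pos <= size */
} mini_proc_t;

/* Copy n bytes between *value and the buffer in the direction given by
 * proc->op. Returns 0 on success, -1 if the buffer is too small. */
static int mini_proc_bytes(mini_proc_t *proc, void *value, size_t n)
{
    if (proc->size - proc->pos < n)
        return -1;
    if (proc->op == PROC_ENCODE)
        memcpy(proc->buf + proc->pos, value, n);
    else
        memcpy(value, proc->buf + proc->pos, n);
    proc->pos += n;
    return 0;
}

typedef struct {
    int32_t in_param_0;  /* fildes */
    int64_t in_param_1;  /* offset */
    int32_t in_param_2;  /* whence */
} lseek_in_t;

/* One routine for both directions; stops at the first failing field. */
static int mini_proc_lseek_in(mini_proc_t *proc, lseek_in_t *s)
{
    int ret;
    if ((ret = mini_proc_bytes(proc, &s->in_param_0, sizeof s->in_param_0)) != 0)
        return ret;
    if ((ret = mini_proc_bytes(proc, &s->in_param_1, sizeof s->in_param_1)) != 0)
        return ret;
    return mini_proc_bytes(proc, &s->in_param_2, sizeof s->in_param_2);
}

/* Encode into a buf_size-byte buffer, decode back, and compare.
 * Returns 1 if every field survived, 0 on mismatch, -1 on proc failure. */
static int lseek_in_roundtrip(lseek_in_t in, size_t buf_size)
{
    unsigned char buf[64];
    lseek_in_t out = { 0, 0, 0 };
    mini_proc_t enc = { PROC_ENCODE, buf, buf_size, 0 };
    mini_proc_t dec = { PROC_DECODE, buf, buf_size, 0 };

    assert(buf_size <= sizeof buf);
    if (mini_proc_lseek_in(&enc, &in) != 0) return -1;
    if (mini_proc_lseek_in(&dec, &out) != 0) return -1;
    return out.in_param_0 == in.in_param_0 && out.in_param_1 == in.in_param_1
        && out.in_param_2 == in.in_param_2;
}
```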

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated output structure:

typedef struct {
    hg_off_t ret;
} lseek_out_t;

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated proc routine for the output structure:

static __inline__ int
hg_proc_lseek_out_t(hg_proc_t proc, void *data)
{
    lseek_out_t *struct_data = (lseek_out_t *) data;

    return hg_proc_hg_off_t(proc, &struct_data->ret);
}

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated client stub (simplified version):

hg_off_t
lseek(hg_int32_t in_param_0, hg_off_t in_param_1, hg_int32_t in_param_2)
{
    lseek_in_t in_struct;
    lseek_out_t out_struct;
    hg_off_t ret;

    /* Initialization */
    ...

    /* Register function if not registered */
    id = MERCURY_REGISTER("lseek", lseek_in_t, lseek_out_t);

    /* Fill input structure */
    in_struct.in_param_0 = in_param_0;
    in_struct.in_param_1 = in_param_1;
    in_struct.in_param_2 = in_param_2;

    /* Forward call to remote addr and get a new request */
    HG_Forward(addr, id, &in_struct, &out_struct, &request);

    /* Wait for call to be executed */
    HG_Wait(request, HG_MAX_IDLE_TIME, &status);

    /* Get output parameters */
    ret = out_struct.ret;

    return ret;
}
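The client stub always follows the same shape. It registers the call once, copies the arguments into the input structure, forwards it, waits, and reads the result from the output structure. The sketch below mirrors that flow without Mercury. The transport function pointer `lseek_transport_t`, the in-process `loopback_transport`, and the `registrations` counter are hypothetical stand-ins for `HG_Forward`/`HG_Wait` and `MERCURY_REGISTER`. The stub is named `shipped_lseek` so that it does not shadow the real `lseek`.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct { int32_t in_param_0; int64_t in_param_1; int32_t in_param_2; } lseek_in_t;
typedef struct { int64_t ret; } lseek_out_t;

/* Hypothetical transport hook standing in for HG_Forward() + HG_Wait():
 * sends *in, blocks until the remote side answers, then fills *out. */
typedef int (*lseek_transport_t)(const lseek_in_t *in, lseek_out_t *out);

/* Hypothetical: how many times the stub registered the "lseek" call. */
static int registrations = 0;

/* In-process stand-in for the server: runs the real lseek locally. */
static int loopback_transport(const lseek_in_t *in, lseek_out_t *out)
{
    out->ret = (int64_t) lseek(in->in_param_0, (off_t) in->in_param_1, in->in_param_2);
    return 0;
}

/* Client stub: register once, fill the input struct, forward, read output. */
static int64_t shipped_lseek(lseek_transport_t forward,
                             int32_t fildes, int64_t offset, int32_t whence)
{
    static int registered = 0;
    lseek_in_t in = { fildes, offset, whence };
    lseek_out_t out = { -1 };

    assert(forward != NULL);
    if (!registered) {            /* "register function if not registered" */
        registrations++;
        registered = 1;
    }
    if (forward(&in, &out) != 0)  /* forward and wait for completion */
        return -1;
    return out.ret;
}
```

Only the return value travels back in this sketch. A complete POSIX shim would also need to return `errno` from the server, so that callers see the same failure reasons.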

Mercury POSIX – Stub Generation

/* off_t lseek(int fildes, off_t offset, int whence) */
MERCURY_POSIX_GEN_STUB(lseek, hg_off_t, (hg_int32_t)(hg_off_t)(hg_int32_t), )

Generated server stub (simplified version):

static int
lseek_cb(hg_handle_t handle)
{
    lseek_in_t in_struct;
    lseek_out_t out_struct;
    hg_int32_t in_param_0;
    hg_off_t in_param_1;
    hg_int32_t in_param_2;
    hg_off_t ret;

    /* Get input buffer */
    HG_Handler_get_input(handle, &in_struct);

    /* Get parameters */
    in_param_0 = in_struct.in_param_0;
    in_param_1 = in_struct.in_param_1;
    in_param_2 = in_struct.in_param_2;

    /* Call function */
    ret = lseek(in_param_0, in_param_1, in_param_2);

    /* Fill output structure */
    out_struct.ret = ret;

    /* Free handle and send response back */
    HG_Handler_start_output(handle, &out_struct);

    return HG_SUCCESS;
}
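On the server side, each incoming call has to be matched to its registered callback. That callback then unpacks the input, runs the real routine, and packs the output. The demo server's log lines ("Executing open64", ...) suggest such a lookup by name. The sketch below assumes a hypothetical static name-to-handler table (`rpc_table`) and a hypothetical `dispatch` function. Raw pointers stand in for the `hg_handle_t` that a Mercury handler would receive.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

typedef struct { int32_t in_param_0; int64_t in_param_1; int32_t in_param_2; } lseek_in_t;
typedef struct { int64_t ret; } lseek_out_t;

/* Hypothetical handler shape: raw input/output pointers stand in for the
 * hg_handle_t from which a Mercury handler fetches its arguments. */
typedef int (*rpc_handler_t)(const void *in, void *out);

/* Server stub: unpack parameters, call the local routine, pack the result. */
static int lseek_cb(const void *in_buf, void *out_buf)
{
    const lseek_in_t *in = in_buf;
    lseek_out_t *out = out_buf;

    out->ret = (int64_t) lseek(in->in_param_0, (off_t) in->in_param_1, in->in_param_2);
    return 0;
}

/* Hypothetical name -> handler table, fixed at compile time. */
static const struct {
    const char   *name;
    rpc_handler_t cb;
} rpc_table[] = {
    { "lseek", lseek_cb },
};

/* Find the handler registered under name, log the call in the style of the
 * demo server, and run it. Returns the handler's status, or -1 if unknown. */
static int dispatch(const char *name, const void *in, void *out)
{
    size_t i;

    assert(name != NULL);
    for (i = 0; i < sizeof rpc_table / sizeof rpc_table[0]; i++) {
        if (strcmp(rpc_table[i].name, name) == 0) {
            printf("Executing %s\n", name);
            return rpc_table[i].cb(in, out);
        }
    }
    return -1;
}

/* Convenience wrapper: ship one lseek through the table and return its
 * result, or INT64_MIN (a hypothetical sentinel) if dispatch fails. */
static int64_t dispatch_lseek(int32_t fd, int64_t offset, int32_t whence)
{
    lseek_in_t in = { fd, offset, whence };
    lseek_out_t out = { 0 };

    if (dispatch("lseek", &in, &out) != 0)
        return INT64_MIN;
    return out.ret;
}
```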

Mercury POSIX

Routines currently supported:
access, chdir, chmod, chown, close, creat, dup, dup2, fchdir, fchmod, fchown, fdatasync, fpathconf, fstat, fsync, ftruncate, getcwd, lchown, link, lockf, lseek, lstat, mkdir, mkfifo, mknod, open, pathconf, read, readlink, rmdir, stat, symlink, sync, truncate, umask, unlink, write

+ Large file versions: creat64, ftruncate64, lseek64, open64, etc.

Mercury POSIX

Routines not yet supported:
closedir, fcntl, opendir, pipe, pread, pwrite, readdir, rewinddir, utime?

+ Large file versions

Mercury POSIX – Configuration

Environment variables required:
- MERCURY_NA_PLUGIN: underlying network transport method used to forward calls to remote servers, e.g., "bmi"
- MERCURY_PORT_NAME: port name information (IP/port) specific to the chosen network transport, used to establish a connection with a remote server, e.g., "tcp:// :22222"
- LD_PRELOAD: path to the Mercury POSIX shared library, e.g., "/usr/local/lib/libmercury_posix.so"

Setting LD_PRELOAD redirects all POSIX calls to the Mercury server. This can be an issue for local scripts and other programs that use POSIX I/O.
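Of the three variables, LD_PRELOAD is consumed by the dynamic linker. The preloaded library must read the other two itself before it can connect. The sketch below shows one way such a library might validate them at startup. The `posix_client_config_t` type and `load_client_config` function are hypothetical, and returning -1 on missing variables is an assumption.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L  /* setenv/unsetenv, used when testing */
#endif
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical client configuration read once when the library loads. */
typedef struct {
    const char *na_plugin;  /* MERCURY_NA_PLUGIN, e.g. "bmi" */
    const char *port_name;  /* MERCURY_PORT_NAME, transport-specific address */
} posix_client_config_t;

/* Fill cfg and return 0 when both variables are set and non-empty;
 * return -1 otherwise, leaving cfg untouched. */
static int load_client_config(posix_client_config_t *cfg)
{
    const char *plugin = getenv("MERCURY_NA_PLUGIN");
    const char *port = getenv("MERCURY_PORT_NAME");

    assert(cfg != NULL);
    if (plugin == NULL || *plugin == '\0' || port == NULL || *port == '\0')
        return -1;
    cfg->na_plugin = plugin;
    cfg->port_name = port;
    return 0;
}
```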

Mercury POSIX – Testing

- Integrated regression tests (limited POSIX test suite)
- HDF5 sec2 driver (demo)
- Lustre POSIX test suite. However, there are framework issues: the suite needs to be modified, and it may require support for fdopen and FILE* routines?

Demo – Mercury POSIX and HDF5 tools

Client terminal:
$ pwd
~jsoumagne/demo
$ ls *.h5
ls: *.h5: No such file or directory
$ export MERCURY_NA_PLUGIN="bmi"
$ export MERCURY_PORT_NAME="tcp:// :22222"
$ export LD_PRELOAD=/path/to/libmercury_posix.so

Server terminal:
$ pwd
~jsoumagne/demo_server
$ ls
coord.h5
$ mercury_posix_server bmi
Waiting for client...

Demo – Mercury POSIX and HDF5 tools

Client terminal:
$ h5dump -H coord.h5
HDF5 "coord.h5" {
GROUP "/" {
   DATASET "multiple_ends_dset" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
   }
   DATASET "multiple_ends_dset_chunked" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 4, 5, 3, 4, 2, 3, 6, 2 ) / ( 4, 5, 3, 4, 2, 3, 6, 2 ) }
   }
   DATASET "single_end_dset" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
... (skip)

Server terminal:
$ mercury_posix_server bmi
Waiting for client...
Thu, 19 Sep 13 17:31:00 CDT: Executing open64
Thu, 19 Sep 13 17:31:00 CDT: Executing __fxstat64
Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
Thu, 19 Sep 13 17:31:00 CDT: Executing lseek64
Thu, 19 Sep 13 17:31:00 CDT: Executing hg_posix_read
Thu, 19 Sep 13 17:31:00 CDT: Executing getcwd
... (skip)

Demo – Mercury POSIX and HDF5 tools

Client terminal:
$ h5copy -i coord.h5 -s single_end_dset -o coord_simple.h5 -d simple

Server terminal:
Thu, 19 Sep 13 17:33:51 CDT: Executing open64
... (skip)
Thu, 19 Sep 13 17:33:51 CDT: Executing __fxstat64
Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
Thu, 19 Sep 13 17:33:51 CDT: Executing lseek64
Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_read
... (skip)
Thu, 19 Sep 13 17:33:51 CDT: Executing hg_posix_write
Thu, 19 Sep 13 17:33:51 CDT: Executing close

Demo – Mercury POSIX and HDF5 tools

Client terminal:
$ h5dump coord_simple.h5
HDF5 "coord_simple.h5" {
GROUP "/" {
   DATASET "simple" {
      DATATYPE  H5T_STD_I32LE
      DATASPACE  SIMPLE { ( 2, 3, 6, 2 ) / ( 2, 3, 6, 2 ) }
      DATA {
      (0,0,0,0): 0, 1,
      (0,0,1,0): 1, 2,
      ... (skip)
      (1,2,2,0): 122, 123,
      (1,2,3,0): 123, 124,
      (1,2,4,0): 124, 125,
      (1,2,5,0): 125, 126
      }

Server terminal:
Thu, 19 Sep 13 17:36:57 CDT: Executing open64
Thu, 19 Sep 13 17:36:57 CDT: Executing __fxstat64
Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
... (skip)
Thu, 19 Sep 13 17:36:57 CDT: Executing lseek64
Thu, 19 Sep 13 17:36:57 CDT: Executing hg_posix_read
Thu, 19 Sep 13 17:36:57 CDT: Executing close

Conclusion – Future Work

- Forwarding POSIX I/O calls is very easy and does not require modifying existing tools or code
- Mercury POSIX can easily be extended to support additional system/library calls
- It can directly take advantage of updates to Mercury (network transports, etc.)

Next quarter:
- Support the remaining POSIX routines
- Test with MPI I/O (ROMIO driver)
- Test with the Lustre POSIX test suite, if the framework issues are solved

Questions