NUG Meeting
File and Data Conversion
Jonathan Carter, NERSC User Services
Introduction
- Converting files and data for use on the IBM SP
- IBM uses:
  - IEEE data representation
  - Industry-standard Fortran unformatted file structure
- Tools available on the Cray systems
- Tools available on the IBM SP
Demand for File Conversion
- Currently
  - CTSS text files
  - ctou, rlib will be available on the IBM SP
- After decommissioning the Cray systems in October 2002
  - Cray Fortran unformatted files
  - Cray C binary files
Tools on the Cray Systems - FFIO
- Flexible File I/O: a general system for specifying how data should be written or read
- Can be used without recompiling or relinking (Fortran)
- Can be changed at runtime
- Various layers available to convert both file structure and data
- Controlled via the assign command
assign Command
- Can specify how I/O is done
  - On a Fortran unit basis: assign -F f77 u:10
  - On a filename basis: assign -F f77 f:filename
- Common options
  - Clear assigns: assign -R
  - See current assigns in effect: assign -V
Fortran Unformatted Sequential-access Files
- Cray uses a vendor-specific format called COS blocked, or simply blocked
- IBM (and most Unix vendors) use f77 blocking
- Use the -F f77 option to have the FFIO f77 blocking layer used instead of the default COS blocking:
    assign -F f77 u:10
- T3E already uses IEEE arithmetic, so -F f77 is sufficient
  - Note that default real and integer data types on the T3E are 64 bit
- SV1 data needs to be converted, so an IEEE conversion layer is needed
  - -N ieee performs basic conversion:
    assign -F f77 -N ieee f:filename
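As an illustration of what f77 blocking looks like on disk, here is a minimal Python sketch. It assumes the common Unix layout in which each record is bracketed by a 4-byte byte count before and after the data, stored big-endian as on the IBM SP; the marker width and byte order are assumptions to verify against the actual compiler, and the helper names are hypothetical.

```python
import io
import struct


def write_f77_record(stream, payload: bytes) -> None:
    """Write one record: 4-byte length, data, 4-byte length (assumed layout)."""
    marker = struct.pack(">i", len(payload))
    stream.write(marker + payload + marker)


def read_f77_records(stream):
    """Yield the data bytes of each record, checking both length markers."""
    while True:
        head = stream.read(4)
        if not head:
            return  # clean end of file
        if len(head) != 4:
            raise ValueError("truncated leading record marker")
        (length,) = struct.unpack(">i", head)
        payload = stream.read(length)
        tail = stream.read(4)
        if (len(payload) != length or len(tail) != 4
                or struct.unpack(">i", tail)[0] != length):
            raise ValueError("record markers do not match")
        yield payload


# Round trip through an in-memory file: one record of two 64-bit reals,
# one record of three 64-bit integers.
buf = io.BytesIO()
write_f77_record(buf, struct.pack(">2d", 1.0, 2.5))
write_f77_record(buf, struct.pack(">3q", 1, 2, 3))
buf.seek(0)
records = list(read_f77_records(buf))
```

A COS blocked file uses a different control-word scheme, so a reader like this one would reject it; that is the structural difference the -F f77 layer removes.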
Fortran Unformatted Direct-access Files
- Files are not blocked on Cray or IBM
- Data conversion layers can be used as in sequential-access files for the SV1 machines:
    assign -N ieee u:20
- T3E files don’t need any conversion
C Binary Files
- Files are not blocked on Cray or IBM
- FFIO conversion layer is not easy to use
- Use library routines such as cry2cri
Using FFIO to Convert a File
- Isolate the I/O statements for the file from the program to make a simple conversion program
- Pair each read with a write
- Use assign to have all written data converted, or use data conversion routines
Tools on the IBM SP - NCARU Library
- Library developed by the SCD at NCAR
  - Reads COS blocked files
  - Converts Cray data to IEEE data
- Does not use the Fortran API, so program modification is required
- Basic calls are crayopen, crayread, crayrew, crayback, crayclose
- Calls to crayread can convert data if the record is composed of one data type only; otherwise the user must handle conversion explicitly
- Conversion routines are ctodpf, ctospf, ctospi
- Cray Fortran I/O sometimes inserts padding, which the user must handle explicitly
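To show what converting Cray data to IEEE data involves, here is a hedged Python sketch of decoding one SV1-style 64-bit floating-point word. It assumes the classic Cray layout (1 sign bit, a 15-bit exponent biased by 40000 octal, and a 48-bit mantissa with the binary point to its left and no hidden bit). It is not the NCARU implementation, and the function name is hypothetical.

```python
import math

CRAY_EXPONENT_BIAS = 0o40000  # 16384, assumed bias of the 15-bit exponent


def cray_word_to_float(word: int) -> float:
    """Decode one 64-bit Cray floating-point word into a Python float.

    Python floats are IEEE doubles, so this is a Cray-to-IEEE conversion.
    Values beyond the IEEE double range make math.ldexp raise OverflowError.
    """
    sign = -1.0 if (word >> 63) & 1 else 1.0
    exponent = (word >> 48) & 0x7FFF
    mantissa = word & 0xFFFFFFFFFFFF  # low 48 bits
    if mantissa == 0:
        return 0.0
    # value = 0.mantissa * 2**(exponent - bias)
    return sign * math.ldexp(mantissa / 2**48, exponent - CRAY_EXPONENT_BIAS)


def cray_bytes_to_floats(raw: bytes):
    """Decode a buffer of big-endian 8-byte Cray words."""
    return [cray_word_to_float(int.from_bytes(raw[i:i + 8], "big"))
            for i in range(0, len(raw), 8)]


one = cray_word_to_float(0x4001800000000000)        # 0.5 * 2**1
minus = cray_word_to_float(0xC002A00000000000)      # -(0.625 * 2**2)
```

Because the Cray mantissa has 48 bits and the IEEE double has 53, in-range values convert exactly; the Cray exponent range is much wider, which is why out-of-range values need separate handling.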
Using the NCARU Library
- To use:
    module load ncaru
    xlf -o a.out b.f $NCARU
- Limitations
  - 2GB limit for unblocked files
  - Currently no 64 bit address space support
  - Not thread-safe
  - No support for 128 bit data
Dealing with Different Files
- Open using the blocked option to crayopen for Fortran unformatted sequential access; open with the unblocked option for Fortran unformatted direct access
- If written on the SV1, use the conversion option on read, or call the conversion routines directly
- C binary files can be read by the unblocked I/O calls, or by the usual C I/O followed by data conversion routines
Records with Mixed Data Types
- Read into a buffer and convert items one by one

    real x(50)
    integer n(50)
    real*8 buffer(100)
    ! open in blocked mode
    ifc = crayopen('filename',10,0)
    ! read record without converting
    nwds = crayread(ifc,buffer,100,0)
    ! convert data
    call ctospf(buffer,x,50)
    call ctospi(buffer(51),n,50)
Data Padding
With Cray Fortran I/O, extra bytes are sometimes inserted into the user data. Where padding occurs, bytes are inserted so that every 8-byte datum starts at a byte offset, measured from the beginning of the record, that is a multiple of 8. The end of the record is then padded so that the whole record length is a multiple of 8 bytes. Padding occurs only if you have used character variables whose lengths are not a multiple of 8, or real*4 or integer*4 data on the T3E (on the SV1 systems these types occupy 8 bytes).
Example
A Fortran record is written on an SV1:

    real a(50)
    integer n(50)
    character*17 label
    write(50) n, a, label

Each element of n and a is 8 bytes long, and label is 17 bytes long. Within the Fortran record, n starts at offset 0, a at offset 400, and label at offset 800. The only padding occurs at the end of the record, where 7 bytes are added to bring the record length from 817 bytes to 824 bytes, which is a multiple of 8.
Example
A Fortran record is written on an SV1:

    real a(50)
    integer n(50)
    character*17 label
    write(50) label, n, a

Without padding, the alignments are label at offset 0, n at offset 17, and a at offset 417. Since n has elements of length 8 bytes, it must be written at an offset that is a multiple of 8 bytes; therefore a pad of 7 bytes is inserted between the end of label and the beginning of n. In the record that is written to the file, the alignments are label at offset 0, n at offset 24, and a at offset 424.
Example
A Fortran record is written on the T3E:

    real a(40), b(40)
    integer*4 n(13), m(13)
    character*12 label
    write(50) label, n, a, m, b

The data has lengths: label 12 bytes, n and m 52 bytes each, and a and b 320 bytes each. Without padding, the alignments are label at offset 0, n at offset 12, a at offset 64, m at offset 384, and b at offset 436. a and b need to be at offsets that are a multiple of 8 bytes; the offset of a is already correct, but 4 bytes must be inserted before b, so that it starts at offset 440.
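The padding rule can be sketched as a small Python function. This is an illustrative calculation only, not part of any Cray or NCARU tool; the function name and the (name, element size, count) tuple layout are hypothetical. It reproduces the offsets of the T3E example above.

```python
def padded_layout(items):
    """Compute record offsets under the padding rule described above.

    items is a list of (name, element_size_in_bytes, element_count) in the
    order the variables appear in the write statement. A character variable
    is described as element size 1 with count equal to its length.
    Returns ([(name, offset), ...], total_record_length).
    """
    offset = 0
    layout = []
    for name, elem_size, count in items:
        # Only 8-byte data is aligned to a multiple of 8.
        if elem_size == 8 and offset % 8:
            offset += 8 - offset % 8
        layout.append((name, offset))
        offset += elem_size * count
    # The end of the record is padded to a multiple of 8 bytes.
    total = offset + (-offset) % 8
    return layout, total


# T3E example: write(50) label, n, a, m, b
t3e_record = [
    ("label", 1, 12),  # character*12
    ("n", 4, 13),      # integer*4 n(13)
    ("a", 8, 40),      # real a(40), 8 bytes per element
    ("m", 4, 13),      # integer*4 m(13)
    ("b", 8, 40),      # real b(40)
]
layout, total = padded_layout(t3e_record)
for name, offset in layout:
    print(name, offset)
print("record length", total)
```

A reader on the IBM SP can use offsets computed this way to skip the pad bytes when pulling individual items out of a record buffer.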
crayconv Utility
- crayconv automatically converts files written on the SV1 to IBM compatible format
  - Basic Fortran data types only
  - Sequential access unformatted files only
- Possible problem if compiler option -Onofastint used, or integer*8 explicitly declared and written
  - Integers over 2^46 not correctly interpreted
- Pad data not removed
- Extension to T3E data and direct access unformatted files planned
More Information
-by Mike Stewart ml
man ncaru