1 Metadata Working Group Report
QCDML (QCD Markup Language) for ILDG
Members (fixed in mid-January)
–G. Andronico (INFN, Italy)
–P. Coddington (Adelaide, Australia)
–R. Edwards (Jlab, USA)
–C. Maynard (Edinburgh, UK)
–D. Pleiter (DESY, Germany)
–J. Simone (FNAL, USA)
–T. Yoshie (Tsukuba, Japan)
–B. Joo (observer) (Edinburgh, UK)
Mailing List
–About 80 mails circulated
2
0. Introduction
1. QCDML: Strategy and Standard Configuration Format (T. Yoshie)
2. QCDML: Physics (C. Maynard)
3. QCDML: Machine and Management (D. Pleiter)
My proposal for QCDML: will not be used in my talk, but may be useful for discussions
3 Strategy
QCDML: XML schema for ILDG
–write a QCDML document for each configuration
–store QCDML documents in (a) database(s)
–search/retrieve configurations
Design QCDML so that developing applications is easy
QCDML defines a minimal set of XML tags
–necessary for exchanging configurations
Tags which will be searched
–those that researchers are usually interested in
  required: physics parameters (beta, mq)
  not included: random number seed
4 Strategy (cont.)
Each collaboration can extend QCDML and use it for its own purposes
Every collaboration is asked to provide values for all relevant QCDML tags
5 Category of QCDML
Standard configuration format (SCF)
1. Physics and parameters
2. Algorithm and status
3. Code
4. Machine
5. Management
6. Miscellaneous
Status
–finalized
–4, 5: almost finalized
–1: discussions ongoing (different opinions)
6 SCF: Strategy
The standard format is an abstract (reference) format for exchanging configurations
–collaborations submitting configurations to ILDG do not have to convert archived files
–some groups have already archived many configurations in their own format
–each format was chosen for convenience
Conversion is done on the user side
–two methods to convert the format of configurations
  from a given format to the standard one, via the C-library
  from one format to another, using BinX technology (without referring to the standard format)
7 SCF: Format
Definition of gauge configuration
–link variable U_mu(x)_ij with i,j=1,2,3 color indices and mu=1,2,3,4 (x,y,z,t)
Employ NERSC (Gauge Connection) format
–a sequence of 8-byte double precision real numbers
–coded in 64-bit IEEE numerical format
–endianness is not specified
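Because the format leaves the byte order open, a reader has to compare the byte order of the file with that of the host and swap each 8-byte value when they differ. The following is a minimal sketch of that step; the function names `host_is_big_endian`, `swap_bytes_8` and `to_host_order` are hypothetical and not part of any ILDG interface.

```c
#include <stdint.h>
#include <string.h>

/* Return 1 if this machine is big endian, 0 if it is little endian. */
int host_is_big_endian(void)
{
    const uint16_t probe = 0x0102;
    unsigned char first;
    memcpy(&first, &probe, 1);   /* look at the first byte in memory */
    return first == 0x01;
}

/* Reverse the byte order of n consecutive 8-byte values in place. */
void swap_bytes_8(void *data, size_t n)
{
    unsigned char *p = data;
    for (size_t i = 0; i < n; i++, p += 8) {
        for (int k = 0; k < 4; k++) {
            unsigned char tmp = p[k];
            p[k] = p[7 - k];
            p[7 - k] = tmp;
        }
    }
}

/* Bring n doubles read from a file into host byte order.
   file_big_endian: 1 if the file is big endian, 0 if little endian
   (the same convention as the "endian" flag on the C-library slides). */
void to_host_order(double *values, size_t n, int file_big_endian)
{
    if (file_big_endian != host_is_big_endian())
        swap_bytes_8(values, n);
}
```

The swap works on raw bytes, so no value is interpreted as a floating-point number until it is in host order.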
8 SCF: Format (cont.)
In a C program,
–the last index runs fastest; indices run from 0
–re=0 (real part), re=1 (imaginary part)
Store the first two rows (2x3) of the 3x3 link matrix
–U11,U12,U13,U21,U22,U23
–mu=1,2,3,4
–x=0,1,2,...NX-1
–y=0,1,2,...NY-1
–z, t
Declared type and matrix index order
–double: Row-Column
–Complex*16: Column-Row
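Storing only two rows is sufficient because the link matrices are SU(3) matrices: the third row is the complex conjugate of the cross product of the first two. A sketch of that reconstruction, using the [row][column][re] layout described above, could look as follows; the function name `su3_third_row` is hypothetical.

```c
/* Reconstruct the third row of an SU(3) matrix from its first two rows.
   rows[r][c][0] is the real part and rows[r][c][1] the imaginary part
   of element (r, c); third[c][0..1] receives the third row. */
void su3_third_row(double rows[2][3][2], double third[3][2])
{
    for (int k = 0; k < 3; k++) {
        int i = (k + 1) % 3;
        int j = (k + 2) % 3;
        /* z = a_i * b_j - a_j * b_i, with a = first row, b = second row */
        double re = (rows[0][i][0] * rows[1][j][0] - rows[0][i][1] * rows[1][j][1])
                  - (rows[0][j][0] * rows[1][i][0] - rows[0][j][1] * rows[1][i][1]);
        double im = (rows[0][i][0] * rows[1][j][1] + rows[0][i][1] * rows[1][j][0])
                  - (rows[0][j][0] * rows[1][i][1] + rows[0][j][1] * rows[1][i][0]);
        third[k][0] = re;    /* third row element = conj(z) */
        third[k][1] = -im;
    }
}
```

For the identity matrix the stored rows are (1,0,0) and (0,1,0), and the function returns (0,0,1).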
9 SCF: C-library
Each collaboration submitting configurations to ILDG prepares a C-library that reads its configurations and returns them in the standard format
–a pointer to the C-library is stored in the QCDML document
Read a hyper-cubic region
–(ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) of a (0:NX-1)*(0:NY-1)*(0:NZ-1)*(0:NT-1) lattice
void ILDG_read_conf(file, NX, ix0,ix1, NY, iy0,iy1, NZ, iz0,iz1, NT, it0,it1, endian, config)
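10 SCF: C-library (cont.)
The region (0-3)*(4-7)*(4-7)*(0-15) of the whole lattice (0-7)*(0-7)*(0-7)*(0-15) will be read in big endian format and stored in U[16][4][4][4][4][2][3][2].
    /* Prototype of the proposed interface; the argument types are assumed. */
    void ILDG_read_conf(const char *file, int NX, int ix0, int ix1,
                        int NY, int iy0, int iy1, int NZ, int iz0, int iz1,
                        int NT, int it0, int it1, int endian, void *config);

    int main(void)
    {
        int NX = 8, NY = 8, NZ = 8, NT = 16;
        int endian = 1;  /* big endian, =0 for little endian */
        double U[16][4][4][4][4][2][3][2];  /* [t][z][y][x][mu][row][column][re] */

        ILDG_read_conf("test-file", NX, 0, 3, NY, 4, 7, NZ, 4, 7, NT, 0, 15, endian, U);
        return 0;
    }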
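To illustrate what reading a hyper-cubic region involves, here is a sketch that copies a region out of a full configuration already held in memory in the standard order. It assumes the site order [t][z][y][x] with x running fastest among the coordinates, as the index list on the format slide suggests; `site_offset` and `copy_region` are hypothetical helpers, not the proposed `ILDG_read_conf` itself.

```c
#include <stddef.h>
#include <string.h>

/* Doubles per lattice site: 4 directions x 2 rows x 3 columns x (re, im). */
enum { SITE_DOUBLES = 4 * 2 * 3 * 2 };

/* Offset, counted in doubles, of site (x,y,z,t) in the full lattice
   (assumed order: t slowest, then z, y, and x fastest). */
size_t site_offset(int x, int y, int z, int t, int NX, int NY, int NZ)
{
    return ((((size_t)t * NZ + z) * NY + y) * NX + x) * SITE_DOUBLES;
}

/* Copy the region (ix0:ix1)*(iy0:iy1)*(iz0:iz1)*(it0:it1) from "full"
   into "out", keeping the same index order. Returns the number of
   doubles copied. */
size_t copy_region(const double *full,
                   int NX, int ix0, int ix1,
                   int NY, int iy0, int iy1,
                   int NZ, int iz0, int iz1,
                   int it0, int it1,
                   double *out)
{
    /* Sites with consecutive x are contiguous, so copy one x-run at a time. */
    size_t run = (size_t)(ix1 - ix0 + 1) * SITE_DOUBLES;
    size_t n = 0;
    for (int t = it0; t <= it1; t++)
        for (int z = iz0; z <= iz1; z++)
            for (int y = iy0; y <= iy1; y++) {
                const double *src = full + site_offset(ix0, y, z, t, NX, NY, NZ);
                memcpy(out + n, src, run * sizeof(double));
                n += run;
            }
    return n;
}
```

A file-based reader would replace each `memcpy` with a seek and a read of the same run, followed by a byte swap where the file and host byte orders differ.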
11 SCF: C-library (cont.)
In general, the conversion program requires a large amount of memory, 1-2 times the configuration size
–this memory bottleneck cannot be avoided
We propose the above interface
–simple
–mainly for full QCD configurations
  a 32^3 x Nt lattice, the expected size for the next several years, can be handled by a high-end PC with 2GB of memory
–some extension might be necessary in the future
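The memory claim can be checked with a short calculation. The sketch below computes the size of one configuration in bytes; the helper name `config_bytes` and the choice Nt = 64 in the note that follows are assumptions for illustration, since the slide does not fix Nt.

```c
#include <stdint.h>

/* Bytes needed to hold one configuration in memory.
   rows = 2 for the stored two-row form, rows = 3 for full 3x3 matrices.
   Each site carries 4 link matrices of rows x 3 complex numbers,
   and each complex number is two 8-byte doubles. */
uint64_t config_bytes(int NX, int NY, int NZ, int NT, int rows)
{
    uint64_t sites = (uint64_t)NX * NY * NZ * NT;
    uint64_t doubles_per_site = 4u * (uint64_t)rows * 3u * 2u;
    return sites * doubles_per_site * 8u;
}
```

For a 32^3 x 64 lattice this gives 805306368 bytes (768 MiB) in the two-row form and 1207959552 bytes (1152 MiB) for full 3x3 matrices, so holding both at once takes a little under 2GB.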
12 SCF: BinX
BinX
–an XML schema to describe the format of a binary file
  developed by the edikt project (a part of OGSA-DAI)
–software to convert one binary format to another
  will be available in May 2003
–enables us to convert configurations without referring to the standard format
Each collaboration submitting configurations to the ILDG describes its own format in BinX
–users may write their favorite format in BinX
13 SCF: BinX (Cont.)
14 SCF: BinX (Cont.) Mechanism for describing an array split across several files
15 Distribution
SCF defines only the format of the binary configuration
–no parameters (size, coupling, ...)
–no management info (checksums, collaboration name, ...)
–all of them are described in a QCDML document
Keeping the identification of a configuration
–encapsulate the configuration and the QCDML document into one file
–distribute it via ILDG
–(need opinions and help from the middleware working group)
16 Distribution (cont.)
Candidate: DIME (Direct Internet Message Encapsulation)
–format is fixed (different from MIME)
  header (fixed bytes)
  length (fixed bytes)
  body of data (QCDML document)
  length (fixed bytes)
  body of data (QCDML-BinX document)
  length (fixed bytes)
  body of data (configuration itself)
  footer (fixed bytes)
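With fixed-size header and length fields, locating a body inside the file only requires walking the length fields. The sketch below follows the simplified layout listed above, not the record format of the DIME specification itself; the header size of 16 bytes, the 8-byte big-endian length fields and the function names are all assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>

/* Assumed sizes; the slide only says that these fields are "fixed bytes". */
enum { HEADER_BYTES = 16, LENGTH_BYTES = 8 };

/* Read an 8-byte big-endian unsigned integer (assumed length encoding). */
static uint64_t read_be64(const unsigned char *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

/* Locate body number "index" (0 = QCDML document, 1 = QCDML-BinX
   document, 2 = configuration) in a container of "size" bytes.
   On success store its byte offset and length and return 0;
   return -1 if the container is too short. */
int find_body(const unsigned char *buf, size_t size, int index,
              size_t *offset, uint64_t *length)
{
    size_t pos = HEADER_BYTES;
    for (int i = 0; ; i++) {
        if (pos > size || size - pos < LENGTH_BYTES)
            return -1;
        uint64_t len = read_be64(buf + pos);
        pos += LENGTH_BYTES;
        if (len > size - pos)
            return -1;
        if (i == index) {
            *offset = pos;
            *length = len;
            return 0;
        }
        pos += (size_t)len;
    }
}
```

Calling `find_body` with index 2 yields the offset of the first byte of the binary configuration, which is the position a reading library would seek to.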
17 Distribution (cont.)
Merits
–files do not have to be unpacked before reading
–file size is not increased (cf. MIME: factor 3/2 incl.)
Discussions
–prepare a tool to extract the QCDML document
–the C-library has to seek within the file to the origin (the first byte) of the binary configuration
–compatibility with BinX
18 My opinion for QCDML my opinion/proposal agreed by working group Physics –actions, physics parameters, lattice size Simulation –algorithm, machine, code, series, trajectory Management –revision, crc, reference, collaboration, project, action Pointers –site, file, C-library
19 Action
A human readable document for each action
–XML schema is powerful, but cannot describe the action completely
Three versions
–UKQCD Schema v0.5
–A compromise proposal
–My very simple version
Problems in the UKQCD schema
–too complicated
  an action consists of operators
  operators consist of couplings and fields
–action and operator names are XML tags
20 Action (cont.)
My very simple version
–just lists coupling names and values
A compromise version (sample2.xml)
–fields for each operator are removed
–names of actions and operators are described by values
–action is divided into gluon and quark sections
  enables us to include boundary conditions
21 Simulation
Algorithm section
–we may have to prepare a human readable document
–a simple version is sufficient
Machine
Code
Series
–distinguishes several runs with the same parameter set
Trajectory_or_Sweep
22 Management
Action
Checksums
–CRC32 or MD5
–for the binary configuration in its original format
Collaboration name and Project name
–useful tags for searching configurations
Reference
–some information is not suitable for inclusion in QCDML
  auto-correlation time
–do not have to include all references
Revision
–to check whether the QCDML document has been changed
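As an illustration of the checksum tag, the following sketch computes a CRC32 over a byte buffer. The slide does not say which CRC32 variant is intended; this assumes the common reflected polynomial 0xEDB88320 (the one used by zlib and Ethernet), and the function name `crc32_update` is hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Update a running CRC32 with n more bytes.
   Start with crc = 0 and feed the file in as many chunks as needed;
   the return value is the checksum of everything fed so far. */
uint32_t crc32_update(uint32_t crc, const unsigned char *data, size_t n)
{
    crc = ~crc;
    for (size_t i = 0; i < n; i++) {
        crc ^= data[i];
        for (int k = 0; k < 8; k++) {
            /* shift right; XOR in the polynomial if the low bit was set */
            crc = (crc >> 1) ^ (0xEDB88320u & (0u - (crc & 1u)));
        }
    }
    return ~crc;
}
```

Because the function takes the previous value as input, a large configuration file can be checksummed chunk by chunk without loading it into memory. The standard check string "123456789" gives 0xCBF43926 for this variant.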