ATLAS Metadata Interface
Campaign Definition in AMI
S. Albrand, 23/02/2016
Story so far

Requested at the last SW week.
Details of the proposed implementation were circulated in January.
  – Some examples were received, but not all my questions were answered.
Implementation started less than 2 weeks ago.

"Campaigns are defined by:
  – A short name (30 characters): unique in the database
  – A dataset project (or set of projects)
  – A map which associates each member of a set of pairs of productionSteps and dataTypes to a set of AMI configuration tags
  – A description (1000 chars)
The dataset projects are either dataNN_* or mcNN_*. Thus datasets which do not belong to these groups cannot be part of production campaign (such as valid_*, user*, group*)"

N.B. Nothing was said about streams.
AddCampaign

Defines a data campaign. Requires two arguments:
  – campaignName: a short name (30 alphanumeric characters, no spaces)
  – projectName: a datasetProject name (a mistake on my part; this needs a separate step)
If no other argument is given, a new empty campaign is created.
Optional arguments: (for an MC campaign)
  – pyDict: a Python dictionary in text format (pyAMI only)
  – campaignDictFile=filename (a file containing the dictionary as described above)
  – StreamName (equivalent to physicsShort, usually omitted). I added this because at least one of the examples I was given had a stream wildcard.
  – description: a long (1000 chars) description of the campaign

Not yet implemented
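The campaign definition quoted earlier and the argument limits above can be pictured as a small data structure. The following is only a sketch under stated assumptions: the class name `Campaign`, the method `add_tags` and the regular expression are hypothetical illustrations, not the AMI implementation, and the 30 and 1000 character limits are simply the ones quoted on the slides.

```python
import re
from dataclasses import dataclass, field

# Assumption: "dataNN_*" and "mcNN_*" mean two digits followed by an underscore.
PROJECT_PATTERN = re.compile(r"^(data|mc)\d\d_.+$")


@dataclass
class Campaign:
    """Hypothetical in-memory model of a campaign definition."""
    name: str
    projects: set
    description: str = ""
    # Maps (productionStep, dataType) -> set of AMI configuration tags.
    tag_map: dict = field(default_factory=dict)

    def __post_init__(self):
        # Short name: 1 to 30 alphanumeric characters, no spaces.
        if not re.fullmatch(r"[A-Za-z0-9_]{1,30}", self.name):
            raise ValueError("campaign name must be 1-30 alphanumeric characters")
        if len(self.description) > 1000:
            raise ValueError("description is limited to 1000 characters")
        for project in self.projects:
            if not PROJECT_PATTERN.match(project):
                raise ValueError(f"project {project!r} is not dataNN_* or mcNN_*")

    def add_tags(self, prod_step, data_type, tags):
        """Attach tags to one (productionStep, dataType) pair."""
        self.tag_map.setdefault((prod_step, data_type), set()).update(tags)


campaign = Campaign("mc11a", {"mc11_7TeV"}, "This is an example")
campaign.add_tags("recon", "AOD", ["r3043", "r3060"])
```

Whether the underscore should be allowed in a campaign name (as in mc11a_empty) is an assumption of this sketch.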
Examples:

AddCampaign campaignName=mc11a projectName=mc11_7TeV pyDict="{'MC11c': {'mc11_7TeV': {'*': {'recon': {'AOD': ['r3043', 'r3060', 'r3108', 'r3072', 'r3073', 'r3074', 'r3075', 'r3076', 'r3077', 'r3078', 'r3079', 'r3080', 'r3081', 'r3082', 'r3083', 'r3084', 'r3085', 'r3086', 'r3044', 'r3110', 'r3097', 'r3071', 'r3068', 'r3070', 'r3069', 'a145', 'a146'], 'ESD': ['r3043', 'r3060', 'r3108', 'r3072', 'r3073', 'r3074', 'r3075', 'r3076', 'r3077', 'r3078', 'r3079', 'r3080', 'r3081', 'r3082', 'r3083', 'r3084', 'r3085', 'r3086', 'r3044', 'r3110', 'r3097', 'r3071', 'r3068', 'r3070', 'r3069', 'a145', 'a146']}, 'merge': {'AOD': ['r2993', 'r3109', 'r3063']}, 'digit': {'RDO': ['d621', 'd622', 'd623', 'd619']}}}}}" description='This is an example'

Questions:
  – Once the existing campaigns have been entered in AMI, will anyone need this?
  – If yes, do you need an "overwrite" function?

AddCampaign campaignName=mc11a_empty projectName=mc11_7TeV [description="a description"]
/* creates (reserves) an empty campaign */
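To make the nesting of the pyDict text explicit (campaign, project, stream, prodStep, dataType, list of tags), here is a minimal sketch that parses such a string with the standard library and flattens it into rows. The function name `flatten_campaign_dict` is hypothetical, and the shortened dictionary is taken from the example above; this is not the AMI reading code.

```python
import ast


def flatten_campaign_dict(py_dict_text):
    """Return (campaign, project, stream, prodStep, dataType, tag) rows."""
    # literal_eval only accepts Python literals, so the text is not executed.
    nested = ast.literal_eval(py_dict_text)
    rows = []
    for campaign, projects in nested.items():
        for project, streams in projects.items():
            for stream, prod_steps in streams.items():
                for prod_step, data_types in prod_steps.items():
                    for data_type, tags in data_types.items():
                        for tag in tags:
                            rows.append((campaign, project, stream,
                                         prod_step, data_type, tag))
    return rows


# Shortened version of the mc11 example: only the merge and digit steps.
example = ("{'MC11c': {'mc11_7TeV': {'*': {"
           "'merge': {'AOD': ['r2993', 'r3109', 'r3063']}, "
           "'digit': {'RDO': ['d621', 'd622', 'd623', 'd619']}}}}}")

rows = flatten_campaign_dict(example)
for row in rows:
    print(row)
```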
Problems

I received several examples of the pyDict format from different people, and the formats were all slightly different. I chose Borut's format as it looked "real".
My error: I have mistakenly made a simplification. At the moment one campaignName is associated with exactly one project and one stream.
  – Not too difficult to correct transparently, but I decided to ignore it for the moment so that I could have something to show today.
ListCampaign

ListCampaign –pyDict=true campaignName=solveig_test2

{'solveig_test2': {'mc11_7TeV': {'*': {'recon': {'AOD': ['a146', 'r3000', 'r1235', 'r2346'], 'ESD': ['r1234', 'r2345']}}}}}

Or get it in standard AMI format.

Questions:
  – Does the order of the tags matter?
  – Who reads the dict format?
  – Can I have a copy of the reading code?
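The question about tag order matters for anyone comparing two copies of a definition. If the answer turns out to be "no", which is an assumption here and not something the slides settle, a reader could compare two pyDict strings while ignoring the order of tags in each list. The helper names below are hypothetical.

```python
import ast


def normalise(node):
    """Recursively replace every tag list by a frozenset (order-free)."""
    if isinstance(node, dict):
        return {key: normalise(value) for key, value in node.items()}
    if isinstance(node, (list, tuple)):
        return frozenset(node)
    return node


def same_definition(first_text, second_text):
    """True if both pyDict strings define the same campaign content."""
    first = normalise(ast.literal_eval(first_text))
    second = normalise(ast.literal_eval(second_text))
    return first == second


listed = ("{'solveig_test2': {'mc11_7TeV': {'*': {'recon': "
          "{'AOD': ['a146', 'r3000', 'r1235', 'r2346'], "
          "'ESD': ['r1234', 'r2345']}}}}}")
# Same content, with the tags and the data types written in another order.
reordered = ("{'solveig_test2': {'mc11_7TeV': {'*': {'recon': "
             "{'ESD': ['r2345', 'r1234'], "
             "'AOD': ['r1235', 'r2346', 'a146', 'r3000']}}}}}")

print(same_definition(listed, reordered))
```

If order does matter, the lists would have to be compared as they are, and the listing command would need to preserve insertion order.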
Other functions for filling a campaign

Already available:
  – AddProdStepGroup: adds a (prodStep, dataType) pair, and optionally a tagSet, to a campaign. Rejects illegal values.
  – AddTagSet: adds a tagSet to a (prodStep, dataType) pair of a campaign. Rejects undeclared tags.
The other ones described in the specification will follow. They are "Updates" and "Removes".
I will of course correct the treatment of projectTags.
Are you sure you really want streams? (Data Prep uses the same super tag for all streams.)
A few remarks & questions

I suppose that there will be a phase of building up a definition, with fairly frequent updates?
  – Borut said: "No notion of 'closed' campaigns"
How do clients want to be informed of changes in a campaign definition? What do they do with the information?
  – Presumably, if DDM is using regexes to identify datasets as part of a campaign, they can generate them themselves from a pyDict?
It doesn't seem very scalable to me to mark in AMI which datasets are members of a campaign when the campaign may change at any moment (or is it always additive?)
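As an illustration of how a client might generate such regexes from a pyDict, here is a rough sketch. It assumes dataset names of the form project.datasetNumber.physicsShort.prodStep.dataType.tags, with the tags joined by underscores; that layout, the function names, and the dataset name used at the end are assumptions made for this example, not DDM's or AMI's actual code. Only a bare '*' stream is treated as a wildcard.

```python
import ast
import re


def dataset_patterns(py_dict_text):
    """Build one compiled regex per (project, stream, prodStep, dataType)."""
    nested = ast.literal_eval(py_dict_text)
    patterns = []
    for projects in nested.values():
        for project, streams in projects.items():
            for stream, prod_steps in streams.items():
                # '*' matches any physicsShort field; other values are literal.
                stream_re = r"[^.]+" if stream == "*" else re.escape(stream)
                for prod_step, data_types in prod_steps.items():
                    for data_type, tags in data_types.items():
                        tag_re = "|".join(re.escape(t) for t in sorted(tags))
                        # The campaign tag must be one underscore-separated
                        # component of the final (tags) field.
                        patterns.append(re.compile(
                            rf"^{re.escape(project)}\.\d+\.{stream_re}"
                            rf"\.{re.escape(prod_step)}\.{re.escape(data_type)}"
                            rf"\.(?:[a-z]\d+_)*(?:{tag_re})(?:_[a-z]\d+)*$"
                        ))
    return patterns


def is_member(dataset_name, patterns):
    """True if any pattern matches the dataset name."""
    return any(p.match(dataset_name) for p in patterns)


definition = ("{'solveig_test2': {'mc11_7TeV': {'*': {'recon': "
              "{'AOD': ['a146', 'r3000', 'r1235', 'r2346'], "
              "'ESD': ['r1234', 'r2345']}}}}}")
patterns = dataset_patterns(definition)
# Illustrative dataset name, invented for this example.
name = "mc11_7TeV.105200.T1_McAtNlo_Jimmy.recon.AOD.e835_s1272_s1274_r3000"
print(is_member(name, patterns))
```

Because the patterns are derived from the definition at the time of the call, a client would have to regenerate them whenever the campaign changes.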
Messy tag/prodStep coupling

I would have liked to be able to say to a client "This tag type does not go with the dataType/prodStep you provided". But the use of tags, and even of prodSteps, is too messy (double use of s tags and r tags in particular).
So at the moment I am only checking that the prodStep is declared and that the tag exists.
I will add a warning: "This tag is already in another campaign".
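The checks described above (prodStep declared, tag exists, plus the planned warning) could look roughly like the following. This is a sketch with hypothetical names and in-memory sets standing in for the database lookups; it is not the AMI code.

```python
def check_tag_set(prod_step, tags, declared_prod_steps, known_tags,
                  tags_in_other_campaigns):
    """Return (errors, warnings) for one prodStep and its candidate tags.

    declared_prod_steps and known_tags are sets; tags_in_other_campaigns
    maps a tag to the name of a campaign that already uses it.
    """
    errors, warnings = [], []
    if prod_step not in declared_prod_steps:
        errors.append(f"prodStep {prod_step!r} is not declared")
    for tag in tags:
        if tag not in known_tags:
            errors.append(f"tag {tag!r} does not exist")
        elif tag in tags_in_other_campaigns:
            # A warning only: the tag is still accepted.
            warnings.append(
                f"tag {tag!r} is already in another campaign "
                f"({tags_in_other_campaigns[tag]})"
            )
    return errors, warnings


errors, warnings = check_tag_set(
    "recon",
    ["r3043", "r0000"],
    declared_prod_steps={"recon", "merge", "digit"},
    known_tags={"r3043", "r2993"},
    tags_in_other_campaigns={"r3043": "mc11a"},
)
print(errors)
print(warnings)
```

Note that no check couples the tag type to the prodStep or dataType, matching the limitation described above.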
Next steps

Make a web interface (c.f. the Period definition interface)
Test it? Who?
Document.
Release…