Distributed Data Management Miguel Branco 1 DQ2 discussion on future features BNL workshop October 4, 2007.


2 DQ2 0.4.x
Continue to optimize DB schema to cope with higher load
–channel allocation to follow ‘Dataset Subscription policy’
Hiro/Patrick also asking for local configurable ordered list of preferred sources within cloud
–implications on channel allocation
How much to ‘prefer’ a T1 before going to a T2 for a replica? Right now, shortest queue wins…
–distinguishing files unlikely to have replicas in the future (bad subscriptions), particularly in the local monitoring
–removing ‘holes’ in system (growing backlogs)
Reduce load (better GSI session reuse)
Goal: O(100K) file transfers/day/site
–or SRM/storage limitations
–Need better understanding outside DQ2
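To make the source-selection question on slide 2 concrete, here is a minimal sketch, not DQ2 code. The names `Source`, `pick_source` and `t1_bonus` are hypothetical; the sketch assumes a locally configured ordered list of preferred sites takes priority, and that "preferring" a T1 can be expressed as a queue-length handicap (a value of 0 reproduces today's "shortest queue wins").

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Source:
    site: str       # hypothetical site name
    tier: int       # 1 for a T1, 2 for a T2
    queue_len: int  # files currently queued on the channel from this site


def pick_source(sources: List[Source],
                preferred: List[str],
                t1_bonus: int = 0) -> Optional[Source]:
    """Pick a replica source (illustrative policy only).

    Sites named in `preferred` (a local, ordered list) win in list order.
    Otherwise the shortest queue wins, after subtracting `t1_bonus` from
    the queue length of every T1.
    """
    by_site = {s.site: s for s in sources}
    for name in preferred:
        if name in by_site:
            return by_site[name]
    if not sources:
        return None
    return min(
        sources,
        key=lambda s: s.queue_len - (t1_bonus if s.tier == 1 else 0),
    )
```

With `t1_bonus` set to, say, 100, a T1 is chosen until its queue is more than 100 files longer than the best T2 queue, which is one possible answer to "how much to prefer a T1".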

3 Local monitoring of site services

4 Staging…
Did not recognize this was a problem for OSG…
It is very hard to do with remote storage without SRM
–FTS 2 + SRMv2 move in the right direction but are not there yet
Could do a local mechanism for T1->T2 transfers in the same cloud
–provided site services for the T2 run “close” to the T1 storage
… but not for cross-T1 transfers

5 Hierarchies: current thoughts, for discussion
Hierarchical datasets would be a special kind of dataset.
These would have only 2 states: open and frozen
These would not have versions
The constituents of a hierarchical dataset could only be closed dataset versions or frozen datasets
Not sure if the following commands should be provided explicitly:
–list files in hierarchical dataset directly? or only list datasets in hierarchical dataset, forcing user to loop over results?
–subscribe open hierarchical dataset? or only allow listing datasets in open hierarchical dataset, forcing user to manually subscribe sub-units
point is: having to loop over OPEN hierarchies (likely manageable)
–locations of hierarchical dataset? or only allow listing locations of the individual datasets in the hierarchical dataset?
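The constraints on slide 5 can be sketched as a small data structure. This is only an illustration under assumptions: the class and method names are hypothetical, a closed dataset version is simplified to a `Dataset` whose state is "closed", and both listing styles under discussion are shown side by side.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Dataset:
    name: str
    files: List[str]
    state: str = "open"  # assumed labels: "open", "closed" or "frozen"


@dataclass
class HierarchicalDataset:
    name: str
    state: str = "open"  # only "open" or "frozen"; no versions
    datasets: List[Dataset] = field(default_factory=list)

    def add(self, ds: Dataset) -> None:
        # Constituents may only be closed dataset versions or frozen datasets.
        if self.state != "open":
            raise ValueError("hierarchical dataset is frozen")
        if ds.state not in ("closed", "frozen"):
            raise ValueError("constituent must be closed or frozen")
        self.datasets.append(ds)

    def freeze(self) -> None:
        self.state = "frozen"

    def list_datasets(self) -> List[str]:
        # Option 1: expose only the constituents; the user loops over them.
        return [ds.name for ds in self.datasets]

    def list_files(self) -> List[str]:
        # Option 2: do the loop on the user's behalf.
        return [f for ds in self.datasets for f in ds.files]
```

If only `list_datasets` were provided, a user would write the same loop as `list_files` by hand, which is the trade-off the slide raises.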

6 Merging
Not much to do from the DQ2 side here other than provide an attribute for each dataset
–“merged” Y/N (or protocol: zip, tar?)
DQ2 does 3rd party transfers only
–does not actually ‘see’ the data

7 Checksums
Not much from DQ2 here but enforcing checksums in the central catalogues and its protocol
–‘md5:’ for MD5
adler32 is frequently discussed as a better checksum candidate
–but not relevant to DQ2, rather to the sites and production people
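As an illustration of a prefixed checksum string such as the ‘md5:’ convention on slide 7, the sketch below uses only the Python standard library (`hashlib.md5`, `zlib.adler32`). The ‘ad:’ prefix for adler32 and the helper names are assumptions made for this example, not something the slides specify.

```python
import hashlib
import zlib


def md5_checksum(data: bytes) -> str:
    # 'md5:' prefix as named on the slide.
    return "md5:" + hashlib.md5(data).hexdigest()


def adler32_checksum(data: bytes) -> str:
    # The 'ad:' prefix is an assumption for this sketch.
    return "ad:%08x" % (zlib.adler32(data) & 0xFFFFFFFF)


def verify(data: bytes, expected: str) -> bool:
    """Recompute the checksum named by the prefix and compare."""
    algo, _, _ = expected.partition(":")
    if algo == "md5":
        return md5_checksum(data) == expected
    if algo == "ad":
        return adler32_checksum(data) == expected
    raise ValueError("unknown checksum prefix: %r" % algo)
```

Since DQ2 only brokers third-party transfers, a check like `verify` would have to run where the data is readable, at the sites or in production jobs, with the catalogue merely storing the prefixed string.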

8 Subscription lifetime
Increasingly important…
–Would clean up what no one is cleaning up now… (some sites with O(100K) files in impossible situations)
Discussion from yesterday:
–allow waitForSources to be set only by users with production role? avoids creating looping subscriptions in the system
Forbid subscriptions for datasets with more than X files, if not a production user requesting?
Forbid more than Y subscriptions per user, if not a production user?
Ignore subscription - regardless of its state - after more than 3 months?
–Subscription is marked as broken
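The rules floated on slide 8 could be expressed as a small policy check. This is a sketch under stated assumptions: the thresholds for X and Y are placeholders (the slides give no values), "3 months" is approximated as 90 days, and the function names are hypothetical.

```python
from datetime import datetime, timedelta

MAX_FILES = 10000              # "X" on the slide; placeholder value
MAX_SUBSCRIPTIONS = 50         # "Y" on the slide; placeholder value
MAX_AGE = timedelta(days=90)   # "more than 3 months", approximated


def may_subscribe(is_production: bool,
                  n_files: int,
                  n_user_subscriptions: int,
                  wait_for_sources: bool = False) -> bool:
    """Return True if a new subscription request would be accepted."""
    if is_production:
        return True
    if wait_for_sources:
        # Only production users may set waitForSources.
        return False
    if n_files > MAX_FILES:
        return False
    if n_user_subscriptions >= MAX_SUBSCRIPTIONS:
        # One more would exceed Y subscriptions for this user.
        return False
    return True


def is_broken(created: datetime, now: datetime) -> bool:
    """A subscription older than MAX_AGE is marked broken, whatever its state."""
    return now - created > MAX_AGE
```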

9 Central catalogues [ as mentioned yesterday ]
Main changes are:
–for scalability only…
–dropping VUIDs (becomes DUID + version number)
–DUID becomes timestamp-oriented UUID so that backend is partitioned in time, with highly optimized UUID storage on ORACLE
–meaning shorter index
ORACLE partitioning, redirect service…
–.. but fully backward compatible with 0.3 clients
Many queries become much faster
–list files in dataset is a query by DUID as opposed to a query by N VUIDs
–ORACLE IOTs guarantee that listing files from a dataset [version] reads close-to-sequential blocks on disk
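To illustrate why a timestamp-oriented UUID lends itself to partitioning in time, the sketch below uses Python's standard version-1 (time-based) UUIDs. It is an assumption that this resembles the planned DUID scheme; the slides do not give the actual format, and the monthly partition key and the function names are hypothetical.

```python
import uuid
from datetime import datetime, timezone

# 100-ns intervals between the UUID epoch (1582-10-15) and the Unix epoch.
_GREGORIAN_TO_UNIX = 0x01B21DD213814000


def new_duid() -> uuid.UUID:
    # Version-1 UUIDs embed a 60-bit creation timestamp.
    return uuid.uuid1()


def duid_time(duid: uuid.UUID) -> datetime:
    """Recover the creation time embedded in a version-1 UUID."""
    if duid.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    seconds = (duid.time - _GREGORIAN_TO_UNIX) / 1e7
    return datetime.fromtimestamp(seconds, tz=timezone.utc)


def partition_key(duid: uuid.UUID) -> str:
    # Hypothetical monthly partitioning derived from the DUID alone.
    return duid_time(duid).strftime("%Y-%m")


def dataset_version_key(duid: uuid.UUID, version: int):
    # A dataset version is addressed as (DUID, version), replacing the VUID.
    return (duid, version)
```

Because the partition follows from the identifier itself, a lookup by DUID can be routed to a single time partition without consulting another index.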

10 Location catalogue [ as mentioned yesterday ]
Location catalogue will be populated asynchronously with:
–information on missing files
–(re)marking complete/incomplete locations for existing datasets - consistency
–Missing files are extra information made available on a ‘best-effort’ basis to the users, derived from a request by Ganga
This is populated by the ‘tracker’ service
–Which was being reworked for the site services
–The tracker service is a ‘stronger’ Fetcher (as existing on the site services), used to find content on site vs. content missing on site - one of the site services' performance bottlenecks
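At its core, the "content on site vs. content missing on site" comparison from slide 10 is a set difference. The sketch below shows only that idea; the function name and the returned dictionary layout are hypothetical, and a real tracker would obtain both file lists from catalogue queries.

```python
from typing import Dict, Iterable


def track(dataset_files: Iterable[str], site_files: Iterable[str]) -> Dict:
    """Compare a dataset's file list with what a site actually holds."""
    expected = set(dataset_files)
    present = expected & set(site_files)
    missing = expected - present
    return {
        "complete": not missing,        # location is complete if nothing is missing
        "present": sorted(present),
        "missing": sorted(missing),     # best-effort extra information
    }
```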

11 Dashboard
Relatively big update coming soon
–distinguish errors source/destination
–display messages on the dashboard for all sites
–alarms supported
–more overview of site services state from a central place, e.g. states of files (based also on new site services monitoring)

12 ToA
More and more info there…
Blacklist/whitelist
Preferred site connections
This is a cache file, same style as ToA
–but independent file from ToA cache since it is more dynamic
ToA renewal much stronger
–I’d claim it is the most reliable info system so far on the Grid :-)

13 Communication…
… still not working:
–e.g. did not recognize staging as a problem
–e.g. 0.3.2 apparently not deployed on OSG T2s
quite bad, as 0.3.1 had a simple bug where agents could simply die whenever a glitch happened in the central catalogue connection
–glitches are “common” given the central catalogue request rate, but harmless and ok to retry
… what to do here?
Jabber chatroom :-)
–ddmdev@conference.jabber.org
–ask me - msbranco@gmail.com or atlas-dq2-dev@cern.ch - to be authorized
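Slide 13 notes that central catalogue glitches are harmless and ok to retry. A minimal retry wrapper might look like the sketch below; it is not the 0.3.2 fix, and `CatalogueGlitch`, `call_with_retry` and the back-off values are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class CatalogueGlitch(Exception):
    """Hypothetical marker for a transient central catalogue error."""


def call_with_retry(request: Callable[[], T],
                    attempts: int = 5,
                    delay: float = 1.0,
                    sleep: Callable[[float], None] = time.sleep) -> T:
    """Call `request`, retrying on CatalogueGlitch with exponential back-off."""
    if attempts < 1:
        raise ValueError("attempts must be >= 1")
    for attempt in range(attempts):
        try:
            return request()
        except CatalogueGlitch:
            if attempt == attempts - 1:
                raise  # give up, but let the caller decide what to do
            sleep(delay * 2 ** attempt)
    raise AssertionError("unreachable")
```

The point of the wrapper is that an agent survives a transient error instead of dying; only a persistent failure propagates to the caller.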
