Management of User Requested Data in US ATLAS
Armen Vartapetian, University of Texas, Arlington
US ATLAS Distributed Facility Workshop, UC Santa Cruz, November 14, 2012
Slide 2: Outline
- User Analysis Output
- Central Deletion Service
- Victor
- USERDISK Cleanup
- Monitoring and Notifications
- DaTRI
- LOCALGROUPDISK Policy
Slide 3: Storing User Analysis Output
- User analysis output in the US is stored in the USERDISK of the site where the job ran
- Only US sites have USERDISKs; at non-US sites the output destination is SCRATCHDISK
- The US has a specific policy for USERDISK maintenance/cleanup, more relaxed and user-friendly than for SCRATCHDISK (details later)
- Both space tokens are temporary storage, but users can subscribe their data to other locations using the DaTRI request system (details later)
- Typical destinations for user data via DaTRI requests are LOCALGROUPDISK or GROUPDISK for longer-term storage, or SCRATCHDISK for further temporary storage
- Datasets in LOCALGROUPDISK or GROUPDISK have no lifetime limit by default, so these space tokens (unlike some other space tokens) are not cleaned up on a regular basis
Slide 4: Central Deletion Service
- Cleanup of all space tokens is carried out through the central deletion service
- The basic command to submit a dataset for deletion is dq2-delete-replicas; it submits the dataset deletion to the central deletion service, which puts it in the queue right away
- The deletion service flow for a dataset is: ToDelete -> Waiting -> Resolved -> Queued -> Deleted; the ToDelete -> Deleted progression is also shown for the file count and for the space, along with any errors
- The typical deletion rate for US sites is currently 2-4 Hz for the T2s and 7-8 Hz for the T1
- The deletion rate can be changed/optimized by tweaking site-specific parameters in the deletion service configuration file
- Load, bottlenecks, and other SRM issues can cause timeouts, reduce the deletion rate, and produce errors
- If a site has more than 100 errors in 4 hours, the ADCoS shifter must file a GGUS ticket
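The shifter alarm rule above can be sketched as a small check; the function name and the shape of the input (a list of error timestamps) are illustrative assumptions, not the deletion service's real interface.

```python
from datetime import datetime, timedelta

# Alarm rule from the slide: more than 100 deletion errors at a site
# within 4 hours means the ADCoS shifter must file a GGUS ticket.
ERROR_THRESHOLD = 100
WINDOW = timedelta(hours=4)

def needs_ggus_ticket(error_times, now):
    """True if strictly more than 100 errors fall within the last 4 hours."""
    recent = [t for t in error_times if now - t <= WINDOW]
    return len(recent) > ERROR_THRESHOLD
```

A burst of 150 errors inside the window triggers the alarm, while errors older than 4 hours do not count.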
Slide 5: Cleanup Decision - Victor
- Daily monitoring of the space tokens, to detect low space availability and trigger space cleanup, is done by a system called Victor
- Victor takes care of only those space tokens which need regular cleanup
- It prepares a list of datasets to be sent to the central deletion service; a grace period of 1 day is applied
- SCRATCHDISK: cleanup is triggered when free space falls below 55%
- DATADISK: cleanup is triggered when free space gets low; only "secondary" datasets older than 15 days are selected, and dataset popularity is taken into account
  - for T2s, cleanup is triggered when free space falls below 15%
  - for the T1, cleanup is triggered when free space falls below 750 TB
- PRODDISK: cleanup is triggered when free space falls below 12 TB; only datasets older than 31 days are selected; pandamover files also need cleanup, which is done locally
- GROUPDISK: cleanup is defined by the person responsible for the group
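The per-token thresholds above can be summarized in a minimal sketch; the function name and arguments are illustrative, not Victor's real interface, and the popularity/age filters are only noted in comments.

```python
# Sketch of Victor's cleanup triggers as listed on the slide.
def cleanup_triggered(token, tier="T2", free_pct=None, free_tb=None):
    if token == "SCRATCHDISK":
        return free_pct < 55                     # free space below 55%
    if token == "DATADISK":
        # only "secondary" datasets older than 15 days are candidates,
        # with dataset popularity taken into account (not modeled here)
        return free_tb < 750 if tier == "T1" else free_pct < 15
    if token == "PRODDISK":
        return free_tb < 12                      # datasets older than 31 days
    return False  # e.g. GROUPDISK: cleanup policy is set by the group
```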
Slide 6: USERDISK Cleanup
- USERDISK cleanup is done on average every 2 months
- We target datasets older than 2 months
- Targeted user datasets are matched to the dataset owner's DN from the dq2 catalog, and per-DN dataset lists are created
- A notification email is sent to users about the upcoming cleanup of the datasets, with a link to the list and some basic information on how to proceed if a dataset is still needed
- We maintain and use a list of DN-to-email-address associations, and regularly take care of missing/obsolete addresses
- After the notification email, users have 10 days to save the data they need
- This cleanup procedure has been in use for the last 4 years
- Very smooth operation, no complaints, users happy
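The selection step described above can be sketched as follows; the (name, owner_dn, created) record layout is an assumption for illustration, not the dq2 catalog's real schema.

```python
from datetime import datetime, timedelta
from collections import defaultdict

# Pick datasets older than 2 months and group them per owner DN,
# as in the USERDISK cleanup procedure described on the slide.
def stale_datasets_by_dn(datasets, now, max_age=timedelta(days=60)):
    per_dn = defaultdict(list)
    for name, owner_dn, created in datasets:
        if now - created > max_age:
            per_dn[owner_dn].append(name)
    return dict(per_dn)
```

Each resulting per-DN list is what would be linked in the notification email.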
Slide 7: USERDISK Cleanup Notification
- The question is whether the user is well informed about all available options to save the data targeted for deletion
- Excerpt from the notification email with the information for users:

"You are advised to save any dataset which is still of interest to your private storage area. You may also use your local group disk storage area xxx_LOCALGROUPDISK, if such an area has been defined. Please contact your local T1/T2/T3 responsible for disk storage for further assistance. If the list contains datasets of common interest to a particular physics group, please contact that group's representative to move your datasets to the xxx_ATLASGROUPDISK area. If you are going to copy your dataset to xxx_LOCALGROUPDISK or xxx_ATLASGROUPDISK, please use the Subscription Request page: http://panda.cern.ch:25980/server/pandamon/query?mode=ddm_req If you are going to copy your dataset to any private storage area (not known to the grid), please use dq2-get. See this link for help: https://twiki.cern.ch/twiki/bin/view/Atlas/DQ2ClientsHowTo"

- This should cover all the practical options
Slide 8: Storage Monitoring, Notifications
- Storage monitoring from the DDM group: http://bourricot.cern.ch/dq2/accounting/site_reports/USASITES/
- Drop-down menus provide other storage tables and plots, grouped by space token, cloud, etc.
- Notifications are also sent with the list of space tokens running low on free space, and whenever a space token runs out of space (< 0.5 TB) and is blacklisted
- Notification thresholds:
  - T1 DATADISK < 10 TB
  - T2 DATADISK < 2 TB
  - PRODDISK < 20%
  - USERDISK < 10%
  - Others < 10 TB
Slide 9: DaTRI
- Data Transfer Request Interface (DaTRI): used to submit transfer requests; also provides monitoring of the transfer status
- A request can be placed via the web interface, or automatically as the output destination of an analysis job
- All the links are available in the left bar of the Panda Monitor page, under the Datasets Distribution drop-down menu
- Users need to be registered with DaTRI; the registration link is on the main page, along with a link to check registration status
- If you are not sure, also use the opportunity to check your certificate for the usatlas role
- A DaTRI request in the web interface basically consists of a dataset pattern, a destination, and a justification for the transfer
Slide 10: DaTRI
- A submitted DaTRI request goes through the following states/stages: PENDING -> AWAITING_APPROVAL -> AWAITING_SUBSCRIPTION -> SUBSCRIBED -> TRANSFER -> DONE
- Once scheduled for approval, a request ID is assigned
- An error message is shown if the dataset pattern is incorrect, the dataset is empty, the destination site does not have enough space, the group quota at the destination site is exceeded, etc.
- Each cloud has DaTRI coordinators for manual approval; in the US these are Kaushik De and Armen Vartapetian
- Approval for GROUPDISKs is done by group representatives
- A request is approved automatically if the total size is < 0.5 TB, and only if the user has the usatlas role (a very common issue/problem)
- Monitoring also provides a link to the dashboard, as well as the replica status of each dataset
- There is a plan to provide functionality within the DaTRI web interface to upload a list/pattern of user datasets for deletion, to help users get rid of obsolete data
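The auto-approval rule above reduces to a one-line check; the names here are illustrative assumptions, not DaTRI's real code.

```python
# Auto-approval rule from the slide: a request is approved automatically
# only if its total size is under 0.5 TB and the requester's grid
# certificate carries the usatlas role (the commonly missing piece).
AUTO_APPROVE_LIMIT_TB = 0.5

def auto_approved(total_size_tb, roles):
    return total_size_tb < AUTO_APPROVE_LIMIT_TB and "usatlas" in roles
```

Anything larger, or any request from a certificate without the usatlas role, falls through to manual approval by the cloud coordinators.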
Slide 11: LOCALGROUPDISK Policy
- Intended as long-term storage for users
- An unpledged resource (the main concern at T1/T2)
- No ADC policy or recommendations for its management
- Central cleaning only for aborted and failed tasks
- The main issue is the absence of a usage and cleanup policy; because of that, it tends to grow in size
- Usage tables for some of the US LOCALGROUPDISKs are in the backup slides
- A common trend: usually there are 2-3 "super users" per site who occupy more than half of the space (there may be a group behind such a user); a dozen top users occupy more than 90% of the space, and many more users hold smaller shares
- A similar storage distribution can be seen in other clouds as well
- Part of that data may be more relevant to GROUPDISK or even DATADISK (i.e., move the data to pledged resources)
Slide 12: LOCALGROUPDISK Policy
- Some datasets have many replicas, some of them owned by the same top users; the situation will become unsustainable if the number of such top users grows over time
- Some datasets have only one replica, and a big chunk of that data has not been used for a while; put in place a policy/path for their retirement
- Popularity analysis may help identify datasets which may be obsolete and are candidates for retirement
- We may start with a soft space limit of 2-3 TB per user per site, and start asking questions when usage is above that
- In particular, for datasets not used for N months (1 year?), check whether the user still needs them
- An approval mechanism for sample transfers > N TB (10 TB?): centralized approval and space-allocation decisions for big samples
- A LOCALGROUPDISK management policy is currently under discussion at the RAC
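The proposed soft limit could be enforced with a check like the following; the 3 TB figure is taken from the 2-3 TB range suggested above, and everything here (names, data layout) is an illustrative sketch of a policy that is still under discussion.

```python
# Flag users whose total LOCALGROUPDISK usage at a site exceeds the
# proposed soft limit; these are the users to "start asking questions" of.
SOFT_LIMIT_TB = 3.0

def users_over_soft_limit(usage_tb_by_user, limit_tb=SOFT_LIMIT_TB):
    return sorted(u for u, tb in usage_tb_by_user.items() if tb > limit_tb)
```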
Slide 13: BACKUP
Slide 14: BNL LOCALGROUPDISK, used space 196 TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=david adams 407137 | 53 | 695
/dc=org/dc=doegrids/ou=people/cn=anyes taffard 365111 | 37 | 767
/dc=org/dc=doegrids/ou=people/cn=andrew haas 477621 | 24 | 7959
/dc=org/dc=doegrids/ou=people/cn=caleb lampen 137475 | 21 | 26312
/dc=org/dc=doegrids/ou=people/cn=shuwei ye 481005 | 11 | 832
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov | 7 | 380
/c=uk/o=escience/ou=manchester/l=hep/cn=john almond | 5 | 777
/dc=org/dc=doegrids/ou=people/cn=jacob searcy 585618 | 5 | 428
/dc=org/dc=doegrids/ou=people/cn=vivek jain 39104 | 4 | 146
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 3 | 136
/dc=org/dc=doegrids/ou=people/cn=tarrade fabien 615936 | 2 | 120
/dc=org/dc=doegrids/ou=people/cn=stephanie majewski 989915 | 2 | 12
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik 292404 | 2 | 37
Total for Top 13 Users (used space > 2 TB, listed above): 176 TB
Total for Remaining 35 Users (used space < 2 TB): 20 TB
Total Used Space: 196 TB
Slide 15: SLAC LOCALGROUPDISK, used space 355 TB

User DN | Used Space (TB) | # of Datasets
/dc=ch/dc=cern/ou=organic units/ou=users/cn=eifert | 122 | 5048
/dc=ch/dc=cern/ou=organic units/ou=users/cn=toshi | 68 | 352
/dc=org/dc=doegrids/ou=people/cn=anyes taffard 365111 | 44 | 1637
/dc=org/dc=doegrids/ou=people/cn=brokk toggerson 918086 | 21 | 600
/dc=org/dc=doegrids/ou=people/cn=andrew haas 477621 | 20 | 5067
/dc=org/dc=doegrids/ou=people/cn=steven andrew farrell 628960 | 17 | 1489
/dc=org/dc=doegrids/ou=people/cn=jason veatch 421088 | 15 | 362
/dc=org/dc=doegrids/ou=people/cn=michael werth 340844 | 9 | 165
/dc=org/dc=doegrids/ou=people/cn=bart clayton butler 62122 | 6 | 138
/dc=org/dc=doegrids/ou=people/cn=alaettin serhan mete 462708 | 5 | 77
/dc=org/dc=doegrids/ou=people/cn=david wilkins miller 359945 | 5 | 555
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr. 669916 | 3 | 56
/dc=org/dc=doegrids/ou=people/cn=venkatesh kaushik 292404 | 3 | 32
/dc=org/dc=doegrids/ou=people/cn=maximilian swiatlowski 759645 | 3 | 966
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 42
Total for Top 15 Users (used space > 2 TB, listed above): 343 TB
Total for Remaining 19 Users (used space < 2 TB): 12 TB
Total Used Space: 355 TB
Slide 16: MWT2+ILLINOISHEP LOCALGROUPDISK, used space 302 TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=samuel meehan 301165 | 140 | 631
/dc=org/dc=doegrids/ou=people/cn=david lesny 786524 | 34 | 1358
/dc=org/dc=doegrids/ou=people/cn=frederick luehring 621522 | 26 | 1310
/dc=org/dc=doegrids/ou=people/cn=anton kapliy 714928 | 23 | 1387
/dc=org/dc=doegrids/ou=people/cn=jordan scott webster 343989 | 20 | 1012
/c=uk/o=escience/ou=oxford/l=oesc/cn=maria fiascaris | 16 | 1325
/dc=org/dc=doegrids/ou=people/cn=antonio boveia 203522 | 15 | 1076
/dc=org/dc=doegrids/ou=people/cn=constantinos melachrinos 366868 | 6 | 432
/c=ru/o=rdig/ou=users/ou=mephi.ru/cn=mikhail titov | 3 | 12
/dc=org/dc=doegrids/ou=people/cn=robert w. gardner jr. 669916 | 2 | 44
/dc=org/dc=doegrids/ou=people/cn=joseph tuggle 107765 | 2 | 155
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 43
/dc=org/dc=doegrids/ou=people/cn=elizabeth jue hines 745833 | 2 | 1
Total for Top 13 Users (used space > 2 TB, listed above): 291 TB
Total for Remaining 20 Users (used space < 2 TB): 11 TB
Total Used Space: 302 TB
Slide 17: AGLT2 LOCALGROUPDISK, used space 238 TB

User DN | Used Space (TB) | # of Datasets
/dc=org/dc=doegrids/ou=people/cn=haijun yang 938003 | 204 | 4739
/dc=org/dc=doegrids/ou=people/cn=shawn mckee 83467 | 10 | 496
/dc=ch/dc=cern/ou=organic units/ou=users/cn=lxu | 9 | 9
/c=il/o=iucc/ou=tau/cn=nir amram | 4 | 8
/dc=org/dc=doegrids/ou=people/cn=douglas benjamin 438916 | 2 | 40
Total for Top 5 Users (used space > 2 TB, listed above): 229 TB
Total for Remaining 18 Users (used space < 2 TB): 9 TB
Total Used Space: 238 TB