1 Report of Dubna discussion
May 2013

2 Content
- dCache and Lustre
- Site monitoring
- Dataset management
- Solution of SE for remote sites
- Commissioning tests
- Plan of simulation and reconstruction chain in remote sites
- Introduce DYBII
- Visit to the computing center
- Others

3 dCache and Lustre (1)
Purpose:
- Lustre is used for local access and dCache is used as an interface for remote access
- Find an efficient way to simplify data exchange between Lustre and dCache
Solutions discussed for connecting dCache and Lustre:
- Treat Lustre as a kind of MSS (tape storage) backend and use the same infrastructure to load data from Lustre
- Prepare a script to stage files in/out between Lustre and the dCache disk pool (see the sketch below)
- Check the mapping options of Enstore and configure it to map PNFS names to local filenames
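A minimal sketch of such a stage-in/out script, assuming the dCache HSM-script convention in which the pool invokes an external executable with an operation name, a PNFS ID, and the pool-local file path; the Lustre staging area and its layout are assumptions:

```python
#!/usr/bin/env python3
# Hypothetical stage-in/out helper between Lustre and a dCache disk pool.
# Assumes the dCache HSM-script convention: the pool calls
#   <script> get|put <pnfsid> <pool-file-path> [options...]
# Only get/put are handled here; LUSTRE_ROOT and the two-level hashed
# directory layout are assumptions.
import os
import shutil
import sys

LUSTRE_ROOT = "/lustre/dcache-store"  # assumed staging area on Lustre

def lustre_path(pnfsid):
    # Spread files over subdirectories using the PNFS ID prefix.
    return os.path.join(LUSTRE_ROOT, pnfsid[:2], pnfsid[2:4], pnfsid)

def put(pnfsid, poolfile):
    # Stage out: copy a file from the dCache pool onto Lustre.
    dst = lustre_path(pnfsid)
    os.makedirs(os.path.dirname(dst), exist_ok=True)
    shutil.copy2(poolfile, dst)

def get(pnfsid, poolfile):
    # Stage in: copy a file from Lustre back into the dCache pool.
    src = lustre_path(pnfsid)
    if not os.path.isfile(src):
        sys.exit(1)  # non-zero exit reports the failure to the pool
    shutil.copy2(src, poolfile)

if __name__ == "__main__":
    op, pnfsid, poolfile = sys.argv[1], sys.argv[2], sys.argv[3]
    {"get": get, "put": put}[op](pnfsid, poolfile)
```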

4 dCache and Lustre (2)
How can dCache learn about data that is already in Lustre or is written to Lustre independently of the local farm?
- Check the capabilities and interface of the dCache recovery tools and write a script that updates the Chimera database
- Or should writing be allowed only through the dCache interface? There is no real practice yet.
How do we handle authentication and authorization for access to the Lustre data?
- User management differs: grid users vs. local users
- The dCache gPlazma service allows all possible kinds of mapping between Grid certificate DNs and local uids
- To avoid long mapping lists, we need to understand the safety requirements of BESIII and of the IHEP CC and elaborate mapping rules and access rights (see the sketch below)
- This technique is available
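A minimal sketch of the rule-based mapping idea, showing how a few ordered rules could replace a long one-to-one DN list; the patterns and account names are made up, and in production this logic would live in gPlazma's configuration, not in a standalone script:

```python
# Hypothetical rule-based mapping from Grid certificate DNs to local uids.
# Illustrates the "short rule list instead of long mapping list" idea only.
import re

# Ordered rules: the first matching pattern wins. All patterns/accounts
# below are invented examples.
MAPPING_RULES = [
    (re.compile(r"/DC=cn/DC=ihep/.*Role=production"), "besprod"),
    (re.compile(r"/DC=cn/DC=ihep/"), "besuser"),
    (re.compile(r"/DC=ru/DC=jinr/"), "besuser"),
]

def map_dn(dn):
    """Return the local account for a certificate DN, or None if denied."""
    for pattern, account in MAPPING_RULES:
        if pattern.search(dn):
            return account
    return None  # no rule matched: access denied

print(map_dn("/DC=cn/DC=ihep/OU=PHYS/CN=Some User"))  # -> besuser
```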

5 dCache and Lustre (3)
Members:
- Vladimir, the dCache administrator, is currently responsible for the dCache and Enstore systems of the CMS T1
- He is busy until September. Later in autumn we may consider closer collaboration, including the possibility of a short visit by Vladimir to Beijing or a visit by IHEP specialists to JINR
- Who from IHEP will cooperate with him?

6 Site Monitoring (1)
Types of monitoring needed:
- Service monitoring: CVMFS, squid, FTS, DIRAC, DFC
- Site monitoring (from test jobs): basic facilities, e.g. BOSS deployment, storage, ...
- Job monitoring (from DIRAC job monitoring): job counts by state (pending, running, failed, completed)
- Storage monitoring: space info from the DFC (physical, used, available); consistency between the DFC and the SE (see the sketch below)
No real solution has been decided on for counting physical space; it is not easy to do it the LCG way.
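A minimal sketch of the DFC/SE consistency check, assuming we can dump the list of replicas the DFC believes a site holds and the list of files actually present on the SE; both dump helpers below are placeholders:

```python
# Hypothetical consistency check between the DFC and an SE: files known to
# the catalog but missing on storage ("lost"), and files on storage that
# the catalog does not know about ("dark data"). The two dump helpers are
# stand-ins for a real DFC query and an SE namespace listing.
def dump_dfc_replicas(se_name):
    return {"/bes/data/run1/a.root", "/bes/data/run1/b.root"}  # stand-in

def dump_se_namespace(se_name):
    return {"/bes/data/run1/a.root", "/bes/data/run1/c.root"}  # stand-in

def check_consistency(se_name):
    in_dfc = dump_dfc_replicas(se_name)
    on_se = dump_se_namespace(se_name)
    lost = in_dfc - on_se   # registered but missing on storage
    dark = on_se - in_dfc   # on storage but not registered
    return lost, dark

lost, dark = check_consistency("JINR-SE")
print("lost:", sorted(lost))
print("dark:", sorted(dark))
```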

7 Site Monitoring (2)
Data transfer monitoring:
- Transfer rate, status (good, failed, active), transfer volume
Popularity monitoring:
- Count dataset popularity, which will drive decisions on data replication and deletion
Infrastructure:
- Information from the various systems (DIRAC, DFC, data transfer system, ...) will be adapted and summarized into one database
- Resources -> Adapters -> Database -> Web framework and portal (see the sketch below)
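A minimal sketch of the Resources -> Adapters -> Database -> Portal chain, with sqlite3 standing in for the summary database; the adapters and their numbers are invented examples:

```python
# Hypothetical monitoring pipeline: each adapter pulls numbers from one
# source system (DIRAC, DFC, transfer system, ...) and normalizes them
# into (source, metric, value) rows of one summary database, which the
# web portal would then read. sqlite3 stands in for the real database.
import sqlite3
import time

def dirac_adapter():
    # Stand-in for a query to the DIRAC job monitoring service.
    return [("dirac", "jobs_running", 120), ("dirac", "jobs_failed", 4)]

def dfc_adapter():
    # Stand-in for a space query against the DFC.
    return [("dfc", "used_space_tb", 42.0)]

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE metrics(ts REAL, source TEXT, name TEXT, value REAL)")

now = time.time()
for adapter in (dirac_adapter, dfc_adapter):
    for source, name, value in adapter():
        db.execute("INSERT INTO metrics VALUES (?,?,?,?)", (now, source, name, value))

# The portal side would read summaries like this:
for row in db.execute("SELECT source, name, value FROM metrics"):
    print(row)
```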

8 Similar to the LCG one

9 Site Monitoring (3)
Questions and discussion:
- Do we need the ATLAS T3 monitoring?
- Is the CERN dashboard workable for BESIII site monitoring?
- What about applying Hadoop and MapReduce to monitoring? This is considered one of the hot topics in the monitoring area
- How do we do physical storage monitoring?
Members:
- The Computing Center showed interest, and one person will take part in the monitoring work

10 Dataset Management (1)
Why do we need the check table? That is, in which cases will the logical info of datasets be changed?
- Case 1: an SE hardware failure removes some files from the FC
- Case 2: an SQL query returns different results on different platforms, Python versions, etc.
- Case 3: some LFNs may be deleted by mistake
How do we maintain the check table on deletion when different datasets overlap?
- Several solutions were discussed, but only one is workable (see the next slide)

11 Dataset Management (2)
The solution:
- A metadata flag ("standard" vs. "user") distinguishes official and user datasets
- Rule: users can delete only user data
- For production datasets, we need to check for overlaps
How do we check for overlaps?
- Compare the properties of the datasets (their query conditions) to find the overlapping part
- We keep the overlapping part and delete only the dataset ID (see the sketch below)
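A minimal sketch of the overlap check, assuming each dataset is defined by a metadata query condition that can be resolved to a set of LFNs; resolve_query() is a placeholder for a DFC metadata query:

```python
# Hypothetical overlap check before deleting a production dataset: resolve
# each dataset's query condition to its LFNs, and only physically delete
# files not shared with any surviving dataset; the overlapping part is
# kept and only the dataset record (ID) is dropped.
def resolve_query(condition):
    # Stand-in for a DFC metadata query; the tiny catalog is made up.
    catalog = {
        "/bes/run1/a.root": {"round": "round1", "type": "dst"},
        "/bes/run1/b.root": {"round": "round1", "type": "dst"},
        "/bes/run2/c.root": {"round": "round2", "type": "dst"},
    }
    return {lfn for lfn, md in catalog.items()
            if all(md.get(k) == v for k, v in condition.items())}

def files_safe_to_delete(doomed_cond, surviving_conds):
    doomed = resolve_query(doomed_cond)
    kept = set().union(*(resolve_query(c) for c in surviving_conds))
    return doomed - kept  # delete physically only the non-shared files

deletable = files_safe_to_delete(
    {"round": "round1"},  # dataset being deleted
    [{"type": "dst"}],    # an overlapping surviving dataset
)
print(sorted(deletable))  # empty here: all files shared, drop only the ID
```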

12 Dataset Management (3)
How can local jobs use the same FC as grid jobs?
- Local data files can be registered in the FC via the API (see the sketch below), or via the dCache interface
- Can local jobs retrieve dataset info and the files with or without a certificate?
With a certificate:
- Local jobs can access the DFC without problems
- Ganga helps access the data
- Local data can be read and registered in the DFC
- But is it a burden for normal users to have a certificate?
Without a certificate:
- At present the DFC can be queried without a certificate; Alexey considers this a bug
- Ganga can map an SE address to a local address to access the Lustre data
- But there are problems registering data in the DFC
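A minimal sketch of registering a local file through the DFC API with the DIRAC FileCatalogClient; a valid grid proxy is required, and the LFN, PFN, SE name, and file metadata below are made-up examples:

```python
# Sketch of registering a local (Lustre) file in the DFC via the DIRAC
# client API. Requires a valid proxy; all concrete values are examples.
from DIRAC.Core.Base import Script
Script.parseCommandLine(ignoreErrors=True)  # initialize the DIRAC client

from DIRAC.Resources.Catalog.FileCatalogClient import FileCatalogClient

fc = FileCatalogClient()
lfn = "/bes/user/d/demo/run1/a.root"
result = fc.addFile({lfn: {
    "PFN": "srm://se.example.org/bes/user/d/demo/run1/a.root",
    "SE": "IHEP-SE",
    "Size": 1048576,
    "GUID": "11111111-2222-3333-4444-555555555555",
    "Checksum": "ad32e101",
}})
if not result["OK"]:
    print("registration failed:", result["Message"])
```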

13 Solution of SE for Remote Sites
BeStMan is chosen as the current SE solution for remote sites; DIP could be another option.
Detailed deployment will be taken care of by Alexey:
- prepare a host for the installation
- apply for a host certificate for this host
- install the GridFTP server
- install the BeStMan server with the SRM protocol
- configure BeStMan to support user mapping for the BES VO

14 Commissioning Tests (1)
Purposes:
- Push forward the sites' progress on SE setup and on the simulation and reconstruction chain
Rules:
- Run the tests separately for each site
- Start from JINR, which will exercise both dCache and BeStMan; BeStMan is the solution for remote sites, and Alexey has set it up for technical support and testing
Steps:
(1) Data transfer of a test file
- via srmcp (see the sketch below)
- via the DIRAC FTS commands
- via FTS
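A minimal sketch of the single-file transfer test, driving srmcp from Python; the SRM endpoint, port, and SFN path are hypothetical, and srmcp itself comes from the SRM client tools:

```python
# Sketch of the srmcp test-file transfer for commissioning. The endpoint
# and paths are made-up examples; a non-zero exit code marks a failure.
import subprocess

SRC = "file:////tmp/commissioning-testfile"
DST = ("srm://se.example.org:8443/srm/v2/server?SFN="
       "/bes/user/test/commissioning-testfile")

def srmcp_test():
    # -2 selects SRM v2.2.
    rc = subprocess.call(["srmcp", "-2", SRC, DST])
    print("transfer", "OK" if rc == 0 else "FAILED")

srmcp_test()
```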

15 Commissioning Tests (2)
(2) Data transfer of a dataset
- Create datasets from the DFC
- Test via the data transfer system
- Set up regular transfer tests between IHEP and the site for a daily check
(3) Random trigger files deployed to remote sites
- Use transfers between SEs. Deploying the files to the local file system could be another solution, but Alexey does not agree with it because it would add complexity to the job management system
(4) Simulation and reconstruction chain job tests
- Do a statistical physics check on the results from each site after the jobs complete (see the sketch below)
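One possible form of the statistical check in step (4) is a chi-square comparison of a binned distribution produced at a remote site against the IHEP reference; the variable, binning, and "consistent if chi2/ndf is near 1" criterion are all assumptions, not something fixed in the discussion:

```python
# Minimal sketch of a per-site statistical check: compare a histogram of
# some reconstructed quantity from a remote site with the IHEP reference.
import numpy as np

def chi2_per_dof(reference, remote):
    """Chi-square per degree of freedom between two histograms with the
    same binning, assuming Poisson errors on the bin counts."""
    ref = np.asarray(reference, dtype=float)
    rem = np.asarray(remote, dtype=float)
    mask = (ref + rem) > 0  # skip empty bins
    chi2 = np.sum((ref[mask] - rem[mask]) ** 2 / (ref[mask] + rem[mask]))
    return chi2 / max(int(mask.sum()) - 1, 1)

# Toy example: two samples of the same quantity on a shared 50-bin grid.
rng = np.random.default_rng(0)
ref_hist, edges = np.histogram(rng.normal(3.1, 0.1, 10000), bins=50)
rem_hist, _ = np.histogram(rng.normal(3.1, 0.1, 10000), bins=edges)
print("chi2/ndf =", chi2_per_dof(ref_hist, rem_hist))  # ~1 means consistent
```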

16 Plan of Simulation and Reconstruction Chain in Remote Sites (1)
Deployment of random trigger files to remote sites:
(1) Partial distribution can be done based on each site's SE space
(2) Deploy and register the random trigger files as datasets in the DFC
How do we get the physical address of the random trigger files into the jobOptions?
a) Get the path and file names from BesMixer, the package that defines the random trigger path
b) Replace that path with the real path where each site downloaded the files from the DFC
c) Something needs to be fixed by Alexey in BesMixer to enable replacing the whole path of the random trigger files. Currently, if you specify the path "/tmp" in the jobOptions, the replacement happens as follows: e.g. /besfs/aaa/round1/xxx.root -> /tmp/round1/xxx.root (see the sketch below)
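A minimal sketch of the two replacement behaviours, using the slide's own example; the helper names and the "last two path components" heuristic for the current behaviour are assumptions about what BesMixer does:

```python
# Sketch of the random-trigger path replacement in jobOptions.
def replace_path_current(orig, local_root="/tmp"):
    # Current behaviour: keep the trailing "roundN/file" part and graft it
    # onto the local root: /besfs/aaa/round1/xxx.root -> /tmp/round1/xxx.root
    tail = "/".join(orig.split("/")[-2:])
    return f"{local_root}/{tail}"

def replace_path_full(orig, site_path):
    # Intended behaviour after the BesMixer fix: replace the whole path
    # with wherever the site downloaded the file from the DFC.
    filename = orig.rsplit("/", 1)[-1]
    return f"{site_path}/{filename}"

print(replace_path_current("/besfs/aaa/round1/xxx.root"))
# -> /tmp/round1/xxx.root
print(replace_path_full("/besfs/aaa/round1/xxx.root", "/data/bes/randomtrg"))
# -> /data/bes/randomtrg/xxx.root
```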

17 Plan of Simulation and Reconstruction Chain in Remote Sites (2)
3. How does Ganga know in advance which random trigger files are needed? It must download them to the local disk of the WNs before BOSS runs.
(1) A script is needed to tell Ganga which random trigger files have to be downloaded before the jobs are submitted
(2) Ganga fetches these files to the local disk of the WN (e.g. /tmp) with the DFC "get" command (see the sketch below)
(3) The BOSS jobOptions will then point to the real path of the random trigger files
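A minimal sketch of the pre-download step such a wrapper could run on the worker node; the LFN is a made-up example, and dirac-dms-get-file (the standard DIRAC download command) writes into its working directory:

```python
# Sketch of fetching the needed random trigger files to the WN local disk
# before BOSS starts. The LFN and destination are example values.
import subprocess

def fetch_random_trigger(lfns, dest="/tmp"):
    for lfn in lfns:
        # Run in `dest` so each file lands on the WN local disk there.
        subprocess.check_call(["dirac-dms-get-file", lfn], cwd=dest)

fetch_random_trigger(["/bes/randomtrg/round1/xxx.root"])
```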

18 Introduce DYBII to JINR
- Material provided by Ziyan Deng
- There is a group at JINR working on DYB physics
- They are more interested in the framework

19 Visit of Computing Center (1)
Their future contributions to BESIII distributed computing:
- Good relationship between CERN and the JINR CC, since many people working at CERN originally come from there
- Good experience in site monitoring, since many people from JINR work on monitoring, e.g. CMS monitoring, the IT dashboard, and ATLAS T3 monitoring
- Currently interested in "dCache and Lustre" and "site monitoring"; possible help on "badger and dataset" issues as well
- They warmly welcome us to NEC'2013 (International Symposium on Nuclear Electronics & Computing), held in Bulgaria
- The relationship needs to be strengthened through official visits and an agreement between BESIII and the JINR computing center, covering e.g. financial support
- Cloud technology is also studied there; a simple cloud infrastructure based on OpenNebula has been set up for development, testing, and education purposes

20 Visit of Computing Center (2)
Visit to the machine room of the computing center:
- Impressive to see the exhibition of many very old machines, disks, and tapes
- Large machine room; only half of the rooms are occupied
- They plan to become a CMS T1, and the testbed is ready
- dCache will be extended with an Enstore tape backend
- The farm and the grid use one PBS scheduler and one dCache SE
- They use an air conditioning system similar to that of our computing center
- They have a 10 Gb/s connection to Moscow, which is a good backbone to CERN

21 Others
- Production manager
- Accounting of BESIII data transfers and the FTS time clock

