Published by Diane O’Neal. Modified over 9 years ago.
1
ORB Meeting, July 11, 2001
2
Outline
- SAM overview and station description
- Resource management
  - Station cache
  - Station prioritized fair share job control
  - File storage server
- Setup and administration
  - Station server
  - File storage server
- Current most active stations, viewing statistics, station specialization
3
Overview of SAM
[Diagram]
Components: Database Server, Name Server, Global Resource Manager(s), Log Server, Station 1 Servers, Station 2 Servers, Station 3 Servers, Station n Servers, Mass Storage System(s)
Legend: Shared Globally, Local, Shared Locally
Arrows indicate control and data flow.
4
Components of a SAM Station
[Diagram]
Components: Station & Cache Manager, File Storage Server, File Stager(s), Project Managers, Producers/Consumers, eworkers, File Storage Clients, Cache Disk, Temp Disk, MSS or Other Station (shown twice)
Legend: Data flow, Control
5
Resource Management: Cache
- Level: station; established for particular groups
- Parameters include:
  - Size of cache
  - Size of cache allowed to be “locked”
  - Refresh algorithm: LRU, FIFO, MRU, etc.
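To make the three named refresh algorithms concrete, here is a minimal toy sketch of a size-limited cache with a lock quota. It is not SAM code: the class `StationCache`, its methods, and the parameter names `max_kb` and `max_locked_kb` are hypothetical, and the rule that locked files are never evicted is an assumption.

```python
from collections import OrderedDict


class StationCache:
    """Toy cache (hypothetical, not the SAM implementation)."""

    POLICIES = ("LRU", "FIFO", "MRU")

    def __init__(self, max_kb, max_locked_kb, policy="LRU"):
        if policy not in self.POLICIES:
            raise ValueError(f"unknown policy: {policy}")
        self.max_kb = max_kb                # size of cache
        self.max_locked_kb = max_locked_kb  # size of cache allowed to be locked
        self.policy = policy
        self.files = OrderedDict()          # name -> size in KB, oldest first
        self.locked = set()

    def used_kb(self):
        return sum(self.files.values())

    def access(self, name):
        """Record a read. FIFO ignores reads; LRU and MRU track recency."""
        if name in self.files and self.policy in ("LRU", "MRU"):
            self.files.move_to_end(name)

    def lock(self, name):
        """Pin a file, refusing if the lock quota would be exceeded."""
        if name in self.locked:
            return
        locked_kb = sum(self.files[n] for n in self.locked)
        if locked_kb + self.files[name] > self.max_locked_kb:
            raise RuntimeError("lock quota exceeded")
        self.locked.add(name)

    def _pick_victim(self):
        # Assumption: locked files are never candidates for eviction.
        candidates = [n for n in self.files if n not in self.locked]
        if not candidates:
            return None
        # LRU and FIFO evict from the old end, MRU from the new end.
        return candidates[-1] if self.policy == "MRU" else candidates[0]

    def add(self, name, size_kb):
        """Insert a file, evicting unlocked files until it fits."""
        if size_kb > self.max_kb:
            raise ValueError("file is larger than the cache")
        evicted = []
        while self.used_kb() + size_kb > self.max_kb:
            victim = self._pick_victim()
            if victim is None:
                raise RuntimeError("cache is full of locked files")
            del self.files[victim]
            evicted.append(victim)
        self.files[name] = size_kb
        return evicted


def demo(policy):
    """Fill a 30 KB cache, re-read 'a', then add a fourth file."""
    cache = StationCache(max_kb=30, max_locked_kb=10, policy=policy)
    for name in ("a", "b", "c"):
        cache.add(name, 10)
    cache.access("a")
    return cache.add("d", 10)
```

Calling `demo("LRU")`, `demo("FIFO")`, and `demo("MRU")` returns the list of evicted files under each policy, which shows how the same access history leads to different victims.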
6
Resource Management: Jobs
- Level: station; by group and access mode
- Number of concurrent projects
- Fair share algorithm
  - Each group is assigned a number.
  - The numbers are normalized so that their sum is 1.
  - Determines each job's queue assignment: sam_hi or sam_lo.
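The normalization step can be sketched as below. The slide does not say how a normalized share is turned into a queue, so `pick_queue` uses an assumed rule (a group that has used less than its share goes to sam_hi). The function names, the `usage` dictionary, and the example numbers are all hypothetical.

```python
def normalize_shares(raw):
    """Scale per-group numbers so that they sum to 1."""
    total = sum(raw.values())
    if total <= 0:
        raise ValueError("at least one group needs a positive number")
    return {group: value / total for group, value in raw.items()}


def pick_queue(group, shares, usage):
    """ASSUMED rule, not taken from the slides: a group whose fraction of
    recent usage is below its fair share is sent to sam_hi, else sam_lo."""
    total_usage = sum(usage.values())
    used_fraction = usage.get(group, 0) / total_usage if total_usage else 0.0
    return "sam_hi" if used_fraction < shares[group] else "sam_lo"


# Hypothetical input numbers and usage counts, for illustration only.
shares = normalize_shares({"group_a": 3, "group_b": 1})
usage = {"group_a": 1, "group_b": 1}
queue_a = pick_queue("group_a", shares, usage)
queue_b = pick_queue("group_b", shares, usage)
```

A group assigned 0 ends up with a normalized share of 0, so under this assumed rule it always lands in sam_lo.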
7
Station Setup
Configurable on startup
  - Min-delivery (Kbytes)
  - Preferred locations
  - Honor optimizer order
  - File release timeout
  - Max project file usage
  - Default batch system
Configurable by command
  - Adding disks
  - Adding caches
  - Configuring group allocations: caches, max size, locks, refresh algorithm, max projects, admins
8
Station Administration: Dump (1)
lueking@d0mino:~ % sam dump station --groups
*** BEGIN DUMP STATION central-analysis, id=21 running at d0mino 5 days 22 hours 24 minutes 20 seconds, admins: lueking
Known batch systems: lsf
Default batch system: lsf
No Source location is preferred
There are 1 authorized transfer groups
Full delivery unit is enforced; external deliveries are unconstrained
9
Station Administration: Dump (2)
AUTHORIZED GROUPS:
group algo: admins: cope lueking melanson terekhov veseli white, swap policy: LRU, fair share: 0, quotas (cur/max): projects = 5/50, disk: 72838247KB/100000000KB, locks:0B/30000000KB
group cal: admins: lueking terekhov veseli white, swap policy: LRU, fair share: 0, quotas (cur/max): projects = 1/10, disk: 11856085KB/78125MB, locks:0B/78125MB
group demo: admins: lueking terekhov veseli white, swap policy: LRU, fair share: 0.608163, quotas (cur/max): projects = 2/50, disk: 4867877KB/5000000KB, locks:0B/0KB
group dzero: admins: lueking melanson terekhov veseli white, swap policy: LRU, fair share: 0.142857, quotas (cur/max): projects = 10/100, disk: 499860527KB/500000000KB, locks:0B/100000000KB
group emid: admins: lueking terekhov veseli white, swap policy: LRU, fair share: 0, quotas (cur/max): projects = 0/10, disk: 6396015KB/10000000KB, locks:0B/10000000KB
group test: admins: lueking terekhov veseli white, swap policy: LRU, fair share: 0.11512, quotas (cur/max): projects = 1/20, disk: 21381359KB/26000000KB, locks:237179KB/20000000KB
group thumbnail: admins: lueking melanson schellma, swap policy: LRU, fair share: 0.13386, quotas (cur/max): projects = 0/5, disk: 20687259KB/50000000KB, locks:0B/0KB
*** END OF STATION DUMP ***
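For monitoring quotas it can help to turn one group entry of this dump into structured data. The sketch below is a hypothetical helper, not part of SAM: the names `parse_group_line` and `to_kb` are made up, the regular expression was written against the lines shown on this slide only, and the conversion 1 MB = 1024 KB is an assumption.

```python
import re

# Assumption: binary units, i.e. 1 MB = 1024 KB and 1 KB = 1024 B.
_UNIT_TO_KB = {"B": 1 / 1024, "KB": 1, "MB": 1024}

_QTY = r"\d+(?:B|KB|MB)"

GROUP_RE = re.compile(
    r"group (?P<name>\S+): admins: (?P<admins>[^,]*), "
    r"swap policy: (?P<policy>\w+), fair share: (?P<share>[\d.]+), "
    r"quotas \(cur/max\): projects = (?P<pcur>\d+)/(?P<pmax>\d+), "
    rf"disk: (?P<dcur>{_QTY})/(?P<dmax>{_QTY}), "
    rf"locks:\s*(?P<lcur>{_QTY})/(?P<lmax>{_QTY})"
)


def to_kb(quantity):
    """Convert a string such as '78125MB' to a number of kilobytes."""
    match = re.fullmatch(r"(\d+)(B|KB|MB)", quantity)
    if match is None:
        raise ValueError(f"unrecognized quantity: {quantity!r}")
    return int(match.group(1)) * _UNIT_TO_KB[match.group(2)]


def parse_group_line(line):
    """Parse one 'group ...' entry of the station dump into a dict."""
    match = GROUP_RE.search(line)
    if match is None:
        raise ValueError("not a group entry")
    return {
        "name": match["name"],
        "admins": match["admins"].split(),
        "swap_policy": match["policy"],
        "fair_share": float(match["share"]),
        "projects": (int(match["pcur"]), int(match["pmax"])),
        "disk_cur_kb": to_kb(match["dcur"]),
        "disk_max_kb": to_kb(match["dmax"]),
        "locks_cur_kb": to_kb(match["lcur"]),
        "locks_max_kb": to_kb(match["lmax"]),
    }


# One entry copied from the dump on this slide.
sample = (
    "group cal: admins: lueking terekhov veseli white, swap policy: LRU, "
    "fair share: 0, quotas (cur/max): projects = 1/10, "
    "disk: 11856085KB/78125MB, locks:0B/78125MB"
)
cal = parse_group_line(sample)
disk_fraction = cal["disk_cur_kb"] / cal["disk_max_kb"]
```

With both quotas in the same unit, `disk_fraction` gives the share of the group's disk allocation currently in use.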
10
Resource Management: File Storage
- Cache routing table
- Retry parameters
- Auto-destination
- File family
- File family width
- Other storage parameters: library manager, storage group, cpio wrapper, permissions
11
File Storage Server: Setup
Configurable on startup
  - Default-route
  - Route: used when sending through a remote station
      Route =enstore,central-analysis:d0mino.fnal.gov:/sam/cache21/nikhef
  - Retrial options
      --opter-retrial-count=, --opter-retrial-interval=
      --auth-retrial-count=, --auth-timeout=
      --stager-retrial-count=, --stager-retrial-interval=
      --xfer-retrial-count=, --xfer-retrial-interval=
      --relay-retrial-count=, --relay-retrial-interval=
      --dbs-retrial-count=, --dbs-retrial-interval=
Configurable by command
  - Get_encp_priority.py: changes the priority sent to enstore
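Each retrial option pairs a count with an interval. The sketch below shows one plausible reading of such a pair and does not reflect how the FSS itself is implemented: the helper `with_retrials` is hypothetical, and treating the count as the number of extra attempts after the first failure is an assumption.

```python
import time


def with_retrials(operation, retrial_count, retrial_interval, sleep=time.sleep):
    """Call operation(); after a failure, wait retrial_interval seconds and
    try again, at most retrial_count more times (assumed semantics)."""
    retrials_done = 0
    while True:
        try:
            return operation()
        except Exception:
            if retrials_done >= retrial_count:
                raise  # retrials exhausted: report the last failure
            retrials_done += 1
            sleep(retrial_interval)


# Usage example with a stand-in operation that fails twice, then succeeds.
calls = []
waits = []


def flaky_transfer():
    calls.append(1)
    if len(calls) < 3:
        raise IOError("transfer failed")
    return "stored"


# A stub replaces time.sleep so the example does not actually wait.
result = with_retrials(flaky_transfer, retrial_count=3,
                       retrial_interval=6 * 3600, sleep=waits.append)
```

Passing the sleep function as a parameter keeps the helper easy to test with long intervals such as the 6 hours used here.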
12
File Storage Server: Dump
lueking@d0mino:~ % sam dump fss
Next Generation FSS at station central-analysis running on d0mino.fnal.gov 1 days 17 hours 53 minutes 15 seconds
No routing (all transfers are direct)
Configuration for operation retrial (count, interval/timeout)
  DBS contact: 3, 1 hours
  Opter contact: 1, 1 hours
  Authorization receipt:1, 1 hours
  Stager contact: 1, 1 hours
  Transfer (retrials upon timeout and upon failure): 3, 6 hours
  Relay (multi-stage routing only): 3, 1 hours
File Storage Server Dump:
Stagers are known at nodes: d0mino.fnal.gov
932 requests submitted, 0 rejected, 931 complete
File Store requests:
reco_mcp06_p08.10.00_prague_pythia_qcd-incl-PtGt80.0_mb-poisson-2.5_179132943_2001:reqID 932) sam d0mino.fnal.gov:/sam/cache17/import/prague -> enstore:/pnfs/sam/m2/copy1/monte_carlo/phase6/mcc99/reco/all
  subm time 08 Jul 12:21:59
  auth req time 08 Jul 12:21:59
  auth time 08 Jul 12:21:59
  stager contacted 08 Jul 12:21:59
13
Autodestination: Map configuration
destList = [
    {   # map entry number 0:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)',
        'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all',
        'library' : 'sammam',
        'file_family' : 'mc_phase3_gen',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
    {   # map entry number 1:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/mcp05/)([^/]+)(/generated/)([^/]+)',
        'destinationPath' : '/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/gen/all',
        'library' : 'sammam',
        'file_family' : 'mc_phase3_gen',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
    # (and so on)
    {   # map entry number 29:
        'pathPattern' : '(/pnfs/sam/mammoth/copy1/monte_carlo/phase5/mcc99/)([^/]+)(/digitized/)([^/]+)',
        'destinationPath' : '/pnfs/sam/m2/copy1/monte_carlo/bphysmcp08/mcc99/sim/all',
        'library' : 'samm2',
        'file_family' : 'bphysmcp08',
        'file_family_wrapper' : 'cpio_odc',
        'storage_group' : 'D0',
        'file_family_width' : 1,
        'permissions' : 'rwxr-xr-x',
    },
]
Currently 30 map entries in production
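A map like this could be consulted roughly as follows. The slide does not describe the lookup itself, so this is a sketch under assumptions: the function `find_destination` is hypothetical, entries are tried in order and the first match wins, and the pattern must match the whole path. The example file path is invented for illustration.

```python
import re


def find_destination(path, dest_list):
    """Return (index, entry) for the first map entry whose pathPattern
    matches the whole path, or None if no entry matches."""
    for index, entry in enumerate(dest_list):
        if re.fullmatch(entry["pathPattern"], path):
            return index, entry
    return None


# Map entry number 0 from the slide, reused as a one-entry map.
example_map = [
    {
        "pathPattern": "(/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/)([^/]+)(/generated/)([^/]+)",
        "destinationPath": "/pnfs/sam/mammoth/copy1/monte_carlo/phase3/mcc99/gen/all",
        "library": "sammam",
        "file_family": "mc_phase3_gen",
        "file_family_wrapper": "cpio_odc",
        "storage_group": "D0",
        "file_family_width": 1,
        "permissions": "rwxr-xr-x",
    },
]

# Hypothetical requested path; 'somesite' and 'file001' are made-up names.
requested = "/pnfs/sam/mammoth/copy1/monte_carlo/mcp03/somesite/generated/file001"
hit = find_destination(requested, example_map)
miss = find_destination("/pnfs/sam/elsewhere/file001", example_map)
```

When a match is found, the entry supplies the real destination path together with the storage parameters (library, file family, wrapper, storage group, width, permissions) to use for the store.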
14
Most Active Station List
Station               Description
protofarm             Hiedi's protofarm
imperial-test         Initial test station to get production data to Imperial College
lancs                 Lancaster
ccin2p3-analysis      Lyon comp center for In2P3, France
central-analysis      Main D0 Analysis server
d0_main_analysis      Main D0 analysis server (d02ka)
hoeve                 Nikhef Farm
clued0                Roger Moore
datalogger            Station for d0online
msu                   Station running at Michigan State University
d0-demo-station       d0 demo station
central-compute       d0lxcs cluster station
prague-test-station   first installation at prague
lac-1                 linux analysis cluster station
d0nevis-station       nevis labs/columbia
d0small-01            small linux test station
central-archive       station to archive a second copy of all online data
d0-test-station       test station
d02ka                 test station on d02ka
pctestfarm            to test the fbs/SAM stuff for the farms
15
Viewing Statistics
- Queries
- Plots
- Enstore summaries
16
SAM Stats (6/19 – 6/26)
Users and usage are picking up
- Data sets created: 242
- Projects run: 607
- Files processed: 4586
- Files cached: 2675
- Files stored: 8291
- Data volume stored: 1.8 TB
- reco_mcp06_p08.10.00_nikhef_pythia_ttbar-incl_mb-poisson-2.5_144170834_2001 was delivered 57 times
17
Cache stats
Available from the SAM page, under plots and statistics
18
Enstore stats
Available at www-d0en.fnal.gov/enstore, under “plots”
19
Data added
20
Decisions, decisions…
- Station deployment and configuration issues
- Station operational tuning
- Disk assignment and cache allocations
- Fair share numbers
- File family, file family widths
- Tape storage resources
- FSS priorities
- Cache routing issues