Published by Olivia Carter. Modified over 9 years ago.
1
We create knowledge – today for tomorrow
Paul Scherrer Institut
Timo Korhonen
Improvements to Indexing Tool (Channel Archiver)
EPICS Meeting, BNL 2010
2
10/11/10 EPICS Meeting, BNL
Channel Archiver at PSI
Currently four different archive servers are in use.
– SLS Accelerator data: slsmcarch (machine archive server; HP, Xeon quad-core 2.66 GHz, 32 GB RAM)
  – Long Term: since January 2001; 10314 channels; 70 GB
  – Medium Term: 6 months; 66883 channels; 120 GB
  – Short Term Archiver: 14 days; 70381 channels; 114 GB
  – Post Mortem Archiver: stores the "last famous words"
  – Total available disk space for data: 500 GB
– SLS Beamline data: slsblarch (beamline archive server; HP, AMD Opteron dual-core 1.8 GHz; 6 GB RAM)
  – Long and short term archivers for every beamline (29 engines in total)
  – Short term archivers store data up to 12 months
  – Total amount of data: 163 GB / 384 GB
3
Channel Archiver at PSI
Archive servers (cont.)
– PSI (office) data: gfaofarch
  – Long Term Archiver: stores data since January 2006
  – Medium and Short Term Archivers
  – ZHE Cyclotron High Energy
    – Long (since April 2008)
    – Medium and short term
– SwissFEL: felarch1 (HP, quad-core 2.66 GHz, 10 GB RAM)
  – Small test stand OBLA
    – 638 channels, 2.1 terabytes!
      » Waveforms, images
  – FIN250 test injector
    – LT, MT and ST (0.6, 7.9 and 464 GB)
4
Channel Archiver at PSI
– The archive engines are running stably
– The problems we have had are on the retrieval side
– Indexing is used to speed up retrieval
  – Indexes on daily files
  – Master index on the whole archived data
– We need the performance
  – The SwissFEL test machine is going to produce a lot of data
    – Waveforms, images
    – We need to archive more than in a production machine
– For us, there is no need for (immediate) change
  – We would like to keep the Channel Archiver going
    – Updates, bugfixes
    – Retrieval tools
      » Waveform viewer, etc. have been developed
      » Matlab export would be welcome
  – Indexing tools need work
5
Index Tool improvements
Background
– The ArchiveIndexTool is run at PSI once a week, in the night from Saturday to Sunday, to create master indexes for the medium-term archive.
– Indexing is essential for good retrieval performance.
– The tool produces many errors when run on the EPICS archive indices to create or update the master index.
  – Disclaimer: I know very little about this; I am only relaying what the people who work on it have reported.
– People involved:
  – Gaudenz Jud (archiver maintenance, operation and development)
  – Hans-Christian Stadler (PSI IT, Scientific Computing), who is investigating the issue together with Gaudenz
6
Index Tool improvements
Findings so far:
– After investigating an error log:
  – From the code it is clear that the ArchiveEngine and the ArchiveIndexTool are not supposed to be used concurrently on the same indices. Running them concurrently does produce errors, but not the ones we see in production.
– The errors seem to occur only on the production machine, when there is high load and a lot of disk activity.
– Quick fix tried: a retry mechanism at the top level. All index files are closed and reopened after a delay. This quick fix seems to work so far.
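The retry idea described in the findings can be sketched roughly as follows. This is only an illustration in Python, not the ArchiveIndexTool code: the names `run_with_retry`, `open_indices`, `work` and `IndexCorruptionError` are hypothetical, and the retry count and delay are arbitrary assumptions.

```python
import time
from typing import Callable, List, TypeVar

T = TypeVar("T")


class IndexCorruptionError(Exception):
    """Hypothetical error type standing in for whatever the index code raises."""


def run_with_retry(
    open_indices: Callable[[], List],
    work: Callable[[List], T],
    retries: int = 3,
    delay_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run work(handles) at the top level.

    On failure, close all index handles, wait, reopen them and try again.
    """
    if retries < 0:
        raise ValueError("retries must be >= 0")
    last_error: Exception = IndexCorruptionError("no attempt made")
    for attempt in range(retries + 1):
        handles = open_indices()          # (re)open every index file
        try:
            return work(handles)
        except IndexCorruptionError as err:
            last_error = err
        finally:
            for handle in handles:        # always close, success or failure
                handle.close()
        if attempt < retries:
            sleep(delay_s)                # let the disk load settle
    raise last_error
```

A retry at this level hides the symptom without explaining it, which matches the talk's description of it as a quick fix rather than a solution.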
7
Observations:
– The RTree implementation does not allow concurrent read/write access. It might be possible to arrange the file operations in a way that allows concurrent access when the index is stored on a strictly POSIX-compliant file system.
– The RTree implementation has an RTree node "cache" that only grows. Nodes are never evicted from the cache. I'm implementing a new LRU node cache with a fixed number of entries to see if this reduces system load.
– The RTree implementation uses many small disk operations (see example code above). A reimplementation should use large disk transfers.
– The RTree implementation is like a B-Tree, but does not adjust the node size to the disk sector size for improved I/O performance.
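A fixed-size LRU node cache of the kind mentioned above could look roughly like this. It is a minimal Python sketch, not the new cache being written for the RTree code: the class name `LRUNodeCache`, the `load_node` callback and the use of file offsets as keys are assumptions made for illustration, and write-back of modified nodes is left out.

```python
from collections import OrderedDict
from typing import Callable


class LRUNodeCache:
    """Cache with a fixed number of entries; evicts the least recently used node."""

    def __init__(self, capacity: int, load_node: Callable[[int], bytes]):
        if capacity < 1:
            raise ValueError("capacity must be at least 1")
        self._capacity = capacity
        self._load_node = load_node      # hypothetical: reads one node from disk
        self._nodes: "OrderedDict[int, bytes]" = OrderedDict()

    def get(self, offset: int) -> bytes:
        if offset in self._nodes:
            # Cache hit: mark the node as most recently used.
            self._nodes.move_to_end(offset)
            return self._nodes[offset]
        # Cache miss: load from disk, then evict the oldest entry if over capacity.
        node = self._load_node(offset)
        self._nodes[offset] = node
        if len(self._nodes) > self._capacity:
            self._nodes.popitem(last=False)
        return node

    def __len__(self) -> int:
        return len(self._nodes)
```

Unlike a cache that only grows, memory use here is bounded by `capacity` times the node size, at the cost of reloading nodes that have been evicted.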
8
Observations (continued):
– The RTree implementation is not optimal for the use case seen at SLS, where data is inserted at the end only. This leads to a reduced fill level of the nodes. The RTree maintains the invariant that only the root node may be less than half full. In addition, data is moved between nodes too often, leading to many random accesses on disk. A reimplementation should feature a data structure that is optimal for appends at the end.
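The effect of append-only inserts on node fill level can be illustrated with a small simulation. This is a simplified model of a single leaf level under stated assumptions, not the Channel Archiver's RTree: the function name `leaf_fill_after_appends` and the two policies it compares (split a full node in half, versus start a fresh node and leave the full one untouched) are illustrative only.

```python
def leaf_fill_after_appends(n_records: int, capacity: int, split_in_half: bool) -> float:
    """Append n_records at the end and return the average leaf fill level (0..1)."""
    leaves = [0]  # number of entries per leaf, left to right
    for _ in range(n_records):
        if leaves[-1] < capacity:
            leaves[-1] += 1               # room left in the rightmost leaf
        elif split_in_half:
            # Classic split: the full leaf keeps about half of its entries,
            # the rest (plus the new record) move to a new leaf.
            keep = (capacity + 1) // 2
            leaves[-1] = keep
            leaves.append(capacity + 1 - keep)
        else:
            # Append-friendly policy: leave the full leaf alone, start a new one.
            leaves.append(1)
    return sum(leaves) / (len(leaves) * capacity)


half = leaf_fill_after_appends(1000, 10, split_in_half=True)
full = leaf_fill_after_appends(1000, 10, split_in_half=False)
print(f"split in half: {half:.2f}, append-friendly: {full:.2f}")
```

In this model, leaves left behind by a half split never receive another record because all new data arrives at the end, so they stay about half full. The append-friendly policy also moves no existing entries between nodes.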
9
Conclusions so far:
– Finding the real reason for the errors is a time-consuming process. The real reason has not yet been identified.
– The offsets to data structures in the index get corrupted. However, it is not clear where.
– Because the corruption only happens when the load on the production system is high, logical errors in the normal execution path can almost certainly be excluded.
– The experience so far suggests that a new implementation of the RTree code could solve a number of problems.
10
Thank you for your attention!