Storage issues for end-user analysis
Bernd Panzer-Steindel, CERN/IT
08 July 2008
[Figure: High level Data Flow. Components shown: Online DAQ, T0, Calibration and Alignment, Tape Migration, Tape Read, Data Export, MC and Data Import/Export, Reprocessing Tape Read (T1/T2/T3), Production Data Disk Pools, Scratch, file copy, file merge, Batch and Interactive nodes, End-User Analysis, T2/T3 Analysis.]
Tentative ‘requirements’ for end-user analysis storage
- Storage capacity of 1-2 TB per user (assume 1.5 TB): ntuples, data samples, log files, ...
- Reliable storage with ‘server-mirroring’
- 99.9% availability, i.e. about 8.8 h of downtime per year (being unavailable 4 times per year for 4 h each amounts to 16 h, or about 99.8%)
- No tape access (too many small files)
- Some backup possibility (archive?)
  - Backup: 5% changes per day (75 GB/day) with a 4-month retention time = 9 TB of backup space per user
- Quota system
- Easy access from batch and interactive worker nodes and from the notebook
- Distributed file system with POSIX-type access
- World-wide access
- High read/write file-access performance
- User identity and security
- ...
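The per-user backup and availability figures above follow from simple arithmetic. The sketch below reproduces them, assuming decimal units (1 TB = 1000 GB), 1 month = 30 days, 1 year = 365 days, and that every daily increment is kept for the whole retention period; these assumptions and the function names are illustrative, not taken from the slide.

```python
# Per-user sizing from the requirements slide.
# Slide inputs: 1.5 TB per user, 5% daily change, 4-month retention.
# Assumptions (not on the slide): 1 month = 30 days, 1 year = 365 days,
# decimal units (1 TB = 1000 GB), and every daily increment is kept
# for the full retention period.

HOURS_PER_YEAR = 365 * 24  # 8760 h


def daily_backup_gb(quota_tb: float, change_fraction: float) -> float:
    """Data volume that changes (and must be backed up) per day, in GB."""
    return quota_tb * 1000 * change_fraction


def backup_space_tb(quota_tb: float, change_fraction: float, retention_days: int) -> float:
    """Backup space needed to hold all daily increments for the retention period, in TB."""
    return daily_backup_gb(quota_tb, change_fraction) * retention_days / 1000


def availability(outages_per_year: int, hours_per_outage: float) -> float:
    """Fraction of the year the storage is reachable, given a simple outage pattern."""
    downtime_h = outages_per_year * hours_per_outage
    return 1 - downtime_h / HOURS_PER_YEAR


if __name__ == "__main__":
    quota_tb, change, retention_days = 1.5, 0.05, 4 * 30
    print(f"daily backup volume: {daily_backup_gb(quota_tb, change):.0f} GB/day")
    print(f"backup space per user: {backup_space_tb(quota_tb, change, retention_days):.1f} TB")
    print(f"availability with 4 x 4 h outages: {availability(4, 4):.2%}")
    print(f"downtime allowed at 99.9%: {0.001 * HOURS_PER_YEAR:.2f} h/year")
```

Run as a script, this prints 75 GB/day, 9.0 TB, 99.82% and 8.76 h/year, which shows why four 4-hour outages per year fall slightly short of a 99.9% target.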
Cost and Technology Scenarios
Estimated 500 users at CERN: 750 TB of analysis disk storage, plus backup and archive?
Hardware investments over 2 years (‘guesstimates’), plus software, operation, support, functionality, ...
Technology options:
- 1.5 TB USB disk
- Amazon Cloud Storage
- TSM Backup
- HSM Backup, Archive
- AFS
- Isilon, BlueArc, Exanet, DataDirect
- Distributed File System implementations: NFS4, Lustre, xrootd
- HSM, disk-only
Cost estimates on the slide: 7 MCHF, 0.4 MCHF, 3 MCHF, 2.5 MCHF, 1 MCHF, 2 MCHF, 4.5 MCHF, 2 MCHF
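Scaling the per-user figures to the site gives the 750 TB quoted above. The sketch below is a hypothetical helper (the `SiteEstimate` class is illustrative, not part of any CERN tool); it also assumes, beyond the slide, that ‘server-mirroring’ doubles raw disk and that the 9 TB per-user backup figure from the requirements applies to every user.

```python
# Scale per-user storage figures to a site-wide estimate.
# Slide inputs: 500 users, 1.5 TB analysis space and 9 TB backup space per user.
# Assumptions (not on the slide): mirroring is modelled as a factor of 2 on
# raw disk, and units are decimal TB.

from dataclasses import dataclass


@dataclass
class SiteEstimate:
    users: int
    per_user_tb: float
    per_user_backup_tb: float
    mirror_factor: int = 2  # hypothetical: two copies for server mirroring

    def usable_tb(self) -> float:
        """Analysis space visible to users."""
        return self.users * self.per_user_tb

    def raw_disk_tb(self) -> float:
        """Raw disk needed once every byte is stored mirror_factor times."""
        return self.usable_tb() * self.mirror_factor

    def backup_tb(self) -> float:
        """Total backup space if each user needs the per-user backup volume."""
        return self.users * self.per_user_backup_tb


if __name__ == "__main__":
    cern = SiteEstimate(users=500, per_user_tb=1.5, per_user_backup_tb=9.0)
    print(f"usable analysis disk: {cern.usable_tb():.0f} TB")
    print(f"raw disk with mirroring: {cern.raw_disk_tb():.0f} TB")
    print(f"backup space: {cern.backup_tb():.0f} TB")
```

Under these assumptions the 750 TB of usable space corresponds to 1500 TB of raw mirrored disk, and the backup requirement alone would reach 4500 TB, which is why the slide asks whether backup and archive come on top.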
Questions
- Where are these ‘extra’ resources coming from?
- Is there only one unique storage per user world-wide?
  - What about users working on different sites? Do they have multiple end-user storage instances?
  - How is data transferred between instances?
- The difference between ‘home-directory’ storage and end-user analysis space is small: analysis tools/programs and the data itself must be accessed at the same time.
- Who decides which user gets how much space where? Experiment-specific policies.
- What is the data flow model?
  - Notebook disk + site-local file system + global file system
  - Notebook disk + site-local scratch + cloud storage
  - Global file system only
  - ... more combinations ...
- Notebook issues: OS support, virtual analysis infrastructure, network connectivity = data ‘gas station’
- ... many more questions ...
Is there some common interest in solving this problem?
Is there a need or interest to create a working group to investigate this in more detail? Experiments, sites?