Published by Matilda Ford. Modified over 9 years ago.
1
Introduction to Data Management Dr Jens Jensen Head of Data Services Group, Leader of Storage and Data Management and Scientific Computing Dept GridPP more STFC...
2
Scientific data management: – Large data volumes (10s of PB) – Distributed user base – Need for high performance transfers – Need for data security (or not) – Scalability
5
Data in “the Grid”? “The Grid” Data
6
Data in “the Cloud”? “The Cloud” Data
7
Transfer Protocols – GridFTP (http://www.ogf.org/documents/GFD.20.pdf), aka “gsiftp” (GSI = Globus (Grid) Security Infrastructure, cf RFC 3820) – HTTP(S) – WebDAV (RFC 4918)
8
GridFTP – based on FTP Ancient protocol... RFCs 114 (1971), 141 (1971), 172 (1971), 265 (1971), 354 (1972), 542 (1973), 765 (1980), 959 (1985) Splitting control and data connection Extensions RFC 2228, 2773 (security), 2640 (internationalisation), 3659 (misc.), 2389, 5797 (FEAT)
9
Control connection: port 21 (FTP), 2811 (GridFTP) Client Server Data connections and firewalls (active vs passive mode (PASV))
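As a small illustration of passive mode: the server answers PASV on the control connection with a 227 reply carrying the address it is listening on for the data connection, encoded as six decimal numbers (RFC 959). The sketch below only parses such a reply; the function name and the sample address are made up for illustration.

```python
import re
from typing import Tuple


def parse_pasv_reply(reply: str) -> Tuple[str, int]:
    """Extract (host, port) from an FTP 227 reply of the form (h1,h2,h3,h4,p1,p2)."""
    match = re.search(r"\((\d+),(\d+),(\d+),(\d+),(\d+),(\d+)\)", reply)
    if match is None:
        raise ValueError(f"not a PASV reply: {reply!r}")
    numbers = [int(n) for n in match.groups()]
    if any(n > 255 for n in numbers):
        raise ValueError("each field must fit in one byte")
    host = ".".join(str(n) for n in numbers[:4])
    # The port is sent as two bytes, high byte first.
    port = numbers[4] * 256 + numbers[5]
    return host, port


# Sample reply using a documentation-only address (hypothetical server).
print(parse_pasv_reply("227 Entering Passive Mode (192,0,2,10,195,80)"))
# -> ('192.0.2.10', 50000)
```

In passive mode the client opens the data connection to that address, which is why it tends to work better from behind a client-side firewall than active mode does.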
10
(Grid)FTP – “3rd party copying”
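A sketch of how a third-party copy can be driven over two control connections: the client asks one server to listen (PASV), tells the other server to connect there (PORT), then starts the store and the retrieve. The data never passes through the client. The function names and the choice of which side listens are assumptions made for illustration; real GridFTP clients add security and mode negotiation on top.

```python
from typing import List, Tuple


def port_argument(host: str, port: int) -> str:
    """Encode an IPv4 host and port as the h1,h2,h3,h4,p1,p2 argument of PORT."""
    octets = host.split(".")
    if len(octets) != 4 or not 0 < port < 65536:
        raise ValueError("expected an IPv4 address and a 16-bit port")
    return ",".join(octets + [str(port // 256), str(port % 256)])


def third_party_plan(src_path: str, dst_path: str,
                     dst_data_host: str, dst_data_port: int) -> List[Tuple[str, str]]:
    """Return (server, command) pairs for a source-to-destination copy.

    dst_data_host and dst_data_port stand for what the destination
    reported in its reply to PASV.
    """
    return [
        ("destination", "PASV"),  # destination listens for the data connection
        ("source", "PORT " + port_argument(dst_data_host, dst_data_port)),
        ("destination", "STOR " + dst_path),  # destination waits for incoming data
        ("source", "RETR " + src_path),       # source connects and sends the file
    ]


for server, command in third_party_plan("/data/in.dat", "/data/out.dat", "192.0.2.10", 50000):
    print(f"{server:12s} <- {command}")
```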
11
GridFTP – extensions to FTP GSI security (later RFC 3820) Striping (and EBLOCK mode) TCP buffer size control/negot.? Data channel authentication (DCAU)
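The idea behind striping and extended block mode can be sketched as follows: each block travels with its offset in the file, so blocks may arrive over several streams in any order and still be written to the right place. This is a simplified illustration, not the EBLOCK wire format; the block size and helper names are assumptions.

```python
import random
from typing import List, Tuple

Block = Tuple[int, bytes]  # (offset in file, payload)


def split_into_blocks(data: bytes, block_size: int) -> List[Block]:
    """Cut data into offset-tagged blocks of at most block_size bytes."""
    return [(offset, data[offset:offset + block_size])
            for offset in range(0, len(data), block_size)]


def reassemble(blocks: List[Block], total_size: int) -> bytes:
    """Write each block at its offset; arrival order does not matter."""
    buffer = bytearray(total_size)
    for offset, payload in blocks:
        buffer[offset:offset + len(payload)] = payload
    return bytes(buffer)


data = bytes(range(256)) * 4
blocks = split_into_blocks(data, block_size=100)
random.shuffle(blocks)  # simulate out-of-order arrival over parallel streams
assert reassemble(blocks, len(data)) == data
```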
12
The Grid.... Ad-hoc transfers between GridFTP endpoints Initial user ingest? scp? Hands on with GridFTP: uberftp (cf ftp)
13
Moving data in (and to, and from) the Grid “Manually,” with GridFTP Portals – e.g. NGS portal GlobusOnline FTS (as of 3.0, tbc)
14
The gLite grid – daily TLA dose EMI – European Middleware Initiative UMD – Unified Middleware Distribution EGI – European Grid Infrastructure IGE – Initiative for Globus in Europe NGI – National Grid Initiative
15
The gLite grid – component TLAs SE – Storage Element SRM – Storage Resource Manager LFC – LCG File Catalog FTS – File Transfer Service BDII – Berkeley Database Information Index (LDAP)
16
LFC SRM GridFTP BDII Storage Element FTS SRM (OGF GFD.129) – control interface – support for “spaces” (reserved areas) – retention policies (replica, output, custodial) – access latencies (offline, nearline, online) – storage “type”: permanent, volatile LFN – Logical File Name (optional), resolved by LFC into GUID – Globally Unique Identifier, resolved by LFC into SURL – Storage URL (or Site URL), resolved by SE into TURL – Transfer URL (e.g. gsiftp)
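The resolution chain LFN to GUID to SURL to TURL can be sketched with two lookup tables standing in for the LFC and a trivial function standing in for the SE. All names, hosts and identifiers below are hypothetical; a real SRM may return a TURL with a different host and path, not just a different scheme.

```python
from typing import Dict, List

# Hypothetical catalogue contents, for illustration only.
LFN_TO_GUID: Dict[str, str] = {
    "lfn:my_stuff": "guid:00000000-0000-0000-0000-000000000001",
}
GUID_TO_SURLS: Dict[str, List[str]] = {
    "guid:00000000-0000-0000-0000-000000000001": [
        "srm://se1.example.org/dteam/my_stuff",
        "srm://se2.example.org/dteam/my_stuff",
    ],
}


def surl_to_turl(surl: str, protocol: str = "gsiftp") -> str:
    """Stand-in for the SE step: here we only swap the URL scheme."""
    _scheme, rest = surl.split("://", 1)
    return f"{protocol}://{rest}"


def resolve(lfn: str) -> List[str]:
    """Follow LFN -> GUID -> SURLs (one per replica) -> TURLs."""
    guid = LFN_TO_GUID[lfn]
    return [surl_to_turl(surl) for surl in GUID_TO_SURLS[guid]]


print(resolve("lfn:my_stuff"))
```

One GUID can map to several SURLs, one per replica, which is what makes replication transparent to anything that refers to the file by GUID or LFN.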
17
gLite – Summary of basic data commands: lcg-cp – copy to/from SE, or between SEs (no LFC); lcg-cr – copy file into SE, and register in LFC (guid); lcg-del – delete; lcg-rep – replicate
18
Exercises Lots of small files (10^5, 10^6) Large files (10^8-10^12) Migration Format migration, checksumming Who can copy data? Write/Modify?
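For the checksumming exercise, a minimal sketch of checksumming a large file in fixed-size chunks, so memory use stays constant whatever the file size. Adler-32 is used here only as an example algorithm (an assumption; the slides do not name one), and the function name and chunk size are illustrative.

```python
import io
import zlib
from typing import BinaryIO


def adler32_stream(stream: BinaryIO, chunk_size: int = 1 << 20) -> str:
    """Return the Adler-32 of a stream as 8 hex digits, reading chunk by chunk."""
    value = 1  # Adler-32 starting value
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        value = zlib.adler32(chunk, value)  # carry the running checksum forward
    return f"{value & 0xFFFFFFFF:08x}"


# The chunked result matches a one-shot checksum of the same bytes.
data = b"some scientific data " * 1000
assert adler32_stream(io.BytesIO(data), chunk_size=4096) == f"{zlib.adler32(data):08x}"
```

Comparing the value computed before a migration with the one computed afterwards is a cheap way to detect corruption in transit; for real files, pass `open(path, "rb")` instead of the in-memory stream.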
19
Exercises How is scientific data mgmt different? – How do research disciplines differ? – What are the interdisciplinary benefits? How grids and clouds differ...? Can we trust the grids/clouds? Who leads the way? HEP? Industry?
20
Storage Accounting - static Ongoing work... – Distributed storage systems – Temporary file copies created – Scheduled deletions – Inaccessible free spaces, reserved space – Filesystem/tape overheads – Timeliness and accuracy – Impact of compression
21
GridFTP today GridFTP – workhorse of WAN grid data (OGF standard) The need for GSI (non-TLS) Numerous LAN protocols... … moving towards more common standards? (eg HTTP)
22
lcg-cr --vo dteam -l lfn:my_stuff -d srm-dteam.gridpp.rl.ac.uk file://`pwd`/foo.tmp
guid:921ac0b8-82aa-61dc-0192-6effece
Subsequent access and replication is by GUID
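When the same copy-and-register step is scripted, the command line can be assembled as an argument list. The sketch below only builds and prints the command, using the flags that appear in the example above (`--vo`, `-l`, `-d`); it does not run it, since that needs the gLite client tools and a valid grid credential. The helper name is hypothetical.

```python
import os
import shlex
from typing import List


def lcg_cr_command(vo: str, lfn: str, dest_se: str, local_path: str) -> List[str]:
    """Build the argv for lcg-cr, mirroring the example command."""
    # os.path.abspath plays the role of `pwd` in the shell version.
    source_url = "file://" + os.path.abspath(local_path)
    return ["lcg-cr", "--vo", vo, "-l", lfn, "-d", dest_se, source_url]


command = lcg_cr_command("dteam", "lfn:my_stuff", "srm-dteam.gridpp.rl.ac.uk", "foo.tmp")
print(shlex.join(command))
```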
23
Data Security Data security is like data security everywhere... Except that the devil is in the detail And the details are always different...
24
Data Security – Confidentiality Data In flight, or at rest The performance issue And the time issue Who can “activate” it? Data
25
Data Security – Availability LOCKSS again... clouds are good at this. Data Somebody already thought about the difficult stuff...? Liability, SLAs,...
26
Data Security – Availability DDoS Intentional Botnets Unintentional
27
Referencing Data DOIs for data – DONA – Digital Object Numbering Authority Granularity? Licences, permissions Implementing data policies
28
Cloud Data – Cost Clouds are elastic Elasticity is good for (rapid) growth Or shrinkth Elasticity can be expensive, though Compared to “traditional” data centre Or in-house (but don’t underestimate this!) Different cost models (Hybrids!)
29
Infrastructure Security End-to-end security Authentication and authorisation Developing a threat model Protecting credentials Usability of security Anonymised??
30
Infrastructure Federated identity and single sign-on Integration with existing infrastructures Accounting Securely... Anonymously? And billing
31
The Role of Standards Standards promote interoperation And maturity (sometimes) Interoperation solves problems Sometimes E.g. eggs and baskets Standards peer reviewed
32
Other Data Services iRODS – “data grid” Successor to SRB Server side workflows: rules, microservices Safety Deposit Box Commercial product from Tessella Data preservation
33
NGS data services NGS portal – https://portal.ngs.ac.uk/ http://www.ngs.ac.uk/tools/vbrowser Databases: Oracle, MySQL
34
EU Funded Data Projects EUDAT (www.eudat.eu) Collaborative iRODS based infrastructure Multidisciplinary, scalable, long tail SCIDIP-ES (earth science) www.scidip-es.eu SCAPE (www.scape-project.eu) PANDATA (neutron/synchrotron) pan-data.eu
35
New Stuff? More mature approach to clouds? CCN – Content Centric Networking RAID --> ECC, “object” storage
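To make the RAID end of “RAID --> ECC” concrete, here is a sketch of single XOR parity, the scheme behind RAID-5 style redundancy: one parity block lets any one lost data block be rebuilt. Reading “ECC” as erasure or error-correcting codes is an assumption; those generalise the idea to survive several losses, which this sketch does not do. Function names are illustrative.

```python
from typing import List, Optional


def xor_parity(blocks: List[bytes]) -> bytes:
    """XOR equal-length blocks together byte by byte."""
    parity = bytearray(len(blocks[0]))
    for block in blocks:
        if len(block) != len(parity):
            raise ValueError("all blocks must have the same length")
        for i, byte in enumerate(block):
            parity[i] ^= byte
    return bytes(parity)


def recover_missing(blocks: List[Optional[bytes]], parity: bytes) -> bytes:
    """Rebuild the single missing block (marked None) from survivors and parity."""
    survivors = [b for b in blocks if b is not None]
    if len(survivors) != len(blocks) - 1:
        raise ValueError("exactly one block must be missing")
    # XOR of all surviving blocks and the parity cancels everything but the lost block.
    return xor_parity(survivors + [parity])


data_blocks = [b"abcd", b"efgh", b"ijkl"]
parity = xor_parity(data_blocks)
assert recover_missing([data_blocks[0], None, data_blocks[2]], parity) == b"efgh"
```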
36
Exercises Lots of small files (10^5, 10^6) Large files (10^8-10^12) Migration Format migration, checksumming Who can copy data? Write/Modify?
37
Exercises How is scientific data mgmt different? – How do research disciplines differ? How much can be shared? – What are the interdisciplinary benefits? How grids and clouds differ...? Can we trust the grids/clouds? Who leads the way? HEP? Industry?
38
References www.ngs.ac.uk www.ogf.org UMD user guide https://edms.cern.ch/document/722398/ GridPP storage and data management group – http://www.gridpp.ac.uk/wiki/Grid_Storage