HDFS and S3 plugins Andrea Manzi Martin Hellmich 13/12/2013.

1 HDFS and S3 plugins Andrea Manzi Martin Hellmich 13/12/2013

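2 Plugin functionalities (DPM Workshop, 13/12/2013)
[Diagram labels, access protocols: NFS, HTTP/DAV, XROOT, GridFTP, RFIO]
[Diagram labels, plugin interfaces: Namespace Management, Pool Management, Pool Driver, I/O]
[Diagram labels, plugins: Legacy DPM, MySQL, HDFS, Oracle, S3, HDFS, Memcache]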

3 HDFS plugin
- dmlite plugin implementing I/O, pool driver and namespace functionalities through Apache Hadoop HDFS, ensuring:
  - Automatic data replication
  - Fault tolerance for client reads
    - Failure of a Datanode or Namenode
  - Scalability

4 Deployment with Lcgdm-dav
[Diagram labels: DPM Head Node; Lcgdm-dav + dmlite HDFS-plugin; HDFS Namenode; HDFS Datanode(s); Lcgdm-dav + dmlite HDFS-plugin]

5 Some details
- HDFS C APIs (libhdfs) do not implement functions to retrieve the available datanodes (LIVE nodes)
  - Patch implemented and submitted to Hadoop
  - hadoop-libhdfs rpm from our repo
- First version for Puppet installation is available
  - To be adapted to recent dav/dmlite module changes

6 On-going issues
- Tested with new dmlite-based GridFTP plugin
  - Same deployment model as http/dav frontend or single node writing to HDFS
- But HDFS does not support multiple write streams / random writes:
  - OSG developed in-memory stream reordering in GridFTP in order to avoid this limitation (gridftp-hdfs DSI, also available in Globus Toolkit)
  - To test and understand integration
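The stream-reordering idea mentioned on this slide can be illustrated with a minimal sketch. It is not the gridftp-hdfs code: the class name `ReorderingWriter` and its methods are hypothetical, and an in-memory `io.BytesIO` stands in for an append-only HDFS file. Blocks that arrive ahead of their turn are held in memory and flushed once the bytes before them have been written.

```python
import io


class ReorderingWriter:
    """Hypothetical helper: buffers out-of-order blocks and appends them in offset order."""

    def __init__(self, sink):
        self.sink = sink        # any object with write(bytes); stands in for an append-only file
        self.next_offset = 0    # offset of the next byte the sink expects
        self.pending = {}       # offset -> block still waiting for its turn

    def write_at(self, offset, data):
        if offset < self.next_offset:
            raise ValueError("block overlaps data that was already flushed")
        self.pending[offset] = data
        # Flush every block that is now contiguous with what has been written.
        while self.next_offset in self.pending:
            block = self.pending.pop(self.next_offset)
            self.sink.write(block)
            self.next_offset += len(block)

    def close(self):
        if self.pending:
            raise IOError("gap in stream: missing data at offset %d" % self.next_offset)


sink = io.BytesIO()
writer = ReorderingWriter(sink)
writer.write_at(5, b"world")   # arrives early, stays in memory
writer.write_at(0, b"hello")   # fills the gap, so both blocks are flushed
writer.close()
```

The memory cost of this approach grows with how far ahead of the write position the parallel streams are allowed to run, which is one of the things an integration test would need to measure.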

7 On-going issues
- SRM frontend does not speak dmlite
  - SRM calls through the old dpm daemons do not properly handle new pools (such as HDFS)
  - Patch to dpm daemon to be implemented

8 Future steps
- Distribution:
  - Need to understand how to distribute the plugin
  - HDFS client only in Fedora 20 and Rawhide
- Support for security-enabled HDFS clusters (Kerberos)

9 Performance
Tests through LCGDM-DAV:
- HDFS Namespace
  - stat/s: half the performance of the MySQL namespace plugin
  - To be optimized with Memcached in front
- ROOT analysis with massive Vector I/O and TTreeCache
  - Comparable performance with standard disk pools
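The "Memcached in front" optimization is a read-through cache for namespace lookups. The sketch below shows the pattern only, under stated assumptions: a plain dictionary stands in for Memcached, and `CachedNamespace`, `slow_stat` and the example path are hypothetical names, not part of dmlite.

```python
import time


class CachedNamespace:
    """Hypothetical read-through cache placed in front of a slow stat() backend."""

    def __init__(self, backend_stat, ttl_seconds=60.0, clock=time.monotonic):
        self.backend_stat = backend_stat
        self.ttl = ttl_seconds
        self.clock = clock
        self.cache = {}          # path -> (expiry time, metadata); stands in for Memcached
        self.backend_calls = 0   # counts how often the slow backend was consulted

    def stat(self, path):
        now = self.clock()
        entry = self.cache.get(path)
        if entry is not None and entry[0] > now:
            return entry[1]      # cache hit: the backend is not contacted
        self.backend_calls += 1
        meta = self.backend_stat(path)
        self.cache[path] = (now + self.ttl, meta)
        return meta


def slow_stat(path):
    # Assumption: stand-in for a namespace lookup that goes to the HDFS Namenode.
    return {"path": path, "size": 1024}


ns = CachedNamespace(slow_stat)
first = ns.stat("/dpm/example/file")
second = ns.stat("/dpm/example/file")   # served from the cache
```

A time-to-live bounds how stale a cached entry can become; a real deployment would also need to invalidate entries when files are written or removed.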

10 S3 plugin

11 Key Facts
- Data directly to the cloud
- HTTP/HTTPS only
- DPM provides the namespace
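One way a head node that keeps only the namespace could send a client straight to the storage provider is to answer with a redirect to a time-limited signed URL. The sketch below is an assumption about how such a URL might be built, not a description of the plugin's code. It uses Amazon's legacy query-string signing scheme (Signature Version 2); current S3 deployments generally require Signature Version 4. The function name, bucket, object key and credentials are placeholders.

```python
import base64
import hashlib
import hmac
from urllib.parse import quote, urlencode


def signed_get_url(bucket, key, access_key_id, secret_key, expires):
    """Build a query-string-authenticated GET URL (legacy AWS Signature Version 2)."""
    resource = "/%s/%s" % (bucket, quote(key))
    # Verb, empty Content-MD5, empty Content-Type, expiry timestamp, canonical resource.
    string_to_sign = "GET\n\n\n%d\n%s" % (expires, resource)
    digest = hmac.new(secret_key.encode(), string_to_sign.encode(), hashlib.sha1).digest()
    signature = base64.b64encode(digest).decode()
    query = urlencode({
        "AWSAccessKeyId": access_key_id,
        "Expires": expires,
        "Signature": signature,
    })
    return "https://%s.s3.amazonaws.com/%s?%s" % (bucket, quote(key), query)


# Placeholder credentials and names, for illustration only.
url = signed_get_url(
    bucket="examplebucket",
    key="dir/file.root",
    access_key_id="AKIAIOSFODNN7EXAMPLE",
    secret_key="wJalrXUtnFEMI/K7MDENG/bPxRfiCYEXAMPLEKEY",
    expires=1386950400,
)
```

Because the signature covers the expiry time and the object path, the client can fetch only that object and only until the URL expires, while the secret key never leaves the head node.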

12 Data in the Cloud
[Diagram labels: GET, REDIRECT, DATA]
- No data through DPM
- Inherits all capabilities from S3 provider:
  - Amazon: range-header, no multi-range, multi-stream download only, no 3rd party copy, http access only
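Since the provider accepts a single byte range per request but not multi-range requests, a client that wants a multi-stream download can issue one single-range request per stream. The sketch below computes the `Range` headers for such a split; the function name `single_range_headers` is hypothetical, and the actual HTTP requests are left out.

```python
def single_range_headers(size, streams):
    """Split an object of `size` bytes into one single-range header per stream."""
    if size <= 0 or streams <= 0:
        raise ValueError("size and streams must be positive")
    streams = min(streams, size)          # never create empty ranges
    chunk, extra = divmod(size, streams)
    headers = []
    start = 0
    for i in range(streams):
        # The first `extra` streams each take one additional byte.
        length = chunk + (1 if i < extra else 0)
        end = start + length - 1          # HTTP byte ranges are inclusive
        headers.append({"Range": "bytes=%d-%d" % (start, end)})
        start = end + 1
    return headers


headers = single_range_headers(10, 3)
# headers == [{"Range": "bytes=0-3"}, {"Range": "bytes=4-6"}, {"Range": "bytes=7-9"}]
```

Each header would go on its own GET request to the redirected URL, and the client reassembles the parts by offset.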

13 How to install an S3 pool
- yum install dmlite-plugins-s3
- dmlite-shell
    > pooladd poolaws s3
    > poolmodify poolaws bucketsalt xFVlsrg
    > poolmodify poolaws s3accesskeyid
    > poolmodify poolaws s3secretaccesskey

14 More info
- HDFS plugin: ev/Dmlite/Plugins/HDFS
- S3 plugin: ev/Dmlite/Plugins/S3

15 Thanks! Questions?
