Hadoop and its applications at the State and University Library
Per Møldrup-Dalum, State and University Library
SCAPE Information Day, State and University Library, Denmark
Agenda
- A bit on Hadoop in general
- A bit on our experience in deploying Hadoop at the library

This work was partially supported by the SCAPE Project. The SCAPE project is co‐funded by the European Union under FP7 ICT‐ (Grant Agreement number ).
Origins
- MapReduce: Simplified Data Processing on Large Clusters, Jeffrey Dean and Sanjay Ghemawat, 2004
- Around 2005, Doug Cutting and Mike Cafarella implemented MapReduce in the Apache Nutch project; in 2006 it was split out as Hadoop, with development continuing at Yahoo!
- Now an Apache project
- Available as commercial distributions, community editions, or do-it-yourself installations
Map/Reduce
[Figure: the three phases MAP, SHUFFLE and REDUCE]
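The three phases can be illustrated with a small in-memory simulation. This is a minimal sketch, not Hadoop's API: `run_mapreduce`, `word_mapper` and `sum_reducer` are hypothetical names, and everything runs in a single process instead of being distributed over a cluster.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, List, Tuple


def run_mapreduce(
    records: Iterable[str],
    mapper: Callable[[str], Iterable[Tuple[str, int]]],
    reducer: Callable[[str, List[int]], int],
) -> Dict[str, int]:
    # MAP: every input record yields zero or more (key, value) pairs.
    intermediate: List[Tuple[str, int]] = []
    for record in records:
        intermediate.extend(mapper(record))

    # SHUFFLE: collect all values that share a key. Hadoop also sorts the
    # intermediate keys; iterating over sorted(groups) below mimics that.
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in intermediate:
        groups[key].append(value)

    # REDUCE: one reducer call per key, over all of that key's values.
    return {key: reducer(key, groups[key]) for key in sorted(groups)}


def word_mapper(line: str) -> Iterable[Tuple[str, int]]:
    """Emit (word, 1) for every whitespace-separated word."""
    for word in line.split():
        yield word.lower(), 1


def sum_reducer(key: str, values: List[int]) -> int:
    """Add up the counts emitted for one key."""
    return sum(values)


print(run_mapreduce(["map shuffle reduce", "map reduce"], word_mapper, sum_reducer))
# {'map': 2, 'reduce': 2, 'shuffle': 1}
```

The mapper and reducer only ever see one record or one key at a time, which is what lets a real framework run many of them in parallel on different machines.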
Example: fruit street names
Count addresses that have fruits, etc., in their street name, e.g.:
- Kirsebærhaven
- Jordbærvej
- Nødde allé
Result:
- Kirsebær (cherries): 1203
- Nødder (nuts): 34
- Jordbær (strawberries): 543
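A minimal sketch of how such a count fits the three phases, again as plain single-process Python rather than a Hadoop job. The stem-to-label table `FRUIT_STEMS`, the crude substring matching and the function names are assumptions for illustration only; the slide does not say how street names were matched, and its counts come from the full address data, which is not reproduced here.

```python
from collections import defaultdict
from typing import Dict, Iterator, List, Tuple

# Hypothetical lookup: lower-case stem found in a street name -> result label.
# The stems are an assumption; the slide does not describe the matching rule.
FRUIT_STEMS = {
    "kirsebær": "Kirsebær",  # cherries
    "jordbær": "Jordbær",    # strawberries
    "nødde": "Nødder",       # nuts
}


def map_street(street: str) -> Iterator[Tuple[str, int]]:
    """MAP: emit (label, 1) for each fruit stem occurring in the street name."""
    name = street.lower()
    for stem, label in FRUIT_STEMS.items():
        if stem in name:
            yield label, 1


def reduce_counts(label: str, ones: List[int]) -> int:
    """REDUCE: sum the ones emitted for a single label."""
    return sum(ones)


def count_fruit_streets(streets: List[str]) -> Dict[str, int]:
    grouped: Dict[str, List[int]] = defaultdict(list)
    for street in streets:                       # MAP
        for label, one in map_street(street):
            grouped[label].append(one)           # SHUFFLE: group by label
    # REDUCE: one call per label
    return {label: reduce_counts(label, ones) for label, ones in grouped.items()}


print(count_fruit_streets(["Kirsebærhaven", "Jordbærvej", "Nødde allé"]))
# {'Kirsebær': 1, 'Jordbær': 1, 'Nødder': 1}
```

On a cluster, the address list would be split across mappers, and all `(label, 1)` pairs for the same label would be shuffled to the same reducer.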
The Zoo
- HDFS: data locality
- MapReduce
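Data locality means moving the computation to the data: the scheduler prefers to run a map task on a node that already stores a replica of that task's HDFS block. The following is a deliberately simplified sketch of that preference; `assign_map_tasks` is a hypothetical helper, and Hadoop's real schedulers are more elaborate (for example, they also consider rack locality).

```python
from typing import Dict, List, Set, Tuple


def assign_map_tasks(
    block_locations: Dict[str, Set[str]], nodes: List[str]
) -> Tuple[Dict[str, str], int]:
    """Greedily assign one map task per block; return (assignment, local task count)."""
    load = {node: 0 for node in nodes}  # tasks assigned so far per node
    assignment: Dict[str, str] = {}
    local_tasks = 0
    for block in sorted(block_locations):
        # Nodes in the cluster that hold a replica of this block.
        replicas = [n for n in nodes if n in block_locations[block]]
        # Prefer a data-local node; otherwise any node must read the block remotely.
        candidates = replicas if replicas else nodes
        chosen = min(candidates, key=lambda n: load[n])  # least-loaded candidate
        assignment[block] = chosen
        load[chosen] += 1
        if replicas:
            local_tasks += 1
    return assignment, local_tasks


blocks = {"b1": {"node1", "node2"}, "b2": {"node2"}, "b3": {"elsewhere"}}
print(assign_map_tasks(blocks, ["node1", "node2", "node3"]))
# ({'b1': 'node1', 'b2': 'node2', 'b3': 'node3'}, 2)
```

Block `b3` has no replica on any compute node, so its task is placed on the least-loaded node and has to fetch its data over the network.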
Hadoop at the Library
Can it be done?
- Existing infrastructure: blade servers with no local storage
- All storage is on NAS (network-attached storage)
- We've done several experiments
[Figure: existing infrastructure, CPU and storage]
Cluster topology
- 4 CPU nodes, each with:
  - Two 6-core Intel® Xeon® X5670 processors (12 MB cache, 2.93 GHz, 6.40 GT/s Intel® QPI)
  - 96 GB RAM
  - 2 Gbit Ethernet interface
  - CentOS
  - NFS mount point on the NAS for HDFS
- Reachable NAS storage: ~4 PB
[Image: Science Museum/Science & Society Picture Library]
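A back-of-the-envelope calculation of what this topology offers, written as hypothetical Python helpers. It assumes the listed CPU, RAM and network figures are per node, uses decimal units (1 Gbit/s = 125 MB/s), and ignores protocol overhead and any limits on the NAS side, so the throughput figures are upper bounds on how fast the nodes could read from the NAS.

```python
from typing import Dict


def cluster_summary(
    nodes: int = 4,
    cpus_per_node: int = 2,
    cores_per_cpu: int = 6,
    ram_gb_per_node: int = 96,
    nic_gbit_per_s: float = 2.0,
) -> Dict[str, float]:
    """Totals for a homogeneous cluster (assumes the specs are per node)."""
    per_node_mb_s = nic_gbit_per_s * 1000 / 8  # Gbit/s -> MB/s, decimal units
    return {
        "cores": nodes * cpus_per_node * cores_per_cpu,
        "ram_gb": nodes * ram_gb_per_node,
        "per_node_mb_s": per_node_mb_s,
        "aggregate_mb_s": nodes * per_node_mb_s,
    }


def seconds_to_read(terabytes: float, aggregate_mb_s: float) -> float:
    """Lower bound on the time to pull data from the NAS over the network."""
    return terabytes * 1_000_000 / aggregate_mb_s  # 1 TB = 1,000,000 MB


summary = cluster_summary()
print(summary)
# {'cores': 48, 'ram_gb': 384, 'per_node_mb_s': 250.0, 'aggregate_mb_s': 1000.0}
print(seconds_to_read(1.0, summary["aggregate_mb_s"]))
# 1000.0
```

Under these assumptions the cluster has 48 physical cores and 384 GB RAM in total, and reading 1 TB from the NAS takes at least roughly 1,000 seconds (about 17 minutes), because every byte has to cross the nodes' network interfaces.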
Cloudera Hadoop Distribution
Interface
References
- Jeffrey Dean and Sanjay Ghemawat, MapReduce: Simplified Data Processing on Large Clusters, OSDI 2004. http://static.googleusercontent.com/media/research.google.com/en//archive/mapreduce-osdi04.pdf