Information Systems & Semantic Web
University of Koblenz ▪ Landau, Germany
Cloud Computing: What, why, how?
Noam Bercovici, Renata Dividino

ISWeb - Information Systems & Semantic Web Oberseminar 2 of 23
Motivation
- Count how frequently each word appears in the MEDLINE corpus (18 million texts)

Motivation
- I want to extend my research to another corpus
- Need more computing resources

Agenda
- Introduction
- Data Grid vs. Computing Grid
- Grid Computing
- Cloud Computing
- Data Grid (Hadoop File System)
- Computing Grid (MapReduce)
- Conclusion

Data Grid vs. Computing Grid
- Data Grid: distributed data storage; controlled sharing and management of large amounts of distributed data
- Computing Grid: parallel execution; the pieces of a program are divided among several computers
- Data Grid + Computing Grid = Grid Computing

Grid Computing
[Figure: the grid, with a master passing a task to the slaves]

Grid Computing
- Motivation: high performance, improved resource utilization
- Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems
- Tasks are submitted and distributed to nodes in the grid
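
The master/slave task distribution described above can be sketched on a single machine. This is only an illustration under stated assumptions: a thread pool stands in for the slave nodes, and `run_on_grid` and `count_characters` are hypothetical names that do not come from any grid framework.

```python
from concurrent.futures import ThreadPoolExecutor


def count_characters(chunk):
    """Hypothetical task: any independent unit of work would do here."""
    return len(chunk)


def run_on_grid(task, inputs, worker_count=3):
    """Master: submit one task per input and collect the results in input order."""
    # Assumption: the thread pool plays the role of the slave nodes.
    with ThreadPoolExecutor(max_workers=worker_count) as workers:
        return list(workers.map(task, inputs))


if __name__ == "__main__":
    print(run_on_grid(count_characters, ["grid", "cloud", "hadoop"]))
```

The caller sees a single function call, which is the "illusion of one computer"; how the work is spread over the workers stays hidden behind it.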

Cloud Computing
“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do.”
Larry Ellison, during Oracle’s Analyst Day

Cloud Computing
- Pay-as-you-go
- No initial investments
- Reduced operating costs
- Scalability
- Availability

Grid vs. Cloud Computing
Area                 | Grid                                  | Cloud
Motivation           | Performance, capacity                 | Flexibility, scalability
Infrastructure       | Owned by participants                 | Provided by third party
Business model       | Share costs                           | Pay-as-you-go
Virtualization       | In some cases                         | Prevalent
Typical applications | Research, batch jobs                  | On-demand infrastructure, web applications
Advantages           | Mature technologies                   | Low entry barrier, flexible
Disadvantages        | Initial investments, less flexibility | Third-party dependence, costs, open issues

Cloud Computing - Open Issues
- Bandwidth and latency
- Lack of standards and portability
- “Black-box” implementations
- Security and lack of control
- Immature tools and framework support
- Legal issues (ownership, auditing, etc.)
- Limited Service Level Agreements (SLAs)

Data Grid (Hadoop FS - Overview)
[Figure: HDFS architecture. Labels: Namenode (master node) holding the metadata (Name,..,..) and index; Datanodes (slave nodes); Client asking for a specific text; block ops; replication; caching of data]

Data Grid (HDFS - Data Replication)
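
The roles of namenode and datanodes can be made concrete with a toy model. It is a sketch under stated assumptions, not the HDFS implementation: `ToyNamenode`, the round-robin placement, the replication factor of 3, and the datanode names are all hypothetical and chosen for illustration only.

```python
from dataclasses import dataclass, field


@dataclass
class ToyNamenode:
    """Holds metadata only: which datanodes store each block of each file."""
    datanodes: list
    replication: int = 3  # illustrative value, not taken from the slides
    block_map: dict = field(default_factory=dict)  # (file, block index) -> datanodes

    def __post_init__(self):
        if self.replication > len(self.datanodes):
            raise ValueError("need at least as many datanodes as replicas")

    def add_file(self, name, block_count):
        """Record where the replicas of every block of a new file are placed."""
        for index in range(block_count):
            # Assumption: simple round-robin placement on distinct datanodes.
            self.block_map[(name, index)] = [
                self.datanodes[(index + offset) % len(self.datanodes)]
                for offset in range(self.replication)
            ]

    def locate(self, name, index, failed=()):
        """Return the datanodes a client can still read this block from."""
        return [node for node in self.block_map[(name, index)] if node not in failed]


if __name__ == "__main__":
    namenode = ToyNamenode(["dn1", "dn2", "dn3", "dn4"])
    namenode.add_file("medline.txt", block_count=2)
    print(namenode.locate("medline.txt", 1, failed={"dn2"}))
```

The client asks the namenode only for locations; the block data itself would be read from a datanode, and a failed datanode leaves the other replicas readable.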

Counting Words in Text Files
[Figure: Split-Operation divides the text files; Map-Operation countWords(File) emits the words w1 … w5 found in each split; Reduce-Operation sums them up to w1: 6, w2: 14, w3: 15, w4: 17, w5: 1]
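
The split, map, and reduce steps of the word count can be sketched in plain Python. This runs in a single process and does not use the Hadoop API; the function names, the in-memory "files", and the explicit `shuffle` step (grouping values by key between map and reduce) are assumptions made for illustration.

```python
from collections import defaultdict


def split_files(files):
    """Split step: one input split per file (files is a name -> text mapping)."""
    return list(files.values())


def count_words(text):
    """Map step: emit a (word, 1) pair for every word in one split."""
    return [(word.lower(), 1) for word in text.split()]


def shuffle(pairs):
    """Group the emitted counts by word before they reach the reduce step."""
    groups = defaultdict(list)
    for word, count in pairs:
        groups[word].append(count)
    return groups


def reduce_counts(groups):
    """Reduce step: sum the counts collected for each word."""
    return {word: sum(counts) for word, counts in groups.items()}


def word_count(files):
    pairs = []
    for split in split_files(files):
        pairs.extend(count_words(split))  # on a grid, each split is mapped on its own node
    return reduce_counts(shuffle(pairs))


if __name__ == "__main__":
    print(word_count({"a.txt": "grid cloud grid", "b.txt": "Cloud hadoop"}))
```

Because each call to `count_words` depends only on its own split, the map calls can run on different nodes; only the reduce step needs the combined results.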

Advantages of Hadoop
- Written purely in Java; requires Cygwin to be installed under Windows
- Available under the Apache 2.0 license
- Usually offers only one implementation for each feature of a grid framework
- Can also use file systems other than Hadoop FS
- Very flexible implementation of MapReduce
- For the split operation, supports only FileSplit out of the box
- Better suited for computations where …
  … large data collections have to be handled
  … the reduce operation is more than a simple aggregation of the map's output

Thank you! Questions?