Information Systems & Semantic Web University of Koblenz ▪ Landau, Germany Cloud Computing What, why, how? Noam Bercovici Renata Dividino.

Presentation transcript:

ISWeb - Information Systems & Semantic Web Oberseminar 2 of 23
Motivation
• Count how frequently each word appears in the MEDLINE corpus (18 million texts)

Motivation
• I want to extend my research to another corpus
• This needs more computing resources

Agenda
• Introduction
  • Data Grid vs. Computing Grid
  • Grid Computing
  • Cloud Computing
• Data Grid (Hadoop File System)
• Computing Grid (MapReduce)
• Conclusion

Data Grid vs. Computing Grid
• Data Grid:
  • Distributed data storage
  • Controlled sharing and management of large amounts of distributed data
• Computing Grid:
  • Parallel execution
  • Divides pieces of a program among several computers
Data Grid + Computing Grid → Grid Computing

Grid Computing
[Figure: The Grid, showing Master, Slaves, and Task]

Grid Computing
• Motivation: high performance, improved resource utilization
• Aims to create the illusion of a simple yet powerful computer out of a large number of heterogeneous systems
• Tasks are submitted and distributed to nodes in the grid
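The master/slave task distribution described on the Grid Computing slides can be sketched roughly as follows. This is only an illustration under simplifying assumptions: threads on one machine stand in for the slave nodes, and the names `count_words` and `run_on_grid` are hypothetical, not part of any grid framework.

```python
from concurrent.futures import ThreadPoolExecutor


def count_words(text):
    """Task run by a worker ("slave"): count the words in one text."""
    return len(text.split())


def run_on_grid(tasks, num_workers=3):
    """Master: hand each task to a free worker and collect results in order.

    Assumption: a thread pool stands in for the pool of grid nodes.
    """
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        return list(pool.map(count_words, tasks))


texts = [
    "grid computing",
    "a simple yet powerful computer",
    "tasks are distributed on nodes",
]
results = run_on_grid(texts)
print(results)
```

The caller only sees `run_on_grid`, which mirrors the "illusion of a single computer": how many workers exist and which one handled which task stays hidden.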

Cloud Computing
“The interesting thing about cloud computing is that we’ve redefined cloud computing to include everything that we already do.”
Larry Ellison, during Oracle’s Analyst Day

Cloud Computing
• Pay-as-you-go
  • No initial investments
  • Reduced operating costs
• Scalability
• Availability

Grid vs. Cloud Computing
Area | Grid | Cloud
Motivation | Performance, capacity | Flexibility, scalability
Infrastructure | Owned by participants | Provided by third party
Business model | Shared costs | Pay-as-you-go
Virtualization | In some cases | Prevalent
Typical applications | Research, batch jobs | On-demand infrastructure, web applications
Advantages | Mature technologies | Low entry barrier, flexible
Disadvantages | Initial investments, less flexibility | Third-party dependence, costs, open issues

Cloud Computing - Open Issues
• Bandwidth and latency
• Lack of standards and portability
• "Black-box" implementations
• Security and lack of control
• Immature tools and framework support
• Legal issues (ownership, auditing, etc.)
• Limited service level agreements (SLAs)

Data Grid vs. Computing Grid
• Data Grid:
  • Distributed data storage
  • Controlled sharing and management of large amounts of distributed data
• Computing Grid:
  • Parallel execution
  • Divides pieces of a program among several computers
Data Grid + Computing Grid → Grid Computing

Data Grid (Hadoop FS - Overview)
• Caching of data
[Figure: Namenode (master node) holding metadata (Name, ...) and an index; Datanodes (slave nodes); block ops; replication; a client asking for a specific text]

Data Grid (HDFS - Data Replication)
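The namenode/datanode split and block replication shown on the two Data Grid slides can be illustrated with a toy model. This is a sketch under loud assumptions, not HDFS code: the class `ToyNamenode`, the 8-byte block size, and the round-robin placement are all made up for illustration (real HDFS uses much larger blocks and a rack-aware placement policy). The replication factor of 3 matches the HDFS default.

```python
from itertools import cycle


class ToyNamenode:
    """Holds only metadata: which datanodes store which block of a file."""

    def __init__(self, datanodes, replication=3, block_size=8):
        if replication > len(datanodes):
            raise ValueError("need at least as many datanodes as replicas")
        # Assumption: dicts stand in for datanode disks (block_id -> bytes).
        self.datanodes = {name: {} for name in datanodes}
        self.replication = replication
        self.block_size = block_size
        self.index = {}  # file name -> list of (block_id, [datanode names])
        self._next_node = cycle(datanodes)

    def write(self, name, data):
        """Split data into blocks and store each block on several datanodes."""
        blocks = []
        for offset in range(0, len(data), self.block_size):
            block_id = f"{name}#{offset // self.block_size}"
            # Round-robin placement (hypothetical policy, not HDFS's own).
            targets = [next(self._next_node) for _ in range(self.replication)]
            for node in targets:
                self.datanodes[node][block_id] = data[offset:offset + self.block_size]
            blocks.append((block_id, targets))
        self.index[name] = blocks

    def read(self, name, failed=()):
        """Reassemble a file, skipping datanodes listed as failed."""
        parts = []
        for block_id, holders in self.index[name]:
            alive = [node for node in holders if node not in failed]
            if not alive:
                raise IOError(f"all replicas of {block_id} are unavailable")
            parts.append(self.datanodes[alive[0]][block_id])
        return b"".join(parts)


data = b"count the words in the corpus"
namenode = ToyNamenode(["dn1", "dn2", "dn3", "dn4"])
namenode.write("medline.txt", data)
print(namenode.read("medline.txt", failed={"dn1", "dn2"}))
```

With three replicas per block, the file stays readable even when two datanodes are down, which is the point of replication.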

Counting Words in Text Files
[Figure: Split-Operation → countWords(File) Map-Operation producing per-word lists (w1 ... w5) → Reduce-Operation yielding w1: 6, w2: 14, w3: 15, w4: 17, w5: 1]
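The split, map, and reduce steps of the word-count slide can be sketched in plain Python. This is a single-process illustration of the pattern, not the Hadoop API; the function names (`map_words`, `shuffle`, `reduce_counts`, `count_words`) are hypothetical, and the grouping step that a MapReduce framework performs between map and reduce is written out by hand.

```python
from collections import defaultdict


def map_words(text):
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in text.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group all emitted values by key, as the framework would between phases."""
    grouped = defaultdict(list)
    for word, one in pairs:
        grouped[word].append(one)
    return grouped


def reduce_counts(word, ones):
    """Reduce: aggregate the values collected for one word."""
    return word, sum(ones)


def count_words(files):
    """Run map over each split, then shuffle, then reduce per word."""
    pairs = (pair for text in files for pair in map_words(text))
    return dict(reduce_counts(word, ones) for word, ones in shuffle(pairs).items())


counts = count_words(["the grid and the cloud", "the cloud"])
print(counts)
```

Because each call to `map_words` touches only its own split and each call to `reduce_counts` touches only one word, both phases could run on different machines without coordination.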

Advantages of Hadoop
• Written purely in Java; requires installation of Cygwin under Windows
• Available under the Apache 2.0 license
• Usually offers only one implementation for the different features of a grid framework
• Can also use file systems other than Hadoop FS
• Very flexible implementation of MapReduce
• Supports only FileSplit for the split operation out of the box
• Better suited for computations where:
  • large data collections must be handled
  • the reduce operation is more than a simple aggregation of the map's output

Thank you! Questions?