Lecture 3: CS492 Special Topics in Computer Science: Distributed Algorithms and Systems



MapReduce
• Simplified data processing on large clusters: a programming model and an associated implementation for processing and generating large data sets
• Inspired by the "map" and "reduce" primitives present in Lisp and many other functional languages
• Simple and powerful interface that enables automatic parallelization and distribution of large-scale computations
Fall 2008, CS492

Map and Reduce
• Map: map (k1, v1) → list (k2, v2)
  ◦ Example: k1 = document, v1 = contents
  ◦ k2 = word, v2 = 1
• Reduce: reduce (k2, list (v2)) → list (v2)
  ◦ Example: k2 = word, each input v2 = 1
  ◦ Output list (v2) = total count of occurrences of the word k2
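
The word-count example on this slide can be sketched in a few lines of Python. This is only an illustrative, single-process sketch: the names `map_fn`, `reduce_fn`, and `run_map_reduce` are hypothetical and do not come from the lecture or from any MapReduce library, and the grouping step stands in for what a real system does across many machines.

```python
from collections import defaultdict

def map_fn(k1, v1):
    """map (k1, v1) -> list (k2, v2): emit (word, 1) for every word in the contents."""
    return [(word, 1) for word in v1.split()]

def reduce_fn(k2, values):
    """reduce (k2, list (v2)) -> list (v2): sum the 1s collected for one word."""
    return [sum(values)]

def run_map_reduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map every input, group by key, then reduce."""
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}

# k1 = document name, v1 = document contents
documents = [("doc1", "the quick fox"), ("doc2", "the lazy dog the end")]
counts = run_map_reduce(documents, map_fn, reduce_fn)
print(counts["the"])  # [3]
```

The user writes only `map_fn` and `reduce_fn`; everything inside the driver is the part the framework is meant to parallelize and distribute.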

MapReduce Execution Overview
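
The figure for this slide is not part of the transcript. As a rough aid, the sketch below simulates the usual map, partition, and reduce phases in one process, assuming the common scheme in which the input is divided into splits and each intermediate key is assigned to one of R reduce tasks by hash(key) mod R. The function names, the use of `zlib.crc32` as the hash, and the sample data are all assumptions made for illustration.

```python
import zlib
from collections import defaultdict

def partition(key, num_reducers):
    """Assign an intermediate key to a reduce task: hash(key) mod R."""
    return zlib.crc32(key.encode("utf-8")) % num_reducers

def execute(splits, map_fn, reduce_fn, num_reducers):
    """Simulate one job: one map task per split, R reduce tasks."""
    # Map phase: each map task places its intermediate pairs into
    # one of R buckets chosen by the partition function.
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for split in splits:
        for k1, v1 in split:
            for k2, v2 in map_fn(k1, v1):
                buckets[partition(k2, num_reducers)][k2].append(v2)
    # Reduce phase: each reduce task handles its own keys in sorted order
    # and produces one output partition.
    return [
        {k2: reduce_fn(k2, bucket[k2]) for k2 in sorted(bucket)}
        for bucket in buckets
    ]

def word_map(k1, v1):
    return [(word, 1) for word in v1.split()]

def count_reduce(k2, values):
    return [sum(values)]

# Two input splits, two reduce tasks (both numbers are arbitrary).
splits = [[("doc1", "a b a")], [("doc2", "b c")]]
outputs = execute(splits, word_map, count_reduce, num_reducers=2)
merged = {k: v for part in outputs for k, v in part.items()}
print(merged)
```

Because every occurrence of a key hashes to the same bucket, each key appears in exactly one output partition, which is what lets the reduce tasks run independently.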

Examples
• Distributed grep
• Count of URL access frequency
• Reverse web-link graph
• Term-vector per host
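
Three of these examples can be written as map/reduce pairs, as in the hedged sketch below. The driver `run_map_reduce`, the search pattern, and all sample inputs are hypothetical; the intent is only to show what each map emits and what each reduce does with the grouped values. Term-vector per host follows the same shape and is omitted.

```python
import re
from collections import defaultdict

def run_map_reduce(inputs, map_fn, reduce_fn):
    """Hypothetical in-memory driver: map, group by key, reduce."""
    groups = defaultdict(list)
    for k1, v1 in inputs:
        for k2, v2 in map_fn(k1, v1):
            groups[k2].append(v2)
    return {k2: reduce_fn(k2, values) for k2, values in groups.items()}

# Distributed grep: map emits a line only if it matches; reduce is the identity.
PATTERN = re.compile(r"error")  # assumed search pattern

def grep_map(location, line):
    return [(location, line)] if PATTERN.search(line) else []

def identity_reduce(key, values):
    return values

# Count of URL access frequency: map emits (URL, 1); reduce sums per URL.
def url_map(log_name, url):
    return [(url, 1)]

def sum_reduce(url, counts):
    return [sum(counts)]

# Reverse web-link graph: map emits (target, source) for every link;
# reduce collects all sources that point to a target.
def links_map(source, targets):
    return [(target, source) for target in targets]

def sources_reduce(target, sources):
    return sorted(sources)

lines = [("log:1", "ok start"), ("log:2", "error disk full"), ("log:3", "error timeout")]
matches = run_map_reduce(lines, grep_map, identity_reduce)

requests = [("access.log", "/a"), ("access.log", "/b"), ("access.log", "/a")]
frequency = run_map_reduce(requests, url_map, sum_reduce)

pages = [("p1", ["p2", "p3"]), ("p2", ["p3"])]
reverse_links = run_map_reduce(pages, links_map, sources_reduce)

print(frequency["/a"])      # [2]
print(reverse_links["p3"])  # ['p1', 'p2']
```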

• Pig: a platform for analyzing large data sets that consists of a high-level language for expressing data analysis programs, coupled with infrastructure for evaluating these programs. The salient property of Pig programs is that their structure is amenable to substantial parallelization, which in turn enables them to handle very large data sets.
• Dryad: distributed data-parallel programs from sequential building blocks (EuroSys 2007)