Distributed Operating Systems


Distributed Operating Systems Luke Wood

What is a distributed operating system?

Distributed Operating Systems
- Run across multiple physical or virtual machines
- Utilize the processing power of multiple machines
- Huge issues with synchronization in development
- Play a huge role in the world of "big data" (big daters)

What is driving this development?
We just have so much data!

90% of the world's data was generated over the past two years

Why Distributed?
For real - why didn't we just increase our clock speed?
Oh, that's why.

Today's Solution: Hadoop
Hadoop is the most widely used distributed OS in industry. It is made up of:
- Hadoop Common
- Hadoop Distributed File System (HDFS)
- MapReduce
- and so much more...

Hadoop History
- "The Google File System" published in October 2003
- "MapReduce: Simplified Data Processing on Large Clusters" published in December 2004
- Named after Doug Cutting's son's toy elephant, Hadoop!

Used to Process Data Such As
- Surveillance Data
- Social Media Data
- Stock Exchange Data
- Power Grid Data
- Transport Data
- Search Engine Data

Hadoop Case Study
- Incredibly impressive results
- Insane performance gains using the cluster
Results from "Cloud Hadoop Map Reduce For Remote Sensing Image Analysis" by Mohamed Almeer

The end goal of a distributed OS is to harness the power of multiple machines

What? How!?
We utilize the MapReduce paradigm
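
As a rough illustration of the paradigm, here is a minimal single-process sketch of a word count. The function names (map_phase, shuffle, reduce_phase, word_count) are made up for this example and are not Hadoop's API; on a real cluster each map call would run on a different machine and the framework would do the grouping.

```python
from collections import defaultdict

def map_phase(document):
    """Emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Group emitted values by key, the step a framework performs between phases."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    """Combine all values for one key into a single result."""
    return key, sum(values)

def word_count(documents):
    # Each document could be mapped on a different machine; here we just loop.
    emitted = [pair for doc in documents for pair in map_phase(doc)]
    grouped = shuffle(emitted)
    return dict(reduce_phase(k, v) for k, v in grouped.items())

counts = word_count(["big data is big", "data is distributed"])
print(counts)
```

The map step never looks at any other document and the reduce step never looks at any other key, which is what lets the work be spread across machines.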

The End.

Just Kidding.

Issues and Solutions From an OS and Application level perspective

#1: Shared Data
When we use a map function - how do we access a shared state?
What if our operations are not commutative?
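
To see why the order question matters, here is a small sketch that folds the same partial results in every possible arrival order. The values in partials are invented for illustration. With a commutative (and associative) operation such as addition every order agrees; with subtraction the answer depends on which worker reports first.

```python
from functools import reduce
from itertools import permutations
import operator

# Hypothetical partial results coming back from three workers.
partials = [5, 3, 2]

def outcomes(op, values):
    """Fold the values in every possible arrival order and collect the distinct results."""
    return {reduce(op, order) for order in permutations(values)}

add_results = outcomes(operator.add, partials)  # a single value: order does not matter
sub_results = outcomes(operator.sub, partials)  # several values: order changes the answer

print(add_results)
print(sub_results)
```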

Programmer Dependent Solution:
- Just use pure functions
- This can be a challenge
- Not super "general population friendly"
Operating System Solution:
- Operating system provides broadcast functionality
- Can we update the broadcast data?
- How expensive is this broadcasting?
- Is this a programmer invoked function?
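
One way to picture the two options together is a pure map function that reads from a read-only broadcast snapshot. This is only a sketch: broadcast and make_mapper are hypothetical helpers, the exchange rates are invented, and a real system would ship the snapshot over the network rather than wrap a local dict.

```python
from types import MappingProxyType

def broadcast(value):
    """Hypothetical helper: give workers a read-only snapshot of a dict."""
    return MappingProxyType(dict(value))

def make_mapper(rates):
    """Build a pure map function: its output depends only on the record and the fixed snapshot."""
    def to_usd(record):
        currency, amount = record
        return amount * rates[currency]
    return to_usd

# Invented rates, purely for illustration.
rates = broadcast({"USD": 1.0, "EUR": 2.0})
mapper = make_mapper(rates)
converted = list(map(mapper, [("USD", 10), ("EUR", 5)]))

# In this sketch the snapshot cannot be updated by a worker.
try:
    rates["EUR"] = 3.0
    update_allowed = True
except TypeError:
    update_allowed = False

print(converted, update_allowed)
```

Making the snapshot read-only is one possible answer to "can we update the broadcast data?": here the answer is no, which keeps every worker's view consistent at the cost of re-broadcasting when the data changes.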

#2: Data Distribution
How do we distribute data between devices?

Data Distribution Architectures
- Master to workers: only useful in MapReduce, much simpler than other architectures
- Peer to peer file distribution: much harder to implement

Programmer Dependent Solution:
- Explicitly broadcast data
- Prevents unnecessary data distribution
Operating System Solution:
- Try to intelligently distribute data
- Delegate specific tasks to specific systems
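
A simple way a master could delegate specific keys to specific workers is stable hash partitioning. The sketch below assumes that approach; assign_worker and distribute are hypothetical names and the sensor records are invented. It uses CRC32 rather than Python's built-in hash() so that every node computes the same owner for a key.

```python
import zlib

def assign_worker(key, num_workers):
    """Pick a worker for a key with a stable hash, so every node agrees on the owner."""
    return zlib.crc32(key.encode("utf-8")) % num_workers

def distribute(records, num_workers):
    """Master-side split: bucket (key, value) records by owning worker."""
    buckets = [[] for _ in range(num_workers)]
    for key, value in records:
        buckets[assign_worker(key, num_workers)].append((key, value))
    return buckets

# Invented records, purely for illustration.
records = [("sensor-a", 1), ("sensor-b", 2), ("sensor-a", 3), ("sensor-c", 4)]
buckets = distribute(records, num_workers=3)

for worker_id, bucket in enumerate(buckets):
    print(worker_id, bucket)
```

All records with the same key land on the same worker, so that worker can reduce them without asking anyone else for data.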

Conclusion
- Distributed operating systems have allowed companies to crunch insane amounts of data in reasonable time frames
- Parallel and distributed computing are made significantly easier through the use of the MapReduce paradigm
- Many of the synchronization problems we have studied in this class are taken care of by the MapReduce implementation

Thank you - check out distributed OS programming - it's a ton of fun