Ali Ghodsi UC Berkeley & KTH & SICS

Presentation transcript:

IoT Meets the Cloud Ali Ghodsi UC Berkeley & KTH & SICS alig@cs.berkeley.edu

Cloud Computing? Larry Ellison, CEO of Oracle Corporation “The computer industry is the only industry that is more fashion-driven than women's fashion. Maybe I'm an idiot, but I have no idea what anyone is talking about. What is it? It's complete gibberish. It's insane. When is this idiocy going to stop?” Richard M. Stallman, President of FSF “It’s stupidity. It’s worse than stupidity: it’s a marketing hype campaign. Somebody is saying this is inevitable — and whenever you hear somebody saying that, it’s very likely to be a set of businesses campaigning to make it true.” My claim: Cloud computing is inevitable for the Internet-of-Things

Most of the Computation on the Cloud Already! Mobile Applications Proximity sensor, ambient light, digital compass, gyroscope, accelerometer, dual microphone, multi-touch touchscreen, Bluetooth

Do we need the cloud for IoT? Device deluge 3 billion smart phones Another 40 billion IoT devices Devices will be challenged Limited storage Limited processing Limited communication Limited energy 12kbit/s, P2P Clouds needed for IoT, just as for phones and desktops

What is the cloud? Datacenter Computing Thousands of servers Co-located storage Routers and switches Backup power supplies Cooling Talk of the size... Half a mill.... Amazon

Why do we need datacenters? Multi-core Computing Processing speed stagnation Increased parallelism Supercomputer not sufficient Parallel computing quintessential to cloud computing Request-level parallelism Parallel algorithms (MapReduce, Indexing …) Typical machine, 12 disk drives, 16 cores, several NICs, 160 GB memory

Why do we need datacenters? (2) Economy of scale Reduce server cost Reduce cooling cost Reduce power cost Clouds are efficient PUE = total_facility_power/ equipment_power ~ 1.2 Energy economy-of-scale Commodity servers Workload consolidation Energy: common cooling, near cheap places and good climate for cooling, common infrastructure, buildings, backup power Commodity: reliability
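The PUE ratio on this slide can be worked through in a few lines. The following is a minimal sketch, not part of the original deck: the function names and the 1200 kW / 1000 kW figures are hypothetical and were chosen only so that the result matches the ~1.2 quoted above.

```python
def pue(total_facility_power_kw: float, equipment_power_kw: float) -> float:
    """Power usage effectiveness: total facility power over IT equipment power."""
    if equipment_power_kw <= 0:
        raise ValueError("equipment power must be positive")
    return total_facility_power_kw / equipment_power_kw


def overhead_fraction(pue_value: float) -> float:
    """Share of total facility power that does not reach the IT equipment."""
    return 1.0 - 1.0 / pue_value


# Hypothetical facility: 1200 kW drawn in total, 1000 kW reaching the servers.
example = pue(1200.0, 1000.0)
print(f"PUE = {example:.2f}")
print(f"overhead share = {overhead_fraction(example):.1%}")
```

A PUE of 1.0 would mean that all facility power reaches the equipment; at about 1.2, roughly one sixth of the power goes to cooling, power conversion, and other shared infrastructure.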

Workload Consolidation Data replicated over commodity machines Pioneered by Inktomi Interactive and latency sensitive jobs User facing applications e.g. search queries, tweets, … Millisecond SLOs Batch-jobs Building search indexes … Analytics of trends, business data … AV/spam filtering …

Workload Consolidation (2) Interactive and batch on same machines Virtualization of computation e.g. migration, hardware agnosticism Isolation of workloads e.g. meet SLO guarantees Automatic fault-handling e.g. through replication

Transformation of Computing Datacenter as a computer Programs timeshare thousands of servers Twitter

Berkeley Vision Create an “Operating System Kernel” for the Datacenter Computer First step with Mesos (mesosproject.org) Twitter

Today’s Cloud Frameworks Dryad Pregel Frameworks simplify distributed programming Programming models Hide failures, synchronization, delay variance Each framework runs on a dedicated cluster/partition

One Framework Per Cluster Challenges Inefficient resource usage E.g., Hadoop cannot use available resources from IoT FW cluster No opportunity for statistical multiplexing Hard to share data Copy or access remotely, expensive Hard to cooperate E.g., not easy for IoT FW to use data generated by Hadoop Hadoop IoT FW Need to run multiple frameworks on the same cluster
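The lost opportunity for statistical multiplexing can be illustrated with a toy comparison. This is a sketch under stated assumptions, not a model of any real scheduler: the demand timeline, the 50/50 partition sizes, and the function names are all hypothetical.

```python
def served_static(demands, partition_sizes):
    """Each framework may use only its own dedicated partition."""
    return sum(min(d, p) for d, p in zip(demands, partition_sizes))


def served_shared(demands, cluster_size):
    """All frameworks draw from one pool, capped by the total cluster size."""
    return min(sum(demands), cluster_size)


# Hypothetical demand per time interval, in servers, for (Hadoop, IoT FW).
timeline = [(80, 10), (20, 70), (50, 50), (90, 5)]
partitions = (50, 50)          # assumed static split of a 100-server cluster
cluster = sum(partitions)

static_total = sum(served_static(d, partitions) for d in timeline)
shared_total = sum(served_shared(d, cluster) for d in timeline)

print("server-intervals served, dedicated partitions:", static_total)
print("server-intervals served, shared cluster:", shared_total)
```

With dedicated partitions, a framework whose demand exceeds its partition cannot borrow idle servers from the other one, so the shared pool serves more of the same demand whenever the peaks do not coincide.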

Solution: Mesos Common resource sharing layer Abstracts (“virtualizes”) resources to frameworks Enables diverse frameworks to share the cluster Hadoop IoT FW Mesos Uniprogramming Multiprogramming Our solution is Mesos, a resource sharing layer that abstracts resources to frameworks and thereby lets diverse frameworks run on the same cluster. By doing so, we go from uniprogramming, that is, one framework per cluster, to multiprogramming, where the entire cluster is shared among diverse frameworks.

IoT Framework Diversity Today’s frameworks tailored for specific application domains MapReduce for indexing and filtering Pregel for graph algorithms IoT problem domain highly diverse Existing frameworks poor fit for IoT

New IoT Frameworks for Clouds IoT framework requirements Efficient device tag matching and filtering Online stream processing of IoT data Offline storage and batch processing of IoT data Goal: Build first cloud framework for IoT

IoT Framework Applications Real-time stream processing of data Security, safety, health applications Locating people, devices, objects Real time: in a supply chain, trigger the ordering of new supplies when inventory runs low. Real time: when enough smoke sensors in an area go off, trigger the emergency system.
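The two real-time examples on this slide are both threshold triggers over an event stream. The following is a minimal single-process sketch of that idea, not the framework the talk proposes: the event fields, the `REORDER_LEVEL` and `SMOKE_QUORUM` values, and the action names are hypothetical.

```python
from collections import defaultdict

REORDER_LEVEL = 10   # hypothetical: reorder once stock falls to this level
SMOKE_QUORUM = 3     # hypothetical: distinct alarming sensors needed per area


def process_stream(events):
    """Consume events in arrival order and return the actions they trigger."""
    actions = []
    reordered = set()              # items with a reorder already placed
    alarming = defaultdict(set)    # area -> sensors that have reported smoke
    alerted = set()                # areas whose emergency alert already fired
    for event in events:
        if event["type"] == "inventory":
            item, level = event["item"], event["level"]
            if level <= REORDER_LEVEL and item not in reordered:
                reordered.add(item)
                actions.append(("reorder", item))
            elif level > REORDER_LEVEL:
                reordered.discard(item)   # restocked, so re-arm the trigger
        elif event["type"] == "smoke":
            area = event["area"]
            alarming[area].add(event["sensor"])
            if len(alarming[area]) >= SMOKE_QUORUM and area not in alerted:
                alerted.add(area)
                actions.append(("emergency", area))
    return actions


sample = [
    {"type": "inventory", "item": "bolts", "level": 40},
    {"type": "smoke", "area": "hall-A", "sensor": "s1"},
    {"type": "inventory", "item": "bolts", "level": 9},
    {"type": "smoke", "area": "hall-A", "sensor": "s2"},
    {"type": "smoke", "area": "hall-A", "sensor": "s2"},
    {"type": "smoke", "area": "hall-A", "sensor": "s3"},
]
print(process_stream(sample))
```

Counting distinct sensors rather than raw events keeps a single faulty sensor that fires repeatedly from triggering the emergency system on its own.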

IoT Framework Applications (2) Batch processing of big data Learning trends, patterns, anomalies Collaborative filtering/recommendation Computing global device statistics Batch: retailers in a region can collaborate to obtain the aggregate trend of undersupply of a product category in a city. Batch: aggregate users' behavior to make recommendations to other users.
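The retailer example amounts to a group-by aggregation over reports collected offline. Below is a minimal in-memory sketch, assuming a hypothetical report format with `city`, `category`, `demand`, and `stock` fields; a real deployment would run the same aggregation as a batch job over stored data.

```python
from collections import Counter


def undersupply_by_city_category(reports):
    """Sum the shortfall (demand minus stock, floored at 0) per (city, category)."""
    totals = Counter()
    for report in reports:
        shortfall = max(0, report["demand"] - report["stock"])
        if shortfall:
            totals[(report["city"], report["category"])] += shortfall
    return totals


# Hypothetical reports from three retailers.
reports = [
    {"retailer": "r1", "city": "Berkeley", "category": "batteries", "demand": 120, "stock": 100},
    {"retailer": "r2", "city": "Berkeley", "category": "batteries", "demand": 90, "stock": 60},
    {"retailer": "r3", "city": "Berkeley", "category": "cables", "demand": 10, "stock": 30},
]
for key, shortfall in undersupply_by_city_category(reports).most_common():
    print(key, shortfall)
```

Only the aggregate per city and category leaves the computation, so collaborating retailers see the trend without seeing each other's individual figures.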

Summary Dichotomy: Challenged IoT vs Powerful Clouds The “nerves” (sensors, actuators) collect and send data to the “brain” (the datacenter) Datacenter is the new supercomputer Will need to multiplex between many IoT FW Need IoT-tailored frameworks to aid IoT services