X-Informatics Cloud Technology (Continued), March 4, 2013, Geoffrey Fox


X-Informatics Cloud Technology (Continued), March 4, 2013. Geoffrey Fox, Associate Dean for Research and Graduate Studies, School of Informatics and Computing, Indiana University Bloomington.

The Course in One Sentence: Study Clouds running Data Analytics processing Big Data to solve problems in X-Informatics.

What is Cloud Computing? (Continued)

Cloud Properties (from Gannon talk)
Designed to provide information and computation to many users:
– Automatic deployment and management of virtual machine instances: tens to thousands, with dynamic scalability
– Dynamic fault recovery of failed resources: cloud services must run 24x7
– Automatic data replication, with geo-replication if needed
– Two levels of parallelism: thousands of concurrent users, and thousands of servers for a single task

Clouds Offer (from different points of view)
Features from NIST:
– On-demand service (elastic)
– Broad network access
– Resource pooling
– Flexible resource allocation
– Measured service
Economies of scale in performance and electrical power (Green IT)
Powerful new software models:
– Platform as a Service is not an alternative to Infrastructure as a Service; it is instead incredible value added
– Amazon is as much PaaS as Azure

What is Cloud? (from Microsoft)
– A scalable, persistent outsourced infrastructure
– A framework for massive data analysis
– An amplifier of our desktop experience


Cloud Mythologies (from Eucalyptus)
– Cloud computing infrastructure is just a web service interface to operating system virtualization. ("I'm running Xen in my data center; I'm running a private cloud.")
– Clouds and Grids are equivalent. ("In the mid 1990s, the term grid was coined to describe technologies that would allow consumers to obtain computing power on demand.")
– Cloud computing imposes a significant performance penalty over "bare metal" provisioning. ("I won't be able to run a private cloud because my users will not tolerate the performance hit.")

Why Cloud Might Matter To You
– A New Science: "A new, fourth paradigm for science is based on data intensive computing" ... understanding of this new paradigm from a variety of disciplinary perspectives (The Fourth Paradigm: Data-Intensive Scientific Discovery)
– A New Architecture: "Understanding the design issues and programming challenges for those potentially ubiquitous next-generation machines" (The Datacenter As A Computer)

As a Service and Platform Model


IaaS Open Source
– Eucalyptus, Nimbus, OpenStack, OpenNebula (Europe), CloudStack (a break-away from OpenStack)
– These manage basic computing and storage, but these are separable ideas: management, networking, and access tools
– You install them on your own clusters and do have control over operation
– Compare public clouds (Amazon, Azure, Google), where operation is opaque but you just need a credit card and a web browser
– Compare the commercial VMware version, which is similar to the open source systems but with better support and probably greater robustness/functionality
– Eucalyptus has Open Source and Commercial versions

[Figure: Eucalyptus architecture, showing its computing and storage components]

Platform as a Service


Platform Extension to Cloud is a Continuum (from Microsoft)

Cloud Computing: Infrastructure and Runtimes
– Cloud Infrastructure: outsourcing of servers, computing, data, file space, utility computing, etc.; handled through (web) services that control virtual machine lifecycles.
– Cloud Runtimes or Platform: tools (for using clouds) to do data-parallel (and other) computations: Apache Hadoop, Google MapReduce, Microsoft Dryad, Bigtable, Chubby (synchronization), and others.
– MapReduce was designed for information retrieval but is excellent for a wide range of science data analysis applications.
– It can also do much traditional parallel computing for data mining if extended to support iterative operations.
– MapReduce is not usually run on virtual machines.
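The MapReduce model above can be sketched in a few lines of plain Python (no Hadoop involved). The function names map_fn, reduce_fn, and map_reduce are illustrative, not part of any real framework API; real systems run the map, shuffle, and reduce phases across many machines.

```python
from collections import defaultdict

def map_fn(document):
    # Map phase: emit (word, 1) pairs for each word in the input.
    for word in document.split():
        yield (word.lower(), 1)

def reduce_fn(word, counts):
    # Reduce phase: sum all counts for a given word.
    return (word, sum(counts))

def map_reduce(documents):
    # Shuffle: group intermediate values by key before reducing.
    groups = defaultdict(list)
    for doc in documents:
        for key, value in map_fn(doc):
            groups[key].append(value)
    return dict(reduce_fn(k, v) for k, v in groups.items())

result = map_reduce(["big data in the cloud", "data analytics in the cloud"])
print(result["data"])   # 2
print(result["cloud"])  # 2
```

The same structure handles many science analyses: the map function filters or transforms records of interest, and the reduce function aggregates per key.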

Cloud Data Models
– Clouds have a unique processing model: MapReduce with Hadoop, which we discussed.
– They have also inspired a revolution in data models. Originally enterprises used databases, but science did not, as they were not efficient and "transactions" were not the typical use. Science rather tended to access large chunks of data and/or did not do SQL queries, instead using custom user software to access events of interest, e.g. events like the Higgs.
– The LHC explored use of "object oriented" databases (Objectivity) but abandoned them.
– "Web 2.0" applications had the same issues as science: huge data sizes (in fact larger than science) and non-SQL queries.
– This spurred NoSQL approaches to data and to parallelism.
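The contrast between the two access styles can be illustrated side by side: a relational table queried declaratively with SQL, versus a NoSQL-style key-value store where values are fetched whole by key and any filtering happens in application code. The "event" records and key names here are invented for illustration.

```python
import sqlite3

# Relational/SQL style: rows in a table, selected by declarative query.
db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE events (id INTEGER, detector TEXT, energy REAL)")
db.execute("INSERT INTO events VALUES (1, 'A', 125.0), (2, 'B', 91.2)")
rows = db.execute("SELECT id FROM events WHERE energy > 100").fetchall()

# NoSQL key-value style: values looked up whole by key; there is no
# query language, so the application code does the filtering itself.
store = {}
store["event:1"] = {"detector": "A", "energy": 125.0}
store["event:2"] = {"detector": "B", "energy": 91.2}
high = [k for k, v in store.items() if v["energy"] > 100]

print(rows)  # [(1,)]
print(high)  # ['event:1']
```

The key-value scan parallelizes trivially (each worker scans its own keys), which is one reason this model suits MapReduce-style processing of large chunks of data.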

Different Formats for Data
– Traditional Windows or UNIX file systems are arranged in a hierarchy with directories. These generalize to share files across many computers, accessed with standard UNIX I/O commands and, in growing fashion, via the web (as in Oncourse or Dropbox).
– The Google File System, realized in the Hadoop File System HDFS, splits files up for parallelism. Traditional file systems also had parallel extensions, but GFS is very explicit about this.
– Object stores like Amazon S3 have a simpler interface than traditional file systems.
– Databases appear on the next page as a management system.
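The GFS/HDFS splitting idea can be sketched as follows: a large file is cut into fixed-size blocks, and each block is placed on several nodes. The block size and replication factor here are illustrative toy values (HDFS typically uses 64-128 MB blocks and 3 replicas), and the placement is a simple round-robin rather than the rack-aware policy real systems use.

```python
BLOCK_SIZE = 8    # bytes here, purely for illustration
REPLICATION = 3

def split_into_blocks(data, block_size=BLOCK_SIZE):
    # Cut the byte string into consecutive fixed-size blocks.
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(blocks, nodes, replication=REPLICATION):
    # Round-robin placement of each block's replicas across the nodes.
    placement = {}
    for b in range(len(blocks)):
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement

blocks = split_into_blocks(b"a large file to be distributed")
nodes = ["node1", "node2", "node3", "node4"]
print(len(blocks))                       # 4
print(place_replicas(blocks, nodes)[0])  # ['node1', 'node2', 'node3']
```

Because each block lives on several nodes, a computation over the file can run block-by-block in parallel on whichever nodes hold the data.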

Clouds as Support for Data Repositories?
– The data deluge needs cost-effective computing; clouds are by definition cheapest, and data and computing need to be co-located.
– Shared resources are essential (to be cost effective and large): we can't have every scientist downloading petabytes to a personal cluster.
– We need to reconcile distributed (initial sources of) data with shared analysis: we can move data to (discipline-specific) clouds, but how do you deal with multi-disciplinary studies?
– Will data repositories of the future have cheap data and elastic cloud analysis support? Hosted free if data can be used commercially?

Architecture of Data Repositories?
– Traditionally governments set up repositories for data associated with particular missions: for example EOSDIS (Earth observation), GenBank (genomics), NSIDC (polar science), IPAC (infrared astronomy), and the LHC/OSG computing grids for particle physics.
– This is complicated by the volume of the data deluge, by distributed instruments as in gene sequencers (maybe centralize?), and by the need for intense computing like BLAST; i.e., repositories need lots of computing?

Traditional File System?
– Typically a shared file system (Lustre, NFS …) is used to support high performance computing.
– It has big advantages for flexible computing on shared data, but it doesn't "bring computing to data".
– Object stores have a similar structure (separate data and compute) to this.
[Figure: compute cluster nodes (C) connected over the network to separate data storage nodes (S), backed by archive storage]

Data Parallel File System?
– No archival storage, and computing is brought to the data.
[Figure: a file (File1) is broken up into blocks (Block1, Block2, … BlockN); each block is replicated and stored on the compute nodes (C) themselves]
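"Bringing computing to the data" amounts to a scheduling rule: given a map from each block to the nodes holding its replicas, prefer to run a block's task on a node that already stores that block, falling back to a remote read only when none is free. This is a toy sketch with invented node names, assuming there are enough free nodes for all tasks; it is the locality heuristic Hadoop's scheduler uses in far more elaborate form.

```python
def schedule(placement, free_nodes):
    # Assign each block's task to a free node holding a replica if possible.
    assignments = {}
    free = set(free_nodes)
    for block, replicas in placement.items():
        local = [n for n in replicas if n in free]
        # Prefer a data-local node; otherwise any free node (remote read).
        chosen = local[0] if local else next(iter(free))
        assignments[block] = chosen
        free.discard(chosen)
    return assignments

placement = {0: ["node1", "node2"], 1: ["node2", "node3"], 2: ["node1", "node3"]}
print(schedule(placement, ["node1", "node2", "node3"]))
# {0: 'node1', 1: 'node2', 2: 'node3'}: every task reads its block locally
```

In this example every task lands on a node holding its block, so no data crosses the network, which is exactly the property the data-parallel file system is designed to enable.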