COSC6376 Cloud Computing. Lecture 0: Introduction. Instructor: Weidong Shi (Larry), PhD, Computer Science Department, University of Houston
Topics: scope of the course; grading policy; what I need from you.
Web Sites Class website: http://i2c.cs.uh.edu/class/fall2016-cloud
Scope of This Course: understand the basic ideas of cloud computing; get familiar with tools and systems; gain exposure to some research topics; complete a team project.
Prerequisites: Linux OS; some programming skills (Java, Python, Ruby, shell scripting); comfort with learning new programming frameworks; sufficient knowledge of data structures and databases, operating systems, and distributed systems.
Tentative Schedule. Parallel data processing: distributed file systems (HDFS), MapReduce, cloud-based databases, high-level distributed data management. Cloud infrastructures: virtualization, Amazon AWS, interactive front-end (Google App Engine), open source cloud framework. Research topics: resource provisioning, big data analytics and cloud, mobile services and cloud, cloud economics, privacy and security.
Assignments. Reading papers: individual readings will be posted on the wiki; a short summary must be submitted for some posted papers; library of papers: http://www.citeulike.org/group/15533. Some simple programming assignments: help you master the concepts and learn to use tools and systems. A team project.
Course Grading. 30% assignments: 10% assigned reading summaries (due before class), 20% programming assignments. 70% project: 15% for report one, 15% for report two, 25% for final report, 15% for final presentation.
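The grading breakdown above is a weighted sum, which can be sketched as follows. The weights come from the slide; the component names, the 0-100 score scale, and the `final_grade` helper are assumptions made only for illustration, not the instructor's actual grading procedure.

```python
# Weights taken from the grading slide; the key names are illustrative.
WEIGHTS = {
    "reading_summaries": 0.10,
    "programming_assignments": 0.20,
    "report_one": 0.15,
    "report_two": 0.15,
    "final_report": 0.25,
    "final_presentation": 0.15,
}


def final_grade(scores):
    """Return the weighted course grade for per-component scores in [0, 100]."""
    missing = set(WEIGHTS) - set(scores)
    if missing:
        raise ValueError(f"missing scores for: {sorted(missing)}")
    return sum(WEIGHTS[name] * scores[name] for name in WEIGHTS)


if __name__ == "__main__":
    # Hypothetical scores, assumed for the example only.
    example = {
        "reading_summaries": 90,
        "programming_assignments": 85,
        "report_one": 80,
        "report_two": 88,
        "final_report": 92,
        "final_presentation": 95,
    }
    print(round(final_grade(example), 2))
```

The four project components together carry 70% of the grade, so the project dominates the result.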
Course Project: teams of 2-3 people (we encourage 3 people); 3 milestones (1/3: plan and design; 2/3: report progress, challenges, …; final paper); in-class presentations in the last weeks of the semester; 1/3 and 2/3 presentations should be prepared.
What do I need from you: your name; relevant skill set; expectations; your research or career goals.
What is Cloud Computing?
Overview: what is meant by cloud computing; utility computing; X as a Service (Infrastructure as a Service, Platform as a Service, Software as a Service); why corporations need to pay attention; applications.
What is Cloud Computing? Old idea: Software as a Service (SaaS), defined as delivering applications over the Internet. Recently: “[Hardware, Infrastructure, Platform] as a Service”. Utility computing: pay-as-you-use computing, with the illusion of infinite resources, no up-front cost, and fine-grained billing (e.g. hourly).
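A minimal sketch of what fine-grained, pay-as-you-use billing means in practice, compared with an up-front purchase. The prices, the rule that a started hour is billed in full, and both function names are assumptions for illustration; they do not describe any particular provider's pricing.

```python
import math


def pay_as_you_go_cents(hours_used, rate_cents_per_hour):
    """Hourly billing: any started hour is charged as a full hour (assumed rule)."""
    if hours_used < 0:
        raise ValueError("hours_used must be non-negative")
    return math.ceil(hours_used) * rate_cents_per_hour


def break_even_hours(upfront_cents, rate_cents_per_hour):
    """Billed hours at which renting has cost as much as buying up front."""
    return math.ceil(upfront_cents / rate_cents_per_hour)


if __name__ == "__main__":
    RATE = 10          # hypothetical: 10 cents per server-hour
    UPFRONT = 200_000  # hypothetical: 2,000 dollars to buy a server
    print(pay_as_you_go_cents(10.5, RATE))
    print(break_even_hours(UPFRONT, RATE))
```

The comparison ignores power, cooling, and administration on the ownership side, so it understates the point at which buying becomes cheaper.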
Why Do We Need Cloud Computing? Traditional licensed software: for a fixed sum you buy all the bells and whistles, whether or not you use them often; you pay cash up front. Software as a service: like leasing a car, you use it but cannot make any significant changes; you pay according to the distance you travel.
Hardware Views: the effect of foundries on hardware companies, and of cloud computing on companies. Foundries: only companies like Intel and Samsung can own fabrication lines; foundries enable “fab-less” semiconductor chip companies (NO NEED to own a fab!). Cloud computing: large companies amortize operational costs; similarly, datacenter providers offer service to datacenter-less companies (NO NEED to own a datacenter!).
Why Do We Need Cloud Computing? Public cloud: available in a pay-as-you-go way, e.g. Amazon Web Services, Google App Engine, and Microsoft Azure. Private cloud: not available to the public, such as the internal datacenters of a business or other organization. Advantages. Service providers: simplified software installation and centralized control. End users: access the services, share data easily, and store data safely. Application providers: the same benefit that foundries give to chip companies.
Cloud Computing vs. Grid Computing. Cloud computing = virtualization + grid + services + utility computing. Grid computing: resource provisioning, load balancing, parallel processing. Views of different users: system admins and Hadoop users see a grid; application owners and service users see a service or utility.
Google Trends
Gartner’s 2011 Hype Cycle
Driving Forces Behind Cloud: experience with very large datacenters, which are profitable for cloud providers (economies of scale); pervasive broadband Internet; fast x86 virtualization; pay-as-you-go billing model; online payment; online ads; content distribution; Web 2.0 lowers the entry point to e-business, bringing more small e-business owners; large user base of clouds.
Economies of Scale: how many servers does Google have?
Google Server Count
Who Owns The Most Servers
Perils of Corporate Computing: companies own their information systems. However: capital investment; heavy fixed costs; redundant expenditures; high energy cost, low CPU utilization; dealing with unreliable hardware; high levels of overcapacity (technology and labor). NOT SUSTAINABLE.
Google: CPU Utilization. Activity profile of a sample of 5,000 Google servers over a period of 6 months.
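Why low CPU utilization makes owned hardware expensive can be shown with a small calculation: a server costs the same per hour whether it is busy or idle, so the cost of each hour of useful work grows as utilization falls. The hourly cost and the utilization levels below are hypothetical; they are not read from the Google activity profile.

```python
def cost_per_useful_hour(hourly_cost, utilization):
    """Cost of one hour of useful work on a server that is busy only part of the time.

    utilization is the busy fraction, in the range (0, 1].
    """
    if not 0 < utilization <= 1:
        raise ValueError("utilization must be in (0, 1]")
    return hourly_cost / utilization


if __name__ == "__main__":
    HOURLY_COST = 1.0  # hypothetical cost of running one server for one hour
    for u in (1.0, 0.5, 0.25):  # hypothetical utilization levels
        cost = cost_per_useful_hour(HOURLY_COST, u)
        print(f"utilization {u:.0%}: {cost:.2f} per useful hour")
```

Under these assumptions, halving utilization doubles the effective cost of the work actually done.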
Google Server Farms (Oregon)
A central cooling plant in Google's Douglas County, Georgia, data center. Photo: Google/Connie Zhou
Reading Assignment
Cloud Characteristics: on-demand self-service; ubiquitous network access; location-independent resource pooling; rapid elasticity; pay per use.
Delivery Models. Software as a Service (SaaS): use the provider’s applications over a network, e.g. Salesforce.com. Platform as a Service (PaaS): deploy customer-created applications to a cloud, e.g. Google App Engine. Infrastructure as a Service (IaaS): rent processing, storage, network capacity, and other fundamental computing resources, e.g. EC2, S3.
Software Stack. Clients: mobile (Android), thin client (Zonbu), thick client (Google Chrome). Services: identity, integration, payments, mapping, search, video games, chat. Application: peer-to-peer (BitTorrent), web app (Twitter), SaaS (Google Apps, SAP). Platform: Java Google Web Toolkit, Django, Ruby on Rails, .NET. Storage: S3, Nirvanix, Rackspace Cloud Files, Savvis. Infrastructure: full virtualization (GoGrid), management (RightScale), compute (EC2), platform (Force.com).
Cloud Killer Apps: mobile and web applications; parallel batch processing / MapReduce; data analytics (OLAP, data mining, machine learning); extensions of desktop software (Matlab, Mathematica).
Big Data Everywhere! Lots of data is being collected and warehoused: web data and e-commerce; purchases at department and grocery stores; bank and credit card transactions; social networks. The rise of analytics: understanding customers, supply chains, buying habits, ranking, and so on. Cloud-based big data analytics: computation produces small data output containing a high density of information; implemented in clouds.
How Much Data? Google processes 20 PB a day (2008); Twitter generates approximately 12 TB of data per day; New York Stock Exchange: 1 TB of data every day; eBay processes 50 PB of data a day.
Facebook: 2.5 billion content items shared per day (status updates + wall posts + photos + videos + comments); 2.7 billion Likes per day; 300 million photos uploaded per day; 500+ terabytes of new data ingested into the databases every day.
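To put the daily volumes above in perspective, they can be converted into an average sustained rate. This is a rough sketch: it assumes decimal (SI) units and a load spread evenly over 24 hours, neither of which the slide states, and the helper name is illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60
# Decimal (SI) units are an assumption; the slide does not say which it uses.
BYTES_PER_UNIT = {"TB": 10**12, "PB": 10**15}


def gigabytes_per_second(amount, unit):
    """Average sustained rate, in GB/s, for a volume handled per day."""
    return amount * BYTES_PER_UNIT[unit] / SECONDS_PER_DAY / 10**9


# Daily volumes as quoted on the slide.
DAILY_VOLUMES = {
    "Google (2008)": (20, "PB"),
    "Twitter": (12, "TB"),
    "New York Stock Exchange": (1, "TB"),
    "eBay": (50, "PB"),
}

if __name__ == "__main__":
    for name, (amount, unit) in DAILY_VOLUMES.items():
        print(f"{name}: {gigabytes_per_second(amount, unit):.3f} GB/s")
```

Real workloads are bursty, so peak rates would be higher than these averages.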
Topics Covered: economics of cloud computing; tools to create your own cloud infrastructure; public clouds (AWS, Google App Engine); big data analytics using cloud; large-scale services using cloud; resource management; security and privacy.
Companies Are Afraid to Use Clouds [Chow09ccsw]
Privacy
Dropbox Security
Internet of Things (IoT): a popular Google Play app, Camera360 Ultimate, has been found to inadvertently leak sensitive data. This gives malicious parties unauthorized access to users’ Camera360 Cloud accounts and photos.