Learning Google foster@hf.webex.com

Reference
http://zh.wikipedia.org/zh/MapReduce
Engineering Case Study training by Peter Xiao and Stanley Huang: https://go.webex.com/go/ldr.php?AT=pb&SP=MC&rID=4363322&rKey=5c29baa6dc43ce4c
http://code.google.com/appengine/
http://www.codechina.org/doc/google/gfs-paper/guarantees-by-gfs.html
http://royal.pingdom.com/2008/04/11/map-of-all-google-data-center-locations/

Agenda
Google Overview
Data Center
Google File System
MapReduce
BigTable
Google App Engine - Demo

Google Overview
Why named Google: a misspelling of "googol", the mathematical term for a 1 followed by 100 zeros
Mission: organize the world's information and make it universally accessible and useful
Infrastructure: a three-layer stack

Google Data Center - Overview
Total of 35 data centers globally:
19 in the US
12 in Europe
3 in Asia
1 in South America
1 in Russia

Google Data Center – Cost and Scale
Cost, according to Google's earnings reports:
$1.9 billion for 2006
$2.4 billion for 2007
4 data centers proposed in 2007, each costing ~$600 million
Power consumption: 50-100+ megawatts per major data center
Size: no standard physical size
Google's data center at The Dalles, Oregon:
2,000 square foot administration building
1,600 square foot "transient employee dormitory"
1,800 square foot facility for cooling towers
(Estimated power consumption: 103 megawatts)

Google Data Center – Interior & Exterior
Google data center at The Dalles, Oregon

Google Data Center – Hardware and Software
Google customizes commodity hardware to minimize energy consumption:
Google web servers
Google Ethernet switches
Google builds in-house software to achieve high performance and scalability:
Google Web Server (GWS)
Google Front End (GFE)
Google File System (GFS)
Google MapReduce
Google BigTable

Google Data Center – Service Availability
All 32 data centers reached four nines (99.99%) of uptime
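"Four nines" translates to a concrete downtime budget; a quick back-of-the-envelope calculation (an illustration, not a figure from the slides):

```python
# "Four nines" = 99.99% availability.
availability = 0.9999
minutes_per_year = 365.25 * 24 * 60            # 525,960 minutes in a year
downtime_minutes = minutes_per_year * (1 - availability)
print(round(downtime_minutes, 1))              # about 52.6 minutes of downtime per year
```

So four nines across a whole year allows under an hour of total downtime per data center.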

Google Data Center vs Cisco/CSG’s

Architecture Foundation - Overview
The architecture serves the mission: organize the world's information and make it universally accessible and useful
Storage for raw data: Google File System, distributed on thousands of machines across tens of data centers
Backend computing: MapReduce
K-V / relational data storage: BigTable

Architecture Foundation - GFS
Background: typical ways to store persistent file data are local disk, NFS, and storage arrays
GFS is a scalable (~100 TB) distributed file system on top of thousands of machines
Goal: performance and scalability for large files and concurrent access
Workflow
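The core GFS idea, a master that holds only metadata while fixed-size chunks are replicated across chunkservers, can be sketched as a small in-memory simulation. The class and method names here are hypothetical, not Google's API:

```python
import uuid

CHUNK_SIZE = 64 * 1024 * 1024  # GFS uses fixed 64 MB chunks
REPLICAS = 3                   # each chunk is stored on 3 chunkservers by default

class Chunkserver:
    """Stores chunk data; in GFS these are thousands of commodity machines."""
    def __init__(self):
        self.chunks = {}  # chunk_handle -> bytes

    def store(self, handle, data):
        self.chunks[handle] = data

class Master:
    """Holds only metadata: file name -> chunk handles -> replica locations."""
    def __init__(self, chunkservers):
        self.chunkservers = chunkservers
        self.files = {}       # filename -> [chunk_handle, ...] in order
        self.locations = {}   # chunk_handle -> [Chunkserver, ...]

    def write(self, filename, data):
        handles = []
        for offset in range(0, len(data), CHUNK_SIZE):
            handle = uuid.uuid4().hex
            replicas = self.chunkservers[:REPLICAS]
            for server in replicas:            # chunk data goes to chunkservers;
                server.store(handle, data[offset:offset + CHUNK_SIZE])
            self.locations[handle] = replicas  # the master records only locations
            handles.append(handle)
        self.files[filename] = handles

    def read(self, filename):
        # Read each chunk in order from its first available replica.
        return b"".join(self.locations[h][0].chunks[h] for h in self.files[filename])

servers = [Chunkserver() for _ in range(5)]
master = Master(servers)
master.write("web.log", b"hello gfs!")
print(master.read("web.log"))  # b'hello gfs!'
```

The key design point the slide alludes to: the master never carries file data, only chunk locations, which is what lets a single master coordinate petascale storage.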

Architecture Foundation – GFS – Cont'
Dive in further:
Hadoop: Java-based open-source software for reliable, scalable, distributed computing
HDFS: Hadoop Distributed File System, a distributed file system similar to GFS
GFS is good at handling large files with appending writes (no random editing) and tons of reads
KFS (Kosmos Distributed File System): an open-source distributed file system similar to GFS and Hadoop's HDFS
Where we are: the distributed storage concept has been used in the Queue & Dispatch service (WAPI 2.0) and Search Farm

Architecture Foundation - MapReduce
Background: typical ways to do computing are local CPU work and parallel computing at the application level
MapReduce is a programming model for computing over large data sets by distributing the work
Goal: compute over terabytes of data on thousands of machines for performance
Example: Google PageRank; given a 1-terabyte file, calculate the count of every word
Workflow
Pseudo code:
map(String key, String value)
  // key: document name
  // value: document contents
  for each word w in value
    EmitIntermediate(w, "1");
reduce(String key, Iterator values)
  // key: a word
  // values: a list of counts
  int result = 0;
  for each v in values
    result += ParseInt(v);
  Emit(AsString(result));
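The word-count pseudocode above can be run end to end with a single-machine simulation of the map, shuffle, and reduce phases; this is a sketch of the model, not a distributed implementation (a real cluster spreads each phase across machines):

```python
from collections import defaultdict

def map_fn(key, value):
    # key: document name, value: document contents
    for word in value.split():
        yield (word, 1)

def reduce_fn(key, values):
    # key: a word, values: the list of counts emitted for that word
    return (key, sum(values))

def map_reduce(documents, map_fn, reduce_fn):
    # Map phase: run map_fn over every (name, contents) pair.
    intermediate = defaultdict(list)
    for name, contents in documents.items():
        for k, v in map_fn(name, contents):
            intermediate[k].append(v)  # shuffle: group values by key
    # Reduce phase: one reduce_fn call per distinct key.
    return dict(reduce_fn(k, vs) for k, vs in intermediate.items())

docs = {"d1": "the quick brown fox", "d2": "the lazy dog"}
print(map_reduce(docs, map_fn, reduce_fn))  # {'the': 2, 'quick': 1, ...}
```

Because map_fn and reduce_fn are pure functions over independent keys, the framework is free to run thousands of copies of each in parallel, which is the whole point of the model.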

Architecture Foundation – MR – Cont'
Dive in further:
MapReduce is good for simple, large-scale computing work
Hadoop MapReduce provides similar functionality
Where we are:
Some computing happens in the Oracle DB layer
Much computing happens in parallel at the application layer, e.g. Search Farm, Activity Server, etc.

Architecture Foundation – BigTable
Background: BigTable is a scalable, distributed, multi-dimensional K-V store
Goal: high-performance one-way (key-ordered) search over large data volumes
Example: Google Earth grabs all geographic data based on location
Workflow
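"Multi-dimensional K-V store" means each cell is addressed by (row key, column, timestamp), with rows kept sorted so that range scans are cheap. A toy in-memory version, with hypothetical names standing in for Bigtable's real API:

```python
import bisect
import time

class TinyBigtable:
    """Sparse map: (row, column, timestamp) -> value, rows kept in sorted order."""
    def __init__(self):
        self.rows = {}       # row_key -> {column -> [(timestamp, value), ...]}
        self.row_keys = []   # sorted row keys, which is what enables range scans

    def put(self, row, column, value, ts=None):
        ts = ts if ts is not None else time.time()
        if row not in self.rows:
            bisect.insort(self.row_keys, row)
            self.rows[row] = {}
        self.rows[row].setdefault(column, []).append((ts, value))

    def get(self, row, column):
        # The latest timestamp wins; Bigtable keeps multiple versions per cell.
        cells = self.rows.get(row, {}).get(column, [])
        return max(cells)[1] if cells else None

    def scan(self, start, end):
        # Key-ordered range scan: the "one-way search" the slide mentions.
        i = bisect.bisect_left(self.row_keys, start)
        j = bisect.bisect_left(self.row_keys, end)
        return [(k, self.rows[k]) for k in self.row_keys[i:j]]

table = TinyBigtable()
table.put("com.example/a", "contents", "v1", ts=1)
table.put("com.example/a", "contents", "v2", ts=2)
print(table.get("com.example/a", "contents"))  # v2
```

Row keys are chosen so that related data sorts together (e.g. reversed URLs), which is how a single scan can grab "all geographic data based on location".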

Architecture Foundation – BigTable – Cont'
Dive in further:
Cassandra is an open-source implementation of the BigTable concept
Data store design is all about CAP (data consistency, availability, partition tolerance)
Which one to focus on depends on what value/user experience you are trying to provide
Where we are:
Cassandra, with modifications, has been used in WAPI 2.0 for the user Wall/Feed
Memcached has been used since WAPI 2.0

Google App Engine – Development and Deployment Demo
An experienced developer can develop and deploy a "Hello World" application to App Engine within 1-2 hours:
Create an App Engine account
Download the App Engine SDK, or Eclipse with the App Engine plug-in
Develop the "Hello World" application
Deploy the application
Access the "Hello World" application via http://appID.appspot.com/appname
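The "develop" step above amounts to a tiny web handler. As an illustrative sketch, plain WSGI is used here to stand in for the web framework the App Engine SDK provides; the `invoke` helper is a hypothetical test harness, not part of any SDK:

```python
def application(environ, start_response):
    # Minimal WSGI app: the platform routes incoming requests to this callable.
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"Hello World"]

def invoke(app, path="/"):
    # Local smoke test without any server: call the app with a fake request.
    captured = {}
    def start_response(status, headers):
        captured["status"] = status
    body = b"".join(app({"PATH_INFO": path, "REQUEST_METHOD": "GET"}, start_response))
    return captured["status"], body

print(invoke(application))  # ('200 OK', b'Hello World')
```

Deployment then is just uploading this handler plus a small config file; App Engine supplies the servers, routing, and scaling, which is why the whole exercise fits in 1-2 hours.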

Thanks!