Introduction to Data Center Computing
Derek Murray
October 2010

What we’ll cover
Techniques for handling “big data”
  – Distributed storage
  – Distributed computation
Focus on recent papers describing real systems

Example: web search
(Diagram labels: WWW, Crawling, Indexing, Querying)

A system architecture?
Computers
Network
Storage

Data Center architecture
(Diagram labels: Server, Rack switch, Core switch)

Distributed storage
High volume of data
High volume of read/write requests
Fault tolerance

Brewer’s CAP theorem (2000)
Consistency
Availability
Partition tolerance

The Google File System (2003)
(Diagram labels: GFS Master, Chunk server, Client)

Dynamo (2007)
(Diagram label: Client)

Distributed computation
Parallel distributed processing
Single Program, Multiple Data (SPMD)
Fault tolerance
Applications

Task farming
(Diagram labels: Master, Worker, Storage)
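
The task-farming slide names only three roles: master, worker and storage. As a rough single-machine sketch of how they might interact, the Python below uses threads as workers and a dictionary as the shared storage. The names run_task_farm, work_fn and num_workers are hypothetical, chosen for illustration; a real task farm would spread workers across machines and would need to handle worker failure.

```python
import queue
import threading


def run_task_farm(tasks, work_fn, num_workers=4):
    """Master: enqueue tasks, start workers, then gather results from storage."""
    task_queue = queue.Queue()
    storage = {}                      # stands in for shared storage (assumption)
    storage_lock = threading.Lock()

    def worker():
        # Worker: repeatedly pull a task, run it, write the result to storage.
        while True:
            item = task_queue.get()
            if item is None:          # sentinel from the master: no more work
                return
            task_id, payload = item
            result = work_fn(payload)
            with storage_lock:
                storage[task_id] = result

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for t in threads:
        t.start()
    for task_id, payload in enumerate(tasks):
        task_queue.put((task_id, payload))
    for _ in threads:
        task_queue.put(None)          # one sentinel per worker
    for t in threads:
        t.join()
    # Results are returned in task order, regardless of which worker ran them.
    return [storage[i] for i in range(len(tasks))]


results = run_task_farm([1, 2, 3, 4, 5], lambda x: x * x)
```

Because idle workers pull tasks from the queue, faster workers naturally take on more of the work, which is the main attraction of the pattern.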

MapReduce (2004)
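
To make the MapReduce programming model concrete, here is a minimal in-memory sketch of word counting, the example used in the MapReduce paper. It runs in one process and only illustrates the map, group-by-key and reduce phases; the helper names map_fn, reduce_fn and map_reduce are illustrative assumptions, not any library's API, and nothing here is distributed or fault tolerant.

```python
from collections import defaultdict


def map_fn(_doc_id, text):
    # Map: emit (word, 1) for every word in the document.
    for word in text.split():
        yield word.lower(), 1


def reduce_fn(_word, counts):
    # Reduce: combine all values emitted for one key.
    return sum(counts)


def map_reduce(inputs, map_fn, reduce_fn):
    # Shuffle: group intermediate values by key between the two phases.
    groups = defaultdict(list)
    for key, value in inputs:
        for out_key, out_value in map_fn(key, value):
            groups[out_key].append(out_value)
    return {k: reduce_fn(k, vs) for k, vs in groups.items()}


docs = [("d1", "the quick brown fox"), ("d2", "the lazy dog the end")]
word_counts = map_reduce(docs, map_fn, reduce_fn)
```

In a real deployment the map calls run in parallel over input partitions and the grouping step moves data across the network; the user-supplied functions stay the same.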

Dryad (2007)
Arbitrary directed acyclic graph (DAG)
Vertices and channels
Topological ordering
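
A topological ordering of the job graph gives an order in which every vertex runs only after the vertices that feed it. The sketch below computes one with Kahn's algorithm. The vertex names and the topological_order helper are made up for illustration, and this shows only the ordering idea, not Dryad's scheduler.

```python
from collections import deque


def topological_order(vertices, channels):
    """Return the vertices so that every channel (src, dst) has src before dst."""
    successors = {v: [] for v in vertices}
    in_degree = {v: 0 for v in vertices}
    for src, dst in channels:
        successors[src].append(dst)
        in_degree[dst] += 1

    # Vertices with no unfinished inputs are ready to run.
    ready = deque(v for v in vertices if in_degree[v] == 0)
    order = []
    while ready:
        v = ready.popleft()
        order.append(v)
        for w in successors[v]:
            in_degree[w] -= 1
            if in_degree[w] == 0:
                ready.append(w)

    if len(order) != len(vertices):
        raise ValueError("graph contains a cycle, so it is not a DAG")
    return order


vertices = ["read_a", "read_b", "join", "aggregate"]
channels = [("read_a", "join"), ("read_b", "join"), ("join", "aggregate")]
order = topological_order(vertices, channels)
```

Vertices that are ready at the same time (here read_a and read_b) have no dependency on each other, so an execution engine is free to run them in parallel.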

DryadLINQ (2008)
Language Integrated Query (LINQ)
    var table = PartitionedTable.Get<int>("…");
    var result = from x in table select x * x;
    int sumSquares = result.Sum();
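
The query on this slide squares every element of a partitioned table and sums the results. One way to picture a data-parallel evaluation is sketched below in Python: each partition produces a partial sum independently, and the partial sums are then combined. The sum_squares function and the sample partitions are assumptions for illustration; the sketch does not describe how DryadLINQ actually compiles or executes a query.

```python
def sum_squares(partitions):
    # Per-partition step: square and sum locally (could run on separate machines).
    partial_sums = [sum(x * x for x in part) for part in partitions]
    # Combine step: add up the per-partition results.
    return sum(partial_sums)


partitions = [[1, 2, 3], [4, 5], [6]]
total = sum_squares(partitions)
```

Only one number per partition needs to cross the network in the combine step, which is why aggregations such as Sum distribute well.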

Scheduling issues
Heterogeneous performance
Sharing a cluster fairly
Data locality

Percolator (2010)
Built on Google BigTable
Transactions via snapshot isolation
Per-column notifications (triggers)
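
Snapshot isolation means that each transaction reads from a consistent snapshot taken at its start time, and a commit fails if another transaction has committed a write to the same key since that snapshot. The single-process Python sketch below illustrates only those two rules. The SnapshotStore and Transaction classes are hypothetical, and this is not Percolator's protocol, which runs across many machines on top of BigTable.

```python
import itertools


class SnapshotStore:
    def __init__(self):
        self._versions = {}                 # key -> [(commit_ts, value)], ascending
        self._clock = itertools.count(1)    # stand-in for a timestamp source

    def begin(self):
        return Transaction(self, next(self._clock))

    def read_at(self, key, ts):
        # Return the newest value committed at or before timestamp ts.
        value = None
        for commit_ts, v in self._versions.get(key, []):
            if commit_ts <= ts:
                value = v
        return value

    def commit(self, txn):
        # Abort if any written key has a commit newer than our snapshot.
        for key in txn.writes:
            versions = self._versions.get(key, [])
            if versions and versions[-1][0] > txn.start_ts:
                return False
        commit_ts = next(self._clock)
        for key, value in txn.writes.items():
            self._versions.setdefault(key, []).append((commit_ts, value))
        return True


class Transaction:
    def __init__(self, store, start_ts):
        self.store = store
        self.start_ts = start_ts
        self.writes = {}                    # buffered until commit

    def get(self, key):
        if key in self.writes:
            return self.writes[key]
        return self.store.read_at(key, self.start_ts)

    def put(self, key, value):
        self.writes[key] = value

    def commit(self):
        return self.store.commit(self)


store = SnapshotStore()
setup = store.begin()
setup.put("x", 1)
setup.commit()

t1 = store.begin()
t2 = store.begin()
old_reader = store.begin()
t1.put("x", 10)
t2.put("x", 20)
t1_ok = t1.commit()             # first committer wins
t2_ok = t2.commit()             # conflicts with t1's write to "x"
seen_by_old_reader = old_reader.get("x")
seen_by_new_reader = store.begin().get("x")
```

In this run t1 commits, t2 is rejected because of the write-write conflict on "x", the reader that started before t1 committed still sees the old value, and a reader that starts afterwards sees t1's value.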

Skywriting and CIEL (2010)
Universal distributed execution engine
Script language for distributed programs
Opportunities for student projects…

References
Storage
  – Ghemawat et al., “The Google File System”, Proceedings of SOSP 2003
  – DeCandia et al., “Dynamo: Amazon’s Highly-Available Key-value Store”, Proceedings of SOSP 2007
Computation
  – Dean and Ghemawat, “MapReduce: Simplified Data Processing on Large Clusters”, Proceedings of OSDI 2004
  – Isard et al., “Dryad: Distributed Data-Parallel Programs from Sequential Building Blocks”, Proceedings of EuroSys 2007
  – Yu et al., “DryadLINQ: A System for General-Purpose Distributed Data-Parallel Computing Using a High-Level Language”, Proceedings of OSDI 2008
  – Olston et al., “Pig Latin: A Not-So-Foreign Language for Data Processing”, Proceedings of SIGMOD 2008
  – Murray and Hand, “Scripting the Cloud with Skywriting”, Proceedings of HotCloud 2010
Scheduling
  – Zaharia et al., “Improving MapReduce Performance in Heterogeneous Environments”, Proceedings of OSDI 2008
  – Isard et al., “Quincy: Fair Scheduling for Distributed Computing Clusters”, Proceedings of SOSP 2009
  – Zaharia et al., “Delay Scheduling: A Simple Technique for Achieving Locality and Fairness in Cluster Scheduling”, Proceedings of EuroSys 2010
Transactions
  – Peng and Dabek, “Large-Scale Incremental Processing using Distributed Transactions and Notifications”, Proceedings of OSDI 2010

Conclusions
Data centers achieve high performance with commodity parts
Efficient storage requires application-specific trade-offs
Data-parallelism simplifies distributed computation on the data

Questions
Now or after the lecture
  – Email Derek.Murray@cl.cam.ac.uk
  – Web http://www.cl.cam.ac.uk/~dgm36/
