Published by Bryan Barber. Modified over 9 years ago.
1
Advanced Topics in Distributed Systems
Fall 2011
Instructor: Costin Raiciu
2
We’ve gotten used to great applications
3
Enabling Such Apps is Hard
Apps:
– Process huge amounts of data
– Are fast
– Are reliable
One machine is not enough:
– Limited reliability, speed
Supercomputers are expensive
4
What Makes These Applications Tick?
5
Distributed Systems
6
This course…
Cares about technology relating to distributed systems:
– Networks
– Virtual machines
– Distributed filesystems
– Distributed computation
We care about details, not about products
– Why?
7
Traditional Data Center Network Topology
[Figure: racks of servers, top-of-rack switches, aggregation switches, and a core switch; links labeled 1 Gbps and 10 Gbps]
8
Fat Tree Topology [Al-Fares et al., 2008; Clos, 1953]
[Figure: K pods with K switches each (K=4); aggregation switches; racks of servers; 1 Gbps links]
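The slide gives only the pod structure (K pods with K switches each). As a rough sketch, the element counts of a K-ary fat tree can be computed as below, assuming the standard construction from the cited Al-Fares et al. paper: each pod splits its K switches evenly into edge (top-of-rack) and aggregation layers, each edge switch serves K/2 hosts, and there are (K/2)^2 core switches. The function name `fat_tree_sizes` is made up for illustration.

```python
def fat_tree_sizes(k: int) -> dict:
    """Element counts for a k-ary fat tree (assumes the standard construction; k even)."""
    if k < 2 or k % 2 != 0:
        raise ValueError("k must be a positive even integer")
    half = k // 2
    return {
        "pods": k,
        "edge_switches": k * half,         # k/2 edge (top-of-rack) switches per pod
        "aggregation_switches": k * half,  # k/2 aggregation switches per pod
        "core_switches": half * half,      # (k/2)^2 core switches
        "hosts": k * half * half,          # k/2 hosts under each edge switch
    }


# The slide's example uses K=4.
sizes = fat_tree_sizes(4)
print(sizes)
```

Under these assumptions the host count grows as K^3/4, so identical switches with more ports scale the network without requiring faster links at the top.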
9
Inside a Machine: Virtualization
Many operating systems running on a single box
Provides:
– Isolation
– Flexibility
– Better utilization of the machine
10
How do we store data?
Distributed filesystem
– NFS: UNIX-like semantics
  – Single server
  – Limited scalability
– Google File System
  – Optimized for large-batch writes and sequential reads
  – Tolerates inconsistency
11
How do we get work done?
MapReduce
– Apply the same function in parallel on different data on many machines
– Aggregate results
Useful for:
– Building big web-search indices
– Processing large amounts of data (PB)
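The two steps on this slide (apply the same function to different data, then aggregate) can be illustrated with a single-process word count. This is only a sketch of the programming model: a real framework would run the map calls on many machines and shuffle the intermediate pairs over the network. The names `map_words`, `reduce_counts`, and `run_mapreduce` are hypothetical and not part of any library.

```python
from collections import defaultdict
from typing import Callable, Dict, Iterable, Iterator, List, Tuple


def map_words(document: str) -> Iterator[Tuple[str, int]]:
    """Map step: emit (word, 1) for every word in one input record."""
    for word in document.lower().split():
        yield word, 1


def reduce_counts(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce step: aggregate all values emitted for one key."""
    return word, sum(counts)


def run_mapreduce(inputs: Iterable[str], mapper: Callable, reducer: Callable) -> Dict[str, int]:
    """Run map, group by key (the 'shuffle'), then reduce, all in one process."""
    groups = defaultdict(list)
    for record in inputs:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in sorted(groups.items()))


documents = ["the quick brown fox", "the lazy dog", "the fox"]
word_counts = run_mapreduce(documents, map_words, reduce_counts)
print(word_counts)
```

Because each call to the mapper depends only on its own record, the map phase can be split across machines without coordination; only the grouping step needs data to move between them.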
12
This is just a taster
13
Course outline
Distributed Apps we care about
– Distributed Computation (MapReduce, Dryad, Hadoop)
– Distributed Filesystems (NFS and GFS)
– Web search
– Caching (Memcached)
– Distributed Hash Tables (Chord, Dynamo)
– NoSQL databases (BigTable, Cassandra)
Infrastructure: networks
– Topologies: FatTree, VL2, BCube
– Using capacity: Hedera, MPTCP
– Performance Optimizations: Incast, DCTCP
14
Course outline [2]
Infrastructure: OS abstractions
– Virtual Machines (Xen, VMM)
– Distributed memory (Ivy)
Security
– Information Leakage
– Good Isolation vs. High Utilization (Seawall, CloudPolice)
15
Course Admin
Lectures:
– 2 hours per week, Tuesday 8-10, EC102
Lab classes:
– 2 hours per week, Tuesday 10-12, EG106
– Project discussions
– Help with practical issues
– Help with high-level goals, theory
Website: curs.cs.pub.ro
– If you have problems, let me know
16
Grading
Project: 5p
– Groups of 3-4 students
– 4 stages: to help you get the job done easily, without last-minute work over Christmas
Exam: 3p
Presentation (1h): 1p
Class participation: 1p
17
Presentation
Select one topic before the end of October (list will be posted this week)
– Presentation date is fixed
– If you miss your presentation, you lose 2p
Class participation
– 2 papers presented per course by your colleagues
– Read them beforehand and take part in the discussion
18
Exam
Open book
Need to understand and think, not memorize
Studying 3 days before the exam won’t work
– You need to take part in classes and read up
19
Projects
Large-scale data processing with MapReduce
– We will use Apache Hadoop
– We will run code on Amazon EC2 (and maybe on local clusters)
– Several datasets you can choose from
20
Datasets available
– Crawled set of HTML pages from .uk
– Wikipedia Page Traffic Statistics
– Apache Mail Archives
– Million Song Dataset
– M-Lab dataset: Network Path and Application Diagnosis tool
– Human genome
– US Census databases
– Freebase data dump
21
Stage 1
– Choose dataset to use
– Select one or more questions to answer using the dataset
– Write a small Hadoop script to parse a subset of the data
– Come up with a few simple graphs (e.g. dataset size, histograms)
– Start writing: introduction to your report, problem statement
– Start the implementation and evaluation
  – Size of dataset, time to do one pass, etc.
Strict deadline [1p]: November 1st
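One possible shape for the small parsing script and a first histogram is sketched below in the style of Hadoop Streaming, where the mapper and reducer exchange tab-separated key/value lines and the framework sorts by key in between. This is an assumption about tooling, not a course requirement: the bucket scheme (record length rounded down to a power of two) and the names `mapper`, `reducer`, and `local_run` are purely illustrative, and the sketch runs locally on a few sample lines rather than on a cluster.

```python
from itertools import groupby
from typing import Iterable, Iterator, List


def mapper(lines: Iterable[str]) -> Iterator[str]:
    """Emit 'bucket<TAB>1' per record; bucket = record length rounded down to a power of two."""
    for line in lines:
        length = len(line.rstrip("\n"))
        bucket = 1 << (length.bit_length() - 1) if length > 0 else 0
        yield f"{bucket}\t1"


def reducer(sorted_lines: Iterable[str]) -> Iterator[str]:
    """Sum the counts of consecutive lines that share a key (input must be sorted by key)."""
    keyed = (line.split("\t") for line in sorted_lines)
    for bucket, group in groupby(keyed, key=lambda pair: pair[0]):
        yield f"{bucket}\t{sum(int(count) for _, count in group)}"


def local_run(lines: Iterable[str]) -> List[str]:
    """Simulate map -> sort -> reduce in one process for a quick dry run."""
    return list(reducer(sorted(mapper(lines))))


# Hypothetical sample records standing in for a subset of a dataset.
sample = ["a\n", "abcd\n", "abcde\n", "\n"]
histogram = local_run(sample)
for row in histogram:
    print(row)
```

In an actual streaming job each function would live in its own script reading standard input and writing standard output; testing the pair locally on a small subset first gives a cheap estimate of parsing problems before paying for cluster time.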
22
Stage 2
How do we solve the problem?
– Review related work
– Select potential approaches
  – Discuss pros/cons
Implementation and evaluation
– Implement the code
– Run experiments
– Refine code and reiterate
Goal: 70% of functionality should be implemented
Deadline [1p]: December 1st
– Output in report:
  – Implementation section
  – Early evaluation section
23
Stage 3
– Final implementation
– Evaluation
– What did we learn?
Deadline [1p]: December 21st
– In-class project presentation: 10 mins
24
Stage 4
Write-up
– Polish report
– Create a coherent story
– Convince me that this is useful
Deadline to hand in final report: last day of semester (January 14th) [1p]