Camdoop: Exploiting In-network Aggregation for Big Data Applications

Presentation transcript:

Camdoop: Exploiting In-network Aggregation for Big Data Applications. Paolo Costa, joint work with Austin Donnelly, Antony Rowstron, and Greg O'Shea (MSR Cambridge). Presented by: Navid Hashemi

MapReduce Overview. [Figure: an input file split into chunks, map tasks producing intermediate results, and reduce tasks producing the final results.] Map: processes the input data and generates (key, value) pairs. Shuffle: distributes the intermediate pairs to the reduce tasks. Reduce: aggregates all values associated with each key.
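The model on this slide can be summarized with a minimal, framework-free sketch (word count in Python); the function names below are illustrative, not the API of any particular system:

```python
from collections import defaultdict

def map_task(chunk):
    # Map: emit a (key, value) pair for each word in the input chunk.
    for word in chunk.split():
        yield (word, 1)

def shuffle(pairs, num_reduce_tasks):
    # Shuffle: route each intermediate pair to a reduce task by key.
    partitions = [defaultdict(list) for _ in range(num_reduce_tasks)]
    for key, value in pairs:
        partitions[hash(key) % num_reduce_tasks][key].append(value)
    return partitions

def reduce_task(partition):
    # Reduce: aggregate all values associated with each key.
    return {key: sum(values) for key, values in partition.items()}

chunks = ["a rose is a rose", "is it a rose"]
intermediate = [pair for chunk in chunks for pair in map_task(chunk)]
print([reduce_task(p) for p in shuffle(intermediate, 2)])
```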

Problem. The shuffle phase is challenging for data center networks and has led to proposals for full-bisection bandwidth. [Figure: input file splits, map tasks, intermediate results, reduce tasks, final results.]

Data Reduction. The final results are typically much smaller than the intermediate results: in most Facebook jobs the final size is 5.4% of the intermediate size, and in most Yahoo jobs the ratio is 8.2%. [Figure: input file splits, map tasks, intermediate results, reduce tasks, final results.]

Data Reduction. The final results are typically much smaller than the intermediate results. How can we exploit this to reduce the traffic and improve the performance of the shuffle phase?

Background: Combiners. To reduce the data transferred in the shuffle, users can specify a combiner function that aggregates the local intermediate pairs. Combiners are server-side only, so the aggregation is limited. [Figure: map task, combiner, reduce task pipeline.]
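A combiner as described here can be sketched as a local pre-aggregation step, reusing the word-count example above (illustrative names, not the Hadoop API):

```python
from collections import Counter

def combiner(local_pairs):
    # Aggregate the local intermediate pairs: only one (word, count)
    # pair per distinct word leaves this server, instead of one pair
    # per occurrence, shrinking the shuffle traffic.
    counts = Counter()
    for word, value in local_pairs:
        counts[word] += value
    return list(counts.items())

local_pairs = [("rose", 1), ("is", 1), ("rose", 1), ("a", 1)]
print(combiner(local_pairs))  # e.g. [('rose', 2), ('is', 1), ('a', 1)]
```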

Distributed Combiners. It has been proposed to use aggregation trees in MapReduce to perform multiple steps of combiners, e.g., rack-level aggregation [Yu et al., SOSP'09].

Logical and Physical Topology. What happens when we map the aggregation tree to a typical data center topology? [Figure: logical tree topology vs. physical topology with servers attached to a ToR switch.]

Logical and Physical Topology. When the tree is mapped onto the physical topology, two logical links are mapped onto the same physical link, leaving only 500 Mbps per child. The server link is the bottleneck, and full-bisection bandwidth does not help here: there is a mismatch between the physical and logical topology. [Figure: logical topology vs. physical topology with a ToR switch.]

Camdoop Goal. Perform the combiner functions within the network, as opposed to application-level solutions, and reduce the shuffle time by aggregating packets on path.

How Can We Perform In-network Processing? We exploit CamCube: a direct-connect topology (3D torus) that uses no switches or routers for internal traffic. Nodes have (x, y, z) coordinates; this defines a key space (key-based routing), and coordinates are locally re-mapped in case of failures. [Figure: 3D torus with x, y, z axes.]

How Can We Perform In-network Processing? In CamCube, servers intercept, forward, and process packets themselves. [Figure: same 3D torus as above.]

How Can We Perform In-network Processing? Key property: there is no distinction between network and computation devices; servers can perform arbitrary packet processing on path. [Figure: nodes at coordinates (1,2,1) and (1,2,2) in the torus.]

How Can We Perform In-network Processing? In CamCube, keys are encoded as 160-bit identifiers, and each server is responsible for a subset of them. The most significant bits are used to generate the (x, y, z) coordinate; if that server is unreachable, the identifier is mapped to one of its neighbors, with the remaining (160 - k) bits used to determine which neighbor.
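A rough sketch of the key-to-coordinate mapping described on this slide; the bit layout and constants below are assumptions for illustration, not the actual CamCube implementation:

```python
import hashlib

SIDE = 3           # torus side length, e.g. the 3 x 3 x 3 testbed
COORD_BITS = 24    # assumed: top bits of the identifier encode (x, y, z)

def key_to_coord(app_key: bytes):
    # Keys are 160-bit identifiers, e.g. a SHA-1 digest of the application key.
    ident = int.from_bytes(hashlib.sha1(app_key).digest(), "big")
    top = ident >> (160 - COORD_BITS)      # most significant bits
    x, y, z = (top >> 16) & 0xFF, (top >> 8) & 0xFF, top & 0xFF
    # Fold into the torus; the remaining 160 - COORD_BITS bits would be
    # used to pick a neighbour if this server were unreachable.
    return (x % SIDE, y % SIDE, z % SIDE)

print(key_to_coord(b"some application key"))
```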

Mapping a tree. On a switched topology, the 1 Gbps server link becomes the bottleneck: each child effectively gets only 1/in-degree Gbps. On CamCube, there is a 1:1 mapping between the logical and physical topology, and packets are aggregated on path, so less traffic traverses each 1 Gbps link.

Camdoop Design Goals: no change in the programming model; exploit network locality; good server and link load distribution; fault tolerance.

Design Goal #1: Programming Model. Camdoop adopts the same MapReduce model, with a GFS-like distributed file system. Each server runs map tasks on local chunks. We use a spanning tree: combiners aggregate the map task outputs and the children's results (if any) and stream the results to their parents; the root runs the reduce task and generates the final output.
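A minimal sketch of the tree-based aggregation just described, assuming a commutative and associative combiner (summing word counts); the hard-coded two-level tree is purely illustrative:

```python
from collections import Counter

def combine(local_map_output, children_results):
    # Each vertex merges its own map output with the partial results
    # streamed up from its children, then streams the merge to its parent.
    merged = Counter(dict(local_map_output))
    for child in children_results:
        merged.update(child)
    return merged

# Leaves have no children; the root's combined result is the final
# (reduced) output.
leaf_a = combine([("rose", 2), ("is", 1)], [])
leaf_b = combine([("rose", 1), ("a", 2)], [])
root = combine([("is", 1)], [leaf_a, leaf_b])
print(dict(root))   # counts: rose=3, is=2, a=2
```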

Design Goal #2: Network Locality. How do we map the tree nodes to servers?

Design Goal #2: Network Locality. Simple rule: map task outputs are always read from the local disk.

Design Goal #2: Network Locality. Parents and children are mapped onto physical neighbors. [Figure: tree vertices mapped to the neighboring servers (1,2,1), (1,2,2), and (1,1,1).]

Design Goal #2: Network Locality. This ensures maximum locality and optimizes the network transfer.

Network Locality. [Figure: logical view vs. physical view (3D torus).] One physical link is used by one and only one logical link.

Design Goal #3: Load Distribution.

Design Goal #3: Load Distribution. With a single tree, each server uses only 1 Gbps (instead of 6), servers have different in-degrees, and the server load distribution is poor.

Design Goal #3: Load Distribution. Solution: stripe the data across disjoint trees, so that different links are used and the load distribution improves.

Design Goal #3: Load Distribution. Solution: stripe the data across six disjoint trees. All links are used, giving up to 6 Gbps per server and good load distribution.
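The striping can be sketched as a deterministic key-to-tree assignment; the modulo-hash rule below is an assumption for illustration, not necessarily how Camdoop partitions the key space:

```python
import zlib

NUM_TREES = 6   # one tree rooted over each of a server's six links

def tree_for_key(key: str) -> int:
    # Stable hash so every server routes the same key up the same tree.
    return zlib.crc32(key.encode()) % NUM_TREES

stripes = {t: [] for t in range(NUM_TREES)}
for key, value in [("rose", 3), ("is", 2), ("a", 2), ("thorn", 1)]:
    stripes[tree_for_key(key)].append((key, value))
print(stripes)   # each stripe is aggregated along its own disjoint tree
```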

Design Goal #4: Fault Tolerance. The tree is built in the coordinate space. Link failure: using CamCube's key-based routing service and shortest paths, packets are rerouted along a new path. Server failure: CamCube remaps the coordinates and the API notifies all servers; the parent of vertex v receives the notification and sends a control packet containing the last key it received to the newly mapped server, and each child of v resends all (key, value) pairs from that key onwards.
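The recovery step for a server failure can be sketched as follows, assuming each child's stream is delivered in sorted key order (illustrative code, not the Camdoop API):

```python
def resend_from(child_stream, last_key_received):
    # child_stream: (key, value) pairs in sorted key order.
    # The parent's control packet carries the last key it received before
    # the failure; the child resends everything from that key onwards.
    if last_key_received is None:
        return child_stream                      # nothing was received yet
    return [(k, v) for (k, v) in child_stream if k >= last_key_received]

stream = [("a", 2), ("is", 1), ("rose", 3)]
print(resend_from(stream, "is"))   # [('is', 1), ('rose', 3)]
```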

Evaluation. Testbed: 27-server CamCube (3 x 3 x 3); each server has a quad-core Intel Xeon 5520 at 2.27 GHz, 12 GB RAM, and six Intel PRO/1000 PT 1 Gbps ports. Simulator: packet-level simulator (CPU overhead not modelled), 512-server (8 x 8 x 8) CamCube.

Evaluation: design and implementation recap. Camdoop: the reduce phase is parallelized with the shuffle phase. Since all streams are ordered, as soon as the root receives at least one packet from each child it can start the reduce function, and there is no need to store the intermediate results to disk on the reduce servers.
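A sketch of why ordered streams let reduce overlap with the shuffle: with one sorted stream per child, the root can merge them incrementally and emit each reduced key as soon as all of its values have arrived (illustrative, using Python's heapq.merge):

```python
import heapq
from itertools import groupby

child_streams = [
    [("a", 1), ("is", 2), ("rose", 1)],   # each child's stream is key-ordered
    [("a", 2), ("rose", 3)],
]

merged = heapq.merge(*child_streams, key=lambda kv: kv[0])
for key, group in groupby(merged, key=lambda kv: kv[0]):
    # The reduce function runs per key while later packets are still arriving.
    print(key, sum(value for _, value in group))
```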

Evaluation: design and implementation recap. Camdoop: shuffle and reduce parallelized; runs on CamCube with six disjoint trees and in-network aggregation.

Evaluation: design and implementation recap. TCP Camdoop (switch): the 27 CamCube servers are attached to a ToR switch, TCP is used to transfer data in the shuffle phase, and shuffle and reduce are still parallelized.

Evaluation: design and implementation recap. Camdoop (no agg): like Camdoop but without in-network aggregation (as needed for non-commutative/non-associative functions); it shows the impact of just running on CamCube.

Validation against Hadoop & Dryad. Workloads: Sort (same size intermediate and output data) and WordCount (significant aggregation). [Figure: time in seconds, log scale, for Hadoop, Dryad/DryadLINQ, TCP Camdoop (switch), and Camdoop (no agg) on Sort and WordCount; lower is better.] The Camdoop baselines are competitive with Hadoop and Dryad, for several reasons: shuffle and reduce are parallelized and the implementation is fine-tuned.

Validation against Hadoop & Dryad. We consider TCP Camdoop (switch) and Camdoop (no agg) as our baselines for the rest of the evaluation.

Parameter Sweep. Output size / intermediate size (S): S = 1 means no aggregation (every key is unique); S = 1/N ≈ 0 means full aggregation (every key appears in all map task outputs). The intermediate data size is 22.2 GB (843 MB/server), distributed across the servers. Reduce tasks (R): R = 1 is all-to-one (e.g., interactive queries, top-K jobs); R = N is all-to-all, the common setup in MapReduce jobs, where N output files are generated.

All-to-one (R = 1). [Figure: time in seconds, log scale, vs. output size / intermediate size (S) from full aggregation (S ≈ 0) to no aggregation (S = 1), for TCP Camdoop (switch), Camdoop (no agg), and Camdoop; lower is better.]

All-to-one (R = 1), continued. The gap between TCP Camdoop (switch) and Camdoop (no agg) shows the impact of just running on CamCube; the further gap to Camdoop shows the impact of in-network aggregation. The performance of the variants without in-network aggregation is independent of S.

All-to-one (R = 1), continued. [Figure: same plot, with the Facebook-reported aggregation ratio marked on the S axis between full aggregation (S ≈ 0) and no aggregation (S = 1).]

All-to-all (R = 27). [Figure: time in seconds, log scale, vs. output size / intermediate size (S), for TCP Camdoop (switch), Camdoop (no agg), Camdoop, and a Switch 1 Gbps (bound) line; lower is better.]

All-to-all (R = 27), continued. The Switch 1 Gbps (bound) line is the maximum theoretical performance over the switch; the annotations again highlight the impact of running on CamCube and of in-network aggregation.

Number of reduce tasks (S = 0). [Figure: time in seconds, log scale, vs. number of reduce tasks R from 1 (all-to-one) to 27 (all-to-all), for TCP Camdoop (switch), Camdoop (no agg), and Camdoop, with annotated speedups ranging from 10.19x down to 4.31x, 1.91x, and 1.13x as R increases; lower is better.]

Number of reduce tasks (S = 0), continued. For TCP Camdoop (switch) and Camdoop (no agg), performance depends on R; for Camdoop, R does not (significantly) impact performance, with the small residual variation due to an implementation bottleneck.

Number of reduce tasks (S = 0), continued. All resources are used in the aggregation phase even when R = 1, so the number of output files just becomes a configuration parameter.

Behavior at scale (simulated), N = 512, S = 0. [Figure: time in seconds, log scale, vs. number of reduce tasks R (log scale) from 1 (all-to-one) to 512 (all-to-all), for Switch 1 Gbps (bound) and Camdoop; Camdoop is up to 512x faster; lower is better.]

Behavior at scale (simulated), N = 512, S = 0, continued. The Switch 1 Gbps (bound) line assumes full-bisection bandwidth. [Figure: same plot as above.]

Failure experiment. Without a failure, a single edge in the tree corresponds to a single one-hop link in CamCube; with a failure, a single edge becomes a multi-hop path. [Figure: shuffle and reduce time normalized against T, the time taken to run without failure.]

Beyond MapReduce. The partition-aggregate model is also common in interactive services, e.g., Bing Search and Google Dremel. [Figure: requests flowing from a web server through parent servers to leaf servers, with a cache.] These services handle small-scale data (a good use case for CamCube), with 10s to 100s of KB returned per server. They typically use one reduce task (R = 1), since a single result must be returned to the user, and full aggregation is common (S ≈ 0), so there is a large opportunity for aggregation.

Small-scale data (R = 1, S = 0). [Figure: time in milliseconds for TCP Camdoop (switch), Camdoop (no agg), and Camdoop at input data sizes of 20 KB and 200 KB per server; lower is better.]

Small-scale data (R = 1, S = 0), continued. In-network aggregation can also be beneficial for small-scale-data interactive services.

Conclusions. Camdoop explores the benefits of in-network processing by running combiners within the network, with no change in the programming model. It achieves lower shuffle and reduce time and decouples performance from the number of output files. A final thought: how would Camdoop run on the AMD SeaMicro, a 512-core cluster for data centers that uses a 3D torus with a fast interconnect (5 Gbps per link)?

Questions?