Effectively Utilizing Global Cluster Memory for Large Data-Intensive Parallel Programs John Oleszkiewicz, Li Xiao, Yunhao Liu IEEE TRANSACTIONS ON PARALLEL AND DISTRIBUTED SYSTEMS, VOL. 17, NO. 1, JANUARY 2006 Presented by 張肇烜
Outline Introduction Design Rationale Methodology Results Analysis Conclusions
Introduction Large scientific parallel applications demand large amounts of memory space. Uniform use of resources at the parallel process level does not necessarily mean the system itself is evenly utilized. To ensure good CPU utilization, we must limit the number of processes assigned to each processor.
Introduction (cont.) The key problem is memory usage, and this problem has two parts: –Memory fragmentation –Paging overhead Network RAM has been proposed for use by sequential jobs in clusters to even out the memory load and reduce paging overhead.
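To make the paging-overhead point concrete, here is a minimal back-of-the-envelope sketch comparing the cost of servicing a page fault from local disk versus from idle remote memory. All latency and bandwidth numbers are illustrative assumptions, not measurements from the paper.

```python
# Illustrative comparison of page-fault service time when paging to local
# disk versus to idle remote memory (network RAM).
# The numbers below are assumptions for illustration only.

PAGE_SIZE = 4 * 1024            # bytes per page

def disk_fault_time(seek_ms=8.0, bandwidth_mb_s=50.0):
    """Approximate time to service one page fault from local disk."""
    transfer_ms = PAGE_SIZE / (bandwidth_mb_s * 1e6) * 1e3
    return seek_ms + transfer_ms

def network_ram_fault_time(rtt_ms=0.1, bandwidth_mb_s=100.0):
    """Approximate time to fetch one page from a remote node's memory."""
    transfer_ms = PAGE_SIZE / (bandwidth_mb_s * 1e6) * 1e3
    return rtt_ms + transfer_ms

if __name__ == "__main__":
    print(f"disk fault:        {disk_fault_time():.3f} ms")
    print(f"network RAM fault: {network_ram_fault_time():.3f} ms")
```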
Introduction (cont.) Existing network RAM techniques should not be directly applied to parallel jobs. –Processes from the same parallel job synchronize regularly, so paging delays on one node hold up the entire job. –Network RAM traffic can add to network congestion.
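A minimal sketch of the synchronization effect (an assumed model, not the paper's): in a barrier-synchronized job, each iteration lasts as long as the slowest process, so heavy paging on a single node slows the whole job.

```python
# Assumed model: a barrier-synchronized parallel job with 4 processes.
# Each iteration ends only when the slowest process arrives at the barrier,
# so paging delay on one overloaded node dominates the iteration time.
compute_ms = [10, 10, 10, 10]   # per-process compute time per iteration
paging_ms  = [0, 0, 0, 40]      # extra paging delay on one overloaded node

iteration_ms = max(c + p for c, p in zip(compute_ms, paging_ms))
print(iteration_ms)             # 50 ms: the job runs 5x slower due to one node
```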
Introduction (cont.) We propose a new peer-to-peer solution called Parallel Network RAM (PNR) that allows overloaded cluster nodes to utilize idle remote memory. Each node contacts a manager node and requests that it allocate network RAM on its behalf.
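A minimal sketch of the request flow just described, assuming a simple message-passing model; the class and field names are illustrative, not the paper's actual protocol.

```python
# Sketch of the client -> manager -> server interaction: an overloaded client
# asks a manager to allocate network RAM on its behalf; the manager finds
# idle memory on server nodes and returns the grants.

from dataclasses import dataclass

@dataclass
class Node:
    name: str
    free_mb: int

@dataclass
class Manager:
    """Acts as a proxy: finds idle remote memory on behalf of a client node."""
    servers: list

    def allocate(self, client: str, mb_needed: int):
        grants = []
        for server in self.servers:
            if mb_needed == 0:
                break
            grant = min(server.free_mb, mb_needed)
            if grant > 0:
                server.free_mb -= grant
                mb_needed -= grant
                grants.append((server.name, grant))
        return grants   # (server node, MB granted) pairs backing the client's pages

# An overloaded node P4 asks the manager for 300 MB of network RAM.
manager = Manager(servers=[Node("P2", 200), Node("P6", 150), Node("P7", 100)])
print(manager.allocate(client="P4", mb_needed=300))
# [('P2', 200), ('P6', 100)]
```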
Design Rationale Diagram of Parallel Network RAM. Application 2 is assigned to nodes P3, P4, and P5, but utilizes the available memory space on other nodes, such as P2, P6, and P7.
Design Rationale (cont.) We propose a novel and effective technique called Parallel Network RAM (PNR). PNR does not coordinate with or receive information from the assumed centralized scheduler of the system. Managers act as proxies for clients to communicate with servers.
Design Rationale (cont.) We propose four different PNR designs. –Centralized PNR design (CEN) –Client-only PNR design (CLI) –Local manager PNR design (MAN) –Backbone PNR Design (BB)
Design Rationale (cont.) Centralized PNR design (CEN): A single central manager coordinates all client requests.
Design Rationale (cont.) Client-only PNR design (CLI): Each client node manages its own network RAM requests.
Design Rationale (cont.) Local manager PNR design (MAN): Nodes volunteer to act as local managers.
Design Rationale (cont.) Backbone PNR design (BB): A dedicated set of manager nodes forms a backbone that handles client requests.
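A hedged sketch of how the four designs differ in who plays the manager role; the selection rules below are illustrative simplifications, not the paper's implementation.

```python
# Illustrative comparison of the four PNR designs: each rule picks which node
# allocates network RAM on a client's behalf (assumed logic, for intuition only).

def pick_manager(design, client, all_nodes, volunteers=None, backbone=None):
    """Return the node that handles the client's network RAM request."""
    if design == "CEN":      # one central manager for the whole cluster
        return all_nodes[0]
    if design == "CLI":      # the client negotiates for itself
        return client
    if design == "MAN":      # a node that volunteered to act as a local manager
        return volunteers[all_nodes.index(client) % len(volunteers)]
    if design == "BB":       # one node out of a backbone of managers
        return backbone[all_nodes.index(client) % len(backbone)]
    raise ValueError(f"unknown design: {design}")

nodes = [f"P{i}" for i in range(1, 9)]
for design, kwargs in [("CEN", {}), ("CLI", {}),
                       ("MAN", {"volunteers": ["P2", "P6"]}),
                       ("BB",  {"backbone": ["P1", "P5"]})]:
    print(design, pick_manager(design, "P4", nodes, **kwargs))
```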
Methodology We use a large trace collected from the CM-5 parallel platform at the Los Alamos National Laboratory. To directly compare DP to the various PNR designs, we create another metric based on average response time (R):
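The slide omits the formula itself; one plausible form, assuming the metric is the relative response-time improvement of a PNR design over the DP baseline (an assumption, not reproduced from the paper):

```latex
% Assumed form of the comparison metric (illustrative, not from the paper):
% relative improvement in average response time R of a PNR design over DP.
\[
  \mathrm{Improvement} \;=\;
  \frac{\bar{R}_{\mathrm{DP}} - \bar{R}_{\mathrm{PNR}}}{\bar{R}_{\mathrm{DP}}}
  \times 100\%
\]
```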
Methodology (cont.) Experimental setup:
Results Base experiment: 64 nodes and 4000 jobs.
Results (cont.) Base experiment: 128 nodes and 4000 jobs.
Results (cont.) Base experiment: 64 nodes and 5000 jobs.
Results (cont.) Base experiment: 128 nodes and 5000 jobs.
Results (cont.) RAM experiments: 64 nodes and 4000 jobs.
Results (cont.) RAM experiments: 128 nodes and 4000 jobs.
Results (cont.) RAM experiments: 64 nodes and 5000 jobs.
Results (cont.) RAM experiments: 128 nodes and 5000 jobs.
Results (cont.) Space-sharing experiments: 64 nodes and 4000 jobs.
Results (cont.) Space-sharing experiments: 128 nodes and 4000 jobs.
Analysis PNR is very sensitive to network performance. The main limiting factor on the space-sharing system is coordination of network RAM allocation. Under light load, CLI is the best choice for a space-sharing system. CLI also does surprisingly well in certain situations when RAM is plentiful.
Analysis (cont.) When a high-performance network is available, PNR can produce pronounced performance gains. For heavily loaded systems, PNR can significantly reduce the response time of jobs as compared to DP.
Conclusions In this paper, we identified a novel way of reducing page fault service time and average response time in a cluster system running parallel processes. We proposed several different PNR designs and evaluated the performance of each under different conditions.