Data Replication and Power Consumption in Data Grids. Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd, Department of Computer Science, The University of Alabama.

Presentation transcript:

Data Replication and Power Consumption in Data Grids
Susan V. Vrbsky, Ming Lei, Karl Smith and Jeff Byrd
Department of Computer Science, The University of Alabama
IEEE 2010 Cloud Computing Technology and Science
March 16, 2011
Taikyoung Kim, SNU IDB Lab

Outline
• Introduction
• Data Replication
• Performance Results
• Conclusion and Future Work

Introduction
• Data grid features
  – Millions of files are generated and thousands of clients access the files
  – Need to manage an extremely large number of data sets
• Present systems support scalability, but are extremely energy inefficient
  – Power and cooling of the data center are inefficient
  – The power demanded by data centers is predicted to double from 2006 to 2011
• Storing, managing and moving massive amounts of data is also a significant bottleneck

Introduction
• Our approach
  – Save energy through efficient CPU usage
  – Consider strategies to minimize disk storage and data transmission
• We propose to minimize the amount of data stored by utilizing smart replication strategies
  – Consider replicating the data only when necessary
• Goal
  – Design data-aware strategies for data-intensive computing
    • Shorter running times
    • Decreased amount of data transmitted
    • Smaller storage space
  – Reduce the power needed

Outline
• Introduction
• Data Replication
  – Data Grid Architecture
  – Sliding Window Strategy
• Performance Results
• Conclusion and Future Work

Data Replication
• Utilize data replication
  – There is a high probability of accessing data that is not at the local site
  – Remote data file access can be a very expensive operation
    • Network bandwidth, network congestion
  – Replication reduces the access time and avoids remote file access
• Limit the size of the storage
  – To decrease the amount of energy needed to store the data
• Use smart data replication to reduce the cost of accessing and storing data

Data Replication: Data Grid Architecture
• We consider only single-tier grids
  – We expect that the strategies developed for single-tier grids can be used within the multi-tier structure
• It is common for a job in a data grid to list all the files needed to complete its task
  – We utilize this aspect in designing a data replication scheme

Data Replication: Sliding Window Strategy
• SWIN [Sliding Window replica scheme]
  – Considers the future file access times and the size of the local site's Storage Element
  – Builds a "sliding window": a set of distinct files that will be used in the immediate future
    • Includes all the files the current job will access and the distinct files from the next arriving jobs
    • The sum of the sizes of the files in the sliding window will be at most the size of the local Storage Element
  – Slides forward one more file each time the system finishes processing one file
    • Keeps changing in this way

Data Replication: Sliding Window Strategy
• Q = <J_1, J_2, ...>: a set of jobs
• FAS(J_i) = <f_i1, f_i2, ...>: file accessing sequence (f_in ≠ f_im)
• G_FAS = <FAS(J_1), FAS(J_2), ...>: global file accessing sequence
• POS(f_x, G_FAS): returns the first position of f_x in G_FAS
• Sliding Window rules
  1. The sum of the sizes of all the files in the sliding window ≤ Size(SE)
  2. No duplicated files exist in the sliding window
  3. Any file in the sliding window will not be in a position before POS(f_K, G_FAS)
  4. Any file not in the sliding window will be in a position after POS(f_m, G_FAS)
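The rules above can be sketched in a few lines of Python. This is only an illustrative reading of the slide, not the authors' implementation: the function names `global_fas` and `build_window`, the `start` index (standing in for the position of the file currently being processed) and the example file names and sizes are all assumptions.

```python
from typing import Dict, List


def global_fas(jobs: List[List[str]]) -> List[str]:
    """Concatenate each job's file accessing sequence into one global sequence."""
    return [f for job in jobs for f in job]


def build_window(g_fas: List[str], start: int,
                 sizes: Dict[str, int], se_size: int) -> List[str]:
    """Collect distinct upcoming files, in access order, until the next
    new file would exceed the Storage Element size."""
    window: List[str] = []
    used = 0
    for f in g_fas[start:]:          # rule 3: nothing before the current position
        if f in window:
            continue                 # rule 2: no duplicated files
        if used + sizes[f] > se_size:
            break                    # rule 1: total size stays within Size(SE)
        window.append(f)
        used += sizes[f]
    return window                    # rule 4: excluded files first occur later


# Hypothetical workload: three jobs, every file has size 1, SE holds 3 files.
jobs = [["a", "b"], ["a", "c", "d"], ["b", "e"]]
g_fas = global_fas(jobs)
sizes = {f: 1 for f in g_fas}
print(build_window(g_fas, 0, sizes, 3))
print(build_window(g_fas, 4, sizes, 3))
```

Calling `build_window` again with a larger `start` after each processed file mimics the window sliding forward; a real replica manager would update the window incrementally rather than rebuild it.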

Outline
• Introduction
• Data Replication
• Performance Results
  – Performance Environment
  – Number of Nodes Powered On
  – File Availability
• Conclusion and Future Work

Performance Results
• Evaluate the performance of the SWIN replica strategy using Sage, built at the University of Alabama
• Sage nodes
  – Intel D201GLY2 mainboard with 1.2 GHz Celeron CPU
    • On-board 10/100 Megabit LAN
  – 1 GB 533 MHz RAM
  – 80 GB SATA 3 hard drive
• Energy usage rates
  – Booting and peak: 430 Watts
  – Idle: 335 Watts (cooling fans turned on), 315 Watts (cooling fans turned off)

Performance Results: Performance Environment
• The client nodes are responsible for
  – Processing the request
  – Maintaining replica copies
  – Notifying the server when a job is completed
• Default experiment parameters (table not preserved in the transcript; the only surviving value is 400MB)
• Metric
  – Total running time
  – Average number of watts required to process a job
    • Sampled every 1 minute
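As a rough illustration of how the second metric could be derived from one-minute samples, the sketch below averages the readings and also converts them to watt-hours. The helper names and the sample values are hypothetical (the readings simply reuse the peak and idle rates quoted earlier); the transcript does not describe the authors' measurement tooling.

```python
from typing import List


def average_watts(samples: List[float]) -> float:
    """Mean of the power readings taken during a run."""
    return sum(samples) / len(samples)


def energy_wh(samples: List[float], interval_minutes: float = 1.0) -> float:
    """Approximate energy in watt-hours, treating each reading as
    constant over its sampling interval."""
    return sum(samples) * interval_minutes / 60.0


# Hypothetical 4-minute run: two minutes at peak, two minutes idle.
samples = [430.0, 430.0, 335.0, 335.0]
print(average_watts(samples))   # 382.5
print(energy_wh(samples))       # 25.5
```

Reporting both numbers is useful because a run with a higher average draw can still use less energy overall if it finishes sooner.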

Performance Results: Number of Nodes Powered On
• The power consumed is affected by whether or not all of the nodes are powered on
  – Regardless of whether they are being used in the computation of the jobs
• Legend
  – LFU: Least Frequently Used
  – LRU: Least Recently Used
  – MRU: Most Recently Used
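For readers unfamiliar with the three baseline policies in the legend, here is a generic sketch of how each one picks a file to evict when local storage is full. It is a textbook-style simulation over an assumed access trace, with capacity counted in files; the function `count_misses` and the trace are illustrative and do not reproduce the experiment's replica manager.

```python
from collections import Counter
from typing import Dict, List


def count_misses(accesses: List[str], capacity: int, policy: str) -> int:
    """Replay an access trace and count how many accesses miss local storage."""
    cache: List[str] = []
    last_used: Dict[str, int] = {}
    freq: Counter = Counter()
    misses = 0
    for t, f in enumerate(accesses):
        freq[f] += 1
        if f not in cache:
            misses += 1
            if len(cache) >= capacity:
                if policy == "LRU":      # evict the least recently used file
                    victim = min(cache, key=lambda x: last_used[x])
                elif policy == "MRU":    # evict the most recently used file
                    victim = max(cache, key=lambda x: last_used[x])
                elif policy == "LFU":    # evict the least frequently used file,
                    # breaking ties by least recent use
                    victim = min(cache, key=lambda x: (freq[x], last_used[x]))
                else:
                    raise ValueError(f"unknown policy: {policy}")
                cache.remove(victim)
            cache.append(f)
        last_used[f] = t
    return misses


trace = ["a", "b", "c", "a", "b", "c"]   # hypothetical cyclic access pattern
for policy in ("LRU", "MRU", "LFU"):
    print(policy, count_misses(trace, 2, policy))
```

On this cyclic trace MRU misses least, which is a property of the trace rather than a general ranking; unlike SWIN, none of these policies looks at the files that upcoming jobs have declared.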

Performance Results: Number of Client Nodes
• Measured the total running time for 100 jobs with all nodes powered on

Performance Results: Number of Client Nodes
• While LRU requires the most watts, it has a shorter running time overall than LFU and MRU
  – Does not require the highest number of watts
• The jobs with only 1 or 2 client nodes take longer to run than those utilizing 8 client nodes
• The watts required for computation are a smaller percentage of the total watts

Performance Results: File Availability
• The files are only available at the server
  – (a) The jobs are able to run in a shorter amount of time as the number of clients increases
  – (b) The bottleneck increases as the number of client nodes increases
    • Assume all file requests must go through the resource broker at the server
• The amount of power consumed is not always strictly related to the running time of the jobs
• Lastly, we have shown that the window size can be decreased without increasing the running time or power consumed

Outline
• Introduction
• Data Replication
• Performance Results
• Conclusion and Future Work

Conclusion and Future Work
• Propose smart strategies for replicating files
  – One way to minimize the energy consumed in a data grid
• SWIN strategy
  – Minimizes the amount of data transmitted and the storage needed
  – Performs better than existing strategies, such as LRU, MRU and LFU
  – Particularly beneficial in power saving when resource contention is high
  – Decreases the running time and watts required
    • Smaller storage can be used to lower the amount of power
• Future work
  – Study the performance of SWIN when the files are of different sizes
  – Explore more efficient implementations for transferring files
  – Design and test additional replica schemes by utilizing the CPU
  – Consider ways to schedule the jobs

Thank you. Questions?