Juan Rubio, Lizy K. John, Charles Lefurgy

Presentation transcript:

Improving Server Performance on Transaction Processing Workloads by Enhanced Data Placement
Juan Rubio, Lizy K. John (Laboratory for Computer Architecture, The University of Texas at Austin, USA)
Charles Lefurgy (IBM Austin Research Lab)
Presented by Sean Leather, Laboratory for Computer Architecture
Good morning. Thank you for coming to this presentation. My name is _____ and I am a graduate student at the University of Texas at Austin. Today I’ll be talking about …

Commercial Systems
Computer systems running commercial workloads operate on large amounts of data.
Researchers have noticed that performance is hindered by data accesses.
System architecture trends point to a distributed storage model with non-uniform access latency for disk and memory.
Latency to access remote data can be an order of magnitude higher than for local data.
[go through this slide fast!]
10/28/2004 Improving Server Performance on Transaction Processing Workloads by Enhanced Data Placement

Data Placement
Goal: Place data to reduce access penalties.
Challenges: Placement is difficult when dealing with large amounts of data, a handful of processors and multiple operations. Uses of the data change with time.
Key points: Looking at ALL the possible combinations is hard. Even IF we figure out the right placement, the usage patterns change.

Data Placement: Example
[Figure: two alternative layouts of processors and data blocks across nodes; line labels in the original: 5 4 6 3 and 5 3 6 4]
[present example] In each figure, a dotted line represents a node. Each node contains processors, represented by the squares on the left, and blocks of data, represented by the colored blocks on the right. The lines show which blocks of data each processor uses. The number on top of each line is the amount of data that needs to be transferred. Of these transfers, those that cross the boundary of a node have the longest latencies and therefore affect performance the most. We can see that by changing the location of some blocks (purple and green), the cost of those inter-node transfers can be reduced from 16 to 13.
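The cost compared in this example can be sketched as a sum over the transfers that cross a node boundary. The processor names, block names, node names and transfer amounts below are hypothetical, since the figure's actual assignment is not recoverable from the transcript. Only the idea of counting inter-node transfers comes from the slide.

```python
# Hypothetical sketch: total data transferred across node boundaries.
# Every name and number here is illustrative, not taken from the figure.

def inter_node_cost(cpu_node, block_node, transfers):
    """Return the amount of data moved between different nodes.

    cpu_node:   maps each processor to the node that hosts it
    block_node: maps each data block to the node that stores it
    transfers:  list of (processor, block, amount) usage records
    """
    return sum(
        amount
        for cpu, block, amount in transfers
        if cpu_node[cpu] != block_node[block]
    )

cpu_node = {"p0": "A", "p1": "B"}
transfers = [
    ("p0", "red", 2),
    ("p0", "green", 7),
    ("p1", "purple", 1),
    ("p1", "blue", 8),
    ("p1", "red", 3),
]

# Two candidate layouts: the second swaps the green and purple blocks.
layout_before = {"red": "A", "green": "B", "purple": "A", "blue": "B"}
layout_after = {"red": "A", "green": "A", "purple": "B", "blue": "B"}

cost_before = inter_node_cost(cpu_node, layout_before, transfers)
cost_after = inter_node_cost(cpu_node, layout_after, transfers)
print(cost_before, cost_after)
```

In this made-up instance the block that both processors share ("red") stays remote for one of them in either layout, which is why the cost does not drop to zero.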

Approach
Static data placement: applied before the workload runs. Organizes blocks of data across the disks of the system so as to yield a low number of remote accesses.
Run-time data reorganization: applied while the system runs the workload. Used to adapt the layout of blocks of data to the characteristics of the workload.

Outline
Problem
Static data placement
Run-time data reorganization
Simulated Annealing (SA)
Evaluation
Summary
This is an outline of the rest of the talk. I have just presented an introduction and the motivation behind our study. Now, I’ll explain the simulated annealing process and the reason why we are using it in this work. In the following sections I will explain how the simulated annealing technique can be used to guide the placement of data in a new system, and how to reorganize the data at run-time. Then I will present an evaluation of these ideas, and will conclude the talk with a summary of our observations.

Static Data Placement
Arranges blocks of data across the disks of the system.
Approach:
Use probabilistic knowledge about the workload to formulate a cost function.
Obtain the layout that minimizes the cost function.
Update the disks to reflect the resulting layout.

Run-time Data Reorganization
Approach:
Periodically run a reorganization routine for the whole system.
Compute a cost function based on the description of some completed and pending operations.
Determine changes to the layout that would lower the cost function.
Pre-fetch the data from the remote disks.
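The last two steps of this routine (determine the layout changes, then pre-fetch) can be pictured as a diff between the current layout and a lower-cost one. This is only a sketch: the function name and the layout representation (a dict from block to node) are assumptions, since the slides do not describe the data structures used.

```python
# Hypothetical sketch: derive the block moves needed to reach a better layout.

def plan_prefetches(current_layout, improved_layout):
    """List (block, source_node, target_node) for every block that must move.

    Both layouts are assumed to map the same set of blocks to nodes.
    """
    return [
        (block, current_layout[block], target)
        for block, target in sorted(improved_layout.items())
        if current_layout[block] != target
    ]

current = {"b0": "A", "b1": "B", "b2": "A"}
improved = {"b0": "A", "b1": "A", "b2": "B"}
moves = plan_prefetches(current, improved)
print(moves)
```

Each tuple names a block to fetch from the node that currently holds it. Blocks that are already in place generate no work.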

Simulated Annealing [1/2]
Simulated annealing is a heuristic commonly used to minimize the value of a cost function while avoiding getting trapped in a local minimum.
It is an iterative algorithm that uses a probabilistic approach to quickly reach an adequate solution.
Choose a number of “steps” (iterations).
For each step: randomly introduce a perturbation (a small change to the current combination); always accept the new alternative if it reduces the cost; randomly accept some alternatives that increase the cost (uphill changes).
Slowly decrease the uphill acceptance probability, so that later steps are less likely to accept bad perturbations.
This heuristic is used to minimize or reduce the value of the cost function, as required by the previous two ideas. To run the algorithm, we must first select the number of steps to perform. We have to keep in mind that, since this is an iterative process, the number of steps has a strong impact on the quality of the solution. A perturbation is a small change to the current system, and is therefore specific to the application in question. For example, in the examples I presented earlier, a perturbation might consist of moving a block of data from one node to another. Each step has an associated “temperature”, an abstract variable that affects the uphill acceptance probability for a particular combination.
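The loop described on this slide can be sketched as follows. This is a minimal illustration, not the authors' implementation. The geometric cooling schedule, the start and end temperatures, the fixed seed and the toy placement problem are all assumptions, because the slides do not specify them.

```python
import math
import random

def simulated_annealing(initial, cost, perturb, steps,
                        t_start=1.0, t_end=0.01, seed=0):
    """Generic simulated annealing loop (cooling schedule is an assumption)."""
    rng = random.Random(seed)
    current, current_cost = initial, cost(initial)
    best, best_cost = current, current_cost
    for step in range(steps):
        # Geometric cooling from t_start down to t_end (assumed schedule).
        frac = step / max(steps - 1, 1)
        temperature = t_start * (t_end / t_start) ** frac
        candidate = perturb(current, rng)
        candidate_cost = cost(candidate)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept uphill moves with a
        # probability that shrinks as the temperature drops.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
    return best, best_cost

# Toy placement problem; all names and amounts are hypothetical.
NODES = ["A", "B"]
CPU_NODE = {"p0": "A", "p1": "B"}
TRANSFERS = [("p0", "b0", 4), ("p0", "b1", 2), ("p1", "b2", 5), ("p1", "b3", 1)]

def remote_cost(layout):
    """Amount of data accessed from a node other than the processor's own."""
    return sum(a for cpu, blk, a in TRANSFERS if CPU_NODE[cpu] != layout[blk])

def move_one_block(layout, rng):
    """Perturbation: move one randomly chosen block to a different node."""
    new_layout = dict(layout)
    blk = rng.choice(sorted(new_layout))
    new_layout[blk] = rng.choice([n for n in NODES if n != new_layout[blk]])
    return new_layout

start = {"b0": "B", "b1": "B", "b2": "A", "b3": "A"}
best_layout, best_cost = simulated_annealing(
    start, remote_cost, move_one_block, steps=200)
print(best_layout, best_cost)
```

The loop tracks the best layout seen so far, so the returned cost can never exceed the cost of the starting layout, even if the final accepted move was an uphill one.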

Simulated Annealing [2/2]
Being a randomized algorithm allows SA to quickly explore a vast design space.
Accepting uphill changes allows SA to escape local minima.
Uphill changes can also be bad, so we reduce the probability of accepting them as the exploration progresses.
This uphill acceptance probability was modeled on the physical annealing process, and it is related to the temperature by this exponential expression. Since the temperature is decreased between the steps of the algorithm, the uphill acceptance probability also decreases. As a result, the algorithm becomes more conservative about uphill changes. We can see that in this plot, which shows the progress made by the simulated annealing algorithm in finding the minimum cost for a sample function. The blue line represents the cost obtained by SA after each step, and the red curve represents the cost obtained by the Iterative Improvement method, which only accepts a perturbation when it reduces the cost of the objective function. The advantage of SA comes from the fact that it accepts some combinations that appear to hurt the solution but turn out to be beneficial in the long term.
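The exponential expression the speaker points to is not preserved in the transcript. The standard Metropolis acceptance rule used in simulated annealing, assumed here, is shown below. The symbols are introduced for illustration: $C$ is the cost function, $\Delta C$ is the cost change caused by a perturbation, and $T$ is the current temperature.

```latex
P(\text{accept}) =
\begin{cases}
1 & \text{if } \Delta C \le 0,\\[4pt]
\exp\!\left(-\dfrac{\Delta C}{T}\right) & \text{if } \Delta C > 0,
\end{cases}
\qquad
\Delta C = C_{\text{new}} - C_{\text{current}}
```

For a fixed uphill change $\Delta C > 0$, lowering $T$ makes the exponent more negative, so the acceptance probability falls toward zero as the exploration progresses.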

System
Full-system simulation (SimOS-PPC)
4 x 4 cc-NUMA
1 GHz CPUs, 512 MB per node, 7 disk units per node, 128-bit 100 MHz bus
128-bit 200 MHz inter-node bus
Directory-based cc-NUMA
System runs AIX 4.3.1
Directory: 16 K entries per node.
Disk: 7 (4 for data, 2 for database logs, 1 for OS)
Using simulation in this work allowed me to measure characteristics of the system that could not be measured on a live system. This proved especially helpful during the testing stages. It also allowed me to better control the layout of the system in the static case, as well as control the repeatability of the …

Benchmark
DSS queries are based on TPC-H.
The database was populated based on a TPC-H database with a scale factor of 1: around 2.5 GB between tables and indices.
The data set of the queries ranged from 585 MB to 2.8 GB.
Web interactions are based on TPC-W.
DB2 was optimized for the simulated hardware running each type of workload.

Performance for DSS workload
Static (global): Generates a single cost function for a group of queries. Obtains the layout most suitable for all the queries.
Static (local): Uses a single query to produce a layout. Produces a very optimistic layout.
Dynamic: Starts with an optimized layout. Adapts the layout as queries run on the system.

Cost functions
Inter-node data transferred: sum of all data blocks that are accessed from a remote disk.
Time estimate: time to access local/remote data; time to operate on the data.
Time: Estimates the time to access the disk, transfer the data to memory and perform the operation, then uses the maximum time over all the nodes.
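The two cost functions can be sketched as below. The per-unit latencies, the access records and the way the time components are combined (access time plus operation time, scaled by the amount of data) are assumptions made for illustration. The slide only states what each estimate accounts for and that the time cost takes the maximum over the nodes.

```python
# Hypothetical sketch of the two cost functions; parameters are illustrative.

def transfer_cost(accesses, block_node):
    """Cost 1: amount of data accessed from a remote disk."""
    return sum(size for node, block, size in accesses
               if block_node[block] != node)

def time_cost(accesses, block_node, local_time, remote_time, op_time):
    """Cost 2: estimated time of the slowest node.

    Each access costs size * (access latency + operation time), where the
    access latency depends on whether the block is local or remote.
    """
    per_node = {}
    for node, block, size in accesses:
        latency = local_time if block_node[block] == node else remote_time
        per_node[node] = per_node.get(node, 0.0) + size * (latency + op_time)
    return max(per_node.values(), default=0.0)

# (requesting node, block, amount of data); made-up values.
accesses = [("A", "b0", 4), ("A", "b1", 2), ("B", "b1", 3)]
block_node = {"b0": "A", "b1": "B"}

data_moved = transfer_cost(accesses, block_node)
slowest = time_cost(accesses, block_node,
                    local_time=1.0, remote_time=10.0, op_time=0.5)
print(data_moved, slowest)
```

Taking the maximum rather than the sum reflects that the nodes work in parallel, so the slowest node bounds the completion time.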

Quality of the solution: steps
If we reduce the temperature slowly, we can achieve a better schedule.
This comes at the cost of extra time: around 0.87 seconds of think-time for 50 steps.

Summary
We phrase the data placement problem as a combinatorial optimization problem.
We propose a technique that uses simulated annealing to generate an initial data layout based on the expected usage of the data.
We extend the simulated annealing technique to reorganize the data at run-time.
We take advantage of the locality of data references to improve the effectiveness of the reorganization.

Thank you
For additional information: http://www.ece.utexas.edu/projects/ece/lca/