Phase Analysis on Real Systems
Canturk Isci, Margaret Martonosi
Parapet Research Group, Princeton University EE
Vice-Versa Talk #2, Apr 29, 2005

Phase Analysis – Challenges on Real-Systems Canturk Isci - Margaret Martonosi

Slide 2: Previously…
- Runtime processor power monitoring and estimation
- Power phase behavior of programs (Power Vectors)
[Figure: power client / power server setup; measured vs. modeled power for gcc, gzip, vpr, vortex, gap, crafty]

Slide 4: Today!
- Phase detection on real systems: variability effects and potential for repeatability
- Virtual memory behavior: tuning, initial results
- What's going on? BBVs – PMCs – PVs… and power
- Simple metric prediction studies: short term vs. long term
(Slide labels: MAJOR, MINOR, MAYBE)

Slide 5: Phase Detection with Power Vectors
- The initial idea was to look at the phase distributions of applications and use signature analysis to detect and predict phases.
- However, multiple runs inevitably exhibit different real-system behavior:
  - the quantities and durations vary;
  - the phase distributions vary.
[Figure panels: metric variation, time variation]

Slide 6: Variability Effects in Real System Behavior
- A direct apples-to-apples comparison of phase signatures is not very relevant in the real world!

Slide 7: Ammp and Apples
- Although the recurrence is obvious to the eye, comparing phase sequences directly does not reveal it clearly!

Slide 8: How Do Phase Distributions Compare? Example: two runs of gcc

Slide 9: How Do Phase Distributions Compare? Example: two runs of gcc (continued)

Slide 10: We Got Ourselves a Problem
- How do we extract this recurrent-behavior information?
- Speech/humming recognition:
  - stored libraries, signal statistics
  - pitch tracking
- Image/biomedical processing:
  - image warping
  - registration / mutual information
- Architects need something that is:
  - simple to apply online
  - implementable without massive state and combinational logic

Slide 11: Interesting Observation with Transitions
- Goal: detect the application from its behavior.
  - Upper case: hit!
  - Lower case: false alarm?
- Tracking phase transitions rather than phase sequences proves more useful for detecting recurrent behavior.*
[Figure panels: gcc1 vs. gcc2; gcc vs. equake]

Slide 12: Our Transition-Guided Detection Framework
1. Run the benchmark twice (run #1, run #2) and sample PMCs to form 12-dimensional vectors, giving vector streams #1 and #2.
2. Identify transitions in each stream: T_init #1, T_init #2.
3. Apply glitch/gradient filtering: T_gg #1, T_gg #2.
4. Apply near-neighbor blurring: T_ggN #1.
5. Apply cross-correlation:
   - match ⇒ peak at the best alignment;
   - mismatch ⇒ no observable peak.
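Step 2 of the framework (identify transitions) can be sketched as below. This is a minimal sketch, assuming a sample is flagged as a transition when the Manhattan distance between it and the previous PMC vector exceeds a threshold; the distance measure and the names `manhattan` and `identify_transitions` are hypothetical, not taken from the talk.

```python
def manhattan(a, b):
    """L1 distance between two equal-length vectors."""
    return sum(abs(x - y) for x, y in zip(a, b))


def identify_transitions(vectors, threshold):
    """Return a 0/1 list: 1 at sample t if it differs from sample t-1.

    vectors   -- list of per-sample PMC vectors (12 counters on the slides)
    threshold -- hypothetical cutoff on the distance between consecutive samples
    Sample 0 has no predecessor, so it is never marked.
    """
    trans = [0] * len(vectors)
    for t in range(1, len(vectors)):
        if manhattan(vectors[t], vectors[t - 1]) > threshold:
            trans[t] = 1
    return trans


# Toy 2-D stream (not real PMC data): two level changes.
stream = [[1, 0], [1, 0], [5, 5], [5, 5], [1, 0]]
print(identify_transitions(stream, threshold=1))  # [0, 0, 1, 0, 1]
```

The output is a binary sequence, which is what makes the later filtering, blurring and cross-correlation steps cheap.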

Slide 13: Sampling Effects: Glitches & Gradients
- Nothing happens without disturbances ⇒ glitches.
  - Glitch: an instability where the behavior before and after is the same ⇒ spurious transitions.
- Nothing happens instantaneously ⇒ gradients.
  - Gradient: an instability where the behavior before and after is different ⇒ a single true transition.

Slide 14: Glitch/Gradient Filtering
- Very simple: no consecutive transitions.
- Leads to large reductions in transition count.
- We call these "Refined Transitions" (T_gg).
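One plausible reading of "no consecutive transitions", combined with the glitch and gradient definitions on slide 13, is sketched below. It is an assumption, not the authors' code: each run of back-to-back transition flags is judged by comparing the sample before the run with the last sample of the run. If they match, the run is treated as a glitch and dropped; otherwise it is a gradient and collapsed to a single transition. The name `refine_transitions` and the reuse of one distance threshold are hypothetical.

```python
def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def refine_transitions(trans, vectors, threshold):
    """Glitch/gradient filtering: leave no two consecutive transitions.

    A run of back-to-back transition flags is judged by comparing the sample
    just before the run with the last sample of the run:
      * about the same -> glitch (spurious): drop the whole run
      * different      -> gradient: keep one transition at the run start
    """
    refined = [0] * len(trans)
    t = 0
    while t < len(trans):
        if not trans[t]:
            t += 1
            continue
        start = t
        while t + 1 < len(trans) and trans[t + 1]:
            t += 1                      # extend to the end of the run
        before = vectors[max(start - 1, 0)]
        if manhattan(before, vectors[t]) > threshold:
            refined[start] = 1          # gradient or clean step: one true transition
        t += 1                          # continue after the run
    return refined


# Toy stream: a glitch (samples 1-2) followed by a gradient (samples 4-5).
vectors = [[1, 0], [9, 9], [1, 0], [1, 0], [3, 3], [6, 6], [6, 6]]
trans = [0, 1, 1, 0, 1, 1, 0]           # flags for threshold = 1
print(refine_transitions(trans, vectors, threshold=1))  # [0, 0, 0, 0, 1, 0, 0]
```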

Slide 15: (Also Helps with Threshold Choices)

Slide 16: Time Shifts
- We have binary information, so we can do something cheaper than shifted correlation coefficients.
- Using cross-correlations gives equally useful results and is easily implementable.
- Example: matching and mismatch cases, and "The Peak".
[Figure panels: gcc1 vs. gcc2; gcc vs. equake]
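Because both transition sequences are 0/1, cross-correlation at a given lag reduces to counting coincident transitions, with no multiplications or normalization. The sketch below shows that idea; `cross_correlate`, `best_alignment` and the `max_lag` search window are hypothetical names and parameters.

```python
def cross_correlate(a, b, max_lag):
    """Return {lag: number of t where a[t] == 1 and b[t + lag] == 1}."""
    scores = {}
    for lag in range(-max_lag, max_lag + 1):
        total = 0
        for t, flag in enumerate(a):
            u = t + lag
            if flag and 0 <= u < len(b) and b[u]:
                total += 1
        scores[lag] = total
    return scores


def best_alignment(a, b, max_lag):
    """Lag with the most coincident transitions ("the peak") and its height."""
    scores = cross_correlate(a, b, max_lag)
    lag = max(scores, key=scores.get)
    return lag, scores[lag]


run1 = [0, 1, 0, 0, 1, 0, 0, 0, 1, 0]
run2 = [0, 0] + run1[:-2]               # the same run, delayed by 2 samples
print(best_alignment(run1, run2, max_lag=5))  # (2, 2)
```

A matching pair shows one lag that clearly dominates; for a mismatching pair the counts stay flat across lags.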

Phase Analysis – Challenges on Real-Systems Canturk Isci - Margaret Martonosi 17  Observation: Dilations exist as small jitters (few samples)  Proposed Solution: “Near-Neighbor Blurring”  Blur edges slightly  Consider transitions as distributions around their actual locations  Tolerance: Spread of this distribution, [t-x, t+x] samples  Ex: Matching improvement with tolerance=4: Time Dilations run1 run2 run1 run2 Mismatch ! Match!

Slide 18: Our Transition-Guided Detection Framework (same diagram as slide 12)

Slide 19: Results
- How do we quantify the strength of the peak?
- Matching score:
- Detection results (green: highest match; red: highest mismatch)

Slide 20: Receiver Operating Characteristics
- Our best detection scheme (tolerance = 1) achieves 100% hit detection with <5% false alarms (for a uniform threshold!).
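For readers less familiar with ROC curves, each point pairs a hit rate with a false-alarm rate for one threshold. Here a hit is a same-benchmark pair scored at or above the threshold, and a false alarm is a different-benchmark pair scored at or above it. The sketch below computes those rates from matching scores. The scores are made-up toy numbers, not results from the talk, and the function names are hypothetical.

```python
def rates(scores, same_app, threshold):
    """(hit rate, false-alarm rate) for one uniform threshold."""
    hits = sum(1 for s, m in zip(scores, same_app) if m and s >= threshold)
    alarms = sum(1 for s, m in zip(scores, same_app) if not m and s >= threshold)
    n_match = sum(1 for m in same_app if m)
    n_mismatch = len(same_app) - n_match
    return hits / n_match, alarms / n_mismatch


def roc_points(scores, same_app):
    """Sweep the threshold over every observed score, strictest first."""
    return [rates(scores, same_app, th) for th in sorted(set(scores), reverse=True)]


scores = [0.9, 0.8, 0.3, 0.7, 0.2, 0.1]          # toy numbers, not from the talk
same_app = [True, True, False, True, False, False]
print(rates(scores, same_app, 0.5))               # (1.0, 0.0)
print(roc_points(scores, same_app))
```

"100% hits with <5% false alarms" corresponds to a point at hit rate 1.0 and false-alarm rate below 0.05 on such a sweep.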

Slide 21: Comparison of Methods
- Comparing 3 cases: original (value-based) phases vs. refined transitions vs. near-neighbor-blurred transitions.
- In all cases, transitions perform better.
- In almost all cases, near-neighbor blurring improves detection.

Slide 22: Conclusions
- Detecting recurrent phase behavior on real systems raises interesting problems caused by system-induced variability.
- Looking at phase-transition information partly improves detection capabilities.
- Supporting methods such as glitch/gradient filtering and near-neighbor blurring improve the detectability of transition signatures.

Slide 23: Today! (agenda repeated from slide 4; next: virtual memory behavior, tuning, initial results)

Slide 24: Workload Phases ⇒ Memory Behavior?
- A few of the inspirations:
  - Red Hat Magazine, Issue #1 [Dec 2004]
  - Dynamically Tracking Page Miss Ratio Curve [ASPLOS 2005]
  - Gokul Kandiraju [PhD thesis, 2004]
- Can we track phase behavior from PMCs and VM-related statistics to dynamically manage memory behavior?
  - Less page locality ⇒ fetch fewer contiguous pages at once
  - Recurring references with high reuse distance ⇒ launder less aggressively
- Targets ⇒ execution time & energy
[Diagram: Indicator ⇒ Action ⇒ Effect] (James Donald)

Slide 25: Platform
- P4, no SMT, 256 MB memory, Linux
- SPEC2K is designed to fit in 256 MB, so:
  - choose high-memory benchmarks + multiprogramming;
  - multiprogrammed combinations of these lead to lots of thrashing.

Slide 26: Effect of Thrashing with Multiprogramming
- In most cases, thrashing leads to a 5-10% power/performance penalty.
- applu+apsi: 6X time, 2.5X energy!
- Non-thrashing combinations achieve a 5-10% improvement. (James Donald)
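Because average power is energy divided by time, the applu+apsi numbers also imply a drop in average power relative to the slide's (unstated) baseline:

$$\frac{\bar{P}_{\text{thrash}}}{\bar{P}_{\text{base}}} = \frac{E_{\text{thrash}}/E_{\text{base}}}{T_{\text{thrash}}/T_{\text{base}}} = \frac{2.5}{6} \approx 0.42$$

A plausible reading, not stated on the slide, is that much of the extra runtime is spent stalled on swap I/O at low processor power, so energy grows far more slowly than execution time.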

Slide 27: Action ⇒ Effect
- Non-intrusive tuning possibilities:
  - kswapd: tries_base: max # of pages the swapout daemon tries to free at once
  - kswapd: swap_cluster: # of pages the swapout daemon writes at once
  - page-cluster: log2(# of contiguous pages) the kernel reads at once on a page fault
- Intrusive tuning possibilities:
  - page scanning period (overhead if tasks fit in memory)
  - page age counters (reuse vs. pollution)
  - inactive-clean percentage (balance I/O and memory demand)
  - task memory allocation (workload-dependent memory demand)
[Diagram: Indicator ⇒ Action ⇒ Effect] (James Donald)
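Since page-cluster is a log2 value, a small change in the knob changes the read size a lot: 0 means 1 page, 3 means 8 pages. The sketch below converts the setting and reads it where Linux exposes it, /proc/sys/vm/page-cluster. The helper names are hypothetical, and the kswapd knobs from the slide are not covered.

```python
from pathlib import Path


def pages_per_swap_read(page_cluster):
    """page-cluster is log2 of the read size: 0 -> 1 page, 1 -> 2, 3 -> 8."""
    if page_cluster < 0:
        raise ValueError("page-cluster must be non-negative")
    return 1 << page_cluster


def read_page_cluster(path="/proc/sys/vm/page-cluster"):
    """Read the current setting (Linux only)."""
    return int(Path(path).read_text().strip())


if __name__ == "__main__":
    try:
        pc = read_page_cluster()
        print(f"page-cluster={pc} -> {pages_per_swap_read(pc)} pages per swap read")
    except OSError:
        print("page-cluster is not available on this system")
```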

Slide 28: Non-intrusive Results
- Workloads: gzip = gzip + gzip + gzip; gap = gap + gzip; bzip2 = bzip2 + bzip2
- tries_base and swap_cluster have no visible effect.
- page-cluster shows ~7% improvement w.r.t. the default. (James Donald)

Slide 29: Conclusions and To-dos
- Multiprogramming that involves thrashing has a lot of potential for power/performance improvement.
- The cases we experimented with don't show promising actions.
- Intrusive actions may be more useful, leading to effective actions as well as better (per-task) tracking.
- Next steps:
  - look into mm for potential dynamic tunings;
  - define indicators that track relevant behavior: page miss ratio, swap rates, bus utilization.
- Q: Is there any potential? (James Donald)

Slide 30: Tomorrow! (agenda repeated from slide 4; next: what's going on? BBVs – PMCs – PVs… and power)

Slide 31: Comparing Phase Methods for Power
- All lead to different, interesting characterizations.
- How do these compare in terms of power representation?
- Is there a dominant method, or does a (hierarchical) combination work better?
- We specifically look at BBVs & PMC power vectors.
[Figure: taxonomy of phase methods. Similarity based on: metrics (IPC, EPI, etc.), hardware performance vectors, BBVs, working sets, procedures, branches. Sampling quanta: code/time/energy intervals. Data sources: performance monitoring counters; sampled PC traces.]

Slide 32: Different Phases Ex: Dcache Microkernel
- Specify an L1 hit rate; generate approximately the desired hits via random linked-list traversal.
[Figure: labels A, C, M, P, Z; axis: cache size]
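Random linked-list traversal usually means pointer chasing over nodes linked in a random order. Each load depends on the previous one and the hardware prefetcher cannot guess the next address, so the footprint relative to cache size largely sets the hit rate. The sketch below builds such a ring. It assumes 64-byte nodes and a shuffled single cycle; both are assumptions, and all names are hypothetical. In Python, interpreter overhead swamps cache effects, so a real microkernel would be written in C with cache-line-padded nodes. This only shows the structure.

```python
import random


def build_random_ring(n_nodes, seed=0):
    """next_node[i] = successor of node i; all nodes form one random cycle."""
    rng = random.Random(seed)
    order = list(range(n_nodes))
    rng.shuffle(order)                                   # random visiting order
    next_node = [0] * n_nodes
    for k in range(n_nodes):
        next_node[order[k]] = order[(k + 1) % n_nodes]   # link into a single ring
    return next_node


def chase(next_node, steps, start=0):
    """Follow the chain; every step depends on the previous one."""
    node = start
    for _ in range(steps):
        node = next_node[node]
    return node


def nodes_for_footprint(footprint_bytes, node_bytes=64):
    """Number of nodes needed to cover a given footprint (64 B/node assumed)."""
    return max(1, footprint_bytes // node_bytes)


ring = build_random_ring(nodes_for_footprint(8 * 1024))  # hypothetical 8 KB footprint
print(chase(ring, 10_000))
```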

Slide 33: Dcache Performance Traces
- Each hit-rate range is obvious.
- Trends are NOT identical across metrics: linear L1 misses vs. nonlinear IPC.
- FOR A SINGLE METRIC: how you capture phases depends on the metric and the chosen threshold.

Slide 34: Dcache PC Traces
- No visible phases from PC samples.
- Address-space sampling alone is NOT sufficient!

Slide 35: Experiment Setup
- PIN kit 1795
- 3-level trace instrumentation:
  - ~every user trace: conditional inlined trace count
  - every K trace calls: sample EIP
  - every 5-20M trace calls: generate BBV, collect PMCs & read power history
- Constraint: instrumentation should not overwhelm power variations!
- BBV generation: sample BBL heads ⇒ hash into 32 dimensions (based on Jenkins)
- PMC reading:
  - single rotation subset
  - sampled via 'popen's due to platform conflicts
- Power reading:
  - read from the serial device buffer
  - no polling possible ⇒ disable the device during major instrumentation & exhaust the buffer
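The BBV step can be sketched as follows: hash each sampled basic-block head address and count hits per bucket. The slide only says the hash is "based on Jenkins", so using Bob Jenkins' one-at-a-time hash over the 8-byte little-endian address is an assumption, as are the names `jenkins_one_at_a_time` and `hashed_bbv`.

```python
def jenkins_one_at_a_time(data: bytes) -> int:
    """Bob Jenkins' one-at-a-time hash, kept to 32 bits."""
    h = 0
    for byte in data:
        h = (h + byte) & 0xFFFFFFFF
        h = (h + (h << 10)) & 0xFFFFFFFF
        h ^= h >> 6
    h = (h + (h << 3)) & 0xFFFFFFFF
    h ^= h >> 11
    h = (h + (h << 15)) & 0xFFFFFFFF
    return h


def hashed_bbv(bbl_head_addresses, dims=32):
    """Bucket sampled basic-block head addresses into a dims-dimensional count vector."""
    vec = [0] * dims
    for addr in bbl_head_addresses:
        key = addr.to_bytes(8, "little")
        vec[jenkins_one_at_a_time(key) % dims] += 1
    return vec


print(hex(jenkins_one_at_a_time(b"a")))  # 0xca2e9442
print(hashed_bbv([0x400000, 0x400010, 0x400000]))
```

Hashing keeps the vector a fixed size however many distinct blocks a program has. The cost is that unrelated blocks can collide in the same dimension.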

Slide 36: BBV Results
- Is sampling good enough? Are the sampled BBVs meaningful?
[Figures: B. Calder's full-blown BBV similarity matrices; our sampled & hashed BBV similarity matrices]
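A similarity matrix compares every interval's BBV with every other interval's. The sketch below is in the style of SimPoint-type plots: each vector is normalized to sum to 1, then entry (i, j) is the Manhattan distance between intervals i and j. It ranges from 0 (identical) to 2 (disjoint). How the talk's matrices were actually computed is not stated, and the function names are hypothetical.

```python
def normalize(vec):
    total = sum(vec)
    return [x / total for x in vec] if total else [0.0] * len(vec)


def manhattan(a, b):
    return sum(abs(x - y) for x, y in zip(a, b))


def similarity_matrix(bbvs):
    """Pairwise Manhattan distances between normalized BBVs (0 = identical, 2 = disjoint)."""
    norm = [normalize(v) for v in bbvs]
    return [[manhattan(a, b) for b in norm] for a in norm]


for row in similarity_matrix([[2, 0], [5, 0], [0, 3]]):
    print(row)
```

Normalizing first means intervals that run the same code mix at different rates still look alike. Recurring phases then show up as repeated dark blocks in the plotted matrix.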

Slide 37: Power Results
- Do we still have a hook on power variability?
[Figure panels: native vs. from PIN (two pairs)]

Slide 38: Currently…
- Still need to verify benchmarks for power and validity
- Constructing power vectors with the reduced set
- Applying symmetric phase analyses to BBVs and PMCs
- Power representation of phases w.r.t. measurements
- Prediction with regression trees

Slide 39: Today! (agenda repeated from slide 4; next: simple metric prediction studies, short term vs. long term)

Slide 40: Metric (IPC) Value Prediction
- It is no big challenge to get good results, but improving prediction at the edges (transitions) is interesting.
- Statistical predictor: transition-guided, history-based (EWMA) IPC prediction
  - Instead of a fixed history window, use the stable region between transitions as the history, kept in a circular buffer.
  - Transitions are based on a threshold; threshold = 0 ⇒ "last value predictor".
- Our experience:
  - variabilities are bursty transitions;
  - there are stable regions, possibly with gradients, between transitions.
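A minimal sketch of the transition-guided EWMA predictor, assuming that a transition is a sample differing from the previous one by more than `threshold` times its magnitude, and that a transition clears the circular buffer so only the current stable region feeds the EWMA. The class name, `alpha`, and `capacity` are hypothetical. With threshold = 0, any change starts a new region and a run of equal values averages to itself, so the predictor returns the last value, matching the slide.

```python
from collections import deque


class TransitionGuidedEWMA:
    """IPC predictor whose history is the current stable region."""

    def __init__(self, threshold, alpha=0.5, capacity=64):
        self.threshold = threshold                 # relative change that counts as a transition
        self.alpha = alpha                         # EWMA weight on newer samples (assumed)
        self.history = deque(maxlen=capacity)      # circular buffer of the stable region

    def update(self, ipc):
        if self.history:
            prev = self.history[-1]
            if abs(ipc - prev) > self.threshold * abs(prev):
                self.history.clear()               # transition: start a new stable region
        self.history.append(ipc)

    def predict(self):
        if not self.history:
            raise ValueError("no samples yet")
        estimate = self.history[0]
        for x in list(self.history)[1:]:
            estimate = self.alpha * x + (1 - self.alpha) * estimate
        return estimate


p = TransitionGuidedEWMA(threshold=0.1)
for ipc in [1.0, 1.05, 1.0, 2.0, 2.1]:
    p.update(ipc)
print(p.predict())  # EWMA over the new region [2.0, 2.1]
```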

Slide 41: Ammp, threshold = 0% (last value)

Slide 42: Ammp, threshold = 10%

Slide 43: Using Stability Considerations (8 samples) in IPC Predictions

Slide 44: Predicting Durations
- X = f(x) approach:
  - f(x) = x, x/2, x/8, …
  - initial stability requirement: 2, 8, …
- Table based?
  - Idea: at each transition, predict the duration once, based on history.
  - Keys: log(prev_duration) ∈ {0, 1, 2, 3, 4, 5}
  - History |5|3|5|3|5| ⇒ predict 3; history |1|3|5|1|3| ⇒ predict 5
  - Need to filter bursts somehow; partial matchings??
  - NOT EXPLORED!!
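The table-based idea was not explored in the talk. The sketch below is therefore only a hypothetical illustration of how the two example histories could yield their predictions. Durations are quantized to keys with floor(log2) capped at 5 (the log base is an assumption). The next key is predicted by finding the longest suffix of the key history that occurred earlier and returning what followed it. Burst filtering and partial matching, which the slide flags as open, are not handled.

```python
import math


def duration_key(duration, max_key=5):
    """Quantize a phase duration (in samples) to a small key: floor(log2), capped."""
    return min(max_key, int(math.log2(max(1, duration))))


def predict_next_key(history):
    """Predict the next duration key by longest-suffix matching on past keys."""
    n = len(history)
    for k in range(n - 1, 0, -1):              # longest context first
        suffix = history[n - k:]
        for i in range(n - k - 1, -1, -1):     # most recent earlier occurrence first
            if history[i:i + k] == suffix:
                return history[i + k]          # what followed that context
    return history[-1] if history else None    # no match: fall back to last value


print(predict_next_key([5, 3, 5, 3, 5]))  # 3
print(predict_next_key([1, 3, 5, 1, 3]))  # 5
```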

Slide 45: Ammp Duration Prediction
- Predict based on f(x) = x/8
- Stability criterion = 8 samples
- Extend the duration if stability continues
- IPC based on last value
- Predictions only at checkpoints

Slide 46: Long-Term IPC Prediction with Gradients
- Last value is not very useful in the long term.
- Instead of a 0th-order prediction, consider a 1st-order prediction:
  - needs additional ΔIPC information;
  - Next IPC = Current IPC + ΔIPC.
- Example: f(x) = x/8
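A minimal sketch of the first-order predictor. ΔIPC is assumed to be the average per-sample change across the current stable region, and the prediction extrapolates `horizon` samples ahead. The horizon could come from a duration prediction such as the slide's f(x) = x/8, but how the two were combined is not stated. The function name is hypothetical.

```python
def first_order_predict(region, horizon):
    """Predict IPC `horizon` samples ahead: current IPC + horizon * dIPC.

    region  -- IPC samples of the current stable region, oldest first
    horizon -- how many samples ahead to predict
    """
    if not region:
        raise ValueError("empty region")
    if len(region) == 1:
        return region[0]                       # no gradient yet: fall back to last value
    d_ipc = (region[-1] - region[0]) / (len(region) - 1)   # average slope per sample
    return region[-1] + horizon * d_ipc


print(first_order_predict([1.0, 1.1, 1.2], horizon=2))  # about 1.4
```

The endpoint slope is the simplest choice. A least-squares fit over the region would be less sensitive to a noisy first or last sample.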

Slide 47: Improvements?
- Using prediction probability tables: P{N more | 20 IPC}
  - Example: vortex
- Using adaptive functions based on history
- Table-based function approaches
[Figure: NP(N|20)]