Applying SVM to Data Bypass Prediction


1 Applying SVM to Data Bypass Prediction
Warisa Sritriratanarak, Mongkol Ekpanyapong, Prabhas Chongstitvatana Presented by Yijia Liu, Isaiah Mayerchak, Qiao Yan, Luyao Chen Introduction Hi everyone! We are here to present a paper on using a machine learning algorithm, specifically a support vector machine, to predict data bypassing. Yijia will introduce the concept of cache bypassing first, and then Isaiah will walk through the methodology.

2 Cache Bypassing First of all, I would like to introduce cache bypassing.

3 What is cache bypassing?
Store the results of a calculation locally for future use Cache bypassing: do not keep local copies of results in cache To explain cache bypassing, we first need to understand what a cache is. The most familiar scenario is the browser cache: if the browser caches web content, it does not have to download that content again, which saves time and bandwidth. Similarly, a processor stores the results of calculations in local memory for future use. By contrast, cache bypassing means not keeping local copies of results in the cache.
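To make the contrast concrete, here is a toy sketch (not from the paper) of a memoized computation where each result can either be cached for reuse or bypassed, i.e. recomputed on every access. All names are illustrative.

```python
class BypassableCache:
    def __init__(self, compute):
        self.compute = compute     # the expensive function
        self.store = {}            # local copies of results
        self.computations = 0      # how much work we actually did

    def get(self, key, bypass=False):
        """Return compute(key); keep a local copy unless bypassing."""
        if key in self.store:
            return self.store[key]      # cache hit: no recomputation
        self.computations += 1
        value = self.compute(key)
        if not bypass:                  # bypass => no local copy kept
            self.store[key] = value
        return value

cache = BypassableCache(lambda x: x * x)
cache.get(3)                 # computed and cached
cache.get(3)                 # served from the cache
cache.get(5, bypass=True)    # computed, but not cached
cache.get(5, bypass=True)    # computed again
```

After this sequence only key 3 occupies cache space; key 5 was bypassed and cost a recomputation each time, which is the trade-off the paper's classifier tries to get right.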

4 Why cache bypassing? The depth of the cache hierarchy and cache sizes have increased as the number of cores and processors has increased More than 30% of chip area Little data reuse Cache access only adds to the total latency So why do we need cache bypassing? As the number of processors and cores has increased, the cache has become a bottleneck for performance, so the depth of the cache hierarchy and cache sizes have grown as well; caches now take more than 30% of chip area and power, which constrains the area/power budget available for cores. For data with little reuse, caching consumes extra resources, and the allocate/deallocate work only adds to the latency. In those cases, we want to bypass the cache for better performance.

5 Why use machine learning for cache bypassing?
Injudicious use of cache bypassing can lead to bandwidth congestion and an increased miss rate Human-made rules are static and sometimes ineffective Why use machine learning for cache bypassing? If we bypass the cache injudiciously, everything gets refetched or recomputed each time it is used, which leads to bandwidth congestion and an increased miss rate (the fraction of accesses that are not found in the cache). To avoid this, previous bypassing techniques relied mainly on human-made rules, which are static and sometimes ineffective. With machine learning, we can tailor cache bypassing to each program and make more beneficial use of it.

6 Support Vector Machine A cache bypassing classifier

7 What is SVM? A supervised learning model for classification
A discriminative classifier defined by a separating hyperplane

8 Why SVM? One of the best binary classifiers
Maximizes the decision boundary margin Effective when the number of features is large Can utilize kernel tricks
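A minimal sketch of the "kernel trick" idea mentioned above: the RBF (radial basis function) kernel scores the similarity of two feature vectors without ever forming the implied high-dimensional mapping explicitly. The `gamma` width parameter here is a hypothetical value, not one from the paper.

```python
import math

def rbf_kernel(x, y, gamma=0.5):
    """K(x, y) = exp(-gamma * ||x - y||^2)."""
    sq_dist = sum((a - b) ** 2 for a, b in zip(x, y))
    return math.exp(-gamma * sq_dist)

# Identical points are maximally similar (K = 1.0);
# distant points have similarity approaching 0.
same = rbf_kernel([1.0, 2.0], [1.0, 2.0])
far = rbf_kernel([0.0, 0.0], [10.0, 10.0])
```

An SVM using this kernel can separate classes that no single hyperplane in the original feature space could, which is why the authors tried several kernels rather than only a linear one.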

9 Applying SVM to Cache Bypassing
Okay, so now that we have that background, we can talk about how the authors of this paper applied SVM to solve the problem of when to use the cache and when to bypass it.

10 Main Methodology
Use simulator to model processor with 2-level cache Run benchmarks with simulator & collect cache access traces Take subsets of traces => features for SVM Take rest of traces => test set for SVM At a high level, their experiment went like this. It was a proof-of-concept experiment, so rather than using actual hardware, they used a simulator to model a processor with a simple 2-level cache. They then ran benchmark programs and, using the simulator, collected cache access traces for each instruction that could use the cache. This gave them information about each of these instructions that they could then use as features to train the SVM to recognize, for any given instruction, whether to use the cache or bypass it. The traces not used for training became a test set to measure just how effective the SVM was for that benchmark.
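The train/test split described above can be sketched as follows. The feature values and the 50% split fraction are illustrative assumptions, not the paper's actual feature set or split.

```python
def split_traces(traces, train_fraction=0.5):
    """Split collected cache-access trace records into an SVM
    training set and a held-out test set."""
    cut = int(len(traces) * train_fraction)
    return traces[:cut], traces[cut:]

# Each record: (feature_vector, label), where the label is
# 1 = "bypass the cache" and 0 = "use the cache".
traces = [
    ([3, 0.2], 0),
    ([0, 0.9], 1),
    ([5, 0.1], 0),
    ([1, 0.8], 1),
]
train_set, test_set = split_traces(traces)
```

The training records feed the SVM fit; the held-out records are only used afterward, to estimate how well the learned bypass predictions generalize.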

11 Technology Choice Simulator: Multi2Sim Quad-core processor, LRU cache
Benchmark: SPLASH2 Select random subset & run multithreaded 7 different combinations For each combination: Try different types of SVM kernel Simulate 200 million instructions To go into a little more detail, they used a simulator called Multi2Sim to model a quad-core processor with LRU as the cache replacement strategy. For benchmarks, they used SPLASH2, a suite of 11 programs that is extremely popular in the scientific community for testing parallel machines with shared memory. In this experiment they would randomly select 4 of the 11 programs in the suite and run them simultaneously, giving a few threads to each program, and they did this 7 times to get 7 different “benchmark combinations” to report results for. For each combination, they tested different types of SVM kernels to see which would be most effective, and each combination was simulated for 200 million instructions.
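The LRU replacement strategy the simulated cache uses can be sketched with the standard library's `OrderedDict`; the two-line capacity is an arbitrary choice for illustration.

```python
from collections import OrderedDict

class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.lines = OrderedDict()   # dict order tracks recency

    def access(self, addr):
        """Return True on a hit; on a miss, insert the line and
        evict the least recently used one if at capacity."""
        if addr in self.lines:
            self.lines.move_to_end(addr)   # mark most recently used
            return True
        if len(self.lines) >= self.capacity:
            self.lines.popitem(last=False) # evict least recently used
        self.lines[addr] = True
        return False

cache = LRUCache(2)
cache.access("a")   # miss
cache.access("b")   # miss
cache.access("a")   # hit: "a" becomes most recently used
cache.access("c")   # miss: evicts "b", the least recently used
```

LRU is the baseline the SVM-guided bypassing is compared against: without a classifier, every miss allocates a line and evicts whatever was used least recently, even when the new data will never be reused.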

12 High Level Architecture
So here is what the experiment looked like. You can ignore everything below the L2 cache, because the important thing is the trapezoid right here. You can see that after the model is trained, every time something is about to be stored in the cache, the hardware first consults the Bypass Classifier (which is our SVM model) and supplies it with the information the model was trained on, which is the set of features.
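The consult-before-allocate flow in that architecture can be sketched as below. The stand-in predicate, the single "reuse" feature, and its threshold are illustrative assumptions; in the paper, the classifier is the trained SVM fed the trained-on features.

```python
def handle_fill(cache_lines, addr, features, bypass_classifier):
    """On an L2 miss, allocate the line only if the classifier
    does not predict a bypass."""
    if bypass_classifier(features):
        return "bypassed"            # data goes straight past the cache
    cache_lines[addr] = features     # normal allocation path
    return "allocated"

# Stand-in classifier: bypass when predicted reuse is low.
classify = lambda features: features["reuse"] < 1

lines = {}
handle_fill(lines, 0x1000, {"reuse": 0}, classify)   # bypassed
handle_fill(lines, 0x2000, {"reuse": 4}, classify)   # allocated
```

The key design point is that the classifier sits on the fill path: it never changes what a program computes, only whether a result also occupies a cache line.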

13 Result This graph shows the percentage change in L2 miss rate achieved for each combination. Positive results mean the miss rate decreased and negative results mean it increased, and the four colors correspond to 4 different kernel functions. The cache miss rate improves across combinations, regardless of the kernel, except for combination 6, where it increases. Additional experiments on this combination showed the hit rate can be improved by adding the time of access as a 7th feature, which reduced the miss rate of combination 6 from 29.52% to 28.83%; this suggests that combination 6's accesses are time-sensitive.

14 Conclusion SVM can improve cache bypassing across SPLASH2 benchmark combinations Radial basis function is the optimal kernel Temporal locality could contribute to cache bypassing prediction With specific SVM kernel functions, features, and training-data look-ahead window sizes, SVM demonstrates the feasibility of bypass prediction. Across the positive results with 6 features, SVM-predicted bypassing yields an average miss rate decrease of 6.72%. That is better cache utilization than ad hoc replacement-policy mechanisms, the current standard, which achieve an average miss rate decrease of 6.01% according to related work. The radial basis function is, across most combinations, the kernel function that yields the best gains. For certain combinations, such as combination 6, temporal locality could be a significant feature. This analysis will provide clues for matching SVM kernels and feature counts to particular memory access behaviors in the future.

15 Future Work Refine SVM kernel analysis for application suitability
Efficient cache hardware implementation of bypass classifier Other classification algorithms Future work includes determining why combination 6 differs from the others, which may require a different set of training parameters for accurate prediction, and exploring temporal features and machine learning algorithms that better capture temporal locality, such as LSTMs. Other candidate classifiers: neural networks, random forests, logistic regression.

16 Q&A

