Optimization-Based Models For Pruning Processor Design-Space


Optimization-Based Models For Pruning Processor Design-Space Nilay Vaish. Committee Members: Michael C. Ferris, Mark D. Hill, Jeffrey T. Linderoth, Michael M. Swift, David A. Wood (advisor). 04/05/2017

Summary Designing processors is becoming increasingly hard. This thesis explores optimization-based models for pruning and exploring the design space through three case studies: exploring the design space of cache hierarchies, and on-chip resource placement and distribution (bandwidth-optimized and latency-optimized).

Microprocessors Microprocessors have played a big role in the IT revolution and used to scale as suggested by Moore [Moo65]. Moore's law still holds, but single-threaded performance is tapering off.

Single-Threaded CPU Performance on SPECint Source: J. Preshing, http://preshing.com/20120208/a-look-back-at-single-threaded-cpu-performance

Microprocessors Single Core → Multi Core → Many Core. Dynamic power P = C·V²·f, and the maximum frequency is roughly linear in V, so we reduce voltage, reduce frequency, and run tasks in parallel: more cores, but operated at lower voltage and frequency. Exploit application-level parallelism for better performance.
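A minimal worked example of the trade-off stated above (illustrative numbers only): since the maximum frequency is roughly linear in voltage, halving the frequency allows roughly halving the voltage, and per-core dynamic power then drops by about a factor of eight, leaving headroom for more cores in the same power budget.

```latex
P = C V^2 f
\quad\Longrightarrow\quad
P' = C \left(\frac{V}{2}\right)^{2} \left(\frac{f}{2}\right) = \frac{1}{8}\, C V^2 f = \frac{P}{8}.
```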

Design Challenges More components → more design possibilities. Architectural simulators are slow: they simulate about 100,000 instructions per second, and more simulated cores → less simulation throughput. The example layout shown has 4455 possible spatial arrangements (including symmetric ones).

Our Thesis Optimization-based models and algorithms are sufficient for pruning: they provide solutions that perform better than previously proposed ones, reduce the time to explore the design space compared to brute-force, randomized, or genetic-algorithm-based search procedures, allow solving problems of larger scale and complexity, and provide information about optimality.

Our Case Studies Designing cache hierarchy (partly done before preliminary exam) Resource placement and distribution Bandwidth-optimized (mostly done before preliminary exam) Latency-optimized (most recent work)

Problem Statement Input: cache and physical memory designs, processor design (cores and interconnect topology), resource constraints, workload behavior Output: a hierarchy of caches well suited for the processor

Variability in Cache Hierarchies Across Designs. Columns: Processor, Cores, Level 1 (Instruction), Level 1 (Data), Level 2, Level 3, Memory Controllers.
Cavium ThunderX: 48 cores, 78KB, 32KB, 16 MB shared, -, 6 DDR3/4
Cavium ThunderX2: 54 cores, 64KB, 40KB, 32 MB shared
Intel Xeon E7-8890 v4: 24 cores, 256 KB (L2), 60 MB (L3), 4 DDR4 channels
Intel Xeon Phi: 72 cores, 1 MB (L2) shared by two cores, 16 GB (HBM), 2 DDR4, 6 channels
Mellanox Tile-Gx72: 18MB, 4 DDR3
Oracle M7: 32 cores, 16KB, 256 KB I$ and 256 KB D$ shared by four cores, 8MB shared by four cores, 4 DDR4, 16 channels

Example Design Space (plot of Expected Access Latency vs. Expected Dynamic Energy, with designs marked as infeasible, dominated, or maximal).

So many dimensions to optimize (energy, latency, power, area, levels)! What to do?

Possible Approach: Exhaustive Simulation As per Cacti [MBJ09], more than 10^12 cache designs are possible. These can be algorithmically and heuristically pruned to about several thousand designs, which still leaves ~10^10 three-level hierarchies. Architectural simulators are slow: they simulate about 100,000 instructions per second. Cache design parameters: size, banks, associativity, nspd, wire type, ndwl, ndbl, ndcm, ndsam_lev1, ndsam_lev2.
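A back-of-envelope sketch of why exhaustive simulation is infeasible, assuming (purely for illustration) that pruning leaves about 2,000 candidate designs per cache level and that one evaluation run simulates 100 million instructions:

```python
# Illustrative numbers only; the per-level count and run length are assumptions.
designs_per_level = 2_000            # assumed size of the pruned per-level candidate set
levels = 3                           # three-level hierarchy
hierarchies = designs_per_level ** levels
print(f"~{hierarchies:.2e} three-level hierarchies")   # ~8.00e+09, i.e. on the order of 10^10

sim_rate = 100_000                   # simulated instructions per second (from the slide)
insts_per_run = 100_000_000          # assumed length of one evaluation run
seconds_per_run = insts_per_run / sim_rate
cpu_years = hierarchies * seconds_per_run / 3.15e7
print(f"{seconds_per_run:.0f} s per run, ~{cpu_years:.1e} CPU-years in total")
```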

Possible Approach: Continuous Modeling Several different mathematical models exist for designing cache hierarchies [JCSM96, PPG11, OLLC, SHK+]. Common feature: continuous functions for latency, program behavior, and cost / energy / power.

Possible Approach: Continuous Modeling Requires algebraic functional forms

Dynamic Energy Vs Size

Possible Approach: Continuous Modeling Requires algebraic functional forms; the resulting designs are not physically realizable.

Gap Between Continuous And Discrete Solutions (plot, axes X and Y, showing the continuous optimum lying away from the discrete optimum).

Desired Features Trade off among different metrics (access time, dynamic energy consumption, static power dissipation, chip area requirements), obtain physically realizable designs, and optimize for shared caches and the on-chip network.

Our Approach Consists of two parts: discrete modeling of the design space, and dynamic programming plus multi-dimensional divide and conquer to compute the hierarchies on the Pareto-optimal frontier.

Discrete Model Prior work uses rules of thumb to fit continuous functions to data obtained from simulation and from tools like Cacti. We instead use the obtained data directly: there is no need to model continuously design parameters that are actually discrete.

Dynamic Programming We observed two structural properties of solutions under the discrete model. Optimal substructure: every optimal hierarchy for n levels contains an optimal hierarchy for the first n-1 levels. Overlapping subproblems: different n-level hierarchies mostly differ in their nth level. We therefore compute the Pareto frontier of (n-1)-level hierarchies once and reuse it while computing n-level hierarchies; a sketch of this composition step follows.
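A minimal sketch of the composition step, assuming each design is summarized by a tuple of metrics in which smaller is better; the `combine` rule for merging an (n-1)-level hierarchy with a level-n candidate is left abstract, since it depends on the thesis's cost model:

```python
def dominates(a, b):
    """True if design a is at least as good as design b in every metric and strictly better in one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))

def pareto_frontier(designs):
    """Keep only the maximal (non-dominated) designs."""
    return [d for d in designs
            if not any(dominates(other, d) for other in designs if other is not d)]

def extend_hierarchies(frontier_prev, level_n_candidates, combine):
    """Extend each maximal (n-1)-level hierarchy with each level-n candidate,
    then prune the extended set back down to its Pareto frontier."""
    extended = [combine(h, c) for h in frontier_prev for c in level_n_candidates]
    return pareto_frontier(extended)
```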

Find The Latency-Optimal Cache Hierarchy Each cache design is labeled (access latency, global miss ratio). Level 1 candidates: (4ns, 1/9), (4ns, 1/8), (2ns, 1/5), (2ns, 1/4), (1ns, 1/4). Level 2 candidates: (40ns, 1/100), (20ns, 1/50). Physical memory: 50ns.

Choose Level-1 Maximal Designs Eliminate dominated designs: (4ns, 1/8) is dominated by (4ns, 1/9), and (2ns, 1/4) is dominated by (2ns, 1/5) and (1ns, 1/4). The maximal Level-1 designs are (4ns, 1/9), (2ns, 1/5), and (1ns, 1/4).

Compute Level-1 × Level-2 Maximal Designs Compute expected access times for each pairing: for example, (4ns, 1/9) with the (40ns, 1/100) Level-2 design gives 4 + 40/9 = 76/9 ≈ 8.44 ns, while (2ns, 1/5) with it gives 2 + 40/5 = 10 ns (candidate designs as on the previous slide).

Choose Level-1 × Level-2 Maximal Designs Eliminate dominated two-level combinations; the surviving combinations pair the maximal Level-1 designs with the (40ns, 1/100) and (20ns, 1/50) Level-2 designs (physical memory: 50ns).

Choose Level-1 × Level-2 × Memory Designs Total expected access times: (4ns, 1/9) with (40ns, 1/100) and memory gives 4 + 40/9 + 50/100 = 161/18 ≈ 8.94 ns; (2ns, 1/5) with (20ns, 1/50) gives 2 + 20/5 + 50/50 = 7 ns; (1ns, 1/4) with (20ns, 1/50) gives 1 + 20/4 + 50/50 = 7 ns. The latency-optimal hierarchies achieve 7 ns.
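A small sketch reproducing the arithmetic above, with cache designs given as (access latency in ns, global miss ratio) pairs exactly as on these slides:

```python
# Candidate designs taken from the slides: (access latency in ns, global miss ratio).
level1 = [(4.0, 1/9), (2.0, 1/5), (1.0, 1/4)]
level2 = [(40.0, 1/100), (20.0, 1/50)]
memory_latency = 50.0

def expected_access_time(l1, l2):
    """L1 latency, plus L2 latency weighted by the L1 global miss ratio,
    plus memory latency weighted by the L2 global miss ratio."""
    t1, m1 = l1
    t2, m2 = l2
    return t1 + m1 * t2 + m2 * memory_latency

for l1 in level1:
    for l2 in level2:
        print(l1, l2, round(expected_access_time(l1, l2), 2))
# e.g. (4ns, 1/9) with (40ns, 1/100) gives 4 + 40/9 + 50/100 ≈ 8.94 ns,
# and (2ns, 1/5) with (20ns, 1/50) gives 2 + 4 + 1 = 7 ns.
```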

Proposed Solution: Short Description Choose scenarios containing multiple applications and collect data on the applications' cache behavior. Select individual cache designs that are maximal for the chosen metrics. Add private levels one at a time, keeping only feasible and maximal designs. Design shared levels one at a time: analyze shared-cache and on-chip network behavior, and select feasible and maximal designs. Finally, account for the physical memory.

Evaluation of Our Method Comparison with a continuous model for single-core hierarchies, and the cumulative distribution of true positives for 8-core designs.

Two Kinds Of Errors (plot of Expected Access Latency vs. Expected Dynamic Energy): false positives are designs reported as maximal that are actually dominated or infeasible; false negatives are truly maximal designs that the model misses.

Estimating True Positives (plot of Expected Access Latency vs. Expected Dynamic Energy): the loss in performance of false positives is measured relative to the true maximal designs.

Continuous Model s_i, t_i: size and access delay of the cache at level i. static_i, dynamic_i, area_i: static power, dynamic energy, and area of the cache at level i. p_i: probability of accessing the cache at level i. a_0, a_1, b_0, b_1, c_0, c_1, d_0, d_1, α, β: constants obtained by fitting linear functions to data from Cacti and the applications under consideration.

Comparing Discrete and Continuous Models We solved the two models with the same set of parameters; output from the continuous model was rounded to the nearest feasible cache design. Simulations were carried out with SPEC CPU 2006 applications using gem5. The optimal hierarchy was determined by simulating the hierarchies on the Pareto-optimal frontier.

Improvement With Discrete Model Over Continuous Model (Inorder Core)

Improvement With Discrete Model Over Continuous Model (Out-of-Order Core)

Experiments With 8-core Designs We assumed the architecture shown: 8 cores connected via a crossbar, with one or two levels of private cache and one shared level. Hierarchies were computed and simulated for the SPEC CPU2006 benchmark suite with a detailed core and memory system.

Distribution of true Pareto-optimal designs (Homogeneous Workload); the ideal point is marked on the plot.

Distribution of true Pareto-optimal (Mixed Workload)

Conclusion There is a need for a structured approach to designing cache hierarchies. We developed a discrete model, observed two structural properties of its solutions, and proposed a dynamic programming plus multi-dimensional divide-and-conquer algorithm. It performs better than the continuous models and approximations proposed in the literature.

Our Case Studies Designing cache hierarchy (partly done before preliminary exam) Resource placement and distribution Bandwidth-optimized (mostly done before preliminary exam) Latency-optimized (most recent work)

Problem Statement Input: a processor design (cores and on-chip network topology), a set of memory controllers, on-chip network resource budgets, and a traffic pattern. Output: a 'suitable' placement of memory controllers in the on-chip network plus a design of the network. (Diagram legend: processor core, network link, memory controller, memory channel, memory (DRAM).)

Why Should We Care? Data needs to move between the processor chip and memory, but bandwidth is limited, so it must be shared by the on-chip components. (Diagram legend: processor core, network link, memory controller, memory channel, memory (DRAM), message packet.)

Why Should We Care? Controllers managing off-chip accesses should be placed strategically, and the on-chip network likewise needs to be designed according to the traffic patterns. The combined problem is more challenging, but can yield better designs.

What Has Been Done Before? Abts et al. [AEJK+] solved the controller placement problem using experience, exhaustive simulation of smaller designs, and a genetic algorithm (GA) based approach. (Diagram legend: processor core and caches, memory controller; placements shown: diamond, diagonal.)

What Has Been Done Before? Mishra et al. [MVD] designed a heterogeneous mesh network with two types of links (wide and narrow) and two types of routers (big and small), using exhaustive simulation and extrapolation. (Diagram legend: big router, small router, wide link, narrow link.)

Our Approach The design space is large: with n tiles and m memory controllers there are C(n, m) ways to place the controllers, and for 64 tiles and 16 controllers that is about 4.89 × 10^14 ways. Distributing on-chip network resources along with placing controllers is even harder. We use mathematical optimization to solve the problem.
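A one-line check of the count quoted above, the binomial coefficient C(64, 16):

```python
import math

placements = math.comb(64, 16)                 # ways to choose 16 controller tiles out of 64
print(f"{placements:,} ≈ {placements:.2e}")    # 488,526,937,079,580 ≈ 4.89e+14
```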

Optimization Model Indices: (x,y), coordinates on a 2-D plane; l, a link. Inputs: Path(x,y,x',y'), the set of links l used on the path from (x,y) to (x',y'); Ω_{x,y,x',y'}, the weight of traffic from (x,y) to (x',y'); the link budget and the number of memory controllers; BW_l, link l's bandwidth. Variables: BW, load per unit bandwidth; I_{x,y}, a binary variable denoting whether a memory controller is placed at (x,y); Load(l), traffic on l due to communication between controllers and cores.
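A schematic of what a placement model built from these quantities might look like, written only from the definitions above (the thesis's exact objective and constraints are not reproduced here): place the required number of controllers, let Load(l) aggregate the weighted core-to-controller traffic routed over link l, and minimize the worst load per unit bandwidth BW.

```latex
\min\; BW \quad \text{s.t.}\quad
\sum_{x,y} I_{x,y} = \text{number of memory controllers},\qquad
I_{x,y} \in \{0,1\},\\[4pt]
\mathrm{Load}(l) = \sum_{(x,y),\,(x',y')\,:\; l \in \mathrm{Path}(x,y,x',y')}
  \Omega_{x,y,x',y'}\; I_{x',y'} \quad \forall l,\qquad
\mathrm{Load}(l) \le BW \cdot BW_l \quad \forall l.
```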

Analysis Of The Model Mixed-integer, non-linear, and non-convex, but it is possible to linearize the formulation.

Solution Design diagonal (Mishra et al.) vs. com-opt. (Diagram legend: big router + memory controller, small router, narrow link, wide link, memory controller.)

Evaluation Methodology Simulation of 64 out-of-order cores with a mesh interconnect; 45 multi-programmed workloads generated using SPEC CPU2006; simulations run until each core executed at least 25,000,000 instructions; comparison of the diagonal and com-opt designs.

Evaluation Results (plots: per-workload and average speedup obtained by com-opt over diagonal). On average, about a 10% improvement in performance.

What's Lacking Prior models focused on minimizing the maximum traffic bandwidth over a link, but applications are not necessarily bound by communication bandwidth. We need models that can help design processors for latency-sensitive applications.

Our Contribution We developed two optimization models with different objectives: minimize the maximum latency, and minimize the average latency. The models need only a single input parameter, the traffic input rate. The non-linear models did not work well, but the linearized versions did, which suggests an algebraic queueing model may not be essential.

MinMax Optimization Model BW_l: link l's bandwidth. LoadOnLink(l): traffic on l due to communication between controllers and cores. μ: average rate of transmission at a link. λ: rate of traffic input from any core in the processor. λ_l: load per unit bandwidth of link l. W_l: average waiting time at link l. Z_{x,y,x',y'}: delay of the path from (x,y) to (x',y').

Specifics Assume each link acts as an M/D/1 server, and estimate the average waiting time for a flit at link l as W_l = λ_l / (2 μ (μ − λ_l)). The latency of traversing link l is 1/μ + W_l, and the latency over the path from (x,y) to (x',y') satisfies Z_{x,y,x',y'} ≥ I_{x,y} · Σ_{l ∈ Path(x,y,x',y')} (1/μ + W_l). Z is the maximum latency over all paths, and the objective is to minimize Z.
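A small numerical sketch of this latency estimate (the service rate and per-link loads below are made-up illustrative values, not from the thesis):

```python
def md1_wait(lam, mu):
    """Mean waiting time of an M/D/1 queue: W = lambda / (2 * mu * (mu - lambda))."""
    assert lam < mu, "queue is unstable once the arrival rate reaches the service rate"
    return lam / (2.0 * mu * (mu - lam))

def path_latency(link_loads, mu):
    """Estimated path latency: sum over links of service time (1/mu) plus waiting time."""
    return sum(1.0 / mu + md1_wait(lam, mu) for lam in link_loads)

mu = 1.0                        # assumed flit service rate per link
loads = [0.2, 0.5, 0.7]         # assumed per-link arrival rates along one path
print(path_latency(loads, mu))  # ≈ 4.79; the waiting terms grow sharply as load approaches mu
```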

Linearizing Constraints The non-linear model was unable to compute an optimal design or explore the design space. Several constraints are non-linear, e.g., the average waiting time at a link: W = λ / (2 μ (μ − λ)).

Plots For Queueing Latency And Piecewise Linear Estimations

Linearizing Constraints The non-linear model was unable to compute an optimal design or explore the design space. Several constraints are non-linear, e.g., the average waiting time at a link, W = λ / (2 μ (μ − λ)). The waiting-time constraint is linearized using a piecewise-linear function and features of the modeling language; a sketch of this approximation follows.
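A minimal sketch of building such a piecewise-linear estimate of the M/D/1 waiting-time curve from a handful of breakpoints (the breakpoint choice and the interpolation helper are illustrative assumptions, not the thesis's exact construction; in the optimization model itself the same idea is expressed through the modeling language's piecewise-linear features):

```python
import bisect

def md1_wait(lam, mu=1.0):
    return lam / (2.0 * mu * (mu - lam))

# Breakpoints on the load axis; the curve blows up as the load approaches mu,
# so the (illustrative) breakpoints are concentrated near saturation.
breakpoints = [0.0, 0.3, 0.5, 0.7, 0.8, 0.9, 0.95]
values = [md1_wait(b) for b in breakpoints]

def pwl_wait(lam):
    """Piecewise-linear estimate: interpolate between the two surrounding breakpoints."""
    i = min(bisect.bisect_right(breakpoints, lam) - 1, len(breakpoints) - 2)
    x0, x1 = breakpoints[i], breakpoints[i + 1]
    y0, y1 = values[i], values[i + 1]
    return y0 + (y1 - y0) * (lam - x0) / (x1 - x0)

print(md1_wait(0.75), pwl_wait(0.75))   # exact ≈ 1.50 vs. piecewise-linear ≈ 1.58
```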

Linearized MinMax Model

Results With Linearized Model We solved the model for different values of the traffic intensity ρ = λ/μ and found designs that lie in between the center and diamond placements.

Simulation Performance Of Different Designs

Conclusion We developed two new optimization-based models for placing memory controllers and designing the on-chip network. Designers need to set only a single parameter to explore the design space. We found designs with performance in between that of the center and diamond placements, a finding confirmed with synthetic simulations. The success of the piecewise-linear model shows that an algebraic queueing model is not necessary.

To Conclude Designing processors is getting harder, and we need better tools. We believe optimization-based models are suitable for pruning and explored a few of them in this thesis. More research is needed on what else might be beneficial, such as developing a multi-scale model and estimating error bounds.

Acknowledgements Prof Wood Committee members Fellow students and friends Family members

Bibliography [AEJK+] Dennis Abts, Natalie D. Enright Jerger, John Kim, Dan Gibson, and Mikko H. Lipasti, Achieving Predictable Performance through Better Memory Controller Placement in Many-Core CMPs, ISCA '09. [BBB+] Nathan Binkert, Bradford Beckmann, Gabriel Black, Steven K. Reinhardt, Ali Saidi, Arkaprava Basu, Joel Hestness, Derek R. Hower, Tushar Krishna, Somayeh Sardashti, Rathijit Sen, Korey Sewell, Muhammad Shoaib, Nilay Vaish, Mark D. Hill, and David A. Wood, The gem5 simulator, SIGARCH Comput. Archit. News 39 (2011), 1-7. [DGR+74] R.H. Dennard, F.H. Gaensslen, V.L. Rideout, E. Bassous, and A.R. LeBlanc, Design of ion-implanted MOSFET's with very small physical dimensions, Solid-State Circuits, IEEE Journal of 9 (1974), no. 5, 256-268. [JCSM96] Bruce L. Jacob, Peter M. Chen, Seth R. Silverman, and Trevor N. Mudge, An Analytical Model for Designing Memory Hierarchies, IEEE Trans. Comput. 45 (1996), no. 10, 1180-1194.

Bibliography [MBJ09] Naveen Muralimanohar, Rajeev Balasubramonian, and Norman P. Jouppi, Cacti 6.0: A tool to model large caches, HP Laboratories (2009). [Moo65] Gordon E. Moore, Cramming more components onto integrated circuits, Electronics 38 (1965), no. 8. [MVD] Asit K. Mishra, N. Vijaykrishnan, and Chita R. Das, A Case for Heterogeneous On-Chip Interconnects for CMPs, ISCA '11. [OLLC] Taecheol Oh, Hyunjin Lee, Kiyeon Lee, and Sangyeun Cho, An Analytical Model to Study Optimal Area Breakdown between Cores and Caches in a Chip Multiprocessor, ISVLSI '09, IEEE Computer Society, pp. 181-186. [PPG11] Pablo Prieto, Valentin Puente, and Jose-Angel Gregorio, Multilevel Cache Modeling for Chip-Multiprocessor Systems, IEEE Comput. Archit. Lett. 10 (2011), no. 2, 49-52. [SHK+] Guangyu Sun, Christopher J. Hughes, Changkyu Kim, Jishen Zhao, Cong Xu, Yuan Xie, and Yen-Kuang Chen, Moguls: A Model to Explore the Memory Hierarchy for Bandwidth Improvements, ISCA '11, pp. 377-388.

Classic CMOS Dennard Scaling: the Science behind Moore's Law. Scaling: Voltage V/a, Oxide t_OX/a. Results: Power/ckt 1/a², Power Density ~constant. Source: Future of Computing Performance: Game Over or Next Level?, National Academy Press, 2011; National Research Council (NRC) Computer Science and Telecommunications Board (CSTB.org), Mark D. Hill.

Post-classic CMOS Dennard Scaling (classic rule → post-Dennard rule). Scaling: Voltage V/a → V; Oxide t_OX/a. Results: Power/ckt 1/a² → 1; Power Density ~constant → a². Source: National Research Council (NRC) Computer Science and Telecommunications Board (CSTB.org), Mark D. Hill.

How We Linearize The Model The non-linear constraints are bilinear: each contains the product of a continuous bounded variable and a discrete (binary) variable.
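A standard linearization of such a product, shown generically (the thesis's exact reformulation may differ): for a binary variable y and a continuous variable x bounded by 0 ≤ x ≤ U, the product z = x·y can be replaced by four linear constraints.

```latex
z \le U\,y,\qquad z \le x,\qquad z \ge x - U\,(1 - y),\qquad z \ge 0,
\qquad y \in \{0,1\},\; 0 \le x \le U.
```

When y = 0 these force z = 0, and when y = 1 they force z = x, exactly reproducing the bilinear product.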

Benefits Of Using Optimization Scalable Theoretical performance bounds Flexible

Dynamic Programming Technique for solving optimization problems in which decisions are made over multiple stages. In each stage, a decision is made, some reward is given, and possibly new information about the system under consideration is provided. Aim: maximize the sum total of all the rewards received. Algorithm: step forward / backward in time.

Dynamic Programming: Drawback Often not practical due to the curse of dimensionality: an enormous number of stages, designs or policies, state variables, or scenarios.

How To Select Individual Cache Designs We need a tool (like Cacti) for estimating a design's performance on the chosen metrics: access latency, dynamic energy, static power, and chip area. Trillions of designs are possible as per Cacti, so we need to prune: we compute the maximal designs using the divide-and-conquer algorithm of [Ben80], with time complexity O(n log^(k-1) n) for n points in a k-dimensional space. About 0.07% of the designs produced by Cacti are maximal.

Variation in Cache Miss Rates Across Applications

Designing Shared Levels Of The Hierarchy Expected access time is computed using queueing theory; miss probabilities are computed from the equilibrium state of the shared caches; on-chip network performance is also computed using queueing theory.

Why Not Scalarize? Scalarize the objective by combining the metrics: min Σ_{i=1}^{p} λ_i f_i(x) subject to x ∈ X. It is not clear which λ_i to choose, each choice yields only one maximal design, and it is not possible to generate all maximal designs this way.

Scalarization (plot of Expected Access Latency vs. Expected Dynamic Energy, with infeasible, dominated, and maximal designs marked).

Example Design Space (plot of Expected Access Latency vs. Expected Dynamic Energy, with infeasible, dominated, and maximal designs marked).

Multi-Dimensional Divide and Conquer Algorithm for computing the Pareto frontier [Ben80]: divide the points into two halves, compute the Pareto frontier of each half, and combine the two frontiers into the complete frontier. Time complexity: O(n log^(k-1) n) for n points in a k-dimensional space. A simple two-dimensional instance is sketched below.
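A minimal two-dimensional instance of the idea, assuming smaller is better in both coordinates (as in the (latency, miss ratio) example earlier); the tie-handling is an illustrative simplification:

```python
def pareto_2d(points):
    """Divide-and-conquer computation of the 2-D maximal (Pareto) set,
    where smaller is better in both coordinates."""
    return _frontier(sorted(points))

def _frontier(pts):
    if len(pts) <= 1:
        return list(pts)
    mid = len(pts) // 2
    left = _frontier(pts[:mid])    # frontier of the half with smaller first coordinate
    right = _frontier(pts[mid:])   # frontier of the half with larger first coordinate
    # Combine: no left point can be dominated by a right point (its first coordinate
    # is no larger); a right point survives only if its second coordinate beats
    # every second coordinate on the left frontier. Exact duplicates are kept once.
    best_left_y = min(y for _, y in left)
    return left + [(x, y) for x, y in right if y < best_left_y]

# Level-1 candidates from the worked example: (latency in ns, global miss ratio).
print(pareto_2d([(4, 1/9), (4, 1/8), (2, 1/5), (2, 1/4), (1, 1/4)]))
# -> [(1, 0.25), (2, 0.2), (4, 0.111...)]: the three maximal Level-1 designs.
```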

Solution Design Distribution of virtual channels (plot).

Synthetic Evaluation