Published by Victoria Powers. Modified over 8 years ago.
2
COSA f2f - 18/05/2015, Lucia Morganti – INFN-CNAF
Outline: A few notes on testing; Preliminary tests; Computed Tomography; Astrophysics; Molecular Dynamics; HS
3
To avoid dynamic CPU frequency scaling and to maximize CPU performance: use cpufrequtils, e.g. cpufreq-set --governor performance (https://wiki.debian.org/HowTo/CpuFrequencyScaling).
To control GPU performance (Jetson): set a supported rate with echo 852000000 > /sys/kernel/debug/clock/override.gbus/rate (https://devtalk.nvidia.com/default/topic/747641/anyone-know-how-to-monitor-the-gpu-mhz-/).
CPUs come online only when the scheduler judges there is enough work to do. To manually force the 4 main CPU cores (Jetson) to run, see http://elinux.org/Jetson/Performance.
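Before a benchmark run it can help to confirm that the governor setting actually took effect on every online core. The following is a minimal sketch, not part of the original slides: it assumes the standard Linux cpufreq sysfs layout (`cpuN/cpufreq/scaling_governor`), and the function names `read_governors` and `all_performance` are hypothetical. Cores that are offline may not expose a cpufreq directory and are simply skipped.

```python
from pathlib import Path


def read_governors(sysfs_cpu_root="/sys/devices/system/cpu"):
    """Return {cpu_name: governor} for every CPU that exposes a cpufreq governor.

    Assumes the usual Linux sysfs layout; CPUs without a cpufreq
    directory (e.g. offline cores) are not listed.
    """
    governors = {}
    pattern = "cpu[0-9]*/cpufreq/scaling_governor"
    for gov_file in sorted(Path(sysfs_cpu_root).glob(pattern)):
        cpu_name = gov_file.parent.parent.name  # e.g. "cpu0"
        governors[cpu_name] = gov_file.read_text().strip()
    return governors


def all_performance(governors):
    """True only if at least one CPU was found and all use 'performance'."""
    return bool(governors) and all(g == "performance" for g in governors.values())
```

Calling `all_performance(read_governors())` on the board returns False if any listed core is still on a scaling governor such as ondemand, or if no core reports a governor at all.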
4
We compute power consumption by measuring the electrical current drawn and multiplying it by the known supply voltage. We do not subtract the power consumption of the idle board.
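The measurement described above can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' tooling: it assumes a constant, known supply voltage and current samples taken at a fixed interval, and the function names are hypothetical. As on the slide, no idle baseline is subtracted.

```python
def power_watts(current_amps, voltage_volts):
    """Instantaneous power P = I * V (idle board power is NOT subtracted)."""
    return current_amps * voltage_volts


def energy_wh(power_samples_w, sample_interval_s):
    """Energy in watt-hours from equally spaced power samples.

    Uses a simple rectangle rule: each sample is assumed to hold
    for one full sampling interval.
    """
    joules = sum(power_samples_w) * sample_interval_s
    return joules / 3600.0  # 1 Wh = 3600 J
```

For example, with a hypothetical 12 V supply and a measured 0.9 A, `power_watts(0.9, 12.0)` gives about 10.8 W; the voltage and current here are made-up values chosen only for illustration.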
6
CPU only (figure annotation: x 20)
7
CPU only (figure annotation: x 4.6)
8
CPU & GPU: fftw3 on CPUs, cuFFT on GPUs (lower is better)
9
CPU & GPU: fftw3 on CPUs, cuFFT on GPUs (higher is better)
10
Master's thesis of Elena Corni: Implementation of the Filtered Back-Projection (FBP) algorithm for low-power System-on-Chip architectures (original title: Implementazione dell’algoritmo Filtered Back-Projection (FBP) per architetture Low-Power di tipo Systems-On-Chip)
11
CPU & GPU: 1024 x 1024 pixel (lower is better)
12
1024 x 1024 pixel (lower is better)
On 2 x E5-2620 + K20: 3241 slices in 1 h, 350 Wh
On 5 x Jetson-K1: 3840 slices in 1 h, 41 Wh
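The two figures are easier to compare as energy per reconstructed slice. The short calculation below uses only the numbers quoted on the slide; the function and variable names are hypothetical.

```python
def wh_per_slice(energy_wh, slices):
    """Energy spent per reconstructed slice, in watt-hours."""
    return energy_wh / slices


# Figures quoted on the slide, both for one hour of running.
xeon_k20 = wh_per_slice(350.0, 3241)   # 2 x E5-2620 + K20
jetsons = wh_per_slice(41.0, 3840)     # 5 x Jetson-K1

ratio = xeon_k20 / jetsons
print(f"Xeon+K20: {xeon_k20:.4f} Wh/slice")
print(f"5 x Jetson-K1: {jetsons:.4f} Wh/slice")
print(f"ratio: {ratio:.1f}")
```

With these inputs the Xeon+K20 node spends about 0.108 Wh per slice and the five Jetson boards about 0.0107 Wh per slice, so the Jetson setup uses roughly ten times less energy per slice while also reconstructing more slices in the hour.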
13
Gadget-2: MPI code for cosmological simulations by Volker Springel (http://www.mpa-garching.mpg.de/gadget/)
14
CPU only: 60000 disk + halo particles, Barnes-Hut N-body simulation
Jetson-K1: 10.9 W; Xeon: ~220 W
16
CPU only: 276498 particles, formation of a cluster of galaxies in an expanding Universe
18
CPU only: 32^3 dark matter particles and 32^3 gas particles in a box of 50 Mpc size, following structure formation in an evolving Universe, with adiabatic gas physics
Figure panels: gas particles; gas particles with T > 10^6 K
20
CPU only: 128^3 particles in a cube of 250 Mpc side, 13.6 Gyr
Jetson-K1 is about 2.6X slower than the Xeon using 4 CPU cores, and about 4.4X slower using 8 CPU cores
Jetson-K1: 12.1 W; Xeon: ~220 W
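Slowdown and power draw can be combined into a rough energy-to-solution comparison, since energy is power times run time. The sketch below is an estimate under assumptions the slide does not confirm: that the quoted wattages are average draw over the whole run, and that the ~220 W Xeon figure applies to both the 4-core and the 8-core case. The function name `relative_energy` is hypothetical.

```python
def relative_energy(power_slow_w, power_fast_w, slowdown):
    """Energy of the slower platform divided by energy of the faster one.

    E = P * t, and the slower platform runs for slowdown * t_fast,
    so the run time of the faster platform cancels out.
    Assumes both wattages are averages over the full run.
    """
    return (power_slow_w * slowdown) / power_fast_w


# Figures quoted on the slide (Xeon wattage assumed equal in both cases).
vs_4_cores = relative_energy(12.1, 220.0, 2.6)
vs_8_cores = relative_energy(12.1, 220.0, 4.4)
print(f"Jetson energy relative to Xeon, 4 cores: {vs_4_cores:.2f}")
print(f"Jetson energy relative to Xeon, 8 cores: {vs_8_cores:.2f}")
```

Under these assumptions the Jetson-K1 needs about 0.14 of the Xeon's energy in the 4-core comparison and about 0.24 in the 8-core comparison, despite the longer run time.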
23
CPU & GPU (lower is better); plot labels: 1 Jetson MPI, 2 Jetson MPI, 1 Jetson CUDA
Jetson-K1 is about 10X slower using the same number of CPU cores
Jetson-K1 is about 10X slower using the GPU (vs. an NVIDIA K20)
Jetson-K1: 13.5 W; Xeon+K20: ~320 W
25
HS06 on Exynos5, TegraK1 and Atom C2750, per core loaded (data from M. Michelotto at HEPIX 2014, https://indico.cern.ch/event/320819/session/3/contribution/30/material/slides/0.pptx)
26
Tests were ok on the Jetson, for 4 cores.
NFS doesn't work, eMMC works.
For 8-core boards (Odroid & Cubieboard), 1.5 GB of RAM is not enough for 8 processes.
So, for the Odroid, we put swap on eMMC, which is faster than SSD.
Now testing the Cubieboard; it seems ok.