Exploring Multi-Core on ATLAS@home
Wenjing Wu, on behalf of the ATLAS Collaboration
Computer Center, IHEP, wuwj@ihep.ac.cn
ISGC 2017, IHEP-CC

Outline
- Brief introduction to volunteer computing
- The ATLAS@home project
- Setting up multi-core on ATLAS@home
- Performance tests and tuning
- Summary

Scientists with a large amount of data to process, or a big computing task, set up an XX@home project; continuous jobs are sent to the BOINC server and distributed to volunteers.

Volunteers contribute to projects such as CAS@home, SETI@home, LHC@home, and ATLAS@home. Features of volunteer computing resources:
- Idle CPU resources on personal computers, harnessed by BOINC; computing tasks run at a lower priority
- Shared by multiple VC projects
- Most hosts have limited network bandwidth and hard disks
- Suitable for CPU-intensive, low-latency computing tasks

Middleware for VC
BOINC (Berkeley Open Infrastructure for Network Computing) is the most popular middleware for volunteer computing. It was first developed by the Space Sciences Laboratory at UC Berkeley for SETI@home in 1998, and was later rewritten as generic VC software.

The scale of BOINC volunteer computing (1)
- Around 50 VC projects: HEP, biology, cosmology, chemistry, physics, environment
- 260K active volunteers, 830K active volunteer computers
- Real-time computing power: 17 PetaFLOPS
- Successful individual projects: Einstein@home (819 TeraFLOPS), SETI@home (711 TeraFLOPS), LHC@home (17 TeraFLOPS), ATLAS@home (7.6 TeraFLOPS)
- Both LHC@home and ATLAS@home use virtualization, which limits the hosts available to them (hosts are not powerful enough, or volunteers have trouble installing VirtualBox)

ATLAS@home
Why ATLAS@home:
- Add significant computing power to ATLAS computing
- Increase publicity for the ATLAS experiment
- Find easy ways to harness Tier-3 resources
Goal: run ATLAS simulation jobs on volunteer computers.

Architecture

ATLAS@home in the whole of ATLAS computing
ATLAS@home (BOINC) is the biggest site for simulation; it accounts for 3% of the whole ATLAS computing power. ATLAS has ~150K CPU cores from all grid sites.

In one year, ATLAS@home completed 3.3M jobs (99% MC simulation, 1% MC reconstruction), consumed 43M CPU hours, processed 91M events and 154 TB of data, produced 94 TB of data, and ran 6K jobs/day, with a job failure rate of 11.84%.

Participating hosts and users keep growing, but the total amount of CPU remains flat.

ATLAS@home performance
The BOINC job (simulation) failure rate is 11.84%, while the ATLAS-wide average is 16.68%.

ATLAS@home Single->Multi Core (1)
Athena (the ATLAS software for production and analysis) already supports multi-core processing (AthenaMP), and most grid sites provide multi-core queues. For the same number of CPU cores, AthenaMP can significantly reduce memory usage (savings of up to 50%). Of the CPU consumed in one month (2017.2) across all ATLAS sites, single-core jobs account for only 38%.

ATLAS@home Single->Multi Core (2)
Many volunteer hosts have more than one available core (available cores = the cores volunteers configure BOINC to use); the majority have 4 or 8 cores. The single-core ATLAS@home app spawns a virtual machine for each core on a host, which is inefficient when multiple cores are available:
- Memory: it requires more memory, 2.3 GB RAM per core (per virtual machine)
- Network: each virtual machine needs to download its own software and cache
- Hard disk: each virtual machine needs 10 GB of space (the VM image plus extra disk space for the VM)

BOINC support for multi-core
A plan class defines tags, and each tag includes different attributes: a range of CPU numbers and a memory size. A tag is assigned to jobs and defines the jobs' resource requirements. Clients report their available resources and request jobs; the scheduler matches the job requirements against the resources available on the clients. However, the stock BOINC multi-core scheduler supports only a fixed amount of memory regardless of the number of CPU cores, which is not suitable for ATLAS jobs, whose memory usage is proportional to the number of cores.
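The matching step described above can be sketched as follows. This is an illustrative Python sketch, not the actual BOINC scheduler code; the function and dictionary keys (`match_job`, `min_cores`, `max_cores`, `memory_gb`, `avail_cores`, `avail_mem_gb`) are hypothetical names.

```python
# Illustrative sketch of scheduler matching: a job's plan-class tag defines
# a CPU range and a fixed memory requirement; the client reports available
# resources. (Not the real BOINC scheduler code.)

def match_job(job, client):
    """Return the number of cores to grant, or None if no match."""
    # Grant as many cores as the job allows and the client has.
    ncores = min(job["max_cores"], client["avail_cores"])
    if ncores < job["min_cores"]:
        return None          # client has too few available cores
    if job["memory_gb"] > client["avail_mem_gb"]:
        return None          # fixed memory requirement not met
    return ncores

# A client with 8 available cores and 16 GB RAM requesting work:
client = {"avail_cores": 8, "avail_mem_gb": 16.0}
job = {"min_cores": 1, "max_cores": 12, "memory_gb": 9.3}
print(match_job(job, client))  # grants 8 cores
```

Note how the memory check is a single fixed threshold: this is exactly the limitation the slide points out, since an ATLAS job granted fewer cores would actually need less memory.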

Customized multi-core scheduler for ATLAS@home
Dynamic memory allocation: memory usage follows the formula C + D*N1, where C is the base memory size, D is the memory size per core, and N1 is the number of cores. Both the number of cores (N1) and the memory usage (M1) are calculated from the job's resource requirements and the resources available on the client. For the ATLAS multi-core app: core range 1~12; C = 1.3 GB, D = 1 GB; so M1 = 1.3 + 1*N1, where M1 is the memory in GB.
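The formula above can be written out as a small sketch, comparing the multi-core footprint M1 = 1.3 + 1*N1 against N1 single-core VMs at 2.3 GB each (the figure from the earlier slide). Python is used for illustration; the helper names are hypothetical.

```python
# Memory model from the slides: multi-core VM vs. one single-core VM per core.
C_GB = 1.3   # base memory of a multi-core VM
D_GB = 1.0   # additional memory per core

def multicore_mem_gb(ncores):
    """M1 = C + D * N1, the customized ATLAS@home formula."""
    return C_GB + D_GB * ncores

def singlecore_mem_gb(ncores):
    """One 2.3 GB virtual machine per core, as in the single-core app."""
    return 2.3 * ncores

for n in (1, 4, 8):
    m, s = multicore_mem_gb(n), singlecore_mem_gb(n)
    print(f"{n} cores: multi {m:.1f} GB vs single {s:.1f} GB "
          f"({100 * (1 - m / s):.0f}% saved)")
```

At 8 cores this gives 9.3 GB instead of 18.4 GB, i.e. roughly the "up to 50%" memory saving quoted in the summary.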

Memory usage improvement from multi-core

Performance issues with different numbers of cores
From both local test machines and the ATLAS job dashboard, we found that some virtual machines with a large number of cores perform very badly. ATLAS@home has standardized jobs (simulation jobs, 100 events/job), so performance can be measured in CPU seconds/event.

Tests on given hosts
Goal: compare HT vs. non-HT, and different versions of VirtualBox.
- Host 1: 2*8*2 cores (2 physical CPUs, each with 8 cores, Hyper-Threading enabled)
- Host 2: 2*8 cores (2 physical CPUs, each with 8 cores, Hyper-Threading disabled)
- VirtualBox 4.2 vs. 5.1
Method: test CPU performance with different numbers of cores (4, 6, 8, 10, 12), run ~20 jobs for each configuration, and calculate the average CPU time. To avoid competition, run only one VM on each host.
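The measurement described above reduces to a simple average. A minimal Python sketch, where the per-job CPU times are made-up placeholders (not measured data) and `cpu_sec_per_event` is a hypothetical helper:

```python
# Compute the performance metric used throughout these slides:
# average CPU seconds per event over ~20 jobs of one configuration.

EVENTS_PER_JOB = 100  # standard ATLAS@home simulation job size

def cpu_sec_per_event(job_cpu_seconds):
    """Average CPU seconds/event over a list of per-job CPU times."""
    total_cpu = sum(job_cpu_seconds)
    total_events = EVENTS_PER_JOB * len(job_cpu_seconds)
    return total_cpu / total_events

# e.g. 20 jobs that each took about 22,000 CPU seconds:
jobs = [22_000 + 100 * i for i in range(20)]
print(round(cpu_sec_per_event(jobs), 1))  # 229.5
```

One such average is computed per core-count configuration and per host, which is what the result tables on the following slides report.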

Test result
The CPU of Host 1 is more powerful than that of Host 2. Conclusions:
- Older versions (<5.0.0) of VirtualBox have really bad performance
- HT has no significant impact on performance
- In both cases, performance starts to degrade sharply once the number of cores goes beyond 8

Analysis from the ATLAS job dashboard (1)
Performance of the BOINC multi-core site (BOINC_MCORE), over a one-month period:

Core No | Event No | CPU time | CPU sec/event
   1    |   0.07   |   22.8   |     326
   2    |   0.06   |   16.1   |     268
   3    |   0.14   |   40     |     286
   4    |   1.5    |   332    |     221
   5    |   0.23   |   58.6   |     255
   6    |   0.27   |   106.6  |     395
   7    |   0.11   |   44     |     400
   8    |   0.46   |   205    |     446
   9    |   0.01   |   3.5    |     350
  10    |   0.02   |   11.7   |     585
  11    |          |   2.8    |     280
  12    |   0.44   |   373    |     848

Performance is OK up to 8 cores. For 9 and 11 cores the event numbers are small and may not be representative.

Analysis from the job dashboard (2)
Performance of different resources (grid, cloud, BOINC multi-core, and the whole BOINC site), over a one-month period. boinc_mcore contributes about 8% of the CPU of cloud resources.

grid:
Core No | Event No | CPU time | CPU sec/event
   1    |  31.32   |   8458   |     270
   4    |  41.15   |   8527   |     207
   6    |  13.17   |   3224   |     245
   8    |  384     |  106179  |     277
  12    |   2.39   |   297    |     124

cloud:
   1    |   4.75   |   2084   |     439
   8    |  27.92   |  12114   |     434

boinc_mcore:
   2    |   0.06   |   15.9   |     265
   3    |   0.14   |   39.7   |     284
   4    |   1.49   |   331    |     222
   5    |   0.23   |   58.6   |     255
   6    |   0.27   |   106    |     393
   7    |   0.11   |   43.8   |     398
   9    |   0.01   |   3.5    |     350
  10    |   0.02   |   11.7   |     585
  11    |          |   2.8    |     280
  12    |   0.43   |   372    |     865

BOINC (whole site): same numbers as on the previous slide.

Analysis from the job dashboard (3)
Performance for the most commonly used core counts (1, 4, 8, 12), over a one-month period, for different resources (grid, cloud, HPC). Each cell lists Event No / CPU time / CPU sec/event:

Core No | grid                  | cloud                | hpc
   1    | 31.89 / 8502 / 267    | 4.75 / 2085 / 439    | 0.88 / 202 / 230
   4    | 41.17 / 8531 / 207    | 2.17 / 655.5 / 302   | 1.49 / 331 / 222
   8    | 384.62 / 106330 / 276 | 27.92 / 12114 / 434  | 7.9 / 1829 / 232
  12    | 2.4 / 298.4 / 124     | 0.43 / 372 / 865     | 4.26 / 1831 / 430

Conclusions
- CPU performance is very stable for grid sites regardless of the number of cores; it is even more CPU-efficient with more cores. Most grid sites have 8 cores, some have 12: the number of cores at grid sites is pre-configured, normally to 8 or 12, according to the actual number of cores in the physical CPU.
- For cloud sites and HPC sites, as for BOINC, CPU performance starts to degrade with more than 8 cores. We do not yet understand the HPC sites. The cloud sites' behavior is due to the number of cores in the physical CPU: most physical CPUs have 8 cores, so using more than 8 cores spans physical CPUs, resulting in bad performance.

Solution for ATLAS multi-core
Reduce the maximum core number from 12 to 8 on the server side (core range: 1~8). Hosts whose physical CPUs have more than 8 cores (a minority) can still be configured on the client side to use more than 8 cores.

Multi-core vs. single core (1)
The multi-core application was officially released in July 2017, and single core and multi-core coexisted until December 2017. Volunteers prefer the multi-core app and gradually migrated to it; they can still run single-core ATLAS jobs with the multi-core app. (Plot: events processed in a year, by number of cores, single core vs. multi-core.)

Multi-core vs. single core (2)
By wall clock, switching to multi-core slightly increased the contributed CPU; however, the number of processed events decreased slightly. Single-core jobs: 25 events/job; multi-core jobs: 100 events/job. (Plot: wall-clock consumption in a year, by number of cores, single core vs. multi-core.)

Multi-core numbers breakdown
Among multi-core jobs, 30% use 8 cores and 30% use 4 cores. (Plot: events processed in a year, by number of cores.)

Multi-core doesn't improve CPU efficiency
At the BOINC site, CPU efficiency is 0.79 for single core vs. 0.66 for multi-core. (Plot: CPU efficiency by number of cores, for all ATLAS computing resources.)

Summary
- ATLAS@home is a pioneering volunteer computing project in the HEP field; it has been providing ~3% of the whole of ATLAS computing power and performs well
- Switching from single core to multi-core significantly reduced memory usage on the volunteer hosts (by up to 50%); volunteers are very happy with this feature
- Multi-core is less CPU-efficient, but more memory-efficient
- ATLAS@home multi-core makes it possible to exploit more of the available cores on volunteer hosts by reducing memory, hard disk, and network usage