Published by Jack Owen. Modified over 8 years ago.
1
C++11 and three compilers: performance considerations
CERN openlab Summer Students Lightning Talks Sessions
Supervised by Pawel Szostek
Stephen Wang › 19/08/2014
2
Motivation
› “Surprisingly, C++11 feels like a new language” (Stroustrup).
› Improvements: language usability, multithreading, and more.
› How about performance?
› Four questions: time measurement methods, for-loop efficiency, std::async, STL algorithms in parallel mode
› Compilers: GCC 4.9.0, ICC 15.0.0, Clang+LLVM 3.4
› Optimization levels: -O2, -O3, -Ofast
3
Contribution
› Four micro-benchmarks, one per feature.
› Code: https://github.com/wangyichao/CPP11_Benchmarks
› Automate the process and make results repeatable.
› Python and bash scripts are used.
› Compile -> Run -> Table (3 compilers * 3 options * containers * algorithms …)
› Use profiling tools to check performance aspects such as vectorization and threads.
› perf, Intel VTune, LIKWID, even the PMU
› Answer basic questions.
4
Conclusions
auto start = std::chrono::steady_clock::now();
auto end = std::chrono::steady_clock::now();
int elapsed = std::chrono::duration_cast<…>(end - start).count();
overhead.push_back(elapsed);
[Figure: RDTSCP vs. std::chrono]
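The overhead measurement on this slide can be sketched as a small function that times two back-to-back clock reads. This is only an illustration: the function name `measure_clock_overhead` is hypothetical, and the nanosecond unit is an assumption, since the slide does not preserve the `duration_cast` target.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// Records the apparent cost of two consecutive steady_clock::now() calls.
// Assumption: nanoseconds as the unit; the original slide lost this detail.
std::vector<long long> measure_clock_overhead(std::size_t samples) {
    assert(samples > 0);
    std::vector<long long> overhead;
    overhead.reserve(samples);
    for (std::size_t i = 0; i < samples; ++i) {
        auto start = std::chrono::steady_clock::now();
        auto end = std::chrono::steady_clock::now();
        // Nothing happens between the two reads, so the difference
        // approximates the cost of the clock call itself.
        overhead.push_back(
            std::chrono::duration_cast<std::chrono::nanoseconds>(end - start).count());
    }
    return overhead;
}
```

Because steady_clock is monotonic, no sample should be negative. Looking at the whole distribution rather than a single sample gives a fairer picture of the overhead.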
5
Conclusions
› std::chrono (C++11) is a reliable way to measure time.
auto start = std::chrono::steady_clock::now();
microseconds_sleep(index);
auto end = std::chrono::steady_clock::now();
int elapsed = std::chrono::duration_cast<…>(end - start).count();
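One way to check a clock against a known delay, in the spirit of this slide, is to sleep for a requested time and compare it with what the clock reports. In the sketch below, `timed_sleep_us` is a hypothetical name and `std::this_thread::sleep_for` stands in for the slide's `microseconds_sleep`, whose implementation is not shown.

```cpp
#include <chrono>
#include <thread>

// Sleeps for requested_us microseconds and returns the elapsed time that
// steady_clock reports, in microseconds.
// Assumption: sleep_for replaces the slide's microseconds_sleep helper.
long long timed_sleep_us(long long requested_us) {
    auto start = std::chrono::steady_clock::now();
    std::this_thread::sleep_for(std::chrono::microseconds(requested_us));
    auto end = std::chrono::steady_clock::now();
    return std::chrono::duration_cast<std::chrono::microseconds>(end - start).count();
}
```

The measured value is expected to be at least the requested time, with the excess reflecting scheduler latency plus the clock overhead.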
6
Conclusions
› Performance of a for loop varies with the iteration method and the container.
› Which containers are vectorized?
Table (range-based for loop): columns array, std::array, std::list, std::set, std::vector; rows GCC, ICC, Clang.
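The iteration methods being compared could look like the following sketch. The function names are hypothetical and the summation is only a placeholder workload, not necessarily what the benchmark repository uses.

```cpp
#include <array>
#include <cstddef>
#include <list>
#include <set>
#include <vector>

// Range-based for: works for any container with begin()/end() and for C arrays.
template <typename Container>
double sum_range_for(const Container& c) {
    double sum = 0.0;
    for (const auto& x : c) sum += x;
    return sum;
}

// Explicit iterators: the same traversal written by hand.
template <typename Container>
double sum_iterators(const Container& c) {
    double sum = 0.0;
    for (auto it = c.begin(); it != c.end(); ++it) sum += *it;
    return sum;
}

// Index-based: only valid for random-access containers
// (std::vector, std::array), not for std::list or std::set.
template <typename Container>
double sum_index(const Container& c) {
    double sum = 0.0;
    for (std::size_t i = 0; i < c.size(); ++i) sum += c[i];
    return sum;
}
```

Contiguous containers (arrays, std::array, std::vector) are the usual candidates for auto-vectorization, while node-based containers such as std::list and std::set generally are not. Whether a floating-point reduction like this one is vectorized can also depend on the optimization level, for example -Ofast versus -O2.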
7
Conclusions
› std::async spawns threads when called, and the computation starts immediately (libstdc++ from GCC).
double sum = 0;
auto handle1 = std::async(std::launch::async, fsum, v.begin(), v.begin()+distance);
auto handle2 = std::async(…);
…
sum = handle1.get()+handle2.get()+handle3.get()+handle4.get();
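The fragment on this slide can be expanded into a complete chunked sum. This is a sketch under assumptions: `fsum` is written here as a plain range sum because the slide does not show its body, and `parallel_sum` is a hypothetical wrapper that generalizes the four handles to any number of parts.

```cpp
#include <cassert>
#include <cstddef>
#include <future>
#include <numeric>
#include <vector>

using Iter = std::vector<double>::const_iterator;

// Assumed body for the slide's fsum: sums the range [first, last).
double fsum(Iter first, Iter last) {
    return std::accumulate(first, last, 0.0);
}

// Splits v into `parts` chunks and sums each chunk in its own task.
double parallel_sum(const std::vector<double>& v, std::size_t parts) {
    assert(parts > 0);
    const auto distance = static_cast<std::ptrdiff_t>(v.size() / parts);
    std::vector<std::future<double>> handles;
    handles.reserve(parts);
    for (std::size_t i = 0; i < parts; ++i) {
        Iter first = v.begin() + static_cast<std::ptrdiff_t>(i) * distance;
        // The last chunk also takes the remainder of the division.
        Iter last = (i + 1 == parts) ? v.end() : first + distance;
        // std::launch::async requests a new thread instead of deferred execution.
        handles.push_back(std::async(std::launch::async, fsum, first, last));
    }
    double sum = 0.0;
    for (auto& h : handles) sum += h.get();  // get() blocks until the task finishes
    return sum;
}
```

With GCC this typically needs `-std=c++11 -pthread`. Passing `std::launch::async` explicitly matters: with the default policy the implementation is allowed to defer the work until `get()` is called.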
8
Conclusions
› GNU libstdc++ parallel mode speeds up some STL algorithms; the gain varies with the container.
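In libstdc++ parallel mode the source code can stay standard C++; the switch happens at compile time. The sketch below uses only standard calls, and the helper names `sorted_median` and `total` are hypothetical. Building it with GCC using `-D_GLIBCXX_PARALLEL -fopenmp` should let eligible algorithms such as std::sort and std::accumulate dispatch to their parallel implementations; without those flags the same code runs sequentially.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// Sorts a copy of the input and returns the element at the middle index.
// std::sort is one of the algorithms covered by libstdc++ parallel mode.
double sorted_median(std::vector<double> values) {
    assert(!values.empty());
    std::sort(values.begin(), values.end());
    return values[values.size() / 2];
}

// Sums all elements; std::accumulate is also covered by parallel mode.
double total(const std::vector<double>& values) {
    return std::accumulate(values.begin(), values.end(), 0.0);
}
```

Parallel mode generally needs random-access iterators to split the work, which is one plausible reason the speedup differs between std::vector and node-based containers. libstdc++ also offers explicit `__gnu_parallel::` variants through `<parallel/algorithm>` for cases where only selected calls should run in parallel.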
9
› Thanks