Experience with jemalloc

Experience with jemalloc
Kit Chan (kichan@oath.com)

Problem – Difficult to debug memory leaks in ATS plugins
- Plugins are coded in C or C++, so memory leak bugs are easy to introduce
- Hard to debug in a large-scale production system
- A leak can take days or weeks to become noticeable
- Can't roll back
  - Don't know which change to roll back; multiple changes can be suspects
  - A critical feature cannot be rolled back

Options
- Valgrind
  - Typically slows the process down by 10 to 20x
- AddressSanitizer (ASAN)
  - Need to recompile
  - Can slow down by 2x

In the past, Valgrind was a very popular tool for debugging memory leaks. However, it typically slows the process down by 10 to 20x. In ATS there has been an effort to use ASAN to find memory leak problems, and a presentation at the ATS Summit in 2015 went into detail on how to use AddressSanitizer. One problem with ASAN is that the binary has to be recompiled. Even then, a 2x slowdown has been reported with ASAN, so it may still not be suitable for live debugging of a critical system. Finally, we can always set up monitoring of the ATS process memory usage and then trace back the changes that caused memory to grow over a period of time. However, as stated above, a lot of guesswork is still needed to pinpoint the actual root cause. So we need something more, and jemalloc comes to the rescue.
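The memory monitoring mentioned above can be as simple as sampling the resident set size of the process at a fixed interval. The following is a minimal sketch, assuming Linux with /proc available; the function names, log path, and interval are illustrative choices, not part of the original talk.

```sh
#!/bin/sh
# Print the resident set size (in kB) of a process, read from /proc (Linux only).
rss_kb() {
    awk '/^VmRSS:/ { print $2 }' "/proc/$1/status"
}

# Append "epoch-seconds rss-kB" lines for one pid until the process exits.
# $1 = pid, $2 = seconds between samples, $3 = log file (both defaults are hypothetical).
watch_rss() {
    pid=$1; interval=${2:-60}; log=${3:-/tmp/ats_rss.log}
    while [ -r "/proc/$pid/status" ]; do
        printf '%s %s\n' "$(date +%s)" "$(rss_kb "$pid")" >> "$log"
        sleep "$interval"
    done
}
```

It could be started with something like `watch_rss "$(pidof -s traffic_server)" 300 &`. A steadily growing second column shows that a leak exists, but not which code path is responsible, which is the gap profiling is meant to fill.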

Jemalloc for Memory Profiling
- Compile and install jemalloc (profiling requires a build configured with --enable-prof)
- Create a file (/usr/local/bin/start_ats.sh) with the following contents:

    #!/bin/sh
    MALLOC_CONF="prof:true,prof_prefix:/tmp/jeprof.out,lg_prof_interval:34,lg_prof_sample:20"
    LD_PRELOAD="/usr/local/lib/libjemalloc.so.2"
    export MALLOC_CONF
    export LD_PRELOAD
    /home/y/bin64/traffic_server "$@"

  - prof:true – profiling is on
  - prof_prefix – prefix of the dump files: /tmp/jeprof.out
  - lg_prof_sample:20 – interval between samples: 2^20 bytes = 1MB
  - lg_prof_interval:34 – interval between file dumps: 2^34 bytes = 16GB (the slide lists 2^32 = 4GB, which corresponds to lg_prof_interval:32)
- Update "proxy.config.proxy_binary" in records.config to point to the file above
- Other memory profiling options are available; see the jemalloc documentation
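Both lg_ options are base-2 logarithms of a byte count of allocation activity, so the values in the MALLOC_CONF string are easy to misread. The sketch below only does that arithmetic on the string from the script above; `lg_bytes` is a hypothetical helper name, not a jemalloc tool.

```sh
#!/bin/sh
# The option string from the start script above.
conf="prof:true,prof_prefix:/tmp/jeprof.out,lg_prof_interval:34,lg_prof_sample:20"

# Print 2^value for one lg_* option; prints nothing if the option is absent.
# $1 = comma-separated option list, $2 = option name
lg_bytes() {
    val=$(printf '%s\n' "$1" | tr ',' '\n' | sed -n "s/^$2://p")
    [ -n "$val" ] && echo $((1 << val))
}

lg_bytes "$conf" lg_prof_sample     # 1048576 bytes (2^20)
lg_bytes "$conf" lg_prof_interval   # 17179869184 bytes (2^34)
```

A smaller lg_prof_sample gives finer-grained attribution at a higher overhead, and a smaller lg_prof_interval produces dump files more often.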

Viewing the Results
- Sample usage:

    jeprof --show_bytes --gif /usr/local/bin/traffic_server /tmp/jeprof.out.32201.3730.i3730.heap > /tmp/32201.3730.gif

- Generates a GIF file containing the call graph of the program
- Other formats and options are supported
- Here is an example
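For leak hunting, it can help to subtract an early dump from a late one so that only the growth between the two remains. The sketch below uses jeprof's --base and --text options for that. It assumes the dump files follow the `<prefix>.<pid>.<seq>.i<seq>.heap` pattern seen in the sample file name above; `list_dumps` and `leak_diff` are hypothetical helper names.

```sh
#!/bin/sh
# List the dump files of one pid, ordered by dump sequence number.
# $1 = prof_prefix, $2 = pid
list_dumps() {
    prefix=$1; pid=$2
    for f in "$prefix.$pid".*.heap; do
        [ -e "$f" ] || continue
        seq=${f#"$prefix.$pid."}   # strip "<prefix>.<pid>."
        seq=${seq%%.*}             # keep only the sequence number
        printf '%s %s\n' "$seq" "$f"
    done | sort -n | cut -d' ' -f2-
}

# Show what grew between the first and the last dump, as a text report.
# $1 = binary, $2 = prof_prefix, $3 = pid
leak_diff() {
    binary=$1; prefix=$2; pid=$3
    first=$(list_dumps "$prefix" "$pid" | head -n 1)
    last=$(list_dumps "$prefix" "$pid" | tail -n 1)
    jeprof --show_bytes --text --base="$first" "$binary" "$last"
}
```

With the paths from the sample usage, this would be invoked as `leak_diff /usr/local/bin/traffic_server /tmp/jeprof.out 32201`. Call sites whose sampled bytes keep increasing between dumps are the leak candidates.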

Case Study #1
- ATS in front of multiple API origins
- The leak went on for several months
- It took about 2 weeks to become noticeable

Case Study #1

Case Study #2
- ATS in front of multiple origins, serving HTML and JS/CSS/image assets
- The leak took 12 hours to cause an OOM
- Multiple critical fixes went out at the same time

Case Study #2
- Our own Brotli plugin did not release the encoder instance correctly

Problem – ATS not scaling up on more Cores/Better CPU

Memory operations are the issue

Plugins (ESI) are the problem

Jemalloc is the solution
- CPU utilization can now be pushed to 90%+

Future
- Running it in production
- ATS 7.x allows us to turn off the freelist
- Tuning options, e.g. lg_dirty_mult, lg_chunk

Conclusion
- Jemalloc/jeprof – a good complementary tool for debugging memory leaks
- Improves scalability
- Tunable