Gnosnay & His Silicon Pal

Troubleshooting Thread Leaks in Python

TL;DR

py-spy, grep, and logs saved the day: py-spy to identify the leak point, grep and logs to confirm the leak chain.

Background

Recently, as part of security hardening, we introduced a second-party lib. Then one day an alert came in about abnormal CPU usage together with an fd leak.

This second-party lib is used to proxy HTTP requests. Through this proxy, we can ensure requests are trusted.

Troubleshooting Process

1. Using py-spy to Identify the Leak Point

py-spy can be used not only for Python performance analysis but also for thread dumps. We use py-spy to identify the leak point.

py-spy dump --pid <pid>

With this command, you can see the stack of every thread in the target process. If a large number of threads share nearly identical stacks, the process is probably leaking threads, and the code at that repeated stack is the leak point.
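When the dump is long, counting identical stacks by hand gets tedious. Below is a minimal sketch that groups threads by stack, assuming the dump was saved as text (for example with `py-spy dump --pid <pid> > dump.txt`) and that each thread block starts with a line beginning with `Thread ` followed by indented frame lines. The sample dump, the `proxy_lib` path, and the function name `count_stacks` are made up for illustration.

```python
from collections import Counter

# Hypothetical dump text; the layout is an assumption about py-spy's output.
SAMPLE = """\
Process 4242: python app.py
Python v3.11.4 (/usr/bin/python3.11)

Thread 4242 (idle): "MainThread"
    serve (app.py:10)
Thread 4301 (idle): "Thread-1"
    _poll (proxy_lib/worker.py:88)
    run (threading.py:975)
Thread 4302 (idle): "Thread-2"
    _poll (proxy_lib/worker.py:88)
    run (threading.py:975)
"""


def count_stacks(dump_text: str) -> Counter:
    """Count how many threads share each distinct stack."""
    counts = Counter()
    frames = None  # frames of the thread block being read, None before the first one
    for line in dump_text.splitlines():
        if line.startswith("Thread "):
            # A new thread header closes the previous block.
            if frames is not None:
                counts[tuple(frames)] += 1
            frames = []
        elif frames is not None and line.startswith(" ") and line.strip():
            # Indented, non-empty lines are stack frames of the current thread.
            frames.append(line.strip())
    if frames is not None:
        counts[tuple(frames)] += 1
    return counts


# Print the stacks from most to least common; a leak shows up at the top.
for stack, n in count_stacks(SAMPLE).most_common():
    print(f"{n} thread(s):")
    for frame in stack:
        print("    " + frame)
```

In practice you would read `dump.txt` instead of `SAMPLE`. Taking two dumps a few minutes apart and comparing the top counts shows whether a stack is actually growing.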

2. Using grep and log to Confirm the Leak Chain

Python language servers (LSP) often cannot jump to the concrete implementation when a call goes through duck typing or a protocol. In that case, we can only rely on string matching to confirm the leak chain.

  1. Use grep to recursively search for trigger points within the library
  2. Use logs to verify the call chain hypothesis, for example by checking whether the logs expected along the suspected path actually appear
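For readers who prefer to stay in Python, the recursive search in step 1 can be sketched as below. It does roughly what `grep -rn` does for a fixed string, restricted to `.py` files. The function name `find_in_sources` and the search string in the usage note are illustrative, not taken from the actual library.

```python
from pathlib import Path


def find_in_sources(root, needle):
    """Return (path, line number, line) for every .py line under root containing needle."""
    hits = []
    for path in sorted(Path(root).rglob("*.py")):
        try:
            text = path.read_text(encoding="utf-8", errors="replace")
        except OSError:
            continue  # unreadable file: skip it rather than abort the search
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

A call such as `find_in_sources("path/to/lib", "threading.Thread(")` would list candidate places where threads are created. Each hit is then a hypothesis to check against the logs.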

Here are some tips:

  1. Usually there is a baseline version of the application that runs normally, and the leak only appears after a certain change. Focusing on the changed code and binary searching for the trigger point helps clarify the call chain
  2. grep is quite useful - recursive searching is faster and more accurate than you might expect
  3. You can approach from both the leak point and the trigger point, converging to find the common call chain

Others

This article is just an introduction. Later I will analyze how py-spy works in more depth.