This article describes how to use SystemTap to troubleshoot pod issues.
The installation method for SystemTap and its dependencies differs by operating system. Follow the instructions below for Ubuntu or CentOS, whichever matches your node.
On Ubuntu, run the following command to install SystemTap:
apt install -y systemtap
Run the following command to check for dependencies:
stap-prep
The following is a sample result:
Please install linux-headers-4.4.0-104-generic
You need package linux-image-4.4.0-104-generic-dbgsym but it does not seem to be available
Ubuntu -dbgsym packages are typically in a separate repository
Follow https://wiki.ubuntu.com/DebuggingProgramCrash to add this repository
apt install -y linux-headers-4.4.0-104-generic
The above result shows that the dbgsym package is required but is not available from the configured sources. Run the following commands to add the Ubuntu debug symbol repository (ddebs.ubuntu.com):
sudo apt-key adv --keyserver keyserver.ubuntu.com --recv-keys C8CAB6595FDFF622
codename=$(lsb_release -c | awk '{print $2}')
sudo tee /etc/apt/sources.list.d/ddebs.list << EOF
deb http://ddebs.ubuntu.com/ ${codename} main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-security main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-updates main restricted universe multiverse
deb http://ddebs.ubuntu.com/ ${codename}-proposed main restricted universe multiverse
EOF
sudo apt-get update
Run the following command after adding the source:
stap-prep
The following is a sample result:
Please install linux-headers-4.4.0-104-generic
Please install linux-image-4.4.0-104-generic-dbgsym
apt install -y linux-image-4.4.0-104-generic-dbgsym
apt install -y linux-headers-4.4.0-104-generic
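The package names reported by stap-prep follow the release of the running kernel, so they can be derived instead of typed by hand. The following is a minimal sketch, assuming an Ubuntu node that uses the same package naming as the sample output above; the helper name stap_kernel_packages is hypothetical and not part of SystemTap.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of SystemTap): print the header and
# debug-symbol package names for a kernel release.
# Assumption: package names follow the pattern shown in the sample output.
stap_kernel_packages() {
    # Default to the running kernel when no release is given.
    local release="${1:-$(uname -r)}"
    printf 'linux-headers-%s\n' "$release"
    printf 'linux-image-%s-dbgsym\n' "$release"
}

# Example: the kernel release used in this article.
stap_kernel_packages 4.4.0-104-generic
```

On the node itself, the output could be passed to the package manager, for example with `stap_kernel_packages | xargs sudo apt install -y`.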
On CentOS, run the following command to install SystemTap:
yum install -y systemtap
This article assumes that the debuginfo repository has not been added. Add the following content to /etc/yum.repos.d/CentOS-Debug.repo and save the file:
[debuginfo]
name=CentOS-$releasever - DebugInfo
baseurl=http://debuginfo.centos.org/$releasever/$basearch/
gpgcheck=0
enabled=1
protect=1
priority=1
Run the following command to check for dependencies and install them:
Note: The following command installs kernel-debuginfo.
stap-prep
Run the following command to check whether the node has multiple versions of kernel-devel installed:
rpm -qa | grep kernel-devel
The returned result is as follows:
kernel-devel-3.10.0-327.el7.x86_64
kernel-devel-3.10.0-514.26.2.el7.x86_64
kernel-devel-3.10.0-862.9.1.el7.x86_64
If there are multiple versions, keep only the one that matches the running kernel version. For example, if the current kernel version is 3.10.0-862.9.1.el7.x86_64, remove all versions except kernel-devel-3.10.0-862.9.1.el7.x86_64.
Note:
- You can use uname -r to view the kernel version.
- Make sure kernel-debuginfo and kernel-devel are both installed and that their versions match the kernel version.
Run the following command to remove the versions that do not match:
rpm -e kernel-devel-3.10.0-327.el7.x86_64 kernel-devel-3.10.0-514.26.2.el7.x86_64
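Selecting the packages to remove can also be scripted. The following is a minimal sketch, assuming the package names have the form kernel-devel-<kernel release> as in the listing above; the helper name stale_kernel_devel is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read package names on stdin and print every
# kernel-devel package that does NOT match the given kernel release.
stale_kernel_devel() {
    local release="$1"
    # Keep kernel-devel lines, then drop the exact match for this release.
    # "|| true" keeps the exit status at 0 when nothing is stale.
    grep '^kernel-devel-' | grep -vxF "kernel-devel-${release}" || true
}

# Example with the package list from this article.
printf '%s\n' \
    kernel-devel-3.10.0-327.el7.x86_64 \
    kernel-devel-3.10.0-514.26.2.el7.x86_64 \
    kernel-devel-3.10.0-862.9.1.el7.x86_64 |
    stale_kernel_devel 3.10.0-862.9.1.el7.x86_64
```

On the node itself, the input would come from `rpm -qa` and the release from `uname -r`. Review the printed list before passing it to `rpm -e`.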
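You can use SystemTap to monitor a process in order to troubleshoot pod issues. This is how it works: SystemTap translates the script into C code, compiles it into a kernel module, and uses modprobe to load the module into the kernel.
Step 1: Run the following command to obtain the ID of the restarted container and its last exit code:
kubectl describe pod <pod name>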
The returned result is as follows:
......
Container ID: docker://5fb8adf9ee62afc6d3f6f3d9590041818750b392dff015d7091eaaf99cf1c945
......
Last State: Terminated
Reason: Error
Exit Code: 137
Started: Thu, 05 Sep 2019 19:22:30 +0800
Finished: Thu, 05 Sep 2019 19:33:44 +0800
Step 2: Run the following command to obtain the pid of the container's main process:
docker inspect -f "{{.State.Pid}}" 5fb8adf9ee62afc6d3f6f3d9590041818750b392dff015d7091eaaf99cf1c945
The returned result is as follows:
7942
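The Container ID printed by kubectl carries a runtime prefix (docker://), which has to be removed before the ID is passed to docker inspect. The following is a minimal sketch of that step; the helper name strip_runtime_prefix is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: remove the "<runtime>://" prefix from a container ID
# as reported by kubectl.
strip_runtime_prefix() {
    # ${1#*://} deletes the shortest leading match of "*://".
    printf '%s\n' "${1#*://}"
}

# Example with the container ID from this article.
strip_runtime_prefix "docker://5fb8adf9ee62afc6d3f6f3d9590041818750b392dff015d7091eaaf99cf1c945"
```

The result can then be used directly, for example `docker inspect -f "{{.State.Pid}}" "$(strip_runtime_prefix "$id")"`, where `$id` is assumed to hold the value copied from the kubectl output.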
Use the Exit Code in the result of Step 1 to obtain the status code of the last container exit. This article uses 137 as an example. The analysis is as follows: an exit code greater than 128 means that the process was terminated by a signal, and 137 - 128 = 9 corresponds to SIGKILL. However, this alone does not reveal why the process was killed. Assuming the issue is reproducible, you can use SystemTap to troubleshoot the problem.
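The exit code arithmetic can be checked in the shell. The following is a minimal sketch, assuming the common convention that an exit code above 128 equals 128 plus the signal number; the helper name decode_exit_code is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: interpret a container exit code.
# Assumption: codes in the range 129-255 mean "terminated by signal (code - 128)".
decode_exit_code() {
    local code="$1"
    if [ "$code" -gt 128 ] && [ "$code" -le 255 ]; then
        # "kill -l N" prints the signal name without the SIG prefix.
        printf 'SIG%s\n' "$(kill -l "$((code - 128))")"
    else
        printf 'exit status %s (not a signal)\n' "$code"
    fi
}

decode_exit_code 137   # prints SIGKILL
```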
Create a file named sg.stp, add the following content, and save it:
global target_pid = 7942
# Fires whenever a process sends a signal.
probe signal.send {
    # Report only signals aimed at the monitored container process.
    if (sig_pid == target_pid) {
        printf("%s(%d) send %s to %s(%d)\n", execname(), pid(), sig_name, pid_name, sig_pid);
        printf("parent of sender: %s(%d)\n", pexecname(), ppid());
        printf("task_ancestry:%s\n", task_ancestry(pid2task(pid()), 1));
    }
}
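Because the pid changes every time the container restarts, it can be convenient to generate the script rather than edit it by hand. The following is a minimal sketch that prints the same script with the pid filled in; the helper name render_sg_stp is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the sg.stp script from this article with the
# target pid substituted. Only ${pid} is expanded inside the here-document;
# sequences such as \n and %s pass through unchanged.
render_sg_stp() {
    local pid="$1"
    cat <<EOF
global target_pid = ${pid}
probe signal.send {
    if (sig_pid == target_pid) {
        printf("%s(%d) send %s to %s(%d)\n", execname(), pid(), sig_name, pid_name, sig_pid);
        printf("parent of sender: %s(%d)\n", pexecname(), ppid());
        printf("task_ancestry:%s\n", task_ancestry(pid2task(pid()), 1));
    }
}
EOF
}

# Example: print the script for the pid used in this article.
render_sg_stp 7942
```

To use it, redirect the output to a file, for example `render_sg_stp 7942 > sg.stp`.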
Note: Replace the value of target_pid with the pid of the container's main process obtained in Step 2. This article uses 7942 as an example. Then run the script:
stap sg.stp
When the container process is killed, the script captures the event and outputs the following:
pkill(23549) send SIGKILL to server(7942)
parent of sender: bash(23495)
task_ancestry:swapper/0(0m0.000000000s)=>systemd(0m0.080000000s)=>vGhyM0(19491m2.579563677s)=>sh(33473m38.074571885s)=>bash(33473m38.077072025s)=>bash(33473m38.081028267s)=>bash(33475m4.817798337s)=>pkill(33475m5.202486630s)
By observing task_ancestry, you can see the full chain of ancestor processes of the process that sent the signal. In the example above, the chain contains a suspicious process named vGhyM0. A randomly named process like this usually indicates that there is a trojan in the system. Take the necessary steps to remove it so that your containers can function properly.
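A long task_ancestry line is easier to inspect with one process per line. The following is a minimal sketch, assuming the output format shown above, with ancestors separated by "=>"; the helper name split_ancestry is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print each ancestor from a task_ancestry line on its
# own line, oldest ancestor first.
split_ancestry() {
    # Drop the "task_ancestry:" label, then split the rest on "=>".
    printf '%s\n' "${1#task_ancestry:}" |
        awk -F'=>' '{ for (i = 1; i <= NF; i++) print $i }'
}

# Example with a shortened chain taken from the output above.
split_ancestry 'task_ancestry:systemd(0m0.080000000s)=>vGhyM0(19491m2.579563677s)=>sh(33473m38.074571885s)'
```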