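This article describes how to use the Exit Code reported in a Pod's abnormal status information to further locate the problem.
Run the following command to view the status information of the abnormal Pod: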
kubectl describe pod <pod name>
The output is similar to the following:
Containers:
  kubedns:
    Container ID: docker://5fb8adf9ee62afc6d3f6f3d9590041818750b392dff015d7091eaaf99cf1c945
    Image: ccr.ccs.tencentyun.com/library/kubedns-amd64:1.14.4
    Image ID: docker-pullable://ccr.ccs.tencentyun.com/library/kubedns-amd64@sha256:40790881bbe9ef4ae4ff7fe8b892498eecb7fe6dcc22661402f271e03f7de344
    Ports: 10053/UDP, 10053/TCP, 10055/TCP
    Host Ports: 0/UDP, 0/TCP, 0/TCP
    Args:
      --domain=cluster.local.
      --dns-port=10053
      --config-dir=/kube-dns-config
      --v=2
    State: Running
      Started: Tue, 27 Aug 2019 10:58:49 +0800
    Last State: Terminated
      Reason: Error
      Exit Code: 255
      Started: Tue, 27 Aug 2019 10:40:42 +0800
      Finished: Tue, 27 Aug 2019 10:58:27 +0800
    Ready: True
    Restart Count: 1
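In the container list of the output, the `Exit Code` under the `Last State` field is the status code with which the program last exited. A non-zero value means that the program exited abnormally, and the status code can be used to further analyze the cause.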
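The same value can also be read programmatically. The following is a minimal sketch, assuming the Pod has been fetched with `kubectl get pod <pod name> -o json` and parsed into a dictionary. It reads `status.containerStatuses[].lastState.terminated`; the helper name `last_exit_codes` and the sample data are illustrative only.

```python
def last_exit_codes(pod):
    """Return (container name, exit code, reason) for every container
    that has a recorded previous termination."""
    results = []
    for status in pod.get("status", {}).get("containerStatuses", []):
        terminated = status.get("lastState", {}).get("terminated")
        if terminated is not None:
            results.append(
                (status["name"], terminated["exitCode"], terminated.get("reason"))
            )
    return results


# Hypothetical sample that mirrors the Last State shown in the output above.
sample_pod = {
    "status": {
        "containerStatuses": [
            {
                "name": "kubedns",
                "restartCount": 1,
                "lastState": {
                    "terminated": {"exitCode": 255, "reason": "Error"}
                },
            }
        ]
    }
}

print(last_exit_codes(sample_pod))
```

Containers that have never restarted have an empty `lastState`, so they are skipped rather than reported with a placeholder value.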
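Exit status codes are always reported in the range 0 - 255. When a program is interrupted from outside, for example by `kill -9` or Ctrl+C, it is terminated by the corresponding signal, `SIGKILL` or `SIGINT`. If the program specifies an exit status code outside this range (for example, `exit(-1)`), the value is converted automatically, and the status code finally reported is still between 0 and 255. With the specified status code denoted as `code`, the conversion depends on its sign:
For a negative `code`: `256 - (|code| % 256)`, which wraps to 0 when `|code|` is a multiple of 256.
For a positive `code`: `code % 256`.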
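The conversion can be checked with a few lines of Python. This is a sketch of the two formulas above, not of how the operating system implements the truncation; the function name `normalize_exit_code` is made up for illustration, and the final `% 256` on the negative branch covers the case where `|code|` is an exact multiple of 256.

```python
def normalize_exit_code(code: int) -> int:
    """Map a requested exit status to the 0 - 255 value that is reported."""
    if code >= 0:
        return code % 256
    # Negative branch: 256 - (|code| % 256), wrapped so that -256 yields 0.
    return (256 - (abs(code) % 256)) % 256


for requested in (-1, 1, 255, 256, 257, -256):
    print(requested, "->", normalize_exit_code(requested))
```

For example, `normalize_exit_code(-1)` evaluates to 255, which matches the `Exit Code: 255` that a program calling `exit(-1)` would show.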
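Exit code 137 indicates that the program was killed by the `SIGKILL` signal. A possible cause is that a container in the Pod reached its resource limit (`resources.limits`), for example because of a memory overflow (OOM). Resource limits are enforced through Linux cgroups: when a container's memory usage reaches its limit, the cgroup forcibly stops it (similar to `kill -9`), and `kubectl describe pod` then shows `OOMKilled` as the Reason.
Note:
Whether the process was stopped by a cgroup limit or because the node itself ran out of resources, a record can be found in the system log. On Ubuntu the system log is stored in the file `/var/log/syslog`, and on CentOS in the file `/var/log/messages`. On both systems the kernel log can also be viewed with the `journalctl -k` command.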
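When the log is large, filtering it for OOM-related entries can help. The sketch below is only an illustration: the search phrases are an assumption about wording that kernels commonly use (it varies between kernel versions), the function name `find_oom_lines` is made up, and the sample lines are invented rather than copied from a real system.

```python
import re

# Phrases often seen in kernel OOM-killer messages. This list is an
# assumption; adjust it to the wording used by the kernel on the node.
OOM_PATTERN = re.compile(
    r"out of memory|oom-kill|invoked oom-killer", re.IGNORECASE
)


def find_oom_lines(lines):
    """Return the log lines that look like OOM-killer records."""
    return [line.rstrip("\n") for line in lines if OOM_PATTERN.search(line)]


# Invented sample lines, for illustration only.
sample_log = [
    "kernel: myapp invoked oom-killer: order=0\n",
    "kernel: Memory cgroup out of memory: Killed process 1234 (myapp)\n",
    "systemd: Started Session 1 of user root.\n",
]

for entry in find_oom_lines(sample_log):
    print(entry)
```

On a node, the same function could be applied to an open file object for `/var/log/syslog` or `/var/log/messages`, or to the captured output of `journalctl -k`.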
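Exit codes 1 and 255 are typically the result of the program itself exiting abnormally with `exit(1)` or `exit(-1)`; -1 is converted to 255 according to the conversion rule described earlier.
When a Linux program is interrupted from outside, it is sent a signal, and its exit status code is the signal number plus 128. For example, `SIGKILL` has signal number 9, so the exit status code is 9 + 128 = 137. The standard signal numbers are listed in the following table. Where three values are given, they are architecture-dependent: the first usually applies to Alpha and SPARC, the middle one to x86, ARM, and most other architectures, and the last one to MIPS.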
Signal | Value | Action | Comment |
---|---|---|---|
SIGHUP | 1 | Term | Hangup detected on controlling terminal or death of controlling process |
SIGINT | 2 | Term | Interrupt from keyboard |
SIGQUIT | 3 | Core | Quit from keyboard |
SIGILL | 4 | Core | Illegal Instruction |
SIGABRT | 6 | Core | Abort signal from abort(3) |
SIGFPE | 8 | Core | Floating-point exception |
SIGKILL | 9 | Term | Kill signal |
SIGSEGV | 11 | Core | Invalid memory reference |
SIGPIPE | 13 | Term | Broken pipe: write to pipe with no readers; see pipe(7) |
SIGALRM | 14 | Term | Timer signal from alarm(2) |
SIGTERM | 15 | Term | Termination signal |
SIGUSR1 | 30,10,16 | Term | User-defined signal 1 |
SIGUSR2 | 31,12,17 | Term | User-defined signal 2 |
SIGCHLD | 20,17,18 | Ign | Child stopped or terminated |
SIGCONT | 19,18,25 | Cont | Continue if stopped |
SIGSTOP | 17,19,23 | Stop | Stop process |
SIGTSTP | 18,20,24 | Stop | Stop typed at terminal |
SIGTTIN | 21,21,26 | Stop | Terminal input for background process |
SIGTTOU | 22,22,27 | Stop | Terminal output for background process |
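The "signal number plus 128" rule can be applied in reverse to guess which signal ended a container. The following sketch uses Python's standard `signal` module and therefore reflects the signal numbers of the platform it runs on; the function name `describe_exit_code` is hypothetical. Because a program may also choose a status above 128 on its own, the result is only a hint.

```python
import signal


def describe_exit_code(code: int) -> str:
    """Give a rough interpretation of a container exit status."""
    if code == 0:
        return "exited normally"
    if code > 128:
        try:
            name = signal.Signals(code - 128).name
        except ValueError:
            name = None  # code - 128 is not a signal number on this platform
        if name is not None:
            return f"possibly terminated by {name} (signal {code - 128})"
    return f"exited with status {code}"


for status in (0, 1, 130, 137, 143, 255):
    print(status, "->", describe_exit_code(status))
```

Status 255 falls through to the generic message because 127 is not a valid signal number, which agrees with treating 255 as an ordinary error status.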
`/usr/include/sysexits.h` defines a standardized set of exit status codes (for C/C++ only), as shown in the following table:
Definition | Status code | Description |
---|---|---|
#define EX_OK | 0 | successful termination |
#define EX__BASE | 64 | base value for error messages |
#define EX_USAGE | 64 | command line usage error |
#define EX_DATAERR | 65 | data format error |
#define EX_NOINPUT | 66 | cannot open input |
#define EX_NOUSER | 67 | addressee unknown |
#define EX_NOHOST | 68 | host name unknown |
#define EX_UNAVAILABLE | 69 | service unavailable |
#define EX_SOFTWARE | 70 | internal software error |
#define EX_OSERR | 71 | system error (e.g., can't fork) |
#define EX_OSFILE | 72 | critical OS file missing |
#define EX_CANTCREAT | 73 | can't create (user) output file |
#define EX_IOERR | 74 | input/output error |
#define EX_TEMPFAIL | 75 | temp failure; user is invited to retry |
#define EX_PROTOCOL | 76 | remote error in protocol |
#define EX_NOPERM | 77 | permission denied |
#define EX_CONFIG | 78 | configuration error |
#define EX__MAX | 78 | maximum listed value |
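Although the header targets C/C++, Python's `os` module exposes constants with the same names on Unix-like systems, which makes a quick reverse lookup possible. This is a sketch that assumes a Linux host; which `EX_*` constants exist depends on the platform, and the helper name `sysexits_name` is made up for illustration.

```python
import os

# Build a value -> name map from whatever EX_* constants this platform
# provides (the set is platform-dependent and absent on Windows).
SYSEXITS = {
    getattr(os, name): name
    for name in dir(os)
    if name.startswith("EX_")
}


def sysexits_name(code: int):
    """Return the sysexits-style name for a status code, or None."""
    return SYSEXITS.get(code)


for status in (0, 64, 70, 78, 1):
    print(status, "->", sysexits_name(status))
```

A status that is not covered by the table, such as 1, returns `None`, which signals that the program is probably not following the `sysexits.h` convention.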
For the meanings of other status codes, see the following table:
Status code | Meaning | Example | Description |
---|---|---|---|
1 | Catchall for general errors | let "var1 = 1/0" | Miscellaneous errors, such as "divide by zero" and other impermissible operations |
2 | Misuse of shell builtins (according to Bash documentation) | empty_function() {} | Missing keyword or command |
126 | Command invoked cannot execute | /dev/null | Permission problem or command is not an executable |
127 | "command not found" | illegal_command | Possible problem with $PATH or a typo |
128 | Invalid argument to exit | exit 3.14159 | exit takes only integer args in the range 0 - 255 |
128+n | Fatal error signal "n" | kill -9 $PPID of script | $? returns 137 (128 + 9) |
130 | Script terminated by Control-C | Ctl-C | Control-C is fatal error signal 2 (130 = 128 + 2, see above) |
255* | Exit status out of range | exit -1 | exit takes only integer args in the range 0 - 255 |
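Some of the shell-level codes in this table can be reproduced locally. The sketch below assumes a POSIX `sh` is available on the machine; the helper name `shell_exit_status` and the deliberately nonexistent command name are illustrative only.

```python
import subprocess


def shell_exit_status(command: str) -> int:
    """Run a command line through sh and return the shell's exit status."""
    completed = subprocess.run(
        ["sh", "-c", command],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    return completed.returncode


# An explicit status chosen by the script itself.
print(shell_exit_status("exit 3"))
# A command that does not exist, which the shell reports as "command not found".
print(shell_exit_status("no_such_command_for_demo"))
```

The second call returns 127, the "command not found" status from the table, because the shell exits with the status of the last command it tried to run.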