1. View cgroup kill records

[root@liqiang.io 11:39:35 ~]$ sudo dmesg | grep -i kill 
[15131.672861] prometheus invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[15131.672892]  [<ffffffff81188094>] oom_kill_process+0x254/0x3d0
[15131.672911] Task in /system.slice/system-prometheus.slice killed as a result of limit of /system.slice/system-prometheus.slice
[15131.673051] Memory cgroup out of memory: Kill process 27980 (job-center) score 865 or sacrifice child
[15131.673089] Killed process 27696 (job-center) total-vm:2049880kB, anon-rss:923484kB, file-rss:9756kB, shmem-rss:0kB

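This shows which cgroup limit triggered the kill. The leading timestamps are hard to read, though, because they count seconds since the system booted. To make them human-readable, try this command: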

[root@liqiang.io 11:39:35 ~]$ dmesg | grep -i kill | sed -r 's#^\[([0-9]+\.[0-9]+)\](.*)#echo -n "[";echo -n $(date --date="@$(echo "$(grep btime /proc/stat|cut -d " " -f 2)+\1" | bc)" +"%c");echo -n "]";echo -n "\2"#e'
[Tue 23 Apr 2019 01:14:48 PM CST] prometheus invoked oom-killer: gfp_mask=0xd0, order=0, oom_score_adj=0
[Tue 23 Apr 2019 01:14:48 PM CST] [<ffffffff81188094>] oom_kill_process+0x254/0x3d0
[Tue 23 Apr 2019 01:14:48 PM CST] Task in /system.slice/system-prometheus.slice killed as a result of limit of /system.slice/system-prometheus.slice
[Tue 23 Apr 2019 01:14:48 PM CST] Memory cgroup out of memory: Kill process 27980 (job-center) score 865 or sacrifice child
[Tue 23 Apr 2019 01:14:48 PM CST] Killed process 27696 (job-center) total-vm:2049880kB, anon-rss:923484kB, file-rss:9756kB, shmem-rss:0kB
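
The sed one-liner above relies on GNU sed's `e` flag, which runs a generated shell command for every line. As a sketch of the same arithmetic (boot time plus the bracketed offset), here is a plain bash loop. The name `dmesg_to_wallclock` is hypothetical, the boot epoch is passed in as an argument, and GNU `date` is assumed; output is in UTC to keep it predictable.

```shell
#!/usr/bin/env bash
# dmesg_to_wallclock is a hypothetical helper name, not part of any tool.
# $1: boot time as a Unix epoch (for example the btime value in /proc/stat).
# Reads "[seconds.fraction] message" lines on stdin, prints them with a UTC date.
dmesg_to_wallclock() {
    local btime="$1" line secs msg
    # "[", optional padding spaces, integer seconds, ".", fraction, "]", rest
    local re='^\[ *([0-9]+)\.[0-9]+](.*)$'
    while IFS= read -r line; do
        if [[ $line =~ $re ]]; then
            secs="${BASH_REMATCH[1]}"
            msg="${BASH_REMATCH[2]}"
            # 10# forces base 10 so a value like "08" is not read as octal
            printf '[%s]%s\n' \
                "$(date -u -d "@$((btime + 10#$secs))" '+%Y-%m-%d %H:%M:%S')" "$msg"
        else
            # Lines without a dmesg-style timestamp pass through unchanged
            printf '%s\n' "$line"
        fi
    done
}
```

It could be used as `dmesg | grep -i kill | dmesg_to_wallclock "$(awk '/^btime/ {print $2}' /proc/stat)"`. The dmesg from util-linux also has a `-T` (`--ctime`) option that prints human-readable timestamps directly, if the installed version supports it.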

Update @ 2019-06-04 17:27:42 Tuesday

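I found that dmesg data is short-lived (the kernel ring buffer has a fixed size, so older messages get overwritten), so I switched to an approach that works better: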

[root@liqiang.io]# journalctl -xb | egrep -i 'killed process'
May 30 05:07:51 node37 kernel: Killed process 19025 (prometheus) total-vm:17200736kB, anon-rss:1048088kB, file-rss:14156kB, shmem-rss:0kB
May 30 05:08:32 node37 kernel: Killed process 4481 (prometheus) total-vm:17200160kB, anon-rss:1046716kB, file-rss:11368kB, shmem-rss:0kB
May 30 05:09:18 node37 kernel: Killed process 6215 (prometheus) total-vm:17201696kB, anon-rss:1047984kB, file-rss:12248kB, shmem-rss:0kB
May 30 05:10:10 node37 kernel: Killed process 8459 (prometheus) total-vm:17201504kB, anon-rss:1046552kB, file-rss:11472kB, shmem-rss:0kB
May 30 05:10:54 node37 kernel: Killed process 12523 (prometheus) total-vm:17201248kB, anon-rss:1046980kB, file-rss:11524kB, shmem-rss:0kB
May 30 05:11:34 node37 kernel: Killed process 14263 (prometheus) total-vm:17201056kB, anon-rss:1046456kB, file-rss:11468kB, shmem-rss:0kB
May 30 05:12:18 node37 kernel: Killed process 15970 (prometheus) total-vm:17202016kB, anon-rss:1048176kB, file-rss:12020kB, shmem-rss:0kB
May 30 05:13:07 node37 kernel: Killed process 18263 (prometheus) total-vm:17202080kB, anon-rss:1048176kB, file-rss:12120kB, shmem-rss:0kB
May 30 05:13:47 node37 kernel: Killed process 19967 (prometheus) total-vm:17202016kB, anon-rss:1047676kB, file-rss:12428kB, shmem-rss:0kB
May 30 05:14:28 node37 kernel: Killed process 21267 (prometheus) total-vm:17202080kB, anon-rss:1048076kB, file-rss:12004kB, shmem-rss:0kB
May 30 05:15:15 node37 kernel: Killed process 23485 (prometheus) total-vm:17199904kB, anon-rss:1048180kB, file-rss:11416kB, shmem-rss:0kB
May 30 05:15:53 node37 kernel: Killed process 26533 (prometheus) total-vm:17200160kB, anon-rss:1048036kB, file-rss:11408kB, shmem-rss:0kB
May 30 05:16:32 node37 kernel: Killed process 28255 (prometheus) total-vm:17201632kB, anon-rss:1045644kB, file-rss:11504kB, shmem-rss:0kB
May 30 05:17:15 node37 kernel: Killed process 29517 (prometheus) total-vm:17201056kB, anon-rss:1045368kB, file-rss:11560kB, shmem-rss:0kB
May 30 05:17:49 node37 kernel: Killed process 31979 (prometheus) total-vm:17199648kB, anon-rss:1048328kB, file-rss:11560kB, shmem-rss:0kB
May 30 05:18:42 node37 kernel: Killed process 1026 (prometheus) total-vm:17202464kB, anon-rss:1048264kB, file-rss:12712kB, shmem-rss:0kB
May 30 05:19:24 node37 kernel: Killed process 3548 (prometheus) total-vm:17200928kB, anon-rss:1047216kB, file-rss:11552kB, shmem-rss:0kB
May 30 05:20:11 node37 kernel: Killed process 5275 (prometheus) total-vm:17201696kB, anon-rss:1045392kB, file-rss:11500kB, shmem-rss:0kB
May 30 05:20:56 node37 kernel: Killed process 9304 (prometheus) total-vm:17201184kB, anon-rss:1047960kB, file-rss:11536kB, shmem-rss:0kB
May 30 05:21:36 node37 kernel: Killed process 11027 (prometheus) total-vm:17202016kB, anon-rss:1047980kB, file-rss:12164kB, shmem-rss:0kB
May 30 05:22:11 node37 kernel: Killed process 12736 (prometheus) total-vm:17200608kB, anon-rss:1047116kB, file-rss:11316kB, shmem-rss:0kB
May 30 05:22:59 node37 kernel: Killed process 13979 (prometheus) total-vm:17201888kB, anon-rss:1047868kB, file-rss:12624kB, shmem-rss:0kB
May 30 05:23:42 node37 kernel: Killed process 15715 (prometheus) total-vm:17200800kB, anon-rss:1047392kB, file-rss:11280kB, shmem-rss:0kB
May 30 05:24:23 node37 kernel: Killed process 17464 (prometheus) total-vm:17200096kB, anon-rss:1041724kB, file-rss:11476kB, shme
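
With this many repeated kills, a per-process summary can be easier to scan than the raw lines. The following is a minimal sketch, assuming the `Killed process PID (name) ... anon-rss:NkB` layout shown above and process names without spaces; `summarize_oom_kills` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# summarize_oom_kills is a hypothetical helper name.
# Reads kernel log lines on stdin and prints, for each killed process name,
# the number of kills and the largest anon-rss value seen (in kB).
# Assumes process names contain no spaces.
summarize_oom_kills() {
    # Keep only "Killed process" lines, reduced to "<name> <anon-rss in kB>"
    sed -n -E 's/.*Killed process [0-9]+ \(([^)]+)\).*anon-rss:([0-9]+)kB.*/\1 \2/p' |
    awk '{
             rss = $2 + 0                      # force numeric comparison
             count[$1]++
             if (rss > peak[$1] + 0) peak[$1] = rss
         }
         END {
             for (name in count)
                 printf "%s kills=%d peak_anon_rss_kB=%d\n", name, count[name], peak[name]
         }' |
    sort
}
```

For example: `journalctl -xb | summarize_oom_kills`. Lines that lack an `anon-rss:` field, such as a truncated one, are skipped rather than counted.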

2. See which cgroups limit a process

[root@liqiang.io 11:39:35 ~]$ ps aux | grep prometheus | grep -v grep | awk '{print $2}'
23654
[root@liqiang.io 11:39:35 ~]$ cat /proc/23654/cgroup
11:hugetlb:/
10:cpuset:/kubernetes/app
9:freezer:/
8:perf_event:/
7:cpuacct,cpu:/system.slice/system-prometheus.slice
6:net_prio,net_cls:/
5:memory:/system.slice/system-prometheus.slice
4:blkio:/system.slice/system-prometheus.slice
3:devices:/system.slice/system-prometheus.slice
2:pids:/system.slice/system-prometheus.slice
1:name=systemd:/system.slice/system-prometheus.slice/prometheus.service
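
Each line of `/proc/<pid>/cgroup` has the form `hierarchy-ID:controller-list:cgroup-path`, where the controller list may be comma-separated (as in `cpuacct,cpu`). To pull out the path for a single controller, something like the sketch below may help; `cgroup_path_for` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# cgroup_path_for is a hypothetical helper name.
# $1: controller name, e.g. memory, cpu, pids.
# Reads the contents of /proc/<pid>/cgroup on stdin and prints the cgroup
# path of the first hierarchy whose controller list contains $1 exactly.
cgroup_path_for() {
    awk -F: -v ctrl="$1" '{
        n = split($2, list, ",")
        for (i = 1; i <= n; i++) {
            if (list[i] == ctrl) {
                # Strip "ID:controllers:" so a path containing ":" stays intact
                sub(/^[^:]*:[^:]*:/, "")
                print
                exit
            }
        }
    }'
}
```

For example, `cgroup_path_for memory < /proc/23654/cgroup` would print `/system.slice/system-prometheus.slice` for the output above. Assuming cgroup v1 with the memory controller mounted at `/sys/fs/cgroup/memory` (the mount point is an assumption and may differ on your system), the limit that applies should then be readable from `/sys/fs/cgroup/memory/system.slice/system-prometheus.slice/memory.limit_in_bytes`.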