In the COMMAND column of BusyBox ps, the name of the process will sometimes be wrapped in curly braces or square brackets.
Curly braces {} are used to indicate that the filename of the executable retrieved from /proc/<pid>/stat doesn’t match the argv[0] value parsed from /proc/<pid>/cmdline. This can occur if the program is run through an interpreter (e.g. /usr/bin/python) or has modified its own argv.
Square brackets [] are used to indicate that the process’ /proc/<pid>/cmdline was empty. Possible reasons are that the process is a zombie or a kernel thread.
BusyBox is a bundle of UNIX utilities often found on embedded devices.
One command it provides is ps, which lists the currently running processes.
An example of the output of this command is:
PID USER TIME COMMAND
38 root 0:00 [oom_reaper]
3272 alan 2:24 {terminator} /usr/bin/python /usr/bin/terminator
5259 alan 0:00 vim procps/ps.c
The basic interpretation of this table is hopefully fairly obvious to anyone familiar with Linux. However, I wanted to know some more about the COMMAND column. In particular:
Why are some process names (e.g. terminator above) wrapped in curly braces?
Why are some process names (e.g. oom_reaper above) wrapped in square brackets?
I downloaded and built the BusyBox source code to get a better understanding of how ps works:
git clone git://busybox.net/busybox.git
cd busybox
git checkout 9663bbd17ba3ab9f7921d7c46f07d177cb4a1435
make menuconfig
make
make -j5 CONFIG_PREFIX=~/busybox-install install
As far as this investigation is concerned, BusyBox ps inspects the following two files. The documentation for each can be found in man proc.
/proc/<pid>/cmdline - “This read-only file holds the complete command line for the process”
/proc/<pid>/stat - “Status information about the process”
The first BusyBox function we’re interested in is procps_scan(). It reads /proc/<pid>/stat and retrieves the string comm, which man proc describes as the “filename of the executable”.
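As a rough illustration of that step, here is a minimal sketch of pulling comm out of a stat line. It is not BusyBox's code: parse_comm() is a hypothetical helper, and it assumes only the layout documented in man proc, where comm is the second field and is wrapped in parentheses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Copy the comm field out of one line of /proc/<pid>/stat.
 * The field sits between the first '(' and the last ')'. Searching for the
 * last ')' matters because comm itself may contain spaces or parentheses.
 * Returns 0 on success, -1 if the line is malformed or out is too small.
 * Hypothetical helper for illustration; not taken from BusyBox.
 */
int parse_comm(const char *stat_line, char *out, size_t out_size)
{
    assert(stat_line != NULL && out != NULL);

    const char *open = strchr(stat_line, '(');
    const char *close = strrchr(stat_line, ')');
    if (open == NULL || close == NULL || close < open)
        return -1;

    size_t len = (size_t)(close - open - 1);
    if (len + 1 > out_size)
        return -1;

    memcpy(out, open + 1, len);
    out[len] = '\0';
    return 0;
}
```

Called on the terminator stat line shown later in this post, this should fill the buffer with terminator.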
When outputting the rows of process information to the terminal, the function format_process() will call read_cmdline().
This will retrieve the contents of /proc/<pid>/cmdline and sanitise them by swapping the NUL bytes that delimit the command line arguments for spaces. (I’ll refer to this sanitised string later as cmdline_contents).
This function will then find the name of the process actually being run by taking the basename of the first command line argument, argv[0]. The strings basename(argv[0]) and comm are then compared. Based on this, one of the following is chosen to be displayed in the COMMAND column of ps:
cmdline_contents if the process names were equal
{comm} cmdline_contents if the process names were not equal
[comm] if the file cmdline was empty
To demonstrate this, we can use the example of the stat and cmdline files for an instance of terminator running on my system:
$ cat /proc/14159/stat
14159 (terminator) S 2736 2568 2568 1026 2568 4194560 23680 508 68 1 3268 423 0 0 20 0 4 0 9123075 905396224 17607 18446744073709551615 94654019346432 94654022492352 140735972289904 0 0 0 0 16781312 65538 0 0 0 17 2 0 0 7 0 0 94654024590000 94654025079160 94654044663808 140735972297096 140735972297132 140735972297132 140735972298724 0
$ strings -n1 /proc/14159/cmdline
/usr/bin/python
/usr/bin/terminator
From stat, BusyBox extracts terminator as the variable comm.
The sanitised value of the cmdline file, cmdline_contents will be /usr/bin/python /usr/bin/terminator.
The output of basename(argv[0]) here is python, which is not equal to terminator. The ps output will therefore be:
{terminator} /usr/bin/python /usr/bin/terminator
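The three-way choice described above can be sketched in a few lines of C. This is my own simplified reconstruction, not the BusyBox implementation: format_command() is a hypothetical name, the 256-byte working buffer is an arbitrary limit, and the comparison is a plain strcmp(), which may be stricter or looser than what BusyBox actually does.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Build the COMMAND column from comm and the raw bytes of
 * /proc/<pid>/cmdline (arguments separated by NUL bytes).
 * Hypothetical helper; a sketch of the logic, not BusyBox's code.
 */
void format_command(const char *comm, const char *raw, size_t raw_len,
                    char *out, size_t out_size)
{
    assert(comm != NULL && out != NULL && out_size > 0);

    /* Empty cmdline (zombie or kernel thread): show [comm]. */
    if (raw == NULL || raw_len == 0 || raw[0] == '\0') {
        snprintf(out, out_size, "[%s]", comm);
        return;
    }

    /* Work on a private, NUL-terminated copy (size is an assumption). */
    char cmdline[256];
    size_t n = raw_len < sizeof cmdline - 1 ? raw_len : sizeof cmdline - 1;
    memcpy(cmdline, raw, n);
    cmdline[n] = '\0';

    /* Ignore trailing NUL terminators so no trailing spaces are emitted. */
    while (n > 0 && cmdline[n - 1] == '\0')
        n--;

    /* argv[0] ends at the first NUL, so strrchr() only sees argv[0]. */
    const char *slash = strrchr(cmdline, '/');
    const char *base = slash ? slash + 1 : cmdline;
    int same = strcmp(base, comm) == 0; /* compare before sanitising */

    /* Swap the NUL delimiters between arguments for spaces. */
    for (size_t i = 0; i < n; i++)
        if (cmdline[i] == '\0')
            cmdline[i] = ' ';

    if (same)
        snprintf(out, out_size, "%s", cmdline);
    else
        snprintf(out, out_size, "{%s} %s", comm, cmdline);
}
```

Feeding it comm = terminator and the two NUL-separated arguments from the example above should reproduce the {terminator} line, while an empty cmdline yields the bracketed form.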
If the comm value in the stat file is “The filename of the executable”, why would it differ from the program listed as argv[0] in cmdline?
Certain executables are actually scripts that need an interpreter to run them, e.g. Bash or Python.
Before running a program, Linux checks whether a shebang (#!) exists at the start of the file. If so, the path of the script is passed as an argument to the interpreter named on that line. At program execution, argv[0] will be the path to the interpreter, not the command being executed.
I then wondered what would happen to /proc/<pid>/cmdline if a program modified its own argv. I first checked whether this is actually legal behaviour in C. It is, as documented in the C99 standard:
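To make the shebang check concrete, here is a small user-space sketch of extracting the interpreter from a script's first line. The kernel does this work itself during exec; shebang_interpreter() is a hypothetical helper of mine and only mimics the idea, assuming the interpreter path ends at the first whitespace.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * If first_line starts with "#!", copy the interpreter path into out and
 * return 1. Otherwise (or if the path is empty or too long) return 0.
 * Hypothetical helper for illustration; the kernel's own logic differs.
 */
int shebang_interpreter(const char *first_line, char *out, size_t out_size)
{
    assert(first_line != NULL && out != NULL && out_size > 0);
    out[0] = '\0';

    if (strncmp(first_line, "#!", 2) != 0)
        return 0;

    const char *p = first_line + 2;
    p += strspn(p, " \t");              /* skip optional blanks after #! */
    size_t len = strcspn(p, " \t\r\n"); /* path ends at first whitespace */
    if (len == 0 || len >= out_size)
        return 0;

    memcpy(out, p, len);
    out[len] = '\0';
    return 1;
}
```

For a script whose first line is #!/usr/bin/python, this yields /usr/bin/python, which is the string that ends up as argv[0].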
The parameters argc and argv and the strings pointed to by the argv array shall be modifiable by the program, and retain their last-stored values between program startup and program termination.
To test how Linux would behave if these strings were modified I put together the following:
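#include <stdio.h>
#include <string.h>
#include <sys/types.h>
#include <unistd.h>

int main(int const argc, char* argv[])
{
    int argv_index = 0;
    pid_t this_pid = getpid();

    printf("pid is %d\n", (int)this_pid);
    getchar(); /* pause so cmdline can be inspected before modification */

    /* overwrite every character of every argument with 'x' */
    while (argv_index < argc)
    {
        memset(argv[argv_index],
               'x',
               strlen(argv[argv_index]));
        argv_index++;
    }

    getchar(); /* pause again so the modified cmdline can be inspected */
    return 0;
}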
The outputs below show the contents of the above program’s /proc/<pid>/cmdline before and after user input is provided to trigger the modification. It can be seen that modifications by a program to its argv are reflected in the cmdline file.
$ strings -n1 /proc/$(pgrep modify_argv)/cmdline
./modify_argv
testing
testing
one
two
three
$ strings -n1 /proc/$(pgrep modify_argv)/cmdline
xxxxxxxxxxxxx
xxxxxxx
xxxxxxx
xxx
xxx
xxxxx
After modification the output from BusyBox ps is:
PID USER TIME COMMAND
1908 alan 0:00 {modify_argv} xxxxxxxxxxxxx xxxxxxx xxxxxxx xxx xxx xxxxx
The manpage for proc suggests that if a process’ cmdline is empty then it is likely a zombie.
The following program was written to test this. It will fork a child process that returns after five seconds, creating a zombie.
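#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

int main()
{
    if (fork())
    {
        /* parent: block on input and never wait() for the child */
        getchar();
    }
    else
    {
        /* child: exit after five seconds, becoming a zombie */
        sleep(5);
    }
    return 0;
}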
A quick check of the process’ cmdline file shows that it is in fact empty once it becomes a zombie. Additionally, we can observe this change by looking at the output of ps before sleep(5) expires:
8702 alan 0:00 ./zombie.exe
8703 alan 0:00 ./zombie.exe
and after sleep(5) expires:
8702 alan 0:00 ./zombie.exe
8703 alan 0:00 [zombie.exe]
The following comment in the source code of BusyBox suggests that there is another reason why cmdline might be empty:
/* Puts [comm] if cmdline is empty (-> process is a kernel thread) */
The process [oom_reaper] listed in the ps output above certainly sounds like it could (or should) be a kernel thread. We can check this by finding the parent of the process, whose PID is the fourth value in the stat file.
$ cat /proc/38/stat
38 (oom_reaper) S 2 0 0 0 -1 2129984 0 0 0 0 0 0 0 0 20 0 1 0 4 0 0 18446744073709551615 0 0 0 0 0 0 0 2147483647 0 0 0 0 17 0 0 0 0 0 0 0 0 0 0 0 0 0 0
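Rather than counting fields by eye, the parent PID can be extracted programmatically. The sketch below is my own illustration (parse_ppid() is a hypothetical helper, not part of BusyBox). It skips past the last ')' first, because comm may contain spaces that would throw off a naive field count.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Return the parent PID (fourth field) from one line of /proc/<pid>/stat,
 * or -1 if the line is malformed.
 * Hypothetical helper for illustration only.
 */
int parse_ppid(const char *stat_line)
{
    assert(stat_line != NULL);

    /* Fields after comm start just past the last ')'. */
    const char *close = strrchr(stat_line, ')');
    if (close == NULL)
        return -1;

    char state; /* third field, e.g. 'S' or 'Z' */
    int ppid;   /* fourth field */
    if (sscanf(close + 1, " %c %d", &state, &ppid) != 2)
        return -1;

    return ppid;
}
```

Applied to the oom_reaper line above, this returns 2.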
Inspecting the parent’s stat file, /proc/2/stat, shows that the process’ comm string is kthreadd. While the name itself is a giveaway, the following quote from LWN confirms it:
kthreadd is the kernel thread daemon in charge of asynchronously spawning new kernel threads whenever requested
The command pstree -p 2 will list the running child processes of kthreadd. From a manual comparison of pstree and ps on my system, the only process listed in [square brackets] that was not a child of kthreadd was a zombie. I’m unsure if there is a third reason why a process might have an empty cmdline.