Introduction
When you run a command such as "foo bar baz", the command-line arguments are, by definition, "foo", "bar", and "baz". You might expect only "bar" and "baz" to count as arguments, but the command name itself is the first one.
In C and C++, command-line arguments are available through the argv array parameter of the main function. In the example above, the executable name is stored in argv[0], "bar" is in argv[1], and "baz" is in argv[2]. The equivalents of argv are "$0", "$1", "$2", ... in shell scripts, sys.argv in Python, and os.Args in Go. However, scripting languages such as shell and Python do not expose the raw command-line arguments the way C does; they present a slightly modified view. This is discussed later.
About the first element of command-line arguments
The first element of the command-line arguments (hereafter argv[0]) conventionally contains the name of the executable. Whatever language a program is written in, running it eventually results in a call to the execve() system call shown below, with the path of the executable in the pathname argument and the command-line arguments in the argv argument.
int execve(const char *pathname, char *const argv[], char *const envp[]);
By convention, the caller passes the same value for pathname and argv[0], although nothing in execve() enforces this.
When would argv[0] be set to something other than the name of the executable? One example is a login shell: when bash runs as a login shell, argv[0] is not "bash" but "-bash", with a leading "-". This lets bash detect at startup that it is a login shell and change its behavior accordingly (for example, by reading different startup files). We will verify the value of argv[0] for bash in the next section.
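The independence of argv[0] from pathname can be tried out without writing C. The following is a minimal Python sketch, not something taken from the original experiment: it relies on the executable parameter of subprocess.run, which on POSIX names the file to execute while the first element of the argument list is passed as argv[0]. The helper name run_with_argv0 and the value "custom-name" are made up for illustration, and the sketch assumes /bin/sh is an ordinary POSIX shell such as dash or bash (a multi-call binary such as busybox would dispatch on argv[0] instead).

```python
import subprocess


def run_with_argv0(pathname: str, argv0: str, *rest: str) -> str:
    """Execute `pathname` while passing `argv0` as argv[0]; return its stdout."""
    result = subprocess.run(
        [argv0, *rest],        # becomes argv; argv[0] is whatever we choose
        executable=pathname,   # the file actually executed (execve's pathname)
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout


if __name__ == "__main__":
    # With -c and no extra operands, a POSIX shell sets $0 to its own argv[0],
    # so the shell reports the made-up name instead of /bin/sh.
    print(run_with_argv0("/bin/sh", "custom-name", "-c", 'printf "%s\\n" "$0"'), end="")
```

A login program uses the same mechanism when it passes "-bash" as argv[0]. Note that starting the made-up name with "-" in this sketch would also make the shell read its login startup files.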
When the executable is a script run through an interpreter, such as a bash script, argv[0] holds the interpreter's executable name rather than the script name. For example, for a bash script called "test.sh", running "./test.sh" sets argv[0] to the executable name of bash and argv[1] to "./test.sh". This is inconvenient for script authors, so bash shifts the view: inside the script, "$0" is the script name (the real argv[1]), "$1" is the first argument given to the script (the real argv[2]), and so on. We will verify this in a later section as well.
Verifying the values of a process's command-line arguments using procfs
The command-line arguments of each process can be read from /proc/<pid>/cmdline. For example, on the Linux machine the author is currently logged in to via ssh, the command-line arguments of rsyslogd, which collects system logs, were as follows:
sat@tea:~$ pgrep rsyslogd
568
sat@tea:~$ cat /proc/568/cmdline
/usr/sbin/rsyslogd-n-iNONEsat@tea:~
The output looks a bit odd. Option-like strings are glued directly onto what looks like the executable name, "/usr/sbin/rsyslogd", and there is no newline before the next prompt. This does not mean that /proc/<pid>/cmdline stores the arguments without any delimiter. In fact, each argument is terminated by a null character (a byte with value 0, or "\0" in C), which the terminal does not display. A binary dump tool such as hexdump confirms this.
$ hexdump -c /proc/568/cmdline
0000000 / u s r / s b i n / r s y s l o
0000010 g d \0 - n \0 - i N O N E \0
000001d
So the actual arguments of rsyslogd are as follows: argv[0] is "/usr/sbin/rsyslogd", argv[1] is "-n", and argv[2] is "-iNONE".
Let's take a look at the argv[0] of the bash instances on the system. The last field of the ps ax output shows the command-line arguments separated by spaces, so we'll use it to list the bash instances running on the system.
sat@tea:~$ ps ax | grep bash
5239 pts/3 Ss+ 0:00 /usr/bin/bash --init-file /home/sat/.vscode-server/bin/74b1f979648cc44d385a2286793c226e611f59e7/out/vs/workbench/contrib/terminal/browser/media/shellIntegration-bash.sh
8725 pts/4 Ss 0:00 -bash
8907 pts/4 S 0:00 /bin/bash ./test.sh
8909 pts/4 S 0:00 /bin/bash ./test.sh
8929 pts/4 S+ 0:00 grep --color=auto bash
We can see that the processes with pid 5239, 8725, 8907, and 8909 are bash instances. Among them, the process with pid 8725, whose argv[0] begins with "-", is the login shell in which the author is running the above commands.
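The null-terminated layout of /proc/<pid>/cmdline described above is easy to decode in code. The following Python sketch is an illustration only: the function names parse_cmdline and read_cmdline are hypothetical, and read_cmdline assumes a Linux system with procfs mounted at /proc.

```python
from pathlib import Path


def parse_cmdline(raw: bytes) -> list[str]:
    """Split the raw bytes of /proc/<pid>/cmdline into a list of arguments."""
    if not raw:
        return []  # kernel threads, for example, expose an empty cmdline
    # Each argument is terminated by a NUL byte, so drop the final terminator
    # before splitting; otherwise a spurious empty argument would appear.
    if raw.endswith(b"\0"):
        raw = raw[:-1]
    return [arg.decode(errors="replace") for arg in raw.split(b"\0")]


def read_cmdline(pid="self") -> list[str]:
    """Return the argv of process `pid` as recorded in procfs."""
    return parse_cmdline(Path(f"/proc/{pid}/cmdline").read_bytes())


if __name__ == "__main__":
    # The bytes shown by hexdump for rsyslogd in this article.
    print(parse_cmdline(b"/usr/sbin/rsyslogd\0-n\0-iNONE\0"))
    # The interpreter's own command line; argv[0] is the name Python was started with.
    print(read_cmdline())
```

Splitting on the NUL byte rather than on spaces matters because an argument may itself contain spaces, which the space-separated ps output cannot distinguish from argument boundaries.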
Let's also take a look at an example of a bash script. The script we'll use here outputs "$0" and then sleeps indefinitely.
sat@tea:~$ cat test.sh
#!/bin/bash
echo $0
sleep infinity
sat@tea:~$ ./test.sh
fg
./test.sh # Output of `$0`
^Z
[2]+ Stopped ./test.sh
sat@tea:~$ bg
[2]+ ./test.sh &
sat@tea:~$ hexdump -c /proc/8909/cmdline
0000000 / b i n / b a s h \0 . / t e s t
0000010 . s h \0
0000014
While the value of "$0" is the script name "./test.sh", the value of argv[0] is "/bin/bash", and the script name is in argv[1]. In other words, although it looks to the user as if the "./test.sh" file is executed directly, the program actually running is bash. Bash interprets and executes the script, mapping argv[1] and the following elements to "$0", "$1", and so on inside the script.
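The remapping can be reproduced end to end. The following Python sketch is hypothetical scaffolding rather than the author's experiment: it writes a small script with a #!/bin/sh shebang into a temporary directory, runs it, and has the script print both "$0" and its own /proc/$$/cmdline. It assumes Linux, an existing /bin/sh, and a temporary directory that permits execution; the file name demo.sh and the function run_script_demo are made up.

```python
import os
import subprocess
import tempfile

# A tiny script that prints $0, then dumps its own raw command line.
SCRIPT = """#!/bin/sh
printf '%s\\n' "$0"
cat "/proc/$$/cmdline"
"""


def run_script_demo():
    """Run a shebang script; return (script path, its $0, its real argv)."""
    with tempfile.TemporaryDirectory() as tmpdir:
        path = os.path.join(tmpdir, "demo.sh")
        with open(path, "w") as f:
            f.write(SCRIPT)
        os.chmod(path, 0o755)
        # Execute the script file directly; the kernel reads the shebang line
        # and actually starts /bin/sh with the script path as argv[1].
        out = subprocess.run([path], capture_output=True, check=True).stdout
    dollar0, _, raw = out.partition(b"\n")
    argv = [a.decode() for a in raw.split(b"\0") if a]
    return path, dollar0.decode(), argv


if __name__ == "__main__":
    path, dollar0, argv = run_script_demo()
    print("$0   =", dollar0)  # the script path
    print("argv =", argv)     # the interpreter first, then the script path
```

The script sees its own path in "$0", while the kernel's record of the process starts with the interpreter named on the shebang line.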
Differences between the command name and command line arguments held by the kernel
Note that the argv[0] discussed in this article, which "usually contains the executable file name", is different from the command name as seen by the kernel, which is displayed by commands like ps. For more on the command name from the kernel's perspective, see the following article.
https://dev.to/satorutakeuchi/command-name-from-the-perspective-of-the-linux-kernel-257l
The command name seen by the kernel is the "first 15 bytes of the basename of the executable file name" and is different from argv[0].
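As a rough illustration, the rule quoted above can be written down in a few lines. This Python sketch only applies that rule to a path string; it does not query the kernel, and the names kernel_command_name and TASK_COMM_MAX are hypothetical.

```python
import os

TASK_COMM_MAX = 15  # the rule above: at most 15 bytes of the name are kept


def kernel_command_name(executable: str) -> str:
    """Apply the rule above: first 15 bytes of the executable's basename."""
    base = os.path.basename(executable).encode()
    # Truncate by bytes, not characters; drop a partially cut multi-byte character.
    return base[:TASK_COMM_MAX].decode(errors="ignore")


if __name__ == "__main__":
    print(kernel_command_name("/usr/sbin/rsyslogd"))             # rsyslogd
    print(kernel_command_name("/opt/a-very-long-program-name"))  # a-very-long-pro
```

On Linux, the value the kernel actually holds for a live process can be read from /proc/<pid>/comm and compared with the first entry of /proc/<pid>/cmdline.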
Conclusion
I hope this article will reduce confusion about command names, executable file names, command line arguments, and the command line arguments that can be referenced from within the program's source code.