Real examples of command line manipulation

Mailcap

RFC 1524 defines a file format for mail user agents to use in order to work out how to handle different MIME types. A typical entry in a mailcap file might look like this:

application/pgp; gpg < %s | metamail; needsterminal; \
       test=test %{encapsulation}=entity ; copiousoutput
application/pgp; gpg < %s; needsterminal ; copiousoutput

Here we can see a replaceable string: %s. You might think that the specification would be robust against malicious mail received by an innocent mail user agent, and would offer the implementor advice on how to avoid accidentally running a command embedded inside the mail as a result of trying to execute a mailcap command. Here's what it says:

 

Security issues are not discussed in this memo.

 
--RFC 1524, Security Considerations  

To be fair, it was written in 1993, when the Internet was a friendlier place than it is now. Nevertheless, one would think it reasonable to assume that some discussion of how to interpret the mailcap entries would involve a step-by-step procedure for the string replacements. Here are the relevant pieces:

 

(Because of differences in shells and the implementation and behavior of the same shell from one system to another, it is specified that the command line be intended as input to the Bourne shell, i.e., that it is implicitly preceded by "/bin/sh -c " on the command line.)

The two characters "%s", if used, will be replaced by the name of a file for the actual mail body data. […] Furthermore, any occurrence of "%t" will be replaced by the content-type and subtype specification. (That is, if the content-type is "text/plain", then %t will be replaced by "text/plain".) A literal % character may be quoted as \%. Finally, named parameters from the Content-type field may be placed in the command execution line using "%{" followed by the parameter name and a closing "}" character. The entire parameter should appear as a single command line argument, regardless of embedded spaces.

 
--RFC 1524, Appendix A  

The author has tried to be quite specific about what to do, but (unfortunately) not to the point of being pedantic about it. There are several problems with this text:

  • The part about prefixing the entire command line with "/bin/sh -c " was obviously an attempt to clarify how the command line is to be interpreted, but really I think it has made things more confusing. Is the entire mailcap-specified command line to be quoted or left as is before prefixing, and if quoted, then how: single quotes, or double quotes? The intent, I think, was to say “Bourne shell quoting and field-splitting rules apply” but already the damage is done: the implementor, reading the RFC, is now thinking that they need to manipulate the command line somehow. So, with that seed of confusion already in their mind…

  • …they are told to replace occurrences of %s with things. The quoting is quite confused in Appendix A: %s is shown quoted; later %t is shown with no quotes; and %t's example replacement text is then quoted. If it's intentional, it's pretty subtle, but noticeable enough to add to the confusion.

  • Good news near the end though, as %{…} is to be replaced with “a single command line argument”, implying one word in an argument vector. But what about our example above, “test %{encapsulation}=entity”? Is this to be replaced with [“test”, “encaps-value”, “=entity”]? It may be what's intended, but it isn't clear and it may matter.

  • There is no such talk of “single command line arguments” for %s or %t substitutions, so we are left guessing about the same questions for those.

  • How are shell pipeline and sequence characters to be interpreted? Need they be quoted? Are they disallowed because they are not mentioned? (If so, our gpg example above will not work.)
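
To make the ambiguity concrete, here is a small Python sketch of three plausible readings of the RFC text for the fragment test %{encapsulation}=entity. The function names and the sample parameter value are made up for illustration, and shlex.split() is only a rough approximation of Bourne shell word splitting; none of this is taken from a real mail user agent.

```python
import shlex

TEMPLATE = "test %{encapsulation}=entity"
PLACEHOLDER = "%{encapsulation}"


def textual_then_split(value):
    """Reading 1: paste the value into the text, then split into words."""
    return shlex.split(TEMPLATE.replace(PLACEHOLDER, value))


def quoted_then_split(value):
    """Reading 2: shell-quote the value before pasting it in."""
    return shlex.split(TEMPLATE.replace(PLACEHOLDER, shlex.quote(value)))


def split_then_own_word(value):
    """Reading 3: the parameter becomes a word of its own."""
    before, after = TEMPLATE.split(PLACEHOLDER)
    return shlex.split(before) + [value] + shlex.split(after)


value = "multi part"  # hypothetical parameter value with an embedded space
print(textual_then_split(value))   # ['test', 'multi', 'part=entity']
print(quoted_then_split(value))    # ['test', 'multi part=entity']
print(split_then_own_word(value))  # ['test', 'multi part', '=entity']
```

Each reading yields a different argument vector for the same entry and the same mail, which is exactly the kind of disagreement a specification ought to rule out.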

In short, it isn't clear exactly how to construct an argument vector from a mailcap command line. Fortunately, the mail user agent gets to choose the temporary file name to use in place of %s, and in general mail user agents seem to replace awkward characters (such as quotes, dollar signs, etc.) in other possible replacement strings with harmless characters (such as underscores) instead.
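
That defensive substitution might look something like the following Python sketch. The set of characters treated as safe and the function names are assumptions made for the example, not taken from any particular mail user agent.

```python
import re

# Assumption: anything outside this conservative set counts as "awkward".
UNSAFE = re.compile(r"[^A-Za-z0-9_.+/-]")


def sanitize(value):
    """Replace every character outside the safe set with an underscore."""
    return UNSAFE.sub("_", value)


def substitute_params(command, params):
    """Replace each %{name} in a mailcap command with its sanitized value."""
    def replace(match):
        return sanitize(params.get(match.group(1), ""))
    return re.sub(r"%\{([^}]*)\}", replace, command)


print(substitute_params("test %{encapsulation}=entity",
                        {"encapsulation": "x; rm -rf $HOME"}))
# test x__rm_-rf__HOME=entity
```

The cost of this approach is that legitimate values containing such characters are silently altered, so it protects the user without making the entry's meaning any clearer.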

Nevertheless, some sample mailcap files have entries like this:

text/plain; shownonascii iso-8859-8 %s | more ; \
  test=test "`echo %{charset} | tr '[A-Z]' '[a-z]'`" = iso-8859-8; \
  needsterminal

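This is exploitable. The problem is that %{charset} is not quoted inside the command substitution, so if the mail user agent substitutes the value verbatim, a charset containing shell metacharacters is interpreted by the shell. The entry's author was possibly unsure of how mail user agents would deal with string replacement in that case, and all because the RFC wasn't clear enough on this point.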
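
To see why, consider a mail user agent that substitutes %{charset} textually and hands the result to /bin/sh -c. The Python sketch below does exactly that, using a harmless payload that only writes a marker to standard error. The naive_test function is a hypothetical stand-in rather than code from any real mail user agent, and the sketch assumes a POSIX /bin/sh and the tr utility are available.

```python
import subprocess

# The test= clause from the sample entry above.
TEST_CLAUSE = "test \"`echo %{charset} | tr '[A-Z]' '[a-z]'`\" = iso-8859-8"


def naive_test(charset):
    """Substitute the charset textually and run the result with /bin/sh -c."""
    command = TEST_CLAUSE.replace("%{charset}", charset)
    return subprocess.run(["/bin/sh", "-c", command],
                          capture_output=True, text=True)


benign = naive_test("ISO-8859-8")
print(benign.returncode)        # 0: the test succeeds as intended

# A charset value as it might arrive in a hostile Content-type header.
hostile = naive_test("x; echo INJECTED >&2")
print(hostile.stderr.strip())   # INJECTED: the embedded command was executed
```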

It would probably have been a good idea if the specification had instead said something along these lines: the environment variable s is given the name of the temporary file containing the data to process; t is given the MIME type of the data; and field_fieldname (for each fieldname in the Content-type header) is given the value of the named Content-type field. The command line cmd is then executed using the argument vector [ “/bin/sh”, “-c”, cmd ].
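
A minimal Python sketch of that suggestion follows. The function name run_mailcap_command and the sample command are hypothetical; the variable names s, t, and field_<name> are the ones proposed above. Because the values travel in the environment rather than in the command text, the shell never parses them as code.

```python
import os
import subprocess


def run_mailcap_command(cmd, filename, mime_type, fields):
    """Run cmd via ["/bin/sh", "-c", cmd] with the values in the environment."""
    env = dict(os.environ)          # keep PATH and friends
    env["s"] = filename
    env["t"] = mime_type
    for name, value in fields.items():
        env["field_" + name] = value
    return subprocess.run(["/bin/sh", "-c", cmd], env=env,
                          capture_output=True, text=True)


# The entry's author quotes the expansions, as is usual shell practice.
result = run_mailcap_command(
    'printf "%s\\n" "$field_charset" "$t" "$s"',
    "/tmp/mail body.txt",
    "text/plain",
    {"charset": "x; echo INJECTED >&2"},
)
print(result.stdout, end="")
print(repr(result.stderr))   # '' because nothing in the value was executed
```

A real implementation would also have to decide what to do with field names that are not valid shell variable names; the sketch ignores that question.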

Desktop Entries

Desktop environments such as KDE and GNOME use short configuration files known as desktop entries, which describe how an application is to be launched, what its name is for menus, and so on. The Desktop Entry Standard is currently being drafted. One of the fields in the configuration file is Exec=, and it describes the command line to use to launch the program.

It has parameter variables, much like shell environment variables. For example, %f is replaced by a single file name if a file is dragged onto an icon representing a desktop entry, and %F is replaced by a list of files in case a selection of files is dropped onto the icon at once. What is unfortunate is that the manner in which these parameter variables are to be replaced is not specified—at least one implementation has transformed the command line “appname %f” into the argument vector [ “appname”, “” ] when there is no file name to substitute—and there is no discussion of quoting issues at all.

It's all very similar to the mailcap case, and since different desktop environment implementations (such as GNOME and KDE) need to understand how to convert a command line into an argument vector (and application packagers need to understand how to write desktop entry files in the first place), it is important to have a specification that is clear on these issues. It isn't enough to have all implementations agree: people have to write these files and so it needs to be documented.

The desktop entry file has a problematic requirement though. Not all parameter variables can be replaced by environment variables and given to the shell to deal with. Some parameter variables are lists.

POSIX shell parameter expansions have a nasty flaw in that they don't really allow for lists like this, except in one special case. The issue here is spaces, or more accurately “internal field separators” (IFS: by default spaces, tabs, and newlines), which are the characters that are used to separate words in a command line. We've seen how putting double-quotes around a parameter expansion makes sure that the end result is contained in one word in the argument vector: F="my file.dvi"; ls $F won't show me my DVI file, but F="my file.dvi"; ls "$F" will.

Positional parameters are slightly special because they are numbered, and we don't know in advance how many of them there are. The special string $* undergoes the following parameter expansion: it is replaced by all of the positional parameters, i.e. all of the arguments. Unfortunately field splitting happens afterwards and so any values containing spaces will get broken up into multiple words. Quoting doesn't help either: "$*" is a single word containing all of the arguments, which isn't very useful. So $@, another special string, acts the same as $* except when enclosed in double-quotes, in which case it undergoes parameter expansion to produce one word per positional parameter. So if you want to pass the arguments on to another program, other_program "$@" is a good way to do it.
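
The difference between the three forms is easy to observe. The Python sketch below asks /bin/sh to print one line per word it produces for a given expansion; the helper name words_seen and the sample file names are made up for the example, and a POSIX shell at /bin/sh is assumed.

```python
import subprocess

ARGS = ["my file.dvi", "other.dvi"]


def words_seen(expansion, args=ARGS):
    """Return the words the shell produces for the given expansion."""
    # printf reuses its format for every argument, so each resulting
    # word is printed on its own line between angle brackets.
    script = "printf '<%s>\\n' " + expansion
    result = subprocess.run(["/bin/sh", "-c", script, "sh", *args],
                            capture_output=True, text=True, check=True)
    return [line[1:-1] for line in result.stdout.splitlines()]


print(words_seen('$*'))     # ['my', 'file.dvi', 'other.dvi']
print(words_seen('"$*"'))   # ['my file.dvi other.dvi']
print(words_seen('"$@"'))   # ['my file.dvi', 'other.dvi']
```

Only the last form hands the two file names on as the two words they started out as.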

That's the only way to do it, and all you get are the positional parameters. With desktop entry files, there are several different list-type parameter variables, so they can't all use that mechanism.

One way around that is to use wordexp() first to do the hard parts of word expansion, and then manipulate the argument vector afterwards. Manipulating argument vectors is less error prone than manipulating command lines because there is no special interpretation to be done afterwards. Just call execve(), and your argument vector ends up in the executed program's main() function.
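
As an illustration of that last point, the Python sketch below starts a child process from an explicit argument vector, with no shell in between, and has the child report what it received. Python's subprocess.run() is used here in place of a direct execve() call, and the helper name run_with_argv is invented for the example.

```python
import json
import subprocess
import sys

# A tiny child program that reports the argument vector it received.
CHILD = "import json, sys; print(json.dumps(sys.argv[1:]))"


def run_with_argv(args):
    """Start the child with an explicit argument vector; no shell is involved."""
    result = subprocess.run([sys.executable, "-c", CHILD, *args],
                            capture_output=True, text=True, check=True)
    return json.loads(result.stdout)


awkward = ["my file.dvi", "$HOME", "`id`", "a;b", ""]
print(run_with_argv(awkward) == awkward)   # True: every word arrives unchanged
```

Spaces, dollar signs, backquotes, and even an empty word need no quoting at all, because nothing reinterprets them on the way.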

A good design for desktop entry Exec= lines would probably involve the following steps for transforming the text after the equal sign into an argument vector:

First, a couple of restrictions on the allowable values. All of the list variables (such as %F) must be on their own where they appear; they must not appear next to any character except a space or either end of the line. The string “ /../” is not permitted in the command line string.

Procedure 1.

  1. For each variable occurring in the string that is not a list variable, if the percent sign is not immediately prefixed by an odd number of backslash characters, replace both the percent sign and its variable letter x with ${x}. For example, replace %f with ${f}.

  2. For each variable occurring in the string that is a list variable, if the percent sign is not escaped as in the previous step, replace both the percent sign and its variable letter X with /../X.

  3. Call wordexp() to split the command line into words and form an argument vector, taking care to pass the WRDE_NOCMD flag, which instructs wordexp() to return an error if command substitution would be performed.

  4. Build a new argument vector by copying, word by word, the one created by the call to wordexp(). For each word that starts “/../”, replace it with one word for each of the values that the relevant variable is a list of.
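
The procedure can be roughed out in Python as follows. This is a sketch under several assumptions, not a reference implementation: shlex.split() stands in for wordexp(), so the ${x} expansion is done by hand afterwards without field splitting and no other shell expansions are reproduced; command substitution is refused with a crude textual check; the set of list variables is a guess; and the “on their own” restriction is only checked for words that start with the marker. The name build_argv is invented for the example.

```python
import re
import shlex

LIST_VARS = {"F", "U"}  # assumption: which letters denote list variables

# A percent sign preceded by an even number of backslashes (including none).
VAR = re.compile(r"(?<!\\)((?:\\\\)*)%([A-Za-z])")


def build_argv(exec_line, single, lists):
    """Turn an Exec= string into an argument vector, roughly per Procedure 1."""
    if " /../" in exec_line:
        raise ValueError("the marker string is not permitted in Exec= lines")

    # Steps 1 and 2: %x becomes ${x}, list variable %X becomes /../X.
    def mark(match):
        backslashes, letter = match.groups()
        if letter in LIST_VARS:
            return backslashes + "/../" + letter
        return backslashes + "${" + letter + "}"
    marked = VAR.sub(mark, exec_line)

    # Step 3 (stand-in for wordexp() with command substitution refused).
    if "`" in marked or "$(" in marked:
        raise ValueError("command substitution is not allowed")
    words = shlex.split(marked)

    # Step 4: copy word by word, expanding markers into one word per value.
    argv = []
    for word in words:
        if word.startswith("/../"):
            letter = word[4:]
            if letter not in LIST_VARS:
                raise ValueError("list variable must stand on its own")
            argv.extend(lists.get(letter, []))
        else:
            # Stand-in for the shell's ${x} expansion; no field splitting.
            argv.append(re.sub(r"\$\{([A-Za-z])\}",
                               lambda m: single.get(m.group(1), ""), word))
    return argv


print(build_argv('appname --open "%f" %F',
                 {"f": "my file.dvi"}, {"F": ["a b.txt", "c.txt"]}))
# ['appname', '--open', 'my file.dvi', 'a b.txt', 'c.txt']
```

Note that an empty list simply contributes no words, so build_argv('appname %F', {}, {"F": []}) gives ['appname'] rather than an argument vector with a stray empty string.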

The string “/../” was chosen for marking the list variables because it is not subject to expansion or substitution, and while formed of valid file name characters, is not a useful file name itself. My suggestion for handling desktop entries does not include executing the shell to handle pipelines and sequences of commands as in the section called “Mailcap”; there seems little need in this application.