Yawar Amin

Quick-and-dirty pure command-line arguments in OCaml

It is well known by now that the Cmdliner package is the way to go for powerful, pure command-line argument parsing in OCaml, unless you are already in the Jane Street ecosystem.

But what about the OCaml standard library's own command-line parsing module, Arg? The conventional wisdom is that it relies on mutation, so it's not purely functional and hence possibly unsafe. This is not necessarily the case. It is fairly easy to wrap the usage of Arg in such a way that it becomes purely functional and perfectly safe.

If you have relatively simple command-line parsing needs, then it may be quicker and simpler to just use the built-in module rather than pulling in yet another dependency. Let's look at an example.

The example CLI

Say we have the following CLI:

$ ./cmdarg -help
cmdarg [-w <time>] [-r <repeat>] <msg>

Prints <msg> out to standard output.

  -w Time in seconds to wait before printing the message [default 0]
  -r How many times to print the message [default 1]
  -help  Display this list of options
  --help  Display this list of options

We will implement this in two steps:

  1. Define a record type to represent the fully-parsed values from the command line
  2. Define a parse function to actually parse the command line and return a record filled with the correct values

Here's the annotated code:

(* cmdarg.ml *)

module Cmd = struct
  type t = { wait : int; repeat : int; msg : string }

Note, all three fields are required. We have defaults for two of them, and the other one must be provided by the user.

  let usage = "cmdarg [-w <time>] [-r <repeat>] <msg>

Prints <msg> out to standard output.
"

The rest of the usage message will be printed by the Arg module's parser function itself.

  let parse () =
    let wait = ref 0 in
    let repeat = ref 1 in
    let msg = ref None in

We set up the mutable variables inside the Cmd.parse function, where they are not visible to callers. This is the key to making the whole thing pure.

    let specs = [
      "-w", Arg.Set_int wait, "Time in seconds to wait before printing the message [default 0]";
      "-r", Arg.Set_int repeat, "How many times to print the message [default 1]";
    ]

This is a list of 3-tuples (option name, action, help text) that describes the command-line options. The design is quite clever: Arg takes the mutable refs and sets them as it parses the command line and comes across the corresponding options. If an option is not found, its ref stays at its default value.
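Set_int is only one of the actions the Arg module offers. The following is a standalone sketch, separate from cmdarg.ml, that shows three others from the standard library: Arg.Set, Arg.Set_string and Arg.Int. The option names -v, -o and -n and the function parse_flags are made up for illustration, and the sketch uses Arg.parse_argv with an explicit array so that it can be run without a real command line.

```ocaml
(* Standalone sketch, not part of cmdarg.ml. The option names -v, -o and -n
   are hypothetical. *)
let parse_flags argv =
  let verbose = ref false in
  let output = ref "a.out" in
  let count = ref 0 in
  let specs = [
    (* Arg.Set flips a bool ref to true when the flag is present. *)
    "-v", Arg.Set verbose, "Enable verbose output [default off]";
    (* Arg.Set_string stores the next argument in a string ref. *)
    "-o", Arg.Set_string output, "Output file name [default a.out]";
    (* Arg.Int calls a function with the next argument parsed as an int. *)
    "-n", Arg.Int (fun n -> count := !count + n), "Add <n> to a running total";
  ] in
  (* Anonymous arguments are ignored in this sketch. *)
  let anon _ = () in
  (* ~current:(ref 0) makes parsing start right after argv.(0) without
     touching the global Arg.current counter. *)
  Arg.parse_argv ~current:(ref 0) argv specs anon
    "flags [-v] [-o <file>] [-n <n>]";
  (!verbose, !output, !count)
```

An option that appears more than once simply triggers its action more than once, which is why the two -n values in a call like parse_flags [| "flags"; "-n"; "2"; "-n"; "3" |] add up.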

    in
    let anon str = msg := Some str in

This function says what to do when we come across an 'anonymous' argument, i.e. one not preceded by an option. In this case, we wrap it in Some and assign that to the msg ref.

    Arg.parse specs anon usage;

Here is where the mutation happens. Arg.parse takes the specs, the anonymous argument handler, and the usage message, reads the program's command line, and fills in the previously defined refs as needed.
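Arg.parse also handles failures for you: on a bad option or on -help it prints a message and exits the program. If you would rather see those outcomes as values, the standard library's Arg.parse_argv raises Arg.Bad and Arg.Help instead of exiting. Here is a minimal standalone sketch of that, not part of cmdarg.ml; the outcome type and the try_parse function are names invented for this example.

```ocaml
(* Standalone sketch: classify the outcome of parsing instead of exiting.
   The type and function names here are hypothetical. *)
type outcome = Parsed of int | Help_requested | Rejected

let try_parse argv =
  let wait = ref 0 in
  let specs =
    [ "-w", Arg.Set_int wait, "Time in seconds to wait [default 0]" ]
  in
  (* Anonymous arguments are ignored here to keep the sketch small. *)
  match Arg.parse_argv ~current:(ref 0) argv specs ignore "demo [-w <time>]" with
  | () -> Parsed !wait
  (* Raised when -help or --help is given; carries the usage text. *)
  | exception Arg.Help _ -> Help_requested
  (* Raised for an unknown option or a value that is not an int. *)
  | exception Arg.Bad _ -> Rejected
```

For example, try_parse [| "demo"; "-w"; "3" |] yields Parsed 3, while passing "-w"; "abc" yields Rejected.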

    {
      wait = !wait;
      repeat = !repeat;
      msg = match !msg with
        | Some m ->
          m
        | None ->
          Arg.usage specs usage;
          invalid_arg "<msg> is required";
    }

Here we create and return the actual record value by reading the refs after parsing is finished. Since msg is wrapped in an option, we need to extract it and error out if the user didn't provide it. If the anonymous argument had actually been optional, we could have provided a default instead, or collected the values into a list to handle more than one anonymous argument. Note that with the handler above, if the user passes several anonymous arguments, the last one wins.
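To make the 'default or list' idea concrete, here is a small standalone sketch, separate from cmdarg.ml. It collects every anonymous argument into a list and falls back to a default when none is given. The function name parse_msgs and the default message "hello" are assumptions made up for this example.

```ocaml
(* Standalone sketch: gather all anonymous arguments, with a fallback. *)
let parse_msgs argv =
  let msgs = ref [] in
  (* Arg calls the anonymous handler once per anonymous argument, in
     command-line order, so prepend each one and reverse at the end. *)
  let anon str = msgs := str :: !msgs in
  (* No options at all in this sketch, hence the empty spec list. *)
  Arg.parse_argv ~current:(ref 0) argv [] anon "demo [<msg> ...]";
  match List.rev !msgs with
  | [] -> [ "hello" ]  (* hypothetical default when no message is given *)
  | given -> given
```

As before, the list ref stays local to the function, so callers only ever see the final immutable list.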

The critical point is that we never expose the internal refs to the caller, only the immutable record value created after parsing the command line. When a function mutates only its own local state and callers can't observe that mutation, it is pure from the caller's point of view.
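One way to convince yourself of this is to call the parser twice and check that the results don't interfere. The sketch below is a variant of the module above, not the article's exact code: the module name Cmd_argv is invented, argv is passed in as a parameter via Arg.parse_argv, and a missing <msg> is reported as an Error value rather than an exception. Both choices are assumptions made so the function can be exercised without a real command line.

```ocaml
(* Standalone variant of Cmd for experimentation; Cmd_argv is a
   hypothetical name. *)
module Cmd_argv = struct
  type t = { wait : int; repeat : int; msg : string }

  let usage = "cmdarg [-w <time>] [-r <repeat>] <msg>"

  (* Same shape as Cmd.parse, but argv is a parameter. The refs are created
     afresh on every call, so no state leaks between calls. Arg.parse_argv
     may still raise Arg.Bad or Arg.Help; they are not caught here. *)
  let parse argv =
    let wait = ref 0 in
    let repeat = ref 1 in
    let msg = ref None in
    let specs = [
      "-w", Arg.Set_int wait, "Time in seconds to wait [default 0]";
      "-r", Arg.Set_int repeat, "How many times to print [default 1]";
    ] in
    let anon str = msg := Some str in
    Arg.parse_argv ~current:(ref 0) argv specs anon usage;
    match !msg with
    | Some m -> Ok { wait = !wait; repeat = !repeat; msg = m }
    | None -> Error "<msg> is required"
end

(* Two calls with different inputs: the -w 2 from the first call does not
   carry over into the second. *)
let first = Cmd_argv.parse [| "cmdarg"; "-w"; "2"; "hi" |]
let second = Cmd_argv.parse [| "cmdarg"; "bye" |]
```

Passing ~current:(ref 0) matters here: without it, Arg.parse_argv uses the global Arg.current counter, which would be shared between calls.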

end

let () =
  let { Cmd.wait; repeat; msg } = Cmd.parse () in
  for _ = 1 to repeat do
    Unix.sleep wait;
    print_endline msg
  done

Finally, we call Cmd.parse and get the parsed command-line options in a nice, convenient record value that we destructure and use. Compile and test with:

ocamlopt -o cmdarg unix.cmxa cmdarg.ml
./cmdarg

Conclusion

This was a simple example, but with some care you can do fairly sophisticated parsing. E.g., imagine getting hostnames and port numbers on the command line and parsing them internally into Unix.sockaddr addresses, so that it's super convenient to open sockets.
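As a rough sketch of that idea, the code below validates a host:port argument inside the option's action. Everything named here is hypothetical: the -connect option, the endpoint record, and the two functions. To keep the sketch free of the unix library it stops at a plain record; turning that record into a Unix.sockaddr would be one further step that is not shown.

```ocaml
(* Standalone sketch; all names are invented for illustration. *)
type endpoint = { host : string; port : int }

(* Split "host:port" on the last colon and validate the port range. *)
let endpoint_of_string s =
  match String.rindex_opt s ':' with
  | None -> Error "expected <host>:<port>"
  | Some i ->
    let host = String.sub s 0 i in
    let port = String.sub s (i + 1) (String.length s - i - 1) in
    (match int_of_string_opt port with
     | Some p when host <> "" && p > 0 && p < 65536 -> Ok { host; port = p }
     | _ -> Error "invalid host or port")

let parse_endpoint argv =
  let endpoint = ref None in
  let set s =
    match endpoint_of_string s with
    | Ok e -> endpoint := Some e
    (* Raising Arg.Bad lets Arg report this like any other bad option. *)
    | Error why -> raise (Arg.Bad why)
  in
  let specs = [ "-connect", Arg.String set, "<host>:<port> to connect to" ] in
  Arg.parse_argv ~current:(ref 0) argv specs ignore
    "demo -connect <host>:<port>";
  !endpoint
```

The caller receives an already validated value (or None if the option was absent), so the rest of the program never has to look at the raw string.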

It's probably possible to handle even more complex scenarios, e.g. subcommands like git, but in my opinion at that point it's probably worth using Cmdliner.

Top comments (2)

pricesmith

learning ocaml and this is helpful. do you have a git repo for this? Being so new to the syntax, I'm not 100% sure how some of these lines fit together :o

Yawar Amin

Hi, I don't have a repo, but all the OCaml sources shown here go into a single file, cmdarg.ml. You can just copy-paste all the lines exactly as shown, piece by piece. Then the compilation command should work.