2012-05-12

Thoughts on using function signatures as a DSL for CLI parsers

I have no idea why, but this morning I thought about a decorator for delineating what function should be treated as the main function (e.g. using a decorator instead of the traditional if __name__ == '__main__' idiom). Now I solved it in my head on the spot, and then immediately realized someone had to have solved this already. Turns out various people have done things as nuts as examine stack levels to detect the __main__ name, but the most straight-forward solution I found doesn't do anything nearly as nuts or CPython-specific and is basically what I came up with. There was a red herring, though, in everyone's solution where they claim the decorator has to be on the last function in your module. While technically true when using the decorator as a decorator only, you can also just as easily not decorate the function and instead, at the end of your module, do something like main(func) since that is the same as decorating func with main.

A really simple expansion of this idea of helping out with defining what function is the main function, is to pass in sys.argv and to return a value to signify exit status: sys.exit(func(sys.argv[1:])). So now you have made the decorator more useful than replacing the old __name__ idiom.

But while that is nice and helps deal with the very common case, I wanted more. Why can't you introspect on the arguments the function takes and use that to automatically generate a command-line parser? I did a search and the best I could find is entrypoint, but it doesn't go far enough for me. What I want is to use the full expressiveness of function parameters in Python to express as much about what should/could be given on the command-line along with passing in as little as possible to the decorator in order to replicate the common case of command-line parsing; think just as easy as getopt but more powerful by using as much of argparse as you can without coming up with complicated rules about how things should work (since once you pass a certain complexity threshold you should just build the argument parser using argparse's API directly and stop trying to optimize for it like I'm suggesting).

So what do we have at our disposal to build such a decorator? We have positional arguments so we know how many arguments are required without some specific qualifier. We have variable positional arguments (e.g. *args) to take an optional number of extra arguments at the end of the command-line. We have keyword arguments which are optional flags that one can specify. You could even have variable keyword arguments for major flexibility, but that just seems like a total lack of structure the CLIs just don't typically provide. With all of that you can reproduce getopt without any issue for long-form names. For short names, I would say you need to pass in a mapping of short names to long names into the decorator. Same goes for long names to help string (you can use the function's docstring for the main help for the app itself).

But where things get really interesting is when you take into consideration function annotations. That opens up the possibility of going beyond getopt and potentially supporting argparse's action, nargs, and type options. Take the type option as an example. You could say limit:int=10 to have a command-line option called --limit which only accepted an integer and defaulted to 10.  This obviously could also work with float or any other type where you can just pass in a string to the constructor to get back an instance of the type. So you have a general case which can be useful, but you can you potentially special-case some things to get enhanced functionality where it doesn't make sense to simply take in a string?

Lists pose an interesting option as argparse provides both nargs for specifying the number of arguments to a single option, or the append action for accepting multiple instances of the same option and accumulating them. In my mind both can be expressed in a way that I think makes sense but some might view as too magical. If you specify names:list=[], then that supports the append action, e.g. --names Brett --names Andrea leads to names being set to ['Brett', 'Andrea']. But if you were to do names:['+']=[], then that would get the same result from --names Brett Andrea. In other words, the list type specifies the append action while a list instance specifies using the nargs option with the single item in the list acting as the value to set to nargs.

For booleans, I would want the use of the bool type to mean use either the store_true or store_false action based on what the default argument was. So turn_on:bool=True would use the store_false action since the argument is meant to be a boolean and it's default value is True, meaning that if the option was specified it represents the reverse.

Finally, the tricky bit is for files since that is a common command-line argument and you might as well open the file and close it for the function. The solution argparse uses is a specific FileType class where you can pass specific arguments to use when opening the file. The problem is that it doesn't support everything open() does, e.g. encoding. So what I would want to do instead is provide a partial function that took everything but the file path and then when it came time to call the main function, passed in the file path to the partial function, passed the returned file to contextlib.closing(), and then passed it on to the main function. You could even generalize a lot of this and simply say that whatever is specified as the function annotation, if it isn't a special-case like lists, then you call the annotation with what came from the command-line and if it provides a context manager it is used before calling the main function.

So those are my thoughts on using function parameters as a DSL for getopt++/argparse-- functionality on a Saturday morning. Honestly the most complicated bit would be constructing the arguments to pass to the main function in the right order, otherwise it's just introspecting on a function's parameters and making the proper call to argparse. But then again the real question is whether anyone thinks this at all sounds reasonable enough to code it up.