2013-02-01

Remember that the "BC" in ABC means "Base Class"

[UPDATE: had a talk with +Thomas Wouters on IM and has caused me to rethink things. New thoughts up top, original post after the jump break]

I had mis-heard a comment Thomas Wouters made in a meeting about not raising NotImplementedError and using ABCs, which led to me thinking about the problem of having ABCs in your MRO which were not at the bottom and defined methods which would override methods you wanted to access farther down the inheritance chain. I had thought that calling super() in your ABCs in some manner was the solution. But after discussing things with Thomas I believe I was in the wrong and I had badly misheard what he had said. =)

Because importlib has a bunch of overlapping ABCs which inherit from each other I thought that the situation might come up where you inherited from two different classes which had overlap in methods but for which you would want them to build off of each other. But as Thomas pointed out to me, ABCs are meant to be at the bottom of an MRO; the BC stands for "Base Class" for a reason. If you are trying to interleave methods between two different classes implementing the same interface then either the granularity of the ABC is wrong or you shouldn't be inheriting from the ABC to begin with. I tried to come up with counter-examples but they were so convoluted and leading to bad API design in order to justify that I gave up and admitted they were stupid.

What does all of this mean for you when you are writing an ABC? You should just treat your ABCs as the bottom of your MRO. That means you should have all of your methods, even the abstract ones, do something sensible in case they are somewhat blindly reached through a super() call. If you have a default return value, return that. If that does not exist then you should raise the exception which signifies failure as defined by the API. But raising NotImplementedError is not the right thing to do when it can be avoided with a sensible default reaction (which I have not been doing in importlib.abc). This also has a nice side benefit of making sure you clearly define what the default reaction is for a call to the method.


[ORIGINAL POST]

I was in a meeting yesterday with Google's Python team and the point of using super() in a base class and not raising NotImplementedError came up. Thomas Wouters said you should always use it and that he would discuss it with the other engineer offline as the rest of us "already knew what to do". And that's when I realized I had forgotten do use super() in importlib.abc and was doing exactly what I shouldn't be. Lesson learned: code you work on over the span of five years may not follow best practices. =)

If you don't have much experience with super(), read the post by +Raymond Hettinger which explains best practices. Unfortunately it doesn't cover ABCs which have their own set of best practices in terms of super(). In a nutshell, because multiple inheritance doesn't guarantee that you will be the bottom of the MRO even when it comes to ABCs, you should expect to have every method which doesn't actually implement anything fall through using super() when appropriate. E.g. if a method's return value doesn't have a sensible default (e.g. it's not all about the side-effects), then you should blindly call super(), else call super() only when it won't fail.

To illustrate, let's work with importlib.abc.MetaPathFinder (and I should mention all code samples in this post are assumed for Python 3):

  1. class MetaPathFinder(Finder):
  2.     @abc.abstractmethod
  3.     def find_module(self, fullname, path):
  4.         """Abstract method which, when implemented, should find a module.
  5.        The fullname is a str and the path is a str or None.
  6.        Returns a Loader object.
  7.        """
  8.         raise NotImplementedError
  9.     def invalidate_caches(self):
  10.         """An optional method for clearing the finder's cache, if any.
  11.        This method is used by importlib.invalidate_caches().
  12.        """
  13.         return NotImplemented

What is wrong with this code? To start, what happens if I call MetaPathFinder.find_module() as part of a chain of super() calls? I get an exception, which is not really acceptable if this ABC is not at the bottom of the MRO for a class since a proper return value is expected for the method (one could argue that returning None is an acceptable default result in this specific instance, but a finder which never finds anything makes no sense in general so I'm not even entertaining that idea). What should really happen is return super().find_module(fullname, path). That way if someone calls this method in hopes of reaching a find_module() method farther down the MRO then they will be able to. And if no such method is defined? They still get an exception (albeit a different one) and so the (rough) semantics stay the same. Granted, most times this MRO will be the second-to-last class in an MRO (object is always last), but this is still an improvement worth making just in case.

What about MetaPathFinder.invalidate_caches()? That has no discernible return value (the method is all about its side-effects when implemented), so we don't have to worry about a call being a no-op as that is an acceptable result. That means invalidate_caches() should turn into:


  1. super() = super_
  2. if hasattr(super_, 'invalidate_caches'):
  3.   super().invalidate_caches()
  4. # Or as a one-liner
  5. return getattr(super(), 'invalidate_caches', lambdaNone)()


This then allows the no-op to occur if there is no one else to fall-through to, else the fall-through is allowed without error (I'm also relying on Python's semantics of returning None by default plus the semantics of the method not having an important return value).

One method that can be a little sticky if not handled properly is the __init__() method. If you read Raymond's blog post you will notice he recommends that you strip off arguments as they are consumed so that when you hit near the bottom of your MRO there are no arguments, letting it hit object.__init__() which takes no arguments. But if you don't do that and don't pay attention to where you are in the MRO you can easily pass something to object.__init__() and get a TypeError. As a challenge I decided to see if I could solve the problem somehow with code instead of convention and I came up with:


  1. def super___init__(super_, *args, **kwargs):
  2.   mro = super_.__self_class__.mro()
  3.   # Check this is not the relevant end of the MRO.
  4.   if mro[-1] != super_.__thisclass__:
  5.     super_.__init__(*args, **kwargs)
  6.   else:
  7.     assert mro[-1] is object  # Always at the bottom.


This allows you to make a call like super___init__(super(), *args, **kwargs) from within your own __init__() and not have to worry about whether you have properly stripped everything out of args and kwargs. Now I don't necessarily recommend doing this instead of continually stripping arguments, but this could help in situations where you would have an adapter class instead to catch the bottom of the MRO from calling into object.__init__() (I suspect Raymond didn't suggest this over adapter classes as it won't work with classic classes in Python 2).