More lessons learned from 7 years of annotating a large code base
If you haven't read the first part of this series, it's not strictly necessary to understand this article, but it's worth a read. This article is more focused on topics that relate to the typing of classes, whereas the other is more focused on general concepts and functions.
A quick primer on generics and variance
This is an advanced topic and the better you comprehend it, the less time you'll spend solving type errors via trial and error. It's especially important once you start creating your own generic classes, so the first thing you should do is read up on variance of generic types in the mypy docs.
Now that you've read all of that, let's forge ahead.
Consider this simple example:
class Employee:
    def work(self) -> None:
        pass

class Manager(Employee):
    def manage(self) -> None:
        pass

def do_work(x: Employee) -> None:
    x.work()

do_work(Employee())
do_work(Manager())
mypy passes with flying colors. We can pass a `Manager` instance to `do_work` because `Manager` is considered a subtype of `Employee`. Subclasses are subtypes. Easy enough. But it gets more complicated when we introduce generics.
Consider this:
from typing import Generic, TypeVar

T = TypeVar('T')

class Employee:
    def work(self):
        pass

class Manager(Employee):
    def manage(self):
        pass

class Team(Generic[T]):
    pass

def do_team_work(x: Team[Employee]) -> None:
    pass

team = Team[Employee]()
do_team_work(team)
management_team = Team[Manager]()
do_team_work(management_team)  # mypy error!
Is `Team[Manager]` a subtype of `Team[Employee]`? Intuitively it seems like it should be, but in fact it is not! The code above produces the following error:
Argument 1 to "do_team_work" has incompatible type "Team[Manager]"; expected "Team[Employee]"
This is pretty unintuitive. Now it's time to RTFD!
From the mypy docs on variance of generic types:
By default, mypy assumes that all user-defined generics are invariant.
`Team` is a user-defined generic, so it's invariant. What does that mean?
The first thing we need to understand is what is meant by "generic". Essentially, this means a container type. For example, a `list` is a container type: it holds other objects, as in `list[str]`. A dictionary is also a container type: it holds other objects in the form of keys and values, as in `dict[str, int]`.
We intuitively understand that a subclass is a subtype of its parent class. For example, in the code below, it's obvious that `B` is a subtype of `A`.
class A:
    pass

class B(A):
    pass
"Variance" is all about understanding how and why generics -- i.e. containers -- are subtypes of each other.
To understand this, we have to review the 3 types of variance: invariant, covariant, and contravariant. Here's my simplified version of the mypy docs:
Given these classes:
from typing import Generic, TypeVar

T = TypeVar("T")

class A:
    pass

class B(A):
    pass

class Thing(Generic[T]):
    pass
Here's how we can think about the variance of some generic type `Thing`:
Variance type | Rule description | Rule code |
---|---|---|
covariant | `Thing[T]` is covariant if `Thing[B]` is always a subtype of `Thing[A]` | `issubclass(Thing[B], Thing[A])` is `True` |
contravariant | `Thing[T]` is contravariant if `Thing[A]` is always a subtype of `Thing[B]` | `issubclass(Thing[A], Thing[B])` is `True` |
invariant | `Thing[T]` is called invariant if neither of the above is true | `issubclass(Thing[B], Thing[A])` is `False` and `issubclass(Thing[A], Thing[B])` is `False` |
This definition is recursive: if `A` and `B` are themselves generic types, then to know whether `B` is a subtype of `A` we must again refer to their variance.
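Contravariance is the least intuitive of the three, so here is a minimal sketch of where it shows up. The `Reviewer` class and `review_manager` function are hypothetical names invented for this illustration. The idea is that a generic which only consumes values of type `T` (and never hands them back) can safely be contravariant:

```python
from typing import Generic, TypeVar

T_contra = TypeVar("T_contra", contravariant=True)


class Employee:
    def __init__(self, name: str) -> None:
        self.name = name


class Manager(Employee):
    pass


class Reviewer(Generic[T_contra]):
    """Hypothetical consumer: it accepts T values but never returns them."""

    def review(self, subject: T_contra) -> str:
        return f"reviewed {getattr(subject, 'name', subject)}"


def review_manager(reviewer: Reviewer[Manager], manager: Manager) -> str:
    return reviewer.review(manager)


# A Reviewer[Employee] knows how to review any Employee, so it can stand in
# wherever a Reviewer[Manager] is expected. mypy accepts this call only
# because T_contra is declared contravariant.
employee_reviewer = Reviewer[Employee]()
result = review_manager(employee_reviewer, Manager("dana"))
```

Note that the direction of the subtype relationship is flipped compared to the classes themselves: `Reviewer[Employee]` is the subtype of `Reviewer[Manager]`.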
We want `Team[Manager]` to be a subtype of `Team[Employee]`, so referring to the table above we need the first one, covariance. Here's how we do this:
from typing import Generic, TypeVar

T_co = TypeVar('T_co', covariant=True)

class Employee:
    def work(self):
        pass

class Manager(Employee):
    def manage(self):
        pass

class Team(Generic[T_co]):
    pass

def do_team_work(x: Team[Employee]) -> None:
    pass

team = Team[Employee]()
do_team_work(team)
management_team = Team[Manager]()
do_team_work(management_team)  # <-- NO mypy error!
This does not mean that you should use `covariant=True` for every `TypeVar` that you define! A covariant `TypeVar` should be reserved for immutable generics -- containers which cannot have their members added or removed after instantiation. If it is not immutable, then you subvert the variance protection that mypy provides. This is why `Sequence` is covariant, but `List` is not -- `Sequence` does not provide any methods for modifying its contents.
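To see what that protection buys us, here is a small sketch of what would go wrong if a mutable container were treated as covariant. The `add_new_hire` function is a hypothetical name for this illustration, and the `# type: ignore` comment is there only to silence the error that mypy correctly reports:

```python
from typing import List


class Employee:
    def work(self) -> str:
        return "working"


class Manager(Employee):
    def manage(self) -> str:
        return "managing"


def add_new_hire(team: List[Employee]) -> None:
    # Appending an Employee is perfectly valid for a List[Employee] ...
    team.append(Employee())


managers: List[Manager] = [Manager()]

# mypy rejects this call because List is invariant. Nothing stops it at
# runtime, so we silence the error to see the consequences.
add_new_hire(managers)  # type: ignore[arg-type]

# ... and now our "list of managers" contains an object that cannot manage.
non_managers = [m for m in managers if not hasattr(m, "manage")]
```

If `add_new_hire` had accepted a `Sequence[Employee]` instead, it could not have called `append`, and passing `managers` would have been both accepted by mypy and safe.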
It seems to be a pretty well-established convention to use `_co` and `_contra` suffixes for covariant and contravariant TypeVars respectively.
How to deal with classes that instantiate attributes outside of __init__
It often comes up that a class has a function to refresh its instance variables, and the principles of code reuse dictate that we use that function in our `__init__` to initialize the variables as well:
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.age: int = None  # error!
        self.location: str = None  # error!
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()
mypy produces the following errors:
error: Incompatible types in assignment (expression has type "None", variable has type "int")
error: Incompatible types in assignment (expression has type "None", variable has type "str")
The naive solution to this is to make `self.age` and `self.location` `Optional`. However, in this case that is not what we want, because in our contrived example `read_from_db()` always returns non-None values, and we don't want code that uses our `Person` to have to add `is None` checks everywhere for these attributes.
Here's one solution:
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()
This works because mypy infers an attribute's type from its first assignment, which here is the tuple unpacking in `refresh()`. The downside is that the variables and their types are not front-and-center in the `__init__` where we expect them, so developers reading your code may miss them if they don't go hunting. Also, it's somewhat brittle, since shuffling method order during a refactor could lead to some other line being the first assignment of these variables (not in this example, obviously, because there are only two methods and one assignment for each attribute).
Here's an alternative solution:
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    age: int
    location: str

    def __init__(self, name: str) -> None:
        self.name = name
        self.refresh()

    def refresh(self) -> None:
        self.age, self.location = read_from_db()
This is valid, but we have to be careful when we use this technique. We must assign `self.age` and `self.location` as early in the life-cycle of `Person` as possible, because mypy now believes they are always set and non-None.
Consider this example:
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person(object):
    age: int
    location: str

    def __init__(self, name: str) -> None:
        self.name = name
        # self.refresh() not called!

    def refresh(self) -> None:
        "You must call this manually!"
        self.age, self.location = read_from_db()

p = Person('chad')
next_year = p.age + 1  # runtime error!
mypy will not complain, because it believes `p.age` is an `int`. However, this code will fail at runtime with an `AttributeError`, because `p.age` has never been assigned. (Writing `age: int = None` in the class body to give it a value would not help: with strict optional checking, which is mypy's default, that assignment is itself reported as an incompatible type.)
When to use `assert` and `typing.cast`
`typing.cast` and `assert` are easy ways to resolve mypy errors, especially with `Optional` types, but they should be used as a last resort.
`assert` can be used to narrow a type in the same way that you can with an `if`/`else` statement, but directly in the current scope.
`typing.cast` also changes the type that mypy sees, but it performs no check at runtime: it simply returns its argument unchanged.
These two approaches should only be used to correct mypy oversights that can't be corrected by other means.
Why? An `assert` is a sanity check: it means "if everything is working, this statement should never fail". If there is any chance that it could fail, you should `raise` an error with a proper exception type.
In the case of `typing.cast`, mypy will blindly change the type of the variable to whatever you cast it to, and you may be wrong! There is no runtime check to keep you honest. I almost always reserve `cast` for scenarios that involve "imaginary" types that can't be used in an `isinstance` statement, such as `TypeVar`s.
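As an illustration of the `TypeVar` case, here is a minimal sketch. The `_SETTINGS` store and the `load_setting` function are hypothetical, standing in for any source of loosely typed data such as parsed JSON:

```python
from typing import Dict, TypeVar, cast

T = TypeVar("T")

# Hypothetical untyped settings store (assumed for this sketch).
_SETTINGS: Dict[str, object] = {"retries": 3, "region": "us-east"}


def load_setting(name: str, default: T) -> T:
    """Return the stored setting, typed like the supplied default."""
    raw = _SETTINGS.get(name, default)
    # isinstance(raw, T) is not possible: T only exists for the type checker.
    # The cast is correct only as long as stored values have the same type as
    # the defaults that callers pass in. Nothing verifies that at runtime.
    return cast(T, raw)


retries = load_setting("retries", 0)     # mypy sees an int
timeout = load_setting("timeout", 30.0)  # mypy sees a float (the default)
```

The comment next to the cast records the assumption that makes it safe, which is exactly the part that mypy can no longer check for us.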
Let's consider this example adapted from above:
from typing import Optional, Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person(object):
    def __init__(self, name: str) -> None:
        self.name = name
        self.age: Optional[int] = None

    def load_age(self) -> None:
        "You must call this manually!"
        self.age = read_from_db()[0]

    def is_younger_than(self, other_age: int) -> bool:
        self.load_age()
        return self.age < other_age  # mypy error!
`self.age` is `Optional` inside `is_younger_than`, because mypy is not able to track conditional changes in state that happen outside of a function.
We can solve this by adding an `assert`:
    def is_younger_than(self, other_age: int) -> bool:
        self.load_age()
        assert self.age is not None, \
            "self.age is set by load_age"
        return self.age < other_age
When it comes to typing, it is a good habit to add an explanation for every assert or cast. Explain why you believe that the assertion should not fail or the cast is correct at this point in the program's execution. What logical assurances do we have that this will always be safe? If you can't explain it, then you should consider raising a proper exception.
A better alternative is almost always to restructure things to allow static typing to pass. For example, you could try redesigning your API so that the values are always set on `__init__` (e.g. by providing alternate instantiators as classmethods).
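A minimal sketch of the classmethod approach might look like the following. The `from_db` name and the stubbed return value of `read_from_db` are assumptions made for this illustration:

```python
from typing import Tuple


def read_from_db() -> Tuple[int, str]:
    """Stand-in for a real database lookup (stubbed for this sketch)."""
    return 42, "Lisbon"


class Person:
    def __init__(self, name: str, age: int, location: str) -> None:
        # Every attribute is assigned here, so none of them needs to be
        # Optional and no assert or cast is required later.
        self.name = name
        self.age = age
        self.location = location

    @classmethod
    def from_db(cls, name: str) -> "Person":
        """Alternate constructor: fetch the values first, then build."""
        age, location = read_from_db()
        return cls(name, age, location)

    def is_younger_than(self, other_age: int) -> bool:
        return self.age < other_age


p = Person.from_db("chad")
```

The trade-off is that a `Person` can no longer exist in a half-loaded state, which may or may not fit how the class is used elsewhere in your code base.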
Below is an example of using a cached property instead of an attribute to solve the situation outlined above:
from functools import cached_property
from typing import Tuple

def read_from_db() -> Tuple[int, str]:
    ...

class Person:
    def __init__(self, name: str) -> None:
        self.name = name

    @cached_property
    def age(self) -> int:
        "Loaded from the database on first access, then cached."
        return read_from_db()[0]

    def is_younger_than(self, other_age: int) -> bool:
        return self.age < other_age
Sometimes the cure is worse than the illness -- meaning refactoring the code to remove the type error would be too disruptive -- so it's up to you to decide whether an assert is the right solution, but you should use them sparingly.
Here's my order of preference:
1. Try to solve the problem with better typing. A type error often means A) the type of the variable is wrong, or B) the type of a function that the variable is being passed to is wrong. If the types don't line up with reality then I try my best to make them. This may require making an argument more permissive (using a protocol), adding overloads to a function, or generics to a class.
2. If option 1 fails, or the added complexity was untenable, but I know the actual type at runtime, then I do one of two things:
   - use an `assert` if I'm reasonably certain that the assertion will never fail, and provide a comment explaining why it will never fail.
   - `raise` a proper exception if there is a chance that it might fail.
3. As a final fallback, I use `typing.cast` and provide a comment explaining why the cast is necessary.
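For the "might fail" branch of option 2, here is a small sketch of raising a proper exception instead of asserting. The `AgeNotLoadedError` class is a hypothetical name introduced for this example:

```python
from typing import Optional


class AgeNotLoadedError(RuntimeError):
    """Hypothetical exception type for this sketch."""


class Person:
    def __init__(self, name: str) -> None:
        self.name = name
        self.age: Optional[int] = None

    def is_younger_than(self, other_age: int) -> bool:
        # Callers may forget to load the age, so this can genuinely happen:
        # raise rather than assert. After this check mypy narrows self.age
        # from Optional[int] to int, just as it would after an assert.
        if self.age is None:
            raise AgeNotLoadedError(f"age has not been loaded for {self.name}")
        return self.age < other_age
```

Unlike an `assert`, this check is not stripped when Python runs with the `-O` flag, and callers can catch the specific exception type.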
I highly recommend enabling `--warn-redundant-casts` so that you can be notified if a call to `cast` is attempting to cast an expression to the same type. A redundant cast can mislead developers into thinking that the type of the expression or variable was something other than the cast type.
Be mindful with tuples, they come in two forms
Tuples are unique in the typing system because they can be used in two different ways:
1. a specific, bounded, and possibly heterogeneous group of types, e.g. `Tuple[str, int]` is a 2-tuple containing a string and an integer
2. an unbounded sequence of homogeneous types, e.g. `Tuple[str, ...]` (note the ellipsis!) is a sequence of strings of arbitrary length
And of course you can also combine the two: Tuple[Tuple[str, int], ...]
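To make the two forms concrete, here is a short sketch. The variable names and the `oldest` function are made up for illustration:

```python
from typing import Tuple

# Form 1: fixed length, and each position has its own type.
name_and_age: Tuple[str, int] = ("chad", 42)

# Form 2: any length (including empty), and every item has the same type.
tags: Tuple[str, ...] = ("admin", "staff", "remote")
no_tags: Tuple[str, ...] = ()

# Combined: any number of (name, age) pairs.
roster: Tuple[Tuple[str, int], ...] = (("chad", 42), ("dana", 37))


def oldest(people: Tuple[Tuple[str, int], ...]) -> str:
    """Return the name paired with the largest age."""
    return max(people, key=lambda pair: pair[1])[0]
```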
When mypy infers the type of a tuple literal, it assumes the intent is option 1 by default. In order to tell mypy, "no I actually mean option 2", you must add an annotation. This is particularly annoying with inherited class variables.
class Base(object):
    valid_things = ('thing1',)

class A(Base):
    valid_things = ('thing1', 'thing2')  # error: expression has type "Tuple[str, str]", base class defined type as "Tuple[str]"
This results in an error because `Base.valid_things` is `Tuple[str]` and `A.valid_things` is `Tuple[str, str]`. Here's a naive solution to this problem:
from typing import Tuple

class Base(object):
    valid_things: Tuple[str, ...] = ('thing1',)

class A(Base):
    valid_things: Tuple[str, ...] = ('thing1', 'thing2')
Unfortunately, it is kind of verbose. One solution that's less verbose and maintains immutability is to use `frozenset` (assuming of course that order does not matter):
class Base(object):
    valid_things = frozenset(['thing1'])

class A(Base):
    valid_things = frozenset(['thing1', 'thing2'])
Or we could use the faux-immutability approach explained above in "Use abstract types to help enforce immutability":
from typing import Sequence

class Base(object):
    valid_things: Sequence[str] = ['thing1']

class A(Base):
    valid_things: Sequence[str] = ['thing1', 'thing2']
With the above solution you do need to redeclare the attribute type as `Sequence[str]` on each subclass, or it will be promoted to `List[str]` and will thus be mutable.
Get the benefits of `abc.ABCMeta` without the metaclass conflicts
The purpose of `abc.ABCMeta` is to raise an error at runtime if you instantiate a class that does not implement all of the methods marked as abstract on the abstract base class. Unfortunately, this can cause conflicts with other classes that use metaclasses, which is particularly irksome with third-party projects like `PyQt` and `PySide`, or with `typing.Generic` (before Python 3.7). The solutions to this can be quite unsavory.
mypy gives us a brand new solution to this problem: continue using the `abc` decorators, but don't use the `abc.ABCMeta` metaclass. mypy performs the same checks as `abc.ABCMeta`, but during static analysis rather than at runtime. Of course, this is only effective if your code is annotated well enough to track your abstract classes, but it's a great option if it is.
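Here is a minimal sketch of what that looks like. The `Shape`, `Square`, and `Blob` classes are hypothetical examples:

```python
from abc import abstractmethod


class Shape:
    # Note: no metaclass=ABCMeta and no ABC base class, so there is nothing
    # here that could conflict with another library's metaclass.
    @abstractmethod
    def area(self) -> float:
        raise NotImplementedError


class Square(Shape):
    def __init__(self, side: float) -> None:
        self.side = side

    def area(self) -> float:
        return self.side ** 2


class Blob(Shape):
    """Forgot to implement area()."""


square = Square(2.0)

# Python itself does not object to this line, because without ABCMeta there
# is no runtime enforcement. mypy, however, reports that Blob cannot be
# instantiated because area is still abstract.
blob = Blob()
```

The protection now lives entirely in the type checker, so it only helps on code paths that mypy actually analyzes.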
Did I miss anything? Feel free to leave comments with questions or other tips that I missed!