Like the articles? Buy the book! Dead Simple Python by Jason C. McDonald is available from No Starch Press.
Programming is often about waiting. Waiting for a function, waiting for input, waiting for a calculation, waiting for the tests to pass...
...waiting for Jason to write another Dead Simple Python already.
Wouldn't it be nice if your program waited for you for once? That's precisely what generators and coroutines do! We've been building up to this for the past three articles, but I'm happy to announce that the wait is over.
If you haven't yet read Loops and Iterators, Iterator Power Tools, and List Comprehensions and Generator Expressions, you should go through those first.
For everyone else, let's dive right in.
Meet the Generator
How would you generate a Fibonacci sequence of any length? Clearly there's some data you'd need to keep track of, and it would need to be manipulated in a certain way to create the next element.
Your first instinct might be to create an iterable class, and that's not a bad idea. Let's start with that, using what we already covered in the previous sections:
class Fibonacci:
    def __init__(self, limit):
        self.n1 = 0
        self.n2 = 1
        self.n = 1
        self.i = 1
        self.limit = limit
    def __iter__(self):
        return self
    def __next__(self):
        if self.i > self.limit:
            raise StopIteration
        if self.i > 1:
            self.n = self.n1 + self.n2
            self.n1, self.n2 = self.n2, self.n
        self.i += 1
        return self.n
fib = Fibonacci(10)
for i in fib:
    print(i)
If you've been following the series so far, there probably aren't any surprises there. However, that approach might feel a bit overpowered for something as simple as a sequence. There's certainly plenty of boilerplate.
This sort of situation is exactly what a generator is for.
def fibonacci(limit):
    if limit >= 1:
        yield (n2 := 1)
    n1 = 0
    for _ in range(1, limit):
        yield (n := n1 + n2)
        n1, n2 = n2, n
for i in fibonacci(10):
    print(i)
The generator is definitely more compact — only 9 lines long, versus 22 for the class — but it is just as readable.
The secret sauce is the yield keyword, which returns a value without exiting the function. yield is functionally identical to the __next__() function on our class. The generator will run up to (and including) its yield statement, and then will wait for another __next__() call before it does anything more. Once it does get that call, it will continue running until it hits another yield.
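To watch that pausing happen, here's a small sketch. The chatty() generator and the log list are made up purely for illustration; the point is the order in which the entries land in the log.

```python
log = []  # records the order in which things happen

def chatty():
    """Hypothetical generator that logs each time part of its body runs."""
    log.append("body started")
    yield "first"
    log.append("resumed after first yield")
    yield "second"
    log.append("body finished")

gen = chatty()                  # creates the generator; no body code runs yet
log.append("created")
log.append(f"got {next(gen)}")  # runs the body up to the first yield
log.append(f"got {next(gen)}")  # resumes, then runs up to the second yield
leftover = list(gen)            # resumes one last time; the body ends

print("\n".join(log))
```

Notice that "created" is logged before "body started": calling the function only builds the generator, and nothing inside it runs until the first next().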
NOTE: That strange-looking := is the new "walrus operator" in Python 3.8, which assigns AND returns a value. If you're on Python 3.7 or earlier, you can break these statements up into two lines (separate assignment and yield statements).
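For reference, here is one way the same generator might look without the walrus operator. This is only a sketch of the "two lines" approach described in the note; the behavior should match the version above.

```python
def fibonacci(limit):
    """The Fibonacci generator, written without the walrus operator."""
    n1, n2 = 0, 1
    if limit >= 1:
        yield n2
    for _ in range(1, limit):
        n = n1 + n2   # the assignment gets its own line...
        yield n       # ...followed by a plain yield
        n1, n2 = n2, n

print(list(fibonacci(10)))
```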
You'll also note the lack of a raise StopIteration statement. Generators don't require them; in fact, since PEP 479, they don't even allow them. When the generator function terminates, either naturally or with a return statement, StopIteration is raised automatically behind the scenes.
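A quick sketch of both cases, assuming Python 3.7 or later (where the PEP 479 behavior is always on). The two generator names are invented for the example.

```python
def ends_with_return():
    yield 1
    return     # ends the generator; StopIteration is raised for us
    yield 2    # never reached

def raises_manually():
    yield 1
    raise StopIteration   # not allowed inside a generator since PEP 479

values = list(ends_with_return())

try:
    list(raises_manually())
except RuntimeError as err:
    # Python converts the stray StopIteration into a RuntimeError.
    outcome = f"RuntimeError: {err}"
else:
    outcome = "no error"

print(values)
print(outcome)
```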
Generators and Try
Revised: 29 Nov 2019
It used to be that yield could not appear within the try clause of a try-finally statement. PEP 255, which defined the generator syntax, explains why:
The difficulty is that there's no guarantee the generator will ever be resumed, hence no guarantee that the finally block will ever get executed; that's too much a violation of finally's purpose to bear.
This was changed in PEP 342, which was finalized in Python 2.5.
So why discuss such an old change at all? Simple: up to today, I was under the impression that yield couldn't appear in try-finally. Some articles on the topic incorrectly cite the old rule.
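Here's a minimal sketch of yield inside try-finally in modern Python. read_lines() is a made-up stand-in for a generator that holds onto some resource; the events list just records what ran.

```python
events = []

def read_lines(lines):
    """Hypothetical generator standing in for one that holds a resource."""
    events.append("open")
    try:
        for line in lines:
            yield line
    finally:
        events.append("cleanup")   # runs even though we stop early

reader = read_lines(["a", "b", "c"])
first = next(reader)   # now paused inside the try block
reader.close()         # raises GeneratorExit at the paused yield

print(first, events)
```

Closing the generator is what triggers the finally block here, which addresses the concern PEP 255 raised about generators that are never resumed.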
Generator as an Object
You may recall that Python treats functions as objects, and generators are no exception! Building on our earlier example, we can save a particular instance of a generator.
For example, what if I wanted to print out only the 10th-20th values of the Fibonacci sequence?
First, I'll save the generator in a variable, so I can reuse it. The limit isn't going to matter much to me, so I'll use something large. It will be easier to use my loop ranges to determine what I display, as that keeps the limiting logic close to the print statements.
fib = fibonacci(100)
Next, I'll use a loop to skip the first 9 elements, so the next value I retrieve is the 10th.
for _ in range(9):
    next(fib)
The next() function is actually what loops always use to advance through iterables. In the case of generators, this returns whatever value is being returned by yield. In this situation, since we don't care about those values yet, we just throw them away (by doing nothing with them).
By the way, I could also have called fib.__next__(), which is what next(fib) calls anyway, but I prefer the clean look of the approach I took. It usually comes down to preference; both are equally valid.
I'm now ready to access some values from the generator, but not all of them. Thus, I'll still use a range(), and retrieve the values from the generator directly with next().
for n in range(10, 21):
    print(f"{n}th value: {next(fib)}")
This prints out the desired values quite nicely:
10th value: 55
11th value: 89
12th value: 144
13th value: 233
14th value: 377
15th value: 610
16th value: 987
17th value: 1597
18th value: 2584
19th value: 4181
20th value: 6765
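As an aside, the standard library's itertools.islice() can do the skipping and the limiting in one step. This is just an alternative sketch, not a replacement for the manual next() calls; it repeats the fibonacci() generator from earlier so the block runs on its own (Python 3.8+ because of the walrus operator).

```python
from itertools import islice

def fibonacci(limit):
    if limit >= 1:
        yield (n2 := 1)
    n1 = 0
    for _ in range(1, limit):
        yield (n := n1 + n2)
        n1, n2 = n2, n

fib = fibonacci(100)
# islice(fib, 9, 20) discards items 0-8 and stops before item 20 (0-based),
# which leaves the 10th through 20th values when counting from 1.
selected = list(islice(fib, 9, 20))
for n, value in enumerate(selected, start=10):
    print(f"{n}th value: {value}")
fib.close()
```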
You'll recall that we set our limit to 100 earlier. We're done with our generator now, but we really shouldn't just walk away and leave it waiting for another next() call! Leaving it sitting idle in memory for the rest of our program would be wasteful of resources (however few).
Instead, we can manually tell our generator we're done with it.
fib.close()
That will manually close the generator, the same as if it had reached a return statement. It can now be cleaned up by the garbage collector.
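To see what "closed" means in practice, here's a small sketch using a throwaway endless generator (count_up() is invented for the example).

```python
def count_up():
    """Hypothetical endless generator: 0, 1, 2, ..."""
    n = 0
    while True:
        yield n
        n += 1

gen = count_up()
first_two = [next(gen), next(gen)]
gen.close()   # finish the generator early
gen.close()   # closing an already-closed generator is harmless

try:
    next(gen)
except StopIteration:
    # A closed generator behaves like one that has run out of values.
    after_close = "StopIteration"

print(first_two, after_close)
```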
Meet the Coroutine
Generators allow us to quickly define an iterable that stores its state in between calls. However, what if we want the opposite: to pass information in and have the function patiently wait until it gets it? Python provides coroutines for this purpose.
For anyone who is already a bit familiar with coroutines, you should understand that what I'm referring to are specifically known as simple coroutines (although I'm just saying "coroutine" throughout for the sanity of the reader). If you've seen any Python code using concurrency, you may have already encountered its younger cousin, the native coroutine (also called the "asynchronous coroutine").
For now, understand that both simple coroutines and native coroutines are officially considered "coroutines," and they share many principles; native coroutines build upon the concepts introduced with simple coroutines. We'll come back to that one when we discuss async in a later article.
Again, for now just assume that when I say "coroutine," I'm referring to a simple coroutine.
Imagine you want to find all the letters common between a bunch of strings, say, those funny character names in Charles Dickens' books. You don't know how many strings there are, they'll be input at runtime, and not necessarily all at once.
Clearly, this approach must:
- Be reusable.
- Have state (the letters in common so far.)
- Be iterative in nature, since we don't know how many strings we'll get.
A typical function isn't ideal for this situation, since we'd have to pass all the data at once as a list or tuple, and because functions don't store state by themselves. Meanwhile, generators can't handle input except when first called.
We could try a class, although that's a lot of boilerplate. Let's start there anyway, just to get a better grip on what we're dealing with.
In my first version, I'll be mutating a list I pass to the class, so I can view the results any time I please. If I were sticking with a class, I probably wouldn't do it that way, but it's the smallest viable class for our purposes. Besides, it's functionally identical to the coroutine we'll write shortly, and that's useful for comparing approaches.
class CommonLetterCounter:
    def __init__(self, results):
        self.letters = {}
        self.counted = []
        self.results = results
        self.i = 0
    def add_word(self, word):
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in self.letters:
                    self.letters[c] = 0
                self.letters[c] += 1
        self.counted = sorted(self.letters.items(), key=lambda kv: kv[1])
        self.counted = self.counted[::-1]
        self.results.clear()
        for item in self.counted:
            self.results.append(item)
names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']
results = []
counter = CommonLetterCounter(results)
for name in names:
    counter.add_word(name)
for letter, count in results:
    print(f'{letter} appears {count} times.')
According to my output, Charles Dickens particularly liked names with e, o, s, l, and p. Who knew?
We can accomplish the same result with a coroutine.
def count_common_letters(results):
    letters = {}
    while True:
        word = yield
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in letters:
                    letters[c] = 0
                letters[c] += 1
        counted = sorted(letters.items(), key=lambda kv: kv[1])
        counted = counted[::-1]
        results.clear()
        for item in counted:
            results.append(item)
names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']
results = []
counter = count_common_letters(results)
counter.send(None)  # prime the coroutine
for name in names:
    counter.send(name)  # send data to the coroutine
counter.close()  # manually end the coroutine
for letter, count in results:
    print(f'{letter} appears {count} times.')
Let's take a closer look at what's happening here. A coroutine doesn't look any different from a function at first blush, but as with generators, the use of the yield keyword makes all the difference.
In a coroutine, however, yield stands for "wait until you get input, and then use it right here".
You'll notice that most of the processing logic is the same between the two approaches; we've merely done away with the class boilerplate. We store an instance of a coroutine the same as we would store an object, just to ensure we are using the same instance every time we send more data to it.
The major difference between a class and a coroutine is the usage. We send data to the coroutine using its send() function:
for name in names:
    counter.send(name)
Before we can do this, however, we must first prime the coroutine with a call to either counter.send(None) (used above) or counter.__next__(). A coroutine can't receive a value right away; it must first run through all its code leading up to its first yield.
As with a generator, a coroutine is finished when it either reaches the end of its normal execution flow, or when it hits a return statement. Since neither of these things has a chance of happening in our example, I close the coroutine manually:
counter.close()
In short, to use a coroutine:
- Save an instance of it as a variable, for example, counter,
- Prime it with counter.send(None), counter.__next__(), or next(counter),
- Send data to it with counter.send(),
- If necessary, close it with counter.close().
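Those four steps can be sketched with a much smaller coroutine. running_total() and the totals list are hypothetical names for this example; it also shows what happens if you forget to prime.

```python
totals = []

def running_total():
    """Hypothetical coroutine: adds up every number sent to it."""
    total = 0
    while True:
        number = yield     # wait here until something is sent
        total += number
        totals.append(total)

# Skipping the priming step fails, because the coroutine hasn't
# reached its first yield yet.
unprimed = running_total()
try:
    unprimed.send(5)
except TypeError as err:
    priming_error = str(err)
unprimed.close()

adder = running_total()    # 1. save an instance
next(adder)                # 2. prime it
for n in (5, 10, 20):
    adder.send(n)          # 3. send data to it
adder.close()              # 4. close it

print(totals)
print(priming_error)
```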
Coroutines and Try
Remember that old rule about generators and not putting a yield in the try clause of a try-finally statement? It doesn't apply here either! PEP 342, the same proposal that introduced coroutines, lifted that restriction, so it's totally acceptable to wrap a coroutine's yield in a try statement.
throw()
Generators and coroutines also have a throw() function, which is used to raise an exception at the place they're paused. You'll remember from the "Errors" article that exceptions can be used as a normal part of execution flow.
Imagine for example that you want to send data to a remote server. You've got convenient little Connection objects, and you use a coroutine to send data over that connection.
Somewhere else in your code, you detect that you've lost the network connection, but because of how you communicate with your server, all that data the coroutine is so diligently sending would just drop into a black hole without complaint. Oops.
Consider this example code I've stubbed out. (Assume that the actual Connection logic doesn't lend itself to either handling fallback or reporting connection errors itself.)
class Connection:
    """ Stub object simulating connection to a server """
    def __init__(self, addr):
        self.addr = addr
    def transmit(self, data):
        print(f"X: {data[0]}, Y: {data[1]} sent to {self.addr}")
def send_to_server(conn):
    """ Coroutine demonstrating sending data """
    while True:
        raw_data = yield
        raw_data = raw_data.split(' ')
        coords = (float(raw_data[0]), float(raw_data[1]))
        conn.transmit(coords)
conn = Connection("example.com")
sender = send_to_server(conn)
sender.send(None)
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
Running that example, we see that the first five send() calls go to example.com, but the last five drop into None. This obviously won't do - we want to report the problem, and start sending data to a file instead so it isn't lost forever.
This is where throw() comes in. As soon as we know we've lost the connection, we can alert the coroutine to this fact, allowing it to respond appropriately.
We first add a try-except to our coroutine:
def send_to_server(conn):
    while True:
        try:
            raw_data = yield
            raw_data = raw_data.split(' ')
            coords = (float(raw_data[0]), float(raw_data[1]))
            conn.transmit(coords)
        except ConnectionError:
            print("Oops! Connection lost. Creating fallback.")
            # Create a fallback connection!
            conn = Connection("local file")
Our usage example only needs one change: as soon as we know we've lost connection, we use sender.throw(ConnectionError):
conn = Connection("example.com")
sender = send_to_server(conn)
sender.send(None)
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.
sender.throw(ConnectionError)  # ALERT THE SENDER!
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
That is all! Now we get the message about the connection problem as soon as the coroutine is alerted, and the rest of the messages are routed to our local file.
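Two details of throw() are worth a quick sketch: when the exception is handled inside, throw() returns whatever the coroutine yields next; when it isn't handled, the exception comes straight back to the caller and the coroutine is finished. Both echo() and fragile() are invented for this example.

```python
def echo():
    """Hypothetical coroutine that hands back whatever it was last sent."""
    received = None
    while True:
        try:
            received = yield received
        except ValueError:
            received = "recovered"   # handled inside, so we keep running

handled = echo()
next(handled)
reply = handled.throw(ValueError)    # caught inside; returns the next yield
handled.close()

def fragile():
    """Hypothetical coroutine with no exception handling at all."""
    while True:
        yield

unhandled = fragile()
next(unhandled)
try:
    unhandled.throw(ValueError("nobody caught this"))
except ValueError as err:
    escaped = str(err)               # the exception propagates to the caller

try:
    next(unhandled)
except StopIteration:
    finished = True                  # the coroutine ended when the error escaped

print(reply, escaped, finished)
```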
yield from
When using a generator or a coroutine, you are not limited to only a local yield. You can, in fact, get other iterables, generators, or coroutines involved using yield from.
For example, let's say I want to rewrite my Fibonacci sequence to have no limits, and I just want to hardcode the first five values to get things started.
def fibonacci():
    starter = [1, 1, 2, 3, 5]
    yield from starter
    n1 = starter[-2]
    n2 = starter[-1]
    while True:
        yield (n := n1 + n2)
        n1, n2 = n2, n
In this case, yield from temporarily hands off to another iterable, whether it be a container, an object, or another generator. Once that iterable has reached its end, this generator picks up and carries on like normal.
In just using this generator, you wouldn't have known it was using another iterator for part of the time. It just works the same as always.
fib = fibonacci()
for n in range(1, 11):
    print(f"{n}th value: {next(fib)}")
fib.close()
Coroutines can also hand off in a similar manner. For example, in our Connection example, what if we created a second coroutine that handles writing data to a file? In the case we had a connection error, we could switch to using that behind the scenes.
class Connection:
    """ Stub object simulating connection to a server """
    def __init__(self, addr):
        self.addr = addr
    def transmit(self, data):
        print(f"X: {data[0]}, Y: {data[1]} sent to {self.addr}")
def save_to_file():
    while True:
        raw_data = yield
        raw_data = raw_data.split(' ')
        coords = (float(raw_data[0]), float(raw_data[1]))
        print(f"X: {coords[0]}, Y: {coords[1]} sent to local file")
def send_to_server(conn):
    while True:
        if conn is None:
            yield from save_to_file()
        else:
            try:
                raw_data = yield
                raw_data = raw_data.split(' ')
                coords = (float(raw_data[0]), float(raw_data[1]))
                conn.transmit(coords)
            except ConnectionError:
                print("Oops! Connection lost. Using fallback.")
                conn = None
conn = Connection("example.com")
sender = send_to_server(conn)
sender.send(None)
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
# Simulate connection error...
conn.addr = None
# ...but assume the sender knows nothing about it.
sender.throw(ConnectionError)  # ALERT THE SENDER!
for i in range(1, 6):
    sender.send(f"{100/i} {200/i}")
This behavior was defined in PEP 380, so read that for more information.
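One more feature from PEP 380 that the examples above don't use: when the inner generator or coroutine finishes with a return statement, the returned value becomes the result of the yield from expression. A minimal sketch, with both function names made up for the example:

```python
def collect_until_none():
    """Hypothetical sub-coroutine: gathers values until it is sent None."""
    items = []
    while True:
        item = yield
        if item is None:
            return items     # becomes the value of the yield from expression
        items.append(item)

def batcher(batches):
    """Delegates to the sub-coroutine, then stores whatever it returned."""
    while True:
        batch = yield from collect_until_none()
        batches.append(batch)

batches = []
b = batcher(batches)
next(b)                      # prime; this also starts the sub-coroutine
for value in ("a", "b", None, "c", None):
    b.send(value)            # send() passes straight through to the inner one
b.close()

print(batches)
```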
Combining Generators and Coroutines
You may be wondering: "Can I combine the two, and return data directly from a coroutine like I can from a generator?"
I was curious about this too while writing the article, and apparently you can. It all has to do with recognizing when the function is being treated like a generator, instead of a coroutine.
The key to this is simple: __next__() and send(None) are effectively the same thing to a coroutine.
def count_common_letters():
    letters = {}
    word = yield
    while word is not None:
        word = word.lower()
        for c in word:
            if c.isalpha():
                if c not in letters:
                    letters[c] = 0
                letters[c] += 1
        word = yield
    counted = sorted(letters.items(), key=lambda kv: kv[1])
    counted = counted[::-1]
    for item in counted:
        yield item
names = ['Skimpole', 'Sloppy', 'Wopsle', 'Toodle', 'Squeers',
         'Honeythunder', 'Tulkinghorn', 'Bumble', 'Wegg',
         'Swiveller', 'Sweedlepipe', 'Jellyby', 'Smike', 'Heep',
         'Sowerberry', 'Pumblechook', 'Podsnap', 'Tox', 'Wackles',
         'Scrooge', 'Snodgrass', 'Winkle', 'Pickwick']
counter = count_common_letters()
counter.send(None)
for name in names:
    counter.send(name)
for letter, count in counter:
    print(f'{letter} appears {count} times.')
I only needed to watch for when the coroutine started receiving None (after the initial priming, of course). Since I was storing the result of yield in word, I could break out of the loop for receiving information once word was None.
When we switch from using a coroutine as a coroutine, to using it as a generator, it needs to handle a single send(None) before it starts outputting data with yield. (This StackOverflow question demonstrates that phenomenon.) In calling our coroutine, we never explicitly send(None) before switching our usage; Python does that in the background.
Also, remember that the coroutine/generator is still a function. It merely pauses every time it encounters a yield. In my example, I could not suddenly go back to using counter as a coroutine, because there's no execution flow that would take me back to word = yield. It is perfectly possible to write it so you can switch back and forth, although perhaps not advisable if it comes at the cost of readability or becomes overly complicated.
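For a taste of the back-and-forth style, a single yield can both hand a value out and take a value in. Here's a small sketch (running_average() is a hypothetical example, not part of the letter counter):

```python
def running_average():
    """Hypothetical coroutine: takes a number in, hands the mean so far back."""
    total = 0.0
    count = 0
    average = None
    while True:
        number = yield average   # hand back the latest result, then wait
        total += number
        count += 1
        average = total / count

avg = running_average()
next(avg)                        # prime: runs to the first yield
results = [avg.send(n) for n in (10, 20, 60)]
avg.close()

print(results)
```

Each send() delivers a number and returns the value from the next yield, so every call works as input and output at once.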
Review
Generators and coroutines allow you to quickly write functions that "wait" for you. Later on, we'll meet the native coroutine, a type of coroutine used in concurrency.
Let's review the essentials from this section:
- Generators are iterables that wait for you to request output.
- Generators are written as normal functions, except they use the yield keyword to return values in the same way as a class would with its __next__() function.
- When a generator reaches the natural end of its execution order, or hits a return statement, it raises StopIteration and ends.
- Coroutines are similar to generators, except they wait for information to be sent to them via the foo.send() function.
- Both a generator and a coroutine can be advanced to the next yield statement with next(foo) or foo.__next__().
- Before a coroutine can have anything sent to it with foo.send(), it must be "primed" with foo.send(None), next(foo), or foo.__next__().
- An exception can be raised at the current yield with foo.throw().
- A generator or coroutine can be manually stopped with foo.close().
- A single function can behave first like a coroutine, and then like a generator.
As always, you can learn plenty more from the documentation:
- Python Tutorial: Classes - Generators
- PEP 255: Simple Generators
- PEP 479: Change StopIteration handling inside generators
- PEP 342: Coroutines via Enhanced Generators
- PEP 380: Syntax for Delegating to a Subgenerator
Thanks to deniska (Freenode IRC #python), @rhymes, and @florimondmanca (DEV.to) for suggested revisions.
Top comments (7)
Very in-depth article about generators! I enjoyed it a lot.
At first your use of the term "coroutine" when referring to generators that use .send() and yield from was a bit jarring to me; as of Python 3.6 a coroutine is the return value of a coroutine function:
But then I realized that you were probably using that term as the more general computer science concept of a routine that can be paused during execution (see Coroutine).
Still, the fact that coroutine is now "reserved terminology" in Python might be confusing to some people. Perhaps a disclaimer that coroutine refers more to the computer science general concept rather than the coroutine built-in type would be helpful. :-)
Well, no, not precisely. In Python, the term "coroutine" does indeed officially refer to both. In fact, the two have their own qualified names.
What I described is called a simple coroutine, which was defined in PEP 342, and further expanded in PEP 380. Coroutines first appeared in Python 2.5, and continue to be a distinct and fully supported language feature.
You're referring to a native coroutine (also called an asynchronous coroutine), which was defined in PEP 492, and was based on simple coroutines, but designed to overcome some specific limitations of the former. Native coroutines first appeared in Python 3.5. Again, this didn't replace simple coroutines, but rather offered another form of them specifically for use in concurrency.
I'll put a little clause or two about this in the article.
Also, don't worry, I'll be coming back around to async and concurrency soon; once that's written, I'll come back to this article and link across.
Thanks for clarifying :) Actually, I wasn’t aware that native coroutine was the official name for generators used in this fashion.
Thanks! Just to be clear, I was simply raising the concern that as async programming is becoming more and more used/popular in Python and most people talk about coroutines as a shorthand for async coroutines, using the shorthand to refer to native ones could be confusing. Anyway I think you’ve got the point so thanks for taking that into account. :)
Uh oh! I just realized I'd had a dyslexic moment, and read something in PEP 492 backwards...
What I described are simple coroutines, and the newer type is the native coroutine (also called an "asynchronous coroutine").
Blinks
Anyhow, I've gone back and edited both my comment and article. Thanks again...if you hadn't asked about that, I would have never caught my error!
Great article Jason!
Just a couple of details:
An exception can be raised at the current yield with foo.raise().
->with foo.throw().
In sorted(self.letters.items(), key=lambda kv: kv[1]) the lambda can be replaced with operator.itemgetter(1), it's one of my favorite small things that are in the standard library :D
I was wondering if there was a way to simplify the coroutine code, using a context manager. The __enter__ could call send(None) and the __exit__ could call close().
With a simple generator is easy to do something similar:
But the same doesn't work for a coroutine...
As a first I came up with this:
I came up with something like this then:
but I'm not sure it's improving much :D
Ooh! Thanks for catching that typo! That would have been confusing.
As to the lambda or itemgetter(), I'd actually gone back and forth between the two in writing that example. I think using the lambda there is my own personal preference more than anything.
That is certainly a clever combination of a context and a coroutine, by the way. (Naturally, I didn't discuss contexts in this article, as I haven't discussed them yet in the series.)
Thanks for the feedback.
The way it presented generators, i knew it would be a 👌 read, and i was right!
Taking the time to cover only those two helps a lot, best article on coroutines i've read to date. Thanks for writing this up 👍