When I first encountered Python generators, I found them obscure and hard to understand. I couldn't find a clear introduction to them, though maybe I didn't search hard enough. That's why I wrote this article: to get straight to the gist of these Python beasts.
Python generator definition
A Python generator is:
- a Python function or method
- which, when called, returns a generator object that acts as an iterator
- which keeps its state between calls to next(), resuming where it left off (stateful)
- and hands data back to its caller using the yield keyword
A simple example to start
Consider this function:
def generator1():
    yield 1
    yield 2
    yield 3
Calling this function directly simply returns a generator object:
>>> generator1()
<generator object generator1 at 0x7fac361d8bf8>
The __iter__() and __next__() methods are automatically implemented:
>>> gen1 = generator1()
>>> dir(gen1)
['__class__', '__del__', '__delattr__', '__dir__', '__doc__', '__eq__', '__format__', '__ge__', '__getattribute__', '__gt__', '__hash__', '__init__', '__init_subclass__', '__iter__', '__le__', '__lt__', '__name__', '__ne__', '__new__', '__next__', '__qualname__', '__reduce__', '__reduce_ex__', '__repr__', '__setattr__', '__sizeof__', '__str__', '__subclasshook__', 'close', 'gi_code', 'gi_frame', 'gi_running', 'gi_yieldfrom', 'send', 'throw']
and usable:
>>> next(gen1)
1
>>> next(gen1)
2
>>> next(gen1)
3
>>> next(gen1)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
StopIteration
The StopIteration exception is raised because next() has already been called 3 times on the generator object and there's nothing more to yield.
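As a minimal sketch of how this exception is usually dealt with: a for loop catches StopIteration for you, which the explicit loop below imitates, and the built-in next() accepts a default value that is returned instead of raising. The variable names gen and values are only illustrative.

```python
def generator1():
    yield 1
    yield 2
    yield 3

gen = generator1()
values = []
while True:
    try:
        values.append(next(gen))
    except StopIteration:
        # the generator is exhausted: this is the signal a for loop relies on
        break

print(values)  # [1, 2, 3]

# next() also takes a default, returned instead of raising StopIteration
print(next(gen, None))  # None
```

The second form is handy when you only want the next value if there is one.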
Being iterable, a generator object can be passed directly to built-in functions like list():
>>> gen1 = generator1()
>>> list(gen1)
[1, 2, 3]
>>> list(gen1)
[]
You can see that by the second call to list(), the generator object has already been exhausted, so an empty list is returned.
Same with tuple() or set() built-in functions:
>>> gen1 = generator1()
>>> tuple(gen1)
(1, 2, 3)
>>> gen1 = generator1()
>>> set(gen1)
{1, 2, 3}
Of course, the for ... in construct works here too:
gen1 = generator1()
# this will print out 1, 2 and 3, one per line
for i in gen1:
    print(i)
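Since a generator object can only be consumed once, here is a short sketch of two common ways to go over the same values twice: ask the generator function for a fresh generator object, or store the values in a list once and reuse that list. The variable names are illustrative only.

```python
def generator1():
    yield 1
    yield 2
    yield 3

# Option 1: call the generator function again to get a fresh generator object
first_pass = list(generator1())
second_pass = list(generator1())
print(first_pass, second_pass)  # [1, 2, 3] [1, 2, 3]

# Option 2: materialize the values once, then reuse the list as often as needed
values = list(generator1())
total = sum(values)
maximum = max(values)
print(total, maximum)  # 6 3
```

The second option trades memory for convenience, so it only suits sequences that are finite and reasonably small.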
Moving beyond
The above example was only meant to illustrate the yield mechanism.
We can go further by passing parameters to the generator function:
from typing import Iterator

# yields the Fibonacci numbers <= fib_max (assumes fib_max >= 1,
# since the two initial values are always yielded)
def fibonacci1(fib_max: int) -> Iterator[int]:
    # initial values
    fib_n_2 = 0
    fib_n_1 = 1
    yield fib_n_2
    yield fib_n_1
    # now general case
    fib_n = fib_n_2 + fib_n_1
    while fib_n <= fib_max:
        yield fib_n
        fib_n_2 = fib_n_1
        fib_n_1 = fib_n
        fib_n = fib_n_2 + fib_n_1

# gives: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
print(list(fibonacci1(1000)))
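Note that this is NOT a recursive function. Execution simply pauses at each yield, a bit like a breakpoint, and resumes from there the next time a value is requested. Values are produced one at a time, so the whole sequence is never held in memory and the call stack does not grow.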
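To make the pausing visible, here is a minimal sketch. The names traced_generator and log are hypothetical and introduced only for illustration: the generator records a message each time its body actually runs, so you can see that nothing executes until next() asks for a value.

```python
def traced_generator(log):
    log.append("start")
    yield 1
    log.append("resumed after first yield")
    yield 2
    log.append("resumed after second yield")

log = []
gen = traced_generator(log)
print(log)          # [] : calling the function ran none of its body

first = next(gen)
print(first, log)   # 1 ['start']

second = next(gen)
print(second, log)  # 2 ['start', 'resumed after first yield']
```

Each call to next() runs the body only up to the following yield, then control goes back to the caller.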
It's easy to modify fibonacci1 so that it yields an infinite sequence of values:
from itertools import islice
from typing import Iterator

# yields an infinite sequence of Fibonacci numbers
def fibonacci2() -> Iterator[int]:
    # initial values
    fib_n_2 = 0
    fib_n_1 = 1
    yield fib_n_2
    yield fib_n_1
    # now general case
    fib_n = fib_n_2 + fib_n_1
    while True:
        yield fib_n
        fib_n_2 = fib_n_1
        fib_n_1 = fib_n
        fib_n = fib_n_2 + fib_n_1

# islice() takes only the first 17 values, otherwise list() would never finish
# gives: [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89, 144, 233, 377, 610, 987]
print(list(islice(fibonacci2(), 17)))
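islice() bounds an infinite generator by count. As a sketch of one alternative, itertools.takewhile() bounds it by value instead, stopping at the first item that fails a condition. To keep the block self-contained, fibonacci2 is redefined here in a more compact form than the version above (an equivalent rewrite, not the original code), and below_100 is an illustrative name.

```python
from itertools import takewhile

def fibonacci2():
    # compact, equivalent form of the infinite Fibonacci generator
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

# keep taking values while they are below 100, then stop consuming
below_100 = list(takewhile(lambda n: n < 100, fibonacci2()))
print(below_100)  # [0, 1, 1, 2, 3, 5, 8, 13, 21, 34, 55, 89]
```

This only works because the sequence is increasing: takewhile() stops for good at the first value that fails the test.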
Python generator expressions
Generator expressions are akin to list comprehensions, at least as far as syntax goes: they use parentheses instead of square brackets.
They are used to create generator objects with a simple expression rather than a function, but they are less flexible, since everything has to fit in a single expression:
# first 100 squares (nothing is computed yet)
squares_gen = (x*x for x in range(100))
# the squares are only computed here, when list() consumes the generator
squares = list(squares_gen)
They are lazily evaluated, meaning each value is computed only when it is requested.
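As a rough illustration of what laziness buys you, the sketch below compares a list comprehension with the equivalent generator expression. The exact sizes reported by sys.getsizeof() depend on the Python implementation (the comparison below assumes CPython), and the variable names are illustrative.

```python
import sys

squares_list = [x * x for x in range(100_000)]
squares_gen = (x * x for x in range(100_000))

# the list stores every value; the generator only stores its current state
print(sys.getsizeof(squares_list) > sys.getsizeof(squares_gen))  # True

# a generator expression can be passed straight to a consuming function,
# so no intermediate list is ever built
total = sum(x * x for x in range(100))
print(total)  # 328350
```

When a generator expression is the only argument of a function call, as in sum() here, the extra pair of parentheses can be dropped.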
Hope this helps!