Let's continue our little exploration of the itertools module.
Today we'll have a look at 3 infinite iterator constructors:
from itertools import count, cycle, repeat
itertools.count
itertools.count is like range, but lazy and endless.
By the way, if you have never heard of laziness (well, I'm sure we have all heard of it, and moreover, practice it every day), then you really should check it out, at least briefly. Someday we will walk the path of David Beazley and his legendary 147-page "Generator Tricks For Systems Programmers", but not today. Today is for the basics.
Well, count is super easy: it just counts up to infinity. Or down to minus infinity, if step is negative.
def my_count(start=0, step=1):
    x = start
    while True:
        yield x
        x += step
That's it.
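A quick sanity check of this sketch: its first few values should match those of itertools.count. The generator is repeated here only so the snippet runs on its own, and islice takes a bounded slice so nothing runs forever.

```python
from itertools import count, islice

def my_count(start=0, step=1):
    # Same generator as in the text, repeated so the snippet is self-contained.
    x = start
    while True:
        yield x
        x += step

# Take only the first five values from each endless iterator.
mine = list(islice(my_count(5, -2), 5))
builtin = list(islice(count(5, -2), 5))
print(mine)             # [5, 3, 1, -1, -3]
print(mine == builtin)  # True
```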
But there is a caveat. It never stops, so you can't "consume" it.
To consume an iterable is to read all of it at once, for example, to store it in a list.
Well, actually, you can try, but this line of code can freeze any machine to death. And yeah, hammering Ctrl+C won't help. Only a hard reset, I did warn you ;)
list(count())  # don't actually run this
Then, how am I supposed to work with it, if I can't call list/set/sum/etc. on it?
First of all, you can iterate over it (and break out when the time comes):
for i in count(start=10, step=-1):
    print(i, end=", ")
    if i <= 0:
        break
# 10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0,
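The same countdown can also be written without an explicit break. This is only an alternative sketch, not something the loop above requires: itertools.takewhile stops pulling values as soon as its predicate returns False.

```python
from itertools import count, takewhile

# Stop as soon as the predicate fails, i.e. at the first negative value.
countdown = list(takewhile(lambda i: i >= 0, count(start=10, step=-1)))
print(countdown)  # [10, 9, 8, 7, 6, 5, 4, 3, 2, 1, 0]
```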
Second, some programs never break out of an endless loop and just wait for something to happen: workers waiting for incoming tasks, HTTP servers waiting for incoming requests, etc. But we shall skip this case. For now.
Finally, you can combine an infinite iterator with other lazy iterators: map, zip, islice, accumulate, etc.
When iterators like zip or map iterate over multiple iterables at once, they finish as soon as any of the iterables finishes. That gives us a way out of an infinite iterator.
Here is an example from the itertools.repeat docs:
list(map(pow, range(10), repeat(2)))
# [0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
Our machine stays alive, although technically we "consume an infinite repeat with list". Well, range is finite, and map finishes together with it.
An infinite iterator gives up its infinity just to finish together with some finite collection...
Wow! Some serious Highlander & Queen vibes around here...
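Two more small sketches of the same idea, with made-up sample data: zip pairs an endless count with a finite list (much like enumerate), and islice cuts a fixed-size window out of an endless iterator.

```python
from itertools import count, islice

letters = ["a", "b", "c"]  # sample data for illustration

# zip stops when the shortest input (the list) is exhausted.
numbered = list(zip(count(1), letters))
print(numbered)  # [(1, 'a'), (2, 'b'), (3, 'c')]

# islice takes the first five values and then stops asking for more.
evens = list(islice(count(0, 2), 5))
print(evens)  # [0, 2, 4, 6, 8]
```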
itertools.repeat
itertools.repeat is even easier than itertools.count. It doesn't even count, it simply repeats the same value infinitely. There is also a form with a fixed number of repeats.
According to the itertools docs, itertools.repeat is roughly equivalent to:
def repeat(object, times=None):
    # repeat(10, 3) --> 10 10 10
    if times is None:
        while True:
            yield object
    else:
        for i in range(times):
            yield object
In the "fixed" form, since Python generator expressions are also lazy, itertools.repeat(42, 10) can be written as:
( 42 for _ in range(10) )
For the infinite form, range won't help, but one can notice that, for a numeric value x, itertools.repeat(x) produces the same values as itertools.count(x, step=0).
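A small check of that observation, limited to a few values with islice. It also shows where the analogy ends: repeat accepts any object, while count only accepts numbers.

```python
from itertools import count, islice, repeat

first_repeat = list(islice(repeat(42), 4))
first_count = list(islice(count(42, 0), 4))
print(first_repeat)                 # [42, 42, 42, 42]
print(first_repeat == first_count)  # True

# repeat works with any object...
print(list(islice(repeat("spam"), 2)))  # ['spam', 'spam']

# ...but count refuses a non-numeric start value.
try:
    count("spam", 0)
    rejected = False
except TypeError:
    rejected = True
print(rejected)  # True
```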
I guess repeat and count add a little bit of readability to your code, and they might also be noticeably faster than Python generator expressions. However, it is not that easy to test the performance of iterators (especially infinite ones :) ), since they get exhausted, and a performance test means running the same thing many times and comparing.
Still, let us try:
In [49]: i1 = lambda: ( 42 for _ in range(100000) )
In [50]: i2 = lambda: repeat(42, 100000)
In [51]: %timeit sum(i1())
3.49 ms ± 36.6 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
In [52]: %timeit sum(i2())
333 µs ± 1.27 µs per loop (mean ± std. dev. of 7 runs, 1,000 loops each)
itertools.repeat seems to be about 10 times faster!
By the way, do you think that a performance test with a "lambda-style factory" is valid and the comparison is fair?
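One way to look at that question, as a sketch rather than a rigorous benchmark: with the standard timeit module, each run can build a fresh iterator through a factory, so no run ever sees an exhausted one, and both candidates pay the same function-call overhead. The names gen_expr and repeat_iter are made up for this example, and the timings will differ from machine to machine.

```python
import timeit
from itertools import repeat

N = 100_000

def gen_expr():
    # Build a fresh generator expression for every run.
    return (42 for _ in range(N))

def repeat_iter():
    # Build a fresh itertools.repeat object for every run.
    return repeat(42, N)

# Both factories must produce the same result before timing means anything.
assert sum(gen_expr()) == sum(repeat_iter()) == 42 * N

# Each of the 20 runs creates and fully consumes a new iterator.
t_gen = timeit.timeit(lambda: sum(gen_expr()), number=20)
t_rep = timeit.timeit(lambda: sum(repeat_iter()), number=20)
print(f"generator expression: {t_gen:.4f} s")
print(f"itertools.repeat:     {t_rep:.4f} s")
```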
Wait, what do you mean by "exhaust"?
If "exhaust" in the previous section confused you, I'll show you only this ...
In [3]: i = ( x for x in range(10) )
In [4]: sum(i)
Out[4]: 45
In [5]: sum(i)
Out[5]: 0
... and strongly encourage you to dive into the Python Functional Programming HOWTO.
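A slightly longer sketch of the same effect, including two common ways around it: iterate over a re-iterable object such as range, or use a factory function (make_gen is a made-up name) that returns a new generator on every call.

```python
gen = (x for x in range(10))
print(sum(gen))  # 45
print(sum(gen))  # 0, the generator is already exhausted

# A range is a re-iterable sequence, so each sum() gets a fresh iterator.
rng = range(10)
print(sum(rng))  # 45
print(sum(rng))  # 45

def make_gen():
    # Hand out a brand-new generator on every call.
    return (x for x in range(10))

print(sum(make_gen()), sum(make_gen()))  # 45 45
```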
itertools.cycle
An endless cycle over an iterable. As simple as that:
# cycle('ABCD') --> A B C D A B C D ...
def my_cycle(iterable):
    while True:
        yield from iterable
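One caveat about this short version: it relies on the input being re-iterable. With a one-shot iterator or an empty input, the second pass yields nothing and the while loop spins forever. The sketch below follows the "roughly equivalent" recipe from the itertools docs, which saves the items during the first pass; my_cycle_saving is a made-up name for this example.

```python
from itertools import islice

def my_cycle_saving(iterable):
    # Hypothetical name; mirrors the "roughly equivalent" recipe in the docs.
    saved = []
    for element in iterable:
        yield element
        saved.append(element)  # remember items for later passes
    while saved:               # an empty input ends instead of spinning
        for element in saved:
            yield element

# Works even when the input is a one-shot iterator rather than a list.
one_shot = iter("ABC")
first_seven = list(islice(my_cycle_saving(one_shot), 7))
print(first_seven)  # ['A', 'B', 'C', 'A', 'B', 'C', 'A']

empty_result = list(my_cycle_saving([]))
print(empty_result)  # []
```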
Despite its simplicity, it is very convenient.
I really love to rotate proxies, user agents, etc. with itertools.cycle for regular parsing/scraping of web pages.
For instance, you can define some "global" iterators:
PROXY_CYCLE = itertools.cycle(proxy_list)
UA_CYCLE = itertools.cycle(ua_list)
And each time you need to make a new request, you just ask the "global" iterators for new proxy/ua values with next:
proxy = next(PROXY_CYCLE)
ua = next(UA_CYCLE)
It turns into a kind of distributed iteration: different places of the program pull values, but the iterator is shared. Iterator as a service, huh.
It's like we defined a class ProxyManager with a method ProxyManager.get, which handles proxy rotation and selection. But instead of a class we have itertools.cycle, instead of get we have next, and instead of 10 lines of code we have only 1. So do we really need to define a class? :)
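To make the rotation concrete, here is a self-contained sketch. The proxy and user-agent strings are placeholders, and next_request_settings is a made-up helper, not part of any library.

```python
from itertools import cycle

# Hypothetical placeholder values, not real proxies or user agents.
proxy_list = ["proxy-a:8080", "proxy-b:8080"]
ua_list = ["agent-1", "agent-2", "agent-3"]

PROXY_CYCLE = cycle(proxy_list)
UA_CYCLE = cycle(ua_list)

def next_request_settings():
    # Hypothetical helper: each call advances both shared iterators by one.
    return next(PROXY_CYCLE), next(UA_CYCLE)

settings = [next_request_settings() for _ in range(4)]
for proxy, ua in settings:
    print(proxy, ua)
# proxy-a:8080 agent-1
# proxy-b:8080 agent-2
# proxy-a:8080 agent-3
# proxy-b:8080 agent-1
```

Because the two lists have different lengths, the proxy/ua pairs drift against each other over time instead of always repeating the same combinations.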
That's all, folks!
Thank you for reading, hope you enjoyed! Consider subscribing - we shall go deeper :)