Overview
In Python, indexing and slicing are ways to get elements from iterables (like lists, tuples, and even string).
ingredients = ['beef', 'onions', 'tomato', 'potato', 'carrot']
print(ingredients[1]) # onions
print(ingredients[1:4]) # ['onions', 'tomato', 'potato']
This is something I have to look at recently. I am personally not a fan of deep dives in the C-level source code since things can always change (for example, ROT_TWO
is set to be replaced in Python 3.11
). Nevertheless, I think reading C code as a Python developer is a fun experience I want to share.
Below codes are performed on Python 3.10
.
AST
In CPython, actual Python syntax go through parsing and compiling.
Python code gets parsed and converted into an abstract syntax tree. The ast
module can help us illustrate this step.
import ast
print(ast.dump(ast.parse('[][0]', mode='eval'), indent=4))
# Result:
# Expression(
# body=Subscript(
# value=List(elts=[], ctx=Load()),
# slice=Constant(value=0),
# ctx=Load()))
print(ast.dump(ast.parse('[][::-1]', mode='eval'), indent=4))
# Result:
# Expression(
# body=Subscript(
# value=List(elts=[], ctx=Load()),
# slice=Slice(
# step=UnaryOp(
# op=USub(),
# operand=Constant(value=1))),
# ctx=Load()))
What we see in the two code snippets above is that both indexing and slicing are essentially the same operation, Subscript
.
Subscript
takes a value, a slice, and a ctx. We can learn from here that the difference between indexing and slicing is that:
- indexing is when a
Constant
tree is passed into theSubscript
's slice; whereas - slicing is when a
Slice
tree is passed otherwise.
Bytecode
The abstract syntax tree as shown above will eventually get compiled into bytecode. The dis
module help us see it.
from dis import dis
dis('[][0]')
# Result:
# 1 0 BUILD_LIST 0
# 2 LOAD_CONST 0 (0)
# 4 BINARY_SUBSCR
# 6 RETURN_VALUE
dis('[][::-1]')
# Result:
# 1 0 BUILD_LIST 0
# 2 LOAD_CONST 0 (None)
# 4 LOAD_CONST 0 (None)
# 6 LOAD_CONST 1 (-1)
# 8 BUILD_SLICE 3
# 10 BINARY_SUBSCR
# 12 RETURN_VALUE
What we're seeing in these snippets is that Subscript
is referred to as BINARY_SUBSCR
at bytecode level. We can also see that if any of start
, stop
, or step
is not supplied, None
is put in place.
From the docs: the Slice Object
Part of the Python's builtins is the slice
function which generates the
From the docs:
Slice objects are also generated when extended indexing syntax is used. For example:
a[start:stop:step]
ora[start:stop, i]
.
We can infer that when slicing ingredients[1:4]
, a slice(1,4)
is generated along the way.
Under the C
The bytecodes such as BUILD_SLICE
and BINARY_SUBSCR
can be found at Python's ceval.c
Below are snippets:
TARGET(BINARY_SUBSCR) {
PREDICTED(BINARY_SUBSCR);
PyObject *sub = POP();
PyObject *container = TOP();
PyObject *res = PyObject_GetItem(container, sub);
Py_DECREF(container);
Py_DECREF(sub);
SET_TOP(res);
if (res == NULL)
goto error;
JUMPBY(INLINE_CACHE_ENTRIES_BINARY_SUBSCR);
DISPATCH();
}
TARGET(BUILD_SLICE) {
PyObject *start, *stop, *step, *slice;
if (oparg == 3)
step = POP();
else
step = NULL;
stop = POP();
start = TOP();
slice = PySlice_New(start, stop, step);
Py_DECREF(start);
Py_DECREF(stop);
Py_XDECREF(step);
SET_TOP(slice);
if (slice == NULL)
goto error;
DISPATCH();
}
Some Stuff
As stated above, it's indexing if a Contstant
(i.e., int
) is passed; whereas it's slicing if a Slice
is passed.
Can we pass a slice()
in __getitem__
?
# Turns out, yes
#
print(ingredients[slice(1,4)])
# Result:
# ['onions', 'tomato', 'potato']
#
print(ingredients.__getitem__(slice(1,4)))
# Same result
From this line from the C code:
PyObject *res = PyObject_GetItem(container, sub);
we can infer that PyObject_GetItem
is the C equivalent of __getitem__
from the data model. We now know that indexing, slicing and getting a value from a dictionary are all just __getitem__
. This should not be a surprise considering the syntax, but I am.
(As a side note: if I had just read the Data Model docs, this readings done to make this post wouldn't have been necessary.)
Some implementations for the slice object is in Objects/sliceobject.c. There's a lot going in here.
In one of the snippets above, it is PySlice_New
which creates a new slice object (it's been around since it was first introduced).
Looking at this commit, it seems that PySlice_New
works like a def __init__
which simply returns a PyObject
struct with attributes start
, stop
, and step
. Calculations happen elsewhere. For example, the pointer *start
is calculated on five occasions.
if (r->start == Py_None) {
*start = *step < 0 ? length-1 : 0;
} else {
if (!PyLong_Check(r->start)) return -1;
*start = PyLong_AsSsize_t(r->start);
if (*start < 0) *start += length;
}
if (r->start == Py_None) {
*start = *step < 0 ? PY_SSIZE_T_MAX : 0;
}
if (*start < 0) {
*start += length;
if (*start < 0) {
*start = (step < 0) ? -1 : 0;
}
}
else if (*start >= length) {
*start = (step < 0) ? length - 1 : length;
}
I don't think I need to understand this yet. Moreso, there's a lot of pointers at play in order to really read it, you need to stop thinking in Python and start thinking in C.
Another cool thing to note is a factory function at the bottom called slice_new
, which, looking at the pull request, seem to be the slice
builtin.
Top comments (0)