DEV Community

Aryan


Understanding Python’s Inner Workings: Bytecode, PVM, and Compilation

Python is renowned for its simplicity and readability, but its execution model is quite sophisticated. This article provides a detailed yet concise overview of Python’s inner workings, focusing on bytecode, the Python Virtual Machine (PVM), the compilation process, and comparisons with other languages.

Bytecode
When you run Python code, it is first translated into an intermediate form known as bytecode. Bytecode is a low-level representation of your source code that does not depend on the hardware platform, although its format can change between Python versions. The Python Virtual Machine (PVM) executes it.
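To make this concrete, here is a minimal sketch that uses the standard library's dis module to inspect the bytecode of a small function. The function add is a made-up example, and the exact instruction names printed depend on the Python version, so treat the listing as illustrative rather than fixed.

```python
import dis


def add(a, b):
    # Hypothetical example function; any function would do.
    return a + b


# Print a human-readable listing of the function's bytecode.
dis.dis(add)

# The same instructions are available programmatically.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)

# The raw bytecode is stored as bytes on the function's code object.
print(type(add.__code__.co_code))
```

The names in opnames vary by release (for example, the addition shows up as BINARY_ADD in older versions and BINARY_OP in newer ones), which is one reason bytecode should not be treated as a stable interface.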

Steps in Bytecode Generation:

Lexical Analysis: The source code is tokenized into keywords, operators, and identifiers.
Syntax Analysis: The tokens are parsed into a syntax tree.
Bytecode Compilation: The syntax tree is transformed into bytecode.
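Bytecode files have a .pyc extension and are typically stored in the __pycache__ directory. CPython writes them for imported modules; a script that is run directly is compiled in memory and is not cached.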
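The three steps above can be approximated with standard-library tools. This is a sketch, not a view into the interpreter's internals: CPython performs these stages in C, while the tokenize and ast modules and the built-in compile() expose comparable stages to Python code. The one-line source string is a made-up example.

```python
import ast
import io
import tokenize

source = "total = 1 + 2\n"

# 1. Lexical analysis: split the source into tokens.
#    Whitespace-only tokens (newline, end marker) are filtered out.
tokens = [
    tok.string
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
    if tok.string.strip()
]
print(tokens)

# 2. Syntax analysis: parse the source into an abstract syntax tree.
tree = ast.parse(source)
print(ast.dump(tree))

# 3. Bytecode compilation: turn the tree into a code object.
code = compile(tree, filename="<example>", mode="exec")

# Running the code object shows that it carries the program's behavior.
namespace = {}
exec(code, namespace)
print(namespace["total"])
```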


Python Virtual Machine (PVM)
The PVM is the runtime engine of Python. It reads and executes the bytecode instructions, acting as an abstraction layer between the bytecode and the hardware.

PVM Execution Steps:

Loading Bytecode: The PVM loads the compiled bytecode into memory.
Execution: The PVM interprets and executes each bytecode instruction.
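As a rough illustration of what "interprets and executes each bytecode instruction" means, here is a toy stack machine. It is a hypothetical sketch: the instruction names are borrowed loosely from CPython, but the real evaluation loop is written in C and also handles frames, jumps, exceptions, and much more.

```python
def run(program):
    """Execute a list of (opname, argument) pairs on a simple stack machine."""
    stack = []      # operand stack, as in a stack-based VM
    variables = {}  # stands in for a namespace
    for opname, arg in program:
        if opname == "LOAD_CONST":
            stack.append(arg)
        elif opname == "LOAD_NAME":
            stack.append(variables[arg])
        elif opname == "STORE_NAME":
            variables[arg] = stack.pop()
        elif opname == "BINARY_ADD":
            right = stack.pop()
            left = stack.pop()
            stack.append(left + right)
        else:
            raise ValueError(f"unknown instruction: {opname}")
    return variables


# Hypothetical program equivalent to: x = 2; y = x + 3
program = [
    ("LOAD_CONST", 2),
    ("STORE_NAME", "x"),
    ("LOAD_NAME", "x"),
    ("LOAD_CONST", 3),
    ("BINARY_ADD", None),
    ("STORE_NAME", "y"),
]
result = run(program)
print(result)  # {'x': 2, 'y': 5}
```

The design point is the dispatch loop: each instruction is fetched, matched against a known operation, and applied to the stack, which is the same basic shape a bytecode interpreter follows.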


Compilation Process
Python uses a two-step process: compilation to bytecode and interpretation by the PVM. Here’s how it works:

Compilation to Bytecode: Python’s compiler translates the source code (.py files) into bytecode (.pyc files).
Bytecode Interpretation: The PVM reads and executes the bytecode.
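Both steps can be reproduced by hand. The sketch below compiles a small source file to a .pyc with py_compile, then loads the code object back with marshal and executes it. It assumes Python 3.7 or later, where the .pyc header is 16 bytes; the helper name compile_and_run and the file name demo.py are made up for illustration.

```python
import marshal
import py_compile
import tempfile
from pathlib import Path

PYC_HEADER_SIZE = 16  # assumption: header layout used since Python 3.7


def compile_and_run(source):
    """Compile source to a .pyc file, load it back, and execute it."""
    with tempfile.TemporaryDirectory() as tmp:
        src_path = Path(tmp) / "demo.py"
        src_path.write_text(source)

        # Step 1: compilation to bytecode (.py -> .pyc).
        pyc_path = py_compile.compile(
            str(src_path), cfile=str(Path(tmp) / "demo.pyc"), doraise=True
        )
        data = Path(pyc_path).read_bytes()

    # Step 2: load the code object that follows the header and execute it.
    code = marshal.loads(data[PYC_HEADER_SIZE:])
    namespace = {}
    exec(code, namespace)
    return namespace


ns = compile_and_run("answer = 6 * 7\n")
print(ns["answer"])  # 42
```

In normal use the import system does all of this automatically; reading .pyc files by hand is only useful for learning or debugging.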

Differences from Other Languages
Python’s execution model differs from both compiled and interpreted languages:

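Interpreted Languages (e.g., JavaScript): These languages are often described as executing source code directly, without a separate compile step. In practice, modern JavaScript engines such as V8 also compile to an internal bytecode and then JIT-compile frequently executed code to machine code, so the "interpreted" label by itself says little about performance.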

Compiled Languages (e.g., C++): These languages translate source code directly into machine code for the target platform, offering faster execution but less portability.

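Hybrid Languages (e.g., Java): Like Python, Java compiles to bytecode, but compilation is an explicit ahead-of-time step (javac produces .class files), and the Java Virtual Machine (JVM) typically JIT-compiles frequently executed bytecode to machine code, whereas CPython primarily interprets its bytecode.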

Conclusion
Python’s use of bytecode and the PVM provides a balance between performance and portability, distinguishing it from other languages. Understanding these inner workings highlights the efficiency and flexibility that contribute to Python’s widespread popularity.

Top comments (1)

JoeStrout

Very few scripting languages skip the bytecode compilation step. JavaScript certainly does not, nor does Lua, and both of these are faster than Python.

Python's PVM is basically not different (except in the implementation details) from Java's JVM, or the VM used by pretty much any other modern scripting language.

But this article is still a great introduction to how it works. If you want to look at a much (much!) smaller scripting language that uses the same sort of bytecode approach, check out MiniScript. It's about 13 thousand lines, compared to 661 thousand for Python, but nonetheless implements a clean, feature-complete modern language.