How Python Really Works | In-Depth Analysis | Python 3.# | Internal Mechanism |


In this article, I'll delve into how Python truly operates. I will try to cover every functional aspect, from writing code to executing it. This article is aimed at intermediate Python users who already know basic and intermediate programming concepts, algorithms, and more. Let's begin with the idea behind Python's creation!

Python is a popular programming language. It was created by Guido van Rossum, and released in 1991.....Hey F#ck Stop it!...

I know you are annoyed with this Wikipedia shit, but trust me, things will make more sense once you know why Python was created.

Chapter 1: Birth of a Warrior (Python)

The common programming languages between 1980 and 1990 were C, Pascal, Fortran, COBOL, BASIC, and others. During that time, the computer industry was growing rapidly. Many intelligent individuals understood the future potential of computers and recognised that software would play a crucial role in making them more intriguing. Consequently, some people moved into software development from other domains. Most of them had innovative ideas, but to implement those ideas they first needed to learn a programming language, which was a complex task.

During that era, developers had to handle everything themselves, from managing memory to guarding against problems like buffer overflows and memory leaks, which were both commonplace and tedious to deal with.

Guido van Rossum joined the chat...

During that era, Guido van Rossum was working at the Centrum Wiskunde & Informatica (CWI) in the Netherlands, a national research institute for mathematics and computer science.

Guido had been involved in various projects related to distributed systems and operating systems at CWI, and had gained valuable experience in software development.

During his time at CWI, Guido was dissatisfied with the existing programming languages available, finding them either too complex or lacking in certain areas. He had experience with languages like ABC and Modula-3, which influenced his thinking about language design.

Motivated by a desire to create a programming language that emphasized simplicity, readability, and productivity, Guido started working on a personal project to develop what would become Python. Drawing inspiration from his experiences with other languages and his desire to address their limitations, Guido set out to design a language that would be easy to learn and use, yet powerful enough to tackle a wide range of tasks.

Python was initially developed as a hobby project by Guido during his spare time at CWI. He worked on refining the language's syntax and semantics, carefully crafting features that would make it intuitive and expressive. Guido's goal was to create a language that would enable programmers to write clear, concise code that could be easily understood and maintained, without sacrificing flexibility or power.

As Python gained popularity within CWI and beyond, Guido dedicated more and more time to its development. In 1995 he left CWI for CNRI in the United States, where he continued to work on Python.

Becoming a Bad Boy

After gaining popularity, Python started to roast its friends like C, COBOL, and Fortran. These are not my words; he actually started bullying every other programming language. Here are some examples.

C: Ah, C, the powerhouse of the era, loved for its speed and low-level control. But let's face it, memory management in you was like walking through a minefield. I (Python) come in like a superhero with my automatic memory management, sparing developers from the headache of manual memory allocation and pesky segmentation faults.

Pascal: Bless Pascal's heart for its structured programming and strong typing, but its verbosity could put anyone to sleep. Python swoops in with its clean and concise syntax, making code elegant and readable without sacrificing functionality.

Fortran: Sure, Fortran was the go-to for numerical computing, but its syntax felt like a blast from the past even back then. Python's modern and flexible syntax feels like a breath of fresh air, attracting scientists and engineers with its ease of use and extensive libraries for numerical computing.

Since he became a bad boy, people started hating on him for everything, from being slow asf to being rude.

C (Queen) said: "What's with your performance? Sure, you're great for scripting and prototyping, but when it comes to heavy lifting, you're about as fast as a sloth on tranquilizers. Don't even get me started on your GIL (Global Interpreter Lock). Multithreading? More like multi-slowing."

I'm just joking, please don't go searching for facts. It is a controversy I made up. But Python being slow really is a thing 😂.

Chapter 2: Python Being Python (Memory Genius)

Let's now get into the serious stuff. To understand how Python works, we first need to understand how Python manages memory, because that directly impacts the performance and behaviour of Python programs.

Memory management in Python differs from languages like C or C++, where developers have explicit control over memory allocation and deallocation.

In Python, memory management is handled by the Python runtime using a combination of techniques such as automatic memory allocation, garbage collection, and reference counting.

Now let's start understanding

  • Any running program occupies some memory in RAM. A Python program is no different: while it executes, its process takes up a share of RAM.

  • In a common simplified model, the memory allocated to the Python program is further divided into two regions.

  • Stack and Private Heap Space

  • Objects are always created in the private heap space. Names (variables) are only references to those objects; in this model they live on the stack, inside the frame of the code that is currently running. (CPython's real layout is messier. Module-level names, for instance, are stored in a dictionary. The stack-and-heap picture is still a useful starting point.)

  • Let me explain with an example. Suppose there is a file called test.py which contains:

  •       a = 10
    
  • Here 10 is an int object, and the name a is created as a reference to it. Where is the object 10 created? In RAM, in the region allocated to the program, and more precisely in the PRIVATE HEAP SPACE.
    Each object also has an address in the heap space.
    - Where does the name get created?
    In this model the name a is created on the STACK and holds the address of its object, which means a points to the address of 10.

    Python attaches the type to the object rather than to the name and checks it at runtime, which is why it is called a dynamically typed language.

  • I created a diagram for better understanding.
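To make the "names point to objects" idea concrete, here is a small sketch you can run yourself. The variable names are just illustrations, and the remark about id() being a memory address applies to CPython specifically, not to every Python implementation.

```python
# Names are references to objects; the object carries the type, the name does not.
a = [1, 2, 3]
b = a                      # a second name bound to the same list object

same_object = a is b       # True: both names refer to one object
b.append(4)                # mutate through one name...
seen_through_a = list(a)   # ...and the change is visible through the other

x = 10
type_before = type(x).__name__   # the int object knows its own type
x = "ten"                        # rebinding the name to a str object is allowed
type_after = type(x).__name__

print(same_object, seen_through_a, type_before, type_after)
print(hex(id(a)))  # in CPython, id() happens to be the object's memory address
```

Notice that nothing was copied when b = a ran. Only a second reference to the same heap object was created.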

Take out the trash : Garbage Collector

In Python, the garbage collector is responsible for reclaiming memory occupied by objects that are no longer in use, thus freeing up resources and preventing memory leaks. Python's garbage collector uses a technique called reference counting along with cyclic garbage collection to manage memory.

That probably sounds abstract, so let me tell you a story.

Imagine Python's garbage collector as a diligent janitor named Gary. His job is to keep Python's memory clean and tidy, ensuring there's no garbage lying around.

Now, Gary's first tool in his arsenal is "reference counting." It's like keeping track of how many times someone mentions a particular item. So, whenever someone (or some variable) mentions an object, Gary scribbles a tally mark on his clipboard. When nobody mentions the object anymore, Gary checks his clipboard. If the tally reaches zero, he knows it's safe to throw that object into the garbage bin.

But wait, sometimes things get a bit tricky. Picture two objects, let's call them Bert and Ernie, holding hands in a circle and giggling like schoolkids. They're referencing each other, creating a loop that Gary's reference counting can't handle. It's like Bert says, "I'm holding Ernie's hand," and Ernie says, "I'm holding Bert's hand," and they just go around in circles forever.

Now, Gary scratches his head. He can't just rely on his tally marks to clean up this mess. So, he brings out his special tool: the "cyclic garbage collector." It's like a magical broom that can sweep away those circular references. With a flick of his wrist, Gary breaks the loop, and Bert and Ernie can finally let go of each other's hands and go their separate ways.

And that, my friend, is how Python's garbage collector, with the help of Gary the janitor, keeps Python's memory squeaky clean, one object at a time. Just don't let him catch you leaving your variables lying around unattended!
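If you want to watch Gary at work, the sketch below uses the standard sys, gc, and weakref modules. The Node class is a made-up helper for this demo only, and the automatic collector is switched off for a moment so the cycle stays alive long enough to observe it.

```python
import gc
import sys
import weakref


class Node:
    """Tiny hypothetical class used only for this demonstration."""

    def __init__(self, name):
        self.name = name
        self.partner = None


# Reference counting: each new name adds one tally mark.
# (getrefcount itself reports a slightly inflated number, because
# passing the object as an argument temporarily references it.)
obj = Node("solo")
count_one_name = sys.getrefcount(obj)
alias = obj
count_two_names = sys.getrefcount(obj)

# A reference cycle: Bert and Ernie hold on to each other.
gc.disable()                 # keep the automatic collector out of the way
bert, ernie = Node("Bert"), Node("Ernie")
bert.partner, ernie.partner = ernie, bert
probe = weakref.ref(bert)    # lets us check whether Bert is still alive

del bert, ernie              # no names left, but the cycle keeps both counts above zero
alive_after_del = probe() is not None

gc.collect()                 # the cyclic collector finds and frees the unreachable pair
alive_after_collect = probe() is not None
gc.enable()

print(count_two_names - count_one_name, alive_after_del, alive_after_collect)
```

Reference counting alone never frees Bert and Ernie. They only disappear once gc.collect() runs.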

Chapter 3: The Matrix | Python

In "The Matrix," the protagonist Neo discovers that the world he perceives as reality is actually a simulated environment created by sentient machines. Similarly, in the world of Python programming, developers often work with abstracted concepts and tools that may seem straightforward on the surface but actually operate in a more complex manner behind the scenes.

In "The Matrix," the simulated reality is controlled and manipulated by the sentient machines, who use it to subdue and control humanity. Similarly, the Python interpreter serves as the gatekeeper to the simulated reality of Python code, interpreting and executing instructions according to its programming.

But How Does the Python Interpreter Really Work?

First Step: Lexical Analysis and Parsing

When you write a script and run it, the Python interpreter does not execute the text line by line as it reads it. It first compiles the whole file into bytecode, which is why a syntax error anywhere in the file stops the program before the first line runs. Only then are the statements executed in order from top to bottom, with loops, function calls, and other control flow structures jumping between lines as needed.
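Here is a quick way to check that compilation covers the whole source before anything runs. This is only a sketch: the source string and the "<demo>" filename are invented for the example, and compile() plus exec() stand in for what happens when you run a file.

```python
# Two lines of source: a valid print, then a deliberate syntax error.
source = 'print("first line ran")\nx = = 1\n'

executed = True
error_line = None
try:
    code = compile(source, "<demo>", "exec")   # the whole source is compiled first
    exec(code)                                 # never reached in this demo
except SyntaxError as err:
    executed = False
    error_line = err.lineno

print(executed, error_line)
```

The print on line 1 never produces output, even though the mistake is on line 2.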

Lexical Analysis: When you execute Python code, the first step is lexical analysis, also known as tokenization. During this process, the Python interpreter breaks the code into individual tokens such as keywords, identifiers, operators, and literals. These tokens form the basic building blocks of the Python language.
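You can peek at a token stream with the standard-library tokenize module. Treat it as an approximation of what the interpreter does internally; the one-line source string is just an illustration.

```python
import io
import tokenize

source = "a = 10 + b\n"

# generate_tokens() expects a readline-style callable that returns text lines.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

for kind, text in tokens:
    print(f"{kind:10} {text!r}")
```

Each entry pairs a token category (NAME, OP, NUMBER, and so on) with the exact text it came from.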

Parsing: Once the code has been tokenized, the Python interpreter parses it to create an abstract syntax tree (AST). The AST represents the hierarchical structure of the code, with nodes corresponding to different elements such as expressions, statements, and functions. The parser ensures that the code follows the syntactic rules of the Python language.
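The standard ast module lets you build and inspect that tree yourself. A minimal sketch, reusing the same toy statement as an assumption (the indent argument of ast.dump needs Python 3.9 or newer):

```python
import ast

tree = ast.parse("a = 10 + b")
assign = tree.body[0]          # the single statement in this module

# Print the nested structure of the tree.
print(ast.dump(tree, indent=2))

# Collect the node types to see the hierarchy flattened out.
node_types = [type(node).__name__ for node in ast.walk(tree)]
print(node_types)
```

The assignment node contains a BinOp node, which in turn contains the constant 10 and the name b. That nesting is the "hierarchical structure" mentioned above.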

Second Step: ByteCode Generation

After parsing, the Python interpreter generates bytecode instructions based on the AST. Bytecode is a low-level, platform-independent representation of the Python code that the PVM (Python Virtual Machine) can execute. Each bytecode instruction corresponds to a specific operation or action, such as loading a value onto the stack, calling a function, or performing arithmetic operations.
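To see real bytecode, the standard dis module can disassemble any function. The add_one function here is a throwaway example, and the exact instruction names differ between Python versions, so don't expect your output to match someone else's exactly.

```python
import dis


def add_one(n):
    return n + 1


# Human-readable listing of the bytecode for add_one.
dis.dis(add_one)

# The same information as data: one opname per instruction.
opnames = [ins.opname for ins in dis.get_instructions(add_one)]
raw_bytes = add_one.__code__.co_code   # the raw bytecode lives on the code object

print(opnames)
print(len(raw_bytes), "bytes of bytecode")
```

You should spot an instruction that loads n, one that performs the addition, and one that returns the result.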

CPython caches compiled bytecode in .pyc files. In Python 3, these files are stored in a folder named __pycache__, which is created automatically next to a module the first time you import it. The script you run directly is compiled in memory and is not cached this way.
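The sketch below shows where such a cache file ends up, using a temporary folder so nothing is left behind. The module name helper.py is made up, and py_compile is used here to trigger compilation explicitly instead of relying on an import.

```python
import importlib.util
import os
import py_compile
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    src = os.path.join(tmp, "helper.py")       # hypothetical module for the demo
    with open(src, "w") as fh:
        fh.write("VALUE = 42\n")

    # Path the import system would use for this module's cached bytecode.
    expected = importlib.util.cache_from_source(src)

    # Compile explicitly; returns the path of the .pyc file it wrote.
    written = py_compile.compile(src)
    exists = os.path.exists(written)
    cache_folder = os.path.basename(os.path.dirname(written))

print(os.path.basename(written), cache_folder, exists)
```

The file name includes the interpreter tag (something like cpython-3xx), which is how several Python versions can share one __pycache__ folder.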

Third Step: ByteCode Execution

The bytecode generated by the Python interpreter is executed by the Python Virtual Machine (PVM). The PVM is responsible for interpreting and executing the bytecode instructions generated from the Python source code. It provides a runtime environment that manages memory, handles exceptions, and interacts with the underlying operating system to execute Python programs.

In summary, the Python interpreter compiles Python source code into bytecode, and the Python Virtual Machine executes this bytecode to run the Python program. This separation of concerns between compilation and execution allows Python code to be platform-independent and easily portable across different operating systems and hardware architectures.

I made a diagram for better understanding.

Example: Python Code at Work

Let's walk through the full working of a Python code example:

# Define a function to calculate the factorial of a number
def factorial(n):
    if n == 0:
        return 1
    else:
        return n * factorial(n - 1)

# Main program
if __name__ == "__main__":
    # Prompt the user for input
    num = int(input("Enter a number: "))

    # Calculate and display the factorial of the input number
    print("Factorial of", num, "is", factorial(num))

Now, let's break down the execution of this code step by step:

  1. The Python interpreter compiles the whole source file to bytecode, then starts executing it from the top.

  2. The interpreter executes the def statement, which creates a function object and binds it to the name factorial. The body of the function does not run yet; the function is stored in memory for later use.

  3. The interpreter moves to the if __name__ == "__main__": block, which checks if the script is being run as the main program.

  4. Since the script is indeed being run as the main program, the interpreter proceeds to execute the code inside the if block.

  5. The input() function prompts the user to enter a number, which is then converted to an integer using int() and assigned to the variable num.

  6. The factorial() function is called with the input number as an argument. This triggers a recursive chain of function calls to calculate the factorial of the input number.

  7. Each recursive call to factorial() decrements the input number by 1 until it reaches 0, at which point the base case (if n == 0:) is triggered, and the function returns 1.

  8. As the recursive calls unwind, each intermediate result is multiplied by the current number until the final factorial value is computed.

  9. The calculated factorial value is then printed to the console using print().

  10. The program execution finishes, and the Python interpreter exits.

During this process, the Python interpreter compiles the source code into bytecode, which is then executed by the Python Virtual Machine (PVM). The PVM manages memory, handles function calls, and performs other runtime tasks to execute the Python program efficiently.
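To watch the recursive calls wind up and unwind, here is a variation of the same function that records a trace. The depth and trace parameters are my additions for the demo and are not part of the original example; the input() call is skipped so the snippet runs unattended.

```python
def factorial(n, depth=0, trace=None):
    """Same recursion as the example above, with an optional call trace."""
    if trace is not None:
        trace.append(f"{'  ' * depth}call factorial({n})")
    result = 1 if n == 0 else n * factorial(n - 1, depth + 1, trace)
    if trace is not None:
        trace.append(f"{'  ' * depth}return {result}")
    return result


trace = []
value = factorial(3, trace=trace)
print("\n".join(trace))
print("Factorial of 3 is", value)
```

The indentation in the trace grows with every nested call and shrinks again as each call returns its partial product.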

Chapter 4: A Slow End

While Python has earned widespread acclaim for its simplicity, readability, and versatility, one criticism that often arises is its perceived lack of speed compared to lower-level languages like C or C++. In this chapter, we'll explore why Python may be slower in certain contexts and how developers can mitigate performance concerns. Let's dive in!

Understanding Python's Execution Model

Python's dynamic typing, automatic memory management, and high-level abstractions contribute to its ease of use and rapid development cycle. However, these features can also introduce overhead that impacts performance.

One factor that contributes to Python's runtime overhead is its interpreted nature. Unlike compiled languages, where code is translated directly into machine code before execution, Python code is first compiled into bytecode and then interpreted by the Python Virtual Machine (PVM). While this approach offers flexibility and platform independence, it can result in slower execution speeds compared to compiled languages.

Additionally, Python's Global Interpreter Lock (GIL) poses a limitation on multi-threaded performance. The GIL ensures that only one thread executes Python bytecode at a time, effectively preventing multi-core parallelism in CPU-bound tasks. While this simplifies memory management and concurrency control, it can lead to suboptimal performance in multi-threaded applications.
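You can get a feel for the GIL with a rough experiment like the one below. It is only a sketch: count_down and N are made-up names, the timings depend entirely on your machine, and on a standard GIL-enabled build you should simply expect the two-thread run to be no faster than the sequential one for this CPU-bound loop.

```python
import threading
import time


def count_down(n):
    """CPU-bound busy loop with no I/O."""
    while n > 0:
        n -= 1


N = 1_000_000

# Run the work twice, one after the other.
start = time.perf_counter()
count_down(N)
count_down(N)
sequential = time.perf_counter() - start

# Run the same work in two threads at once.
threads = [threading.Thread(target=count_down, args=(N,)) for _ in range(2)]
start = time.perf_counter()
for t in threads:
    t.start()
for t in threads:
    t.join()
threaded = time.perf_counter() - start

print(f"sequential: {sequential:.3f}s  two threads: {threaded:.3f}s")
```

For I/O-bound work the picture is different, because a thread releases the GIL while it waits.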

Example: Comparing Python and C in Matrix Multiplication

To illustrate the performance difference between Python and a compiled language like C, let's consider a simple example of matrix multiplication implemented in both languages.

Python Implementation:

import numpy as np

# Generate random matrices
size = 1000
matrix_a = np.random.rand(size, size)
matrix_b = np.random.rand(size, size)

# Perform matrix multiplication
result = np.dot(matrix_a, matrix_b)

C Implementation:

#include <stdio.h>
#include <stdlib.h>

#define SIZE 1000

void matrix_multiply(double matrix_a[SIZE][SIZE], double matrix_b[SIZE][SIZE], double result[SIZE][SIZE]) {
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            result[i][j] = 0;
            for (int k = 0; k < SIZE; k++) {
                result[i][j] += matrix_a[i][k] * matrix_b[k][j];
            }
        }
    }
}

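int main(void) {
    // Static storage: three 1000x1000 double arrays (about 24 MB in total)
    // would overflow a typical stack if declared as plain locals.
    static double matrix_a[SIZE][SIZE];
    static double matrix_b[SIZE][SIZE];
    static double result[SIZE][SIZE];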

    // Initialize matrices with random values
    for (int i = 0; i < SIZE; i++) {
        for (int j = 0; j < SIZE; j++) {
            matrix_a[i][j] = (double) rand() / RAND_MAX;
            matrix_b[i][j] = (double) rand() / RAND_MAX;
        }
    }

    // Perform matrix multiplication
    matrix_multiply(matrix_a, matrix_b, result);

    return 0;
}

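In this example, we generate two random matrices of size 1000x1000 and multiply them, once with Python's NumPy library and once with a plain C program. Keep in mind that NumPy itself is implemented in C, and np.dot typically hands the work to an optimized BLAS routine, so this pair really measures NumPy against a naive triple loop rather than pure Python against C. Neither listing includes timing code; to compare execution times, wrap the multiplication in a timer such as time.perf_counter() in Python and clock() in C.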
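For a fairer picture of interpreter overhead, here is a pure-Python version of the same triple loop with a timer around it. This is a sketch under my own assumptions: the matmul name and the size of 60 are arbitrary (1000 would take a very long time in pure Python), and no timing figures are quoted because they depend on your hardware.

```python
import random
import time


def matmul(a, b):
    """Naive triple-loop multiplication of two square matrices (lists of lists)."""
    size = len(a)
    result = [[0.0] * size for _ in range(size)]
    for i in range(size):
        for j in range(size):
            total = 0.0
            for k in range(size):
                total += a[i][k] * b[k][j]
            result[i][j] = total
    return result


size = 60  # kept small on purpose; the C listing uses 1000
matrix_a = [[random.random() for _ in range(size)] for _ in range(size)]
matrix_b = [[random.random() for _ in range(size)] for _ in range(size)]

start = time.perf_counter()
product = matmul(matrix_a, matrix_b)
elapsed = time.perf_counter() - start

print(f"{size}x{size} pure-Python multiply took {elapsed:.4f}s")
```

Every iteration of the inner loop goes through the bytecode interpreter, which is where most of the gap to the C loop comes from.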

Conclusion and Mitigation Strategies

While Python may not always match the raw speed of compiled languages like C, there are several strategies developers can employ to improve performance:

  • Utilize libraries and extensions: Python offers extensive libraries and extensions, such as NumPy, Cython, and Numba, which provide optimized implementations for numerical and computationally intensive tasks.

  • Profile and optimize critical code paths: Identify performance bottlenecks using profiling tools like cProfile or line_profiler, and optimize critical code paths using techniques such as algorithmic improvements, caching, or parallelization.

  • Offload performance-critical tasks to compiled languages: Use tools like ctypes or Cython to interface with C/C++ code for performance-critical tasks while maintaining the high-level expressiveness of Python.
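As a small taste of the "profile first, then optimize" advice, the sketch below profiles a deliberately slow function with cProfile and then speeds it up with caching via functools.lru_cache. The fib functions are toy examples chosen for the demo, not something from the sections above.

```python
import cProfile
import functools
import io
import pstats


def fib(n):
    """Deliberately wasteful recursive Fibonacci."""
    return n if n < 2 else fib(n - 1) + fib(n - 2)


@functools.lru_cache(maxsize=None)
def fib_cached(n):
    """Same recursion, but each result is computed only once."""
    return n if n < 2 else fib_cached(n - 1) + fib_cached(n - 2)


profiler = cProfile.Profile()
profiler.enable()
slow = fib(20)
fast = fib_cached(20)
profiler.disable()

# Send the report to a string so it can be printed or inspected.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(5)
print(report.getvalue())
print(fib_cached.cache_info())
```

In the report, the call count for fib is in the thousands while fib_cached is called only a few dozen times. That is the kind of hotspot a profiler is meant to reveal.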

By understanding Python's execution model and employing optimization techniques, developers can strike a balance between productivity and performance, ensuring that Python remains a powerful tool for a wide range of applications.

With this chapter, we've explored the nuances of Python's performance characteristics and provided insights into mitigating performance concerns. Armed with this knowledge, developers can harness the full potential of Python while addressing performance requirements in their projects.

Did you find this article valuable?

Support Vishnu Tiwari by becoming a sponsor. Any amount is appreciated!
