16  Introducing threads and parallelism in Zig

Threads are available in Zig through the Thread struct from the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX thread pattern, meaning that it works similarly to a thread from the pthreads C library, which is usually available in any distribution of the GNU C Compiler (gcc). If you are not familiar with threads, let's go through some theory behind them first.

16.1 What are threads?

A thread is basically a separate context of execution. We use threads to introduce parallelism into our programs, which in most cases makes the program run faster, because we have multiple tasks being performed at the same time, parallel to each other.

Programs are normally single-threaded by default, which means that each program usually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no parallelism. And when we don't have parallelism, the commands are executed sequentially, that is, only one command is executed at a time, one after another. By creating multiple threads inside our program, we start to execute multiple commands at the same time.

Programs that create multiple threads are very common in the wild, because many different types of applications are well suited for parallelism. Good examples are video and photo-editing applications (e.g. Adobe Photoshop or DaVinci Resolve), games (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc.). For example, in web browsers, threads are normally used to implement tabs. In other words, the tabs in a web browser usually run as separate threads in the main process of the web browser. That is, each new tab that you open in your web browser usually runs on a separate thread of execution.

By running each tab in a separate thread, we allow all open tabs in the browser to run at the same time, and independently from each other. For example, you might have YouTube, or Spotify, currently open in a tab, where you are listening to some podcast, while, at the same time, you are working in another tab, writing an essay on Google Docs. Even if you are not looking at the YouTube tab, you can still hear the podcast, precisely because this YouTube tab is running in parallel with the other tab where Google Docs is running.

Without threads, the other alternative would be to run each tab as a completely separate process in your computer. But that would be a bad choice, because just a few tabs would already consume too much power and too many resources from your computer. In other words, it is very expensive to create a completely new process, compared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead while using the browser would be significant. Threads are faster to create, and they also consume much, much less resources from the computer, especially because they share some resources with the main process.

Therefore, it is the use of threads in modern web browsers that allows you to hear the podcast while you are writing something on Google Docs. Without threads, a web browser would probably be limited to just one single tab.

Threads are also well suited for anything that involves serving requests or orders, because serving a request takes time, and usually involves a lot of “waiting time”. In other words, we spend a lot of time idle, waiting for something to complete. For example, consider a restaurant. Serving orders in a restaurant usually involves the following steps:

  1. receive order from the client.
  2. pass the order to the kitchen, and wait for the food to be cooked.
  3. start cooking the food in the kitchen.
  4. when the food is fully cooked, deliver this food to the client.

If you think about the steps above, you will notice that one big moment of waiting is present in this whole process, which is while the food is being prepared and cooked inside the kitchen. While the food is being prepped, both the waiter and the client themselves are waiting for the food to be ready and delivered.

If we write a program to represent this restaurant, more specifically, a single-threaded program, then this program would be very inefficient. The program would stay idle, waiting for a considerable amount of time on the “check if food is ready” step. Consider the code snippet exposed below, which could potentially represent such a program.

// A hypothetical single-threaded restaurant: the Order and Waiter
// types do not exist, this is just pseudocode to illustrate the idea.
const order = Order.init("Pizza Margherita", 1);
const waiter = Waiter.init();
waiter.receive_order(order);
waiter.ask_kitchen_to_cook();
var food_not_ready = true;
while (food_not_ready) {
    // Busy wait: keep checking until the kitchen says the food is ready.
    food_not_ready = !waiter.is_food_ready();
}
const food = waiter.get_food_from_kitchen();
waiter.send_food_to_client(food);

The problem with this program is the while loop. This program will spend a lot of time waiting on the while loop, doing nothing more than just checking if the food is ready. This is a waste of time. Instead of waiting for something to happen, the waiter could just send the order to the kitchen and move on, continuing to receive more orders from other clients and sending more orders to the kitchen, instead of doing nothing while waiting for the food to be ready.

This is why threads would be a great fit for this program. We could use threads to free the waiters from their “waiting duties”, so they can go on with their other tasks and receive more orders. Take a look at the next example, where I have rewritten the above program into a different program that uses threads to cook and deliver the orders.

You can see in this program that, when a waiter receives a new order from a client, this waiter executes the send_order() function. The only thing that this function does is: it creates a new thread and detaches it. Since creating a thread is a very fast operation, this send_order() function returns almost immediately, so the waiter spends almost no time worrying about the order, and just moves on and tries to get the next order from the clients.

Inside the new thread created, the order gets cooked by a chef, and, when the food is ready, it is delivered to the client's table.

fn cook_and_deliver_order(order: Order) void {
    const chef = Chef.init();
    const food = chef.cook(order);
    chef.deliver_food(food);
}
fn send_order(order: Order) !void {
    // Pass the order by value: a pointer to the local `order`
    // argument would dangle after send_order() returns, while
    // the detached thread might still be using it.
    const cook_thread = try Thread.spawn(
        .{}, cook_and_deliver_order, .{order}
    );
    cook_thread.detach();
}

const waiter = Waiter.init();
while (true) {
    // get_new_order() presumably returns an optional order.
    if (waiter.get_new_order()) |order| {
        try send_order(order);
    }
}

16.2 Threads versus processes

When we run a program, this program is executed as a process in the operating system. This is a one-to-one relationship: each program or application that you execute is a separate process in the operating system. But each program, or each process, can create and contain multiple threads inside of it. Therefore, processes and threads have a one-to-many relationship.

This also means that every thread that we create is always associated with a particular process in our computer. In other words, a thread is always a subset (or a child) of an existing process. All threads share some of the resources associated with the process from which they were created. And because threads share resources with the process, they are very good for making communication between tasks easier.

For example, suppose that you were developing a big and complex application that would be much simpler if you could split it in two, and make these two separate pieces talk with each other. Some programmers opt to effectively write these two pieces of the codebase as two completely separate programs, and then, they use IPC (inter-process communication) to make these two separate programs/processes talk to each other, and make them work together.

However, some programmers find IPC hard to deal with, and, as a consequence, they prefer to write one piece of the codebase as the “main part of the program”, i.e. as the part of the code that runs as the process in the operating system, while the other piece of the codebase is written as a task to be executed in a new thread. A process and a thread can easily communicate with each other, through both control flow and data, because they share and have access to the same standard file descriptors (stdout, stdin, stderr), and also to the same memory space on the heap and in the global data section.

In more detail, each thread that you create has a separate stack frame reserved just for that thread, which essentially means that each local object that you create inside this thread is local to that thread, i.e. the other threads cannot see this local object, unless this object lives on the heap. In other words, if the memory associated with this object is on the heap, then the other threads can potentially access this object.

Therefore, objects that are stored in the stack are local to the thread where they were created. But objects that are stored on the heap are potentially accessible to other threads. All of this means that each thread has its own separate stack frame, but, at the same time, all threads share the same heap, the same standard file descriptors (which means that they share the same stdout, stdin and stderr), and the same global data section in the program.
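
To make this distinction concrete, consider the small sketch below. It is my own illustration, not one of the chapter's running examples: the worker() function and its variables are hypothetical, and just contrast an object that lives in a thread's stack frame with one that lives in the global data section.

const std = @import("std");
const Thread = std.Thread;

// Lives in the global data section: every thread in the
// process reads and writes this same memory location.
var shared_counter: u32 = 0;

fn worker() void {
    // Lives in this thread's stack frame: no other thread
    // can see or access this object.
    var local_counter: u32 = 0;
    local_counter += 1;
    shared_counter += 1;
    std.debug.print("local = {d}\n", .{local_counter});
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, worker, .{});
    thread.join();
    // The main thread sees the change made by the worker thread.
    std.debug.print("shared = {d}\n", .{shared_counter});
}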

16.3 Creating a thread

We create new threads in Zig by first importing the Thread struct into our current Zig module, and then calling the spawn() method of this struct, which creates (or “spawns”) a new thread of execution from our current process. This method has three arguments, which are, respectively:

  1. a SpawnConfig object, which contains configurations for the spawn process.
  2. the name of the function that is going to be executed (or, that is going to be “called”) inside this new thread.
  3. a list of arguments (or inputs) to be passed to the function provided in the second argument.

With these three arguments, you can control how the thread gets created, and also specify which work (or “tasks”) will be performed inside this new thread. A thread is just a separate context of execution, and we usually create new threads in our code because we want to perform some work inside this new context of execution. We specify exactly which steps are going to be performed inside this context by providing the name of a function in the second argument of the spawn() method.

Thus, when this new thread gets created, the function that you provided as input to the spawn() method gets called, or, gets executed inside this new thread. You can control the arguments, or, the inputs that are passed to this function when it gets called, by providing a list of arguments (or a list of inputs) in the third argument of the spawn() method. These arguments are passed to the function in the same order that they are provided to spawn().
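
As a quick sketch of this argument passing, consider the hypothetical add_and_print() function below (it is not one of the chapter's examples). Each element of the tuple given in the third argument of spawn() is matched, in order, to a parameter of the function:

const std = @import("std");
const Thread = std.Thread;

fn add_and_print(a: i32, b: i32) void {
    // `a` receives the first element of the tuple (2),
    // and `b` receives the second element (3).
    std.debug.print("{d} + {d} = {d}\n", .{ a, b, a + b });
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, add_and_print, .{ 2, 3 });
    thread.join();
}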

Furthermore, the SpawnConfig is a struct object with only two possible fields, or, two possible members, that you can set to tailor the spawn behaviour. These fields are:

  • stack_size: you can provide a usize value to specify the size (in bytes) of the thread’s stack frame. By default, this value is: \(16 \times 1024 \times 1024\).
  • allocator: you can provide an allocator object to be used when allocating memory for the thread.

To use one of these two fields (or, “configs”), you just have to create a new object of type SpawnConfig, and provide this object as input to the spawn() method. But, if you are not interested in using one of these configs, and you are ok with using just the defaults, you can just provide an anonymous struct literal (.{}) in place of this SpawnConfig argument.

As our first, and very simple example, consider the code exposed below. Inside the same program, you can create multiple threads of execution if you want to. But, in this first example, we are creating just a single thread of execution, because we call spawn() only once.

Also, notice in this example that we are executing the function do_some_work() inside the new thread. Since this function has no arguments, and therefore receives no inputs, we have passed an empty list in this instance, or, more precisely, an empty anonymous struct literal (.{}), in the third argument of spawn().

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
fn do_some_work() !void {
    _ = try stdout.write("Starting the work.\n");
    std.time.sleep(100 * std.time.ns_per_ms);
    _ = try stdout.write("Finishing the work.\n");
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, do_some_work, .{});
    thread.join();
}
Starting the work.
Finishing the work.

Notice the use of try when calling the spawn() method. This means that this method can return an error in some circumstances. One circumstance in particular is when you attempt to create a new thread when you have already created too many (i.e. you have exceeded the quota of concurrent threads in your system).
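
If you prefer to handle this failure explicitly, instead of propagating it with try, something like the sketch below is possible. The error names mentioned in the comment come from std.Thread.SpawnError; the handling logic itself is just a hypothetical illustration.

const thread = Thread.spawn(.{}, do_some_work, .{}) catch |err| {
    // e.g. error.ThreadQuotaExceeded or error.SystemResources.
    std.debug.print("could not spawn thread: {s}\n", .{@errorName(err)});
    return err;
};
thread.join();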

But, if the new thread is successfully created, the spawn() method returns a handle object (which is just an object of type Thread) to this new thread. You can use this handle object to effectively control all aspects of the thread.

The instant that you create the new thread, the function that you provided as input to spawn() gets invoked (i.e. gets called) to start the execution on this new thread. In other words, every time you call spawn(), not only does a new thread get created, but also the “start work button” of this thread gets automatically pressed. So the work being performed in this thread starts at the moment that the thread is created. This is similar to how pthread_create() from the pthreads library in C works, which also starts the execution at the moment that the thread gets created.
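
Before we move on, here is a small sketch of the SpawnConfig argument described at the beginning of this section. The specific value of 1 MB is an arbitrary choice of mine, used only for illustration:

const std = @import("std");
const Thread = std.Thread;

fn do_some_work() void {
    std.debug.print("Working with a smaller stack.\n", .{});
}

pub fn main() !void {
    // Replace the default 16 MB stack frame with a 1 MB one.
    const config = Thread.SpawnConfig{ .stack_size = 1024 * 1024 };
    const thread = try Thread.spawn(config, do_some_work, .{});
    thread.join();
}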

16.4 Returning from a thread

We have learned in the previous section that the execution of the thread starts at the moment that the thread gets created. Now, we will learn how to “join” or “detach” a thread in Zig. “Join” and “detach” are operations that control how the thread returns to the main thread, or, to the main process in our program.

We perform these operations by using the methods join() and detach() from the thread handle object. Every thread that you create can be marked as either joinable or detached (Linux man-pages 2024). You can turn a thread into a detached thread by calling the detach() method from the thread handle object. But if you call the join() method instead, then this thread becomes a joinable thread.

A thread cannot be both joinable and detached, which in general means that you cannot call both join() and detach() on the same thread. But a thread must be one of the two, meaning that you should always call either join() or detach() over a thread. If you don’t call one of these two methods over your thread, you introduce undefined behaviour into your program, which is described at Section 16.9.2.

Now, let’s describe what each of these two methods do to your thread.

16.4.1 Joining a thread

When you join a thread, you are essentially saying: “Hey! Could you please wait for the thread to finish, before you continue with your execution?”. For example, if we come back to our first and simplest example of a thread in Zig, in that example we have created a single thread inside the main() function of our program, and just called join() over this thread at the end. This section of the code example is reproduced below.

Because we are joining this new thread inside main()’s scope, it means that the execution of the main() function is temporarily stopped, to wait for the execution of the thread to finish. That is, the execution of main() stops temporarily at the line where join() gets called, and it will continue only after the thread has finished its tasks.

pub fn main() !void {
    const thread = try Thread.spawn(.{}, do_some_work, .{});
    thread.join();
}

Because we have joined this new thread inside main(), by calling join(), we have a guarantee that this new thread will finish before the end of the execution of main(), because it is guaranteed that main() will wait for the thread to finish its tasks. You could also interpret this as: the execution of main() will hang at the line where join() is called, and the lines of code that come after this join() call will be executed solely after the thread finishes its tasks and the execution of main() is “unlocked”.

In the example above, there are no more expressions after the join() call. We just have the end of main()’s scope, and, therefore, after the thread finishes its tasks, the execution of our program just ends, since there is nothing more to do. But what if we had more stuff to do after the join call?

To demonstrate this other possibility, consider the next example exposed below. Here, we create a print_id() function that just receives an id as input, and prints it to stdout. In this example, we are creating two new threads, one after another. Then, we join the first thread, wait for two whole seconds, and, at last, we join the second thread.

The idea behind this example is that the last join() call is executed only after the first thread finishes its task (i.e. the first join() call), and also after the two seconds of delay. If you compile and run this example, you will notice that most messages are quickly printed to stdout, i.e. they appear almost instantly on your screen. However, the last message (“Joining thread 2”) takes around 2 seconds to appear on the screen.

fn print_id(id: *const u8) !void {
    try stdout.print("Thread ID: {d}\n", .{id.*});
}

pub fn main() !void {
    const id1: u8 = 1;
    const id2: u8 = 2;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    const thread2 = try Thread.spawn(.{}, print_id, .{&id2});

    _ = try stdout.write("Joining thread 1\n");
    thread1.join();
    std.time.sleep(2 * std.time.ns_per_s);
    _ = try stdout.write("Joining thread 2\n");
    thread2.join();
}
Thread ID: Joining thread 1
1
Thread ID: 2
Joining thread 2

This demonstrates that both threads finish their work (i.e. printing the IDs) very fast, before the two seconds of delay end. Because of that, the last join() call returns pretty much instantly, since, by the time this last join() call happens, the second thread has already finished its task.

Now, if you compile and run this example, you will also notice that, in some cases, the messages get intertwined with each other. In other words, you might see the message “Joining thread 1” inserted in the middle of the message “Thread ID: 1”, or vice versa. This happens because:

  • the threads are executing basically at the same time as the main process of the program (i.e. the main() function).
  • the threads share the same stdout from the main process of the program, which means that the messages that the threads produce are sent to exact same place as the messages produced by the main process.

Both of these points were described previously at Section 16.1. So the messages might get intertwined because they are being produced and sent to the same stdout at roughly the same time. Anyway, when you call join() over a thread, the current process will wait for the thread to finish before it continues, and, when the thread does finish its task, the resources associated with this thread are automatically freed, and the current process continues with its execution.

16.4.2 Detaching a thread

When you detach a thread, by calling the detach() method, the thread is marked as detached. When a detached thread terminates, its resources are automatically released back to the system without the need for another thread to join with this terminated thread.

In other words, calling detach() over a thread is like when your children become adults, i.e. they become independent from you. A detached thread frees itself, and it does not need to report the results back to you when it finishes its task. Thus, you normally mark a thread as detached when you don’t need to use the return value of the thread, or, when you don’t care about when exactly the thread finishes its job, i.e. the thread solves everything by itself.

Take the code example below. We create a new thread, detach it, and then, we just print a final message before we end our program. We use the same print_id() function that we have used over the previous examples.

fn print_id(id: *const u8) !void {
    try stdout.print("Thread ID: {d}\n", .{id.*});
}

pub fn main() !void {
    const id1: u8 = 1;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    thread1.detach();
    _ = try stdout.write("Finish main\n");
}
Finish main

Now, if you look closely at the output of this code example, you will notice that only the final message in main() was printed to the console. The message that was supposed to be printed by print_id() did not appear in the console. Why? Because the main process of our program finished first, before the thread was able to say anything.

And that is perfectly ok behaviour, because the thread was detached, so it was able to free itself without the need of the main process. If you ask main() to sleep (or “wait”) for some extra nanoseconds before it ends, you will likely see the message printed by print_id(), because you give the thread enough time to finish before the main process ends.
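
A possible variation of the previous example is sketched below. The 100 milliseconds of sleep is an arbitrary amount of my choosing, but it is normally more than enough for the detached thread to print its message before main() ends:

pub fn main() !void {
    const id1: u8 = 1;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    thread1.detach();
    // Give the detached thread some time to finish before main() ends.
    std.time.sleep(100 * std.time.ns_per_ms);
    _ = try stdout.write("Finish main\n");
}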

16.5 Thread pools

Thread pools are a very popular programming pattern, used especially in servers and daemon processes. A thread pool is just a set of threads, or, a “pool” of threads. Many programmers like to use this pattern because it makes it easier to manage and use multiple threads, instead of manually creating the threads when you need them.

Also, using thread pools might increase the performance of your program as well, especially if your program is constantly creating threads to perform short-lived tasks. In such an instance, a thread pool might cause an increase in performance because you do not have to be constantly creating and destroying threads all the time, so you don’t face a lot of the overhead involved in this constant process of creating and destroying threads.

The main idea behind a thread pool is to have a set of threads already created and ready to perform tasks at all times. You create a set of threads at the moment that your program starts, and keep these threads alive while your program runs. Each of these threads will be either performing a task, or, waiting for a task to be assigned. Every time a new task emerges in your program, this task is added to a “queue of tasks”. The moment that a thread becomes available and ready to perform a new task, this thread takes the next task in the “queue of tasks”, then, it simply performs the task.

The Zig Standard Library offers a thread pool implementation in the std.Thread.Pool struct. You create a new instance of a Pool object by providing a Pool.Options object as input to the init() method of this struct. A Pool.Options object is a struct that contains configurations for the pool of threads. The most important settings in this struct are the members n_jobs and allocator. As the name suggests, the member allocator should receive an allocator object, while the member n_jobs specifies the number of threads to be created and maintained in this pool.

Consider the example exposed below, which demonstrates how we can create a new thread pool object. Here, we create a Pool.Options object that contains a general-purpose allocator object, and also, the n_jobs member was set to 4, which means that the thread pool will create and use 4 threads.

Also notice that the pool object was initially set to undefined. This allows us to initially declare the thread pool object without properly instantiating the underlying memory of the object. You have to initially declare your thread pool object by using undefined like this, because the init() method of Pool needs an initial pointer to properly instantiate the object.

So, just remember to create your thread pool object by using undefined, and then, after that, call the init() method over the object. You should also not forget to call the deinit() method over the thread pool object once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will have a memory leak in your program.

const std = @import("std");
const Pool = std.Thread.Pool;
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    const opt = Pool.Options{
        .n_jobs = 4,
        .allocator = allocator,
    };
    var pool: Pool = undefined;
    try pool.init(opt);
    defer pool.deinit();
}

Now that we know how to create Pool objects, we have to understand how to assign tasks to be executed by the threads in this pool object. To assign a task to be performed by a thread, we need to call the spawn() method from the thread pool object.

This spawn() method works almost identically to the spawn() method from the Thread struct. The method has almost the same arguments as the previous one; more precisely, we don’t have to provide a SpawnConfig object in this case. But instead of creating a new thread, this spawn() method from the thread pool object just registers a new task in the internal “queue of tasks” to be performed, and any available thread in the pool will get this task and simply perform it.

In the example below, we are using our previous print_id() function once again. But you may notice that the print_id() function is a little different this time, because now we are using catch instead of try in the print() call. Currently, the Pool struct only supports functions that don’t return errors as tasks. Thus, when assigning tasks to threads in a thread pool, it is essential to use functions that don’t return errors. That is why we are using catch here, so that the print_id() function doesn’t return an error.

fn print_id(id: *const u8) void {
    // Handle (and discard) any error, so that this
    // function does not return an error to the caller.
    stdout.print("Thread ID: {d}\n", .{id.*}) catch {};
}
const id1: u8 = 1;
const id2: u8 = 2;
try pool.spawn(print_id, .{&id1});
try pool.spawn(print_id, .{&id2});

This limitation should probably not exist, and, in fact, it is already on the radar of the Zig team: it is being tracked in an open issue1. So, if you do need to provide a function that might return an error as the task to be performed by the threads in the thread pool, then you are limited to either:

  • implementing your own thread pool that does not have this limitation.
  • waiting for the Zig team to actually fix this issue.

16.6 Mutexes

Mutexes are a classic component of every thread library. In essence, a mutex is a Mutually Exclusive Flag, and this flag acts like a type of “lock”, or a gatekeeper, to a particular section of your code. Mutexes are related to thread synchronization; more specifically, they prevent you from having some classic race conditions in your program, and, therefore, major bugs and undefined behaviour that are usually difficult to track and understand.

The main idea behind a mutex is to help us control the execution of a particular section of the code, and to prevent two or more threads from executing this particular section of the code at the same time. Many programmers like to compare a mutex to a bathroom door (which usually has a lock). When a thread locks the mutex object, it is like locking the bathroom door: the other people (in this case, the other threads) that want to use the same bathroom at the same time have to be patient, and simply wait for the current person (or thread) to unlock the door and get out of the bathroom.

Some other programmers also like to explain mutexes by using the analogy of “each person will have their turn to speak”. This is the analogy used in the Multithreading Code video from the Computerphile project2. Imagine that you are in a conversation circle. There is a moderator in this circle, who is the person that decides who has the right to speak at that particular moment. The moderator gives a green card (or some sort of authorization card) to the person that is going to speak, and, as a result, everyone else must be silent and listen to the person that has the green card. When the person finishes talking, they give the green card back to the moderator, and the moderator decides who is going to talk next, and delivers the green card to that person. And the cycle goes on like this.

A mutex acts like the moderator in this conversation circle. The mutex authorizes a single thread to execute a specific section of the code, and it also blocks the other threads from executing this same section of the code. If these other threads want to execute this same piece of the code, they are forced to wait for the authorized thread to finish first. When the authorized thread finishes executing this code, the mutex authorizes the next thread to execute this code, while the other threads remain blocked. Therefore, a mutex is like a moderator that does an “each thread will have their turn to execute this section of the code” type of control.

Mutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads are trying to read from or write to the same shared object at the same time. So, when you have an object that is shared with all threads, and you want to prevent two or more threads from accessing this same object at the same time, you can use a mutex to lock the part of the code that accesses this specific object. When a thread tries to run this code that is locked by a mutex, this thread stops its execution, and patiently waits for this section of the codebase to be unlocked to continue.

In other words, the execution of the thread is paused while the code section is locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that was executing this code section. Notice that mutexes are normally used to lock areas of the codebase that access/modify data that is shared with all threads, i.e. objects that are either stored in the global data section, or in the heap space of your program. So mutexes are not normally used in areas of the codebase that access/modify objects that are local to the thread.

16.6.1 Critical section

Critical section is a concept commonly associated with mutexes and thread synchronization. In essence, a critical section is the section of the program where a thread accesses/modifies a shared resource (i.e. an object, a file descriptor, something that all threads have access to). In other words, a critical section is the section of the program where race conditions might happen, and, therefore, where undefined behaviour can be introduced into the program.

When we use mutexes in our program, the critical section defines the area of the codebase that we want to lock. So we normally lock the mutex object at the beginning of the critical section, and then unlock it at the end of the critical section. The two bulletpoints exposed below come from the “Critical Section” article from GeeksforGeeks, and they summarise well the role that a critical section plays in the thread synchronization problem (Geeks for Geeks 2024).

  1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time.
  2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner.

16.6.2 Atomic operations

You will also see the term “atomic operation” a lot when reading about threads, race conditions and mutexes. In summary, an operation is categorized as “atomic” when a context switch cannot happen in the middle of this operation. In other words, this operation is always done from beginning to end, without interruptions from another process or operation in the middle of its execution phase.

Not many operations today are atomic. But why do atomic operations matter here? Because data races (which are a type of race condition) cannot happen on operations that are atomic. So if a particular line in your code performs an atomic operation, then this line will never suffer from a data race problem. Therefore, programmers sometimes use an atomic operation to protect themselves from data race problems in their code.

When you have an operation that is compiled into one single assembly instruction, this operation might be atomic, but that is not guaranteed. This was usually true for older CPU architectures (such as x86). But nowadays, most assembly instructions in modern CPU architectures are decomposed into multiple micro-tasks, which inherently makes the operation not atomic anymore, even though it is expressed as one single assembly instruction.

The Zig Standard Library offers some atomic functionality in the std.atomic module. In this module, you will find a public and generic function called Value(). With this function we create an “atomic object”, which is a value that supports some native atomic operations, most notably, a load() and a fetchAdd() operation. If you have experience with multithreading in C++, you probably recognize this pattern. So yes, this generic “atomic object” in Zig is essentially identical to the template struct std::atomic from the C++ Standard Library. It is important to emphasize that only primitive data types (i.e. the types presented at Section 1.5) are supported by these atomic operations.
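
A minimal sketch of this Value() generic function is exposed below, assuming a recent version of Zig where the atomic orderings are written in lowercase (e.g. .monotonic), as in the examples later in this chapter:

const std = @import("std");

pub fn main() void {
    // An atomic u32, initialized to zero.
    var counter = std.atomic.Value(u32).init(0);
    // fetchAdd() atomically adds 1 and returns the previous value.
    const old = counter.fetchAdd(1, .monotonic);
    // load() atomically reads the current value.
    const current = counter.load(.monotonic);
    std.debug.print("old = {d}, current = {d}\n", .{ old, current });
}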

16.6.3 Data races and race conditions

To understand why mutexes are used, we need to better understand the problem that they seek to solve, which can be summarized as data race problems. A data race problem is a type of race condition, which happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same time that another thread is trying to write/save new data into this same memory location (i.e. the same shared object).

We can simply define a race condition as any type of bug in your program that is based on a “who gets there first” problem. A data race problem is a type of race condition, because it occurs when two or more parties are trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation depends completely on who gets to this memory location first. As a consequence, a program that has a data race problem will likely produce a different result each time that we execute it.

Thus, race conditions produce undefined behaviour and unpredictability, because the program produces a different answer each time that a different party gets to the target location first. And we have no easy way to either predict or control who is going to get to this target location first. In other words, in each execution of your program, you get a different answer, because a different person, or, a different function, or, a different part of the code is finishing its tasks (or reaching a location) before the others.

As an example, consider the code snippet exposed below. In this example, we create a global counter variable, and we also create an increment() function, whose job is to just increment this global counter variable in a for loop.

Since the for loop iterates one hundred thousand times, and we create two separate threads in this code example, what number do you expect to see in the final message printed to stdout? The answer should be two hundred thousand. Right? Well, in theory, this program was supposed to print 200000 at the end, but in practice, every time that I execute this program I get a different answer.

In the example exposed below, you can see that, this time that we executed the program, the end result was 117254, instead of the expected 200000. The second time I executed this program, I got the number 108592 as the result. So the end result of this program varies, but it never gets to the expected 200000 that we want.

// Global counter variable
var counter: usize = 0;
// Function to increment the counter
fn increment() void {
    for (0..100000) |_| {
        counter += 1;
    }
}

pub fn main() !void {
    const thr1 = try Thread.spawn(.{}, increment, .{});
    const thr2 = try Thread.spawn(.{}, increment, .{});
    thr1.join();
    thr2.join();
    try stdout.print("Couter value: {d}\n", .{counter});
}
Counter value: 117254

Why is this happening? The answer is: because this program contains a data race problem. This program would print the correct number 200000 if, and only if, the first thread finished its tasks before the second thread started to execute. But that is very unlikely to happen, because the process of creating a thread is very fast, and, therefore, both threads start to execute at roughly the same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to spawn(), you will increase the chances of the program producing the “correct result”.

So the data race problem happens because both threads are reading and writing to the same memory location at roughly the same time. In this example, each thread is essentially performing three basic operations at each iteration of the for loop, which are:

  1. reading the current value of counter.
  2. incrementing this value by 1.
  3. writing the result back into counter.

Ideally, a thread B should read the value of counter only after the other thread A has finished writing the incremented value back into the counter object. Therefore, in the ideal scenario, which is demonstrated at Table 16.1, the threads work in sync with each other. But the reality is that these threads are out of sync, and, because of that, they suffer from a data race problem, which is demonstrated at Table 16.2.

Notice that, in the data race scenario (Table 16.2), the read performed by thread B happens before the write operation of thread A, and that ultimately leads to wrong results at the end of the program. Because when thread B reads the value from the counter variable, thread A is still processing the initial value from counter, and has not written the new and incremented value into counter yet. So what happens is that thread B ends up reading the same initial value (the “old” value) from counter, instead of reading the new and incremented version of this value that would be calculated by thread A.

Table 16.1: An ideal scenario for two threads incrementing the same integer value

  Thread 1       Thread 2       Integer value
  read value                    0
  increment                     1
  write value                   1
                 read value     1
                 increment      2
                 write value    2

Table 16.2: A data race scenario when two threads are incrementing the same integer value

  Thread 1       Thread 2       Integer value
  read value                    0
                 read value     0
  increment                     1
                 increment      1
  write value                   1
                 write value    1

If you think about these diagrams exposed in the form of tables, you will notice that they relate back to our discussion of atomic operations at Section 16.6.2. Remember, atomic operations are operations that the CPU executes from beginning to end, without interruptions from other threads or processes. So the scenario exposed at Table 16.1 does not suffer from a data race, because the operations performed by thread A are not interrupted in the middle by the operations from thread B.

If we also think about the discussion of critical sections from Section 16.6.1, we can identify the section that represents the critical section of the program, which is the section that is vulnerable to data race conditions. In this example, the critical section of the program is the line where we increment the counter variable (counter += 1). So, ideally, we want to use a mutex, lock it right before this line, and then unlock it right after this line.

16.6.4 Using mutexes in Zig

Now that we know the problem that mutexes seek to solve, we can learn how to use them in Zig. Mutexes in Zig are available through the std.Thread.Mutex struct from the Zig Standard Library. If we take the code from the previous example, and improve it with mutexes to solve our data race problem, we get the code example exposed below.

Notice that, this time, we had to alter the increment() function to receive a pointer to the Mutex object as input. All that we need to do to make this program safe against data race problems is to call the lock() method at the beginning of the critical section, and then call unlock() at the end of the critical section. Notice that the output of this program is now the correct number of 200000.

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
const Mutex = std.Thread.Mutex;
var counter: usize = 0;
fn increment(mutex: *Mutex) void {
    for (0..100000) |_| {
        mutex.lock();
        counter += 1;
        mutex.unlock();
    }
}

pub fn main() !void {
    var mutex: Mutex = .{};
    const thr1 = try Thread.spawn(.{}, increment, .{&mutex});
    const thr2 = try Thread.spawn(.{}, increment, .{&mutex});
    thr1.join();
    thr2.join();
    try stdout.print("Couter value: {d}\n", .{counter});
}
Counter value: 200000

16.7 Read/Write locks

Mutexes are normally used when it is never safe to have two or more threads running the same piece of code at the same time. In contrast, read/write locks are normally used in situations where you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that are not safe.

For example, suppose that you have multiple threads that uses the same shared file in the filesystem to store some configurations, or, statistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens. So this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time.

However, if two or more threads try to write data into this same file at the same time, then we cause some race condition problems. So this other part of the codebase is not safe to be executed in parallel. More specifically, a thread might end up writing data in the middle of the data written by another thread. This process of two or more threads writing to the same location might lead to data corruption. This specific situation is usually called a torn write.

Thus, what we can extract from this is that there are certain types of operations that cause a race condition, but there are also other types of operations that do not cause a race condition problem. You could also say that there are types of operations that are susceptible to race condition problems, and there are other types of operations that are not.

A read/write lock is a type of lock that acknowledges the existence of this specific scenario, and you can use this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not.

16.7.1 Exclusive lock vs shared lock

Therefore, a read/write lock is a little different from a mutex, because a mutex is always an exclusive lock, meaning that only one thread is allowed to execute at any time. With an exclusive lock, the other threads are always “excluded”, i.e. they are always blocked from executing. But with a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire.

We have two types of locks in a read/write lock: an exclusive lock and a shared lock. An exclusive lock works exactly the same as a mutex, while a shared lock is a lock that does not block the other threads from running. In the pthreads C library, read/write locks are available through the pthread_rwlock_t C struct. With this C struct, you can create a “write lock”, which corresponds to an exclusive lock, or you can create a “read lock”, which corresponds to a shared lock. The terminology might be a little different, but the meaning is the same, so just remember this relationship: write locks are exclusive locks, while read locks are shared locks.

When a thread tries to acquire a read lock (i.e. a shared lock), this thread gets the shared lock if, and only if, another thread does not currently hold a write lock (i.e. an exclusive lock), and also, if there are no other threads already in the queue, waiting for their turn to acquire a write lock. In other words, a thread in this queue has attempted to get a write lock earlier, but was blocked because there was another thread running that already had a write lock. As a consequence, this thread is in the queue to get a write lock, and it is currently waiting for the other thread with a write lock to finish its execution.

When a thread tries to acquire a read lock, but fails in acquiring this read lock, either because there is a thread with a write lock already running, or because there is a thread in the queue to get a write lock, the execution of this thread is instantly blocked, i.e. paused. This thread will indefinitely attempt to get the read lock, and its execution will be unblocked (or unpaused) only after it successfully acquires the read lock.

If you think deeply about this dynamic between read locks and write locks, you might notice that a read lock is basically a safety mechanism. More specifically, it is a way for us to allow a particular thread to run together with the other threads only when it is safe to do so. In other words, if there is currently a thread with a write lock running, then it is very likely not safe for the thread that is trying to acquire the read lock to run now. As a consequence, the read lock protects this thread from running into dangerous waters, and patiently waits for the “write lock” thread to finish its tasks before it continues.

On the other hand, if there are only “read lock” (i.e. “shared lock”) threads currently running (i.e. not a single “write lock” thread currently exists), then it is perfectly safe for the thread that is acquiring the read lock to run in parallel with the other threads. As a result, the read lock just allows this thread to run together with the other threads.

Thus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections of our multithreaded code is safe for us to have parallelism, and which sections are not safe to have parallelism.

16.7.2 Using read/write locks in Zig

The Zig Standard Library supports read/write locks through the std.Thread.RwLock module. If you want a particular thread to acquire a shared lock (i.e. a read lock), you should call the lockShared() method from the RwLock object. But, if you want this thread to acquire an exclusive lock (i.e. a write lock) instead, then you should call the lock() method from the RwLock object.

As with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object, once we are at the end of our “critical section”. If you have acquired an exclusive lock, then, you unlock this exclusive lock by calling the unlock() method from the read/write lock object. In contrast, if you have acquired a shared lock instead, then, call unlockShared() to unlock this shared lock.

As a simple example, the code below creates three separate threads responsible for reading the current value in a counter object, and it also creates another thread, responsible for writing new data into the counter object (incrementing it, more specifically).

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
const RwLock = std.Thread.RwLock;
var counter: u32 = 0;
fn reader(lock: *RwLock) !void {
    while (true) {
        lock.lockShared();
        const v: u32 = counter;
        try stdout.print("{d}", .{v});
        lock.unlockShared();
        std.time.sleep(2 * std.time.ns_per_s);
    }
}
fn writer(lock: *RwLock) void {
    while (true) {
        lock.lock();
        counter += 1;
        lock.unlock();
        std.time.sleep(2 * std.time.ns_per_s);
    }
}

pub fn main() !void {
    var lock: RwLock = .{};
    const thr1 = try Thread.spawn(.{}, reader, .{&lock});
    const thr2 = try Thread.spawn(.{}, reader, .{&lock});
    const thr3 = try Thread.spawn(.{}, reader, .{&lock});
    const wthread = try Thread.spawn(.{}, writer, .{&lock});

    thr1.join();
    thr2.join();
    thr3.join();
    wthread.join();
}

16.8 Yielding a thread

The Thread struct supports yielding through the yield() function. Yielding a thread means that the execution of the thread is temporarily stopped, and the thread goes back to the end of the priority queue of your operating system's scheduler.

That is, when you yield a thread, you are essentially saying the following to your OS: “Hey! Could you please stop executing this thread for now, and come back to continue it later?”. You could also interpret this yield operation as: “Could you please deprioritize this thread, and focus on doing other things instead?”. So this yield operation is also a way for you to stop a particular thread, so that you can work on and prioritize other threads instead.

It is important to say that yielding a thread is a “not so common” thread operation these days. In other words, not many programmers use yielding in production, simply because it is hard to use this operation and make it work properly, and also, there are better alternatives. Most programmers prefer to use join() instead. In fact, most of the time, when you see somebody using yield in a code example, they are mostly using it to help them debug race conditions in their applications. That is, yield is mostly used as a debug tool nowadays.

Anyway, if you want to yield a thread, just call the yield() function from the Thread module, like this:

try Thread.yield();
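
A minimal runnable sketch of this call is exposed below. The busy_task() function is a hypothetical workload of mine, used only to give the scheduler something to reorder:

const std = @import("std");
const Thread = std.Thread;

fn busy_task() !void {
    for (0..5) |i| {
        std.debug.print("iteration {d}\n", .{i});
        // Hint to the OS scheduler that it may run other threads now.
        try Thread.yield();
    }
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, busy_task, .{});
    thread.join();
}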

16.9 Common problems in threads

16.9.1 Deadlocks

A deadlock occurs when two or more threads are blocked forever, waiting for each other to release a resource. This usually happens when multiple locks are involved, and the order of acquiring them is not well managed.

The code example below demonstrates a deadlock situation. We have two different threads that execute two different functions (work1() and work2()) in this example. And we also have two separate mutexes. If you compile and run this code example, you will notice that the program just runs indefinitely, without ending.

When we look into the first thread, which executes the work1() function, we can notice that this function acquires the mut1 lock first, because this is the first operation executed inside this thread, which is the first thread created in the program. After that, the function sleeps for 1 second, to simulate some type of work, and then the function tries to acquire the mut2 lock.

On the other hand, when we look into the second thread, which executes the work2() function, we can see that this function acquires the mut2 lock first, because, when this thread gets created and tries to acquire this mut2 lock, the first thread is still sleeping on that “sleep 1 second” line. After acquiring mut2, the work2() function also sleeps for 1 second, to simulate some type of work, and then the function tries to acquire the mut1 lock.

This creates a deadlock situation, because, after the “sleep for 1 second” line in both threads, thread 1 is trying to acquire the mut2 lock, but this lock is currently being held by thread 2. However, at this moment, thread 2 is also trying to acquire the mut1 lock, which is currently being held by thread 1. Therefore, both threads end up waiting forever, each waiting for its peer to free the lock that it wants to acquire.

var mut1: Mutex = .{};
var mut2: Mutex = .{};
fn work1() !void {
    mut1.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut2.lock();
    _ = try stdout.write("Doing some work 1\n");
    mut2.unlock();
    mut1.unlock();
}

fn work2() !void {
    mut2.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut1.lock();
    _ = try stdout.write("Doing some work 2\n");
    mut1.unlock();
    mut2.unlock();
}

pub fn main() !void {
    const thr1 = try Thread.spawn(.{}, work1, .{});
    const thr2 = try Thread.spawn(.{}, work2, .{});
    thr1.join();
    thr2.join();
}
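
One common way to remove this deadlock, following the observation at the start of this section about managing the order in which locks are acquired, is to make both threads acquire the locks in the same order. The work2_fixed() function below is my own sketch of this idea, not part of the original example: since it takes mut1 first and mut2 second, just like work1(), the circular wait can no longer form.

fn work2_fixed() !void {
    // Acquire the locks in the same order as work1()
    // (mut1 first, then mut2).
    mut1.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut2.lock();
    _ = try stdout.write("Doing some work 2\n");
    mut2.unlock();
    mut1.unlock();
}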

16.9.2 Not calling join() or detach()

When you do not call either join() or detach() over a thread, this thread becomes a “zombie thread”, because it does not have a clear “return point”. You could also interpret this as: “nobody is properly responsible for managing the thread”. When we don’t establish whether a thread is joinable or detached, nobody becomes responsible for dealing with the return value of this thread, and also, nobody becomes responsible for clearing (or freeing) the resources associated with this thread.

You don’t want to be in this situation, so remember to always use join() or detach() on the threads that you create. When you don’t use these methods, the execution of the thread becomes completely independent from the execution of the main process in your program. This means that the main process of your program might end before the thread finishes its job, or vice versa. The idea is that we have no idea who is going to finish first. It becomes a race condition problem. In such a case, we lose control over this thread, and its resources are never freed (i.e. you have leaked resources in the system).

16.9.3 Cancelling or killing a particular thread

When we think about the pthreads C library, there is a way to asynchronously kill or cancel a thread, which is by sending a signal (such as SIGTERM) to the thread through the pthread_kill() function. But canceling a thread like this is bad. It is dangerously bad. As a consequence, the Zig implementation of threads does not have a similar function, or, a similar way to asynchronously cancel or kill a thread.

Therefore, if you want to cancel a thread in the middle of its execution in Zig, one good strategy that you can adopt is to use control flow in your favor, in conjunction with join(). More specifically, you can design your thread around a while loop that is constantly checking if the thread should continue running. If it is time to cancel the thread, we make the while loop break, and join the thread with the main thread by calling join().

The code example below demonstrates this strategy to some extent. Here, we are using control flow to break the while loop, and exit the thread earlier than what we had initially planned. This example also demonstrates how we can use atomic objects in Zig with the Value() generic function that we mentioned at Section 16.6.2.

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
var running = std.atomic.Value(bool).init(true);
var counter: u64 = 0;
fn do_more_work() void {
    std.time.sleep(2 * std.time.ns_per_s);
}
fn work() !void {
    while (running.load(.monotonic)) {
        for (0..10000) |_| { counter += 1; }
        if (counter < 15000) {
            _ = try stdout.write("Time to cancel the thread.\n");
            running.store(false, .monotonic);
        } else {
            _ = try stdout.write("Time to do more work.\n");
            do_more_work();
            running.store(false, .monotonic);
        }
    }
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, work, .{});
    thread.join();
}

  1. https://github.com/ziglang/zig/issues/18810↩︎

  2. https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile↩︎