16  Introducing threads and parallelism in Zig

Threads are available in Zig through the Thread struct from the Zig Standard Library. This struct represents a kernel thread, and it follows a POSIX thread pattern, meaning that it works similarly to a thread from the pthreads C library, which is usually available in any distribution of the GNU C Compiler (gcc). If you are not familiar with threads, let's go through some theory behind them first.

16.1 What are threads?

A thread is basically a separate context of execution. We use threads to introduce parallelism into our programs, which in most cases makes the program run faster, because we have multiple tasks being performed at the same time, parallel to each other.

Programs are normally single-threaded by default, which means that each program usually runs on a single thread, or, a single context of execution. When we have only one thread running, we have no parallelism. And when we don't have parallelism, the commands are executed sequentially, that is, only one command is executed at a time, one after another. By creating multiple threads inside our program, we start to execute multiple commands at the same time.

Programs that create multiple threads are very common in the wild, because many different types of applications are well suited for parallelism. Good examples are video and photo-editing applications (e.g. Adobe Photoshop or DaVinci Resolve), games (e.g. The Witcher 3), and also web browsers (e.g. Google Chrome, Firefox, Microsoft Edge, etc.). For example, in web browsers, threads are normally used to implement tabs. In other words, the tabs in a web browser usually run as separate threads in the main process of the web browser. That is, each new tab that you open in your web browser usually runs on a separate thread of execution.

By running each tab in a separate thread, we allow all open tabs in the browser to run at the same time, and independently from each other. For example, you might have YouTube, or Spotify, currently open in a tab, where you are listening to some podcast, while, at the same time, you are working in another tab, writing an essay on Google Docs. Even if you are not looking at the YouTube tab, you can still hear the podcast, precisely because this YouTube tab is running in parallel with the other tab where Google Docs is running.

Without threads, the other alternative would be to run each tab as a completely separate process in your computer. But that would be a bad choice, because just a few tabs would already consume too much power and too many resources from your computer. In other words, it is very expensive to create a completely new process, compared to creating a new thread of execution. Also, the chances of you experiencing lag and overhead while using the browser would be significant. Threads are faster to create, and they also consume much, much less resources from the computer, especially because they share some resources with the main process.

Therefore, it is the use of threads in modern web browsers that allows you to hear the podcast while you are writing something on Google Docs. Without threads, a web browser would probably be limited to just one single tab.

Threads are also well suited for anything that involves serving requests or orders, because serving a request takes time, and usually involves a lot of “waiting time”. In other words, we spend a lot of time idle, waiting for something to complete. For example, consider a restaurant. Serving orders in a restaurant usually involves the following steps:

  1. receive order from the client.
  2. pass the order to the kitchen, and wait for the food to be cooked.
  3. start cooking the food in the kitchen.
  4. when the food is fully cooked, deliver this food to the client.

If you think about the steps above, you will notice that one big moment of waiting is present in this whole process, which is while the food is being prepared and cooked inside the kitchen. While the food is being prepped, both the waiter and the client themselves are waiting for the food to be ready and delivered.

If we write a program to represent this restaurant, more specifically, a single-threaded program, then this program would be very inefficient. The program would stay idle, waiting for a considerable amount of time on the “check if food is ready” step. Consider the code snippet exposed below, which could potentially represent such a program.

// A hypothetical single-threaded restaurant: the Order and Waiter
// types do not exist, this is just pseudocode to illustrate the idea.
const order = Order.init("Pizza Margherita", 1);
const waiter = Waiter.init();
waiter.receive_order(order);
waiter.ask_kitchen_to_cook();
var food_not_ready = true;
while (food_not_ready) {
    // Busy wait: keep checking until the kitchen says the food is ready.
    food_not_ready = !waiter.is_food_ready();
}
const food = waiter.get_food_from_kitchen();
waiter.send_food_to_client(food);

The problem with this program is the while loop. This program will spend a lot of time waiting on the while loop, doing nothing more than just checking if the food is ready. This is a waste of time. Instead of waiting for something to happen, the waiter could just send the order to the kitchen and move on, continuing to receive more orders from other clients and sending more orders to the kitchen, instead of doing nothing while waiting for the food to be ready.

This is why threads would be a great fit for this program. We could use threads to free the waiters from their “waiting duties”, so they can go on with their other tasks and receive more orders. Take a look at the next example, where I have rewritten the above program into a different program that uses threads to cook and deliver the orders.

You can see in this program that, when a waiter receives a new order from a client, this waiter executes the send_order() function. The only thing that this function does is: it creates a new thread and detaches it. Since creating a thread is a very fast operation, this send_order() function returns almost immediately, so the waiter spends almost no time worrying about the order, and just moves on and tries to get the next order from the clients.

Inside the new thread created, the order gets cooked by a chef, and, when the food is ready, it is delivered to the client's table.

fn cook_and_deliver_order(order: Order) void {
    const chef = Chef.init();
    const food = chef.cook(order);
    chef.deliver_food(food);
}
fn send_order(order: Order) !void {
    // Pass the order by value: a pointer to the local `order`
    // argument would dangle after send_order() returns, while
    // the detached thread might still be using it.
    const cook_thread = try Thread.spawn(
        .{}, cook_and_deliver_order, .{order}
    );
    cook_thread.detach();
}

const waiter = Waiter.init();
while (true) {
    // get_new_order() presumably returns an optional order.
    if (waiter.get_new_order()) |order| {
        try send_order(order);
    }
}

16.2 Threads versus processes

When we run a program, this program is executed as a process in the operating system. This is a one-to-one relationship: each program or application that you execute is a separate process in the operating system. But each program, or each process, can create and contain multiple threads inside of it. Therefore, processes and threads have a one-to-many relationship.

This also means that every thread that we create is always associated with a particular process in our computer. In other words, a thread is always a subset (or a child) of an existing process. All threads share some of the resources associated with the process from which they were created. And because threads share resources with the process, they are very good for making communication between tasks easier.

For example, suppose that you were developing a big and complex application that would be much simpler if you could split it in two, and make these two separate pieces talk with each other. Some programmers opt to effectively write these two pieces of the codebase as two completely separate programs, and then, they use IPC (inter-process communication) to make these two separate programs/processes talk to each other, and make them work together.

However, some programmers find IPC hard to deal with, and, as a consequence, they prefer to write one piece of the codebase as the “main part of the program”, i.e. as the part of the code that runs as the process in the operating system, while the other piece of the codebase is written as a task to be executed in a new thread. A process and a thread can easily communicate with each other, through both control flow and data, because they share and have access to the same standard file descriptors (stdout, stdin, stderr), and also to the same memory space on the heap and in the global data section.

In more detail, each thread that you create has a separate stack frame reserved just for that thread, which essentially means that each local object that you create inside this thread is local to that thread, i.e. the other threads cannot see this local object, unless this object lives on the heap. In other words, if the memory associated with this object is on the heap, then the other threads can potentially access this object.

Therefore, objects that are stored in the stack are local to the thread where they were created. But objects that are stored on the heap are potentially accessible to other threads. All of this means that each thread has its own separate stack frame, but, at the same time, all threads share the same heap, the same standard file descriptors (which means that they share the same stdout, stdin and stderr), and the same global data section in the program.
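
To make this distinction concrete, consider the small sketch below. It is my own illustration, not one of the chapter's running examples: the worker() function and its variables are hypothetical, and just contrast an object that lives in a thread's stack frame with one that lives in the global data section.

const std = @import("std");
const Thread = std.Thread;

// Lives in the global data section: every thread in the
// process reads and writes this same memory location.
var shared_counter: u32 = 0;

fn worker() void {
    // Lives in this thread's stack frame: no other thread
    // can see or access this object.
    var local_counter: u32 = 0;
    local_counter += 1;
    shared_counter += 1;
    std.debug.print("local = {d}\n", .{local_counter});
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, worker, .{});
    thread.join();
    // The main thread sees the change made by the worker thread.
    std.debug.print("shared = {d}\n", .{shared_counter});
}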

16.3 Creating a thread

We create new threads in Zig by first importing the Thread struct into our current Zig module, and then calling the spawn() method of this struct, which creates (or “spawns”) a new thread of execution from our current process. This method has three arguments, which are, respectively:

  1. a SpawnConfig object, which contains configurations for the spawn process.
  2. the name of the function that is going to be executed (or, that is going to be “called”) inside this new thread.
  3. a list of arguments (or inputs) to be passed to the function provided in the second argument.

With these three arguments, you can control how the thread gets created, and also specify which work (or “tasks”) will be performed inside this new thread. A thread is just a separate context of execution, and we usually create new threads in our code because we want to perform some work inside this new context of execution. We specify exactly which steps are going to be performed inside this context by providing the name of a function in the second argument of the spawn() method.

Thus, when this new thread gets created, the function that you provided as input to the spawn() method gets called, or, gets executed inside this new thread. You can control the arguments, or, the inputs that are passed to this function when it gets called, by providing a list of arguments (or a list of inputs) in the third argument of the spawn() method. These arguments are passed to the function in the same order that they are provided to spawn().
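
As a quick sketch of this argument passing, consider the hypothetical add_and_print() function below (it is not one of the chapter's examples). Each element of the tuple given in the third argument of spawn() is matched, in order, to a parameter of the function:

const std = @import("std");
const Thread = std.Thread;

fn add_and_print(a: i32, b: i32) void {
    // `a` receives the first element of the tuple (2),
    // and `b` receives the second element (3).
    std.debug.print("{d} + {d} = {d}\n", .{ a, b, a + b });
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, add_and_print, .{ 2, 3 });
    thread.join();
}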

Furthermore, the SpawnConfig is a struct object with only two possible fields, or, two possible members, that you can set to tailor the spawn behaviour. These fields are:

  • stack_size: you can provide a usize value to specify the size (in bytes) of the thread’s stack frame. By default, this value is: \(16 \times 1024 \times 1024\).
  • allocator: you can provide an allocator object to be used when allocating memory for the thread.

To use one of these two fields (or, “configs”), you just have to create a new object of type SpawnConfig, and provide this object as input to the spawn() method. But, if you are not interested in using one of these configs, and you are ok with using just the defaults, you can just provide an anonymous struct literal (.{}) in place of this SpawnConfig argument.

As our first, and very simple example, consider the code exposed below. Inside the same program, you can create multiple threads of execution if you want to. But, in this first example, we are creating just a single thread of execution, because we call spawn() only once.

Also, notice in this example that we are executing the function do_some_work() inside the new thread. Since this function has no arguments, and therefore receives no inputs, we have passed an empty list in this instance, or, more precisely, an empty anonymous struct literal (.{}), in the third argument of spawn().

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
fn do_some_work() !void {
    _ = try stdout.write("Starting the work.\n");
    std.time.sleep(100 * std.time.ns_per_ms);
    _ = try stdout.write("Finishing the work.\n");
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, do_some_work, .{});
    thread.join();
}
Starting the work.
Finishing the work.

Notice the use of try when calling the spawn() method. This means that this method can return an error in some circumstances. One circumstance in particular is when you attempt to create a new thread when you have already created too many (i.e. you have exceeded the quota of concurrent threads in your system).
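
If you prefer to handle this failure explicitly, instead of propagating it with try, something like the sketch below is possible. The error names mentioned in the comment come from std.Thread.SpawnError; the handling logic itself is just a hypothetical illustration.

const thread = Thread.spawn(.{}, do_some_work, .{}) catch |err| {
    // e.g. error.ThreadQuotaExceeded or error.SystemResources.
    std.debug.print("could not spawn thread: {s}\n", .{@errorName(err)});
    return err;
};
thread.join();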

But, if the new thread is successfully created, the spawn() method returns a handle object (which is just an object of type Thread) to this new thread. You can use this handle object to effectively control all aspects of the thread.

The instant that you create the new thread, the function that you provided as input to spawn() gets invoked (i.e. gets called) to start the execution on this new thread. In other words, every time you call spawn(), not only does a new thread get created, but also the “start work button” of this thread gets automatically pressed. So the work being performed in this thread starts at the moment that the thread is created. This is similar to how pthread_create() from the pthreads library in C works, which also starts the execution at the moment that the thread gets created.
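
Before we move on, here is a small sketch of the SpawnConfig argument described at the beginning of this section. The specific value of 1 MB is an arbitrary choice of mine, used only for illustration:

const std = @import("std");
const Thread = std.Thread;

fn do_some_work() void {
    std.debug.print("Working with a smaller stack.\n", .{});
}

pub fn main() !void {
    // Replace the default 16 MB stack frame with a 1 MB one.
    const config = Thread.SpawnConfig{ .stack_size = 1024 * 1024 };
    const thread = try Thread.spawn(config, do_some_work, .{});
    thread.join();
}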

16.4 Returning from a thread

We have learned in the previous section that the execution of the thread starts at the moment that the thread gets created. Now, we will learn how to “join” or “detach” a thread in Zig. “Join” and “detach” are operations that control how the thread returns to the main thread, or, to the main process in our program.

We perform these operations by using the methods join() and detach() from the thread handle object. Every thread that you create can be marked as either joinable or detached (Linux man-pages 2024). You can turn a thread into a detached thread by calling the detach() method from the thread handle object. But if you call the join() method instead, then this thread becomes a joinable thread.

A thread cannot be both joinable and detached, which in general means that you cannot call both join() and detach() on the same thread. But a thread must be one of the two, meaning that you should always call either join() or detach() over a thread. If you don’t call one of these two methods over your thread, you introduce undefined behaviour into your program, which is described at Section 16.9.2.

Now, let’s describe what each of these two methods do to your thread.

16.4.1 Joining a thread

When you join a thread, you are essentially saying: “Hey! Could you please wait for the thread to finish, before you continue with your execution?”. For example, if we come back to our first and simplest example of a thread in Zig, in that example we have created a single thread inside the main() function of our program, and just called join() over this thread at the end. This section of the code example is reproduced below.

Because we are joining this new thread inside main()’s scope, it means that the execution of the main() function is temporarily stopped, to wait for the execution of the thread to finish. That is, the execution of main() stops temporarily at the line where join() gets called, and it will continue only after the thread has finished its tasks.

pub fn main() !void {
    const thread = try Thread.spawn(.{}, do_some_work, .{});
    thread.join();
}

Because we have joined this new thread inside main(), by calling join(), we have a guarantee that this new thread will finish before the end of the execution of main(), because it is guaranteed that main() will wait for the thread to finish its tasks. You could also interpret this as: the execution of main() will hang at the line where join() is called, and the lines of code that come after this join() call will be executed solely after the thread finishes its tasks and the execution of main() is “unlocked”.

In the example above, there are no more expressions after the join() call. We just have the end of main()’s scope, and, therefore, after the thread finishes its tasks, the execution of our program just ends, since there is nothing more to do. But what if we had more stuff to do after the join call?

To demonstrate this other possibility, consider the next example exposed below. Here, we create a print_id() function that just receives an id as input, and prints it to stdout. In this example, we are creating two new threads, one after another. Then, we join the first thread, wait for two whole seconds, and, at last, we join the second thread.

The idea behind this example is that the last join() call is executed only after the first thread finishes its task (i.e. the first join() call), and also after the two seconds of delay. If you compile and run this example, you will notice that most messages are quickly printed to stdout, i.e. they appear almost instantly on your screen. However, the last message (“Joining thread 2”) takes around 2 seconds to appear on the screen.

fn print_id(id: *const u8) !void {
    try stdout.print("Thread ID: {d}\n", .{id.*});
}

pub fn main() !void {
    const id1: u8 = 1;
    const id2: u8 = 2;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    const thread2 = try Thread.spawn(.{}, print_id, .{&id2});

    _ = try stdout.write("Joining thread 1\n");
    thread1.join();
    std.time.sleep(2 * std.time.ns_per_s);
    _ = try stdout.write("Joining thread 2\n");
    thread2.join();
}
Thread ID: Joining thread 1
1
Thread ID: 2
Joining thread 2

This demonstrates that both threads finish their work (i.e. printing the IDs) very fast, before the two seconds of delay end. Because of that, the last join() call returns pretty much instantly, since, by the time this last join() call happens, the second thread has already finished its task.

Now, if you compile and run this example, you will also notice that, in some cases, the messages get intertwined with each other. In other words, you might see the message “Joining thread 1” inserted in the middle of the message “Thread ID: 1”, or vice versa. This happens because:

  • the threads are executing basically at the same time as the main process of the program (i.e. the main() function).
  • the threads share the same stdout from the main process of the program, which means that the messages that the threads produce are sent to exact same place as the messages produced by the main process.

Both of these points were described previously at Section 16.1. So the messages might get intertwined because they are being produced and sent to the same stdout at roughly the same time. Anyway, when you call join() over a thread, the current process will wait for the thread to finish before it continues, and, when the thread does finish its task, the resources associated with this thread are automatically freed, and the current process continues with its execution.

16.4.2 Detaching a thread

When you detach a thread, by calling the detach() method, the thread is marked as detached. When a detached thread terminates, its resources are automatically released back to the system without the need for another thread to join with this terminated thread.

In other words, calling detach() over a thread is like when your children become adults, i.e. they become independent from you. A detached thread frees itself, and it does not need to report the results back to you when it finishes its task. Thus, you normally mark a thread as detached when you don’t need to use the return value of the thread, or, when you don’t care about when exactly the thread finishes its job, i.e. the thread solves everything by itself.

Take the code example below. We create a new thread, detach it, and then, we just print a final message before we end our program. We use the same print_id() function that we have used over the previous examples.

fn print_id(id: *const u8) !void {
    try stdout.print("Thread ID: {d}\n", .{id.*});
}

pub fn main() !void {
    const id1: u8 = 1;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    thread1.detach();
    _ = try stdout.write("Finish main\n");
}
Finish main

Now, if you look closely at the output of this code example, you will notice that only the final message in main() was printed to the console. The message that was supposed to be printed by print_id() did not appear in the console. Why? Because the main process of our program finished first, before the thread was able to say anything.

And that is perfectly ok behaviour, because the thread was detached, so it was able to free itself without the need of the main process. If you ask main() to sleep (or “wait”) for some extra nanoseconds before it ends, you will likely see the message printed by print_id(), because you give the thread enough time to finish before the main process ends.
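
A possible variation of the previous example is sketched below. The 100 milliseconds of sleep is an arbitrary amount of my choosing, but it is normally more than enough for the detached thread to print its message before main() ends:

pub fn main() !void {
    const id1: u8 = 1;
    const thread1 = try Thread.spawn(.{}, print_id, .{&id1});
    thread1.detach();
    // Give the detached thread some time to finish before main() ends.
    std.time.sleep(100 * std.time.ns_per_ms);
    _ = try stdout.write("Finish main\n");
}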

16.5 Thread pools

Thread pools are a very popular programming pattern, used especially in servers and daemon processes. A thread pool is just a set of threads, or, a “pool” of threads. Many programmers like to use this pattern because it makes it easier to manage and use multiple threads, instead of manually creating the threads when you need them.

Also, using thread pools might increase the performance of your program as well, especially if your program is constantly creating threads to perform short-lived tasks. In such an instance, a thread pool might cause an increase in performance because you do not have to be constantly creating and destroying threads all the time, so you don’t face a lot of the overhead involved in this constant process of creating and destroying threads.

The main idea behind a thread pool is to have a set of threads already created and ready to perform tasks at all times. You create a set of threads at the moment that your program starts, and keep these threads alive while your program runs. Each of these threads will be either performing a task, or, waiting for a task to be assigned. Every time a new task emerges in your program, this task is added to a “queue of tasks”. The moment that a thread becomes available and ready to perform a new task, this thread takes the next task in the “queue of tasks”, then, it simply performs the task.

The Zig Standard Library offers a thread pool implementation in the std.Thread.Pool struct. You create a new instance of a Pool object by providing a Pool.Options object as input to the init() method of this struct. A Pool.Options object is a struct that contains configurations for the pool of threads. The most important settings in this struct are the members n_jobs and allocator. As the name suggests, the member allocator should receive an allocator object, while the member n_jobs specifies the number of threads to be created and maintained in this pool.

Consider the example exposed below, which demonstrates how we can create a new thread pool object. Here, we create a Pool.Options object that contains a general-purpose allocator object, and also, the n_jobs member was set to 4, which means that the thread pool will create and use 4 threads.

Also notice that the pool object was initially set to undefined. This allows us to initially declare the thread pool object without properly instantiating the underlying memory of the object. You have to initially declare your thread pool object by using undefined like this, because the init() method of Pool needs an initial pointer to properly instantiate the object.

So, just remember to create your thread pool object by using undefined, and then, after that, call the init() method over the object. You should also not forget to call the deinit() method over the thread pool object once you are done with it, to release the resources allocated for the thread pool. Otherwise, you will have a memory leak in your program.

const std = @import("std");
const Pool = std.Thread.Pool;
pub fn main() !void {
    var gpa = std.heap.GeneralPurposeAllocator(.{}){};
    const allocator = gpa.allocator();
    const opt = Pool.Options{
        .n_jobs = 4,
        .allocator = allocator,
    };
    var pool: Pool = undefined;
    try pool.init(opt);
    defer pool.deinit();
}

Now that we know how to create Pool objects, we have to understand how to assign tasks to be executed by the threads in this pool object. To assign a task to be performed by a thread, we need to call the spawn() method from the thread pool object.

This spawn() method works almost identically to the spawn() method from the Thread struct. The method has almost the same arguments as the previous one; more precisely, we don’t have to provide a SpawnConfig object in this case. But instead of creating a new thread, this spawn() method from the thread pool object just registers a new task in the internal “queue of tasks” to be performed, and any available thread in the pool will get this task and simply perform it.

In the example below, we are using our previous print_id() function once again. But you may notice that the print_id() function is a little different this time, because now we are using catch instead of try in the print() call. Currently, the Pool struct only supports functions that don’t return errors as tasks. Thus, when assigning tasks to threads in a thread pool, it is essential to use functions that don’t return errors. That is why we are using catch here, so that the print_id() function doesn’t return an error.

fn print_id(id: *const u8) void {
    // Handle (and discard) any error, so that this
    // function does not return an error to the caller.
    stdout.print("Thread ID: {d}\n", .{id.*}) catch {};
}
const id1: u8 = 1;
const id2: u8 = 2;
try pool.spawn(print_id, .{&id1});
try pool.spawn(print_id, .{&id2});

This limitation should probably not exist, and, in fact, it is already on the radar of the Zig team: it is being tracked in an open issue1. So, if you do need to provide a function that might return an error as the task to be performed by the threads in the thread pool, then you are limited to either:

  • implementing your own thread pool that does not have this limitation.
  • waiting for the Zig team to actually fix this issue.

16.6 Mutexes

Mutexes are a classic component of every thread library. In essence, a mutex is a Mutually Exclusive Flag, and this flag acts like a type of “lock”, or a gatekeeper, to a particular section of your code. Mutexes are related to thread synchronization; more specifically, they prevent you from having some classic race conditions in your program, and, therefore, major bugs and undefined behaviour that are usually difficult to track and understand.

The main idea behind a mutex is to help us control the execution of a particular section of the code, and to prevent two or more threads from executing this particular section of the code at the same time. Many programmers like to compare a mutex to a bathroom door (which usually has a lock). When a thread locks the mutex object, it is like locking the bathroom door: the other people (in this case, the other threads) that want to use the same bathroom at the same time have to be patient, and simply wait for the current person (or thread) to unlock the door and get out of the bathroom.

Some other programmers also like to explain mutexes by using the analogy of “each person will have their turn to speak”. This is the analogy used in the Multithreading Code video from the Computerphile project2. Imagine that you are in a conversation circle. There is a moderator in this circle, who is the person that decides who has the right to speak at that particular moment. The moderator gives a green card (or some sort of authorization card) to the person that is going to speak, and, as a result, everyone else must be silent and listen to the person that has the green card. When the person finishes talking, they give the green card back to the moderator, and the moderator decides who is going to talk next, and delivers the green card to that person. And the cycle goes on like this.

A mutex acts like the moderator in this conversation circle. The mutex authorizes a single thread to execute a specific section of the code, and it also blocks the other threads from executing this same section of the code. If these other threads want to execute this same piece of the code, they are forced to wait for the authorized thread to finish first. When the authorized thread finishes executing this code, the mutex authorizes the next thread to execute this code, while the other threads remain blocked. Therefore, a mutex is like a moderator that does an “each thread will have their turn to execute this section of the code” type of control.

Mutexes are especially used to prevent data race problems from happening. A data race problem happens when two or more threads are trying to read from or write to the same shared object at the same time. So, when you have an object that is shared with all threads, and you want to prevent two or more threads from accessing this same object at the same time, you can use a mutex to lock the part of the code that accesses this specific object. When a thread tries to run this code that is locked by a mutex, this thread stops its execution, and patiently waits for this section of the codebase to be unlocked to continue.

In other words, the execution of the thread is paused while the code section is locked by the mutex, and it is unpaused the moment that the code section is unlocked by the other thread that was executing this code section. Notice that mutexes are normally used to lock areas of the codebase that access/modify data that is shared with all threads, i.e. objects that are either stored in the global data section, or in the heap space of your program. So mutexes are not normally used in areas of the codebase that access/modify objects that are local to the thread.

16.6.1 Critical section

Critical section is a concept commonly associated with mutexes and thread synchronization. In essence, a critical section is the section of the program where a thread accesses/modifies a shared resource (i.e. an object, a file descriptor, something that all threads have access to). In other words, a critical section is the section of the program where race conditions might happen, and, therefore, where undefined behaviour can be introduced into the program.

When we use mutexes in our program, the critical section defines the area of the codebase that we want to lock. So we normally lock the mutex object at the beginning of the critical section, and then unlock it at the end of the critical section. The two bulletpoints exposed below come from the “Critical Section” article from GeeksforGeeks, and they summarise well the role that a critical section plays in the thread synchronization problem (Geeks for Geeks 2024).

  1. The critical section must be executed as an atomic operation, which means that once one thread or process has entered the critical section, all other threads or processes must wait until the executing thread or process exits the critical section. The purpose of synchronization mechanisms is to ensure that only one thread or process can execute the critical section at a time.
  2. The concept of a critical section is central to synchronization in computer systems, as it is necessary to ensure that multiple threads or processes can execute concurrently without interfering with each other. Various synchronization mechanisms such as semaphores, mutexes, monitors, and condition variables are used to implement critical sections and ensure that shared resources are accessed in a mutually exclusive manner.

16.6.2 Atomic operations

You will also see the term “atomic operation” a lot when reading about threads, race conditions and mutexes. In summary, an operation is categorized as “atomic” when a context switch cannot happen in the middle of this operation. In other words, this operation is always done from beginning to end, without interruptions from another process or operation in the middle of its execution phase.

Not many operations today are atomic. But why do atomic operations matter here? Because data races (which are a type of race condition) cannot happen on operations that are atomic. So if a particular line in your code performs an atomic operation, then this line will never suffer from a data race problem. Therefore, programmers sometimes use an atomic operation to protect themselves from data race problems in their code.

When you have an operation that is compiled into one single assembly instruction, this operation might be atomic, but that is not guaranteed. This was usually true for older CPU architectures (such as x86). But nowadays, most assembly instructions in modern CPU architectures are decomposed into multiple micro-tasks, which inherently makes the operation not atomic anymore, even though it is expressed as one single assembly instruction.

The Zig Standard Library offers some atomic functionality in the std.atomic module. In this module, you will find a public and generic function called Value(). With this function we create an “atomic object”, which is a value that supports some native atomic operations, most notably, a load() and a fetchAdd() operation. If you have experience with multithreading in C++, you probably recognize this pattern. So yes, this generic “atomic object” in Zig is essentially identical to the template struct std::atomic from the C++ Standard Library. It is important to emphasize that only primitive data types (i.e. the types presented at Section 1.5) are supported by these atomic operations.
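
A minimal sketch of this Value() generic function is exposed below, assuming a recent version of Zig where the atomic orderings are written in lowercase (e.g. .monotonic), as in the examples later in this chapter:

const std = @import("std");

pub fn main() void {
    // An atomic u32, initialized to zero.
    var counter = std.atomic.Value(u32).init(0);
    // fetchAdd() atomically adds 1 and returns the previous value.
    const old = counter.fetchAdd(1, .monotonic);
    // load() atomically reads the current value.
    const current = counter.load(.monotonic);
    std.debug.print("old = {d}, current = {d}\n", .{ old, current });
}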

16.6.3 Data races and race conditions

To understand why mutexes are used, we need to better understand the problem that they seek to solve, which can be summarized as data race problems. A data race problem is a type of race condition, which happens when one thread is accessing a particular memory location (i.e. a particular shared object) at the same time that another thread is trying to write/save new data into this same memory location (i.e. the same shared object).

We can simply define a race condition as any type of bug in your program that is based on a “who gets there first” problem. A data race problem is a type of race condition, because it occurs when two or more parties are trying to read and write into the same memory location at the same time, and, therefore, the end result of this operation depends completely on who gets to this memory location first. As a consequence, a program that has a data race problem will likely produce a different result each time that we execute it.

Thus, race conditions produce undefined behaviour and unpredictability, because the program produces a different answer each time that a different party gets to the target location first. And we have no easy way to either predict or control who is going to get to this target location first. In other words, in each execution of your program, you get a different answer, because a different person, or, a different function, or, a different part of the code is finishing its tasks (or reaching a location) before the others.

As an example, consider the code snippet exposed below. In this example, we create a global counter variable, and we also create an increment() function, whose job is to just increment this global counter variable in a for loop.

Since the for loop iterates one hundred thousand times, and we create two separate threads in this code example, what number do you expect to see in the final message printed to stdout? The answer should be two hundred thousand. Right? Well, in theory, this program was supposed to print 200000 at the end, but in practice, every time that I execute this program I get a different answer.

In the example exposed below, you can see that, this time that we executed the program, the end result was 117254, instead of the expected 200000. The second time I executed this program, I got the number 108592 as the result. So the end result of this program varies, but it never gets to the expected 200000 that we want.

// Global counter variable
var counter: usize = 0;
// Function to increment the counter
fn increment() void {
    for (0..100000) |_| {
        counter += 1;
    }
}

pub fn main() !void {
    const thr1 = try Thread.spawn(.{}, increment, .{});
    const thr2 = try Thread.spawn(.{}, increment, .{});
    thr1.join();
    thr2.join();
    try stdout.print("Couter value: {d}\n", .{counter});
}
Counter value: 117254

Why is this happening? The answer is: because this program contains a data race problem. This program would print the correct number 200000 if, and only if, the first thread finished its tasks before the second thread started to execute. But that is very unlikely to happen, because the process of creating a thread is very fast, and, therefore, both threads start to execute at roughly the same time. If you change this code to add some nanoseconds of sleep between the first and the second calls to spawn(), you will increase the chances of the program producing the “correct result”.

So the data race problem happens because both threads are reading and writing to the same memory location at roughly the same time. In this example, each thread is essentially performing three basic operations at each iteration of the for loop, which are:

  1. reading the current value of counter.
  2. incrementing this value by 1.
  3. writing the result back into counter.

Ideally, a thread B should read the value of counter only after the other thread A has finished writing the incremented value back into the counter object. Therefore, in the ideal scenario, which is demonstrated at Table 16.1, the threads work in sync with each other. But the reality is that these threads are out of sync, and, because of that, they suffer from a data race problem, which is demonstrated at Table 16.2.

Notice that, in the data race scenario (Table 16.2), the read performed by thread B happens before the write operation of thread A, and that ultimately leads to wrong results at the end of the program. Because when thread B reads the value from the counter variable, thread A is still processing the initial value from counter, and has not written the new and incremented value into counter yet. So what happens is that thread B ends up reading the same initial value (the “old” value) from counter, instead of reading the new and incremented version of this value that would be calculated by thread A.

Table 16.1: An ideal scenario for two threads incrementing the same integer value

  Thread 1       Thread 2       Integer value
  read value                    0
  increment                     1
  write value                   1
                 read value     1
                 increment      2
                 write value    2

Table 16.2: A data race scenario when two threads are incrementing the same integer value

  Thread 1       Thread 2       Integer value
  read value                    0
                 read value     0
  increment                     1
                 increment      1
  write value                   1
                 write value    1

If you think about these diagrams exposed in the form of tables, you will notice that they relate back to our discussion of atomic operations at Section 16.6.2. Remember, atomic operations are operations that the CPU executes from beginning to end, without interruptions from other threads or processes. So the scenario exposed at Table 16.1 does not suffer from a data race, because the operations performed by thread A are not interrupted in the middle by the operations from thread B.

If we also think about the discussion of critical sections from Section 16.6.1, we can identify the section that represents the critical section of the program, which is the section that is vulnerable to data race conditions. In this example, the critical section of the program is the line where we increment the counter variable (counter += 1). So, ideally, we want to use a mutex, lock it right before this line, and then unlock it right after this line.

16.6.4 Using mutexes in Zig

Now that we know the problem that mutexes seek to solve, we can learn how to use them in Zig. Mutexes in Zig are available through the std.Thread.Mutex struct from the Zig Standard Library. If we take the code from the previous example, and improve it with mutexes to solve our data race problem, we get the code example exposed below.

Notice that, this time, we had to alter the increment() function to receive a pointer to the Mutex object as input. All that we need to do to make this program safe against data race problems is to call the lock() method at the beginning of the critical section, and then call unlock() at the end of the critical section. Notice that the output of this program is now the correct number of 200000.

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
const Mutex = std.Thread.Mutex;
var counter: usize = 0;
fn increment(mutex: *Mutex) void {
    for (0..100000) |_| {
        mutex.lock();
        counter += 1;
        mutex.unlock();
    }
}

pub fn main() !void {
    var mutex: Mutex = .{};
    const thr1 = try Thread.spawn(.{}, increment, .{&mutex});
    const thr2 = try Thread.spawn(.{}, increment, .{&mutex});
    thr1.join();
    thr2.join();
    try stdout.print("Couter value: {d}\n", .{counter});
}
Counter value: 200000

16.7 Read/Write locks

Mutexes are normally used when it is never safe to have two or more threads running the same piece of code at the same time. In contrast, read/write locks are normally used in situations where you have a mixture of scenarios, i.e. there are some pieces of the codebase that are safe to run in parallel, and other pieces that are not safe.

For example, suppose that you have multiple threads that uses the same shared file in the filesystem to store some configurations, or, statistics. If two or more threads try to read the data from this same file at the same time, nothing bad happens. So this part of the codebase is perfectly safe to be executed in parallel, with multiple threads reading the same file at the same time.

However, if two or more threads try to write data into this same file at the same time, then we cause some race condition problems. So this other part of the codebase is not safe to be executed in parallel. More specifically, a thread might end up writing data in the middle of the data written by another thread. This process of two or more threads writing to the same location might lead to data corruption. This specific situation is usually called a torn write.

Thus, what we can extract from this is that there are certain types of operations that cause a race condition, but there are also other types of operations that do not cause a race condition problem. You could also say that there are types of operations that are susceptible to race condition problems, and there are other types of operations that are not.

A read/write lock is a type of lock that acknowledges the existence of this specific scenario, and you can use this type of lock to control which parts of the codebase are safe to run in parallel, and which parts are not.

16.7.1 Exclusive lock vs shared lock

Therefore, a read/write lock is a little different from a mutex, because a mutex is always an exclusive lock, meaning that only one thread is allowed to execute at any time. With an exclusive lock, the other threads are always “excluded”, i.e. they are always blocked from executing. But with a read/write lock, the other threads might be authorized to run at the same time, depending on the type of lock that they acquire.

We have two types of locks in a read/write lock: an exclusive lock and a shared lock. An exclusive lock works exactly the same as a mutex, while a shared lock is a lock that does not block the other threads from running. In the pthreads C library, read/write locks are available through the pthread_rwlock_t C struct. With this C struct, you can create a “write lock”, which corresponds to an exclusive lock, or you can create a “read lock”, which corresponds to a shared lock. The terminology might be a little different, but the meaning is the same, so just remember this relationship: write locks are exclusive locks, while read locks are shared locks.

When a thread tries to acquire a read lock (i.e. a shared lock), this thread gets the shared lock if, and only if, another thread does not currently hold a write lock (i.e. an exclusive lock), and also, if there are no other threads already in the queue, waiting for their turn to acquire a write lock. In other words, a thread in this queue has attempted to get a write lock earlier, but was blocked because there was another thread running that already had a write lock. As a consequence, this thread is in the queue to get a write lock, and it is currently waiting for the other thread with a write lock to finish its execution.

When a thread tries to acquire a read lock, but fails in acquiring this read lock, either because there is a thread with a write lock already running, or because there is a thread in the queue to get a write lock, the execution of this thread is instantly blocked, i.e. paused. This thread will indefinitely attempt to get the read lock, and its execution will be unblocked (or unpaused) only after it successfully acquires the read lock.

If you think deeply about this dynamic between read locks and write locks, you might notice that a read lock is basically a safety mechanism. More specifically, it is a way for us to allow a particular thread to run together with the other threads only when it is safe to do so. In other words, if there is currently a thread with a write lock running, then it is very likely not safe for the thread that is trying to acquire the read lock to run now. As a consequence, the read lock protects this thread from running into dangerous waters, and patiently waits for the “write lock” thread to finish its tasks before it continues.

On the other hand, if there are only “read lock” (i.e. “shared lock”) threads currently running (i.e. not a single “write lock” thread currently exists), then it is perfectly safe for the thread that is acquiring the read lock to run in parallel with the other threads. As a result, the read lock just allows this thread to run together with the other threads.

Thus, by using read locks (shared locks) in conjunction with write locks (exclusive locks), we can control which regions or sections of our multithreaded code is safe for us to have parallelism, and which sections are not safe to have parallelism.

16.7.2 Using read/write locks in Zig

The Zig Standard Library supports read/write locks through the std.Thread.RwLock module. If you want a particular thread to acquire a shared lock (i.e. a read lock), you should call the lockShared() method from the RwLock object. But, if you want this thread to acquire an exclusive lock (i.e. a write lock) instead, then you should call the lock() method from the RwLock object.

As with mutexes, we also have to unlock the shared or exclusive locks that we acquire through a read/write lock object, once we are at the end of our “critical section”. If you have acquired an exclusive lock, then, you unlock this exclusive lock by calling the unlock() method from the read/write lock object. In contrast, if you have acquired a shared lock instead, then, call unlockShared() to unlock this shared lock.

As a simple example, the code below creates three separate threads responsible for reading the current value in a counter object, and it also creates another thread, responsible for writing new data into the counter object (incrementing it, more specifically).

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
const RwLock = std.Thread.RwLock;
var counter: u32 = 0;
fn reader(lock: *RwLock) !void {
    while (true) {
        lock.lockShared();
        const v: u32 = counter;
        try stdout.print("{d}", .{v});
        lock.unlockShared();
        std.time.sleep(2 * std.time.ns_per_s);
    }
}
fn writer(lock: *RwLock) void {
    while (true) {
        lock.lock();
        counter += 1;
        lock.unlock();
        std.time.sleep(2 * std.time.ns_per_s);
    }
}

pub fn main() !void {
    var lock: RwLock = .{};
    const thr1 = try Thread.spawn(.{}, reader, .{&lock});
    const thr2 = try Thread.spawn(.{}, reader, .{&lock});
    const thr3 = try Thread.spawn(.{}, reader, .{&lock});
    const wthread = try Thread.spawn(.{}, writer, .{&lock});

    thr1.join();
    thr2.join();
    thr3.join();
    wthread.join();
}

16.8 Yielding a thread

The Thread struct supports yielding through the yield() function. Yielding a thread means that the execution of the thread is temporarily stopped, and the thread goes back to the end of the priority queue of your operating system's scheduler.

That is, when you yield a thread, you are essentially saying the following to your OS: “Hey! Could you please stop executing this thread for now, and come back to continue it later?”. You could also interpret this yield operation as: “Could you please deprioritize this thread, and focus on doing other things instead?”. So this yield operation is also a way for you to stop a particular thread, so that you can work on and prioritize other threads instead.

It is important to say that yielding a thread is a “not so common” thread operation these days. In other words, not many programmers use yielding in production, simply because it is hard to use this operation and make it work properly, and also, there are better alternatives. Most programmers prefer to use join() instead. In fact, most of the time, when you see somebody using yield in a code example, they are mostly using it to help them debug race conditions in their applications. That is, yield is mostly used as a debug tool nowadays.

Anyway, if you want to yield a thread, just call the yield() function from the Thread module, like this:

try Thread.yield();
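
A minimal runnable sketch of this call is exposed below. The busy_task() function is a hypothetical workload of mine, used only to give the scheduler something to reorder:

const std = @import("std");
const Thread = std.Thread;

fn busy_task() !void {
    for (0..5) |i| {
        std.debug.print("iteration {d}\n", .{i});
        // Hint to the OS scheduler that it may run other threads now.
        try Thread.yield();
    }
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, busy_task, .{});
    thread.join();
}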

16.9 Common problems in threads

16.9.1 Deadlocks

A deadlock occurs when two or more threads are blocked forever, waiting for each other to release a resource. This usually happens when multiple locks are involved, and the order of acquiring them is not well managed.

The code example below demonstrates a deadlock situation. We have two different threads that execute two different functions (work1() and work2()) in this example. And we also have two separate mutexes. If you compile and run this code example, you will notice that the program just runs indefinitely, without ending.

When we look into the first thread, which executes the work1() function, we can notice that this function acquires the mut1 lock first, because this is the first operation executed inside this thread, which is the first thread created in the program. After that, the function sleeps for 1 second, to simulate some type of work, and then the function tries to acquire the mut2 lock.

On the other hand, when we look into the second thread, which executes the work2() function, we can see that this function acquires the mut2 lock first, because, when this thread gets created and tries to acquire this mut2 lock, the first thread is still sleeping on that “sleep 1 second” line. After acquiring mut2, the work2() function also sleeps for 1 second, to simulate some type of work, and then the function tries to acquire the mut1 lock.

This creates a deadlock situation, because, after the “sleep for 1 second” line in both threads, thread 1 is trying to acquire the mut2 lock, but this lock is currently being held by thread 2. However, at this moment, thread 2 is also trying to acquire the mut1 lock, which is currently being held by thread 1. Therefore, both threads end up waiting forever, each waiting for its peer to free the lock that it wants to acquire.

var mut1: Mutex = .{};
var mut2: Mutex = .{};
fn work1() !void {
    mut1.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut2.lock();
    _ = try stdout.write("Doing some work 1\n");
    mut2.unlock();
    mut1.unlock();
}

fn work2() !void {
    mut2.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut1.lock();
    _ = try stdout.write("Doing some work 2\n");
    mut1.unlock();
    mut2.unlock();
}

pub fn main() !void {
    const thr1 = try Thread.spawn(.{}, work1, .{});
    const thr2 = try Thread.spawn(.{}, work2, .{});
    thr1.join();
    thr2.join();
}
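
One common way to remove this deadlock, following the observation at the start of this section about managing the order in which locks are acquired, is to make both threads acquire the locks in the same order. The work2_fixed() function below is my own sketch of this idea, not part of the original example: since it takes mut1 first and mut2 second, just like work1(), the circular wait can no longer form.

fn work2_fixed() !void {
    // Acquire the locks in the same order as work1()
    // (mut1 first, then mut2).
    mut1.lock();
    std.time.sleep(1 * std.time.ns_per_s);
    mut2.lock();
    _ = try stdout.write("Doing some work 2\n");
    mut2.unlock();
    mut1.unlock();
}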

16.9.2 Not calling join() or detach()

When you do not call either join() or detach() over a thread, this thread becomes a “zombie thread”, because it does not have a clear “return point”. You could also interpret this as: “nobody is properly responsible for managing the thread”. When we don’t establish whether a thread is joinable or detached, nobody becomes responsible for dealing with the return value of this thread, and also, nobody becomes responsible for clearing (or freeing) the resources associated with this thread.

You don’t want to be in this situation, so remember to always use join() or detach() on the threads that you create. When you don’t use these methods, the execution of the thread becomes completely independent from the execution of the main process in your program. This means that the main process of your program might end before the thread finishes its job, or vice versa. The idea is that we have no idea who is going to finish first. It becomes a race condition problem. In such a case, we lose control over this thread, and its resources are never freed (i.e. you have leaked resources in the system).

16.9.3 Cancelling or killing a particular thread

When we think about the pthreads C library, there is a way to asynchronously kill or cancel a thread, which is by sending a signal (such as SIGTERM) to the thread through the pthread_kill() function. But canceling a thread like this is bad. It is dangerously bad. As a consequence, the Zig implementation of threads does not have a similar function, or, a similar way to asynchronously cancel or kill a thread.

Therefore, if you want to cancel a thread in the middle of its execution in Zig, one good strategy that you can adopt is to use control flow in your favor, in conjunction with join(). More specifically, you can design your thread around a while loop that is constantly checking if the thread should continue running. If it is time to cancel the thread, we make the while loop break, and join the thread with the main thread by calling join().

The code example below demonstrates this strategy to some extent. Here, we are using control flow to break the while loop, and exit the thread earlier than what we had initially planned. This example also demonstrates how we can use atomic objects in Zig with the Value() generic function that we mentioned at Section 16.6.2.

const std = @import("std");
const stdout = std.io.getStdOut().writer();
const Thread = std.Thread;
var running = std.atomic.Value(bool).init(true);
var counter: u64 = 0;
fn do_more_work() void {
    std.time.sleep(2 * std.time.ns_per_s);
}
fn work() !void {
    while (running.load(.monotonic)) {
        for (0..10000) |_| { counter += 1; }
        if (counter < 15000) {
            _ = try stdout.write("Time to cancel the thread.\n");
            running.store(false, .monotonic);
        } else {
            _ = try stdout.write("Time to do more work.\n");
            do_more_work();
            running.store(false, .monotonic);
        }
    }
}

pub fn main() !void {
    const thread = try Thread.spawn(.{}, work, .{});
    thread.join();
}

  1. https://github.com/ziglang/zig/issues/18810↩︎

  2. https://www.youtube.com/watch?v=7ENFeb-J75k&ab_channel=Computerphile↩︎