10 Error handling and unions

In this chapter, I want to discuss how error handling is done in Zig. We already briefly learned about one of the available strategies to handle errors in Zig, which is the try keyword presented in Section 1.2.3. But we still haven’t learned about the other methods, such as the catch keyword. I also want to discuss in this chapter how union types are created in Zig.

10.1 Learning more about errors in Zig

Before we get into how error handling is done, we need to learn more about what errors are in Zig. An error is actually a value in Zig (Zig Software Foundation 2024a). In other words, when an error occurs inside your Zig program, it means that somewhere in your Zig codebase, an error value is being generated. An error value is similar to any integer value that you create in your Zig code. You can take an error value and pass it as input to a function, and you can also cast (or coerce) it into a different type of an error value.

This have some similarities with exceptions in C++ and Python. Because in C++ and Python, when an exception happens inside a try block, you can use a catch block (in C++) or an except block (in Python) to capture the exception produced in the try block, and pass it to functions as an input.

However, error values in Zig are treated very differently than exceptions. First, you cannot ignore error values in your Zig code. Meaning that, if an error value appears somewhere in your source code, this error value must be explicitly handled in some way. This also means that you cannot discard error values by assigning them to an underscore, as you could do with normal values and objects.

Take the source code below as an example. Here we are trying to open a file that does not exist in my computer, and as a result, an obvious error value of FileNotFound is returned from the openFile() function. But because I’m assigning the result of this function to an underscore, I end up trying to discard an error value.

The zig compiler detects this mistake, and raises a compile error telling me that I’m trying to discard an error value. It also adds a note message that suggests the use of try, catch or an if statement to explicitly handle this error value This note is reinforcing that every possible error value must be explicitly handled in Zig.

const dir = std.fs.cwd();
_ = dir.openFile("doesnt_exist.txt", .{});

t.zig:8:17: error: error set is discarded
t.zig:8:17: note: consider using 'try', 'catch', or 'if'

10.1.1 Returning errors from functions

As we described in Section 1.2.3, when we have a function that might return an error value, this function normally includes an exclamation mark (!) in its return type annotation. The presence of this exclamation mark indicates that this function might return an error value as result, and, the zig compiler forces you to always handle explicitly the case of this function returning an error value.

Take a look at the print_name() function below. This function might return an error in the stdout.print() function call, and, as a consequence, its return type (!void) includes an exclamation mark in it.

fn print_name() !void {
    const stdout = std.getStdOut().writer();
    try stdout.print("My name is Pedro!", .{});
    try stdout.flush();
}

In the example above, we are using the exclamation mark to tell the zig compiler that this function might return some error. But which error exactly is returned from this function? For now, we are not specifying a specific error value. For now, we only know that some error value (whatever it is) might be returned.

But in fact, you can (if you want to) specify clearly which exact error values might be returned from this function. There are lot of examples of this in the Zig Standard Library. Take this fill() function from the http.Client module as an example. This function returns either a error value of type ReadError, or void.

pub fn fill(conn: *Connection) ReadError!void {
    // The body of this function ...
}

This idea of specifying the exact error values that you expect to be returned from the function is interesting. Because they automatically become some sort of documentation of your function, and also, this allows the zig compiler to perform some extra checks over your code. Because the compiler can check if there is any other type of error value that is being generated inside your function, and, that it’s not being accounted for in this return type annotation.

Anyway, you can list the types of errors that can be returned from the function by listing them on the left side of the exclamation mark. While the valid values stay on the right side of the exclamation mark. So the syntax format become:

<error-value>!<valid-value>

10.1.2 Error sets

But what about when we have a single function that might return different types of errors? When you have such a function, you can list all of these different types of errors that can be returned from this function, through a structure in Zig that we call of an error set.

An error set is a special case of a union type. It’s a union that contains error values in it. Not all programming languages have a notion of a “union object”. But in summary, a union is just a set of data types. Unions are used to allow an object to have multiple data types. For example, a union of x, y and z, means that an object can be either of type x, or type y or type z.

We are going to talk in more depth about unions in Section 10.3. But you can write an error set by writing the keyword error before a pair of curly braces, then you list the error values that can be returned from the function inside this pair of curly braces.

Take the resolvePath() function below as an example, which comes from the introspect.zig module of the Zig Standard Library. We can see in its return type annotation, that this function return either: 1) a valid slice of u8 values ([]u8); or, 2) one of the three different types of error values listed inside the error set (OutOfMemory, Unexpected, etc.). This is an usage example of an error set.

pub fn resolvePath(
    ally: mem.Allocator,
    p: []const u8,
) error{
    OutOfMemory,
    CurrentWorkingDirectoryUnlinked,
    Unexpected,
}![]u8 {
    // The body of the function ...
}

This is a valid way of annotating the return value of a Zig function. But, if you navigate through the modules that composes the Zig Standard Library, you will notice that, for the majority of cases, the programmers prefer to give a descriptive name to this error set, and then, use this name (or this “label”) of the error set in the return type annotation, instead of using the error set directly.

We can see that in the ReadError error set that we showed earlier in the fill() function, which is defined in the http.Client module. So yes, I presented the ReadError as if it was just a standard and single error value, but in fact, it’s an error set defined in the http.Client module, and therefore, it actually represents a set of different error values that might happen inside the fill() function.

Take a look at the ReadError definition reproduced below. Notice that we are grouping all of these different error values into a single object, and then, we use this object into the return type annotation of the function. Like the fill() function that we showed earlier, or, the readvDirect() function from the same module, which is reproduced below.

pub const ReadError = error{
    TlsFailure,
    TlsAlert,
    ConnectionTimedOut,
    ConnectionResetByPeer,
    UnexpectedReadFailure,
    EndOfStream,
};
// Some lines of code
pub fn readvDirect(
        conn: *Connection,
        buffers: []std.posix.iovec
    ) ReadError!usize {
    // The body of the function ...
}

So, an error set is just a convenient way of grouping a set of possible error values into a single object, or a single type of an error value.

10.1.3 Casting error values

Let’s suppose you have two different error sets, named A and B. If error set A is a superset of error set B, then, you can cast (or coerce) error values from B into error values of A.

Error sets are just a set of error values. So, if the error set A contains all error values from the error set B, then A becomes a superset of B. You could also say that the error set B is a subset of error set A.

The example below demonstrates this idea. Because A contains all values from B, A is a superset of B. In math notation, we would say that \(A \supset B\). As a consequence, we can give an error value from B as input to the cast() function, and, implicitly cast this input into the same error value, but from the A set.

const std = @import("std");
const A = error{
    ConnectionTimeoutError,
    DatabaseNotFound,
    OutOfMemory,
    InvalidToken,
};
const B = error {
    OutOfMemory,
};

fn cast(err: B) A {
    return err;
}

test "coerce error value" {
    const error_value = cast(B.OutOfMemory);
    try std.testing.expect(
        error_value == A.OutOfMemory
    );
}

1/1 file5b736396fa95.test.coerce error value...OKA
  All 1 tests passed.

10.2 How to handle errors

Now that we learned more about what errors are in Zig, let’s discuss the available strategies to handle these errors, which are:

try keyword;
catch keyword;
an if statement;
errdefer keyword;

10.2.1 What `try` means?

As I described over the previous sections, when we say that an expression might return an error, we are basically referring to an expression that have a return type in the format !T. The ! indicates that this expression returns either an error value, or a value of type T.

In Section 1.2.3, I presented the try keyword and where to use it. But I did not talked about what exactly this keyword does to your code, or, in other words, I have not explained yet what try means in your code.

In essence, when you use the try keyword in an expression, you are telling the zig compiler the following: “Hey! Execute this expression for me, and, if this expression return an error, please, return this error for me and stop the execution of my program. But if this expression return a valid value, then, return this value, and move on”.

In other words, the try keyword is essentially, a strategy to enter in panic mode, and stop the execution of your program in case an error occurs. With the try keyword, you are telling the zig compiler, that stopping the execution of your program is the most reasonable strategy to take if an error occurs in that particular expression.

10.2.2 The `catch` keyword

Ok, now that we understand properly what try means, let’s discuss catch now. One important detail here, is that you can use try or catch to handle your errors, but you cannot use try and catch together. In other words, try and catch are different and completely separate strategies in the Zig language.

This is uncommon, and different than what happens in other languages. Most programming languages that adopts the try catch pattern (such as C++, R, Python, Javascript, etc.), normally use these two keywords together to form the complete logic to properly handle the errors. Anyway, Zig tries a different approach in the try catch pattern.

So, we learned already about what try means, and we also known that both try and catch should be used alone, separate from each other. But what exactly catch do in Zig? With catch, we can construct a block of logic to handle the error value, in case it happens in the current expression.

Look at the code example below. Once again, we go back to the previous example where we were trying to open a file that doesn’t exist in my computer, but this time, I use catch to actually implement a logic to handle the error, instead of just stopping the execution right away.

More specifically, in this example, I’m using a logger object to record some logs into the system, before I return the error, and stop the execution of the program. For example, this could be some part of the codebase of a complex system that I do not have full control over, and I want to record these logs before the program crashes, so that I can debug it later (e.g. maybe I cannot compile the full program, and properly debug it with a debugger. So, these logs might be a valid strategy to surpass this barrier).

const dir = std.fs.cwd();
const file = dir.openFile(
    "doesnt_exist.txt", .{}
) catch |err| {
    logger.record_context();
    logger.log_error(err);
    return err;
};

Therefore, we use catch to create a block of expressions that will handle the error. I can return the error value from this block of expressions, like I did in the above example, which, will make the program enter in panic mode, and, stop the execution. But I could also, return a valid value from this block of code, which would be stored in the file object.

Notice that, instead of writing the keyword before the expression that might return the error, like we do with try, we write catch after the expression. We can open the pair of pipes (|), which captures the error value returned by the expression, and makes this error value available in the scope of the catch block as the object named err. In other words, because I wrote |err| in the code, I can access the error value returned by the expression, by using the err object.

Although this being the most common use of catch, you can also use this keyword to handle the error in a “default value” style. That is, if the expression returns an error, we use the default value instead. Otherwise, we use the valid value returned by the expression.

The Zig official language reference, provides a great example of this “default value” strategy with catch. This example is reproduced below. Notice that we are trying to parse some unsigned integer from a string object named str. In other words, this function is trying to transform an object of type []const u8 (i.e., an array of characters, a string, etc.) into an object of type u64.

But this parsing process done by the function parseU64() may fail, resulting in a runtime error. The catch keyword used in this example provides an alternative value (13) to be used in case this parseU64() function raises an error. So, the expression below essentially means: “Hey! Please, parse this string into a u64 for me, and store the results into the object number. But, if an error occurs, then, use the value 13 instead”.

const number = parseU64(str, 10) catch 13;

So, at the end of this process, the object number will contain either a u64 integer that was parsed successfully from the input string str, or, if an error occurs in the parsing process, it will contain the u64 value 13 that was provided by the catch keyword as the “default”, or, the “alternative” value.

10.2.3 Using if statements

Now, you can also use if statements to handle errors in your Zig code. In the example below, I’m reproducing the previous example, where we try to parse an integer value from an input string with a function named parseU64().

We execute the expression inside the “if”. If this expression returns an error value, the “if branch” (or, the “true branch”) of the if statement is not executed. But if this expression returns a valid value instead, then, this value is unwrapped into the number object.

This means that, if the parseU64() expression returns a valid value, this value becomes available inside the scope of this “if branch” (i.e., the “true branch”) through the object that we listed inside the pair of pipe character (|), which is the object number.

If an error occurs, we can use an “else branch” (or the “false branch”) of the if statement to handle the error. In the example below, we are using the else in the if statement to unwrap the error value (that was returned by parseU64()) into the err object, and handle the error.

if (parseU64(str, 10)) |number| {
    // do something with `number` here
} else |err| {
    // handle the error value.
}

Now, if the expression that you are executing returns different types of error values, and you want to take a different action in each of these types of error values, the try and catch keywords, and the if statement strategy, becomes limited.

For this type of situation, the official documentation of the language suggests the use of a switch statement together with an if statement (Zig Software Foundation 2024b). The basic idea is, to use the if statement to execute the expression, and use the “else branch” to pass the error value to a switch statement, where you define a different action for each type of error value that might be returned by the expression executed in the if statement.

The example below demonstrates this idea. We first try to add (or register) a set of tasks to a queue. If this “registration process” occurs well, we then try to distribute these tasks across the workers of our system. But if this “registration process” returns an error value, we then use a switch statement in the “else branch” to handle each possible error value.

if (add_tasks_to_queue(&queue, tasks)) |_| {
    distribute_tasks(&queue);
} else |err| switch (err) {
    error.InvalidTaskName => {
        // do something
    },
    error.TimeoutTooBig => {
        // do something
    },
    error.QueueNotFound => {
        // do something
    },
    // and all the other error options ...
}

10.2.4 The `errdefer` keyword

A common pattern in C programs in general, is to clean resources when an error occurs during the execution of the program. In other words, one common way to handle errors, is to perform “cleanup actions” before we exit our program. This guarantees that a runtime error does not make our program to leak resources of the system.

The errdefer keyword is a tool to perform such “cleanup actions” in hostile situations. This keyword is commonly used to clean (or to free) allocated resources, before the execution of our program gets stopped because of an error value being generated.

The basic idea is to provide an expression to the errdefer keyword. Then, errdefer executes this expression if, and only if, an error occurs during the execution of the current scope. In the example below, we are using an allocator object (that we have presented in Section 3.3) to create a new User object. If we are successful in creating and registering this new user, this create_user() function will return this new User object as its return value.

However, if for some reason, an error value is generated by some expression that is after the errdefer line, for example, in the db.add(user) expression, the expression registered by errdefer gets executed before the error value is returned from the function, and before the program enters in panic mode and stops the current execution.

fn create_user(db: Database, allocator: Allocator) !User {
    const user = try allocator.create(User);
    errdefer allocator.destroy(user);

    // Register new user in the Database.
    _ = try db.register_user(user);
    return user;
}

By using errdefer to destroy the user object that we have just created, we guarantee that the memory allocated for this user object gets freed, before the execution of the program stops. Because if the expression try db.add(user) returns an error value, the execution of our program stops, and we lose all references and control over the memory that we have allocated for the user object. As a result, if we do not free the memory associated with the user object before the program stops, we cannot free this memory anymore. We simply lose our chance to do the right thing. That is why errdefer is essential in this situation.

Just to state clearly the differences between defer and errdefer (which I described in Section 2.1.3 and Section 2.1.4), it might be worth to discuss the subject a bit further. You might still have the question “why use errdefer if we can use defer instead?” in your mind.

Although being similar, the key difference between errdefer and defer keyword is when the provided expression gets executed. The defer keyword always execute the provided expression at the end of the current scope, no matter how your code exits this scope. In contrast, errdefer executes the provided expression only when an error occurs in the current scope.

This becomes important if a resource that you allocate in the current scope gets freed later in your code, in a different scope. The create_user() functions is an example of this. If you think closely about this function, you will notice that this function returns the user object as the result.

In other words, the allocated memory for the user object does not get freed inside the create_user() function, if it returns successfully. So, if an error does not occur inside this function, the user object is returned from the function, and probably, the code that runs after this create_user() function will be responsible for freeing the memory of the user object.

But what if an error occurs inside the create_user() function? What happens then? This would mean that the execution of your code would stop in this create_user() function, and, as a consequence, the code that runs after this create_user() function would simply not run, and, as a result, the memory of the user object would not be freed before your program stops.

This is the perfect scenario for errdefer. We use this keyword to guarantee that our program will free the allocated memory for the user object, even if an error occurs inside the create_user() function.

If you allocate and free some memory for an object inside the same scope, then, just use defer and be happy, i.e., errdefer have no use for you in such situation. But if you allocate some memory in a scope A, but you only free this memory later, in a scope B for example, then, errdefer becomes useful to avoid leaking memory in sketchy situations.

10.3 Union type in Zig

A union type defines a set of types that an object can be. It’s like a list of options. Each option is a type that an object can assume. Therefore, unions in Zig have the same meaning, or, the same role as unions in C. They are used for the same purpose. You could also say that unions in Zig produces a similar effect to using typing.Union in Python ¹.

For example, you might be creating an API that sends data to a data lake, hosted in some private cloud infrastructure. Suppose you have created different structs in your codebase, to store the necessary information that you need, in order to connect to the services of each mainstream data lake service (Amazon S3, Azure Blob, etc.).

Now, suppose you also have a function named send_event() that receives an event as input, and, a target data lake, and it sends the input event to the data lake specified in the target data lake argument. But this target data lake could be any of the three mainstream data lakes services (Amazon S3, Azure Blob, etc.). Here is where an union can help you.

The union LakeTarget defined below allows the lake_target argument of send_event() to be either an object of type AzureBlob, or type AmazonS3, or type GoogleGCP. This union allows the send_event() function to receive an object of any of these three types as input in the lake_target argument.

Remember that each of these three types (AmazonS3, GoogleGCP and AzureBlob) are separate structs that we have defined in our source code. So, at first glance, they are separate data types in our source code. But is the union keyword that unifies them into a single data type called LakeTarget.

const LakeTarget = union {
    azure: AzureBlob,
    amazon: AmazonS3,
    google: GoogleGCP,
};

fn send_event(
    event: Event,
    lake_target: LakeTarget
) bool {
    // body of the function ...
}

An union definition is composed by a list of data members. Each data member is of a specific data type. In the example above, the LakeTarget union have three data members (azure, amazon, google). When you instantiate an object that uses an union type, you can only use one of its data members in this instantiation.

You could also interpret this as: only one data member of an union type can be activated at a time, the other data members remain deactivated and unaccessible. For example, if you create a LakeTarget object that uses the azure data member, you can no longer use or access the data members google or amazon. It’s like if these other data members didn’t exist at all in the LakeTarget type.

You can see this logic in the example below. Notice that, we first instantiate the union object using the azure data member. As a result, this target object contains only the azure data member inside of it. Only this data member is active in this object. That is why the last line in this code example is invalid. Because we are trying to instantiate the data member google, which is currently inactive for this target object, and as a result, the program enters in panic mode warning us about this mistake through a loud error message.

var target = LakeTarget {
    .azure = AzureBlob.init()
};
// Only the `azure` data member exist inside
// the `target` object, and, as a result, this
// line below is invalid:
target.google = GoogleGCP.init();

thread 2177312 panic: access of union field 'google' while
    field 'azure' is active:
    target.google = GoogleGCP.init();
          ^

So, when you instantiate an union object, you must choose one of the data types (or, one of the data members) listed in the union type. In the example above, I choose to use the azure data member, and, as a result, all other data members were automatically deactivated, and you can no longer use them after you instantiate the object.

You can activate another data member by completely redefining the entire enum object. In the example below, I initially use the azure data member. But then, I redefine the target object to use a new LakeTarget object, which uses the google data member.

var target = LakeTarget {
    .azure = AzureBlob.init()
};
target = LakeTarget {
    .google = GoogleGCP.init()
};

A curious fact about union types, is that, at first, you cannot use them in switch statements (which were presented in Section 2.1.2). In other words, if you have an object of type LakeTarget for example, you cannot give this object as input to a switch statement.

But what if you really need to do so? What if you actually need to provide an “union object” to a switch statement? The answer to this question relies on another special type in Zig, which are the tagged unions. To create a tagged union, all you have to do is to add an enum type into your union declaration.

As an example of a tagged union in Zig, take the Registry type exposed below. This type comes from the grammar.zig module ² from the Zig repository. This union type lists different types of registries. But notice this time, the use of (enum) after the union keyword. This is what makes this union type a tagged union. By being a tagged union, an object of this Registry type can be used as input in a switch statement. This is all you have to do. Just add (enum) to your union declaration, and you can use it in switch statements.

pub const Registry = union(enum) {
    core: CoreRegistry,
    extension: ExtensionRegistry,
};