Return Zero in Assembly

The most basic program in assembly or any programming language is a "return 0" program. Here is an example in C++:

int main() {
    return 0;
}

The equivalent program in GNU assembly (x86-64 Linux, AT&T syntax) looks like this:

.global _start

_start:
    mov $60, %rax
    xor %rdi, %rdi
    syscall

This is the most basic program. Instead of typing the commands every time we want to assemble and run this file, we can put them in a shell script that builds the program, runs it, and prints its exit status. Below is a simple example, assuming the source is saved as yo.s.

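rm -f yo.o yo      # remove stale build outputs
as yo.s -o yo.o    # assemble the source into an object file
ld yo.o -o yo      # link the object file into an executable
./yo               # run the program
echo $?            # print the exit status of ./yo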

For simplicity, let's ignore how to run the assembly code and focus on the code itself.

The first line of code is .global _start.

.global _start

Here, .global is a directive. In simple terms, a directive is an instruction to the assembler, not to the CPU. The distinction matters because most lines in an assembly file are CPU instructions: when we write mov, the assembler emits a mov instruction for the CPU to execute. .global emits no machine code. It tells the assembler to mark the _start symbol as global in the object file, so that the linker (ld) can see it.

Now, _start plays a role similar to main() in C++: by default, the linker uses the _start symbol as the program's entry point, so execution begins there. To learn more about directives, you can go here.

Why do we use a . before writing global? In GNU assembler syntax, directive names begin with a dot, which is how the assembler tells a directive apart from an instruction.

Now, onto the next line.

_start:

This line defines the label _start, so this is where our equivalent of main() begins. A label is a name for an address in the program, and the colon after the name marks it as a label definition (more about labels here).

Now, let's address the rest of the code all at once.

mov $60, %rax
xor %rdi, %rdi
syscall

The code above is equivalent to this code in C++:

return 0;

A helpful way to read this code is to think of syscall as a function call whose arguments are passed in registers: rax selects which system call to make, and rdi, rsi, rdx, and so on carry its arguments. For the sake of understanding, the code would look like this:

syscall(rax, rdi, rsi, rdx, ...)

So, a simple assembly program that talks to the OS tends to follow this pattern:

syscall()
syscall()
syscall()

Before each syscall, we load the registers with the values that system call expects, and then we execute syscall to hand the request to the kernel. When it returns, we can move on to the next operation.
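The function-call picture above can be tried from C++ on Linux. This is only a sketch: it assumes the C library's syscall() wrapper from <unistd.h> and the SYS_write constant from <sys/syscall.h>, and write_via_syscall is a hypothetical helper name that does not appear in the original program.

```cpp
#include <cstddef>       // std::size_t
#include <sys/syscall.h> // SYS_write
#include <unistd.h>      // syscall()

// Hypothetical helper: asks the kernel to write `len` bytes of `text` to
// stdout (file descriptor 1). The first argument of syscall() plays the role
// of rax; the remaining ones play the roles of rdi, rsi, and rdx.
long write_via_syscall(const char* text, std::size_t len) {
    return syscall(SYS_write, 1, text, len);
}
```

Calling write_via_syscall("hi\n", 3) from main() should print hi and return the number of bytes written. The wrapper hides the register loading, but the shape is the same: pick a system call number, supply its arguments, and hand the request to the kernel.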

Now, let's understand the code.

mov $60, %rax
xor %rdi, %rdi
syscall

In the first line, we do this:

mov $60, %rax

What this means is that we are moving the integer value 60 into the register rax.

What is mov?

In short, mov is an instruction that copies a value from a source operand into a destination operand:

mov SOURCE, DESTINATION

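Keep in mind that this is AT&T syntax, where the source comes first, immediate values are prefixed with $, and register names are prefixed with %. Intel syntax reverses the operand order (mov rax, 60), so refer to the prerequisites on the index page.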

What we are doing here is moving 60 into the rax register. Why 60? We need to tell the OS what we want it to do for us. Each system call is identified by a number, and the kernel reads that number from rax when syscall executes. On x86-64 Linux, the number for exit is 60, so placing 60 in rax requests an exit. Other numbers select other operations; read, for example, is 0.

rdi holds the first argument of the system call. exit takes a single argument, the exit status, so we put the status in rdi.
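To see that the value passed as the first argument really becomes the exit status, here is a minimal C++ sketch for Linux. It assumes the C library's syscall() wrapper and the SYS_exit constant; status_seen_by_parent is a hypothetical helper introduced only for this illustration. It runs the exit system call in a child process so the parent can inspect the result, much like echo $? does in the shell.

```cpp
#include <sys/syscall.h> // SYS_exit
#include <sys/types.h>   // pid_t
#include <sys/wait.h>    // waitpid(), WIFEXITED, WEXITSTATUS
#include <unistd.h>      // fork(), syscall(), _exit()

// Hypothetical helper: makes the exit system call in a child process and
// returns the status the parent observes, or -1 if something went wrong.
int status_seen_by_parent(int status) {
    pid_t child = fork();
    if (child < 0) {
        return -1; // fork failed
    }
    if (child == 0) {
        // Child: the same request as the assembly, with the call number in
        // the "rax" slot and the exit status in the "rdi" slot.
        syscall(SYS_exit, status);
        _exit(127); // not expected to be reached
    }
    int wait_status = 0;
    if (waitpid(child, &wait_status, 0) != child || !WIFEXITED(wait_status)) {
        return -1;
    }
    return WEXITSTATUS(wait_status);
}
```

With this helper, status_seen_by_parent(0) mirrors the program in this article, and passing another small value such as 7 shows that whatever goes in the first argument comes back as the status.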

xor %rdi, %rdi

This sets rdi to 0. So why not simply write this instead?

mov $0, %rdi

We can: both versions leave 0 in rdi, and the program behaves the same either way. xor is simply the conventional idiom for zeroing a register. XORing any value with itself always yields 0, and the instruction needs no immediate operand, so it encodes in fewer bytes than a mov that carries a 0 constant inside it. Modern x86 processors also recognize this zeroing idiom and handle it cheaply. In a program this small the speed difference is not measurable; the habit mainly saves code size.
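Two properties of xor %rdi, %rdi can be checked directly: the result is always zero, and the instruction is shorter than the mov form. The C++ sketch below is illustrative only. The byte arrays are the encodings GNU as is assumed to produce for these two instructions (objdump -d yo can confirm them on your machine), and all names are hypothetical.

```cpp
#include <cstddef> // std::size_t
#include <cstdint> // std::uint64_t

// XORing a value with itself clears every bit, whatever the value was.
constexpr std::uint64_t xor_with_self(std::uint64_t value) {
    return value ^ value;
}

// Machine-code bytes for each instruction (assumed GNU as encodings).
// xor %rdi, %rdi
constexpr unsigned char kXorRdiRdi[] = {0x48, 0x31, 0xff};
// mov $0, %rdi
constexpr unsigned char kMovZeroRdi[] = {0x48, 0xc7, 0xc7, 0x00, 0x00, 0x00, 0x00};

constexpr std::size_t xor_size = sizeof(kXorRdiRdi);
constexpr std::size_t mov_size = sizeof(kMovZeroRdi);

static_assert(xor_with_self(0xdeadbeefULL) == 0, "x ^ x is always 0");
static_assert(xor_size < mov_size, "the xor form is the shorter encoding");
```

Most of the mov encoding is the four-byte zero constant, which is exactly what the xor form avoids.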

syscall

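syscall transfers control to the kernel, which reads the system call number from rax and the arguments from the other registers, then carries out the request. Here the request is exit with status 0, so the process ends and nothing after the syscall line runs.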
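For readers who want to see the registers from C++, here is a hedged sketch using GCC/Clang extended inline assembly. raw_syscall1 is a hypothetical helper, and the inline assembly path only applies on x86-64 Linux; on other platforms the sketch falls back to the C library's syscall() wrapper.

```cpp
#include <sys/syscall.h> // SYS_getpid
#include <unistd.h>      // getpid(), syscall()

// Hypothetical helper: performs a system call that takes one argument.
long raw_syscall1(long number, long arg1) {
#if defined(__linux__) && defined(__x86_64__)
    long result;
    // "a" pins `number` to rax and "D" pins `arg1` to rdi. The kernel leaves
    // its return value in rax; the syscall instruction itself overwrites rcx
    // and r11, so they are listed as clobbered.
    asm volatile("syscall"
                 : "=a"(result)
                 : "a"(number), "D"(arg1)
                 : "rcx", "r11", "memory");
    return result;
#else
    // Other platforms use different instructions and registers, so fall back
    // to the C library wrapper.
    return syscall(number, arg1);
#endif
}
```

For example, raw_syscall1(SYS_getpid, 0) should return the same process ID as getpid(); getpid takes no arguments, so the value in rdi is ignored. Calling raw_syscall1 with the exit number and 0 would end the process, just as the assembly program does.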

With this, we can create a simple exit program in assembly.

Note: Make sure the file ends with a newline after the final syscall line. Otherwise, the assembler will still assemble it, but it will warn that the end of the file is not at the end of a line.