Learning assembler on Linux
Summary
For entertainment, I’m learning assembler on Linux. Jotting down some things I learn here.
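There are two syntaxes, AT&T and Intel (Go uses its own, inherited from Plan 9). They look very different: AT&T puts the destination operand last and marks registers with % and immediate values with $, while Intel puts the destination first and uses neither. Once you get over that, the differences are minimal. Linux tradition is mostly AT&T syntax, MS Windows mostly Intel.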
There’s no standardisation, so each assembler can do things its own way. as, the GNU Assembler, is the most common one on Linux (and the one gcc hands its output to by default), but nasm, the Netwide Assembler, is very popular too. Code written for as will not assemble in nasm, and vice versa.
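To see the syntax difference concretely, here is a sketch of one tiny function written twice, once in each syntax. It is a top-level `__asm__` block in a C file, so gcc passes it straight to as. It relies on GNU as's `.intel_syntax noprefix` / `.att_syntax prefix` directives to switch syntax mid-file. Note that as's Intel mode is still not nasm syntax. The names `add_att` and `add_intel` are made up. The sketch assumes x86-64 and gcc's default AT&T output (i.e. not built with `-masm=intel`).

```c
#include <assert.h>   /* for assert() in the usage checks */

#if !defined(__x86_64__)
#error "this sketch assumes x86-64"
#endif

/* Hypothetical functions: both compute a + b using the x86-64 System V
 * calling convention (a arrives in rdi, b in rsi, result goes in rax). */
__asm__(
    ".pushsection .text\n"
    ".globl add_att\n"
    "add_att:\n"
    "    mov %rdi, %rax\n"       /* AT&T: source first, destination last */
    "    add %rsi, %rax\n"       /* % marks registers */
    "    ret\n"
    ".intel_syntax noprefix\n"   /* switch as into Intel syntax */
    ".globl add_intel\n"
    "add_intel:\n"
    "    mov rax, rdi\n"         /* Intel: destination first, no sigils */
    "    add rax, rsi\n"
    "    ret\n"
    ".att_syntax prefix\n"       /* switch back so gcc's own AT&T output assembles */
    ".popsection\n");

long add_att(long a, long b);
long add_intel(long a, long b);
```

Both assemble to the same machine code. For example, `add_att(2, 3)` and `add_intel(2, 3)` both return 5.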
Talking to the Linux kernel is different depending on whether you’re writing a 32-bit (x86) or 64-bit (x86-64) program:
- The registers to use change
- The instruction to call changes (int 80h vs syscall)
- The syscall numbers change
So before you even get started, you need to pick a syntax, an assembler, and a target. I’m using as, with AT&T syntax, on Linux x86-64.
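The three differences listed above fit in a small lookup table. This C sketch is just that table as data, and the struct and `find_abi` are made-up names. On an x86-64 build, the 64-bit numbers are cross-checked at compile time against `<sys/syscall.h>`. The 32-bit numbers are copied from the kernel's asm/unistd_32.h and are not verified by this code.

```c
#include <assert.h>
#include <string.h>        /* strcmp, NULL, size_t */
#include <sys/syscall.h>   /* SYS_write etc. for the current target */

/* Hypothetical table of the differences between the two Linux ABIs. */
struct linux_abi {
    const char *name;
    const char *instruction;   /* how you enter the kernel */
    const char *nr_reg;        /* register holding the syscall number */
    const char *arg_regs[6];   /* argument registers, in order */
    int nr_write, nr_exit, nr_exit_group;
};

static const struct linux_abi ABIS[] = {
    { "x86", "int $0x80", "eax",
      { "ebx", "ecx", "edx", "esi", "edi", "ebp" }, 4, 1, 252 },
    { "x86-64", "syscall", "rax",
      { "rdi", "rsi", "rdx", "r10", "r8", "r9" }, 1, 60, 231 },
};

#if defined(__x86_64__) && !defined(__ILP32__)
/* Building for x86-64: check the 64-bit row against the system headers. */
_Static_assert(SYS_write == 1, "write");
_Static_assert(SYS_exit == 60, "exit");
_Static_assert(SYS_exit_group == 231, "exit_group");
#endif

/* Look up an ABI row by name; returns NULL if there is no such row. */
static const struct linux_abi *find_abi(const char *name) {
    for (size_t i = 0; i < sizeof ABIS / sizeof ABIS[0]; i++)
        if (strcmp(ABIS[i].name, name) == 0)
            return &ABIS[i];
    return NULL;
}
```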
To learn, I’m reading Assembly Language Step-by-Step. It’s definitely helpful, but it’s aimed at a CS 101 class, which makes it slow going. It also uses Intel syntax, with nasm, on 32-bit, which takes a bit of mental translating.
Here is the first program from that book, translated, in case you want to play too:
.data
eatmsg:
.ascii "Eat at Joe's!\n"
eatlen = . - eatmsg
.text
.global _start
_start:
mov $1, %eax # 'write' syscall
mov $1, %edi # write to stdout (fd 1)
mov $eatmsg, %rsi # address of string to write
mov $eatlen, %edx # length of string to write
syscall
mov $60, %eax # 'exit' syscall
mov $0, %edi # return code 0
syscall
Save as eatsyscall.s and build with:
as -gstabs -o eatsyscall.o eatsyscall.s
ld -o eatsyscall eatsyscall.o
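If it helps with the translating, here is a sketch of the same write call made from C. It uses glibc's generic `syscall()` wrapper (see `man 2 syscall`), which takes the same syscall number and loads the registers for you. The name `eat_at_joes` is made up. Unlike the assembly, it returns instead of calling exit, so C's normal shutdown still runs.

```c
#define _GNU_SOURCE            /* make syscall() visible in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>       /* SYS_write, SYS_exit, ... */
#include <unistd.h>            /* syscall() */

static const char eatmsg[] = "Eat at Joe's!\n";
/* Like `eatlen = . - eatmsg`, minus the NUL terminator C adds. */
static const long eatlen = sizeof eatmsg - 1;

/* Same as the first syscall in the assembly version: SYS_write, fd 1
 * (stdout), address of the string, length. Returns bytes written,
 * or -1 with errno set. */
static long eat_at_joes(void) {
    return syscall(SYS_write, 1, eatmsg, eatlen);
}
```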
Other bookmarks I keep open:
- GNU Assembler manual. Extremely terse, but it’s there.
- Kernel calling convention. Because I forget which registers to use (syscall number in RAX, then arguments in RDI, RSI, RDX, R10, R8, and R9; yes, 10, 8, 9 at the end is not a typo).
- AMD manuals, particularly Volume 3: General-Purpose and System Instructions.
- /usr/include/x86_64-linux-gnu/asm/unistd_64.h for the syscall numbers, and man 2 <syscall name> for what to pass them.
- Programming from the Ground Up. This looks promising, and uses the same syntax and assembler as me (though it targets 32-bit x86). I haven’t got around to reading it yet.
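To see the kernel calling convention in action, here is a sketch of a raw six-argument syscall written with GCC inline assembly. It assumes x86-64 Linux, and `raw_syscall6` is a made-up helper name. The syscall number goes in rax and the arguments in rdi, rsi, rdx, r10, r8, r9. The result comes back in rax, or a negative errno on failure. Because `syscall` itself overwrites rcx and r11, those are listed as clobbered.

```c
#include <assert.h>
#include <errno.h>         /* EBADF etc.: raw syscalls return -errno */
#include <sys/syscall.h>   /* SYS_* numbers (from asm/unistd_64.h) */
#include <unistd.h>        /* getpid(), for comparison */

#if !defined(__x86_64__)
#error "this sketch assumes x86-64 Linux"
#endif

/* Hypothetical helper: raw syscall with up to six arguments. */
static long raw_syscall6(long nr, long a1, long a2, long a3,
                         long a4, long a5, long a6) {
    /* r10, r8 and r9 have no single-letter asm constraints, so bind
       them with local register variables. */
    register long r10 __asm__("r10") = a4;
    register long r8  __asm__("r8")  = a5;
    register long r9  __asm__("r9")  = a6;
    long ret;
    __asm__ volatile ("syscall"
                      : "=a"(ret)                                /* result in rax */
                      : "a"(nr), "D"(a1), "S"(a2), "d"(a3),      /* rax, rdi, rsi, rdx */
                        "r"(r10), "r"(r8), "r"(r9)
                      : "rcx", "r11", "memory");                 /* clobbered by syscall */
    return ret;
}
```

For example, `raw_syscall6(SYS_getpid, 0, 0, 0, 0, 0, 0)` should agree with `getpid()`. Closing fd -1 returns `-EBADF` rather than setting errno.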
I’ve already learnt two interesting things about starting and stopping programs.
Programs don’t start at main; they start at _start. When you build a C program, a _start is linked in for you from the C runtime’s startup code. It does some setup, then eventually calls main. _start is the symbol the linker, ld, looks up to know what address to put in the ELF header as the entry point address. For a different example, the Go start symbol (on x86-64 Linux) is _rt0_amd64_linux.
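One way to check this from inside a normal C program is a sketch like the following. It assumes Linux with glibc, where `_start` is a global symbol in the startup object gcc links in. The kernel tells each process its entry point through the auxiliary vector, readable with `getauxval(AT_ENTRY)`, and that address should be `_start`, not `main`. `entry_point` is a made-up helper name.

```c
#include <assert.h>
#include <stdint.h>
#include <sys/auxv.h>   /* getauxval(), AT_ENTRY */

/* _start comes from the C runtime startup object (crt1.o or Scrt1.o)
 * that gcc links in; declared here only so we can take its address. */
extern void _start(void);

/* The ELF entry point as passed by the kernel in the auxiliary vector,
 * already adjusted for where a PIE executable was loaded. */
static uintptr_t entry_point(void) {
    return (uintptr_t)getauxval(AT_ENTRY);
}
```

In an ordinary gcc-built program, `entry_point()` should equal `(uintptr_t)&_start` and differ from the address of `main`.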
Programs have to explicitly exit. If you don’t call the exit (or exit_group) system call, your program keeps on running: it tries to get its next instruction from whatever comes right after it in memory, and crashes.
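Here is a sketch of that explicit exit, done from C so it can be observed from the outside. A forked child ends itself with the raw exit_group syscall (231 on x86-64), the same kind of call the assembly makes with exit (60), and the parent reads back the status. `child_exit_status` is a made-up helper, and the sketch assumes Linux.

```c
#define _GNU_SOURCE            /* make syscall() visible in <unistd.h> */
#include <assert.h>
#include <sys/syscall.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: fork a child that exits via the raw exit_group
 * syscall. Returns the child's exit status as seen by the parent, or -1
 * on error. */
static int child_exit_status(int code) {
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        /* Raw syscall: no atexit handlers and no stdio flush. The kernel
           tears the process down, so control never comes back here. */
        syscall(SYS_exit_group, code);
        __builtin_unreachable();
    }
    int status;
    if (waitpid(pid, &status, 0) != pid || !WIFEXITED(status))
        return -1;
    return WEXITSTATUS(status);
}
```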
You can call any of the C standard library functions from assembler, either by linking with gcc or by passing the right arguments to ld. Or you can not be so lazy and do everything yourself!
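As a sketch of the gcc-linked route, the function below is written in AT&T assembly and calls libc's `strlen`. It is placed in a C file so gcc assembles it and links libc. It assumes x86-64 Linux, and `bytes_with_nul` is a made-up name. Ordinary function calls follow the System V ABI: arguments go in rdi, rsi, rdx, rcx, r8, r9 (rcx where the syscall convention uses r10), the result comes back in rax, and rsp must be 16-byte aligned at each `call`.

```c
#include <assert.h>
#include <stddef.h>   /* size_t */

#if !defined(__x86_64__)
#error "this sketch assumes x86-64"
#endif

/* Hypothetical function: returns strlen(s) + 1, the bytes needed to copy
 * s including its NUL terminator, by calling the C library's strlen. */
__asm__(
    ".pushsection .text\n"
    ".globl bytes_with_nul\n"
    "bytes_with_nul:\n"
    "    sub $8, %rsp\n"        /* return address left rsp 8 bytes off 16-byte alignment */
    "    call strlen@PLT\n"     /* s is already in rdi; length comes back in rax */
    "    add $1, %rax\n"
    "    add $8, %rsp\n"        /* undo the alignment adjustment before returning */
    "    ret\n"
    ".popsection\n");

size_t bytes_with_nul(const char *s);
```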
That’s the part I’m most excited about. How am I going to allocate memory, without malloc? No, don’t answer that. The fun is in figuring it out.