August 23, 2014
For entertainment, I’m learning assembler on Linux. Jotting down some things I learn here.
There are two syntaxes, AT&T and Intel (Go uses its own, because Plan 9). They look very different, but once you get over that the differences are minimal. Linux tradition is mostly AT&T syntax, MS Windows mostly Intel.
There’s no standardisation, so each assembler can do things its own way.
as, the GNU Assembler, is the most common one on Linux (and what gcc emits by default), but nasm, the Netwide Assembler, is very popular too. Code written for as will not assemble in
- The registers to use change
- The instruction to call changes (
- The syscall numbers change
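To make those three differences concrete, here is a rough sketch of the exit call for 32-bit Linux, assuming the list above is comparing 32-bit x86 with x86-64. The filename exit32.s is made up for this example.

```asm
# exit32.s: the 'exit' call on 32-bit Linux, for comparison with x86-64.
# Build (needs a system that can run 32-bit binaries):
#   as --32 -o exit32.o exit32.s
#   ld -m elf_i386 -o exit32 exit32.o
    .text
    .global _start
_start:
    mov $1, %eax        # 'exit' is syscall 1 on 32-bit (60 on x86-64)
    mov $0, %ebx        # first argument goes in %ebx (%edi on x86-64)
    int $0x80           # enter the kernel with int $0x80 ('syscall' on x86-64)
```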
So before you even get started, you need to pick a syntax, an assembler, and a target. I’m using as, with AT&T syntax, on Linux x86-64.
To learn I’m reading Assembly Language Step-by-Step. It’s definitely helpful, but it’s targeted at a CS 101 class which makes it slow going. It also uses Intel syntax, with nasm, on 32-bit, which takes a bit of mental translating.
Here is the first program from that book, translated, in case you want to play too:
    .data
    eatmsg:
        .ascii "Eat at Joe's!\n"
        eatlen = . - eatmsg

    .text
    .global _start
    _start:
        mov $1, %eax        # 'write' syscall
        mov $1, %edi        # write to stdout (fd 1)
        mov $eatmsg, %rsi   # address of string to write
        mov $eatlen, %edx   # length of string to write
        syscall

        mov $60, %eax       # 'exit' syscall
        mov $0, %edi        # return code 0
        syscall
Save it as eatsyscall.s and build with:
    as -gstabs -o eatsyscall.o eatsyscall.s
    ld -o eatsyscall eatsyscall.o
Other bookmarks I keep open:
- GNU Assembler manual. Extremely terse, but it’s there.
- Kernel calling convention. Scroll down to Linux kernel – x64. Because I forget which registers to use.
- AMD manuals, particularly Part 3 – General Purpose Instructions.
- /usr/include/x86_64-linux-gnu/asm/unistd_64.h for the syscall numbers, and man 2 <syscall name> for what to pass them.
- Programming from the ground up. This looks promising, and uses the same syntax and assembler as me. I haven’t gotten to reading it yet.
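As an example of using those bookmarks together: man 2 read gives read(fd, buf, count), the header gives it number 0, and on x86-64 the arguments go in %rdi, %rsi and %rdx in that order. Here is a minimal sketch that echoes one chunk of stdin back to stdout. The names echo.s, buf, buflen and done are my own, and it only handles a single read.

```asm
# echo.s: read up to 64 bytes from stdin and write them to stdout.
# Build:
#   as -o echo.o echo.s
#   ld -o echo echo.o
    .bss
buf:
    .skip 64                # 64 zeroed bytes to read into
    buflen = . - buf

    .text
    .global _start
_start:
    mov $0, %eax            # 'read' syscall
    mov $0, %edi            # fd 0: stdin
    mov $buf, %rsi          # where to put the bytes
    mov $buflen, %edx       # at most this many
    syscall                 # %rax = bytes read, 0 at EOF, negative on error

    test %rax, %rax
    jle done                # nothing to write on EOF or error

    mov %rax, %rdx          # count: what read returned
    mov $1, %eax            # 'write' syscall
    mov $1, %edi            # fd 1: stdout
    mov $buf, %rsi          # address of the bytes to write
    syscall

done:
    mov $60, %eax           # 'exit' syscall
    mov $0, %edi            # return code 0
    syscall
```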
I’ve already learnt two interesting things, about starting and stopping programs.
Programs don’t start at main, they start at _start. When you build a C program, _start is put in for you, does some setup, then calls main. _start is the symbol the linker ld looks up to know what address to put in the ELF header as the entry point address. For a different example, the Go start symbol (on x86-64 linux) is
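One way to see that _start is just a name ld looks up by default: ld has a -e option to name a different entry symbol. A small sketch, where the symbol begin and the filename entry.s are made up for illustration:

```asm
# entry.s: a program whose entry point is called 'begin' instead of '_start'.
# Build:
#   as -o entry.o entry.s
#   ld -e begin -o entry entry.o
# 'readelf -h entry' shows the entry point address ld wrote into the ELF header.
    .text
    .global begin
begin:
    mov $60, %eax       # 'exit' syscall
    mov $0, %edi        # return code 0
    syscall
```

Leave out -e begin and ld warns that it cannot find the entry symbol _start.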
Programs have to explicitly exit. If you don’t call the exit (or exit_group) system call, your program keeps on running, tries to get its next instruction from whatever comes right after it in memory, and crashes.
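A minimal sketch of the smallest well-behaved program I can think of, using exit_group (number 231 on x86-64) with a non-zero status so the result is visible from the shell. The filename exit42.s is made up.

```asm
# exit42.s: do nothing except exit with status 42.
# Build:
#   as -o exit42.o exit42.s
#   ld -o exit42 exit42.o
# Run it, then check the status with: echo $?
    .text
    .global _start
_start:
    mov $231, %eax      # 'exit_group' syscall
    mov $42, %edi       # return code 42
    syscall
```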
You can call all of the C stdlib functions from assembler, by using gcc to link, or passing the right arguments to ld. Or you can not be so lazy, and do everything yourself!
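For the lazy route, here is a rough sketch of the same message printed with puts, letting gcc do the linking. Because gcc supplies _start, this one defines main instead. The filename eatlibc.s is made up, and it assumes the usual x86-64 user-space calling convention (first argument in %rdi, stack 16-byte aligned at the call).

```asm
# eatlibc.s: print the message with puts() from the C library.
# Build: gcc -o eatlibc eatlibc.s
    .section .rodata
eatmsg:
    .asciz "Eat at Joe's!"      # NUL-terminated; puts adds the newline

    .text
    .global main
main:
    sub $8, %rsp                # keep the stack 16-byte aligned for the call
    lea eatmsg(%rip), %rdi      # first argument: address of the string
    call puts@PLT
    mov $0, %eax                # return 0 from main
    add $8, %rsp
    ret

    .section .note.GNU-stack,"",@progbits   # mark the stack as non-executable
```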
That’s the part I’m most excited about. How am I going to allocate memory, without malloc? No, don’t answer that. The fun is in figuring it out.