Binutils/As

ec2-3-140-198-43.us-east-2.compute.amazonaws.com | ToothyWiki | Binutils | RecentChanges | Login | Webcomic

GNU Assembler


The GNU assembler (GAS) uses a simplified syntax that attempts to unify the assembler languages of different processor architectures into a single format.  Many commands (in particular the assembler directives) are common between processors, but some processors offer additional capabilities and commands that are not available in the base syntax.  In these cases, GAS supports processor-specific extensions.

The GAS command-line options specify the exact architecture and instruction sets that the final object file should target.  The platforms available for cross-compilation depend on the targets that were compiled into the Binutils libopcodes backend.

GAS itself only assembles plain assembly files (.s) into object files, so assembly files that require pre-processing (.S) must first be run through cpp.  The object files must then be linked to produce the final executable.

Command format


The basic format of a GAS command is:

    Command <Arg1>, <Arg2>, <Arg3>

The number of arguments depends on the command (and the processor architecture).  i386 and x86-64 instructions usually take at most two arguments, written source first ([Source], [Destination]).  ARM instructions typically take three arguments and follow ARM's own convention of destination first ([Destination], [Source1], [Source2]).

Addressing modes


The standard addressing modes recognised by GAS (shown here in x86 syntax) are:

   Immediate                $1234
   Absolute                 1234
   Register                 %eax
   Indirect                 (%eax)
   Indirect-Offset          1234(%eax)                   (eax + 1234)
   Indirect-Indexed         (%eax,%ebx,[1|2|4|8])        (eax + ebx * n)
   Indirect-Indexed-Offset  1234(%eax,%ebx,[1|2|4|8])    (eax + ebx * n + 1234)
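
The indexed forms all reduce to a single address computation.  A minimal C sketch of that arithmetic follows, for illustration only; the function name effective_address is hypothetical and not part of GAS or Binutils:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the address that an x86 operand of the form
   disp(base,index,scale) refers to, i.e. base + index * scale + disp. */
static uint32_t effective_address(uint32_t base, uint32_t index,
                                  uint32_t scale, int32_t disp)
{
    /* Only scale factors of 1, 2, 4 or 8 can be encoded. */
    assert(scale == 1 || scale == 2 || scale == 4 || scale == 8);
    return base + index * scale + (uint32_t)disp;
}
```

For example, with eax holding 1000 and ebx holding 4, the operand 16(%eax,%ebx,8) refers to address 1000 + 4 * 8 + 16 = 1048.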

Basic commands


The basic two-argument commands (x86) have the format:

   mov <from>,<to>    to = from
   add <from>,<to>    to = to + from
   sub <from>,<to>    to = to - from
   cmp <from>,<to>    set flags from (to - from); the result is discarded
   lea <from>,<to>    to = address_of( from )

On ARM-based systems, three-argument commands put the destination first and have the format:

   add <to>,<from1>,<from2>    to = from1 + from2

The stack pointer is an ordinary register (e.g. %esp), but the usual push/pop commands are also available:

   push <from>
   pop  <to>

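Commands affecting the program counter include:

   jmp  <address>
   call <subroutine>
   ret

   int  <Interrupt#>    Trigger interrupt
   iret                 Return from interrupt

Conditional branches test the flags left by a preceding instruction such as cmp.  The mnemonics below are the ARM ones; the x86 equivalents are je, jne, jl and jns:

   beq  <address>    Branch if zero (equal)
   bne  <address>    Branch if not zero (not equal)
   blt  <address>    Branch if signed less than
   bpl  <address>    Branch if not negative (positive or zero)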
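
The operand order of cmp is easy to get backwards in this syntax: the flags describe to - from, so a following "less than" branch is taken when the second operand is the smaller one.  A rough C model of that behaviour is sketched below.  The names cmp_model, branch_equal and branch_less are made up for illustration, and the carry and overflow flags are deliberately left out by computing the difference in 64 bits:

```c
#include <stdbool.h>
#include <stdint.h>

/* Simplified flag set: real processors also track carry and overflow. */
struct flags {
    bool zero;      /* result was zero     */
    bool negative;  /* result was negative */
};

/* Models "cmp <from>,<to>": computes to - from and keeps only the flags. */
static struct flags cmp_model(int32_t from, int32_t to)
{
    int64_t diff = (int64_t)to - (int64_t)from;  /* wide, so it cannot overflow */
    struct flags f = { diff == 0, diff < 0 };
    return f;
}

/* Branch predicates evaluated on the flags left by cmp_model(). */
static bool branch_equal(struct flags f) { return f.zero; }
static bool branch_less(struct flags f)  { return f.negative; }
```

So after cmp $5, %eax with eax holding 3, the model call cmp_model(5, 3) reports a negative result, and a "less than" branch would be taken.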

.data and .text sections


By default, all code is assembled into the .text section, which is the standard location for executable code in ELF.  Code and data can be placed in other sections using the .section directive.  Note that a custom section only ends up as executable code in the final ELF file if the linker script places it in a loadable output section such as .text (or the section is declared with suitable flags).

.section custom_section1
   <other section code>

.text
   <normal code>

Data storage


GAS allows data structures and buffers to be included as part of the assembly process:

   .ascii  "string"             Embed a string (note: not NUL terminated; .asciz adds the terminator)
   .space  <number of bytes>    Buffer of the specified length in bytes
   .org    <offset>             Advance the location counter to the given offset within the section
   .long   1234                 Long value (32 bits on x86)
   .byte   0                    Byte value

These are usually defined in the .data section, as the .text section is read-only by default.

Declaring and using global symbols


All labels that are declared or referenced within the .S file have an entry in the object file's symbol table.

By default, labels declared within a .S file are local to the object file they reside in, so the linker will not resolve references to them from other object files.  To be accessible outside the object file, a symbol must be marked as global by declaring it with the .globl directive (e.g. .globl _start).

Any labels that are referenced but not declared within the object file appear in the symbol table as undefined (with no location).  The linker resolves these in the usual way.  From the perspective of the .S file, accessing external functions and data therefore requires nothing beyond knowing the global symbol name; all dependencies are managed by the linker.

Calling Assembly code from C


In general, when a function is called from C code on i386, the stack pointer points at the return address and %ebp (or equivalent) still points at the caller's stack frame.  Arguments are usually accessed relative to the stack pointer.

So:
   function( arg1, arg2, arg3 ) 

Appears as:
     (%esp)    - Return address
    4(%esp)    - arg1
    8(%esp)    - arg2
   12(%esp)    - arg3  [%ebp]

The return value should be passed in %eax.  When a structure is returned, this may be a pointer to a location on the stack above %esp.
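
As a sketch of this convention, the C file below defines a three-argument function in assembly and calls it.  It assumes GCC or Clang targeting 32-bit x86 Linux (cdecl, no underscore prefix on symbol names); on any other target it falls back to plain C so that the file still builds.  The names add3 and add3_demo are hypothetical.

```c
#include <assert.h>

int add3(int a, int b, int c);   /* implemented in assembly on i386 */

#if defined(__i386__) && defined(__linux__) && defined(__GNUC__)
/* Top-level asm: the arguments are read relative to %esp, as in the
   layout above, and the result is returned in %eax. */
__asm__(
    ".pushsection .text\n"
    ".globl add3\n"
    ".type add3, @function\n"
    "add3:\n"
    "    mov  4(%esp), %eax\n"    /* arg1   */
    "    add  8(%esp), %eax\n"    /* + arg2 */
    "    add  12(%esp), %eax\n"   /* + arg3 */
    "    ret\n"
    ".popsection\n"
);
#else
/* Fallback for other targets (an assumption, so the sketch still compiles). */
int add3(int a, int b, int c) { return a + b + c; }
#endif

/* The C caller needs nothing beyond the prototype. */
int add3_demo(void)
{
    int sum = add3(1, 2, 3);
    assert(sum == 6);
    return sum;
}
```

In a real project the assembly would normally live in its own .S file with add3 declared .globl, and the linker would resolve the reference from the C object file.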

Using syscalls


It is possible to call directly into the kernel using the syscall instruction (on older 32-bit x86 processors the equivalent is "int $0x80").  The syscall number is placed in %eax (e.g. 1 for exit on i386), and further parameters are placed in other registers (%ebx, %ecx, %edx, ... for int $0x80).

A large number of syscalls are available; they correspond to the section 2 manual pages for System Calls (man 2 xxx).  It is not necessary to link against libc to use these APIs.  Syscall numbers are defined in the asm/unistd.h header file.
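
The same syscall numbers are visible from C.  The sketch below is for illustration only: it assumes Linux with glibc and goes through libc's syscall() wrapper from unistd.h with the SYS_write constant from sys/syscall.h, rather than issuing a raw int $0x80.  The helper name raw_write is hypothetical.

```c
#define _DEFAULT_SOURCE          /* ask glibc to declare syscall() */
#include <assert.h>
#include <stddef.h>
#include <sys/syscall.h>         /* SYS_write, the C-side name for __NR_write */
#include <unistd.h>              /* syscall() */

/* Invokes the write system call directly by number.
   Returns the number of bytes written, or -1 on error. */
static long raw_write(int fd, const char *buf, size_t len)
{
    assert(buf != NULL);
    return syscall(SYS_write, fd, buf, len);
}
```

A call such as raw_write(1, "Hello World!\n", 13) writes the message to standard output and normally returns 13.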

Example:

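To compile (on a 64-bit host, 32-bit output has to be requested explicitly, e.g. with cpp -m32, as --32 and ld -m elf_i386):
   cpp -o HelloWorld.s HelloWorld.S
   as  -o HelloWorld.o HelloWorld.s
   ld  -o HelloWorld   HelloWorld.o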

   /* HelloWorld.S */
   #include "asm/unistd.h"

   .globl  _start

   /* Main entry point for program. */
   _start:
         /* Call write( stdout, message, length ) */
         mov $__NR_write, %eax
         mov $1, %ebx
         lea HelloWorld, %ecx
         mov HelloWorldSize, %edx
         int $0x80

         /* Call exit( 0 ) */
         mov $__NR_exit, %eax
         mov $0, %ebx
         int $0x80

   .data

   /* Strings to display */
   HelloWorld:
         .ascii "Hello World!\n"
   HelloWorldSize:
         .long  . - HelloWorld

Last edited December 24, 2009 3:43 pm (viewing revision 2, which is the newest) (diff)