Binutils/Ld

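ld is the GNU linker (the "link editor") or, for the older reader, the "Loader". It should not be confused with the run-time dynamic linker (ld.so, or ld-linux.so on Linux), which is a separate program.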

As the "Loader", its original purpose was to convert one or more files into an executable memory image that could be run directly by the local operating system. Nowadays binary executables are delivered either in the older a.out format (which is basically directly executable) or, far more commonly on Linux, in ELF format, which hides the "Loader" stage by naming a run-time loader such as "/lib/ld-linux.so.2" as the file's interpreter.
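
To make the "file interpreter" idea concrete, here is a minimal sketch that pulls the interpreter path out of an ELF image. It assumes a 64-bit little-endian file and uses the header offsets from the ELF specification; the function name elf_interpreter is made up for this illustration.

```python
import struct

PT_INTERP = 3  # program header type that holds the interpreter path


def elf_interpreter(data):
    """Return the interpreter path of a 64-bit little-endian ELF image, or None."""
    if data[:4] != b"\x7fELF":
        raise ValueError("not an ELF file")
    if data[4] != 2 or data[5] != 1:
        raise ValueError("only 64-bit little-endian ELF is handled in this sketch")
    # ELF64 header: e_phoff at byte 32, e_phentsize at byte 54, e_phnum at byte 56.
    (e_phoff,) = struct.unpack_from("<Q", data, 32)
    e_phentsize, e_phnum = struct.unpack_from("<HH", data, 54)
    for index in range(e_phnum):
        base = e_phoff + index * e_phentsize
        (p_type,) = struct.unpack_from("<I", data, base)
        if p_type == PT_INTERP:
            # Elf64_Phdr: p_offset at byte 8, p_filesz at byte 32.
            (p_offset,) = struct.unpack_from("<Q", data, base + 8)
            (p_filesz,) = struct.unpack_from("<Q", data, base + 32)
            raw = data[p_offset:p_offset + p_filesz]
            return raw.rstrip(b"\0").decode()
    return None  # statically linked files have no interpreter entry
```

On a typical Linux system, calling elf_interpreter(open("/bin/ls", "rb").read()) should return the path of the run-time loader, although the exact path depends on the distribution and architecture.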

The more modern use of ld is as the linker, whose purpose is to "join" object files together. Object files implement and require resources that appear as named symbols in each object's symbol table. If an object implements a symbol, the symbol table contains the location of the implementation. If an object requires a symbol but does not itself implement it, the symbol table records an "undefined" location. The linker takes multiple object files and their symbol tables and "fills in" the undefined locations.

In its simplest form, linking joins objects together into further objects (usually with larger symbol tables, but with fewer undefined symbols). Once all undefined symbols are satisfied, the result can be linked into an executable.
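
The joining step can be sketched as a toy model in Python. This is not how ld is implemented: a symbol table is reduced to a dictionary mapping each name to a location, with None standing for "undefined", and the object names and offsets are invented for the example.

```python
UNDEFINED = None  # stands for an "undefined" location in a symbol table


def merge(first, second):
    """Join two symbol tables (name -> location or UNDEFINED) into one."""
    merged = dict(first)
    for name, location in second.items():
        if merged.get(name) is UNDEFINED:
            # Symbol was missing or still undefined: take whatever 'second' has.
            merged[name] = location
        elif location is not UNDEFINED:
            raise ValueError(f"duplicate definition of {name}")
    return merged


def unresolved(table):
    """Names that still have no implementation."""
    return sorted(name for name, loc in table.items() if loc is UNDEFINED)


main_o = {"main": ("main.o", 0x0), "helper": UNDEFINED}
helper_o = {"helper": ("helper.o", 0x0), "puts": UNDEFINED}
libc = {"puts": ("libc", 0x40)}

partial = merge(main_o, helper_o)  # a further object, still needs puts
complete = merge(partial, libc)    # nothing left undefined
```

Here partial has a larger symbol table than either input and one remaining undefined symbol, while complete has none and could become an executable.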

Object files

Object files consist of (at a minimum) a code section and a symbol table that records offsets within the code. The most common Linux file format is ELF, which contains separate sections for:

   .text              Binary object code
   .bss               Uninitialised (zero-filled) static data
   .dynamic           Dynamic linking information (e.g. required shared objects)
   .dynsym            Dynamic symbol table (exported and required symbols)
   .rel.[dyn|plt]     Code relocation tables

The GNU linker includes functionality to rebase the object code using the code relocation tables. This will happen automatically at link or run time if the code cannot be loaded into memory at the object's base location, for example, if two code sections would overlap.
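
As a rough illustration of rebasing, the sketch below patches absolute addresses inside a block of code. It assumes 32-bit little-endian addresses and treats the relocation table as a plain list of byte offsets; real ELF relocation entries carry more information (a type, a symbol and often an addend), and the function name rebase is hypothetical.

```python
import struct


def rebase(code, relocation_offsets, old_base, new_base):
    """Return a copy of 'code' with each listed 32-bit address moved to new_base."""
    patched = bytearray(code)
    delta = new_base - old_base
    for offset in relocation_offsets:
        (address,) = struct.unpack_from("<I", patched, offset)
        # Shift the stored address by the distance the code has moved.
        struct.pack_into("<I", patched, offset, (address + delta) & 0xFFFFFFFF)
    return bytes(patched)


# Four opaque instruction bytes followed by an absolute address (0x1008)
# that was computed for a base of 0x1000.
image = struct.pack("<II", 0x90909090, 0x1008)
moved = rebase(image, [4], old_base=0x1000, new_base=0x5000)
```

Only the bytes named by the relocation list change; everything else in the image is copied through untouched.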

Weakly linked symbols

ELF objects allow the definition of weakly linked symbols, usually used for "overridable" functions. Typically this is used as a form of "polymorphic virtual functions" at a shared object level.

When the linker finds a weakly linked symbol, it is stored as a "potential symbol definition" within the symbol table. If the same symbol is also defined strongly in another object, then linking will occur as usual using the "strongly linked" symbol. If however the linking process completes without finding a strongly linked symbol, the linker will use the original "potential symbol definition". If the linker encounters multiple weak definitions of the same symbol and no strong one, ELF does not say which should be used; GNU ld in practice keeps the first one it encounters.
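
The weak/strong rule can be written down as a small Python model. It is a sketch of the behaviour described above, not of ld's internals; the object and symbol names are invented, and the tie-break between several weak definitions of one symbol is deliberately left out.

```python
def resolve(definitions):
    """Pick one definition per symbol.

    definitions: (object_name, symbol, binding) tuples in link order,
    where binding is "weak" or "strong".
    """
    chosen = {}  # symbol -> (object_name, binding)
    for obj, sym, binding in definitions:
        current = chosen.get(sym)
        if current is None:
            chosen[sym] = (obj, binding)
        elif binding == "strong":
            if current[1] == "strong":
                raise ValueError(f"multiple definition of {sym}")
            chosen[sym] = (obj, binding)  # strong replaces the potential definition
        elif current[1] == "weak":
            raise NotImplementedError("weak-vs-weak tie-break is not modelled")
        # A weak definition arriving after a strong one is simply ignored.
    return {sym: obj for sym, (obj, _) in chosen.items()}


result = resolve([
    ("defaults.o", "on_error", "weak"),
    ("defaults.o", "on_exit", "weak"),
    ("app.o", "on_error", "strong"),
])
```

In this run on_error comes from app.o because its strong definition overrides the weak default, while on_exit falls back to the weak definition in defaults.o.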

A symbol can be set as weakly linked using the following objcopy command:

   objcopy --weaken-symbol <symbolname> <objectfile.o>

Resolving namespace collisions

The name of each symbol is uniquely defined in an object's symbol table. Two objects defining the same symbol cannot be linked (unless one definition is a weak link). To resolve this problem, objcopy provides functionality to rename symbols, or to apply a namespace prefix to an entire object.

Rename a symbol:

   objcopy --redefine-sym <symbolname>=<newname> <objectfile.o>

Add namespace:

   objcopy --prefix-symbols <prefix> <objectfile.o>

Additionally, if an object exports symbols that are not needed for further linking, objcopy allows these to be stripped:

   objcopy --strip-symbol <symbolname> <objectfile.o>
   objcopy --strip-all --keep-symbol <symbolname> <objectfile.o>

objcopy also offers --strip-unneeded-symbol, which will not remove a symbol if it forms part of a code relocation point. In a fully linked ELF executable or shared object, run-time relocations refer to the separate .dynsym table, so the ordinary symbol table can be safely stripped.

The standard linker also has grouping capabilities: the archives listed between --start-group and --end-group are searched repeatedly until no new undefined references are resolved, which helps when they depend on each other [--start-group <archive1.a> <archive2.a> --end-group]

Shared objects

The modern ld, working with ELF, is far more complicated than a traditional static binary linker. ELF supports runtime linking and shared object dependencies, so the linker must be aware of unsatisfied dynamically linked symbols that will be completed on program execution.

Typically a linker command will look like this:

   ld -o program                         \
      -dynamic-linker /lib/ld-linux.so.2 \
      -l :crt1.o                         \
      -l :libgcc_s.so.1                  \
         program.o                       \
      -l sharedobject1                   \
      -l sharedobject2                   \
      -l c

This results in an ELF file called program, referencing "libsharedobject1.so" and "libsharedobject2.so".

The situation is complicated further by the functionality provided by dlopen() to load arbitrary shared objects at runtime. Effectively, the dynamic linker is now a prerequisite part of libc. Not every flavour of Unix has always provided dlopen(); early releases of AIX and Mac OS X, for example, lacked it.
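
Runtime loading is easy to try from Python, whose ctypes module wraps dlopen() and dlsym() on POSIX systems. The sketch below assumes such a system with a C library already loaded into the interpreter process, and it looks up strlen purely as a convenient example.

```python
import ctypes

# CDLL(None) asks dlopen() for a handle to the running process itself,
# which exposes the symbols of every library it has already loaded.
process = ctypes.CDLL(None)

# Attribute access performs the dlsym()-style lookup by name.
strlen = process.strlen
strlen.argtypes = [ctypes.c_char_p]
strlen.restype = ctypes.c_size_t

length = strlen(b"linker")
```

Passing a library file name instead of None would load that shared object at run time in the same way, provided the dynamic linker can find it.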

Dynamic link order

The linker symbol namespace is traditionally very limited. In early linker formats, symbols could contain at most 6 characters, and C has no built-in namespace functionality. As such, it is very common to have two shared object files defining two versions of the same symbol (e.g. an init() function). This is usually solved by altering the link order of the objects.

The general rule is that shared objects appearing later in the ld command cannot have their dependencies satisfied by symbols appearing earlier in the link command. So in the above example, crt1.o, which includes the _start function that invariably requires everything else, appears first. libc has no dependencies on other modules, so it appears at the end of the list. In between, program.o has its dependencies satisfied by sharedobject1, sharedobject2 and libc.

Internally, the linker examines each object's symbol table in turn, maintaining a "link list" of unsatisfied symbols in objects it has previously examined. When a symbol definition is found, the unsatisfied symbols are linked and the symbol entry is removed from the "link list". The linker therefore does not need to hold a list of all symbols contained in the program (which could potentially be large).

Resolving dependency problems

If sharedobject2 needs to use symbols defined by sharedobject1, the above will not link. In this case, it is possible to repeat entries on the command line:

   ld program.o -l sharedobject1 -l sharedobject2 -l sharedobject1

On encountering the second instance of sharedobject1, the linker knows that the appropriate code has already been loaded so there can be no additional unsatisfied symbols, but the defined symbols in sharedobject1 can be reused to link unsatisfied symbols in the "link list".
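
The single pass over the "link list" can be modelled in a few lines of Python. This is a toy version of the behaviour described in this and the previous section, not ld's real algorithm: each input is reduced to the set of symbols it defines and the set it needs, and the symbol names f1 and f2 are made up.

```python
def link(inputs):
    """Walk the inputs in order and return the symbols left unsatisfied.

    inputs: list of (name, defines, needs) tuples, where defines and needs
    are sets of symbol names.
    """
    link_list = set()  # unsatisfied symbols from inputs seen so far
    for _name, defines, needs in inputs:
        # Definitions in this input satisfy entries already on the list...
        link_list -= defines
        # ...and its own requirements are added for later inputs to satisfy.
        link_list |= needs - defines
    return link_list


program = ("program.o", {"main"}, {"f1", "f2"})
shared1 = ("sharedobject1", {"f1"}, set())
shared2 = ("sharedobject2", {"f2"}, {"f1"})

failing = link([program, shared1, shared2])
working = link([program, shared1, shared2, shared1])
```

In the first order, f1 is put back on the list by sharedobject2 after sharedobject1 has already been passed, so it stays unsatisfied; repeating sharedobject1 at the end clears it.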