SYSDOC HPPA Robert Duncan, April 1993

Porting Poplog to the HP PA-RISC 1.1

Architectural Background
Assembling and Linking
Register Usage
Procedure Call and Return
The Callstack
External Calls
Signal Handling
Documentation, Utilities etc.

Architectural Background

The HP Precision Architecture has many similarities with other RISC processors which already support Poplog, such as SPARC and MIPS. These can be summarised as:

load/store architecture
byte addressable
32-bit word length
32, 32-bit general registers (GR[0..31])
32, 64-bit IEEE floating-point registers (FR[0..31])
word-length instructions
branch delay slots

Data must be ``naturally aligned'' for size -- words on a 4-byte boundary, doubles on an 8-byte boundary etc. Addresses are big-endian, pointing to the most significant byte of a datum. This principle extends to bit numbering, so that bit 0 is always the most significant bit, e.g. the sign bit in an integer. The Poplog tag bits are thus bit numbers 30 and 31.
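The bit-numbering convention is easy to misread, so here is a minimal sketch in Python (not part of Poplog; the helper name hp_bit_mask is made up for illustration) which converts an HP bit number into a conventional mask for a 32-bit word:

```python
WORD_BITS = 32

def hp_bit_mask(bit):
    """Mask for HP bit number `bit`, where bit 0 is the MOST significant bit."""
    if not 0 <= bit < WORD_BITS:
        raise ValueError("bit number out of range for a 32-bit word")
    return 1 << (WORD_BITS - 1 - bit)

# Bit 0 is the sign bit of a 32-bit integer.
SIGN_BIT = hp_bit_mask(0)

# Bits 30 and 31 are the two low-order bits, where the Poplog tags live.
TAG_MASK = hp_bit_mask(30) | hp_bit_mask(31)
```

So the tag bits sit in the usual place at the bottom of the word; only the numbering is reversed.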

The instruction set is a bit weird, and probably unreadable to the uninitiated, but practice shows it to be quite well thought out. Features which are to Poplog's advantage include:

post-increment and pre-decrement addressing modes which directly support stack operations
comprehensive range of condition codes for branches etc., including bit tests
the nullify bit on arithmetic and branch instructions causes the following instruction to be executed as a no-op if a condition is satisfied: this can reduce the number of no-ops occurring in branch delay slots, and can sometimes eliminate the need for a branch altogether

One disadvantage is the lack of a division instruction (although there is a divide step instruction which speeds up an assembly-code division algorithm); and the single (unsigned!) multiply instruction is implemented by the floating-point unit which makes it awkward to use.

As on the MIPS, there are separate caches for instructions and data, and code written to the data space must be flushed from both (see cacheflush in "amain.s").

By far and away the biggest distinguishing feature of the Precision -- one which haunts the whole Poplog implementation -- is that the address space is segmented. Despite the 32-bit word length, virtual addresses are actually 64 bits in size, composed from two 32-bit parts: a space identifier and an offset. The general registers, when used for memory addressing, hold just the offset part; the space identifiers are held in 8 dedicated space registers (SR[0..7]). Memory spaces have their access rights policed by hardware, and this is presumably the primary point of it (it's certainly of no use to programmers). HP-UX allocates four distinct spaces to each process:

read-only, shared text
private data (including the call stack)
shared memory (including shared libraries)
privileged system code

Every memory-referencing instruction -- load, store or branch -- must specify a space register, either implicitly or explicitly. Implicit mode is most relevant to load/store instructions: in this mode, the top two bits of the offset part of the address are used to identify the space register, based on the mapping:

    00 --> SR[4]
    01 --> SR[5]
    10 --> SR[6]
    11 --> SR[7]

This does, of course, restrict the range of addresses to a 30-bit quadrant within the space. Fortunately, HP-UX memory mapping is based around this addressing mode, using these four space registers to hold the identifiers of the four process spaces:

    SR[4] = shared text
    SR[5] = private data
    SR[6] = shared memory
    SR[7] = system code

and the offsets within the spaces are set by the linker to lie within their associated quadrants. This means that for most purposes, a process sees a standard 32-bit address space, as follows:

    ---------------------
    |       Text        |  16:00000000
    |         .         |
    |         .         |
    |                   |  16:3FFFFFFC
    |-------------------|
    |       Data        |  16:40000000
    |         .         |
    |         .         |
    |                   |  16:7FFFFFFC
    |-------------------|
    |      Shared       |  16:80000000
    |      Memory       |
    |         .         |
    |                   |  16:BFFFFFFC
    |-------------------|
    |      System       |  16:C0000000
    |       Code        |
    |         .         |
    |                   |  16:FFFFFFFC
    ---------------------
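The implicit addressing rule can be modelled in a few lines. The following Python sketch is illustrative only: the function names are invented here, and it assumes the HP-UX assignment of SR[4] to SR[7] described above.

```python
# HP-UX assignment of the four implicit space registers (from the text above).
QUADRANT_NAMES = {
    4: "shared text",
    5: "private data",
    6: "shared memory",
    7: "system code",
}

def implicit_space_register(offset):
    """Number of the space register selected by the top two bits of a
    32-bit offset in implicit addressing mode."""
    offset &= 0xFFFFFFFF
    return 4 + (offset >> 30)

def quadrant_name(offset):
    """Which of the four HP-UX process spaces an offset falls in."""
    return QUADRANT_NAMES[implicit_space_register(offset)]
```

For example, quadrant_name(0x40000000) gives "private data", matching the start of the data quadrant in the memory map.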

Branches and calls are different, because the branch instructions fall into two distinct groups: local (intra-space) branches, which compute their targets relative to the space of the instruction itself, and external (inter-space) branches which require an explicit space register to be specified for the target. You cannot make an external branch using an implicit space register, and it's this which causes the most difficulties for Poplog.

One other curious feature of the processor's execution model is that whenever control is transferred to an absolute address (i.e. by branching through a register) then the two least-significant bits of the target offset are interpreted as encoding the privilege level at which the code should be executed. These two bits would otherwise be unused of course, because code is always word-aligned. There are four privilege levels, from 0 the highest to 3 the lowest (standard) level. With normal branches the privilege level can only decrease, and any attempt to raise the level is ignored; only the special gate instruction can raise the privilege level. The matter is relevant to Poplog because the return address offset deposited by a branch-and-link instruction always has these two low bits set, to ensure that the current privilege level is restored on return. Since Poplog code always executes at level 3, return addresses look like pop integers!
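A small Python model may make the coincidence clearer. It is a sketch only: the helper names are invented, and the popint tag value is inferred from the fact that popint 0 is represented as 3, not taken from the Poplog sources.

```python
PRIV_MASK = 0b11    # low two bits of a link value hold the privilege level
POPINT_TAG = 0b11   # assumption: popint tag, inferred from popint 0 = 3

def link_value(return_addr, privilege_level):
    """Value deposited by a branch-and-link: the word-aligned return
    address with the current privilege level in the two low bits."""
    if return_addr % 4 != 0:
        raise ValueError("code addresses are word-aligned")
    if not 0 <= privilege_level <= 3:
        raise ValueError("privilege level must be 0..3")
    return return_addr | privilege_level

def split_link(value):
    """Separate a link value into (pure address, privilege level)."""
    return value & ~PRIV_MASK, value & PRIV_MASK

def looks_like_popint(word):
    """True if the word carries the (assumed) popint tag."""
    return (word & POPINT_TAG) == POPINT_TAG
```

A link value made at level 3 passes the looks_like_popint test; one made at any other level would not.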

Assembling and Linking

The assembler uses the symbols %r0-%r31, %sr0-%sr7 and %fr0-%fr31 to denote the general registers, space registers and floating point registers respectively. The floating point registers can have their upper and lower 32-bit halves addressed separately -- for single float or fixpoint values -- by suffixing the register name with L or R as appropriate. There are also more mnemonic names defined for most of the general registers which relate to their function in the standard procedure calling convention (see below) such as %sp for the stack pointer and %arg0 for the first subroutine argument register.

The file "asm_macros.h" defines some additional register names for Poplog's own use, such as %usp for the user stack pointer and %pb for the procedure base register. This file is included in all the hand-coded assembler files. It also defines several assembler macros for common operations, such as STV32 for storing a value to a 32-bit symbolic address; these are always written in upper case to distinguish them from real instructions (although instruction names are not case sensitive). Assembly code files generated by POPC don't use the "asm_macros" header file, but will define and use the Poplog register names if the flag M_DEBUG is set <true> in "sysdefs.p".

As with other RISC processors, the fixed instruction length makes it impossible to manipulate a 32-bit value with a single instruction. As on the SPARC, the assembler provides special operators -- L' and R' (or L% and R%) -- to extract the upper (21 bits) and lower (11 bits) parts of a 32-bit value. Curiously, the 11-bit R value is still too big to be the operand of an arithmetic immediate instruction, but is acceptable as the displacement part of a load/store, so to load a 32-bit value to a register we use:

    ldil        L'value, %reg
    ldo         R'value(%reg), %reg

This is the same as the LDA32 macro defined in "asm_macros.h".
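The arithmetic behind the two-instruction load can be sketched in Python. This is an illustration, not the assembler's definition of the operators: the function names are invented, and the immediate widths passed to fits_signed (11 bits signed for arithmetic immediates, 14 bits signed for load/store displacements) are my reading of the instruction formats and should be treated as an assumption.

```python
R_BITS = 11
R_MASK = (1 << R_BITS) - 1          # low 11 bits

def left_part(value):
    """Model of L'value: the upper 21 bits, left in place."""
    return value & 0xFFFFFFFF & ~R_MASK

def right_part(value):
    """Model of R'value: the lower 11 bits, as an unsigned quantity."""
    return value & R_MASK

def load_32(value):
    """Model of the ldil/ldo pair."""
    reg = left_part(value)                          # ldil L'value, %reg
    reg = (reg + right_part(value)) & 0xFFFFFFFF    # ldo  R'value(%reg), %reg
    return reg

def fits_signed(value, bits):
    """True if value is representable as a signed immediate of `bits` bits."""
    return -(1 << (bits - 1)) <= value < (1 << (bits - 1))
```

On these assumptions the largest R value, 16:7FF, fits a 14-bit signed displacement but not an 11-bit signed immediate, which would explain why ldo is used in preference to an add-immediate.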

The assembler's .export directive marks a symbol as being visible from outside the current file (like .globl on other Unix systems). Unfortunately, there is also a matching .import directive which must be used to declare symbols which are referenced in the file but defined elsewhere. Failure to do this generates "undefined label" errors. In the hand-written assembly code files, these import directives can be inserted by hand; for POPC output, they have to be done automatically. This is accomplished by having two properties defined in "asmout.p" which record all symbol definitions and references within the current file; at the end of the file, all symbols used but not defined are imported.
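The bookkeeping amounts to a set difference. The Python sketch below shows the idea only; the class and method names are hypothetical and do not correspond to the properties actually defined in "asmout.p".

```python
class SymbolTracker:
    """Records symbol definitions and references for one output file."""

    def __init__(self):
        self.defined = set()
        self.referenced = set()

    def define(self, name):
        self.defined.add(name)

    def reference(self, name):
        self.referenced.add(name)

    def symbols_to_import(self):
        """Symbols used in the file but not defined in it, in a stable
        order; these are the ones needing an import at end of file."""
        return sorted(self.referenced - self.defined)
```

A symbol which is both defined and referenced in the file needs no import, so it drops out of the result.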

Worse still, symbols are exported and imported with "types" which are meaningful to the linker. There are several legal type keywords, but the common ones are code and data; by default, a symbol gets the type of the space in which the export/import directive was placed. The documentation is very unclear as to what these types really mean; however, the type at which a symbol is imported into a file must match the type with which it was exported or it remains as an undefined symbol at link time. The linker reports such symbols as:

    /bin/ld: Unsatisfied symbols:
        foo (data)
        baz (code)

This is a problem for POPC, because it is impossible to deduce from a declaration such as

    constant foo;

whether the structure foo is writeable (in data space) or non-writeable (in code space). The adopted solution is to export and import all Poplog symbols as data regardless of the space in which they are actually defined. This instantly makes everything consistent, and it does appear to work. However, because the documentation on these types is so poor, it's not *guaranteed* to work. It's quite possible that there should be more information about executable symbols which the linker would normally attach to code symbols which is being lost to us.

This solution doesn't work for external symbols referenced with _extern. Such symbols already have types defined in the libraries from which they're extracted, and it's impossible to deduce those from the manner of the symbols' use within Poplog. This is insoluble in general without some extra syntax to declare external symbols. Our partial solution is to assume that all externs are code symbols (the usual case, for system calls etc.) and then make exceptions for a fixed number of data symbols which are listed individually in "asmout.p". This scheme will break as soon as somebody adds a reference to a global variable and forgets to update "asmout.p" accordingly. This is a manageable problem within Poplog development work, but makes _extern unusable as a general user feature (e.g. with POPC).
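The rule, and the way it fails, can be shown with a short Python sketch. The symbol names in DATA_EXTERNS are purely illustrative and are not the list held in "asmout.p"; the function name is likewise invented.

```python
# Hypothetical exception list: externs known to be data, not code.
DATA_EXTERNS = frozenset(["errno", "environ"])

def extern_type(symbol):
    """Import type assumed for an _extern symbol: code unless the symbol
    has been listed explicitly as data."""
    return "data" if symbol in DATA_EXTERNS else "code"
```

Any global variable missing from the list is imported as code, and the linker would then report it as an unsatisfied symbol.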

Poplog executables are linked to use shared libraries. This is the default for ld(1) anyway, but there is a further reason: the a.out file format for the 9000/700 is so complicated that we've made no attempt to produce a version of external load which works in the traditional way. The only version is based on the dynamic linking facilities described in shl_load(3X). This is currently enabled with the SHARED_LIBRARIES flag in "sysdefs.p", and trying to build a system without that flag won't work. Use of these facilities means that the executable has to be dynamically linked, because there's no static archive version of the required library (dld).

Register Usage

The following general registers are constrained by hardware:

    %r0         permanent zero: reads always return 0 and writes are
                ignored

    %r1         implicit destination operand of the addil instruction

    %r31        implicit return-address operand of the ble instruction

Registers %r1 and %r31 are available for use as temporaries when not required for their associated instructions.

Two further registers are reserved globally by software convention:

    %r27        global data pointer

    %r30        stack pointer

and the remainder have the following functions assigned by the procedure calling conventions:

    %r2         local return link

    %r3-%r18    callee-saves partition

    %r19-%r22   caller-saves partition

    %r23-%r26   subroutine arguments

    %r28-%r29   subroutine results

The main Poplog registers are allocated from the callee-saves partition, to prevent them being modified by external code. These include the usual user stack pointer, procedure base register and false register; we also dedicate one register to the special var block (as on the SPARC) and one to popint 0 (= 3). The remainder are divided between pop and non-pop register lvars, with a fairly arbitrary division of 6 pop to 5 non-pop.

Of the other registers, Poplog's procedure calling convention uses %r31 rather than %r2 as the return link (see below) and %r1 is used as the chain register. This is summarised in the following table:

    -----------------------------------------------------
    |  Reg.  |    Name    |            Usage            |
    |--------+------------+-----------------------------|
    |  %r0   |  0         |  Permanent 0                |
    |  %r1   |  %chain    |  Chain reg.                 |
    |  %r2   |  %rp       |  Local return link          |
    |  %r3   |  %npop4    |  Non-pop lvar               |
    |  %r4   |  %npop3    |  Non-pop lvar               |
    |  %r5   |  %npop2    |  Non-pop lvar               |
    |  %r6   |  %npop1    |  Non-pop lvar               |
    |  %r7   |  %npop0    |  Non-pop lvar               |
    |  %r8   |  %pop5     |  Pop lvar                   |
    |  %r9   |  %pop4     |  Pop lvar                   |
    |  %r10  |  %pop3     |  Pop lvar                   |
    |  %r11  |  %pop2     |  Pop lvar                   |
    |  %r12  |  %pop1     |  Pop lvar                   |
    |  %r13  |  %pop0     |  Pop lvar                   |
    |  %r14  |  %pzero    |  Permanent pop 0 (3)        |
    |  %r15  |  %false    |  Permanent false            |
    |  %r16  |  %svb      |  Special var block pointer  |
    |  %r17  |  %pb       |  Procedure base register    |
    |  %r18  |  %usp      |  User stack pointer         |
    |  %r19  |  %t4       |  Temporary                  |
    |  %r20  |  %t3       |  Temporary                  |
    |  %r21  |  %t2       |  Temporary                  |
    |  %r22  |  %t1       |  Temporary                  |
    |  %r23  |  %arg3     |  Subroutine argument        |
    |  %r24  |  %arg2     |  Subroutine argument        |
    |  %r25  |  %arg1     |  Subroutine argument        |
    |  %r26  |  %arg0     |  Subroutine argument        |
    |  %r27  |  %dp       |  Global data pointer        |
    |  %r28  |  %ret0     |  Subroutine result          |
    |  %r29  |  %ret1     |  Subroutine result          |
    |  %r30  |  %sp       |  Stack pointer              |
    |  %r31  |  %r31      |  Poplog return link         |
    -----------------------------------------------------

The more descriptive register names are either defined by the assembler or by Poplog in the "asm_macros.h" file.

Of the space registers: %sr0 is used by Poplog as a temporary when making inter-space calls; %sr4 and %sr5 are assumed always to hold the space identifiers of the process' code and data spaces, and are used in calls whenever the target space is known at compile time (see below).

Floating point registers are not generally used outside of "afloat.s", except for operands to the instruction xmpyu (unsigned multiply) which is implemented by the floating-point hardware.

Procedure Call and Return

A Poplog procedure may reside either in code space or in data space, so the most general procedure call form has to use an external branch instruction. For a procedure in a register (%arg0 say) this has the form:

    ldw         _PD_EXECUTE(%arg0), %t1 ; execute address
    ldsid       (%arg0), %t2            ; space ID for procedure
    mtsp        %t2, %sr0               ; copied to space register 0
    ble         (%sr0, %t1)
    nop

Knowing the procedure name (e.g. from a constant procedure declaration) doesn't necessarily help very much, because it still doesn't tell us which space the procedure's in:

    ldil        L'xc$setpop, %t1
    ldsid       (%t1), %t2
    mtsp        %t2, %sr0
    ble         R'xc$setpop(%sr0, %t1)
    nop

There are three cases where we can know the target space at code-generation time:

    (1) in system code, when the target is an assembly-code subroutine
        always defined in code space;

    (2) in system code, when the target is a procedure previously
        defined within the current file: a property defined in
        "asmout.p" records the space in which each procedure is
        generated;

    (3) in user code, when the procedure address is absolute: it could
        be a system procedure or a user procedure in a locked portion of
        the heap, but in either case the correct space can be determined
        by examining bits 0 and 1 of the address, relying on the
        implicit addressing conventions discussed earlier.

In these cases we still use an external branch, but with the appropriate space register specified explicitly, relying on the HP-UX convention of registers %sr4 and %sr5 corresponding to code and data spaces:

    ldil        L'x$L23, %t1            ; call lconstant procedure L23
    ble         R'x$L23(%sr4, %t1)      ; known to be in code space
    nop

In principle, for instances like this in system code, we could use a local branch and save one instruction, but we can't guarantee that the target will be in range and we can't afford to let the linker start adding code stubs which could corrupt Poplog's stack layout.

The ble instruction deposits its return address offset part in %r31, so we use this as the Poplog return address register. This is against the standard HP calling conventions which specify %rp (= %r2) as the return link. In general, a return must also be prepared to cross a space boundary as follows:

    ldsid       (%r31), %t1             ; space ID of return address
    mtsp        %t1, %sr0               ; copied to space register 0
    be          (%sr0, %r31)
    nop

Note that a ble also deposits the return address space identifier in %sr0, but the overhead of saving and restoring this in the procedure's entry and exit code is greater than recomputing it dynamically at each return point.

The return address offset is the only information communicated between caller and callee. In particular, the caller does not promise to pre-compute the target procedure address: the calling sequence is complicated enough by the need to use an external branch without adding this extra complexity. So any procedure which needs to know its own address must compute it on entry. For a system procedure in code space this is easy because the address is known and fixed:

    ldil        L'xc$setpop, %pb
    ldo         R'xc$setpop(%pb), %pb

For a relocatable procedure -- i.e. a user procedure or a copyable system procedure -- we use the technique of doing a very local branch-and-link to get a pointer to the executing code, and then adjust that backwards to point at the procedure start:

    bl          L$1, %pb                ; sets %pb to the value of L$1
    ldo         -_SIZE(%pb), %pb
L$1

The _SIZE offset is roughly the size of the procedure header, plus 8 for the first two instruction words. A minor wrinkle is caused by the fact (discussed above) that the return address deposited by bl is not a pure pointer, but includes the current privilege level encoded in the two low-order bits. To avoid the cost of an extra instruction to clear these two bits, we just assume that the privilege level is always 3 and adjust the size accordingly. This will break if Poplog code is ever run at anything other than the standard privilege level.
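The adjustment can be checked with a little arithmetic, modelled here in Python. This is a sketch under stated assumptions: the bl is taken to be the first instruction after the procedure header with the ldo in its delay slot, and the header size and addresses used in any call are hypothetical. The function names are invented.

```python
PRIVILEGE_LEVEL = 3     # assumed fixed, as described in the text
INSN_BYTES = 4

def link_from_bl(bl_address, privilege_level=PRIVILEGE_LEVEL):
    """Value left in %pb by bl: the address two instructions on (label
    L$1), with the privilege level in the two low bits."""
    return (bl_address + 2 * INSN_BYTES) | privilege_level

def size_constant(header_size):
    """_SIZE: header, plus 8 for the two instruction words, plus 3 to
    cancel the privilege bits without a separate masking instruction."""
    return header_size + 2 * INSN_BYTES + PRIVILEGE_LEVEL

def procedure_base(link, header_size):
    """Model of: ldo -_SIZE(%pb), %pb"""
    return link - size_constant(header_size)
```

With a link value made at level 3 the result is the exact procedure start; at any other level it is off by a few bytes, which is the breakage the text warns about.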

The Callstack

The system stack is located somewhere in the data space. The actual start address is obtainable from the macro USRSTACK defined in <sys/param.h> and copied into "sysdefs.p" as UNIX_USRSTACK. This changed between HP-UX 8 and 9, so beware. The size of the stack area can't be dynamically determined. An absolute upper limit is given by the macro USRSTACKMAX, but this is misleading: the kernel has a soft limit maxssiz which is typically less than this. The value of this limit is obtainable for a particular machine by running sam(1M) but may vary between systems (and can be changed by reconfiguring the kernel). The value in "sysdefs.p" for UNIX_USRSTACK_SIZE is the value of maxssiz for the machine we ported to.

A curiosity of the HP is that the stack grows up rather than down, and the stack pointer points to the next free word rather than the last word allocated. Assembly code which references the stack pointer should beware of this. A new stack frame can be allocated and the first word (typically the return address) stored with a single stwm instruction:

    stwm        %r31, _FRAME_SIZE(%sp)

Remaining words in the frame are then stored to fixed (negative) offsets. A matching ldwm will deallocate the frame and restore the first word.
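The frame allocation idiom can be modelled in Python as follows. This is a sketch only: it assumes that stwm with a positive displacement stores first and then advances the pointer, and that the matching ldwm with a negative displacement moves the pointer back before loading. The class name and the stack base address used in any call are hypothetical.

```python
class UpwardStack:
    """Toy model of an upward-growing stack whose pointer addresses the
    next free word."""

    def __init__(self, base):
        self.sp = base
        self.mem = {}

    def stwm(self, value, frame_size):
        """stwm value, frame_size(%sp): store at the old sp, then
        advance sp past the new frame."""
        self.mem[self.sp] = value
        self.sp += frame_size

    def stw(self, value, offset):
        """stw value, offset(%sp): fill in a frame word; offset is
        negative because the frame lies below sp."""
        self.mem[self.sp + offset] = value

    def ldwm(self, frame_size):
        """ldwm -frame_size(%sp), reg: drop the frame, then reload its
        first word."""
        self.sp -= frame_size
        return self.mem[self.sp]
```

After stwm the first word of the frame lies at sp minus the frame size, and the last word at sp-4.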

The stack frame layout described by the procedure calling conventions is not suitable for Poplog. It requires an amount of fixed space in each frame for use by linker-generated stub code and for other system-specific purposes. This might be coped with by defining an HPPA-specific stack frame layout in "symdefs.p" with knock-on effects elsewhere, but this doesn't seem worth it: the only real cost of using non-standard frames is that the debugger can't provide a backtrace. Interfacing to external routines is not a problem, because this is restricted to well-defined points in the hand-coded assembler files, and these ensure that a dummy stack frame satisfying the conventions is constructed before any external call is made.

So HPPA Poplog stack frames have the standard form (shared by all other systems except SPARC) except, of course, that the offsets are negative rather than positive:

    SP ---> |                              |
            |------------------------------|
    SP-4 -> |  Owner address               |
            |------------------------------|
            |  Non-pop stack lvars         |
            |------------------------------|
            |  Pop stack lvars             |
            |------------------------------|
            |  Saved non-pop dlocals       |
            |------------------------------|
            |  Saved pop dlocals           |
            |------------------------------|
            |  Saved pop registers         |
            |------------------------------|
            |  Saved non-pop registers     |
            |------------------------------|
            |  Return address into caller  |
            |------------------------------|

Most of the differences resulting from this are handled by declaring STACK_GROWS_UP in the "sysdefs.p" file which "inverts" the csword type, so that stack offsets are automatically negated and have an extra 4 subtracted to account for the pointer position. The only case in the HP-specific code not covered by this is in Get_opnd in "ass.p" which has to interpret the encoding for on-stack lvars differently from normal.
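The effect of the inversion on a single offset is shown by this one-line Python model. It assumes that in the conventional (downward-growing) layout the owner address sits at offset 0 from the stack pointer; the function name is invented and is not the name used in the Poplog sources.

```python
WORD_BYTES = 4

def inverted_offset(conventional_offset):
    """Map a byte offset in the conventional frame layout to the
    corresponding offset from SP when the stack grows up: negate, then
    subtract one word because SP addresses the next free word."""
    return -conventional_offset - WORD_BYTES
```

Conventional offset 0 maps to -4, which is where the owner address appears in the frame diagram.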

External Calls

All calls to external routines are effected using the millicode routine $$dyncall which is recommended for indirect calls. This expects the procedure label (or plabel) of the target routine in register %t1. The function of a plabel is not well documented, but it appears to denote a structure containing the executable address of the routine plus an optional linkage table pointer for shared library use. External routines referenced by user code (through external load) are automatically obtained as plabels via the dynamic linking mechanism. Those referenced in system code (through _extern) need to be denoted specially in the assembly code output from POPC using the plabel field selectors P', LP' and RP'. These are generated by the appropriate routines from "asmout.p" whenever a symbol is detected of type external code.

Use of $$dyncall simplifies the general external call interface (_call_external defined in "aextern.s") with regard to the placing of arguments into registers. An external function called directly may expect its arguments in general registers or floating-point registers depending on their type, and working out this allocation dynamically is not easy. Using $$dyncall we assume that the executable part of the plabel argument is actually a linker-generated stub which will relocate arguments as necessary, so we simply ignore the floating-point registers completely. The strategy used is to place all arguments into the stack frame at their proper locations (taking care to align double-word arguments correctly) and then, immediately prior to the call, to copy the first four stack words -- whether or not they make sense -- into the argument registers %arg0-%arg3. Similarly, any result from the call is assumed to be returned in the general register pair (%ret0,%ret1) and these values are copied into the Poplog result structure.

A stack frame allocated for an external call must satisfy the procedure calling conventions which require a 48-byte fixed area, with the first argument starting at offset -36; space has to be allocated for a minimum of 4 argument words regardless of how many arguments the procedure actually expects. The manual also suggests aligning stack frames on 64-byte boundaries, so we do this even though it doesn't appear to be strictly necessary.
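The frame arithmetic can be sketched in Python. Several assumptions are made and should be checked against the calling conventions manual: that the 48-byte fixed area already includes the four minimum argument words (the fourth lies at offset -48), that further argument words continue at successively lower offsets, and that the stack pointer is already 64-byte aligned, so that rounding the size preserves alignment. Alignment of double-word arguments is not modelled, and the function names are invented.

```python
FIXED_AREA = 48         # bytes, including the first four argument words
FIRST_ARG_OFFSET = -36  # offset of argument word 0 from the new SP
MIN_ARG_WORDS = 4
FRAME_ALIGN = 64
WORD_BYTES = 4

def arg_offset(index):
    """Offset from SP of argument word `index` (0-based)."""
    return FIRST_ARG_OFFSET - WORD_BYTES * index

def frame_size(arg_words):
    """Bytes to allocate for a call passing `arg_words` argument words,
    rounded up to the suggested 64-byte boundary."""
    extra = max(arg_words, MIN_ARG_WORDS) - MIN_ARG_WORDS
    raw = FIXED_AREA + WORD_BYTES * extra
    return (raw + FRAME_ALIGN - 1) // FRAME_ALIGN * FRAME_ALIGN
```

On these assumptions a call with up to eight argument words fits in a single 64-byte frame, and a ninth word pushes the frame to 128 bytes.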

Signal Handling

The handling of non-trappable error signals -- segmentation violation and the like -- is subtly different on the Precision to that on other Poplog systems. The normal strategy is that the signal handler (_pop_errsig_handler defined in "c_core.c") updates the instruction-pointer field of the signal context to return to the assembly code routine __pop_errsig which in turn transfers control to the Poplog error handler. This was tried on the Precision: it was found necessary to assign also to the next-instruction pointer field of the context (as on the SPARC), and to clear the nullify bit in the saved processor status word to ensure that the next instruction was genuinely executed. The result worked most of the time, but not when the interrupt arrived during a system call -- e.g. a QUIT signal generated from the keyboard during a read -- when the call would appear to be restarted rather than aborted. This is presumably because the interrupted system call was returning to a previous state in which the changes made to the signal context were lost.

The solution is to have _pop_errsig_handler call __pop_errsig directly to abort the signal handling as well as everything else. In order to prevent Poplog's stack unwinding mechanism being confused by non-pop stack frames allocated by the signal handler, __pop_errsig takes an argument which is the value of the stack pointer extracted from the signal context: its first action is to reset the stack pointer to this value, discarding the signal handler's stack frames.