@node Installation
@chapter Configuring and installing @lightning{}

The first thing to do to use @lightning{} is to configure the program, picking the set of macros to be used on the host architecture. This configuration is performed automatically by the @file{configure} shell script; to run it, merely type:

@example
./configure
@end example

@lightning{} supports cross-compiling in that you can choose a different set of macros from the one needed on the computer that you are compiling @lightning{} on. For example,

@example
./configure --host=sparc-sun-linux
@end example

@noindent
will select the SPARC set of runtime assemblers. You can also rely on @file{configure}'s ability to make reasonable assumptions about the vendor and operating system, and simply type one of:

@example
./configure --host=i386
./configure --host=ppc
./configure --host=sparc
@end example

Another option that @file{configure} accepts is @code{--enable-assertions}, which enables several consistency checks in the run-time assemblers. These checks are not usually needed, so you can safely leave them off; remember also that they tend to slow down your code generator.

After you've configured @lightning{}, you don't have to compile it, because it is nothing more than a set of include files. If you want to compile the examples, run @file{make} as usual. The only remaining step is:

@example
make install
@end example

This ends the process of installing @lightning{}.

@node The instruction set
@chapter @lightning{}'s instruction set

@lightning{}'s instruction set was designed by deriving instructions that closely match those of most existing RISC architectures, or that can be easily synthesized if absent.

Each instruction is composed of:
@itemize @bullet
@item
an operation, like @code{sub} or @code{mul}

@item
sometimes, a register/immediate flag (@code{r} or @code{i})

@item
a type identifier or, occasionally, two
@end itemize

The second and third fields are separated by an underscore; thus, examples of legal mnemonics are @code{addr_i} (integer add, with three register operands) and @code{muli_l} (long integer multiply, with two register operands and an immediate operand). Each instruction takes two or three operands; in most cases, one of them can be an immediate value instead of a register.

@lightning{} supports a full range of integer types: operands can be 1, 2 or 4 bytes long (64-bit architectures might support 8-byte operands), either signed or unsigned. The types, including the two floating-point ones, are listed in the following table together with the C types they represent:

@example
c     @r{signed char}
uc    @r{unsigned char}
s     @r{short}
us    @r{unsigned short}
i     @r{int}
ui    @r{unsigned int}
l     @r{long}
ul    @r{unsigned long}
f     @r{float}
d     @r{double}
p     @r{void *}
@end example

Some of these types may not be distinct: for example, @code{l} is equivalent to @code{i} on 32-bit machines, and @code{p} is substantially equivalent to @code{ul}.

There are at least seven integer registers, of which six are general-purpose, while the last is used to contain the stack pointer (@code{SP}). The stack pointer can be used to allocate and access local variables on the stack (which is assumed to grow downwards in memory on all architectures).

Of the general-purpose registers, at least three are guaranteed to be preserved across function calls (@code{V0}, @code{V1} and @code{V2}) and at least three are not (@code{R0}, @code{R1} and @code{R2}). Six registers are not very many, but this restriction was forced by the need to target CISC architectures which, like the x86, are short of registers; backends can, however, specify the actual number of available caller- and callee-save registers.

In addition, there is a special @code{RET} register which contains the return value of the current function (@emph{not} the return value of callees; use the @code{retval} instruction for that). You should always remember, however, that writing this register could overwrite either a general-purpose register or an incoming parameter, depending on the architecture.

There are at least six floating-point registers, named @code{FPR0} to @code{FPR5}. These are separate from the integer registers on all the supported architectures; on Intel architectures, the register stack is mapped to a flat register file.

The complete instruction set follows. As you can see, most non-memory operations only take integers, long integers (either signed or unsigned), pointers and floating-point values as operands, never bytes or shorts; this keeps the instruction set small, and reflects the fact that most architectures only provide word and long word operations on registers. There are instructions that allow operands to be extended to fit a larger data type, both in a signed and in an unsigned way.

@table @b
@item Binary ALU operations
These accept three operands; the last one can be an immediate value for integer operands, or a register for all operand types. @code{addx} operations must directly follow @code{addc}, and @code{subx} must follow @code{subc}; otherwise, results are undefined.
@example
addr    i  ui l  ul p  f  d  O1 = O2 + O3
addi    i  ui l  ul p        O1 = O2 + O3
addxr   i  ui l  ul          O1 = O2 + (O3 + carry)
addxi   i  ui l  ul          O1 = O2 + (O3 + carry)
addcr   i  ui l  ul          O1 = O2 + O3, set carry
addci   i  ui l  ul          O1 = O2 + O3, set carry
subr    i  ui l  ul p  f  d  O1 = O2 - O3
subi    i  ui l  ul p        O1 = O2 - O3
subxr   i  ui l  ul          O1 = O2 - (O3 + carry)
subxi   i  ui l  ul          O1 = O2 - (O3 + carry)
subcr   i  ui l  ul          O1 = O2 - O3, set carry
subci   i  ui l  ul          O1 = O2 - O3, set carry
rsbr    i  ui l  ul p  f  d  O1 = O3 - O2
rsbi    i  ui l  ul p        O1 = O3 - O2
mulr    i  ui l  ul    f  d  O1 = O2 * O3
muli    i  ui l  ul          O1 = O2 * O3
hmulr   i  ui l  ul          O1 = @r{high bits of} O2 * O3
hmuli   i  ui l  ul          O1 = @r{high bits of} O2 * O3
divr    i  ui l  ul    f  d  O1 = O2 / O3
divi    i  ui l  ul          O1 = O2 / O3
modr    i  ui l  ul          O1 = O2 % O3
modi    i  ui l  ul          O1 = O2 % O3
andr    i  ui l  ul          O1 = O2 & O3
andi    i  ui l  ul          O1 = O2 & O3
orr     i  ui l  ul          O1 = O2 | O3
ori     i  ui l  ul          O1 = O2 | O3
xorr    i  ui l  ul          O1 = O2 ^ O3
xori    i  ui l  ul          O1 = O2 ^ O3
lshr    i  ui l  ul          O1 = O2 << O3
lshi    i  ui l  ul          O1 = O2 << O3
rshr    i  ui l  ul          O1 = O2 >> O3@footnote{The sign bit is propagated for signed types.}
rshi    i  ui l  ul          O1 = O2 >> O3@footnote{The sign bit is propagated for signed types.}
@end example

@item Unary ALU operations
These accept two operands, both of which must be registers.

@example
negr    i     l        f  d  O1 = -O2
notr    i  ui l  ul          O1 = ~O2
@end example

@item Compare instructions
These accept three operands; again, the last can be an immediate value for integer data types. The last two operands are compared, and the first operand is set to either 1 or 0, according to whether the given condition was met or not. The conditions given below are for the standard behavior of C, where the ``unordered'' comparison result is mapped to false.
@example
ltr     i  ui l  ul p  f  d  O1 = (O2 < O3)
lti     i  ui l  ul p        O1 = (O2 < O3)
ler     i  ui l  ul p  f  d  O1 = (O2 <= O3)
lei     i  ui l  ul p        O1 = (O2 <= O3)
gtr     i  ui l  ul p  f  d  O1 = (O2 > O3)
gti     i  ui l  ul p        O1 = (O2 > O3)
ger     i  ui l  ul p  f  d  O1 = (O2 >= O3)
gei     i  ui l  ul p        O1 = (O2 >= O3)
eqr     i  ui l  ul p  f  d  O1 = (O2 == O3)
eqi     i  ui l  ul p        O1 = (O2 == O3)
ner     i  ui l  ul p  f  d  O1 = (O2 != O3)
nei     i  ui l  ul p        O1 = (O2 != O3)
unltr                  f  d  O1 = !(O2 >= O3)
unler                  f  d  O1 = !(O2 > O3)
ungtr                  f  d  O1 = !(O2 <= O3)
unger                  f  d  O1 = !(O2 < O3)
uneqr                  f  d  O1 = !(O2 < O3) && !(O2 > O3)
ltgtr                  f  d  O1 = (O2 < O3) || (O2 > O3)
ordr                   f  d  O1 = (O2 == O2) && (O3 == O3)
unordr                 f  d  O1 = (O2 != O2) || (O3 != O3)
@end example

@item Transfer operations
These accept two operands; for @code{extr} and the rounding instructions both of them must be registers, while @code{movi} accepts an immediate value as the second operand.

Unlike @code{movr} and @code{movi}, the other instructions are applied between operands of different data types, and they need @strong{two} data type specifications. You can use @code{extr} to convert between integer data types, in which case the first must be smaller in size than the second; for example @code{extr_c_ui} is correct while @code{extr_ul_us} is not.

You can also use @code{extr} to convert an integer to a floating point value: the only available possibilities are @code{extr_i_f} and @code{extr_i_d}. The other instructions convert a floating point value to an integer, so the possible suffixes are @code{_f_i} and @code{_d_i}.
@example
movr                i  ui l  ul p  f  d  O1 = O2
movi                i  ui l  ul p  f  d  O1 = O2
extr    c  uc s  us i  ui l  ul    f  d  O1 = O2
roundr              i              f  d  O1 = round(O2)
truncr              i              f  d  O1 = trunc(O2)
floorr              i              f  d  O1 = floor(O2)
ceilr               i              f  d  O1 = ceil(O2)
@end example

Note that the order of the arguments is @emph{destination first, source second}, as for all other @lightning{} instructions, but the order of the types is always reversed with respect to that of the arguments: the @emph{source} type comes @emph{first} and the @emph{destination} type @emph{second} (for integer extensions, this means the shorter type first and the longer one second). This happens for historical reasons.

@item Network extensions
These accept two operands, both of which must be registers; these two instructions actually perform the same task, yet they are assigned to two mnemonics for the sake of convenience and completeness. As usual, the first operand is the destination and the second is the source.

@example
hton    us ui  @r{Host-to-network (big endian) order}
ntoh    us ui  @r{Network-to-host order}
@end example

@item Load operations
@code{ld} accepts two operands while @code{ldx} accepts three; in both cases, the last can be either a register or an immediate value. Values are extended (with or without sign, according to the data type specification) to fit a whole register.

@example
ldr     c  uc s  us i  ui l  ul p  f  d  O1 = *O2
ldi     c  uc s  us i  ui l  ul p  f  d  O1 = *O2
ldxr    c  uc s  us i  ui l  ul p  f  d  O1 = *(O2+O3)
ldxi    c  uc s  us i  ui l  ul p  f  d  O1 = *(O2+O3)
@end example

@item Store operations
@code{st} accepts two operands while @code{stx} accepts three; in both cases, the first can be either a register or an immediate value. Only as many bytes as the data type specifies are written to memory, so wider register contents are truncated.

@example
str     c  uc s  us i  ui l  ul p  f  d  *O1 = O2
sti     c  uc s  us i  ui l  ul p  f  d  *O1 = O2
stxr    c  uc s  us i  ui l  ul p  f  d  *(O1+O2) = O3
stxi    c  uc s  us i  ui l  ul p  f  d  *(O1+O2) = O3
@end example

@item Stack management
These accept a single register parameter. These operations are not guaranteed to be efficient on all architectures.
@example
pushr   i  ui l  ul p  @r{push }O1@r{ on the stack}
popr    i  ui l  ul p  @r{pop }O1@r{ off the stack}
@end example

@item Argument management
These are:

@example
prepare             i              f  d
pusharg c  uc s  us i  ui l  ul p  f  d
getarg  c  uc s  us i  ui l  ul p  f  d
arg     c  uc s  us i  ui l  ul p  f  d
retval  c  uc s  us i  ui l  ul p
@end example

Of these, @code{prepare}, @code{pusharg} and @code{retval} are used by the caller, while @code{arg} and @code{getarg} are used by the callee. A code snippet that wants to call another procedure and has to pass arguments must, in order: use the @code{prepare} instruction, giving the number of arguments to be passed to the procedure (once for each data type); use @code{pusharg} to push the arguments @strong{in reverse order}; and use @code{calli} or @code{finish} (explained below) to perform the actual call.

@code{arg} is different from the other instructions in that it does not actually generate any code: instead, it is a function which returns a value to be passed to @code{getarg}.@footnote{``Return a value'' means that @lightning{} macros that compile these instructions return a value when expanded.} You should call @code{arg} as soon as possible, before any function call or, more easily, right after the @code{prolog} or @code{leaf} instructions (which are treated later).

@code{getarg} accepts a register argument and a value returned by @code{arg}, and will move that argument to the register, extending it (with or without sign, according to the data type specification) to fit a whole register. These instructions are more intimately related to the usage of the @lightning{} instruction set in code that generates other code, so they will be treated more specifically in @ref{GNU lightning macros, , Generating code at run-time}.

Finally, @code{retval} takes a register argument and copies into it the return value of the most recently called function.
A function should put its own return value in the @code{RET} register before returning. @xref{Fibonacci, the Fibonacci numbers}, for an example.

You should observe a few rules when using these macros. First of all, it is not allowed to call functions with more than six arguments; this was done to simplify and speed up the implementation on architectures that use registers for parameter passing.

You should not nest calls to @code{prepare}, nor call zero-argument functions (which do not need a call to @code{prepare}) inside a @code{prepare/calli} or @code{prepare/finish} block. Doing this might corrupt already pushed arguments.

You @strong{cannot} pass parameters between subroutines using the six general-purpose registers. Doing so might happen to work on particular architectures, but it is not portable.

On the other hand, it is possible to assume that the caller-saved registers (@code{R0} through @code{R2}), which are normally not preserved across calls, are not clobbered by another dynamically generated function which does not use them as operands in its code and which does not return a value.

@item Branch instructions
Like @code{arg}, these also return a value which, in this case, is to be used to compile forward branches as explained in @ref{Fibonacci, , Fibonacci numbers}. They accept a pointer to the destination of the branch and two operands to be compared; of these, the last can be either a register or an immediate.
They are:

@example
bltr    i  ui l  ul p  f  d  @r{if }(O2 < O3)@r{ goto }O1
blti    i  ui l  ul p        @r{if }(O2 < O3)@r{ goto }O1
bler    i  ui l  ul p  f  d  @r{if }(O2 <= O3)@r{ goto }O1
blei    i  ui l  ul p        @r{if }(O2 <= O3)@r{ goto }O1
bgtr    i  ui l  ul p  f  d  @r{if }(O2 > O3)@r{ goto }O1
bgti    i  ui l  ul p        @r{if }(O2 > O3)@r{ goto }O1
bger    i  ui l  ul p  f  d  @r{if }(O2 >= O3)@r{ goto }O1
bgei    i  ui l  ul p        @r{if }(O2 >= O3)@r{ goto }O1
beqr    i  ui l  ul p  f  d  @r{if }(O2 == O3)@r{ goto }O1
beqi    i  ui l  ul p        @r{if }(O2 == O3)@r{ goto }O1
bner    i  ui l  ul p  f  d  @r{if }(O2 != O3)@r{ goto }O1
bnei    i  ui l  ul p        @r{if }(O2 != O3)@r{ goto }O1
bunltr                 f  d  @r{if }!(O2 >= O3)@r{ goto }O1
bunler                 f  d  @r{if }!(O2 > O3)@r{ goto }O1
bungtr                 f  d  @r{if }!(O2 <= O3)@r{ goto }O1
bunger                 f  d  @r{if }!(O2 < O3)@r{ goto }O1
buneqr                 f  d  @r{if }!(O2 < O3) && !(O2 > O3)@r{ goto }O1
bltgtr                 f  d  @r{if }(O2 < O3) || (O2 > O3)@r{ goto }O1
bordr                  f  d  @r{if }(O2 == O2) && (O3 == O3)@r{ goto }O1
bunordr                f  d  @r{if }(O2 != O2) || (O3 != O3)@r{ goto }O1
bmsr    i  ui l  ul          @r{if }O2 & O3@r{ goto }O1
bmsi    i  ui l  ul          @r{if }O2 & O3@r{ goto }O1
bmcr    i  ui l  ul          @r{if }!(O2 & O3)@r{ goto }O1
bmci    i  ui l  ul          @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
boaddr  i  ui l  ul          O2 += O3@r{, goto }O1@r{ on overflow}
boaddi  i  ui l  ul          O2 += O3@r{, goto }O1@r{ on overflow}
bosubr  i  ui l  ul          O2 -= O3@r{, goto }O1@r{ on overflow}
bosubi  i  ui l  ul          O2 -= O3@r{, goto }O1@r{ on overflow}
@end example

@item Jump and return operations
These accept one argument, except @code{ret}, which has none. The difference between @code{finish} and @code{calli} is that the latter does not remove pushed parameters (if any) from the stack, while the former must @strong{always} follow a @code{prepare} instruction. Results are undefined when using function calls in a leaf function.
@example
calli      (not specified)            @r{function call to O1}
callr      (not specified)            @r{function call to a register}
finish     (not specified)            @r{function call to O1}
finishr    (not specified)            @r{function call to a register}
jmpi/jmpr  (not specified)            @r{unconditional jump to O1}
prolog     (not specified)            @r{function prolog for O1 args}
leaf       (not specified)            @r{the same for leaf functions}
ret        (not specified)            @r{return from subroutine}
retval     c uc s us i ui l ul p f d  @r{move return value}
                                      @r{to register}
@end example

Like the branch instructions, @code{jmpi} also returns a value which is to be used to compile forward branches. @xref{Fibonacci, , Fibonacci numbers}.
@end table

As an appetizer, here is a small function that adds 1 to the input parameter (an @code{int}). I'm using an assembly-like syntax here which is a bit different from the one used when writing real subroutines with @lightning{}; the real syntax is introduced in @ref{GNU lightning macros, , Generating code at run-time}.

@example
incr:
     leaf      1
in = arg_i                 @rem{! We have an integer argument}
     getarg_i  R0, in      @rem{! Move it to R0}
     addi_i    RET, R0, 1  @rem{! Add 1\, put result in return value}
     ret                   @rem{! And return the result}
@end example

And here is another function which uses the @code{printf} function from the standard C library to write a number in hexadecimal notation:

@example
printhex:
     prolog    1
in = arg_i                 @rem{! Same as above}
     getarg_i  R0, in
     prepare   2           @rem{! Begin call sequence for printf}
     pusharg_i R0          @rem{! Push second argument}
     pusharg_p "%x"        @rem{! Push format string}
     finish    printf      @rem{! Call printf}
     ret                   @rem{! Return to caller}
@end example

@node GNU lightning macros
@chapter Generating code at run-time

To use @lightning{}, you should include the @file{lightning.h} file that is put in your include directory by the @samp{make install} command. That include file defines about four hundred public macros (plus others that are private to @lightning{}), one for each opcode listed above.
Each of the instructions above translates to a macro. All you have to do is prepend @code{jit_} (lowercase) to opcode names and @code{JIT_} (uppercase) to register names. Of course, parameters are to be put between parentheses, just like with every other @sc{cpp} macro.

This small tutorial presents four examples:

@iftex
@itemize @bullet
@item
The @code{incr} function found in @ref{The instruction set, , @lightning{}'s instruction set}

@item
A simple function call to @code{printf}

@item
An RPN calculator

@item
Fibonacci numbers
@end itemize
@end iftex
@ifnottex
@menu
* incr::            A function which increments a number by one
* printf::          A simple function call to printf
* RPN calculator::  A more complex example, an RPN calculator
* Fibonacci::       Calculating Fibonacci numbers
@end menu
@end ifnottex

@node incr
@section A function which increments a number by one

Let's see how to create and use the sample @code{incr} function created in @ref{The instruction set, , @lightning{}'s instruction set}:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi  incr = (pifi) (jit_set_ip(codeBuffer).iptr);
  int   in;

  jit_leaf(1);                     @rem{/* @t{     leaf     1           } */}
  in = jit_arg_i();                @rem{/* @t{in = arg_i                } */}
  jit_getarg_i(JIT_R0, in);        @rem{/* @t{     getarg_i R0\, in     } */}
  jit_addi_i(JIT_RET, JIT_R0, 1);  @rem{/* @t{     addi_i   RET\, R0\, 1} */}
  jit_ret();                       @rem{/* @t{     ret                  } */}
  jit_flush_code(codeBuffer, jit_get_ip().ptr);

  @rem{/* call the generated code\, passing 5 as an argument */}
  printf("%d + 1 = %d\n", 5, incr(5));
  return 0;
@}
@end example

Let's examine the code line by line (well, almost@dots{}):

@table @t
@item #include "lightning.h"
You already know about this. It defines all of @lightning{}'s macros.

@item static jit_insn codeBuffer[1024];
You might wonder what @code{jit_insn} is. It is just a type that is defined by @lightning{}.
Its exact definition depends on the architecture; in general, defining an array of 1024 @code{jit_insn}s allows one to write 100 to 400 @lightning{} instructions (depending on the architecture and exact instructions).

@item typedef int (*pifi)(int);
Just a handy typedef for a pointer to a function that takes an @code{int} and returns another.

@item pifi incr = (pifi) (jit_set_ip(codeBuffer).iptr);
This is the first @lightning{} macro we encounter that does not map to an instruction. It is @code{jit_set_ip}, which takes a pointer to an area of memory where compiled code will be put and returns the same value, cast to a @code{union} type whose members are pointers to functions returning different C types. This union is called @code{jit_code} and is defined as follows:

@example
typedef union jit_code @{
  char           *ptr;
  void           (*vptr)();
  char           (*cptr)();
  unsigned char  (*ucptr)();
  short          (*sptr)();
  unsigned short (*usptr)();
  int            (*iptr)();
  unsigned int   (*uiptr)();
  long           (*lptr)();
  unsigned long  (*ulptr)();
  void *         (*pptr)();
  float          (*fptr)();
  double         (*dptr)();
@} jit_code;
@end example

Any of the function-pointer members could have been used, since the result is immediately cast to type @code{pifi} but, for the sake of clarity, the program uses @code{iptr}, a pointer to a function with no prototype and returning an @code{int}.

Analogous to @code{jit_set_ip} is @code{jit_get_ip}, which does not modify the instruction pointer: it is nothing more than a cast of the current @sc{ip} to @code{jit_code}.

@item int in;
A footnote in @ref{The instruction set, , @lightning{}'s instruction set}, under the description of @code{arg}, says that macros implementing @code{arg} return a value: we'll be using this variable to store the result of @code{arg}.

@item jit_leaf(1);
OK, so we start generating code for our beloved function@dots{} it will accept one argument and won't call any other function.

@item in = jit_arg_i();
@itemx jit_getarg_i(JIT_R0, in);
We retrieve the first (and only) argument, an integer, and store it into the general-purpose register @code{R0}.

@item jit_addi_i(JIT_RET, JIT_R0, 1);
We add one to the content of the register and store the result in the return value.

@item jit_ret();
This instruction generates a standard function epilog that returns the contents of the @code{RET} register.

@item jit_flush_code(codeBuffer, jit_get_ip().ptr);
This instruction is very important. It flushes the generated code area out of the processor's instruction cache, so that the processor does not execute stale data that it happens to find there. The @code{jit_flush_code} function accepts the start and the end address of the area to flush; we use @code{jit_get_ip} to find out the latter.

@item printf("%d + 1 = %d\n", 5, incr(5));
Calling our function is this simple: it is indistinguishable from a normal C function call, the only difference being that @code{incr} is a variable.
@end table

@lightning{} abstracts two phases of dynamic code generation: selecting the machine instructions that implement the standard @lightning{} representation, and emitting binary code for these instructions. The client program has the responsibility of describing the code to be generated using the standard @lightning{} instruction set.

Let's examine the code generated for @code{incr} on the SPARC and x86 architectures (on the right is the code that an assembly-language programmer would write):

@table @b
@item SPARC
@example
save  %sp, -96, %sp
mov   %i0, %l0              retl
add   %l0, 1, %i0           add   %o0, 1, %o0
ret
restore
@end example

In this case, @lightning{} introduces overhead to create a register window (not knowing that the procedure is a leaf procedure) and to move the argument to the general purpose register @code{R0} (which maps to @code{%l0} on the SPARC).
The former overhead could be avoided by teaching @lightning{} about leaf procedures (@pxref{Future}); the latter could instead be avoided by rewriting the getarg instruction as @code{jit_getarg_i(JIT_RET, in)}, which was not done in this example.

@item x86
@example
pushl %ebp
movl  %esp, %ebp
pushl %ebx
pushl %esi
pushl %edi
movl  8(%ebp), %eax         movl 4(%esp), %eax
addl  $1, %eax              incl %eax
popl  %edi
popl  %esi
popl  %ebx
popl  %ebp
ret                         ret
@end example

In this case, the main overhead is due to the function's prolog and epilog, which are nine instructions long on the x86; a hand-written routine would not save unused callee-preserved registers on the stack. Note, however, that this is not a problem in more complicated uses, because a more complex procedure would probably use the @code{V0} through @code{V2} registers (@code{%ebx}, @code{%esi}, @code{%edi}); in that case, a hand-written routine would have included the prolog too. Also, a ten-byte prolog would probably be a small overhead in a more complex function.
@end table

In such a simple case, the macros that make up the back-end compile reasonably efficient code, with the notable exception of prolog/epilog code.

@node printf
@section A simple function call to @code{printf}

Again, here is the code for the example:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}

int main()
@{
  pvfi          myFunction;             @rem{/* ptr to generated code */}
  char          *start, *end;           @rem{/* a couple of labels */}
  int           in;                     @rem{/* to get the argument */}

  myFunction = (pvfi) (jit_set_ip(codeBuffer).vptr);
  start = jit_get_ip().ptr;
  jit_prolog(1);
  in = jit_arg_i();
  jit_movi_p(JIT_R0, "generated %d bytes\n");
  jit_getarg_i(JIT_R1, in);
  jit_prepare(2);
    jit_pusharg_i(JIT_R1);              @rem{/* push in reverse order */}
    jit_pusharg_p(JIT_R0);
  jit_finish(printf);
  jit_ret();
  end = jit_get_ip().ptr;

  @rem{/* call the generated code\, passing its size as argument */}
  jit_flush_code(start, end);
  myFunction(end - start);
  return 0;
@}
@end example

The function shows how many bytes were generated. Most of the code is not very interesting, as it closely resembles the program presented in @ref{incr, , A function which increments a number by one}. For this reason, we're going to concentrate on just a few statements.

@table @t
@item start = jit_get_ip().ptr;
@itemx @r{@dots{}}
@itemx end = jit_get_ip().ptr;
These two statements call the @code{jit_get_ip} macro, which was mentioned in @ref{incr, , A function which increments a number by one} too. In this case we use the only field of @code{jit_code} that is not a function pointer: @code{ptr}, which is a simple @code{char *}.

@item jit_movi_p(JIT_R0, "generated %d bytes\n");
Note the use of the @samp{p} type specifier, which automatically casts the second parameter to an @code{unsigned long} to make the code clearer and less cluttered by typecasts.

@item jit_prepare(2);
@itemx jit_pusharg_i(JIT_R1);
@itemx jit_pusharg_p(JIT_R0);
@itemx jit_finish(printf);
Once the arguments to @code{printf} have been put in general-purpose registers, we can start a prepare/pusharg/finish sequence that moves the arguments to either the stack or registers, then calls @code{printf}, then cleans up the stack. Note how @lightning{} abstracts the differences between different architectures and ABIs: the client program does not know how parameter passing works on the host architecture.
@end table

@node RPN calculator
@section A more complex example, an RPN calculator

We create a small stack-based RPN calculator which applies a series of operators to a given parameter and to other numeric operands. Unlike previous examples, the code generator is fully parameterized and is able to compile different formulas to different functions. Here is the code for the expression compiler; a sample usage will follow.

@example
#include <stdio.h>
#include <stdlib.h>
#include "lightning.h"

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

pifi compile_rpn(char *expr)
@{
  pifi fn;
  int in;
  fn = (pifi) (jit_get_ip().iptr);
  jit_leaf(1);
  in = jit_arg_i();
  jit_getarg_i(JIT_R0, in);

  while (*expr) @{
    char buf[32];
    int n;
    if (sscanf(expr, "%31[0-9]%n", buf, &n) == 1) @{
      expr += n - 1;
      jit_pushr_i(JIT_R0);
      jit_movi_i(JIT_R0, atoi(buf));
    @} else if (*expr == '+') @{
      jit_popr_i(JIT_R1);
      jit_addr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '-') @{
      jit_popr_i(JIT_R1);
      jit_subr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '*') @{
      jit_popr_i(JIT_R1);
      jit_mulr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '/') @{
      jit_popr_i(JIT_R1);
      jit_divr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else @{
      fprintf(stderr, "cannot compile: %s\n", expr);
      abort();
    @}
    ++expr;
  @}
  jit_movr_i(JIT_RET, JIT_R0);
  jit_ret();
  return fn;
@}
@end example

The principle on which the calculator is based is easy: the stack top is held in R0, while the remaining items of the stack are held on the hardware stack.
Compiling an operand pushes the old stack top onto the stack and moves the operand into R0; compiling an operator pops the other (left-hand) operand off the stack into R1, and compiles the operation so that the result goes into R0, thus becoming the new stack top.

Try to locate a call to @code{jit_set_ip} in the source code. You will not find one; this means that the client has to set the instruction pointer manually. This technique has one advantage and one drawback. The advantage is that the client can set the instruction pointer once and then generate code for multiple functions, one after another, without having to pass a different instruction pointer each time; see @ref{Reentrancy, , Re-entrant usage of @lightning{}} for the drawback.

Source code for the client (which lies in the same source file) follows:

@example
static jit_insn codeBuffer[1024];

int main()
@{
  pifi c2f, f2c;
  int i;

  jit_set_ip(codeBuffer);
  c2f = compile_rpn("9*5/32+");
  f2c = compile_rpn("32-5*9/");
  jit_flush_code(codeBuffer, jit_get_ip().ptr);

  printf("\nC:");
  for (i = 0; i <= 100; i += 10) printf("%3d ", i);
  printf("\nF:");
  for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
  printf("\n");

  printf("\nF:");
  for (i = 32; i <= 212; i += 10) printf("%3d ", i);
  printf("\nC:");
  for (i = 32; i <= 212; i += 10) printf("%3d ", f2c(i));
  printf("\n");
  return 0;
@}
@end example

The client displays a conversion table between Celsius and Fahrenheit degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The formulas are @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9}, respectively.

Providing the formula as an argument to @code{compile_rpn} effectively parameterizes code generation, making it possible to use the same code to compile different functions; this is what makes dynamic code generation so powerful.

The @file{rpn.c} file in the @lightning{} distribution includes a more complete (and more complex) implementation of @code{compile_rpn}, which does constant folding, allows the argument to the functions to be used more than once, and is able to assemble instructions with an immediate parameter.

@node Fibonacci
@section Fibonacci numbers

The code in this section calculates a variant of the Fibonacci sequence. While the traditional Fibonacci sequence is modeled by the recurrence relation:

@display
f(0) = f(1) = 1
f(n) = f(n-1) + f(n-2)
@end display

@noindent
the functions in this section calculate the following sequence, which is more interesting as a benchmark@footnote{That's because, as is easily seen, the sequence represents the number of activations of the @code{nfibs} procedure that are needed to compute its value through recursion.}:

@display
nfibs(0) = nfibs(1) = 1
nfibs(n) = nfibs(n-1) + nfibs(n-2) + 1
@end display

The purpose of this example is to introduce branches. There are two kinds of branches: backward branches and forward branches. We'll present the calculation in both recursive and iterative form; the former only uses forward branches, while the latter uses both.

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi      nfibs = (pifi) (jit_set_ip(codeBuffer).iptr);
  int       in;                 @rem{/* offset of the argument */}
  jit_insn  *ref;               @rem{/* to patch the forward reference */}

        jit_prolog   (1);
  in =  jit_arg_ui   ();
        jit_getarg_ui(JIT_V0, in);              @rem{/* V0 = n */}
  ref = jit_blti_ui  (jit_forward(), JIT_V0, 2);
        jit_subi_ui  (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
        jit_subi_ui  (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
        jit_prepare(1);
          jit_pusharg_ui(JIT_V1);
        jit_finish(nfibs);
        jit_retval(JIT_V1);                     @rem{/* V1 = nfibs(n-1) */}
        jit_prepare(1);
          jit_pusharg_ui(JIT_V2);
        jit_finish(nfibs);
        jit_retval(JIT_V2);                     @rem{/* V2 = nfibs(n-2) */}
        jit_addi_ui(JIT_V1,  JIT_V1,  1);
        jit_addr_ui(JIT_RET, JIT_V1, JIT_V2);   @rem{/* RET = V1 + V2 + 1 */}
        jit_ret();

  jit_patch(ref);                               @rem{/* patch jump */}
        jit_movi_i(JIT_RET, 1);                 @rem{/* RET = 1 */}
        jit_ret();

  @rem{/* call the generated code\, passing 32 as an argument */}
  jit_flush_code(codeBuffer, jit_get_ip().ptr);
  printf("nfibs(%d) = %d\n", 32, nfibs(32));
  return 0;
@}
@end example

As said above, this is the first example of dynamically compiling branches. Branch instructions have three operands: two contain the values to be compared, while the first is a @dfn{label}; @lightning{} labels are represented as @code{jit_insn *} values. Unlike other instructions (apart from @code{arg}, which is actually a directive rather than an instruction), branch instructions also return a value which, as we see in the example above, can be used to compile forward references.

Compiling a forward reference is a two-step operation. First, a branch is compiled with a dummy label, since the actual destination of the jump is not yet known; the dummy label is returned by the @code{jit_forward()} macro. The value returned by the branch instruction is saved to be used later.
Then, when the code generator reaches the destination of the jump, it calls another macro, @code{jit_patch()}, passing it the saved value. @code{jit_patch} must be called once for @strong{every} forward branch whose destination is the instruction that follows the @code{jit_patch} (in this case, a @code{movi_i} instruction).

Now, here is the iterative version:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi      nfibs = (pifi) (jit_set_ip(codeBuffer).iptr);
  int       in;                 @rem{/* offset of the argument */}
  jit_insn  *ref;               @rem{/* to patch the forward reference */}
  jit_insn  *loop;              @rem{/* start of the loop */}

        jit_leaf     (1);
  in =  jit_arg_ui   ();
        jit_getarg_ui(JIT_R2, in);              @rem{/* R2 = n */}
        jit_movi_ui  (JIT_R1, 1);
  ref = jit_blti_ui  (jit_forward(), JIT_R2, 2);
        jit_subi_ui  (JIT_R2, JIT_R2, 1);
        jit_movi_ui  (JIT_R0, 1);

  loop= jit_get_label();
        jit_subi_ui  (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
        jit_addr_ui  (JIT_V0, JIT_R0, JIT_R1);  @rem{/* V0 = R0 + R1 */}
        jit_movr_ui  (JIT_R0, JIT_R1);          @rem{/* R0 = R1 */}
        jit_addi_ui  (JIT_R1, JIT_V0, 1);       @rem{/* R1 = V0 + 1 */}
        jit_bnei_ui  (loop, JIT_R2, 0);         @rem{/* if (R2) goto loop; */}

  jit_patch(ref);                               @rem{/* patch forward jump */}
        jit_movr_ui  (JIT_RET, JIT_R1);         @rem{/* RET = R1 */}
        jit_ret      ();

  @rem{/* call the generated code\, passing 36 as an argument */}
  jit_flush_code(codeBuffer, jit_get_ip().ptr);
  printf("nfibs(%d) = %d\n", 36, nfibs(36));
  return 0;
@}
@end example

This code calculates the recurrence relation using iteration (a @code{for} loop in high-level languages). There is still a forward reference (indicated by the @code{jit_forward}/@code{jit_patch} pair), but there are no function calls anymore: instead, there is a backward jump (the @code{bnei} at the end of the loop). In this case, the destination address is already known, because the jump lands on an instruction that has already been compiled.
However, the program must still remember the address where the jump will land. This is achieved with @code{jit_get_label}, yet another macro, much like @code{jit_get_ip}; instead of a @code{jit_code} union, however, it returns a @code{jit_insn *}, which is what the branch macros accept.

Now, let's make one more change and rewrite the loop like this:

@example
@r{@dots{}}
        jit_delay(
        jit_movi_ui  (JIT_R1, 1),
  ref = jit_blti_ui  (jit_forward(), JIT_R2, 2));
        jit_subi_ui  (JIT_R2, JIT_R2, 1);
        jit_movi_ui  (JIT_R0, 1);

  loop= jit_get_label();
        jit_subi_ui  (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
        jit_addr_ui  (JIT_V0, JIT_R0, JIT_R1);  @rem{/* V0 = R0 + R1 */}
        jit_movr_ui  (JIT_R0, JIT_R1);          @rem{/* R0 = R1 */}
        jit_delay(
        jit_addi_ui  (JIT_R1, JIT_V0, 1),       @rem{/* R1 = V0 + 1 */}
        jit_bnei_ui  (loop, JIT_R2, 0));        @rem{/* if (R2) goto loop; */}
@r{@dots{}}
@end example

The @code{jit_delay} macro is used to schedule delay slots in jumps and branches. This is optional, but might lead to performance improvements in tight inner loops (of course not in a loop that is executed 35 times, but this is just an example).

@code{jit_delay} takes two @lightning{} instructions, a @dfn{delay instruction} and a @dfn{branch instruction}. Note that the two instructions must be written in execution order (first the delay instruction, then the branch instruction), @strong{not} with the branch first. If the current machine has a delay slot, the delay instruction (or part of it) is placed in the delay slot after the branch instruction; otherwise, the delay instruction is emitted before the branch instruction. The delay instruction must therefore not depend on being executed before or after the branch; in the loop above, for example, neither @code{movi_ui} nor @code{addi_ui} modifies @code{R2}, the register that the branches test.

Instead of @code{jit_patch}, you can use @code{jit_patch_at}, which takes two arguments: the first is the same as for @code{jit_patch}, and the second is the value to be patched in.
In other words, these two invocations have the same effect:

@example
jit_patch (jump_pc);
jit_patch_at (jump_pc, jit_get_ip ());
@end example

Dual to branches and @code{jit_patch_at} are @code{jit_movi_p} and @code{jit_patch_movi}, which can also be used to implement forward references. @code{jit_movi_p} is carefully implemented to use the longest possible encoding, so that it can always be patched; in addition, like branches, it returns an address, which is later passed to @code{jit_patch_movi}. @code{jit_patch_movi} is used in the same way as @code{jit_patch_at}.

@node Reentrancy
@chapter Re-entrant usage of @lightning{}

By default, @lightning{} can compile different functions at the same time only if the code that generates them resides in different object files; conversely, each code generation task is constrained to reside in a single object file. The reason for this restriction is not obvious, but is easily explained: the @file{lightning.h} header file defines its state as a @code{static} variable, so calls to @code{jit_set_ip} and @code{jit_get_ip} residing in different files access different instruction pointers. This was not done without reason: it makes the usage of @lightning{} much simpler, as it limits the initialization tasks to the bare minimum and removes the need to link the program with a separate library.

On the other hand, multi-threaded or otherwise concurrent programs require reentrancy in the code generator, so this approach cannot be the only one. For this reason, you can define your own copy of @lightning{}'s instruction state by defining a variable of type @code{struct jit_state} and @code{#define}-ing @code{_jit} to it:

@example
struct jit_state lightning;
#define _jit lightning
@end example

You are free to define the @code{jit_state} variable as you like: @code{extern}, @code{static} to a function, @code{auto}, or global.
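The mechanism can be tried out in isolation. The following self-contained C sketch does not use @lightning{}: @code{struct toy_state}, @code{toy_emit} and @code{toy_get_label} are hypothetical stand-ins, assumed to refer to @code{_jit} the way @code{jit_get_label} does. Each function selects the state it generates into simply by defining @code{_jit} around its body; one uses a global variable, the other an @code{auto} variable that is private to each call (and therefore to each thread).

```c
#include <assert.h>

/* Hypothetical stand-in for struct jit_state: only an instruction
   counter is modelled.  This is not lightning's real structure. */
struct toy_state { int pc; };

/* Like jit_get_label(), these macros mention _jit without binding it:
   the definition of _jit in effect where they are expanded decides
   which state they touch. */
#define toy_emit()      ((void) _jit.pc++)
#define toy_get_label() (_jit.pc)

/* A global state, shared by every function that selects it. */
struct toy_state global_state;

/* Emit n dummy instructions into the global state; return its pc. */
int emit_global(int n)
{
    assert(n >= 0);
#define _jit global_state
    while (n-- > 0)
        toy_emit();
    return toy_get_label();
#undef _jit
}

/* Emit n dummy instructions into an automatic state that exists only
   for this call, leaving global_state untouched. */
int emit_local(int n)
{
    struct toy_state local_state = { 0 };

    assert(n >= 0);
#define _jit local_state
    while (n-- > 0)
        toy_emit();
    return toy_get_label();
#undef _jit
}
```

Calling @code{emit_global(3)}, then @code{emit_local(7)}, then @code{emit_global(2)} leaves @code{global_state.pc} at 5: the automatic state used by @code{emit_local} never touches the global one.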
This feature takes advantage of an aspect of macros (@dfn{cascaded macros}), which is documented thus in @acronym{CPP}'s reference manual:

@quotation
A cascade of macros is when one macro's body contains a reference to another macro. This is very common practice. For example,

@example
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
@end example

This is not at all the same as defining @code{TABLESIZE} to be @samp{1020}. The @code{#define} for @code{TABLESIZE} uses exactly the body you specify---in this case, @code{BUFSIZE}---and does not check to see whether it too is the name of a macro; it's only when you use @code{TABLESIZE} that the result of its expansion is checked for more macro names.

This makes a difference if you change the definition of @code{BUFSIZE} at some point in the source file. @code{TABLESIZE}, defined as shown, will always expand using the definition of @code{BUFSIZE} that is currently in effect:

@example
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37
@end example

Now @code{TABLESIZE} expands (in two stages) to `37'. (The @code{#undef} is to prevent any warning about the nontrivial redefinition of @code{BUFSIZE}.)
@end quotation

@noindent
In the same way, @code{jit_get_label} will adopt whatever definition of @code{_jit} is in effect:

@example
#define jit_get_label() (_jit.pc)
@end example

Special care must be taken when functions residing in separate files need to access the same state. This could be the case, for example, if a special library contained functions for strength reduction of multiplications to additions and shifts, or of divisions to multiplications and shifts. Such a function would be compiled using a single definition of @code{_jit}, and that definition would be used whenever the function is called. Since @lightning{} uses a feature of the preprocessor to obtain re-entrancy, it makes sense to rely on the preprocessor in this case too.
The idea is to pass the current @code{struct jit_state} to the function, and to define @code{_jit} as the structure it points to, so that macros such as @code{jit_get_label} (which use @code{_jit.pc}) still work:

@example
void
_opt_muli_i (struct jit_state *jit, int dest, int source, int n)
@{
#define _jit (*jit)
  @dots{}
#undef _jit
@}
@end example

@noindent
doing this unbeknownst to the client, using a macro in the header file:

@example
extern void _opt_muli_i (struct jit_state *, int, int, int);

#define opt_muli_i(rd, rs, n)  _opt_muli_i (&_jit, (rd), (rs), (n))
@end example

@node Registers
@chapter Accessing the whole register file

As mentioned earlier in this manual, all @lightning{} back-ends are guaranteed to have at least six integer registers and six floating-point registers, but many back-ends will have more.

To access the entire register files, you can use the @code{JIT_R}, @code{JIT_V} and @code{JIT_FPR} macros. They accept a parameter that identifies the register number, which must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM} and @code{JIT_FPR_NUM} respectively; the number need not be constant. Of course, expressions like @code{JIT_R0} and @code{JIT_R(0)} denote the same register, and likewise for integer callee-saved, or floating-point, registers.

@node Bundling GNU lightning
@chapter Using @lightning{} in your programs

It is very easy to include @lightning{}'s source code (without the documentation and examples) in your program's distribution, so that people don't need to have @lightning{} installed in order to build your program. Here is a step-by-step explanation of what to do:

@enumerate
@item
Run @command{lightningize} from your package's main distribution directory.

@example
lightningize
@end example

@noindent
This will copy the source code for the @lightning{} back ends into the @file{lightning} directory of your package.

@item
If you're using Automake, you might be pleased to know that the @file{Makefile.am} files will already be there.
If you're not using Automake and @command{aclocal}, you should instead delete the @file{Makefile.am} files (they are of no use to you) and copy the contents of the @file{lightning.m4} file, found in @command{aclocal}'s macro repository (usually @file{/usr/share/aclocal}), to your @file{configure.in}, @file{acinclude.m4} or @file{aclocal.m4} file.

@item
Include a call to the @code{LIGHTNING_CONFIGURE_IF_NOT_FOUND} macro in your @file{configure.in} file.
@end enumerate

@code{LIGHTNING_CONFIGURE_IF_NOT_FOUND} will first look for a pre-installed copy of @lightning{} and, if it can be found, it will use it; otherwise, it will check whether there is a back-end for the host system. If @lightning{} is already installed, or if the system is supported by @lightning{}, it will define the @code{HAVE_LIGHTNING} symbol.

In addition, an Automake conditional named @code{HAVE_INSTALLED_LIGHTNING} will be set if @lightning{} is already installed, which can be used to set up include paths appropriately.

Finally, @code{LIGHTNING_CONFIGURE_IF_NOT_FOUND} accepts two optional parameters: respectively, an action to be taken if @lightning{} is available, and an action to be taken if it is not.
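On the C side, a program can then test @code{HAVE_LIGHTNING} to choose between dynamic code generation and a portable fallback. The sketch below assumes that the symbol reaches the compiler through a @file{config.h} header written by @file{configure} (or through a @option{-D} flag); the names @code{USE_JIT} and @code{report_backend} are hypothetical, and the actual code generation is left out.

```c
#include <assert.h>
#include <stdio.h>

#ifdef HAVE_CONFIG_H
# include "config.h"    /* assumption: configure records HAVE_LIGHTNING here */
#endif

#ifdef HAVE_LIGHTNING
# include "lightning.h" /* installed copy, or the bundled lightning/ directory */
# define USE_JIT 1
#else
# define USE_JIT 0      /* no installed copy and no back-end for this host */
#endif

/* Print which strategy this build was configured with and return
   USE_JIT, so that callers can also branch on it at run time. */
int report_backend(FILE *out)
{
    assert(out != NULL);
    fprintf(out, "code generation: %s\n",
            USE_JIT ? "GNU lightning" : "portable fallback");
    return USE_JIT;
}
```

With the include path set up through the @code{HAVE_INSTALLED_LIGHTNING} conditional, the same source file should build whether @lightning{} is installed, bundled with the package, or unavailable for the host.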