
@node Installation
@chapter Configuring and installing @lightning{}

To use @lightning{}, first configure the program by picking the set
of macros to be used on the host architecture.  The @file{configure}
shell script performs this configuration automatically; to run it,
type:
@example
     ./configure
@end example

@lightning{} supports cross-compiling in that you can choose a
different set of macros from the one needed on the computer that
you are compiling @lightning{} on.  For example,
@example
     ./configure --host=sparc-sun-linux
@end example

@noindent will select the SPARC set of runtime assemblers.  You can also
rely on @file{configure}'s ability to make reasonable assumptions about
the vendor and operating system and simply type one of the following:
@example
     ./configure --host=i386
     ./configure --host=ppc
     ./configure --host=sparc
@end example

Another option that @file{configure} accepts is
@code{--enable-assertions}, which enables several consistency checks in
the run-time assemblers.  These checks are not usually needed and tend
to slow down your code generator, so you can safely leave the option
off.

After you've configured @lightning{}, you don't have to compile it
because it is nothing more than a set of include files.  If you want to
compile the examples, run @file{make} as usual.  The next important
step is:
@example
    make install
@end example

This ends the process of installing @lightning{}.

@node The instruction set
@chapter @lightning{}'s instruction set

@lightning{}'s instruction set was designed by deriving instructions
that closely match those of most existing RISC architectures, or
that can be easily synthesized if absent.  Each instruction is composed
of:
@itemize @bullet
@item
an operation, like @code{sub} or @code{mul}

@item
sometimes, a register/immediate flag (@code{r} or @code{i})

@item
a type identifier or, occasionally, two
@end itemize

The second and third fields are separated by an underscore; thus,
examples of legal mnemonics are @code{addr_i} (integer add, with three
register operands) and @code{muli_l} (long integer multiply, with two
register operands and an immediate operand).  Each instruction takes
two or three operands; in most cases, one of them can be an immediate
value instead of a register.
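
The field structure described above can be made concrete with a small
C routine.  This is only a sketch: @code{split_mnemonic} and
@code{struct mnemonic} are hypothetical names that are not part of
@lightning{}, and the routine only handles mnemonics that carry a
register/immediate flag.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical helper, not part of lightning.  */
struct mnemonic {
  char op[16];    /* operation, e.g. "add" */
  char flag;      /* 'r' (register) or 'i' (immediate) */
  char type[8];   /* type identifier(s), e.g. "i" or "c_ui" */
};

/* Returns 1 and fills *m on success; returns 0 if s has no
   register/immediate flag or no type identifier.  */
int split_mnemonic(const char *s, struct mnemonic *m)
{
  const char *us;
  size_t len;

  assert(s != NULL && m != NULL);
  us = strchr(s, '_');            /* the first underscore ends field two */
  if (us == NULL || us - s < 2 || us[1] == '\0')
    return 0;
  len = (size_t) (us - s) - 1;    /* operation length, flag excluded */
  if (len >= sizeof m->op || strlen(us + 1) >= sizeof m->type)
    return 0;
  if (us[-1] != 'r' && us[-1] != 'i')
    return 0;
  memcpy(m->op, s, len);
  m->op[len] = '\0';
  m->flag = us[-1];
  strcpy(m->type, us + 1);
  return 1;
}
```

For @code{addr_i} the routine yields the operation @code{add}, the flag
@code{r} and the type @code{i}; for a mnemonic with two type
identifiers, everything after the first underscore is kept as one
string.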

@lightning{} supports a full range of integer types: operands can be 1,
2 or 4 bytes long (64-bit architectures might support 8-byte
operands), either signed or unsigned.  The types are listed in the
following table together with the C types they represent:

@example
     c          @r{signed char}
     uc         @r{unsigned char}
     s          @r{short}
     us         @r{unsigned short}
     i          @r{int}
     ui         @r{unsigned int}
     l          @r{long}
     ul         @r{unsigned long}
     f          @r{float}
     d          @r{double}
     p          @r{void *}
@end example

Some of these types may not be distinct: for example, @code{l}
is equivalent to @code{i} on 32-bit machines, and @code{p} is
substantially equivalent to @code{ul}.
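
Which types coincide depends on the host.  The following sketch lists
the size of the C type behind each identifier in the table above; the
names @code{type_table}, @code{type_size} and @code{print_type_sizes}
are illustrative only and do not come from @lightning{}.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Illustrative table: type identifier, C type, size on this host.  */
struct type_info {
  const char *id;
  const char *c_type;
  size_t size;
};

static const struct type_info type_table[] = {
  { "c",  "signed char",    sizeof(signed char) },
  { "uc", "unsigned char",  sizeof(unsigned char) },
  { "s",  "short",          sizeof(short) },
  { "us", "unsigned short", sizeof(unsigned short) },
  { "i",  "int",            sizeof(int) },
  { "ui", "unsigned int",   sizeof(unsigned int) },
  { "l",  "long",           sizeof(long) },
  { "ul", "unsigned long",  sizeof(unsigned long) },
  { "f",  "float",          sizeof(float) },
  { "d",  "double",         sizeof(double) },
  { "p",  "void *",         sizeof(void *) }
};

#define TYPE_COUNT (sizeof type_table / sizeof type_table[0])

/* Returns the size in bytes for a type identifier, or 0 if unknown.  */
size_t type_size(const char *id)
{
  size_t k;

  assert(id != NULL);
  for (k = 0; k < TYPE_COUNT; k++)
    if (strcmp(type_table[k].id, id) == 0)
      return type_table[k].size;
  return 0;
}

/* Prints one line per type identifier.  */
void print_type_sizes(void)
{
  size_t k;

  for (k = 0; k < TYPE_COUNT; k++)
    printf("%-3s %-15s %lu bytes\n", type_table[k].id,
           type_table[k].c_type, (unsigned long) type_table[k].size);
}
```

On a host where @code{type_size("l")} equals @code{type_size("i")},
the @code{l} and @code{i} types are not distinct.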

There are at least seven integer registers, of which six are
general-purpose, while the last is used to contain the frame pointer
(@code{FP}).  The frame pointer can be used to allocate and access local
variables on the stack, using the @code{allocai} instruction.

Of the general-purpose registers, at least three are guaranteed to be
preserved across function calls (@code{V0}, @code{V1} and
@code{V2}) and at least three are not (@code{R0}, @code{R1} and
@code{R2}).  Six registers are not many, but this restriction was
forced by the need to target register-poor CISC architectures such as
the x86; in any case, backends can specify the actual number of
available caller- and callee-save registers.

In addition, there is a special @code{RET} register which contains the
return value of the current function (@emph{not} the return value of
callees---use the @code{retval} instruction for this).  You should
always remember, however, that writing this register could overwrite
either a general-purpose register or an incoming parameter, depending
on the architecture.

There are at least six floating-point registers, named @code{FPR0} to
@code{FPR5}.  These are separate from the integer registers on
all the supported architectures; on Intel architectures, the
register stack is mapped to a flat register file.

The complete instruction set follows; as you can see, most non-memory
operations only take integers, long integers (either signed or
unsigned) and pointers as operands; this was done in order to reduce
the instruction set, and because most architectures only provide word
and long word operations on registers.  There are instructions that
allow operands to be extended to fit a larger data type, both in a
signed and in an unsigned way.
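
The difference between the two kinds of extension can be shown in
plain C.  This sketch does not use @lightning{}; the function names are
made up for the illustration, and the comments assume 8-bit bytes and
two's complement representation.

```c
#include <assert.h>

/* Signed extension: the sign bit of the narrow value is replicated.  */
long extend_signed(signed char v)
{
  return (long) v;
}

/* Unsigned extension: the upper bits are filled with zeros.  */
unsigned long extend_unsigned(unsigned char v)
{
  return (unsigned long) v;
}

/* The same byte pattern (0xF0 under the assumptions above) widens to
   two different values depending on the kind of extension.  */
void extension_demo(void)
{
  signed char   sc = -16;
  unsigned char uc = (unsigned char) sc;   /* 240, same bit pattern */

  assert(extend_signed(sc) == -16L);
  assert(extend_unsigned(uc) == 240UL);
}
```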

@table @b
@item Binary ALU operations
These accept three operands; the last one can be an immediate
value for integer operands, or a register for all operand types.
@code{addx} operations must directly follow @code{addc}, and
@code{subx} must follow @code{subc}; otherwise, results are undefined.
@example
addr     i  ui  l  ul  p  f  d  O1 = O2 + O3
addi     i  ui  l  ul  p        O1 = O2 + O3
addxr    i  ui  l  ul           O1 = O2 + (O3 + carry)
addxi    i  ui  l  ul           O1 = O2 + (O3 + carry)
addcr    i  ui  l  ul           O1 = O2 + O3, set carry
addci    i  ui  l  ul           O1 = O2 + O3, set carry
subr     i  ui  l  ul  p  f  d  O1 = O2 - O3
subi     i  ui  l  ul  p        O1 = O2 - O3
subxr    i  ui  l  ul           O1 = O2 - (O3 + carry)
subxi    i  ui  l  ul           O1 = O2 - (O3 + carry)
subcr    i  ui  l  ul           O1 = O2 - O3, set carry
subci    i  ui  l  ul           O1 = O2 - O3, set carry
rsbr     i  ui  l  ul  p  f  d  O1 = O3 - O2
rsbi     i  ui  l  ul  p        O1 = O3 - O2
mulr     i  ui  l  ul     f  d  O1 = O2 * O3
muli     i  ui  l  ul           O1 = O2 * O3
hmulr    i  ui  l  ul           O1 = @r{high bits of} O2 * O3
hmuli    i  ui  l  ul           O1 = @r{high bits of} O2 * O3
divr     i  ui  l  ul     f  d  O1 = O2 / O3
divi     i  ui  l  ul           O1 = O2 / O3
modr     i  ui  l  ul           O1 = O2 % O3
modi     i  ui  l  ul           O1 = O2 % O3
andr     i  ui  l  ul           O1 = O2 & O3
andi     i  ui  l  ul           O1 = O2 & O3
orr      i  ui  l  ul           O1 = O2 | O3
ori      i  ui  l  ul           O1 = O2 | O3
xorr     i  ui  l  ul           O1 = O2 ^ O3
xori     i  ui  l  ul           O1 = O2 ^ O3
lshr     i  ui  l  ul           O1 = O2 << O3
lshi     i  ui  l  ul           O1 = O2 << O3
rshr     i  ui  l  ul           O1 = O2 >> O3@footnote{The sign bit is propagated for signed types.}
rshi     i  ui  l  ul           O1 = O2 >> O3@footnote{The sign bit is propagated for signed types.}
@end example

@item Unary ALU operations
These accept two operands, both of which must be registers.
@example
negr     i     l         f  d  O1 = -O2
notr     i  ui l  ul           O1 = ~O2
@end example

@item Compare instructions
These accept three operands; again, the last can be an immediate
value for integer data types.  The last two operands are compared,
and the first operand is set to 1 if the given condition is met
and to 0 otherwise.

The conditions given below are for the standard behavior of C,
where the ``unordered'' comparison result is mapped to false.

@example
ltr      i  ui  l  ul  p  f  d  O1 = (O2 <  O3)
lti      i  ui  l  ul  p        O1 = (O2 <  O3)
ler      i  ui  l  ul  p  f  d  O1 = (O2 <= O3)
lei      i  ui  l  ul  p        O1 = (O2 <= O3)
gtr      i  ui  l  ul  p  f  d  O1 = (O2 >  O3)
gti      i  ui  l  ul  p        O1 = (O2 >  O3)
ger      i  ui  l  ul  p  f  d  O1 = (O2 >= O3)
gei      i  ui  l  ul  p        O1 = (O2 >= O3)
eqr      i  ui  l  ul  p  f  d  O1 = (O2 == O3)
eqi      i  ui  l  ul  p        O1 = (O2 == O3)
ner      i  ui  l  ul  p  f  d  O1 = (O2 != O3)
nei      i  ui  l  ul  p        O1 = (O2 != O3)
unltr                     f  d  O1 = !(O2 >= O3)
unler                     f  d  O1 = !(O2 >  O3)
ungtr                     f  d  O1 = !(O2 <= O3)
unger                     f  d  O1 = !(O2 <  O3)
uneqr                     f  d  O1 = !(O2 <  O3) && !(O2 >  O3)
ltgtr                     f  d  O1 = !(O2 >= O3) || !(O2 <= O3)
ordr                      f  d  O1 =  (O2 == O2) &&  (O3 == O3)
unordr                    f  d  O1 =  (O2 != O2) ||  (O3 != O3)
@end example

@item Transfer operations
These accept two operands; for @code{ext} both of them must be
registers, while @code{mov} accepts an immediate value as the second
operand.

Unlike @code{movr} and @code{movi}, the other instructions are applied
between operands of different data types, and they need @strong{two}
data type specifications.  You can use @code{extr} to convert between
integer data types, in which case the first must be smaller in size
than the second; for example @code{extr_c_ui} is correct while
@code{extr_ul_us} is not.  You can also use @code{extr} to convert
an integer to a floating point value: the only available possibilities
are @code{extr_i_f} and @code{extr_i_d}.  The other instructions
convert a floating point value to an integer, so the possible
suffixes are @code{_f_i} and @code{_d_i}.

@example
movr                      i  ui  l  ul  p  f  d  O1 = O2
movi                      i  ui  l  ul  p  f  d  O1 = O2
extr        c  uc  s  us  i  ui  l  ul     f  d  O1 = O2
roundr                    i                f  d  O1 = round(O2)
truncr                    i                f  d  O1 = trunc(O2)
floorr                    i                f  d  O1 = floor(O2)
ceilr                     i                f  d  O1 = ceil(O2)
@end example

Note that the order of the arguments is @emph{destination first,
source second} as for all other @lightning{} instructions, but
the order of the types is always reversed with respect to that
of the arguments: @emph{shorter}---source---@emph{first,
longer}---destination---@emph{second}.  This happens for historical
reasons.

@item Network extensions
These accept two operands, both of which must be registers; these
two instructions actually perform the same task, yet they are
assigned to two mnemonics for the sake of convenience and
completeness.  As usual, the first operand is the destination and
the second is the source.
@example
hton       us ui          @r{Host-to-network (big endian) order}
ntoh       us ui          @r{Network-to-host order }
@end example

@item Load operations
@code{ld} accepts two operands while @code{ldx} accepts three;
in both cases, the last can be either a register or an immediate
value. Values are extended (with or without sign, according to
the data type specification) to fit a whole register.
@example
ldr     c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *O2
ldi     c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *O2
ldxr    c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *(O2+O3)
ldxi    c  uc  s  us  i  ui  l  ul  p  f  d  O1 = *(O2+O3)
@end example

@item Store operations
@code{st} accepts two operands while @code{stx} accepts three; in
both cases, the first can be either a register or an immediate
value. Values are sign-extended to fit a whole register.
@example
str     c  uc  s  us  i  ui  l  ul  p  f  d  *O1 = O2
sti     c  uc  s  us  i  ui  l  ul  p  f  d  *O1 = O2
stxr    c  uc  s  us  i  ui  l  ul  p  f  d  *(O1+O2) = O3
stxi    c  uc  s  us  i  ui  l  ul  p  f  d  *(O1+O2) = O3
@end example

@item Argument management
These are:
@example
prepare                   i                f  d
pusharg     c  uc  s  us  i  ui  l  ul  p  f  d
getarg      c  uc  s  us  i  ui  l  ul  p  f  d
arg         c  uc  s  us  i  ui  l  ul  p  f  d
retval      c  uc  s  us  i  ui  l  ul  p
@end example

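Of these, @code{prepare} and @code{pusharg} are used by the caller
before the call, @code{getarg} and @code{arg} are used by the callee,
and @code{retval} is used by the caller after the call.  A code snippet
that wants to call another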
procedure and has to pass registers must, in order: use the
@code{prepare} instruction, giving the number of arguments to
be passed to the procedure (once for each data type); use
@code{pusharg} to push the arguments @strong{in reverse order};
and use @code{calli} or @code{finish} (explained below) to
perform the actual call.

@code{arg} and @code{getarg} are used by the callee.
@code{arg} is different from other instructions in that it does not
actually generate any code: instead, it is a function which returns
a value to be passed to @code{getarg}.@footnote{``Return a
value'' means that @lightning{} macros that compile these
instructions return a value when expanded.} You should call
@code{arg} as soon as possible, before any function call or, more
easily, right after the @code{prolog} or @code{leaf} instructions
(which are treated later).

@code{getarg} accepts a register argument and a value returned by
@code{arg}, and will move that argument to the register, extending
it (with or without sign, according to the data type specification)
to fit a whole register.  These instructions are more intimately
related to the usage of the @lightning{} instruction set in code
that generates other code, so they will be treated more
specifically in @ref{GNU lightning macros, , Generating code at
run-time}.

Finally, the @code{retval} instruction takes a register argument and
copies the return value of the previously called function into that
register.  A function should put its own return value
in the @code{RET} register before returning.  @xref{Fibonacci, the
Fibonacci numbers}, for an example.

You should observe a few rules when using these macros.  First of
all, it is not allowed to call functions with more than six arguments;
this was done to simplify and speed up the implementation on
architectures that use registers for parameter passing.

You should not nest calls to @code{prepare}, nor call zero-argument
functions (which do not need a call to @code{prepare}) inside a
@code{prepare/calli} or @code{prepare/finish} block.  Doing this
might corrupt already pushed arguments.

You @strong{cannot} pass parameters between subroutines using
the six general-purpose registers.  This might work only when
targeting particular architectures.

On the other hand, it is possible to assume that caller-saved registers
(@code{R0} through @code{R2}) are not clobbered by another dynamically
generated function which does not use them as operands in its code and
which does not return a value.

@item Branch instructions
Like @code{arg}, these also return a value which, in this case,
is to be used to compile forward branches as explained in
@ref{Fibonacci, , Fibonacci numbers}.  They accept a pointer to the
destination of the branch and two operands to be compared; of these,
the last can be either a register or an immediate.  They are:
@example
bltr      i  ui  l  ul  p  f  d  @r{if }(O2 <  O3)@r{ goto }O1
blti      i  ui  l  ul  p        @r{if }(O2 <  O3)@r{ goto }O1
bler      i  ui  l  ul  p  f  d  @r{if }(O2 <= O3)@r{ goto }O1
blei      i  ui  l  ul  p        @r{if }(O2 <= O3)@r{ goto }O1
bgtr      i  ui  l  ul  p  f  d  @r{if }(O2 >  O3)@r{ goto }O1
bgti      i  ui  l  ul  p        @r{if }(O2 >  O3)@r{ goto }O1
bger      i  ui  l  ul  p  f  d  @r{if }(O2 >= O3)@r{ goto }O1
bgei      i  ui  l  ul  p        @r{if }(O2 >= O3)@r{ goto }O1
beqr      i  ui  l  ul  p  f  d  @r{if }(O2 == O3)@r{ goto }O1
beqi      i  ui  l  ul  p        @r{if }(O2 == O3)@r{ goto }O1
bner      i  ui  l  ul  p  f  d  @r{if }(O2 != O3)@r{ goto }O1
bnei      i  ui  l  ul  p        @r{if }(O2 != O3)@r{ goto }O1

bunltr                     f  d  @r{if }!(O2 >= O3)@r{ goto }O1
bunler                     f  d  @r{if }!(O2 >  O3)@r{ goto }O1
bungtr                     f  d  @r{if }!(O2 <= O3)@r{ goto }O1
bunger                     f  d  @r{if }!(O2 <  O3)@r{ goto }O1
buneqr                     f  d  @r{if }!(O2 <  O3) && !(O2 >  O3)@r{ goto }O1
bltgtr                     f  d  @r{if }!(O2 >= O3) || !(O2 <= O3)@r{ goto }O1
bordr                      f  d  @r{if } (O2 == O2) &&  (O3 == O3)@r{ goto }O1
bunordr                    f  d  @r{if } (O2 != O2) ||  (O3 != O3)@r{ goto }O1

bmsr      i ui l  ul             @r{if }O2 &  O3@r{ goto }O1
bmsi      i ui l  ul             @r{if }O2 &  O3@r{ goto }O1
bmcr      i ui l  ul             @r{if }!(O2 & O3)@r{ goto }O1
bmci      i ui l  ul             @r{if }!(O2 & O3)@r{ goto }O1@footnote{These mnemonics mean, respectively, @dfn{branch if mask set} and @dfn{branch if mask cleared}.}
boaddr    i ui l  ul             O2 += O3@r{, goto }O1@r{ on overflow}
boaddi    i ui l  ul             O2 += O3@r{, goto }O1@r{ on overflow}
bosubr    i ui l  ul             O2 -= O3@r{, goto }O1@r{ on overflow}
bosubi    i ui l  ul             O2 -= O3@r{, goto }O1@r{ on overflow}
@end example

@item Jump and return operations
These accept one argument except @code{ret} which has none; the
difference between @code{finish} and @code{calli} is that the
latter does not clean the stack from pushed parameters (if any)
and the former must @strong{always} follow a @code{prepare}
instruction.
@example
calli     (not specified)                  @r{function call to O1}
callr     (not specified)                  @r{function call to a register}
finish    (not specified)                  @r{function call to O1}
finishr   (not specified)                  @r{function call to a register}
jmpi/jmpr (not specified)                  @r{unconditional jump to O1}
ret       (not specified)                  @r{return from subroutine}
retval    c  uc s  us i  ui l  ul p  f  d  @r{move return value}
                                           @r{to register}
@end example

Like the branch instructions, @code{jmpi} also returns a value which is to
be used to compile forward branches. @xref{Fibonacci, , Fibonacci
numbers}.

@item Function prolog

These macros are used to set up the function prolog, in particular to
declare the number of arguments accepted by a function, and to reserve
space on the stack to be used for variables.  They accept a single
numeric argument.

@example
prolog    (not specified)                  @r{function prolog for O1 args}
leaf      (not specified)                  @r{the same for leaf functions}
allocai   (not specified)                  @r{reserve space on the stack}
@end example

Results are undefined when using function calls in a leaf function.

@code{allocai} receives the number of bytes to allocate and returns
the offset from the frame pointer register @code{FP} to the base of
the area.  The area is aligned to an @code{int}; future versions of
@lightning{} may provide more fine-grained control on the alignment of
stack-allocated variables.
@end table
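
The floating-point conditions listed under the compare instructions
differ from the ordinary ones only when an operand is a NaN.  The
sketch below writes a few of them as plain C expressions, following the
formulas in the table; the @code{cmp_} function names are made up for
this illustration and are not @lightning{} macros.

```c
#include <assert.h>
#include <math.h>   /* NAN */

/* Ordinary comparison: false when either operand is a NaN.  */
int cmp_ltr(double o2, double o3)    { return o2 < o3; }

/* "Unordered or less than": true when either operand is a NaN.  */
int cmp_unltr(double o2, double o3)  { return !(o2 >= o3); }

/* "Unordered or equal".  */
int cmp_uneqr(double o2, double o3)  { return !(o2 < o3) && !(o2 > o3); }

/* True when neither operand is a NaN.  */
int cmp_ordr(double o2, double o3)   { return (o2 == o2) && (o3 == o3); }

/* True when at least one operand is a NaN.  */
int cmp_unordr(double o2, double o3) { return (o2 != o2) || (o3 != o3); }

/* Checks the behavior with one NaN operand and with ordinary operands.  */
void unordered_demo(void)
{
  double not_a_number = NAN;

  assert(cmp_ltr(not_a_number, 1.0) == 0);
  assert(cmp_unltr(not_a_number, 1.0) == 1);
  assert(cmp_uneqr(not_a_number, 1.0) == 1);
  assert(cmp_ordr(not_a_number, 1.0) == 0);
  assert(cmp_unordr(not_a_number, 1.0) == 1);

  assert(cmp_ltr(1.0, 2.0) == 1);
  assert(cmp_unltr(1.0, 2.0) == 1);
  assert(cmp_uneqr(1.0, 2.0) == 0);
  assert(cmp_ordr(1.0, 2.0) == 1);
  assert(cmp_unordr(1.0, 2.0) == 0);
}
```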

As an appetizer, here is a small function that adds 1 to its input
parameter (an @code{int}).  I'm using an assembly-like syntax here, which
is a bit different from the one used when writing real subroutines with
@lightning{}; the real syntax will be introduced in @ref{GNU lightning
macros, , Generating code at run-time}.

@example
incr:
     leaf      1
in = arg_i                   @rem{! We have an integer argument}
     getarg_i  R0, in        @rem{! Move it to R0}
     addi_i    RET, R0, 1    @rem{! Add 1\, put result in return value}
     ret                     @rem{! And return the result}
@end example

And here is another function which uses the @code{printf} function from
the standard C library to write a number in hexadecimal notation:

@example
printhex:
     prolog    1
in = arg_i                    @rem{! Same as above}
     getarg_i  R0, in
     prepare   2              @rem{! Begin call sequence for printf}
     pusharg_i R0             @rem{! Push second argument}
     pusharg_p "%x"           @rem{! Push format string}
     finish    printf         @rem{! Call printf}
     ret                      @rem{! Return to caller}
@end example

@node GNU lightning macros
@chapter Generating code at run-time

To use @lightning{}, you should include the @file{lightning.h} file that
is put in your include directory by the @samp{make install} command.
That include file defines about four hundred public macros (plus
others that are private to @lightning{}), one for each opcode listed
above.

Each of the instructions above translates to a macro.  All you have to
do is prepend @code{jit_} (lowercase) to opcode names and @code{JIT_}
(uppercase) to register names.  Of course, parameters are to be put
between parentheses, just like with every other @sc{cpp} macro.

This small tutorial presents four examples:

@iftex
@itemize @bullet
@item
The @code{incr} function found in @ref{The instruction set, ,
@lightning{}'s instruction set}

@item
A simple function call to @code{printf}

@item
An RPN calculator

@item
Fibonacci numbers
@end itemize
@end iftex
@ifnottex
@menu
* incr::             A function which increments a number by one
* printf::           A simple function call to printf
* RPN calculator::   A more complex example, an RPN calculator
* Fibonacci::        Calculating Fibonacci numbers
@end menu
@end ifnottex

@node incr
@section A function which increments a number by one

Let's see how to create and use the sample @code{incr} function created
in @ref{The instruction set, , @lightning{}'s instruction set}:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);    @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi  incr = (pifi) (jit_set_ip(codeBuffer).iptr);
  int   in;

  jit_leaf(1);                     @rem{/* @t{     leaf  1            } */}
  in = jit_arg_i();                @rem{/* @t{in = arg_i              } */}
  jit_getarg_i(JIT_R0, in);        @rem{/* @t{     getarg_i R0\, in    } */}
  jit_addi_i(JIT_RET, JIT_R0, 1);  @rem{/* @t{     addi_i   RET\, R0\, 1} */}
  jit_ret();                       @rem{/* @t{     ret                } */}

  jit_flush_code(codeBuffer, jit_get_ip().ptr);

  @rem{/* call the generated code\, passing 5 as an argument */}
  printf("%d + 1 = %d\n", 5, incr(5));
  return 0;
@}
@end example

Let's examine the code line by line (well, almost@dots{}):

@table @t
@item #include "lightning.h"
You already know about this.  It defines all of @lightning{}'s macros.

@item static jit_insn codeBuffer[1024];
You might wonder what @code{jit_insn} is.  It is just a type that
is defined by @lightning{}.  Its exact definition depends on the
architecture; in general, defining an array of 1024 @code{jit_insn}s
allows one to write 100 to 400 @lightning{} instructions (depending on
the architecture and exact instructions).

@item typedef int (*pifi)(int);
Just a handy typedef for a pointer to a function that takes an
@code{int} and returns another.

@item pifi incr = (pifi) (jit_set_ip(codeBuffer).iptr);
This is the first @lightning{} macro we encounter that does not map to
an instruction.  It is @code{jit_set_ip}, which takes a pointer to an
area of memory where compiled code will be put and returns the same
value, cast to a @code{union} type whose members are pointers to
functions returning different C types.  This union is called
@code{jit_code} and is defined as follows:

@example
    typedef union jit_code @{
      char               *ptr;
      void               (*vptr)();
      char               (*cptr)();
      unsigned char      (*ucptr)();
      short              (*sptr)();
      unsigned short     (*usptr)();
      int                (*iptr)();
      unsigned int       (*uiptr)();
      long               (*lptr)();
      unsigned long      (*ulptr)();
      void *             (*pptr)();
      float              (*fptr)();
      double             (*dptr)();
    @} jit_code;
@end example

Any of the members could have been used, since the result is soon cast
to type @code{pifi} but, for the sake of clarity, the program uses
@code{iptr}, a pointer to a function with no prototype and returning an
@code{int}.

Analogous to @code{jit_set_ip} is @code{jit_get_ip}, which does not
modify the instruction pointer---it is nothing more than a cast of the
current @sc{ip} to @code{jit_code}.

@item int       in;
A footnote in @ref{The instruction set, , @lightning{}'s instruction
set}, under the description of @code{arg}, says that macros implementing
@code{arg} return a value---we'll be using this variable to store the
result of @code{arg}.

@item jit_leaf(1);
Ok, so we start generating code for our beloved function@dots{} it will
accept one argument and won't call any other function.

@item in = jit_arg_i();
@itemx jit_getarg_i(JIT_R0, in);
We retrieve the first (and only) argument, an integer, and store it
into the general-purpose register @code{R0}.

@item jit_addi_i(JIT_RET, JIT_R0, 1);
We add one to the content of the register and store the result in the
return value.

@item jit_ret();
This instruction generates a standard function epilog that returns
the contents of the @code{RET} register.

@item jit_flush_code(codeBuffer, jit_get_ip().ptr);
This instruction is very important.  It flushes the generated code
area out of the processor's instruction cache, so that the processor
does not execute stale data that it happens to find there.  The
@code{jit_flush_code} function accepts the first and the last address
to flush; we use @code{jit_get_ip} to find out the latter.

@item printf("%d + 1 = %d\n", 5, incr(5));
Calling our function is this simple---it is not distinguishable from
a normal C function call, the only difference being that @code{incr}
is a variable.
@end table

@lightning{} abstracts two phases of dynamic code generation: selecting
instructions that map the standard representation, and emitting binary
code for these instructions.  The client program has the responsibility
of describing the code to be generated using the standard @lightning{}
instruction set.

Let's examine the code generated for @code{incr} on the SPARC and x86
architectures (on the right is the code that an assembly-language
programmer would write):

@table @b
@item SPARC
@example
    save %sp, -96, %sp
    mov  %i0, %l0                   retl
    add  %l0, 1,  %i0               add %o0, 1, %o0
    ret
    restore
@end example
In this case, @lightning{} introduces overhead to create a register
window (not knowing that the procedure is a leaf procedure) and to
move the argument to the general purpose register @code{R0} (which
maps to @code{%l0} on the SPARC).  The former overhead could be
avoided by teaching @lightning{} about leaf procedures (@pxref{Future});
the latter could instead be avoided by rewriting the getarg instruction
as @code{jit_getarg_i(JIT_RET, in)}, which was not done in this
example.

@item x86
@example
    pushl %ebp
    movl  %esp, %ebp
    pushl %ebx
    pushl %esi
    pushl %edi
    movl  8(%ebp), %eax        movl 4(%esp), %eax
    addl  $1, %eax             incl %eax
    popl  %edi
    popl  %esi
    popl  %ebx
    popl  %ebp
    ret                        ret
@end example
In this case, the main overhead is due to the function's prolog and
epilog, which is nine instructions long on the x86; a hand-written
routine would not save unused callee-preserved registers on the stack.
This is not a problem in more complicated uses, however, because a
more complex procedure would probably use
the @code{V0} through @code{V2} registers (@code{%ebx}, @code{%esi},
@code{%edi}); in this case, a hand-written routine would have included
the prolog too.  Also, a ten byte prolog would probably be a small
overhead in a more complex function.
@end table

In such a simple case, the macros that make up the back-end compile
reasonably efficient code, with the notable exception of prolog/epilog
code.

@node printf
@section A simple function call to @code{printf}

Again, here is the code for the example:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef void (*pvfi)(int);      @rem{/* Pointer to Void Function of Int */}

int main()
@{
  pvfi          myFunction;             @rem{/* ptr to generated code */}
  char          *start, *end;           @rem{/* a couple of labels */}
  int           in;                     @rem{/* to get the argument */}

  myFunction = (pvfi) (jit_set_ip(codeBuffer).vptr);
  start = jit_get_ip().ptr;
  jit_prolog(1);
  in = jit_arg_i();
  jit_movi_p(JIT_R0, "generated %d bytes\n");
  jit_getarg_i(JIT_R1, in);
  jit_prepare(2);
    jit_pusharg_i(JIT_R1);              @rem{/* push in reverse order */}
    jit_pusharg_p(JIT_R0);
  jit_finish(printf);
  jit_ret();
  end = jit_get_ip().ptr;

  @rem{/* call the generated code\, passing its size as argument */}
  jit_flush_code(start, end);
  myFunction(end - start);
@}
@end example

The function shows how many bytes were generated.  Most of the code
is not very interesting, as it resembles very closely the program
presented in @ref{incr, , A function which increments a number by one}.

For this reason, we're going to concentrate on just a few statements.

@table @t
@item start = jit_get_ip().ptr;
@itemx @r{@dots{}}
@itemx end = jit_get_ip().ptr;
These two statements call the @code{jit_get_ip} macro which was
mentioned in @ref{incr, , A function which increments a number by one}
too.  In this case we use the only field of @code{jit_code} that is
not a function pointer: @code{ptr}, which is a simple @code{char *}.

@item jit_movi_p(JIT_R0, "generated %d bytes\n");
Note the use of the @samp{p} type specifier, which automatically
casts the second parameter to an @code{unsigned long} to make the
code clearer and less cluttered by typecasts.

@item jit_prepare(2);
@itemx jit_pusharg_i(JIT_R1);
@itemx jit_pusharg_p(JIT_R0);
@itemx jit_finish(printf);
Once the arguments to @code{printf} have been put in general-purpose
registers, we can start a prepare/pusharg/finish sequence that
moves the argument to either the stack or registers, then calls
@code{printf}, then cleans up the stack.  Note how @lightning{}
abstracts the differences between different architectures and
ABIs: the client program does not know how parameter passing
works on the host architecture.
@end table

@node RPN calculator
@section A more complex example, an RPN calculator

We create a small stack-based RPN calculator which applies a series
of operators to a given parameter and to other numeric operands.
Unlike previous examples, the code generator is fully parameterized
and is able to compile different formulas to different functions.
Here is the code for the expression compiler; a sample usage will
follow.

Since @lightning{} does not provide push/pop instructions, this
example uses a stack-allocated area to store the data.  Such an
area can be allocated using the macro @code{jit_allocai}, which
receives the number of bytes to allocate and returns the offset
from the frame pointer register @code{JIT_FP} to the base of the
area.  The area is aligned to an @code{int}; future versions
of @lightning{} may provide more fine-grained control on the
alignment of stack-allocated variables.

Usually, you will use the @code{ldxi} and @code{stxi} instructions
to access stack-allocated variables.  However, it is possible to
use operations such as @code{add} to compute the address of the
variables, and pass the address around.

@example
#include <stdio.h>
#include <stdlib.h>
#include "lightning.h"

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

void stack_push(int reg, int *sp)
@{
  jit_stxi_i (*sp, JIT_FP, reg);
  *sp += sizeof (int);
@}

void stack_pop(int reg, int *sp)
@{
  *sp -= sizeof (int);
  jit_ldxi_i (reg, JIT_FP, *sp);
@}

pifi compile_rpn(char *expr)
@{
  pifi fn;
  int stack_base, stack_ptr;
  int in;

  fn = (pifi) (jit_get_ip().iptr);
  jit_leaf(1);
  in = jit_arg_i();
  stack_ptr = stack_base = jit_allocai (32 * sizeof (int));

  jit_getarg_i(JIT_R2, in);

  while (*expr) @{
    char buf[32];
    int n;
    if (sscanf(expr, "%[0-9]%n", buf, &n)) @{
      expr += n - 1;
      stack_push(JIT_R0, &stack_ptr);
      jit_movi_i(JIT_R0, atoi(buf));
    @} else if (*expr == 'x') @{
      stack_push(JIT_R0, &stack_ptr);
      jit_movr_i(JIT_R0, JIT_R2);
    @} else if (*expr == '+') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_addr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '-') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_subr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '*') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_mulr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else if (*expr == '/') @{
      stack_pop(JIT_R1, &stack_ptr);
      jit_divr_i(JIT_R0, JIT_R1, JIT_R0);
    @} else @{
      fprintf(stderr, "cannot compile: %s\n", expr);
      abort();
    @}
    ++expr;
  @}
  jit_movr_i(JIT_RET, JIT_R0);
  jit_ret();
  return fn;
@}
@end example

The principle on which the calculator is based is easy: the stack top
is held in R0, while the remaining items of the stack are held in the
memory area that we allocate with @code{allocai}.  Compiling a numeric
operand or the argument @code{x} pushes the old stack top onto the
stack and moves the operand into R0; compiling an operator pops the
second operand off the stack into R1, and compiles the operation so
that the result goes into R0, thus becoming the new stack top.
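
The same discipline can be tried out without generating any code.  The
following is a minimal sketch of a plain C interpreter; it is not part
of @lightning{}, and the names @code{eval_rpn}, @code{top},
@code{second} and @code{mem} are made up for this illustration.  Here
@code{top} plays the role of R0, @code{second} that of R1, and
@code{mem} that of the area obtained with @code{allocai}.

```c
#include <assert.h>
#include <ctype.h>

#define SLOTS 32                 /* same fixed size as the compiled example */

/* Evaluate an RPN expression for a given x, keeping the stack top in a
   variable (like R0) and the rest of the stack in memory. */
int eval_rpn(const char *expr, int x)
{
  int mem[SLOTS];
  int sp = 0;                    /* number of items stored in mem */
  int top = 0;                   /* plays the role of R0 */
  int second;                    /* plays the role of R1 */

  for (; *expr; ++expr) {
    if (isdigit((unsigned char) *expr)) {
      int value = 0;
      while (isdigit((unsigned char) *expr))
        value = value * 10 + (*expr++ - '0');
      --expr;                    /* the loop header steps past the last digit */
      assert(sp < SLOTS);
      mem[sp++] = top;           /* push the old stack top */
      top = value;
    } else if (*expr == 'x') {
      assert(sp < SLOTS);
      mem[sp++] = top;
      top = x;
    } else {
      assert(sp > 0);
      second = mem[--sp];        /* pop the second operand */
      switch (*expr) {
      case '+': top = second + top; break;
      case '-': top = second - top; break;
      case '*': top = second * top; break;
      case '/': top = second / top; break;
      default:  assert(!"cannot evaluate");
      }
    }
  }
  return top;
}
```

Note that the very first operand pushes a dummy value, just as the
compiled code stores whatever happens to be in R0 at that point.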

This example allocates a fixed area for 32 @code{int}s.  This is not
a problem when the function is a leaf, as in this case; in a full-blown
compiler you will want to analyze the input and determine the number
of needed stack slots---a very simple example of register allocation.
The area is then managed like a stack using @code{stack_push} and
@code{stack_pop}.
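
As a rough sketch of such an analysis, the function below scans an
expression and reports how many @code{int} slots the memory stack
would need.  The name @code{rpn_slots} is hypothetical, and the count
assumes the scheme used above, in which every operand (including the
first one) pushes the old stack top.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Return the largest number of int slots that the memory stack holds
   while compiling expr, or -1 if an operator lacks an operand. */
int rpn_slots(const char *expr)
{
  int depth = 0, max = 0;

  assert(expr != NULL);
  while (*expr) {
    if (isdigit((unsigned char) *expr)) {
      while (isdigit((unsigned char) *expr))
        ++expr;                  /* a whole number is one operand */
      ++depth;
    } else if (*expr == 'x') {
      ++expr;
      ++depth;
    } else {
      ++expr;                    /* anything else is taken as an operator */
      if (depth < 2)
        return -1;               /* fewer than two operands available */
      --depth;                   /* an operator pops one slot */
    }
    if (depth > max)
      max = depth;
  }
  return max;
}
```

The result could then be multiplied by @code{sizeof (int)} and passed
to @code{jit_allocai} instead of the fixed size.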

Try to locate a call to @code{jit_set_ip} in the source code.  You
will not find one; this means that the client has to manually set
the instruction pointer.  This technique has one advantage and one
drawback.  The advantage is that the client can simply set the
instruction pointer once and then generate code for multiple functions,
one after another, without caring about passing a different instruction
pointer each time; see @ref{Reentrancy, , Re-entrant usage of
@lightning{}} for the disadvantage.

Source code for the client (which lies in the same source file) follows:

@example
static jit_insn codeBuffer[1024];

int main()
@{
  pifi c2f, f2c;
  int i;

  jit_set_ip(codeBuffer);
  c2f = compile_rpn("32x9*5/+");
  f2c = compile_rpn("x32-5*9/");
  jit_flush_code(codeBuffer, jit_get_ip().ptr);

  printf("\nC:");
  for (i = 0; i <= 100; i += 10) printf("%3d ", i);
  printf("\nF:");
  for (i = 0; i <= 100; i += 10) printf("%3d ", c2f(i));
  printf("\n");

  printf("\nF:");
  for (i = 32; i <= 212; i += 10) printf("%3d ", i);
  printf("\nC:");
  for (i = 32; i <= 212; i += 10) printf("%3d ", f2c(i));
  printf("\n");
  return 0;
@}
@end example

The client displays a conversion table between Celsius and Fahrenheit
degrees (both Celsius-to-Fahrenheit and Fahrenheit-to-Celsius). The
formulas are @math{F(c) = c*9/5+32} and @math{C(f) = (f-32)*5/9},
respectively.

Providing the formula as an argument to @code{compile_rpn} effectively
parameterizes code generation, making it possible to use the same code
to compile different functions; this is what makes dynamic code
generation so powerful.

The @file{rpn.c} file in the @lightning{} distribution includes a more
complete (and more complex) implementation of @code{compile_rpn},
which does constant folding and is able to assemble instructions with
an immediate parameter.  Still, it is based on the same principle and
also uses @code{allocai} to allocate space for the stack.

@node Fibonacci
@section Fibonacci numbers

The code in this section calculates a variant of the Fibonacci sequence.
While the traditional Fibonacci sequence is modeled by the recurrence
relation:
@display
     f(0) = f(1) = 1
     f(n) = f(n-1) + f(n-2)
@end display

@noindent
the functions in this section calculate the following sequence, which
is more interesting as a benchmark@footnote{That's because, as is
easily seen, the sequence represents the number of activations of the
@code{nfibs} procedure that are needed to compute its value through
recursion.}:
@display
     nfibs(0) = nfibs(1) = 1
     nfibs(n) = nfibs(n-1) + nfibs(n-2) + 1
@end display
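
The claim in the footnote can be checked with a plain C transcription
of the recurrence.  This is only a reference sketch; the names
@code{nfibs_ref}, @code{activations} and @code{check_activations} are
made up for the illustration.

```c
#include <assert.h>

static unsigned long activations;   /* calls to nfibs_ref so far */

/* Direct transcription of the recurrence. */
unsigned long nfibs_ref(unsigned n)
{
  ++activations;
  if (n < 2)
    return 1;
  return nfibs_ref(n - 1) + nfibs_ref(n - 2) + 1;
}

/* Compute nfibs(n) from a fresh counter and check that the value
   equals the number of activations that were needed. */
void check_activations(unsigned n)
{
  unsigned long value;

  activations = 0;
  value = nfibs_ref(n);
  assert(value == activations);
}
```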

The purpose of this example is to introduce branches.  There are two
kinds of branches: backward branches and forward branches.  We'll
present the calculation in a recursive and iterative form; the
former only uses forward branches, while the latter uses both.

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi      nfibs = (pifi) (jit_set_ip(codeBuffer).iptr);
  int       in;                 @rem{/* offset of the argument */}
  jit_insn  *ref;               @rem{/* to patch the forward reference */}

        jit_prolog   (1);
  in =  jit_arg_ui   ();
        jit_getarg_ui(JIT_V0, in);              @rem{/* V0 = n */}
  ref = jit_blti_ui  (jit_forward(), JIT_V0, 2);
        jit_subi_ui  (JIT_V1, JIT_V0, 1);       @rem{/* V1 = n-1 */}
        jit_subi_ui  (JIT_V2, JIT_V0, 2);       @rem{/* V2 = n-2 */}
        jit_prepare(1);
          jit_pusharg_ui(JIT_V1);
        jit_finish(nfibs);
        jit_retval(JIT_V1);                     @rem{/* V1 = nfibs(n-1) */}
        jit_prepare(1);
          jit_pusharg_ui(JIT_V2);
        jit_finish(nfibs);
        jit_retval(JIT_V2);                     @rem{/* V2 = nfibs(n-2) */}
        jit_addi_ui(JIT_V1,  JIT_V1,  1);
        jit_addr_ui(JIT_RET, JIT_V1, JIT_V2);   @rem{/* RET = V1 + V2 + 1 */}
        jit_ret();

  jit_patch(ref);                               @rem{/* patch jump */}
        jit_movi_i(JIT_RET, 1);                 @rem{/* RET = 1 */}
        jit_ret();

  @rem{/* call the generated code\, passing 32 as an argument */}
  jit_flush_code(codeBuffer, jit_get_ip().ptr);
  printf("nfibs(%d) = %d", 32, nfibs(32));
  return 0;
@}
@end example

As said above, this is the first example of dynamically compiling
branches.  Branch instructions have three operands: two contain the
values to be compared, while the first is a @dfn{label}; @lightning{}
labels are represented as @code{jit_insn *} values.  Unlike other
instructions (apart from @code{arg}, which is actually a directive
rather than an instruction), branch instructions also return a value
which, as we see in the example above, can be used to compile
forward references.

Compiling a forward reference is a two-step operation.  First, a
branch is compiled with a dummy label, since the actual destination
of the jump is not yet known; the dummy label is returned by the
@code{jit_forward()} macro.  The value returned by the branch
instruction is saved to be used later.

Then, when the destination of the jump is reached, another macro
is used, @code{jit_patch()}. This macro must be called once for
@strong{every} point in which the code had a forward branch to the
instruction following @code{jit_patch} (in this case a @code{movi_i}
instruction).
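
The two steps can be sketched on a toy instruction buffer.  Everything
below (the opcodes, @code{emit}, @code{patch}, @code{FORWARD}) is
invented for the illustration and says nothing about how @lightning{}
encodes branches; it only shows a branch being emitted with a dummy
target and fixed up once the destination is known.

```c
#include <assert.h>

/* Toy instruction set, invented for this sketch. */
enum opcode { OP_BLTI, OP_ADDI, OP_MOVI, OP_RET };

struct insn { enum opcode op; int a, b; };

static struct insn code[16];
static int pc;                        /* index of the next free slot */

#define FORWARD (-1)                  /* dummy label for forward branches */

/* Append an instruction and return its index, so that a branch can be
   patched later. */
static int emit(enum opcode op, int a, int b)
{
  assert(pc < 16);
  code[pc].op = op;
  code[pc].a = a;
  code[pc].b = b;
  return pc++;
}

/* Make the branch at index ref jump to the next emitted instruction. */
static void patch(int ref)
{
  assert(code[ref].op == OP_BLTI && code[ref].a == FORWARD);
  code[ref].a = pc;
}

/* Build f(n) = n < 2 ? 1 : n + 10. */
void build(void)
{
  int ref;

  pc = 0;
  ref = emit(OP_BLTI, FORWARD, 2);    /* step 1: branch to a dummy label */
        emit(OP_ADDI, 10, 0);
        emit(OP_RET, 0, 0);
  patch(ref);                         /* step 2: destination is now known */
        emit(OP_MOVI, 1, 0);
        emit(OP_RET, 0, 0);
}

/* Interpret the buffer with a single accumulator that starts as n. */
int run(int n)
{
  int ip = 0, acc = n;

  for (;;) {
    struct insn i = code[ip++];
    switch (i.op) {
    case OP_BLTI: if (acc < i.b) ip = i.a; break;
    case OP_ADDI: acc += i.a; break;
    case OP_MOVI: acc = i.a; break;
    case OP_RET:  return acc;
    }
  }
}
```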

Now, here is the iterative version:

@example
#include <stdio.h>
#include "lightning.h"

static jit_insn codeBuffer[1024];

typedef int (*pifi)(int);       @rem{/* Pointer to Int Function of Int */}

int main()
@{
  pifi     nfibs = (pifi) (jit_set_ip(codeBuffer).iptr);
  int      in;                  @rem{/* offset of the argument */}
  jit_insn *ref;                @rem{/* to patch the forward reference */}
  jit_insn *loop;               @rem{/* start of the loop */}

        jit_leaf     (1);
  in =  jit_arg_ui   ();
        jit_getarg_ui(JIT_R2, in);              @rem{/* R2 = n */}
        jit_movi_ui  (JIT_R1, 1);
  ref = jit_blti_ui  (jit_forward(), JIT_R2, 2);
        jit_subi_ui  (JIT_R2, JIT_R2, 1);
        jit_movi_ui  (JIT_R0, 1);

  loop= jit_get_label();
        jit_subi_ui  (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
        jit_addr_ui  (JIT_V0, JIT_R0, JIT_R1);  @rem{/* V0 = R0 + R1 */}
        jit_movr_ui  (JIT_R0, JIT_R1);          @rem{/* R0 = R1 */}
        jit_addi_ui  (JIT_R1, JIT_V0, 1);       @rem{/* R1 = V0 + 1 */}
        jit_bnei_ui  (loop, JIT_R2, 0);         @rem{/* if (R2) goto loop; */}

  jit_patch(ref);                               @rem{/* patch forward jump */}
        jit_movr_ui  (JIT_RET, JIT_R1);         @rem{/* RET = R1 */}
        jit_ret      ();

  @rem{/* call the generated code\, passing 36 as an argument */}
  jit_flush_code(codeBuffer, jit_get_ip().ptr);
  printf("nfibs(%d) = %d", 36, nfibs(36));
  return 0;
@}
@end example

This code calculates the recurrence relation using iteration (a
@code{for} loop in high-level languages).  There is still a forward
reference (indicated by the @code{jit_forward}/@code{jit_patch} pair);
there are no function calls anymore: instead, there is a backward
jump (the @code{bnei} at the end of the loop).
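
For comparison, here is a plain C rendering of the same steps, with one
variable per register.  It is only a reading aid (the function name
@code{nfibs_iter} is made up), but it makes the two branches easy to
spot.

```c
#include <assert.h>

/* Same steps as the generated code; r0, r1, r2 and v0 stand for the
   registers of the same name. */
unsigned nfibs_iter(unsigned n)
{
  unsigned r0, r1, r2, v0;

  r2 = n;
  r1 = 1;
  if (r2 < 2)                /* the forward branch */
    return r1;
  r2 = r2 - 1;
  r0 = 1;
  assert(r2 > 0);            /* the loop body runs n - 1 times */
  do {
    r2 = r2 - 1;             /* decrement the counter */
    v0 = r0 + r1;
    r0 = r1;
    r1 = v0 + 1;
  } while (r2 != 0);         /* the backward branch */
  return r1;
}
```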

In this case, the destination address is already known, because the
jump lands on an instruction that has already been compiled.
However, the program must still remember the address
where the jump will land.  This is achieved with @code{jit_get_label},
yet another macro that is very similar to @code{jit_get_ip} but,
instead of a @code{jit_code} union, returns a @code{jit_insn *}
that the branch macros accept.

Now, let's make one more change: let's rewrite the loop like this:

@example
  @r{@dots{}}

  jit_delay(
        jit_movi_ui  (JIT_R1, 1),
  ref = jit_blti_ui  (jit_forward(), JIT_R2, 2));
        jit_subi_ui  (JIT_R2, JIT_R2, 1);

  loop= jit_get_label();
        jit_subi_ui  (JIT_R2, JIT_R2, 1);       @rem{/* decr. counter */}
        jit_addr_ui  (JIT_V0, JIT_R0, JIT_R1);  @rem{/* V0 = R0 + R1 */}
        jit_movr_ui  (JIT_R0, JIT_R1);          @rem{/* R0 = R1 */}
  jit_delay(
        jit_addi_ui  (JIT_R1, JIT_V0, 1),       @rem{/* R1 = V0 + 1 */}
        jit_bnei_ui  (loop, JIT_R2, 0));        @rem{/* if (R2) goto loop; */}

  @r{@dots{}}
@end example

The @code{jit_delay} macro is used to schedule delay slots in jumps and
branches.  This is optional, but might lead to performance improvements
in tight inner loops (of course not in a loop that is executed 35
times, but this is just an example).

@code{jit_delay} takes two @lightning{} instructions, a @dfn{delay
instruction} and a @dfn{branch instruction}.  Note that the two
instructions must be written in execution order (first the delay
instruction, then the branch instruction), @strong{not} with the branch
first.  If the current machine has a delay slot, the delay instruction
(or part of it) is placed in the delay slot after the branch
instruction; otherwise, it emits the delay instruction before the branch
instruction.  The delay instruction must not depend on being executed
before or after the branch.
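
To see why the branch can still return a usable value, consider how
such a macro could look on a machine @emph{without} delay slots.  The
definition of @code{toy_delay} below is an assumption made for this
sketch, not @lightning{}'s actual definition, and the two emitters are
hypothetical stand-ins that only record the order of emission.

```c
#include <assert.h>
#include <string.h>

static char trace[64];               /* order in which code was emitted */

/* Hypothetical emitters: they only record a letter in the trace. */
static int emit_delay_insn(void) { strcat(trace, "D"); return 0; }
static int emit_branch(void)     { strcat(trace, "B"); return 42; }

/* Without delay slots, emitting the two instructions in the order
   written is enough; the comma operator keeps the branch's value. */
#define toy_delay(insn, branch) ((insn), (branch))

int demo(void)
{
  int ref;

  trace[0] = '\0';
  toy_delay(emit_delay_insn(), ref = emit_branch());
  assert(strcmp(trace, "DB") == 0);  /* delay instruction comes first */
  return ref;                        /* value returned by the branch */
}
```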

Instead of @code{jit_patch}, you can use @code{jit_patch_at}, which
takes two arguments: the first is the same as for @code{jit_patch}, and
the second is the value to be patched in.  In other words, these two
invocations have the same effect:

@example
  jit_patch (jump_pc);
  jit_patch_at (jump_pc, jit_get_label ());
@end example

Dual to branches and @code{jit_patch_at} are @code{jit_movi_p}
and @code{jit_patch_movi}, which can also be used to implement
forward references.  @code{jit_movi_p} is carefully implemented
to use an encoding that is as long as possible, so that it can
always be patched; in addition, like branches, it will return
an address which is then passed to @code{jit_patch_movi}.  The
usage of @code{jit_patch_movi} is similar to @code{jit_patch_at}.

@node Reentrancy
@chapter Re-entrant usage of @lightning{}

By default, @lightning{} is able to compile different functions at the
same time as long as it happens in different object files, and on the
other hand constrains code generation tasks to reside in a single
object file.

The reason for this is not apparent, but is easily explained:
the @file{lightning.h} header file defines its state as a
@code{static} variable, so calls to @code{jit_set_ip} and
@code{jit_get_ip} residing in different files access different
instruction pointers.  This was not done without reason: it makes
the usage of @lightning{} much simpler, as it limits the initialization
tasks to the bare minimum and removes the need to link the program
with a separate library.

On the other hand, multi-threaded or otherwise concurrent programs
require reentrancy in the code generator, so this approach cannot be
the only one.  In fact, it is possible to define your own copy of
@lightning{}'s instruction state by defining a variable of type
@code{jit_state} and @code{#define}-ing @code{_jit} to it:

@example
    struct jit_state lightning;
    #define _jit lightning
@end example

You are free to define the @code{jit_state} variable as you like:
@code{extern}, @code{static} to a function, @code{auto}, or global.

This feature takes advantage of an aspect of macros (@dfn{cascaded
macros}), which is documented thus in @acronym{CPP}'s reference manual:

@quotation
A cascade of macros is when one macro's body contains a reference to
another macro.  This is very common practice.  For example,
@example
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
@end example
This is not at all the same as defining @code{TABLESIZE} to be
@samp{1020}.  The @code{#define} for @code{TABLESIZE} uses exactly the
body you specify---in this case, @code{BUFSIZE}---and does not check to
see whether it too is the name of a macro; it's only when you use
@code{TABLESIZE} that the result of its expansion is checked for more
macro names.

This makes a difference if you change the definition of @code{BUFSIZE}
at some point in the source file. @code{TABLESIZE}, defined as shown,
will always expand using the definition of @code{BUFSIZE} that is
currently in effect:
@example
#define BUFSIZE 1020
#define TABLESIZE BUFSIZE
#undef BUFSIZE
#define BUFSIZE 37
@end example

Now @code{TABLESIZE} expands (in two stages) to @samp{37}. (The @code{#undef}
is to prevent any warning about the nontrivial redefinition of
@code{BUFSIZE}.)
@end quotation

@noindent
In the same way, @code{jit_get_label} will adopt whatever definition of
@code{_jit} is in effect:
@example
#define	jit_get_label()			(_jit.pc)
@end example
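
The effect can be reproduced in a few lines of plain C.  The sketch
below does not use @lightning{}: @code{struct toy_state},
@code{toy_get_label} and @code{toy_emit} are stand-ins invented for the
illustration, and only the name @code{_jit} is borrowed from the text
above.

```c
#include <assert.h>

/* Stand-in for struct jit_state; the field name mirrors _jit.pc. */
struct toy_state { int pc; };

/* "Header" part: written once, refers to whatever _jit names later. */
#define toy_get_label()  (_jit.pc)
#define toy_emit()       ((void) ++_jit.pc)

static struct toy_state first, second;

void emit_into_first(void)
{
#define _jit first
  toy_emit();
  toy_emit();
  assert(toy_get_label() == first.pc);
#undef _jit
}

void emit_into_second(void)
{
#define _jit second
  toy_emit();
  assert(toy_get_label() == second.pc);
#undef _jit
}
```

Each function updates only the state that @code{_jit} named at the
point where the macros were used.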

Special care must be taken when functions residing in separate files
must access the same state.  This could be the case, for example, if a
special library contained a function for strength reduction of
multiplications to adds and shifts, or maybe of divisions to
multiplications and shifts.  The function would be compiled using a
single definition of @code{_jit}, and that definition would be used
whenever the function is called.

Since @lightning{} uses a feature of the preprocessor to obtain
re-entrancy, it makes sense to rely on the preprocessor in this case
too.

The idea is to pass the current @code{struct jit_state} to the
function:

@example
void
_opt_muli_i(struct jit_state *jit, int dest, int source, int n)
@{
#define _jit          jit
@dots{}
#undef _jit
@}
@end example

@noindent
doing this unbeknownst to the client, using a macro in the header file:

@example
extern void _opt_muli_i(struct jit_state *, int, int, int);

#define opt_muli_i(rd, rs, n)	_opt_muli_i(&_jit, (rd), (rs), (n))
@end example
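
The following self-contained sketch puts the pieces together in a
single file.  It does not generate code: the hypothetical
@code{toy_mul_helper} computes the product directly with shifts and
adds and merely counts the instructions a real helper would emit, so
that the state being updated is observable.  All names except
@code{_jit} are invented for the illustration.

```c
#include <assert.h>

/* Stand-in for struct jit_state: it only counts emitted instructions. */
struct toy_state { int emitted; };

/* "Library" part: compiled once, reaches the state through st. */
unsigned toy_mul_helper(struct toy_state *st, unsigned x, unsigned n)
{
#define _jit (*st)
  unsigned result = 0;
  int bit;

  assert(st != 0);
  for (bit = 0; n != 0; n >>= 1, bit++)
    if (n & 1) {
      result += x << bit;      /* one shift and one add per set bit of n */
      _jit.emitted += 2;
    }
  return result;
#undef _jit
}

/* "Header" part: forwards the _jit that is in effect at the call site. */
#define toy_mul(x, n)  toy_mul_helper(&_jit, (x), (n))

/* "Client" part, with its own state. */
static struct toy_state client_state;
#define _jit client_state

unsigned times_ten(unsigned x)
{
  return toy_mul(x, 10);       /* becomes toy_mul_helper(&client_state, x, 10) */
}
```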


@node Registers
@chapter Accessing the whole register file

As mentioned earlier, all @lightning{} back-ends are
guaranteed to have at least six general-purpose integer registers and
six floating-point registers, but many back-ends will have more.

To access the entire register file, you can use the
@code{JIT_R}, @code{JIT_V} and @code{JIT_FPR} macros.  They
accept a parameter that identifies the register number, which
must be strictly less than @code{JIT_R_NUM}, @code{JIT_V_NUM}
and @code{JIT_FPR_NUM} respectively; the number need not be
constant.  Of course, expressions like @code{JIT_R0} and
@code{JIT_R(0)} denote the same register, and likewise for
integer callee-saved, or floating-point, registers.

@node Bundling GNU lightning
@chapter Using @lightning{} in your programs

It is very easy to include @lightning{}'s source code (without the
documentation and examples) into your program's distribution
so that people don't need to have it installed in order to use it.

Here is a step-by-step explanation of what to do:

@enumerate
@item Run @command{lightningize} from your package's main
distribution directory.
@example
     lightningize
@end example

@noindent
This will copy the source code for the @lightning{} back ends
into the @file{lightning} directory of your package.

@item If you're using Automake, you might be pleased to know that
@file{Makefile.am} files will be already there.

If, instead, you're not using Automake and @command{aclocal},
you should delete the @file{Makefile.am} files (they are of no use
to you) and copy the contents of the @file{lightning.m4} file, found in
@command{aclocal}'s macro repository (usually @file{/usr/share/aclocal}),
to your @file{configure.in}, @file{acinclude.m4} or @file{aclocal.m4} file.

@item Include a call to the @code{LIGHTNING_CONFIGURE_IF_NOT_FOUND}
macro in your @file{configure.in} file.
@end enumerate

@code{LIGHTNING_CONFIGURE_IF_NOT_FOUND} will first look for a
pre-installed copy of @lightning{} and, if it can be found, it will
use it; otherwise, it will test if there is a back-end for the host
system.  If @lightning{} is already installed, or if the system is
supported by @lightning{}, it will define the @code{HAVE_LIGHTNING}
symbol.

In addition, an Automake conditional named @code{HAVE_INSTALLED_LIGHTNING}
will be set if @lightning{} is already installed, which can be used to
set up include paths appropriately.

Finally, @code{LIGHTNING_CONFIGURE_IF_NOT_FOUND} accepts two
optional parameters: respectively, an action to be taken if @lightning{}
is available, and an action to be taken if it is not.