* check/alu_rsb.ok, check/alu_rsb.tst: New files implementing
tests for jit_rsb*.
* check/Makefile.am, check/lightning.c, include/lightning.h,
lib/jit_aarch64-cpu.c, lib/jit_aarch64-fpu.c, lib/jit_aarch64-sz.c,
lib/jit_aarch64.c, lib/jit_alpha-cpu.c, lib/jit_alpha-fpu.c,
lib/jit_alpha-sz.c, lib/jit_alpha.c, lib/jit_arm-cpu.c,
lib/jit_arm-swf.c, lib/jit_arm-sz.c, lib/jit_arm-vfp.c,
lib/jit_arm.c, lib/jit_hppa-cpu.c, lib/jit_hppa-fpu.c,
lib/jit_hppa-sz.c, lib/jit_hppa.c, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64-sz.c, lib/jit_ia64.c,
lib/jit_mips-cpu.c, lib/jit_mips-fpu.c, lib/jit_mips-sz.c,
lib/jit_mips.c, lib/jit_names.c, lib/jit_ppc-cpu.c,
lib/jit_ppc-fpu.c, lib/jit_ppc-sz.c, lib/jit_ppc.c,
lib/jit_s390x-cpu.c, lib/jit_s390x-fpu.c, lib/jit_s390x-sz.c,
lib/jit_s390x.c, lib/jit_sparc-cpu.c, lib/jit_sparc-fpu.c,
lib/jit_sparc-sz.c, lib/jit_sparc.c, lib/jit_x86-cpu.c,
lib/jit_x86-sse.c, lib/jit_x86-sz.c, lib/jit_x86-x87.c,
lib/jit_x86.c, lib/lightning.c: Implement jit_rsb*. This
was a missing lightning 1.x interface, that on most
backends is synthesized, but on a few backends (hppa and ia64),
it can generate better code as on those there is, or the
only instruction with an immediate is in "rsb" format
(left operand).
* include/lightning.h, include/lightning/jit_private.h,
lib/jit_aarch64-cpu.c, lib/jit_alpha-cpu.c, lib/jit_arm-cpu.c,
lib/jit_hppa-cpu.c, lib/jit_ia64-cpu.c, lib/jit_mips-cpu.c,
lib/jit_ppc-cpu.c, lib/jit_s390x-cpu.c, lib/jit_sparc-cpu.c,
lib/jit_x86-cpu.c, lib/lightning.c: Implement the new
jit_frame and jit_tramp interfaces, that allow writing
trampoline like calls, where a single dispatcher jit buffer
is written, and later other jit buffers are created, with
the same stack frame layout as the dispatcher. This is the
logic that GNU Smalltalk used in lightning 1.x, and is required
to make a sane port for lighting 2.x.
* jit_ia64-cpu.c: Implement support for jit_frame and jit_tramp,
and also correct wrong encoding for B4 instructions, that
implement jmpr, as well as correct reverse logic in _jmpr,
that was moving the branch register to the jump register,
and not vice-versa.
Also, if a stack frame is to be assumed, always assume it may
call a function with up to 8 arguments, regardless of the
hint frame argument.
* lib/jit_arm.c: Add a new must_align_p() interface to ensure
function prologs are always aligned. This condition was
previously always true, somewhat by accident, but with
jit_tramp it is not guaranteed.
* jit_ia64-cpu.c: lib/jit_ppc.c: Add minor special handling
required to implement jit_tramp, where a function descriptor
should not be added before a prolog, as jit_tramp means omit
prolog.
* check/lightning.c: Update test driver for the new interfaces.
* check/Makefile.am, check/tramp.tst, check/tramp.ok: Add
a simple test and example of the jit_frame and jit_tramp
usage implementing a simple Fibonacci function using a
simulation of an interpreter stack and how it would handle
state in language specific variables.
* doc/body.texi: Add documentation for jit_frame and
jit_tramp.
* lib/jit_aarch64-cpu.c, lib/jit_aarch64-fpu.c,
lib/jit_arm-cpu.c, lib/jit_arm-vfp.c,
lib/jit_hppa-cpu.c, lib/jit_hppa-fpu.c,
lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c,
lib/jit_mips-cpu.c, lib/jit_mips-fpu.c,
lib/jit_ppc-cpu.c, lib/jit_ppc-fpu.c,
lib/jit_s390x-cpu.c, lib/jit_s390x-fpu.c,
lib/jit_s390x.c, lib/jit_sparc-cpu.c,
lib/jit_x86-cpu.c, lib/jit_x86-sse.c,
lib/jit_x86-x87.c: Review generation of all branch
instructions and always adds the jit_class_nospill
bitfield for temporary registers that cannot be spilled
because the reload would be after a conditional jump; the
patch only adds an extra assertion. These conditions do
not happen on documented lightning usage, but can happen
if one uses the not exported jit_get_reg and jit_unget_reg
calls and cause enough register starvation.
* lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c: Correct some
off by one range checks (that were only accepting values
one less than the maximum allowed) and an invalid test
condition check that was forcing it to always use
indirect jumps even when reachable with an immediate
displacement.
* configure.ac: Force -mlp64 to CFLAGS on HP-UX ia64 port.
It is the only supported mode, and expects gcc as C compiler.
* include/lightning.h, lib/jit_ia64-cpu.c, lib/jit_ia64.c:
Correct ia64 port to work on HP-UX that runs it in big endian
mode.
* check/varargs.tst: Correct misplaced .align directive
that was causing the double buffer to not be aligned at
8 bytes.
* lib/jit_ia64-cpu.c:
Properly implement abi for excess arguments passed on
stack.
Simplify load/store with immediate displacement argument
with zero value.
Simplify some calls to "subi" changing to "addi" with
a negative argument.
Remove some #if 0'ed code, that could be useful in
special conditions, but the most useful one would be
to "optimize" "static" jit functions, but for the sake
of simplicity, jit functions are implemented in a way
that can be passed back to C code as C function pointers.
Add an attribute to prototypes of several unused functions.
These functions are defined for the sake of implementing all
Itanium documented instructions, but a significant amount of
them is not used by lightning.
* lib/jit_ia64-fpu.c: Simplify load/store with zero immediate
displacement and add unused attribute for functions not used
by lightning, but required to provide macros implementing all
Itanium documented instructions.
* lib/jit_ia64.c: Update for the properly implemented abi
for stack arguments.
* lib/lightning.c: Mark an unused function as such.
lib/jit_ia64-cpu.c:
Correct immediate range check of integer comparisons when
inverting arguments.
Correct gei_u that was not decrementing immediate when
inverting arguments.
Correct b?add* and b?sub* that were not properly updating
the result register.
* lib/jit_ia64-cpu.c: Correct wrong mapping of 2 instructions
in "M-, stop, M-, stop" translation, that was ignoring the
last stop (implemented as a nop I- stop).
* lib/jit_ia64-fpu.c: Properly implement fnorm.s and fnorm.d,
as well as the proper integer to float or double conversion.
* lib/jit_ia64-cpu.c: Correct bogus implementation of ldr_T
for signed integers, that was using ld1.s, ld2.s and ld4.s.
The ".s" stands for speculative load, not sign extend.
* lib/jit_ia64-fpu.c: Correct bogus implementation of ldxr_T
for float and double. The third (actually, second) argument
is indeed added to the base register, but the base register
is modified. The actual M7 implementation was already correct,
just the ldxr_f and ldxr_d implementation that was kept in
a prototype state, misinterpreting what M7 does.
* lib/jit_ia64-cpu.c: Correct X2 pattern matching by preventing
it to attempt to require a stop between the L and the X
instruction; that is, check the registers and predicates
before emitting the L instruction, not after.
* lib/jit_ia64-fpu.c: Slightly simplify and correct
divr_f and divrd_d implementation.
* check/lightning.c: Add __ia64__ preprocessor define
on Itanium.
* check/alu.inc, check/clobber.tst, check/float.tst: Define
several macros conditionally to __ia64__. This is required
because __ia64__ jit generation can use way too many memory,
due to not implementing instruction reordering to avoid
as much as possible "stops", what causes way too many nops
to be generated, as well as the fact that division and
remainder requires function calls, and float division
requires significant code to implement.
* include/lightning.h: Add new backend specific movr_w_d,
movr_d_w and movi_d_w codes as helpers to ia64 varargs
functions arguments.
* lib/jit_ia64-cpu.c:
Correct wrong encoding of A5 small integers.
Correct define of "mux" instruction modifiers.
Correct ordering of arguments and predicates of cmp_xy
implementation with immediate arguments; like most other
codes with an immediate, the immediate is the second, not
the third argument.
* lib/jit_ia64-fpu.c: Actual implementation of the code
to move to/from gpr to/from fpr, to implement varargs abi.
* lib/jit_ia64.c: Make fpr argument registers not allocatable
as temporaries, no need for the extra checks when there are
plenty registers.
* lib/jit_print.c, lib/lightning.c: Minor updates for the
new movr_w_d, movr_d_w and movi_d_w codes.
* lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c: Correct code to
also insert a stop to break an instruction group if a
register is written more than once in the same group.
This may happen if a register is argument and result of
some lightning call (not a real instruction). The most
common case should be code in the pattern:
movl rn=largenum
...
mov rn=smallnum
where "rn" would end up holding "largenum".
But the problem possibly could happen in other circumstances.
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c:
Relocate JIT_Rn registers to the local registers, as, like
float registers, div/rem and sqrt are implemented as function
calls, and may overwrite non saved scratch registers.
Change patch_at to receive a jit_code_t instead of a
jit_node_t, so that it is easier to "inline" patches when
some instruction requires complex code to implement, e.g.
uneq and ltgt.
Correct arguments to FMA and FMA like instructions that,
due to a cut&paste error were passing the wrong argument
to the related F- implementation function.
Rewrite ltgt to return the proper result if one (or both)
of the arguments is unordered.
* include/lightning/jit_ia64.h, include/lightning/jit_private.h,
lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c, lib/jit_ia64.c,
lib/lightning.c: Rework code to detect need of a "stop" to
also handle predicates, as if a predicate is written, it
cannot be read in the same instruction group.
Use a single jit_regset_t variable for all registers when
checking need for a stop (increment value by 128 for
float registers).
Correct wrong "subi" implementation, as the code executed
is r0=im-r1, not r0=r1-im.
Use standard lightning 6 fpr registers, and rework to
use callee save float registers, that may be spill/reloaded
in prolog/epilog. This is required because some jit
instructions implementations need to call functions; currently
integer div/mod and float sqrt, what may change the value of
scratch float registers.
Rework point of "sync" of branches that need to return a
patch'able address, because the need for a "stop" before a
predicate read causes all branches to be the instruction
in slot 0, as there is no template to "stop" and branch
in the same instruction "bundle".
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c: New files implementing
the basic infrastructure of an Itanium port. The code
compiles and can generate jit for basic hello world like
functions.
* check/lightning.c, configure.ac, include/lightning.h,
include/lightning/Makefile.am, include/lightning/jit_private.h,
lib/Makefile.am, lib/lightning.c: Update for the Itanium
port.
* lib/jit_mips-cpu.c, lib/jit_mips.c: Correct typo and
make the jit_carry register local to the jit_state_t.
This matches code reviewed in the Itanium port, that
should use the same base logic to handle carry/borrow.