* lib/jit_arm-cpu.c, lib/jit_arm.c: Correct wrong test and update
of the thumb offset information when checking whether a jump from
arm to thumb mode needs to be patched. The problem happened when
the code buffer was remapped to an address lower than the
previous one.
* configure.ac: Extend FreeBSD test to also handle NetBSD.
* lib/jit_x86-cpu.c: Correct wrongly defined offset type of
ldxi_ui. Problem detected when building on NetBSD.
* lib/lightning.c: Adjust code to handle NetBSD mremap,
where arguments do not match Linux mremap.
* lib/jit_ppc.c: Correct C sequence point problem miscalculating
the actual function address in a function descriptor. Problem
happens with gcc 4.8.1 at least.
* include/lightning/jit_s390x.h, lib/jit_s390x-cpu.c,
lib/jit_s390x-fpu.c, lib/jit_s390x.c: New files
implementing the new s390x port.
* configure.ac, include/lightning.h,
include/lightning/Makefile.am,
include/lightning/jit_private.h,
lib/Makefile.am, lib/jit_disasm.c, lib/lightning.c:
Minor adaptation for the new s390x backend.
* check/float.tst: Update for the s390x result of
truncating +Inf to integer.
* check/qalu_mul.tst: Add extra test cases to better test
high word of signed multiplication as the result is
adjusted from unsigned multiplication on s390x.
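As an illustration of the adjustment mentioned above, here is a minimal sketch that derives the high word of a signed product from the unsigned one. It uses 32-bit operands for portability. The function names are hypothetical and this is not the s390x backend code.

```c
#include <assert.h>
#include <stdint.h>

/* High 32 bits of the unsigned 64-bit product a * b. */
static uint32_t umul_hi32(uint32_t a, uint32_t b)
{
    return (uint32_t)(((uint64_t)a * b) >> 32);
}

/* High 32 bits of the signed product, derived from the unsigned one.
 * Reading a negative operand as unsigned adds 2^32 to it, so the
 * unsigned high word is too large by the other operand's bit pattern;
 * both corrections are applied modulo 2^32. */
static uint32_t smul_hi32_from_unsigned(int32_t a, int32_t b)
{
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t hi = umul_hi32(ua, ub);

    if (a < 0)
        hi -= ub;
    if (b < 0)
        hi -= ua;
    return hi;
}

/* Reference result computed directly in 64-bit signed arithmetic. */
static uint32_t smul_hi32_reference(int32_t a, int32_t b)
{
    return (uint32_t)((uint64_t)((int64_t)a * (int64_t)b) >> 32);
}

/* Compare the derived result against the reference for one pair. */
static void check_pair(int32_t a, int32_t b)
{
    assert(smul_hi32_from_unsigned(a, b) == smul_hi32_reference(a, b));
}
```

Mixed-sign and extreme operands (for example INT32_MIN times INT32_MIN) are the cases where a missing correction would show up.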
* check/lightning.c: Do not assume casting a double NaN or
Inf to float will produce the expected float NaN or Inf.
This is not true at least under s390x.
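A minimal sketch of one way to avoid depending on a double to float cast: build the float special values directly from the standard <math.h> macros. The helper names are hypothetical and do not come from check/lightning.c.

```c
#include <assert.h>
#include <math.h>

/* Build float special values from the <math.h> macros instead of
 * narrowing a double NaN or Inf with a cast. */
static float float_nan(void)     { return NAN; }
static float float_pos_inf(void) { return INFINITY; }
static float float_neg_inf(void) { return -INFINITY; }

/* Classify a float without looking at its bit pattern. */
static int float_is_special(float f)
{
    return isnan(f) || isinf(f);
}
```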
* check/check.arm.sh, check/check.sh, check/check.swf.sh,
check/check.x87.sh: Properly check test programs output,
not just rely on the test program self testing the results
and not crashing.
* include/lightning/jit_aarch64.h, lib/jit_aarch64-cpu.c,
lib/jit_aarch64-fpu.c, lib/jit_aarch64.c: New files
implementing the new aarch64 port, as a new architecture,
not as an expansion of the existing armv[4-7] port.
* check/lightning.c: Add aarch64 support and a small
change to recognize character constants as immediate
values.
* check/float.tst: Add aarch64 preprocessor conditionals
to select proper expected value when converting [+-]Inf
and NaN to integer.
* include/lightning/jit_arm.h, lib/jit_arm.c: Minor changes
to better match the new aarch64 files.
* configure.ac, include/lightning.h,
include/lightning/Makefile.am, include/lightning/jit_private.h,
lib/Makefile.am, lib/lightning.c: Minor adjustments
for the aarch64 port.
* lib/jit_mips.c: Correct cut&paste error that caused wrong
stack offset calculation for double arguments in stack in
the o32 abi.
Correct typo in the __LITTLE_ENDIAN macro name, which was
cut&pasted along with the original typo in lib/jit_ppc.c.
* lib/jit_ia64.c, lib/jit_ppc.c: Correct typo in the
__LITTLE_ENDIAN macro name.
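A mistyped macro name in an #if test is silent, because an undefined identifier evaluates to 0. The sketch below shows a run time cross-check that would expose such a typo. It assumes the GCC/Clang predefined __BYTE_ORDER__ macros, and the function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Detect byte order at run time by inspecting the first byte of a
 * known 32-bit value, so no macro name can be mistyped. */
static int is_little_endian(void)
{
    const uint32_t probe = 1;
    unsigned char first;

    memcpy(&first, &probe, 1);
    return first == 1;
}

/* Cross-check against the compiler-provided macros when available
 * (assumption: GCC or Clang style __BYTE_ORDER__ definitions). */
static void check_byte_order(void)
{
#if defined(__BYTE_ORDER__) && defined(__ORDER_LITTLE_ENDIAN__)
    assert(is_little_endian() ==
           (__BYTE_ORDER__ == __ORDER_LITTLE_ENDIAN__));
#endif
}
```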
* check/lightning.c, configure.ac, include/lightning.h,
lib/lightning.c: Add tests and quirks to build/detect
and/or work on Irix.
* include/lightning/jit_mips.h, lib/jit_mips-cpu.c,
lib/jit_mips-fpu.c, lib/jit_mips.c: Adapt code to run
in big endian mips, using the n32 abi.
* lib/jit_sparc-cpu.c: Correct compiler warning of value
used before assignment. The usage is bogus as the api
requires always patching jumps, but the random value used
could cause an assertion due to invalid displacement.
* lib/jit_sparc.c: Always load and store double arguments
in stack as 2 float loads or stores, for safety, as unaligned
access is not allowed in Sparc Solaris.
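The idea of moving a double as two 32-bit halves can be sketched in portable C as follows. This is only an illustration with hypothetical names, not the code emitted by lib/jit_sparc.c, and it assumes an 8 byte double.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Read a double from an address that is only known to be 4-byte
 * aligned by moving it as two 32-bit halves. */
static double load_double_as_words(const void *addr)
{
    const unsigned char *bytes = addr;
    uint32_t halves[2];
    double value;

    assert(sizeof(double) == 2 * sizeof(uint32_t));
    assert(((uintptr_t)addr & 3) == 0);
    memcpy(&halves[0], bytes, sizeof halves[0]);
    memcpy(&halves[1], bytes + 4, sizeof halves[1]);
    memcpy(&value, halves, sizeof value);
    return value;
}

/* Write a double to a 4-byte aligned address as two 32-bit halves. */
static void store_double_as_words(void *addr, double value)
{
    unsigned char *bytes = addr;
    uint32_t halves[2];

    assert(sizeof(double) == 2 * sizeof(uint32_t));
    assert(((uintptr_t)addr & 3) == 0);
    memcpy(halves, &value, sizeof value);
    memcpy(bytes, &halves[0], sizeof halves[0]);
    memcpy(bytes + 4, &halves[1], sizeof halves[1]);
}
```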
* configure.ac: Force -mlp64 to CFLAGS on HP-UX ia64 port.
It is the only supported mode, and expects gcc as C compiler.
* include/lightning.h, lib/jit_ia64-cpu.c, lib/jit_ia64.c:
Correct ia64 port to work on HP-UX that runs it in big endian
mode.
* lib/jit_hppa.c: Sanitize the cache synchronization inline
assembly code that was doing twice the work and redundantly
flushing the end address every loop iteration.
* configure.ac, check/Makefile.am, doc/Makefile.am: Do not
explicitly link to -ldl, but instead autodetect the library
with dlopen, dlsym, etc.
* check/lightning.c: Add workaround to apparently buggy
getopt in HP-UX that sets optind to the wrong index, and
use RTLD_NEXT on HP-UX instead of RTLD_DEFAULT to dlsym
global symbols.
* include/lightning.h: Rework definitions of wordsize and
byte order to detect proper values on HP-UX.
* lib/lightning.c: Minor correction to use MAP_ANONYMOUS
instead of MAP_ANON on HP-UX.
* lib/jit_hppa.c: Float arguments must be passed on integer
registers on HP-UX, not only for varargs functions.
Add code to properly clear instruction cache. This was
not required on Debian hppa port, but may have been working
by accident.
* lib/jit_hppa-cpu.c: Follow pattern of HP-UX binaries and
use bve,n instead of bv,n to return from functions.
* lib/jit_hppa-fpu.c: For some reason "fst? frX,rX,(rY)" did
not work on the tested computer (HP-UX B.11.23 U 9000/785 HP-UX)
so the code was changed, at first for __hpux only, to add the
base and offset registers and use the instruction with an
immediate (zero) offset.
Many thanks to Trent Nelson from snakebite.org for giving access to a
build farm with several different architectures and operating systems.
* check/lightning.c, lib/jit_disasm.c, lib/jit_ppc-cpu.c,
lib/jit_ppc-fpu.c, lib/jit_ppc.c, include/lightning.h,
include/lightning/jit_ppc.h, include/lightning/jit_private.h:
Adapt code to work on 32 bit AIX ppc using gcc. Most changes
are basically to adapt the elf64 logic to 32 bit, as it does
not use the same convention of 32 bit Darwin ppc.
* check/stack.tst: Add a fake memcpy function to the test
case if running under AIX, as it is not available to dlsym.
* configure.ac: Check for getopt.h header, not available in
AIX.
* include/lightning/jit_hppa.h, lib/jit_hppa-cpu.c,
lib/jit_hppa-fpu.c, lib/jit_hppa.c: New files implementing
the hppa port. Built on Debian Linux PA-RISC 2.0, 32 bit.
* check/float.tst: Add preprocessor for hppa expected
values when converting NaN and +-Inf to an integer.
* check/ldst.inc: Ensure double load/store tests use an
8 byte aligned address by default.
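For reference, rounding an address up to an 8 byte boundary can be sketched as below. The helper names are hypothetical, and the test files themselves use assembler style directives rather than C.

```c
#include <assert.h>
#include <stdint.h>

/* Round ptr up to the next multiple of alignment, which must be a
 * power of two. */
static void *align_up(void *ptr, uintptr_t alignment)
{
    uintptr_t addr = (uintptr_t)ptr;

    assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
    return (void *)((addr + alignment - 1) & ~(alignment - 1));
}

/* Return nonzero if ptr is a multiple of alignment (a power of two). */
static int is_aligned(const void *ptr, uintptr_t alignment)
{
    return ((uintptr_t)ptr & (alignment - 1)) == 0;
}
```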
* lib/lightning.c: Correct a bug found during tests in
the new port, where qmul* and qdiv* were not properly
setting one of the result registers as modified in the
function, which would be a problem if the only "write"
usage were the qmul* or qdiv*.
* check/varargs.tst, check/varargs.ok: Add one extra
interleaved integer/double test to validate proper code
generation in the extra case.
* check/lightning.c, configure.ac, include/lightning.h,
include/lightning/Makefile.am,
include/lightning/jit_private.h, lib/Makefile.am,
lib/jit_disasm.c: Update for the hppa port.
* check/varargs.tst: Correct misplaced .align directive
that was causing the double buffer to not be aligned at
8 bytes.
* lib/jit_ia64-cpu.c:
Properly implement abi for excess arguments passed on
stack.
Simplify load/store with immediate displacement argument
with zero value.
Simplify some calls to "subi" changing to "addi" with
a negative argument.
Remove some #if 0'ed code that could be useful in special
conditions. The most useful case would be to "optimize"
"static" jit functions but, for the sake of simplicity, jit
functions are implemented in a way that can be passed back
to C code as C function pointers.
Add an attribute to prototypes of several unused functions.
These functions are defined for the sake of implementing all
Itanium documented instructions, but a significant number of
them are not used by lightning.
* lib/jit_ia64-fpu.c: Simplify load/store with zero immediate
displacement and add unused attribute for functions not used
by lightning, but required to provide macros implementing all
Itanium documented instructions.
* lib/jit_ia64.c: Update for the properly implemented abi
for stack arguments.
* lib/lightning.c: Mark an unused function as such.
* lib/jit_ia64-cpu.c:
Correct immediate range check of integer comparisons when
inverting arguments.
Correct gei_u that was not decrementing immediate when
inverting arguments.
Correct b?add* and b?sub* that were not properly updating
the result register.
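The gei_u fix can be illustrated with plain C. This sketch assumes the inverted form is an unsigned "less than" with the operands swapped. The functions are hypothetical models, not the ia64 emitter.

```c
#include <assert.h>
#include <stdint.h>

/* Direct form: r >= im, unsigned. */
static int gei_u_direct(uint64_t r, uint64_t im)
{
    return r >= im;
}

/* Inverted form with operands swapped: (im - 1) < r.
 * Decrementing is only valid for im != 0; r >= 0 always holds for
 * unsigned values, so that case is handled separately. */
static int gei_u_inverted(uint64_t r, uint64_t im)
{
    if (im == 0)
        return 1;
    return (im - 1) < r;
}

/* The faulty variant: operands swapped without the decrement, which
 * computes r > im instead of r >= im. */
static int gei_u_missing_decrement(uint64_t r, uint64_t im)
{
    return im < r;
}
```

The two correct forms agree for every input, while the variant without the decrement differs exactly when r equals im.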
* lib/jit_ia64-cpu.c: Correct wrong mapping of 2 instructions
in "M-, stop, M-, stop" translation, that was ignoring the
last stop (implemented as a nop I- stop).
* lib/jit_ia64-fpu.c: Properly implement fnorm.s and fnorm.d,
as well as the proper integer to float or double conversion.
* lib/jit_ia64-cpu.c: Correct bogus implementation of ldr_T
for signed integers, that was using ld1.s, ld2.s and ld4.s.
The ".s" stands for speculative load, not sign extend.
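Since a plain load leaves the value zero extended, sign extension has to be a separate step. A minimal C sketch of that step follows. The helper is hypothetical and is not the code generated by the port.

```c
#include <assert.h>
#include <stdint.h>

/* Sign extend the low `bits` bits of a zero-extended value, as a
 * separate step after a plain load. */
static int64_t sign_extend(uint64_t value, unsigned bits)
{
    uint64_t sign, mask;

    assert(bits >= 1 && bits <= 63);
    sign = (uint64_t)1 << (bits - 1);
    mask = (sign << 1) - 1;
    value &= mask;
    /* (value ^ sign) - sign maps [sign, 2^bits) to negative numbers
     * and leaves [0, sign) unchanged. */
    return (int64_t)(value ^ sign) - (int64_t)sign;
}
```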
* lib/jit_ia64-fpu.c: Correct bogus implementation of ldxr_T
for float and double. The third (actually, second) argument
is indeed added to the base register, but the base register
is modified. The actual M7 implementation was already correct,
just the ldxr_f and ldxr_d implementation that was kept in
a prototype state, misinterpreting what M7 does.
* lib/jit_ia64-cpu.c: Correct X2 pattern matching by preventing
it from requiring a stop between the L and the X
instruction; that is, check the registers and predicates
before emitting the L instruction, not after.
* lib/jit_ia64-fpu.c: Slightly simplify and correct
divr_f and divr_d implementation.
* check/lightning.c: Add __ia64__ preprocessor define
on Itanium.
* check/alu.inc, check/clobber.tst, check/float.tst: Define
several macros conditionally to __ia64__. This is required
because __ia64__ jit generation can use far too much memory:
instruction reordering to avoid "stops" as much as possible
is not implemented, which causes far too many nops to be
generated; division and remainder require function calls;
and float division requires significant code to implement.
* include/lightning.h: Add new backend specific movr_w_d,
movr_d_w and movi_d_w codes as helpers to ia64 varargs
functions arguments.
* lib/jit_ia64-cpu.c:
Correct wrong encoding of A5 small integers.
Correct define of "mux" instruction modifiers.
Correct ordering of arguments and predicates of cmp_xy
implementation with immediate arguments; like most other
codes with an immediate, the immediate is the second, not
the third argument.
* lib/jit_ia64-fpu.c: Actual implementation of the code
to move to/from gpr to/from fpr, to implement varargs abi.
* lib/jit_ia64.c: Make fpr argument registers not allocatable
as temporaries; there is no need for the extra checks when
there are plenty of registers.
* lib/jit_print.c, lib/lightning.c: Minor updates for the
new movr_w_d, movr_d_w and movi_d_w codes.
* lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c: Correct code to
also insert a stop to break an instruction group if a
register is written more than once in the same group.
This may happen if a register is both argument and result of
some lightning call (not a real instruction). The most
common case should be code in the pattern:
movl rn=largenum
...
mov rn=smallnum
where "rn" would end up holding "largenum".
But the problem possibly could happen in other circumstances.
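The bookkeeping behind such a check can be sketched as follows: remember which registers were written in the current instruction group, and force a stop before a second write or a read of one of them. All names and sizes here are hypothetical, and the real port tracks this in its own data structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

#define NUM_REGS   256   /* hypothetical: integer plus float registers */
#define FLOAT_BASE 128   /* hypothetical index offset for float registers */

struct group_state {
    bool written[NUM_REGS];   /* registers written in the current group */
};

/* A stop ends the instruction group, so all tracking starts over. */
static void group_reset(struct group_state *g)
{
    memset(g, 0, sizeof *g);
}

/* Record a write to reg. Returns true if a stop had to be inserted
 * first because reg was already written in this group. */
static bool group_write(struct group_state *g, int reg)
{
    bool stop;

    assert(reg >= 0 && reg < NUM_REGS);
    stop = g->written[reg];
    if (stop)
        group_reset(g);
    g->written[reg] = true;
    return stop;
}

/* Check a read of reg. Returns true if a stop had to be inserted
 * first because reg was written earlier in this group. */
static bool group_read(struct group_state *g, int reg)
{
    assert(reg >= 0 && reg < NUM_REGS);
    if (!g->written[reg])
        return false;
    group_reset(g);
    return true;
}
```

In the movl/mov pattern above, the second write to "rn" would be the call that reports a stop.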
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c:
Relocate JIT_Rn registers to the local registers, as, like
float registers, div/rem and sqrt are implemented as function
calls, and may overwrite non saved scratch registers.
Change patch_at to receive a jit_code_t instead of a
jit_node_t, so that it is easier to "inline" patches when
some instruction requires complex code to implement, e.g.
uneq and ltgt.
Correct arguments to FMA and FMA like instructions that,
due to a cut&paste error were passing the wrong argument
to the related F- implementation function.
Rewrite ltgt to return the proper result if one (or both)
of the arguments is unordered.
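The expected semantics of ltgt and uneq with unordered operands can be stated in C with the standard <math.h> comparison macros. This is a reference model with hypothetical function names, not the inlined ia64 sequence.

```c
#include <assert.h>
#include <math.h>

/* ltgt: true only when a < b or a > b; false when either operand
 * is NaN (unordered) or when the operands are equal. */
static int ltgt(double a, double b)
{
    return islessgreater(a, b) != 0;
}

/* uneq: true when the operands are unordered or equal. */
static int uneq(double a, double b)
{
    return isunordered(a, b) || a == b;
}
```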
* include/lightning/jit_ia64.h, include/lightning/jit_private.h,
lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c, lib/jit_ia64.c,
lib/lightning.c: Rework code to detect need of a "stop" to
also handle predicates, as if a predicate is written, it
cannot be read in the same instruction group.
Use a single jit_regset_t variable for all registers when
checking need for a stop (increment value by 128 for
float registers).
Correct wrong "subi" implementation, as the code executed
is r0=im-r1, not r0=r1-im.
Use standard lightning 6 fpr registers, and rework to
use callee save float registers, that may be spilled/reloaded
in prolog/epilog. This is required because some jit
instruction implementations need to call functions, currently
integer div/mod and float sqrt, which may change the value of
scratch float registers.
Rework point of "sync" of branches that need to return a
patch'able address, because the need for a "stop" before a
predicate read causes all branches to be the instruction
in slot 0, as there is no template to "stop" and branch
in the same instruction "bundle".
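The "subi" correction in this entry comes down to operand order. The sketch below models an instruction form that computes im-r1 and shows that r0=r1-im is instead obtained by adding the negated immediate. The function names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Model of an immediate subtract form whose result is im - r1. */
static int64_t sub_imm_reg(int64_t im, int64_t r1)
{
    return im - r1;
}

/* Model of an immediate add: r1 + im. */
static int64_t add_imm(int64_t r1, int64_t im)
{
    return r1 + im;
}

/* subi r0, r1, im must compute r1 - im, so it is expressed as an
 * add of the negated immediate rather than with sub_imm_reg(). */
static int64_t subi(int64_t r1, int64_t im)
{
    assert(im != INT64_MIN);   /* -im would overflow */
    return add_imm(r1, -im);
}
```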
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c: New files implementing
the basic infrastructure of an Itanium port. The code
compiles and can generate jit for basic hello world like
functions.
* check/lightning.c, configure.ac, include/lightning.h,
include/lightning/Makefile.am, include/lightning/jit_private.h,
lib/Makefile.am, lib/lightning.c: Update for the Itanium
port.
* lib/jit_mips-cpu.c, lib/jit_mips.c: Correct typo and
make the jit_carry register local to the jit_state_t.
This matches code reviewed in the Itanium port, that
should use the same base logic to handle carry/borrow.
* include/lightning/jit_private.h, lib/jit_arm.c,
lib/jit_mips-cpu.c, lib/jit_mips.c, lib/jit_ppc-cpu.c,
lib/jit_ppc.c, lib/jit_print.c, lib/jit_sparc-cpu.c,
lib/jit_sparc.c, lib/jit_x86-cpu.c, lib/jit_x86.c,
lib/lightning.c: Change all jit_regset macros to take
a pointer argument, to avoid structure copies when
adding a port to an architecture with more than 64
registers.
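A minimal sketch of register set macros that take a pointer, assuming a set stored as an array of 64-bit words. The type and macro names are hypothetical and only mirror the idea, not the definitions in jit_private.h.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical register set wide enough for 256 registers. */
typedef struct {
    uint64_t words[4];
} regset_t;

/* Every macro takes a pointer, so the 32-byte structure is never
 * copied when a set is passed around. */
#define regset_clear(set)       memset((set), 0, sizeof(regset_t))
#define regset_setbit(set, bit) \
    ((set)->words[(bit) >> 6] |= (uint64_t)1 << ((bit) & 63))
#define regset_clrbit(set, bit) \
    ((set)->words[(bit) >> 6] &= ~((uint64_t)1 << ((bit) & 63)))
#define regset_tstbit(set, bit) \
    ((int)(((set)->words[(bit) >> 6] >> ((bit) & 63)) & 1))

/* Return nonzero if no register is present in the set. */
static int regset_is_empty(const regset_t *set)
{
    return (set->words[0] | set->words[1] |
            set->words[2] | set->words[3]) == 0;
}
```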
* include/lightning/jit_private.h, lib/jit_arm.c, lib/jit_memory.c,
lib/jit_mips.c, lib/jit_ppc.c, lib/jit_sparc.c, lib/jit_x86.c,
lib/lightning.c: Do not start over jit generation if the code
buffer can be grown with mremap without moving the base pointer.
* lib/jit_memory.c: Implement a simple memory allocation wrapper
to allow overriding calls to malloc/calloc/realloc/free, as well
as ensuring all memory containing pointers is zero or points to
allocated memory.
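An allocation wrapper of this kind could look roughly like the sketch below: function pointers that default to libc, zero filled allocations, and pointers reset on free. All names are hypothetical. This is not the interface exported by lightning.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*alloc_fn)(size_t);
typedef void *(*realloc_fn)(void *, size_t);
typedef void  (*free_fn)(void *);

static alloc_fn   cur_alloc   = malloc;
static realloc_fn cur_realloc = realloc;
static free_fn    cur_free    = free;

/* Override the allocator; a NULL argument keeps the libc default. */
static void set_memory_functions(alloc_fn a, realloc_fn r, free_fn f)
{
    cur_alloc   = a ? a : malloc;
    cur_realloc = r ? r : realloc;
    cur_free    = f ? f : free;
}

/* Allocate size (nonzero) bytes, zero filled, and store in *ptr. */
static void wrap_alloc(void **ptr, size_t size)
{
    *ptr = cur_alloc(size);
    assert(*ptr != NULL);
    memset(*ptr, 0, size);
}

/* Resize *ptr to new_size (nonzero), zeroing any newly added tail. */
static void wrap_realloc(void **ptr, size_t old_size, size_t new_size)
{
    *ptr = cur_realloc(*ptr, new_size);
    assert(*ptr != NULL);
    if (new_size > old_size)
        memset((char *)*ptr + old_size, 0, new_size - old_size);
}

/* Release *ptr and reset it so no dangling pointer is left behind. */
static void wrap_free(void **ptr)
{
    if (*ptr) {
        cur_free(*ptr);
        *ptr = NULL;
    }
}
```

Taking a pointer to the pointer lets the wrapper keep every tracked pointer either NULL or valid.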
* include/lightning.h, include/lightning/jit_private.h: Definitions
for the memory allocation wrapper.
* lib/Makefile.am: Update for new jit_memory.c file.
* lib/jit_arm.c, lib/jit_disasm.c, lib/jit_mips.c, lib/jit_note.c,
lib/jit_ppc.c, lib/jit_sparc.c, lib/jit_x86.c, lib/lightning.c:
Use the new memory allocation wrapper code.
* configure.ac, include/lightning/jit_private.h, lib/lightning.c:
Remove dependency on gmp. Only a simple bitmap was required, and
that was not enough reason to force linking to gmp and possible
complications caused by it.
* include/lightning.h: Add check for __powerpc__ defined
in Linux, while Darwin defines __ppc__.
* include/lightning/jit_ppc.h: Adjust register definitions
for Darwin 32 bit and Linux 64 bit ppc usage and/or ABI.
* include/lightning/jit_private.h: Add proper check for
Linux __powerpc__ and a data definition for a workaround
to properly handle code that starts with a jump to a "main"
label.
* lib/jit_disasm.c: Add extra disassembler initialization
for __powerpc64__.
* lib/jit_ppc-cpu.c: Add extra macros and functions, and
correct/adapt previous ones to handle powerpc64.
* lib/jit_ppc-fpu.c: Adapt for 64 bit wordsize. Basically
add conversion from/to int32/int64 and proper handling of
load/store offsets too large for 32 bit.
* lib/jit_ppc.c: Add calls to 64 bit codes and adaptation
for the PowerPC 64 bit Linux ABI.
* lib/jit_arm.c, lib/jit_mips.c, lib/jit_sparc.c, lib/jit_x86.c,
lib/lightning.c: Correct off by one error when restarting jit
of a function due to finding too late that it needs to
spill/reload some register. Problem was found by accident on a
very special
condition during PowerPC 64 code adaptation.
* check/float.tst: Comment out the int to negative infinity
test in mips for the moment because not all Loongson CPUs agree
on the result.
* lib/jit_disasm.c: Add a test instead of an assertion
when loading symbols for disassembly due to a failure with
a simple binutils build in Debian mipsel64.
* include/lightning/jit_private.h, lib/jit_arm-cpu.c,
lib/jit_arm.c, lib/jit_disasm.c, lib/jit_mips-cpu.c,
lib/jit_mips.c, lib/jit_note.c, lib/jit_ppc-cpu.c,
lib/jit_ppc.c, lib/jit_print.c, lib/jit_sparc-cpu.c,
lib/jit_sparc.c, lib/jit_x86-cpu.c, lib/jit_x86.c,
lib/lightning.c: Add an extra structure for data storage
during jit generation, and release it after generating
jit, to reduce memory usage a bit, and also to make it
easier to understand what data is available during
jit runtime.