* lib/jit_hppa.c: Sanitize the cache synchronization inline
assembly code that was doing twice the work and redundantly
flushing the end address every loop iteration.
* configure.ac, check/Makefile.am, doc/Makefile.am: Do not
explicitly link to -ldl, but instead autodetect the library
with dlopen, dlsym, etc.
* check/lightning.c: Add workaround to apparently buggy
getopt in HP-UX that sets optind to the wrong index, and
use RTLD_NEXT on HP-UX instead of RTLD_DEFAULT to dlsym
global symbols.
* include/lightning.h: Rework definitions of wordsize and
byte order to detect proper values on HP-UX.
* lib/lightning.c: Minor correction to use MAP_ANONYMOUS
instead of MAP_ANON on HP-UX.
* lib/jit_hppa.c: Float arguments must be passed on integer
registers on HP-UX, not only for varargs functions.
Add code to properly clear instruction cache. This was
not required on Debian hppa port, but may have been working
by accident.
* lib/jit_hppa-cpu.c: Follow pattern of HP-UX binaries and
use bve,n instead of bv,n to return from functions.
* lib/jit_hppa-fpu.c: For some reason "fst? frX,rX,(rY)" did
not work on the tested computer (HP-UX B.11.23 U 9000/785 HP-UX)
so the code was changed, at first for __hpux only to add the
base and offset register and use the instruction with an
immediate (zero) offset.
Many thanks to Trent Nelson from snakebite.org for giving access to a
build farm with several different architectures and operating systems.
* check/lightning.c, lib/jit_disasm.c, lib/jit_ppc-cpu.c,
lib/jit_ppc-fpu.c, lib/jit_ppc.c, include/lightning.h,
include/lightning/jit_ppc.h, include/lightning/jit_private.h:
Adapt code to work on 32 bit AIX ppc using gcc. Most changes
are basically to adapt the elf64 logic to 32 bit, as it does
not use the same convention of 32 bit Darwin ppc.
* check/stack.tst: Add a fake memcpy function to the test
case if running under AIX, as it is not available to dlsym.
* configure.ac: Check for getopt.h header, not available in
AIX.
* include/lightning/jit_hppa.h, lib/jit_hppa-cpu.c,
lib/jit_hppa-fpu.c, lib/jit_hppa.c: New files implementing
the hppa port. Built on Debian Linux PA-RISC 2.0, 32 bit.
* check/float.tst: Add preprocessor for hppa expected
values when converting NaN and +-Inf to an integer.
* check/ldst.inc: Ensure double load/store tests use an
8 byte aligned address by default.
* lib/lightning.c: Correct a bug found during tests in
the new port, where qmul* and qdiv* were not properly
setting one of the result registers as modified in the
function, what would be a problem if the only "write"
usage were the qmul* or qdiv*.
* check/varargs.tst, check/varargs.ok: Add one extra
interleaved integer/double test to validate proper code
generation in the extra case.
* check/lightning.c, configure.ac, include/lightning.h,
include/lightning/Makefile.am,
include/lightning/jit_private.h, lib/Makefile.am,
lib/jit_disasm.c: Update for the hppa port.
* check/varargs.tst: Correct misplaced .align directive
that was causing the double buffer to not be aligned at
8 bytes.
* lib/jit_ia64-cpu.c:
Properly implement abi for excess arguments passed on
stack.
Simplify load/store with immediate displacement argument
with zero value.
Simplify some calls to "subi" changing to "addi" with
a negative argument.
Remove some #if 0'ed code, that could be useful in
special conditions, but the most useful one would be
to "optimize" "static" jit functions, but for the sake
of simplicity, jit functions are implemented in a way
that can be passed back to C code as C function pointers.
Add an attribute to prototypes of several unused functions.
These functions are defined for the sake of implementing all
Itanium documented instructions, but a significant amount of
them is not used by lightning.
* lib/jit_ia64-fpu.c: Simplify load/store with zero immediate
displacement and add unused attribute for functions not used
by lightning, but required to provide macros implementing all
Itanium documented instructions.
* lib/jit_ia64.c: Update for the properly implemented abi
for stack arguments.
* lib/lightning.c: Mark an unused function as such.
lib/jit_ia64-cpu.c:
Correct immediate range check of integer comparisons when
inverting arguments.
Correct gei_u that was not decrementing immediate when
inverting arguments.
Correct b?add* and b?sub* that were not properly updating
the result register.
* lib/jit_ia64-cpu.c: Correct wrong mapping of 2 instructions
in "M-, stop, M-, stop" translation, that was ignoring the
last stop (implemented as a nop I- stop).
* lib/jit_ia64-fpu.c: Properly implement fnorm.s and fnorm.d,
as well as the proper integer to float or double conversion.
* lib/jit_ia64-cpu.c: Correct bogus implementation of ldr_T
for signed integers, that was using ld1.s, ld2.s and ld4.s.
The ".s" stands for speculative load, not sign extend.
* lib/jit_ia64-fpu.c: Correct bogus implementation of ldxr_T
for float and double. The third (actually, second) argument
is indeed added to the base register, but the base register
is modified. The actual M7 implementation was already correct,
just the ldxr_f and ldxr_d implementation that was kept in
a prototype state, misinterpreting what M7 does.
* lib/jit_ia64-cpu.c: Correct X2 pattern matching by preventing
it to attempt to require a stop between the L and the X
instruction; that is, check the registers and predicates
before emitting the L instruction, not after.
* lib/jit_ia64-fpu.c: Slightly simplify and correct
divr_f and divrd_d implementation.
* check/lightning.c: Add __ia64__ preprocessor define
on Itanium.
* check/alu.inc, check/clobber.tst, check/float.tst: Define
several macros conditionally to __ia64__. This is required
because __ia64__ jit generation can use way too many memory,
due to not implementing instruction reordering to avoid
as much as possible "stops", what causes way too many nops
to be generated, as well as the fact that division and
remainder requires function calls, and float division
requires significant code to implement.
* include/lightning.h: Add new backend specific movr_w_d,
movr_d_w and movi_d_w codes as helpers to ia64 varargs
functions arguments.
* lib/jit_ia64-cpu.c:
Correct wrong encoding of A5 small integers.
Correct define of "mux" instruction modifiers.
Correct ordering of arguments and predicates of cmp_xy
implementation with immediate arguments; like most other
codes with an immediate, the immediate is the second, not
the third argument.
* lib/jit_ia64-fpu.c: Actual implementation of the code
to move to/from gpr to/from fpr, to implement varargs abi.
* lib/jit_ia64.c: Make fpr argument registers not allocatable
as temporaries, no need for the extra checks when there are
plenty registers.
* lib/jit_print.c, lib/lightning.c: Minor updates for the
new movr_w_d, movr_d_w and movi_d_w codes.
* lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c: Correct code to
also insert a stop to break an instruction group if a
register is written more than once in the same group.
This may happen if a register is argument and result of
some lightning call (not a real instruction). The most
common case should be code in the pattern:
movl rn=largenum
...
mov rn=smallnum
where "rn" would end up holding "largenum".
But the problem possibly could happen in other circumstances.
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c:
Relocate JIT_Rn registers to the local registers, as, like
float registers, div/rem and sqrt are implemented as function
calls, and may overwrite non saved scratch registers.
Change patch_at to receive a jit_code_t instead of a
jit_node_t, so that it is easier to "inline" patches when
some instruction requires complex code to implement, e.g.
uneq and ltgt.
Correct arguments to FMA and FMA like instructions that,
due to a cut&paste error were passing the wrong argument
to the related F- implementation function.
Rewrite ltgt to return the proper result if one (or both)
of the arguments is unordered.
* include/lightning/jit_ia64.h, include/lightning/jit_private.h,
lib/jit_ia64-cpu.c, lib/jit_ia64-fpu.c, lib/jit_ia64.c,
lib/lightning.c: Rework code to detect need of a "stop" to
also handle predicates, as if a predicate is written, it
cannot be read in the same instruction group.
Use a single jit_regset_t variable for all registers when
checking need for a stop (increment value by 128 for
float registers).
Correct wrong "subi" implementation, as the code executed
is r0=im-r1, not r0=r1-im.
Use standard lightning 6 fpr registers, and rework to
use callee save float registers, that may be spill/reloaded
in prolog/epilog. This is required because some jit
instructions implementations need to call functions; currently
integer div/mod and float sqrt, what may change the value of
scratch float registers.
Rework point of "sync" of branches that need to return a
patch'able address, because the need for a "stop" before a
predicate read causes all branches to be the instruction
in slot 0, as there is no template to "stop" and branch
in the same instruction "bundle".
* include/lightning/jit_ia64.h, lib/jit_ia64-cpu.c,
lib/jit_ia64-fpu.c, lib/jit_ia64.c: New files implementing
the basic infrastructure of an Itanium port. The code
compiles and can generate jit for basic hello world like
functions.
* check/lightning.c, configure.ac, include/lightning.h,
include/lightning/Makefile.am, include/lightning/jit_private.h,
lib/Makefile.am, lib/lightning.c: Update for the Itanium
port.
* lib/jit_mips-cpu.c, lib/jit_mips.c: Correct typo and
make the jit_carry register local to the jit_state_t.
This matches code reviewed in the Itanium port, that
should use the same base logic to handle carry/borrow.
* include/lightning/jit_private.h, lib/jit_arm.c,
lib/jit_mips-cpu.c, lib/jit_mips.c, lib/jit_ppc-cpu.c,
lib/jit_ppc.c, lib/jit_print.c, lib/jit_sparc-cpu.c,
lib/jit_sparc.c, lib/jit_x86-cpu.c, lib/jit_x86.c,
lib/lightning.c: Change all jit_regset macros to take
a pointer argument, to avoid structure copies when
adding a port to an architecture with more than 64
registers.
* include/lightning/jit_private.h, lib/jit_arm.c, lib/jit_memory.c,
lib/jit_mips.c, lib/jit_ppc.c, lib/jit_sparc.c, lib/jit_x86.c,
lib/lightning.c: Do not start over jit generation if can grow
the code buffer with mremap without moving the base pointer.
* lib/jit_memory.c: Implement a simple memory allocation wrapper
to allow overriding calls to malloc/calloc/realloc/free, as well
as ensuring all memory containing pointers is zero or points to
allocated memory.
* include/lightning.h, include/lightning/jit_private.h: Definitions
for the memory allocation wrapper.
* lib/Makefile.am: Update for new jit_memory.c file.
* lib/jit_arm.c, lib/jit_disasm.c, lib/jit_mips.c, lib/jit_note.c,
lib/jit_ppc.c, lib/jit_sparc.c, lib/jit_x86.c, lib/lightning.c:
Use the new memory allocation wrapper code.
* configure.ac, include/lightning/jit_private.h, lib/lightning.c:
Remove dependency on gmp. Only a simple bitmap was required, and
that was not enough reason to force linking to gmp and possible
complications caused by it.
* include/lightning.h: Add check for __powerpc__ defined
in Linux, while Darwin defines __ppc__.
* include/lightning/jit_ppc.h: Adjust register definitions
for Darwin 32 bit and Linux 64 bit ppc usage and/or ABI.
* include/lightning/jit_private.h: Add proper check for
Linux __powerpc__ and an data definition for an workaround
to properly handle code that starts with a jump to a "main"
label.
* lib/jit_disasm.c: Add extra disassembler initialization
for __powerpc64__.
* lib/jit_ppc-cpu.c: Add extra macros and functions, and
correct/adapt previous ones to handle powerpc64.
* lib/jit_ppc-fpu.c: Adapt for 64 bit wordsize. Basically
add conversion from/to int32/int64 and proper handling of
load/store offsets too large for 32 bit.
* lib/jit_ppc.c: Add calls to 64 bit codes and adaptation
for the PowerPC 64 bit Linux ABI.
* lib/jit_arm.c, lib/jit_mips.c, lib/jit_sparc, lib/jit_x86.c,
lib/lightning.c: Correct off by one error when restarting jit
of a function due to finding too late that needs to spill/reload
some register. Problem was found by accident on a very special
condition during PowerPC 64 code adaptation.
* check/float.tst: Comment out the int to negative infinity
test in mips for the moment because not all Loongson agrees
on the result.
* lib/jit_disasm.c: Add a test instead of an assertion
when loading symbols for disassembly due to a failure with
a simple binutils build in Debian mipsel64.
* include/lightning/jit_private.h, lib/jit_arm-cpu.c,
lib/jit_arm.c, lib/jit_disasm.c, lib/jit_mips-cpu.c,
lib/jit_mips.c, lib/jit_note.c, lib/jit_ppc-cpu.c,
lib/jit_ppc.c, lib/jit_print.c, lib/jit_sparc-cpu.c,
lib/jit_sparc.c, lib/jit_x86-cpu.c, lib/jit_x86.c,
lib/lightning.c: Add an extra structure for data storage
during jit generation, and release it after generating
jit, to reduce a bit memory usage, and also to make it
easier to understand what data is available during
jit runtime.
* check/lightning.c: Remove state flag to work with partial
sparc port, by just disassembling if there was incomplete
code generation.
* jit_sparc-cpu.c: Correct wrong range check for immediate
integer constants (off by one bit shift).
Correct macro implementing equivalent "rd %y, rd" assembly.
Implement qmul* and qdiv*.
* jit_sparc.c: Update for qmul* and qdiv* and remove logic
to handle incomplete code generation during sparc port.
* check/float.tst: Add sparc to list of known NaN and +-Inf
to integer conversion.
* check/lightning.c: Define __sparc__ to preprocessor in
the sparc backend.
* include/lightning/jit_private.h: Correct wrong definition
of emit_stxi_d, that has lived for a long time, but would
cause problems whenever needing to spill/reload a float
register.
* include/lightning/jit_sparc.h: Can only use %g2,%g3,%g4
for scratch variables, as other "global" registers are
reserved for the system, e.g. libc.
Reorder float register naming to make it easier to
access odd float registers, so that generating code for
pusharg and getarg is easier for the IR.
* lib/jit_mips-cpu.c, lib/jit_ppc-cpu.c: Update to match
new code in jit_sparc-cpu.c. It must call jit_get_reg
with jit_class_nospill if using the register to move
an unconditional branch address to it, as the reload
will not happen (actually could happen in the delay
slot...)
* lib/jit_sparc-cpu.c: Correct wrong macro definition for
ldxr_s.
Properly implement div* and implement rem. Div* needs
to use the y register, and rem* needs to be synthesized.
Correct b?sub* macro definitions.
* lib/jit_sparc-fpu.c: Correct reversed float to/from double
conversion.
Correct wrong jit_get_reg call asking for a gpr and then
using the fpr with that number.
Correct wrong branch displacement computation for
conditional branches.
* lib/jit_sparc.c: Correct getarg_d and pushargi_d implementation.
Add rem* entries to the switch converting IR to machine code.
* lib/lightning.c: Correct a problem detected when adding
the jit_class_nospill flag to jit_get_reg, that was caused
when having a branch to an "epilog" node, what would cause
the code to think all registers in unknown state were live,
while in truth, all registers in unknown state in the
"just after return" point are actually dead.
* include/lightning/jit_sparc.h, lib/jit_sparc-cpu.c,
lib/jit_sparc-fpu.c, lib/jit_sparc.c: New files implementing
the basic framework of the sparc port.
* configure.ac, include/lightning.h, include/lightning/Makefile.am,
include/lightning/jit_private.h, lib/jit_disasm.c: Update
for the sparc port framework.
* lib/jit_mips.c: Correct reversed retr/reti logic.
* lib/jit_ppc.c: Correct misspelled __LITTLE_ENDIAN.
* lib/lightning.c: Always do byte hashing in hash_data, because
the logic to "compress" strings causes large pointers to not
be guaranteed aligned at 4 byte boundaries.
Update for the sparc port framework.
* lib/jit_arm.c: Correct jit_pushargi_f in the arm hardfp abi.
Most of the logic uses even numbered register numbers, so that
a float and a double can be used in the same register, but
the abi requires packing the float arguments, so jit_pushargi_f
needs to allocate a temporary register to modify only the
proper register argument (or be very smart to push two
immediate arguments if applicable).
* include/lightning.h, lib/lightning.c: Implement the new
jit_clear_state and jit_destroy_state calls. jit_clear_state
releases all memory not required during jit_execution; that
is, leaves only the mmap'ed data and code buffers allocated.
jit_destroy_state releases the mmap'ed buffers as well as
the jit_state_t object itself, that holds pointers to the
code and data buffers, as well as annotation pointers (for
disassembly or backtrace) in the data buffer.
* lib/jit_note.c: Correct invalid vector offset access.
* check/ccall.c, check/lightning.c, doc/ifib.c, doc/incr.c,
doc/printf.c, doc/rfib.c, doc/rpn.c: Use the new jit_clear_state
and jit_destroy_state calls, to demonstrate the new code to
release all jit memory.
* doc/body.texi: Add basic documentation and usage description
of jit_clear_state and jit_destroy_state.
* include/lightning/jit_private.h, lib/jit_note.c, lib/lightning.c:
Store all annotation information in the mmap'ed area reserved for
read only data. This adds code to not allocate memory for jit_note_t
objects, and to relocate jit_line_t objects and its contents after
calculating annotation information. The jit_line_t objects are
relocated because it is not possible to always calculate before
hand data layout because note information may be extended or
redundant entries removed, as well as allowed to be added in
non sequential order.
A bug was also corrected in _jit_set_note, that was causing it
to allocate new jit_line_t objects when not needed. It was still
working correctly, but allocating way more memory than required.
*include/lightning.h, lib/lightning.c: Add the new jit_live code
to explicitly mark a register as live. It is required to avoid
assuming functions always return a value in the gpr and fpr return
register, and to avoid the need of some very specialized codes
that vary too much from backend to backend, to instruct the
optimization code the return register is live.
* lib/jit_arm.c, lib/jit_mips.c, lib/jit_ppc.c, lib/jit_print.c,
lib/jit_x86.c: Update for the new jit_live code.
* check/ret.ok, check/ret.tst: New files implementing a simple
test case that would previously fail at least in ix86/x86_64.
* check/Makefile.am: Update for new "ret" test case.
2013-02-04 Paulo Andrade <pcpa@gnu.org>
* include/lightning.h, include/lightning/jit_private.h,
lib/jit_arm-cpu.c, lib/jit_arm.c, lib/jit_mips-cpu.c,
lib/jit_mips.c, lib/jit_ppc-cpu.c, lib/jit_ppc.c,
lib/jit_x86-cpu.c, lib/jit_x86.c, lib/lightning.c:
Implement the new qmul and qdiv instructions that return signed
and unsigned lo/hi multiplication result and div/rem division result.
These should be useful for jit translation of code that needs to
know if a multiplication overflows (no branch opcode added) or if
a division is exact (easy check if remainder is zero).
* check/lightning.c, lib/jit_print.c, check/Makefile.am,
check/all.tst: Update for the new qmul and qdiv instructions.
* check/qalu.inc, check/qalu_div.ok, check/qalu_div.tst,
check/qalu_mul.ok, check/qalu_mul.tst: New files implementing
simple test cases for qmul and qdiv.
* doc/body.texi: Correct "jmpi" description that incorrectly
told it was possible to pass any address as jump target. The
only way to do that is "movi+jmpr".
* check/Makefile.am: "make debug" target should pass only
the main test tool program as argument for running gdb
* configure.ac: Add the --enable-assertions options.
* doc/Makefile.am, doc/body.texi, doc/lightning.texi:
Major rewrite of the documentation to match the current
implementation.
* doc/version.texi: Automatic date update.
* doc/ifib.c, doc/incr.c, doc/printf.c, doc/rfib.c, doc/rpn.c:
Implementation of the documentation examples, that are also
compiled during a normal build.
* doc/p-lightning.texi, doc/porting.texi, doc/toc.texi,
doc/u-lightning.texi, doc/using.texi: These files were
renamed in the documentation rewrite, as the documentation
was significantly trimmed due to full removal of the porting
chapters. Better porting documentation should be added but
for the moment it was just removed the documentation not
matching the implementation.
* check/3to2.tst, check/add.tst, check/allocai.tst, check/bp.tst,
check/call.tst, check/ccall.c, check/clobber.tst, check/divi.tst,
check/fib.tst, check/ldsti.tst, check/ldstr-c.tst, check/ldstr.tst,
check/ldstxi-c.tst, check/ldstxi.tst, check/ldstxr-c.tst,
check/ldstxr.tst, check/lightning.c, check/rpn.tst, check/stack.tst,
check/varargs.tst, include/lightning.h,
include/lightning/jit_private.h, lib/jit_arm.c, lib/jit_disasm.c,
lib/jit_mips.c, lib/jit_note.c, lib/jit_ppc.c, lib/jit_print.c,
lib/jit_x86.c, lib/lightning.c: Extend the "jit_note" abstraction
with the new "jit_name" call, that receives a string argument, and
should usually be called to mark boundaries of functions of code
generating jit (that is, it is not expected that the language
generating jit map its functions to jit functions).
* check/add.tst, check/allocai.tst, check/bp.tst, check/divi.tst,
check/fib.tst, check/lightning.c, include/lightning/jit_arm.h,
include/lightning/jit_mips.h, include/lightning/jit_ppc.h,
include/lightning/jit_private.h, include/lightning/jit_x86.h:
Make JIT_RET, JIT_FRET and JIT_SP private. These should not be
used in any operations due to frequently having special
constraints (usually JIT_FRET). JIT_FP must be made available
because it must be used as the base register to access stack
space allocated with jit_allocai.
2013-01-14 Paulo Andrade <pcpa@gnu.org>
* include/lightning.h, lib/lightning.c: Add an extra align
argument to the jit_data call (that should be made private),
so that it should not align strings at 8 bytes.
Correct the jit_note call to include the null ending byte
when adding label/note names to the "jit data section".
* lib/jit_note.c: New file implementing a simple string+integer
annotation, that should be used to map filename and line number
to offsets in the generated jit.
* include/lightning.h, lib/lightning.c: Update for the new
note code.
Add an extra mandatory argument to init_jit, that is used
as argument to bfd_openr.
Change from generic void* to char* the argument to jit_note
and add an extra integer argument, to map to filename and
line number.
* check/ccall.c, check/lightning.c, include/lightning/jit_private.h,
lib/jit_arm.c, lib/jit_disasm.c, lib/jit_mips.c, lib/jit_ppc.c,
lib/jit_print.c, lib/jit_x86.c: lib/Makefile.am: Update for the
new annotation code.
* configure.ac, check/Makefile.am: Update to work with latest
automake.
* check/cccall.c, check/ccall.ok: New test case to validate
interleaved calls from/to C code and jit.
* check/Makefile.am: Update for the new ccall test case.
* include/lightning.h, lib/lightning.c: Add the new jit_address
call that returns the real/final address of a "note" in the
generated jit. It requires a jit_node_t as returned by the
jit_note call, and is only valid after calling jit_emit.
Add an intermediate solution to properly handle arm
soft and softfp modes that move a double to an integer register
pair. Currently it just adds extra tests for the condition,
but the proper solution should be to have extra lightning
codes for these conditions, codes which should be only used
by the backends that need it, and merged with the existing
jit_pusharg*_{f,d}.
* include/lightning/jit_private.h: Add new jit_state_t flag
to know it finished jit_emit, so that calls to jit_address
are valid.
* lib/jit_mips.c: Correct abi implementation so that the
new ccall test case pass. Major problem was using
_jit->function.self.arg{i,f} as boolean values, but that
would cause lightning.c:patch_registers() to incorrectly
assume only one register was used as argument when calling
jit_regarg_p(); _jit->function.self.arg{i,f} must be the
number of registers used as arguments (in all backends).
* lib/jit_x86.c: Add workaround, by marking %rax as used,
to a special condition, when running out of registers and the
allocator trying to spill and reload %rax, but %rax was used
as a pointer to a function, what would cause the reload to
destroy the return value. This condition can be better
generalized, but the current solution is good enough.
* include/lightning/jit_ppc.h, lib/jit_ppc-cpu.c, lib/jit_ppc.c:
Rewrite logic to handle arguments, as the original code was
written based on a SysV pdf about the generic powerpc ABI,
what did "invent" a new abi for the previous test cases, but
failed in the new ccall test in Darwin PPC. Now it properly
handles 13 float registers for arguments, as well as proper
computation of stack offsets when running out of registers
for arguments.