diff --git a/doc/ref/compiler.texi b/doc/ref/compiler.texi index 3ec7cfd5d..7e1f29fcb 100644 --- a/doc/ref/compiler.texi +++ b/doc/ref/compiler.texi @@ -1372,20 +1372,14 @@ company, and in a good position. Guile's compiler needs your help. There are many possible avenues for improving Guile's compiler. Probably the most important improvement, speed-wise, will be some form -of native compilation, both just-in-time and ahead-of-time. This could -be done in many ways. Probably the easiest strategy would be to extend -the compiled procedure structure to include a pointer to a native code -vector, and compile from bytecode to native code at run-time after a -procedure is called a certain number of times. - -The name of the game is a profiling-based harvest of the low-hanging -fruit, running programs of interest under a system-level profiler and -determining which improvements would give the most bang for the buck. -It's really getting to the point though that native compilation is the -next step. +of optimized ahead-of-time native compilation with global register +allocation. A first pass could simply extend the compiler to also emit +machine code in addition to bytecode, pre-filling the corresponding JIT +data structures referenced by the @code{instrument-entry} bytecodes. +@xref{Instrumentation Instructions}. The compiler also needs help at the top end, enhancing the Scheme that -it knows to also understand R6RS, and adding new high-level compilers. +it knows to also understand R7RS, and adding new high-level compilers. We have JavaScript and Emacs Lisp mostly complete, but they could use some love; Lua would be nice as well, but whatever language it is that strikes your fancy would be welcome too. diff --git a/doc/ref/vm.texi b/doc/ref/vm.texi index 3808ed2a5..66fda17bf 100644 --- a/doc/ref/vm.texi +++ b/doc/ref/vm.texi @@ -42,6 +42,7 @@ programs to Guile's VM. * VM Programs:: * Object File Format:: * Instruction Set:: +* Just-In-Time Native Code:: @end menu @node Why a VM? @@ -1862,3 +1863,99 @@ The @var{idx} value should be an unboxed unsigned 64-bit integer. The @var{val} values are all unboxed, either as signed 64-bit integers, unsigned 64-bit integers, or IEEE double floating point numbers. @end deftypefn + +@node Just-In-Time Native Code +@subsection Just-In-Time Native Code + +@cindex just-in-time compiler +@cindex jit compiler +@cindex template jit +@cindex compiler, just-in-time +The final piece of Guile's virtual machine is a just-in-time (JIT) +compiler from bytecode instructions to native code. It is faster to run +a function when its bytecode instructions are compiled to native code, +compared to having the VM interpret the instructions. + +The JIT compiler runs automatically, triggered by counters associated +with each function. The counter increments when functions are called +and during each loop iteration. Once a function's counter passes a +certain value, the function gets JIT-compiled. @xref{Instrumentation +Instructions}, for full details. + +Guile's JIT compiler is what is known as a @dfn{template JIT}. This +kind of JIT is very simple: for each instruction in a function, the JIT +compiler will emit a generic sequence of machine code corresponding to +the instruction kind, specializing that generic template to reference +the specific operands of the instruction being compiled. + +The strength of a template JIT is principally that is that it is very +fast at emitting code. It doesn't need to do any time-consuming +analysis on the bytecode that it is compiling to do its job. + +A template JIT is also very predictable: the native code emitted by a +template JIT has the same performance characteristics of the +corresponding bytecode, only that it runs faster. In theory you could +even generate the template-JIT machine code ahead of time, as it doesn't +depend on any value seen at run-time. + +This predictability makes it possible to reason about the performance of +a system in terms of bytecode, knowing that the conclusions apply to +native code emitted by a template JIT. + +Because the machine code corresponding to an instruction always performs +the same tasks that the interpreter would do for that instruction, +bytecode and a template JIT also allows Guile programmers to debug their +programs in terms of the bytecode model. When a Guile programmer sets a +breakpoint, Guile will disable the JIT for the thread being debugged, +falling back to the interpreter (which has the corresponding code to run +the hooks). @xref{VM Hooks}. + +Guile uses the GNU Lightning library to emit native code. This allows +Guile's template JIT supports practically all architectures, from +Itanium to MIPS. You can optimize a program on your x86-64 desktop and +you can know that the corresponding program on an AArch64 phone will +also get faster. + +The weaknesses of a template JIT are two-fold. Firstly, as a simple +back-end that has to run fast, a template JIT doesn't have time to do +analysis that could help it generate better code, notably global +register allocation and instruction selection. + +However this is a minor weakness compared to the inability to perform +significant, speculative program transformations. For example, Guile +could see that in an expression @code{(f x)}, that in practice @var{f} +always refers to the same function. An advanced JIT compiler would +speculatively inline @var{f} into the call-site, along with a dynamic +check to make sure that the assertion still held. But as a template JIT +doesn't pay attention to values only known at run-time, it can't make +this transformation. + +This limitation is mitigated in part by Guile's robust ahead-of-time +compiler which can already perform significant optimizations when it can +prove they will always be valid, and its low-level bytecode which is +able to represent the effect of those optimizations (e.g. elided +type-checks). @xref{Compiling to the Virtual Machine}, for more on +Guile's compiler. + +An ahead-of-time Scheme-to-bytecode strategy, complemented by a template +JIT, also particularly suits the somewhat static nature of Scheme. +Scheme programmers often write code in a way that makes the identity of +free variable references lexically apparent. For example, the @code{(f +x)} expression could appear within a @code{(let ((f (lambda (x) (1+ +x)))) ...)} expression, or we could see that @code{f} was imported from +a particular module where we know its binding. Ahead-of-time +compilation techniques can work well for a language like Scheme where +there is little polymorphism and much first-order programming. They do +not work so well for a language like JavaScript, which is highly mutable +at run-time and difficult to analyze due to method calls (which are +effectively higher-order calls). + +All that said, a template JIT works well for Guile at this point. It's +only a few thousand lines of maintainable code, it speeds up Scheme +programs, and it keeps the bulk of the Guile Scheme implementation +written in Scheme itself. The next step is probably to add +ahead-of-time native code emission to the back-end of the compiler +written in Scheme, to take advantage of the opportunity to do global +register allocation and instruction selection. Once this is working, it +can allow Guile to experiment with speculative optimizations in Scheme +as well. @xref{Extending the Compiler}, for more on future directions.