* libguile/fluids.c (grow_dynamic_state, new_fluid): Arrange for the
default value in the dynamic-state vector to be SCM_UNDEFINED instead
of SCM_BOOL_F. If the value in the dynamic-state is #f, default to a
value attached to the fluid instead. This allows useful default
values.
(scm_make_fluid_with_default): New function, allows the user to
specify a default value for the fluid. Defaults to #f. Bound to
`make-fluid' on the Scheme side.
(scm_make_unbound_fluid): Use SCM_UNDEFINED as the default in all
threads.
(scm_fluid_unset_x): Also unset the default value. Not sure if this
is the right thing.
(fluid_ref): Update to the new default-value strategy.
* libguile/threads.c (scm_i_reset_fluid): Reset to SCM_UNDEFINED.
* libguile/threads.h: Remove extra arg to scm_i_reset_fluid.
* libguile/vm-i-system.c (fluid-ref): Update to new default-value
strategy.
* module/ice-9/vlist.scm (block-growth-factor): Default to 2 in all
threads. Fixes http://debbugs.gnu.org/10093.
* libguile/threads.h:
* libguile/threads.c (scm_i_reset_fluid): New internal function, resets
the binding of a fluid for all threads. Needed for fluid GC.
* libguile/fluids.c (new_fluid): Call scm_i_reset_fluid here.
* libguile/init.h:
* libguile/init.c (scm_i_init_guile): Change arg to this internal
function from SCM_STACKITEM* to void*. Actually it's a
struct GC_stack_base*.
* libguile/bdw-gc.h: Don't do pthread redirects, because we don't want
to affect applications' pthread_* bindings.
* libguile/pthread-threads.h (scm_i_pthread_create)
(scm_i_pthread_detach, scm_i_pthread_exit, scm_i_pthread_cancel)
(scm_i_pthread_sigmask): Do pthread redirects here, in this internal
header.
* libguile/threads.h: Remove declaration of internal
scm_i_with_guile_and_parent. Remove declaration of undefined
scm_threads_init_first_thread. Make declaration of internal
scm_threads_prehistory actually internal, and take a void* (actually a
struct GC_stack_base*).
* libguile/threads.c (GC_get_stack_base): Implement a shim if this
function is unavailable, and fold in the implementations of
get_thread_stack_base.
(GC_call_with_stack_base): Actually implement.
(guilify_self_1): Take a GC_stack_base* as an arg.
(scm_i_init_thread_for_guile): Likewise, and set up libgc for
registration of other threads.
(scm_init_guile): Use GC_get_stack_base instead of our own guesswork.
(with_guile_and_parent, scm_i_with_guile_and_parent): Rework to
trampoline through a GC_call_with_stack_base.
(scm_threads_prehistory): Pass the "base" arg on to guilify_self_1.
* libguile/threads.h: Always declare a scm_i_thread_key, for cleanup
purposes, in the BUILDING_LIBGUILE case.
* libguile/threads.c (scm_i_thread_key): Init with a cleanup handler, so
any guile-specific info for a thread can be cleaned up reliably.
(guilify_self_1): Always set the thread key.
(do_thread_exit_trampoline, on_thread_exit): Enter guile-mode for the
guile-mode cleanup handler, and trampoline through a
gc_call_with_stack_base for reasons explained in the code.
(init_thread_key, scm_i_init_thread_for_guile): Always init the key.
(scm_i_with_guile_and_parent): No need for pthread_cancel cleanup
handlers, as the pthread key destructor will take care of that for
us.
(really_launch): Remove needless pthread_exit call with incorrect
comment.
* configure.ac: Check for GC_call_with_gc_active.
* libguile/threads.h (scm_i_thread): Remove "top", as it's not used.
* libguile/threads.c (with_gc_inactive, with_gc_active): Define shims to
GC_do_blocking and GC_call_with_gc_active.
(scm_i_init_thread_for_guile): Don't do thread base adjustment here,
do it in scm_i_with_guile_and_parent. The previous logic would never
be run.
(scm_i_with_guile_and_parent): If we enter Guile mode, leave it too.
Take care of adjusting the thread stack base here too. Also, call
with_gc_active.
(scm_without_guile): Refactor.
* libguile/debug.h (scm_t_debug_frame): Remove this type, as it was
internal to the old evaluator.
(SCM_EVALFRAME, SCM_APPLYFRAME, SCM_VOIDFRAME, SCM_MACROEXPF)
(SCM_TAILREC, SCM_TRACED_FRAME, SCM_ARGS_READY, SCM_DOVERFLOW)
(SCM_MAX_FRAME_SIZE, SCM_FRAMETYPE)
(SCM_EVALFRAMEP, SCM_APPLYFRAMEP, SCM_VOIDFRAMEP, SCM_MACROEXPFP)
(SCM_TAILRECP, SCM_TRACED_FRAME_P, SCM_ARGS_READY_P, SCM_OVERFLOWP)
(SCM_SET_MACROEXP, SCM_SET_TAILREC, SCM_SET_TRACED_FRAME)
(SCM_SET_ARGSREADY, SCM_SET_OVERFLOW)
(SCM_CLEAR_MACROEXP, SCM_CLEAR_TRACED_FRAME, SCM_CLEAR_ARGSREADY):
Remove macro accessors to scm_t_debug_frame.
(SCM_DEBUGOBJP, SCM_DEBUGOBJ_FRAME, SCM_SET_DEBUGOBJ_FRAME):
(scm_debug_object_p, scm_make_debugobj): Remove debugobj accessors.
(scm_i_unmemoize_expr): Remove unused declaration.
* libguile/debug.c (scm_debug_options): No more max limit on frame
sizes.
(scm_start_stack): Just call out to scm_vm_call_with_new_stack.
(scm_debug_object_p, scm_make_debugobj, scm_init_debug): No more
debugobj smob type.
* libguile/deprecated.h:
* libguile/deprecated.c (scm_i_deprecated_last_debug_frame)
(scm_last_debug_frame): Remove deprecated debug-frame bits.
* libguile/stacks.c (scm_make_stack): Rework this function and its
dependents to only walk VM frames.
(scm_stack_id): Call out to the holder of the VM frame in question,
which should be a VM or a VM continuation, for the stack ID. Currently
this bit is stubbed out.
(scm_last_stack_frame): Removed. It seems this is mainly useful for a
debugger, and we need to rewrite the debugger to work on the Scheme
level.
* test-suite/tests/continuations.test ("continuations"): Remove test for
last-stack-frame.
* libguile/continuations.h (struct scm_t_contregs):
* libguile/continuations.c (scm_make_continuation):
(copy_stack_and_call, scm_i_with_continuation_barrier): No need to
save and restore debug frames.
* libguile/threads.h (scm_i_thread): Don't track debug frames.
(scm_i_last_debug_frame, scm_i_set_last_debug_frame): Remove macro
accessors.
* libguile/threads.c (guilify_self_1): Don't track debug frames.
* libguile/throw.c: No need to track debug frames in a jmpbuf.
* libguile/vm-engine.c (vm_engine, VM_PUSH_DEBUG_FRAMES): Don't push
debug frames.
* libguile/vm.h:
* libguile/vm.c (scm_vm_call_with_new_stack): New function. Currently
stubbed out though.
* libguile/__scm.h (scm_async_tick): New declaration.
(SCM_ASYNC_TICK)[!BUILDING_LIBGUILE]: Use `scm_async_tick ()'.
* libguile/async.c (scm_critical_section_start,
scm_critical_section_end, scm_async_tick): New functions.
* libguile/async.h (scm_i_critical_section_mutex): Made internal.
(scm_critical_section_start, scm_critical_section_end): New
declarations.
(SCM_CRITICAL_SECTION_START,
SCM_CRITICAL_SECTION_END)[!BUILDING_LIBGUILE]: Use the same-named
function (lower-case).
* libguile/stackchk.h (SCM_STACK_OVERFLOW_P): Conditionalize on
`BUILDING_LIBGUILE'.
* libguile/threads.h (SCM_I_CURRENT_THREAD, scm_i_dynwinds,
scm_i_set_dynwinds, scm_i_last_debug_frame,
scm_i_set_last_debug_frame): Conditionalize on `BUILDING_LIBGUILE'.
The crux of this problem was that the thread doing a throw, and so
checking scm_i_critical_section_level, was different from the thread
that was in a critical section.
* libguile/async.h (scm_i_critical_section_level): Removed, replaced
by per-thread critical_section_level.
(SCM_CRITICAL_SECTION_START, SCM_CRITICAL_SECTION_END): Use
per-thread critical_section_level.
* libguile/continuations.c (scm_dynthrow): Check per-thread
critical_section_level.
* libguile/threads.c (guilify_self_1): Init per-thread
critical_section_level.
(scm_i_critical_section_level): Removed.
* libguile/threads.h (scm_i_thread): New critical_section_level field.
* libguile/throw.c (scm_ithrow): Check per-thread critical_section_level.
Problem was that if an application includes both libguile.h and the
system's setjmp.h, and is compiled on IA64, it gets compile errors
because of jmp_buf, setjmp and longjmp being multiply defined.
* libguile/__scm.h (__ia64__): Define scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP instead of jmp_buf, setjmp and longjmp.
(all other platforms): Map scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP to jmp_buf, setjmp and longjmp.
* libguile/continuations.c (scm_make_continuation): Use `SCM_I_SETJMP'
instead of `setjmp'.
(copy_stack_and_call): Use `SCM_I_LONJMP' instead of `longjmp'.
(scm_ia64_longjmp): Use type `scm_i_jmp_buf' instead of `jmp_buf'.
* libguile/continuations.h (scm_t_contregs): Use type `scm_i_jmp_buf'
instead of `jmp_buf'.
* libguile/threads.c (suspend): Use `SCM_I_SETJMP' instead of
`setjmp'.
* libguile/threads.h (scm_i_thread): Use type `scm_i_jmp_buf' instead
of `jmp_buf'.
* libguile/throw.c (JBJMPBUF, make_jmpbuf, jmp_buf_and_retval): Use
type `scm_i_jmp_buf' instead of `jmp_buf'.
(scm_c_catch): Use `SCM_I_SETJMP' instead of `setjmp'.
(scm_ithrow): Use `SCM_I_LONGJMP' instead of `longjmp'.
Reported by Roland Haeder. The declaration and definition of
scm_pthread_cond_timedwait were using possibly different types for the
third arg.
* THANKS: Added Roland Haeder.
* libguile/threads.h (scm_pthread_cond_timedwait): Use scm_t_timespec
for third arg rather than struct timespec, for consistency with the
function implementation.
For each thread that goes into Guile mode, Guile pushes a cleanup
function, scm_leave_guile_cleanup, whose purpose is to execute
`scm_leave_guile ()' if the thread is terminated while in Guile mode.
The problem is that there are various places - like
scm_pthread_cond_wait, scm_without_guile and scm_std_select - where
the thread temporarily leaves Guile mode (which means unlocking the
heap mutex), and the cleanup function is still in place. Therefore if
the thread is terminated at these places, the cleanup function ends up
trying to unlock a mutex (the heap mutex) which isn't actually locked.
* libguile/threads.h (scm_i_thread): New heap_mutex_locked_by_self field.
* libguile/threads.c (scm_enter_guile): Set heap_mutex_locked_by_self.
(scm_leave_guile): Only unlock if heap_mutex_locked_by_self is 1.
(guilify_self_1): Initialize heap_mutex_locked_by_self.
(scm_i_thread_sleep_for_gc): Remove incorrect use of t->held_mutex
here.
* libguile/threads.h (held_mutex): New field.
* libguile/threads.c (enqueue, remqueue, dequeue): Use critical
section to protect access to the queue.
(guilify_self_1): Initialize held_mutex field.
(on_thread_exit): If held_mutex non-null, unlock it.
(fat_mutex_unlock, fat_cond_free, scm_make_condition_variable,
fat_cond_signal, fat_cond_broadcast): Delete now unnecessary uses
of c->lock.
(fat_mutex_unlock): Pass m->lock to block_self() instead of
c->lock; move scm_i_pthread_mutex_unlock(m->lock) call from before
block_self() to after.
(scm_pthread_cond_wait, scm_pthread_cond_timedwait,
scm_i_thread_sleep_for_gc): Set held_mutex before pthread call;
reset it afterwards.
I was seeing a hang in srfi-18.test, when running make check in master,
in the "exception handler installation is thread-safe" test. It wasn't
100% reproducible, so looked like a race.
The problem is that wait-condition-variable is not actually
atomic in the way that it is supposed to be. It unlocks the mutex,
then starts waiting on the cond var. So it is possible for another
thread to lock the same mutex, and signal the cond var, before the
wait-condition-variable thread starts waiting.
In order for wait-condition-variable to be atomic - e.g. in a race
where thread A holds (Scheme-level) mutex M, and calls
(wait-condition-variable C M), and thread B calls (begin (lock-mutex
M) (signal-condition-variable C)) - it needs to call pthread_cond_wait
with the same underlying mutex as is involved in the `lock-mutex'
call. In terms of the threads.c code, this means that it has to use
M->lock, not C->lock.
block_self() used its mutex arg for two purposes: for protecting
access and changes to the wait queue, and for the pthread_cond_wait
call. But it wouldn't work reliably to use M->lock to protect C's
wait queue, because in theory two threads can call
(wait-condition-variable C M1) and (wait-condition-variable C M2)
concurrently, with M1 and M2 different. So we either have to pass
both C->lock and M->lock into block_self(), or use some other mutex to
protect the wait queue. For this patch, I switched to using the
critical section mutex, because that is a global and so easily
available. (If that turns out to be a problem for performance, we
could make each queue structure have its own mutex, but there's no
reason to believe yet that it is a problem, because the critical
section mutex isn't used much overall.)
So then we call block_self() with M->lock, and move where M->lock is
unlocked to after the block_self() call, instead of before.
That solves the first hang, but introduces a new one, when a SRFI-18
thread is terminated (`thread-terminate!') between being launched
(`make-thread') and started (`thread-start!'). The problem now is
that pthread_cond_wait is a cancellation point (see man
pthread_cancel), so the pthread_cond_wait call is one of the few
places where a thread-terminate! call can take effect. If the thread
is cancelled at that point, M->lock ends up still being locked, and
then when do_thread_exit() tries to lock M->lock again, it hangs.
The fix for that is a new `held_mutex' field in scm_i_thread, which is
set to point to the mutex just before a pthread_cond_(timed)wait call,
and set to NULL again afterwards. If on_thread_exit() finds that
held_mutex is non-NULL, it unlocks that mutex.
A detail is that checking and unlocking held_mutex must be done before
on_thread_exit() calls scm_i_ensure_signal_delivery_thread(), because
the innards of scm_i_ensure_signal_delivery_thread() can do another
pthread_cond_wait() call and so overwrite held_mutex. But that's OK,
because it's fine for the mutex check and unlock to happen outside
Guile mode.
Lastly, C->lock is then not needed, so I've removed it.
* libguile/threads.h (held_mutex): New field.
* libguile/threads.c (enqueue, remqueue, dequeue): Use critical
section to protect access to the queue.
(guilify_self_1): Initialize held_mutex field.
(on_thread_exit): If held_mutex non-null, unlock it.
(fat_mutex_unlock, fat_cond_free, scm_make_condition_variable,
fat_cond_signal, fat_cond_broadcast): Delete now unnecessary uses
of c->lock.
(fat_mutex_unlock): Pass m->lock to block_self() instead of
c->lock; move scm_i_pthread_mutex_unlock(m->lock) call from before
block_self() to after.
(scm_pthread_cond_wait, scm_pthread_cond_timedwait,
scm_i_thread_sleep_for_gc): Set held_mutex before pthread call;
reset it afterwards.
I was seeing a hang in srfi-18.test, when running make check in master,
in the "exception handler installation is thread-safe" test. It wasn't
100% reproducible, so looked like a race.
The problem is that wait-condition-variable is not actually
atomic in the way that it is supposed to be. It unlocks the mutex,
then starts waiting on the cond var. So it is possible for another
thread to lock the same mutex, and signal the cond var, before the
wait-condition-variable thread starts waiting.
In order for wait-condition-variable to be atomic - e.g. in a race
where thread A holds (Scheme-level) mutex M, and calls
(wait-condition-variable C M), and thread B calls (begin (lock-mutex
M) (signal-condition-variable C)) - it needs to call pthread_cond_wait
with the same underlying mutex as is involved in the `lock-mutex'
call. In terms of the threads.c code, this means that it has to use
M->lock, not C->lock.
block_self() used its mutex arg for two purposes: for protecting
access and changes to the wait queue, and for the pthread_cond_wait
call. But it wouldn't work reliably to use M->lock to protect C's
wait queue, because in theory two threads can call
(wait-condition-variable C M1) and (wait-condition-variable C M2)
concurrently, with M1 and M2 different. So we either have to pass
both C->lock and M->lock into block_self(), or use some other mutex to
protect the wait queue. For this patch, I switched to using the
critical section mutex, because that is a global and so easily
available. (If that turns out to be a problem for performance, we
could make each queue structure have its own mutex, but there's no
reason to believe yet that it is a problem, because the critical
section mutex isn't used much overall.)
So then we call block_self() with M->lock, and move where M->lock is
unlocked to after the block_self() call, instead of before.
That solves the first hang, but introduces a new one, when a SRFI-18
thread is terminated (`thread-terminate!') between being launched
(`make-thread') and started (`thread-start!'). The problem now is
that pthread_cond_wait is a cancellation point (see man
pthread_cancel), so the pthread_cond_wait call is one of the few
places where a thread-terminate! call can take effect. If the thread
is cancelled at that point, M->lock ends up still being locked, and
then when do_thread_exit() tries to lock M->lock again, it hangs.
The fix for that is a new `held_mutex' field in scm_i_thread, which is
set to point to the mutex just before a pthread_cond_(timed)wait call,
and set to NULL again afterwards. If on_thread_exit() finds that
held_mutex is non-NULL, it unlocks that mutex.
A detail is that checking and unlocking held_mutex must be done before
on_thread_exit() calls scm_i_ensure_signal_delivery_thread(), because
the innards of scm_i_ensure_signal_delivery_thread() can do another
pthread_cond_wait() call and so overwrite held_mutex. But that's OK,
because it's fine for the mutex check and unlock to happen outside
Guile mode.
Lastly, C->lock is then not needed, so I've removed it.
* ice-9/Makefile.am: Don't compile popen.scm, its behaviour at runtime
is not consistent -- seems to miss some GC references? I suspect a bug
in the compiler. In any case without popen.scm being compiled,
continuations.test, r4rs.tes, and r5rs_pitfall.test do pass.
* libguile/threads.h (scm_i_thread):
* libguile/threads.c (thread_mark, guilify_self_2): Add a field for the
thread's vm. Previously I had this as a fluid, but it seems that newly
created threads share their fluid values from the creator thread; as
expected, I guess. In any case one VM should not be active in two
threads.
* libguile/vm.c (scm_the_vm): Change to access the thread-local vm,
instead of accessing a fluid.
(scm_the_vm_fluid): Removed.
* module/system/vm/vm.scm: Removed *the-vm*.
* libguile/gc.c (scm_gc): Don't use `scm_gc_running_p' as
an lvalue.
* libguile/gc.h (scm_gc_running_p): Define to 0.
* libguile/threads.h (scm_i_thread)[gc_running_p]: Remove.
Actually, threads would "go to sleep" either by blocking on a heap
allocation, or by noticing `scm_i_thread_go_to_sleep' is set when
running `SCM_TICK', which limits the applicability of this technique
(e.g., it was not appropriate for the shared string code).
* libguile/threads.c (scm_i_thread_go_to_sleep, scm_i_thread_put_to_sleep,
scm_i_thread_invalidate_freelists, scm_i_thread_wake_up,
scm_i_thread_sleep_for_gc): Remove.
* libguile/threads.h (scm_i_thread_go_to_sleep, scm_i_thread_put_to_sleep,
scm_i_thread_invalidate_freelists, scm_i_thread_wake_up,
scm_i_thread_sleep_for_gc): Remove declarations.
(SCM_THREAD_SWITCHING_CODE): Do nothing.
* Specific problems in IA64 make check
** test-unwind
Representation of the relevant dynamic context:
non-rewindable
catch frame make cont.
o----o-----a----------b-------------c
\
\ call cont.
o-----o-----------d
A continuation is captured at (c), with a non-rewindable frame in the
dynamic context at (b). If a rewind through that frame was attempted,
Guile would throw to the catch at (a). Then the context unwinds back
past (a), then winds forwards again, and the captured continuation is
called at (d).
We should end up at the catch at (a). On ia64, we get an "illegal
instruction".
The problem is that Guile does not restore the ia64 register backing
store (RBS) stack (which is saved off when the continuation is
captured) until all the unwinding and rewinding is done. Therefore,
when the rewind code (scm_i_dowinds) hits the non-rewindable frame at
(b), the RBS stack hasn't yet been restored. The throw finds the
jmp_buf (for the catch at (a)) correctly from the dynamic context, and
jumps back to (a), but the RBS stack is invalid, hence the illegal
instruction.
This could be fixed by restoring the RBS stack earlier, at the same
point (copy_stack) where the normal stack is restored. But that
causes a problem in the next test...
** continuations.test
The dynamic context diagram for this case is similar:
non-rewindable
catch frame make cont.
a----x-----o----------b-------------c
\
\ call cont.
o-------d
The only significant difference is that the catch point (a) is
upstream of where the dynamic context forks. This means that the RBS
stack at (d) already contains the correct RBS contents for throwing
back to (a), so it doesn't matter whether the RBS stack that was saved
off with the continuation gets restored.
This test passes with the Guile 1.8.4 code, but fails (with an
"illegal instruction") when the code is changed to restore the RBS
stack earlier as described above.
The problem now is that the RBS stack is being restored _too_ early;
specifically when there is still stuff to do that relies on the old
RBS contents. When a continuation is called, the sequence of relevant
events is:
(1) Grow the (normal) stack until it is bigger than the (normal)
stack saved off in the continuation. (scm_dynthrow, grow_stack)
(2) scm_i_dowinds calls itself recursively, such that
(2.1) for each rewind (from (x) to (c)) that will be needed,
another frame is added to the stack (both normal and RBS),
with local variables specifying the required rewind; the
rewinds don't actually happen yet, they will happen when
the stack unwinds again through these frames
(2.2) required unwinds - back from where the continuation was
called (d) to the fork point (x) - are done immediately.
(3) The normal (i.e. non-RBS) stack that was stored in the
continuation is restored (i.e. copied on top of the actual
stack).
Note that this doesn't overwrite the frames that were added in
(2.1), because the growth in (1) ensures that the added frames
are beyond the end of the restored stack.
(4) ? Restore the RBS stack here too ?
(5) Return (from copy_stack) through the (2.1) frames, which means
that the rewinds now happen.
(6) setcontext (or longjmp) to the context (c) where the
continuation was captured.
The trouble is that step (1) does not create space in the RBS stack in
the same kind of way that it does for the normal stack. Therefore, if
the saved (in the continuation) RBS stack is big enough, it can
overwrite the RBS of the (2.1) frames that still need to complete.
This causes an illegal instruction when we return through those frames
and try to perform the rewinds.
* Fix
The key to the fix is that the saved RBS stack only needs to be
restored at some point before the next setcontext call, and that doing
it as close to the setcontext call as possible will avoid bad
interactions with the pre-setcontext stack. Therefore we do the
restoration at the last possible point, immediately before the next
setcontext call.
The situation is complicated by there being two ways that the next
setcontext call can happen.
- If the unwinding and rewinding is all successful, the next
setcontext will be the one from step (6) above. This is the
"normal" continuation invocation case.
- If one of the rewinds throws an error, the next setcontext will
come from the throw implementation code. (And the one in step (6)
will never happen.) This is the rewind error case.
In the rewind error case, the code calling setcontext knows nothing
about the continuation. So to cover both cases, we:
- copy (in step (4) above) the address and length of the
continuation's saved RBS stack to the current thread state
(SCM_I_CURRENT_THREAD)
- modify all setcontext callers so that they check the current
thread state for a saved RBS stack, and restore it if so before
calling setcontext.
* Notes
** I think rewinders cannot rely on using any stack data
Unless it can be guaranteed that the data won't go into a register.
I'm not 100% sure about this, but I think it follows from the fact
that the RBS stack is not restored until after the rewinds have
happened.
Note that this isn't a regression caused by the current fix. In Guile
1.8.4, the RBS stack was restored _after_ the rewinds, and this is
still the case now.
** Most setcontext calls for `throw' don't need to change the RBS stack
In the absence of continuation invocation, the setcontext call in the
throw implementation code always sets context to a place higher up the
same stack (both normal and RBS), hence no stack restoration is
needed.
* Other changes
** Using setcontext for all non-local jumps (for __ia64__)
Along the way, I read a claim somewhere that setcontext was more
reliable than longjmp, in cases where the stack has been manipulated.
I don't now have any reason to believe this, but it seems reasonable
anyway to leave the __ia64__ code using getcontext/setcontext, instead
of setjmp/longjmp.
(I think the only possible argument against this would be performance -
if getcontext was significantly slower than setjmp. It that proves to
be the case, we should revisit this.)
** Capping RBS base for non-main threads
Somewhere else along the way, I hit a problem in GC, involving the RBS
stack of a non-main thread. The problem was, in
SCM_MARK_BACKING_STORE, that scm_ia64_register_backing_store_base was
returning a value that was massively greater than the value of
scm_ia64_ar_bsp, leading to a seg fault. This is because the
implementation of scm_ia64_register_backing_store_base is only valid
for the main thread. I couldn't find a neat way of getting the true
RBS base of a non-main thread, but one idea is simply to call
scm_ia64_ar_bsp when guilifying a thread, and use the value returned
as an upper bound for that thread's RBS base. (Note that the RBS
stack grows upwards.)
(Were it not for scm_init_guile, we could be much more definitive
about this. We could take the value of scm_ia64_ar_bsp as a
definitive base address for the part of the RBS stack that Guile cares
about. We could also then discard
scm_ia64_register_backing_store_base.)
scm_set_thread_cleanup_x, scm_thread_cleanup): Lock on thread-specific
admin mutex instead of `thread_admin_mutex'.
* threads.h (scm_i_thread)[admin_mutex]: New field.
* throw.c (make_jmpbuf): Don't enter critical section during thread
spawn -- there is a possibility of deadlock if other threads are
exiting.
(scm_i_thread): Removed unused signal_asyncs field.
(threads_mark): Do not mark it.
(guilify_self_1): Do not initialize it. Do initialize
continuation_root field.
(do_thread_exit): Do not remove thread from all_threads list.
(on_thread_exit): Do it here, after leaving guile mode.
(sleep_level): Removed.
(scm_i_thread_put_to_sleep): Leave thread_admin_mutex locked when
returning. Do not support recursive sleeps.
(scm_i_thread_wake_up): Expect thread_admin_mutex to be locked on
entry. Do not support recursive sleeps.
SCM_CRITICAL_SECTION_END): Moved here from threads.h since now
they also block/unblock execution of asyncs and call
scm_async_click which is declared in async.h but threads.h can not
include async.h since async.h already includes threads.h.
(scm_i_critical_section_level): New, for checking mistakes in the
use of the SCM_CRITICAL_SECTION_* macros.
(scm_i_critical_section_mutex): Make it a recursive mutex so that
critical sections can be nested.
* threads.h, threads.c (scm_frame_lock_mutex): New.
(scm_frame_critical_section): Take mutex as argument.
(framed_critical_section_mutex): New, used as default for above.
(scm_init_threads): Initialize it.
(scm_threads_prehistory): Do not initialize thread_admin_mutex and
scm_i_critical_section_mutex; both are initialized statically.