Fixes <http://bugs.gnu.org/11468>.
* libguile/ports.c (scm_conversion_strategy): Remove.
(default_conversion_strategy_var, sym_error, sym_substitute,
sym_escape): New variables.
(scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x):
Remove.
(scm_i_default_port_conversion_handler,
scm_i_set_default_port_conversion_handler): New functions.
(scm_port_conversion_strategy): Use
`scm_i_default_port_conversion_handler' when PORT is #f.
(scm_set_port_conversion_strategy_x): Use SYM_ERROR, SYM_SUBSTITUTE,
and SYM_ESCAPE. Use `scm_i_set_default_port_conversion_handler' when
PORT is #f.
(scm_init_ports): Initialize DEFAULT_CONVERSION_STRATEGY_VAR.
* libguile/ports.h: Update declarations accordingly.
* libguile/foreign.c: Change
`scm_i_get_conversion_strategy (SCM_BOOL_F)' to
`scm_i_default_port_conversion_handler ()'.
* libguile/strings.c: Likewise.
* test-suite/tests/ports.test ("%default-port-conversion-strategy"): New
test prefix.
* test-suite/tests/foreign.test ("pointer<->string")["%default-port-conversion-strategy
is error", "%default-port-conversion-strategy is soft"]: New tests.
* test-suite/test-suite/lib.scm (exception:encoding-error): Allow the
regexp to match `scm_to_stringn' error messages.
* doc/ref/api-io.texi (Ports): Document `%default-port-conversion-strategy'.
* libguile/strings.c (scm_to_utf8_stringn): Fix another new bug in this
recent comedy of errors: pass the size of the preallocated buffer to
u32_to_u8. Arrange to call 'scm_i_string_wide_chars' and
'scm_i_string_length' only once each. Rename local variables for
improved code clarity.
* test-suite/standalone/test-conversion.c (test_to_utf8_stringn): New
function to test scm_to_utf8_stringn.
* libguile/strings.c (u32_u8_length_in_bytes): Internal static function
renamed from u32_u8_strlen, whose name was potentially confusing. For
added safety, handle everything that can be encoded in the more
general UTF-8 encoding: up to six bytes for each code point, with code
points up to 2^31-1.
(scm_to_utf8_stringn): NUL-terminate only if (lenp == NULL).
If (lenp != NULL) return the length in bytes in *lenp.
* libguile/bytevectors.c (STRING_TO_UTF, scm_string_to_utf8)
(UTF_TO_STRING):
* libguile/ports.c (open_iconv_descriptors, close_iconv_descriptors):
* libguile/strings.c (scm_from_stringn, scm_to_stringn): Wrap operations
that acquire and destroy iconv contexts with a mutex. While iconv is
threadsafe, internally it uses a lock, and we need to make sure when
we fork() that no one has that lock -- so we surround it with another
one. Gross.
* libguile/async.c:
* libguile/deprecation.c:
* libguile/fluids.c:
* libguile/gc.c:
* libguile/instructions.c:
* libguile/ports.c:
* libguile/posix.c:
* libguile/strings.c:
* libguile/threads.c: Use the SCM_PTHREAD_ATFORK_LOCK_STATIC_MUTEX
mechanism to lock a number of static mutexen.
* libguile/strings.c (scm_from_utf8_stringn): Embarassingly, my
scm_from_utf8_stringn implementation was buggy for non-ascii
characters, since October (41d1d984). Fixed. Will be tested with the
next patch.
* libguile/strings.c (scm_i_substring_copy): When asked to create an
empty substring, use 'scm_i_make_string' to make use of its
optimization for empty strings that reuses the global null_stringbuf.
* libguile/strings.c (scm_i_substring, scm_i_substring_read_only,
scm_i_substring_shared): When asked to create an empty substring,
return a freshly allocated null string. Previously, an empty
substring needlessly held a reference to the original stringbuf.
* libguile/strings.c (scm_from_stringn): Avoid calling
`u32_conv_from_encoding' on the null string, by using the same
fast-path code used if (encoding == NULL). This is an optimization,
and also avoids any possible encoding errors.
* libguile/strings.c (scm_from_stringn): Always return a freshly
allocated string from scm_from_stringn, even when asked to construct
the null string, in accordance with the R5RS. Previously, we
optimized the null string case by returning a reference to a global
null string object (scm_nullstr).
* libguile/strings.c (scm_i_is_narrow_string, scm_i_try_narrow_string,
scm_i_string_set_x): Check to see if the provided string is a
mutation-sharing substring, and do the right thing in that case.
Previously, if such a string was passed to these functions, they would
behave very badly: while trying to fetch and/or mutate the cell
containing the stringbuf, they were actually fetching or mutating the
cell containing the original shared string. That's because
mutation-sharing substrings store the original string in CELL_1,
whereas all other strings store the stringbuf there.
* libguile/strings.c (scm_init_strings): Make scm_nullstr mutable. It
is still usable as a common object, because of course it contains no
characters to mutate anyway. It is returned by several procedures
that are specified to return mutable strings, and string mutators
raise errors when passed an immutable string, even if it is the null
string.
* libguile/strings.c (decoding_error): Factor out of scm_from_stringn,
properly handling errno.
(scm_from_stringn): Adapt.
(scm_from_utf8_stringn): Inline the conversion here, to avoid going
through iconv.
* libguile/tags.h (SCM_UNPACK_POINTER, SCM_PACK_POINTER): New macros.
The old SCM2PTR and PTR2SCM were defined in such a way that
round-tripping through a pointer could lose precision, even in the
case in which you weren't interested in actually dereferencing the
pointer, it was simply that you needed to plumb a SCM through APIs
that take pointers. These new macros are more like SCM_PACK and
SCM_UNPACK, but for pointer types. The bit representation of the
pointer should be the same as the scm_t_bits representation.
* libguile/gc.h (PTR2SCM, SCM2PTR): Remove support for (old) UNICOS
pointers. We are going to try tagging the SCM object itself in the
future, and I don't think that keeping this support is worth its
cost. It probably doesn't work anyway.
* libguile/backtrace.c:
* libguile/bytevectors.c:
* libguile/continuations.c:
* libguile/fluids.c:
* libguile/foreign.c:
* libguile/gc.h:
* libguile/guardians.c:
* libguile/hashtab.c:
* libguile/load.c:
* libguile/numbers.c:
* libguile/ports.c:
* libguile/smob.c:
* libguile/strings.c:
* libguile/symbols.c:
* libguile/vm.c:
* libguile/weak-set.c:
* libguile/weak-table.c:
* libguile/weak-vector.c: Update many sites to use the new macros.
This was a pretty big merge involving a fair amount of porting,
especially to peval and its tests. I did not update psyntax-pp.scm,
that comes in the next commit.
Conflicts:
module/ice-9/boot-9.scm
module/ice-9/psyntax-pp.scm
module/language/ecmascript/compile-tree-il.scm
module/language/tree-il.scm
module/language/tree-il/analyze.scm
module/language/tree-il/inline.scm
test-suite/tests/tree-il.test
* libguile/strings.c (scm_to_latin1_stringn): Fix for substrings.
* test-suite/standalone/Makefile.am:
* test-suite/standalone/test-scm-to-latin1-string.c: Add test case.
Thanks to David Hansen for the bug report and test case, and Stefan
Israelsson Tampe for the fix.
* libguile/bytevectors.h:
* libguile/bytevectors.c (scm_c_take_gc_bytevector): Rename this
internal function, from scm_c_take_bytevector. This indicates that
unlike the other scm_take_* functions, this one takes GC-managed
memory.
* libguile/objcodes.c (scm_objcode_to_bytecode):
* libguile/vm.c (really_make_boot_program): Use
scm_gc_malloc_pointerless, not scm_malloc. Thanks to Stefan
Israelsson Tampe!
* libguile/r6rs-ports.c:
* libguile/strings.c: Adapt to renames.
* libguile/strings.c (scm_i_allocate_string_pointers): Encode strings
using the current locale. Previously, Latin-1 was used. Indirectly,
this affects the encoding of strings in `system*', `execl', `execlp',
`execle', `environ', and `dynamic-args-call'.
(scm_makfromstrs): In header comment, clarify that the C strings are
interpreted according to the current locale encoding.
* NEWS: Add NEWS entry.
* libguile/inline.h:
* libguile/deprecated.h:
* libguile/deprecated.c (scm_immutable_cell, scm_immutable_double_cell):
Deprecate these, as the GC_STUBBORN API doesn't do anything any more.
* libguile/strings.c (scm_i_c_make_symbol): Change the one use of
scm_immutable_double_cell to scm_double_cell.
* libguile/async.c:
* libguile/async.h:
* libguile/debug.h:
* libguile/deprecated.c:
* libguile/deprecated.h:
* libguile/evalext.h:
* libguile/gc-malloc.c:
* libguile/gc.h:
* libguile/gen-scmconfig.c:
* libguile/numbers.c:
* libguile/ports.c:
* libguile/ports.h:
* libguile/procprop.c:
* libguile/procprop.h:
* libguile/read.c:
* libguile/socket.c:
* libguile/srfi-4.h:
* libguile/strings.c:
* libguile/strings.h:
* libguile/tags.h:
* module/ice-9/boot-9.scm:
* module/ice-9/deprecated.scm: Remove all deprecated code. CPP defines
that were not previously issuing warnings were changed so that their
expansions would indicate the replacement forms to use,
e.g. scm_sizet__GONE__REPLACE_WITH__size_t.
The two exceptions were SCM_LISTN, which did not produce warnings
before, and the string-filter argument order stuff.
Drops the initial dirty memory usage of Guile down to 2.8 MB on my
machine, from 4.4 MB.
* libguile/bytevectors.h (SCM_BYTEVECTOR_HEADER_SIZE): Bump, giving
bytevectors another word: a parent pointer. Will allow for
sub-bytevectors and efficient mmap bindings.
* libguile/bytevectors.c (make_bytevector):
(make_bytevector_from_buffer): Init parent to #f.
(scm_c_take_bytevector, scm_c_take_typed_bytevector): Another
argument, the parent, which gets set in the bytevector.
* libguile/foreign.c (scm_pointer_to_bytevector): Use the parent field
instead of registering a weak reference from bytevector to foreign
pointer.
* libguile/objcodes.c (scm_objcode_to_bytecode): Use the parent field to
avoid copying the objcode.
* libguile/srfi-4.c (DEFINE_SRFI_4_C_FUNCS):
* libguile/strings.c (scm_from_stringn):
* libguile/vm.c (really_make_boot_program):
* libguile/r6rs-ports.c (scm_get_bytevector_some)
(scm_get_bytevector_all, bytevector_output_port_procedure): Set the
parent to #f.
* libguile/strings.c (scm_encoding_error, scm_decoding_error): Use
scm_from_latin1_string for the subr and message args, as these are
internal functions, and we know their callers.
* libguile/strings.c (scm_to_locale_stringn, scm_from_locale_stringn):
Use the encoding of the current locale, not of the current i/o ports.
Also use the current conversion strategy.
* doc/ref/api-data.texi (Conversion to/from C): Update docs.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/strings.c (scm_from_latin1_stringn): Directly return a narrow
string instead of going through `scm_from_stringn'.
(scm_to_latin1_stringn): Directly return a copy of STR's raw bytes when
it's narrow.