* libguile/ports.h (scm_t_port): Represent the conversion strategy as a
symbol, to make things easier for Scheme. Rename to
"conversion_strategy".
(scm_c_make_port_with_encoding): Change to take encoding and
conversion_strategy arguments as symbols.
(scm_i_string_failed_conversion_handler): New internal helper, to turn
a symbol to a scm_t_string_failed_conversion_handler.
(scm_i_default_port_encoding): Return the default port encoding as a
symbol.
(scm_i_default_port_conversion_strategy)
(scm_i_set_default_port_conversion_strategy): Rename from
scm_i_default_port_conversion_handler et al. Take and return Scheme
symbols.
* libguile/foreign.c (scm_string_to_pointer, scm_pointer_to_string): Use
scm_i_default_string_failed_conversion_handler instead of
scm_i_default_port_conversion_handler.
* libguile/print.c (PORT_CONVERSION_HANDLER): Update definition.
(print_normal_symbol): Use PORT_CONVERSION_HANDLER.
* libguile/r6rs-ports.c (make_bytevector_input_port):
(make_custom_binary_input_port, make_bytevector_output_port): Adapt to
changes in scm_c_make_port_with_encoding.
* libguile/strings.h:
* libguile/strings.c (scm_i_default_string_failed_conversion_handler):
New helper.
(scm_from_locale_stringn, scm_from_port_stringn):
(scm_to_locale_stringn, scm_to_port_stringn): Adapt to interface
changes.
* libguile/strports.c (scm_mkstrport): Adapt to
scm_c_make_port_with_encoding change.
* libguile/ports.c (scm_c_make_port): Adapt to
scm_c_make_port_with_encoding change.
(ascii_toupper, encoding_matches, canonicalize_encoding): Move down in
the file.
(peek_codepoint, get_codepoint, scm_ungetc): Adapt to port conversion
strategy change. Remove duplicate case in get_codepoint.
(scm_init_ports): Move symbol initializations to the same place.
* libguile/ports-internal.h (scm_t_port_internal): Remove encoding_mode
member.
* libguile/ports.h (scm_t_port): "encoding" member is now a SCM symbol.
* libguile/ports.c (scm_init_ports): Define symbols for the encodings
that we handle explicitly.
(encoding_matches): Adapt to check against an encoding as a symbol.
(canonicalize_encoding): Return an encoding as a symbol.
(scm_c_make_port_with_encoding, scm_i_set_default_port_encoding)
(decide_utf16_encoding, decide_utf32_encoding)
(scm_i_port_iconv_descriptors, scm_i_set_port_encoding_x)
(scm_port_encoding, peek_codepoint, scm_ungetc): Adapt to encoding
change.
* libguile/print.c (display_string_using_iconv, display_string):
* libguile/read.c (scm_read_character):
* libguile/strings.c (scm_from_port_stringn, scm_to_port_stringn): Adapt
to port encoding change.
* libguile/strings.c (scm_from_utf8_stringn): Use 'u8_mbtoucr' and check
for a decoding error by its 'nbytes' return value. Previously we used
'u8_mbtouc' and improperly assumed that a U+FFFD character indicated a
decoding error.
* libguile/symbols.c (utf8_string_equals_wide_string): Likewise.
* test-suite/tests/bytevectors.test (exception:decoding-error): New
variable.
("2.9 Operations on Strings"): Add tests.
* libguile/array-handle.h (scm_t_vector_ref, scm_t_vector_set): Rename
from scm_t_array_ref, scm_t_array_set. These were named
scm_i_t_array_ref and scm_i_t_array_set in 1.8 and 2.0. Change to
take the vector directly, instead of the array handle. In this way,
generic array handles are layered on top of specific implementations
of backing stores.
Remove scm_t_array_implementation, introduced in 2.0 but never
documented. It was a failed attempt to layer the array implementation
that actually introduced too many layers, as it prevented the "vref"
and "vset" members of scm_t_array_handle (called "ref" and "set" in
1.8, not present in 2.0) from specializing on array backing stores.
(scm_i_register_array_implementation) (scm_i_array_implementation_for_obj):
Remove these internal interfaces.
(scm_t_array_handle): Adapt to scm_t_vector_ref / scm_t_vector_set
change.
(scm_array_handle_ref, scm_array_handle_set): Adapt to change in
vref/vset prototype.
* libguile/array-handle.c (scm_array_get_handle): Inline all the
necessary initializations here for all specific array types.
* libguile/array-map.c (rafill, racp, ramap, rafe, array_index_map_1):
* libguile/arrays.c: Remove array implementation code.
* libguile/bitvectors.h:
* libguile/bitvectors.c: Remove array implementation code.
(scm_i_bitvector_bits): New internal interface.
* libguile/bytevectors.c: Remove array implementation code.
* libguile/srfi-4.h: Remove declarations for internal procedures that
don't exist (!).
* libguile/strings.c: Remove array implementation code.
* libguile/vectors.c: Remove array implementation code.
* libguile/strings.h:
* libguile/strings.c (scm_i_print_stringbuf):
* libguile/print.c (iprin1): Add a printer for stringbufs. The
disassembler can print a stringbuf.
* libguile/strings.c (scm_string_append): Check for numerical overflow
while computing the length of the result. Double-check that we don't
overflow the result string, and that it is the correct length in the
end (in case another thread changed the list). When copying a narrow
string to a wide result, avoid calling 'scm_i_string_length' and
'scm_i_string_chars' on each character.
* libguile/strings.c (scm_from_utf8_stringn):
* libguile/symbols.c (utf8_string_equals_wide_string): The "bad UTF8"
return from u8_mbtouc is a 0xfffd character, not a negative byte
length. Fixes a bug in which invalid UTF-8 would not be caught.
* libguile/bytevectors.c (scm_utf8_to_string): Use scm_from_utf8_stringn
directly. Just a little cleanup.
* test-suite/tests/iconv.test ("narrow non-ascii string"): Add test for
parsing bad utf-8 with substitution.
Moved scm_i_struct_hash from struct.c to hash.c and made it static.
The port's alist is now a field of 'scm_t_port'.
Conflicts:
libguile/arrays.c
libguile/hash.c
libguile/ports.c
libguile/print.h
libguile/read.c
Fixes <http://bugs.gnu.org/11468>.
* libguile/ports.c (scm_conversion_strategy): Remove.
(default_conversion_strategy_var, sym_error, sym_substitute,
sym_escape): New variables.
(scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x):
Remove.
(scm_i_default_port_conversion_handler,
scm_i_set_default_port_conversion_handler): New functions.
(scm_port_conversion_strategy): Use
`scm_i_default_port_conversion_handler' when PORT is #f.
(scm_set_port_conversion_strategy_x): Use SYM_ERROR, SYM_SUBSTITUTE,
and SYM_ESCAPE. Use `scm_i_set_default_port_conversion_handler' when
PORT is #f.
(scm_init_ports): Initialize DEFAULT_CONVERSION_STRATEGY_VAR.
* libguile/ports.h: Update declarations accordingly.
* libguile/foreign.c: Change
`scm_i_get_conversion_strategy (SCM_BOOL_F)' to
`scm_i_default_port_conversion_handler ()'.
* libguile/strings.c: Likewise.
* test-suite/tests/ports.test ("%default-port-conversion-strategy"): New
test prefix.
* test-suite/tests/foreign.test ("pointer<->string")["%default-port-conversion-strategy
is error", "%default-port-conversion-strategy is soft"]: New tests.
* test-suite/test-suite/lib.scm (exception:encoding-error): Allow the
regexp to match `scm_to_stringn' error messages.
* doc/ref/api-io.texi (Ports): Document `%default-port-conversion-strategy'.
* libguile/strings.c (scm_to_utf8_stringn): Fix another new bug in this
recent comedy of errors: pass the size of the preallocated buffer to
u32_to_u8. Arrange to call 'scm_i_string_wide_chars' and
'scm_i_string_length' only once each. Rename local variables for
improved code clarity.
* test-suite/standalone/test-conversion.c (test_to_utf8_stringn): New
function to test scm_to_utf8_stringn.
* libguile/strings.c (u32_u8_length_in_bytes): Internal static function
renamed from u32_u8_strlen, whose name was potentially confusing. For
added safety, handle everything that can be encoded in the more
general UTF-8 encoding: up to six bytes for each code point, with code
points up to 2^31-1.
(scm_to_utf8_stringn): NUL-terminate only if (lenp == NULL).
If (lenp != NULL) return the length in bytes in *lenp.
* libguile/bytevectors.c (STRING_TO_UTF, scm_string_to_utf8)
(UTF_TO_STRING):
* libguile/ports.c (open_iconv_descriptors, close_iconv_descriptors):
* libguile/strings.c (scm_from_stringn, scm_to_stringn): Wrap operations
that acquire and destroy iconv contexts with a mutex. While iconv is
threadsafe, internally it uses a lock, and we need to make sure when
we fork() that no one has that lock -- so we surround it with another
one. Gross.
* libguile/async.c:
* libguile/deprecation.c:
* libguile/fluids.c:
* libguile/gc.c:
* libguile/instructions.c:
* libguile/ports.c:
* libguile/posix.c:
* libguile/strings.c:
* libguile/threads.c: Use the SCM_PTHREAD_ATFORK_LOCK_STATIC_MUTEX
mechanism to lock a number of static mutexen.
* libguile/strings.c (scm_from_utf8_stringn): Embarassingly, my
scm_from_utf8_stringn implementation was buggy for non-ascii
characters, since October (41d1d984). Fixed. Will be tested with the
next patch.
* libguile/strings.c (scm_i_substring_copy): When asked to create an
empty substring, use 'scm_i_make_string' to make use of its
optimization for empty strings that reuses the global null_stringbuf.
* libguile/strings.c (scm_i_substring, scm_i_substring_read_only,
scm_i_substring_shared): When asked to create an empty substring,
return a freshly allocated null string. Previously, an empty
substring needlessly held a reference to the original stringbuf.
* libguile/strings.c (scm_from_stringn): Avoid calling
`u32_conv_from_encoding' on the null string, by using the same
fast-path code used if (encoding == NULL). This is an optimization,
and also avoids any possible encoding errors.
* libguile/strings.c (scm_from_stringn): Always return a freshly
allocated string from scm_from_stringn, even when asked to construct
the null string, in accordance with the R5RS. Previously, we
optimized the null string case by returning a reference to a global
null string object (scm_nullstr).
* libguile/strings.c (scm_i_is_narrow_string, scm_i_try_narrow_string,
scm_i_string_set_x): Check to see if the provided string is a
mutation-sharing substring, and do the right thing in that case.
Previously, if such a string was passed to these functions, they would
behave very badly: while trying to fetch and/or mutate the cell
containing the stringbuf, they were actually fetching or mutating the
cell containing the original shared string. That's because
mutation-sharing substrings store the original string in CELL_1,
whereas all other strings store the stringbuf there.
* libguile/strings.c (scm_init_strings): Make scm_nullstr mutable. It
is still usable as a common object, because of course it contains no
characters to mutate anyway. It is returned by several procedures
that are specified to return mutable strings, and string mutators
raise errors when passed an immutable string, even if it is the null
string.
* libguile/strings.c (decoding_error): Factor out of scm_from_stringn,
properly handling errno.
(scm_from_stringn): Adapt.
(scm_from_utf8_stringn): Inline the conversion here, to avoid going
through iconv.
* libguile/tags.h (SCM_UNPACK_POINTER, SCM_PACK_POINTER): New macros.
The old SCM2PTR and PTR2SCM were defined in such a way that
round-tripping through a pointer could lose precision, even in the
case in which you weren't interested in actually dereferencing the
pointer, it was simply that you needed to plumb a SCM through APIs
that take pointers. These new macros are more like SCM_PACK and
SCM_UNPACK, but for pointer types. The bit representation of the
pointer should be the same as the scm_t_bits representation.
* libguile/gc.h (PTR2SCM, SCM2PTR): Remove support for (old) UNICOS
pointers. We are going to try tagging the SCM object itself in the
future, and I don't think that keeping this support is worth its
cost. It probably doesn't work anyway.
* libguile/backtrace.c:
* libguile/bytevectors.c:
* libguile/continuations.c:
* libguile/fluids.c:
* libguile/foreign.c:
* libguile/gc.h:
* libguile/guardians.c:
* libguile/hashtab.c:
* libguile/load.c:
* libguile/numbers.c:
* libguile/ports.c:
* libguile/smob.c:
* libguile/strings.c:
* libguile/symbols.c:
* libguile/vm.c:
* libguile/weak-set.c:
* libguile/weak-table.c:
* libguile/weak-vector.c: Update many sites to use the new macros.