* libguile/strings.c (scm_string_append): Check for numerical overflow
while computing the length of the result. Double-check that we don't
overflow the result string, and that it is the correct length in the
end (in case another thread changed the list). When copying a narrow
string to a wide result, avoid calling 'scm_i_string_length' and
'scm_i_string_chars' on each character.
Fixes <http://bugs.gnu.org/11468>.
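The overflow check on the result length described above can be sketched as follows. This is a minimal standalone illustration, not the libguile code: `checked_total_length' and its arguments are hypothetical names, and the real function walks a Scheme list rather than a C array.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: add up N piece lengths, refusing to wrap
   around.  Returns 0 and stores the sum in *TOTAL on success, or -1
   if the sum would not fit in a size_t.  */
int
checked_total_length (const size_t *lens, size_t n, size_t *total)
{
  size_t sum = 0;
  size_t i;

  for (i = 0; i < n; i++)
    {
      /* Test before adding: SUM + LENS[I] must not exceed SIZE_MAX.  */
      if (lens[i] > SIZE_MAX - sum)
        return -1;
      sum += lens[i];
    }

  *total = sum;
  return 0;
}
```

A caller would allocate the result only after this succeeds, and would still compare the number of characters actually copied against the computed total, since the pieces may have changed in the meantime.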
* libguile/ports.c (scm_conversion_strategy): Remove.
(default_conversion_strategy_var, sym_error, sym_substitute,
sym_escape): New variables.
(scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x):
Remove.
(scm_i_default_port_conversion_handler,
scm_i_set_default_port_conversion_handler): New functions.
(scm_port_conversion_strategy): Use
`scm_i_default_port_conversion_handler' when PORT is #f.
(scm_set_port_conversion_strategy_x): Use SYM_ERROR, SYM_SUBSTITUTE,
and SYM_ESCAPE. Use `scm_i_set_default_port_conversion_handler' when
PORT is #f.
(scm_init_ports): Initialize DEFAULT_CONVERSION_STRATEGY_VAR.
* libguile/ports.h: Update declarations accordingly.
* libguile/foreign.c: Change
`scm_i_get_conversion_strategy (SCM_BOOL_F)' to
`scm_i_default_port_conversion_handler ()'.
* libguile/strings.c: Likewise.
* test-suite/tests/ports.test ("%default-port-conversion-strategy"): New
test prefix.
* test-suite/tests/foreign.test ("pointer<->string")["%default-port-conversion-strategy
is error", "%default-port-conversion-strategy is soft"]: New tests.
* test-suite/test-suite/lib.scm (exception:encoding-error): Allow the
regexp to match `scm_to_stringn' error messages.
* doc/ref/api-io.texi (Ports): Document `%default-port-conversion-strategy'.
* libguile/strings.c (scm_to_utf8_stringn): Fix another new bug in this
recent comedy of errors: pass the size of the preallocated buffer to
u32_to_u8. Arrange to call 'scm_i_string_wide_chars' and
'scm_i_string_length' only once each. Rename local variables for
improved code clarity.
* test-suite/standalone/test-conversion.c (test_to_utf8_stringn): New
function to test scm_to_utf8_stringn.
* libguile/strings.c (u32_u8_length_in_bytes): Internal static function
renamed from u32_u8_strlen, whose name was potentially confusing. For
added safety, handle everything that can be encoded in the more
general UTF-8 encoding: up to six bytes for each code point, with code
points up to 2^31-1.
(scm_to_utf8_stringn): NUL-terminate only if (lenp == NULL).
If (lenp != NULL) return the length in bytes in *lenp.
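As an illustration of the length computation described above, here is a minimal sketch that counts the bytes needed under the more general UTF-8 encoding (up to six bytes per code point). The name `utf8_length_in_bytes' is hypothetical; this is not the libguile function.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a UTF-32 to UTF-8 length helper.  It uses
   the original, more general UTF-8 ranges (up to six bytes, code
   points up to 2^31-1) rather than the four-byte limit of RFC 3629.
   Values above 2^31-1 are also counted as six bytes in this sketch.  */
size_t
utf8_length_in_bytes (const uint32_t *str, size_t len)
{
  size_t bytes = 0;
  size_t i;

  for (i = 0; i < len; i++)
    {
      uint32_t c = str[i];

      if (c <= 0x7f)
        bytes += 1;
      else if (c <= 0x7ff)
        bytes += 2;
      else if (c <= 0xffff)
        bytes += 3;
      else if (c <= 0x1fffff)
        bytes += 4;
      else if (c <= 0x3ffffff)
        bytes += 5;
      else
        bytes += 6;
    }

  return bytes;
}
```

Sizing the buffer for the worst case the encoding permits means a later encoder cannot write past the end, even for code points outside the Unicode range.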
* libguile/strings.c (scm_i_substring_copy): When asked to create an
empty substring, use 'scm_i_make_string' to make use of its
optimization for empty strings that reuses the global null_stringbuf.
* libguile/strings.c (scm_i_substring, scm_i_substring_read_only,
scm_i_substring_shared): When asked to create an empty substring,
return a freshly allocated null string. Previously, an empty
substring needlessly held a reference to the original stringbuf.
* libguile/strings.c (scm_from_stringn): Avoid calling
`u32_conv_from_encoding' on the null string, by using the same
fast-path code used if (encoding == NULL). This is an optimization,
and also avoids any possible encoding errors.
* libguile/strings.c (scm_from_stringn): Always return a freshly
allocated string, even when asked to construct the null string, in
accordance with the R5RS. Previously, we optimized the null string
case by returning a reference to a global null string object
(scm_nullstr).
* libguile/strings.c (scm_i_is_narrow_string, scm_i_try_narrow_string,
scm_i_string_set_x): Check whether the provided string is a
mutation-sharing substring and, if so, operate on the stringbuf of the
original string that it shares.
Previously, if such a string was passed to these functions, they would
behave very badly: while trying to fetch and/or mutate the cell
containing the stringbuf, they were actually fetching or mutating the
cell containing the original shared string. That's because
mutation-sharing substrings store the original string in CELL_1,
whereas all other strings store the stringbuf there.
* libguile/strings.c (scm_init_strings): Make scm_nullstr mutable. It
is still usable as a common object, because of course it contains no
characters to mutate anyway. It is returned by several procedures
that are specified to return mutable strings, and string mutators
raise errors when passed an immutable string, even if it is the null
string.
* libguile/strings.c (scm_to_latin1_stringn): Fix for substrings.
* test-suite/standalone/Makefile.am:
* test-suite/standalone/test-scm-to-latin1-string.c: Add test case.
Thanks to David Hansen for the bug report and test case, and Stefan
Israelsson Tampe for the fix.
* libguile/bytevectors.h:
* libguile/bytevectors.c (scm_c_take_gc_bytevector): Rename this
internal function, from scm_c_take_bytevector. This indicates that
unlike the other scm_take_* functions, this one takes GC-managed
memory.
* libguile/objcodes.c (scm_objcode_to_bytecode):
* libguile/vm.c (really_make_boot_program): Use
scm_gc_malloc_pointerless, not scm_malloc. Thanks to Stefan
Israelsson Tampe!
* libguile/r6rs-ports.c:
* libguile/strings.c: Adapt to renames.
* libguile/strings.c (scm_i_allocate_string_pointers): Encode strings
using the current locale. Previously, Latin-1 was used. Indirectly,
this affects the encoding of strings in `system*', `execl', `execlp',
`execle', `environ', and `dynamic-args-call'.
(scm_makfromstrs): In header comment, clarify that the C strings are
interpreted according to the current locale encoding.
* NEWS: Add NEWS entry.
* libguile/inline.h:
* libguile/deprecated.h:
* libguile/deprecated.c (scm_immutable_cell, scm_immutable_double_cell):
Deprecate these, as the GC_STUBBORN API doesn't do anything any more.
* libguile/strings.c (scm_i_c_make_symbol): Change the one use of
scm_immutable_double_cell to scm_double_cell.
* libguile/strings.c (scm_encoding_error, scm_decoding_error): Use
scm_from_latin1_string for the subr and message args, as these are
internal functions, and we know their callers.
* libguile/strings.c (scm_to_locale_stringn, scm_from_locale_stringn):
Use the encoding of the current locale, not of the current i/o ports.
Also use the current conversion strategy.
* doc/ref/api-data.texi (Conversion to/from C): Update docs.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
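The change of `get_codepoint' to return an error code and deliver its result through an output parameter follows a common C pattern, sketched below. This is a minimal standalone illustration: `decode_codepoint' and its parameters are hypothetical, it handles UTF-8 only, and it does not reject overlong forms or surrogates.

```c
#include <errno.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: read one UTF-8 sequence from BUF, which holds
   SIZE bytes.  On success, return 0, store the code point in
   *CODEPOINT and the number of bytes used in *LEN.  On malformed
   input, return EILSEQ and set *LEN to 1 so that the caller can skip
   the offending byte.  No error is raised from here.  */
int
decode_codepoint (const unsigned char *buf, size_t size,
                  uint32_t *codepoint, size_t *len)
{
  size_t need, i;
  uint32_t c;

  *len = 1;
  if (size == 0)
    return EILSEQ;

  if (buf[0] < 0x80)
    { c = buf[0]; need = 1; }
  else if ((buf[0] & 0xe0) == 0xc0)
    { c = buf[0] & 0x1f; need = 2; }
  else if ((buf[0] & 0xf0) == 0xe0)
    { c = buf[0] & 0x0f; need = 3; }
  else if ((buf[0] & 0xf8) == 0xf0)
    { c = buf[0] & 0x07; need = 4; }
  else
    return EILSEQ;

  if (need > size)
    return EILSEQ;

  /* Each continuation byte must look like 10xxxxxx.  */
  for (i = 1; i < need; i++)
    {
      if ((buf[i] & 0xc0) != 0x80)
        return EILSEQ;
      c = (c << 6) | (buf[i] & 0x3f);
    }

  *codepoint = c;
  *len = need;
  return 0;
}
```

Keeping the decoder free of side effects lets each caller decide how to report the failure; here a reader would advance by *LEN and then raise its own decoding error.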
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/strings.c (scm_from_latin1_stringn): Directly return a narrow
string instead of going through `scm_from_stringn'.
(scm_to_latin1_stringn): Directly return a copy of STR's raw bytes when
it's narrow.
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/strings.h:
* libguile/strings.c (scm_from_latin1_string, scm_to_latin1_string): New
functions, in terms of the latin1_stringn variants.
(scm_from_utf8_string, scm_from_utf8_stringn)
(scm_to_utf8_string, scm_to_utf8_stringn): New functions.
(scm_i_from_utf8_string, scm_i_to_utf8_string): Remove these internal
functions.
(scm_from_stringn): Handle -1 as a length. Unlike the previous
behavior of scm_from_locale_string (NULL), which returned the empty
string, we now raise an error. The null pointer is not the same as
the empty string.
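The length convention described above, where -1 means "measure the NUL-terminated string" and a null pointer is an error, can be sketched as follows. `resolve_length' is a hypothetical helper, not part of libguile; rejecting NULL even when an explicit length is given is a simplifying assumption of this sketch.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: work out the effective length of a C string
   argument.  LEN == (size_t) -1 means that STR is NUL-terminated and
   should be measured.  A null pointer is rejected rather than being
   treated as the empty string.  Returns 0 on success, -1 on error.  */
int
resolve_length (const char *str, size_t len, size_t *out)
{
  if (str == NULL)
    return -1;

  *out = (len == (size_t) -1) ? strlen (str) : len;
  return 0;
}
```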
* libguile/stime.c (scm_strftime, scm_strptime): Adapt to publishing of
utf8 functions.
* libguile/gc-malloc.c: Add a note that gc-malloc does not clear the
memory block, so users need to make sure it is initialized.
* libguile/bitvectors.c (scm_c_make_bitvector):
* libguile/bytevectors.c (scm_make_bytevector):
* libguile/strings.c (scm_c_make_string): If no initializer is given,
initialize the bytes to 0. Prevents information leakage if an app uses
make-string et al without initializers.
* libguile/foreign.c (make_cif): Initialize this too, to prevent leakage
in the struct holes. Paranoia...
Reported by Mike Gran <spk121@yahoo.com>.
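The zero-fill default described above can be sketched in plain C as follows. `make_bytes' is a hypothetical allocator used only for illustration; libguile allocates through its own GC-aware routines.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical allocator: return LEN bytes, each set to FILL when
   HAVE_FILL is nonzero and to 0 otherwise, so that stale heap contents
   are never exposed to the caller.  Returns NULL if allocation fails.
   The caller releases the result with free ().  */
unsigned char *
make_bytes (size_t len, int have_fill, unsigned char fill)
{
  /* Ask for at least one byte so that a zero LEN still yields a
     distinct, freeable pointer.  */
  unsigned char *buf = malloc (len ? len : 1);

  if (buf == NULL)
    return NULL;

  memset (buf, have_fill ? fill : 0, len);
  return buf;
}
```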
* libguile/strings.c (scm_i_unistring_escapes_to_guile_escapes,
scm_i_unistring_escapes_to_r6rs_escapes): Augment comments.
(scm_to_stringn): When `handler ==
SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', realloc
BUF so that it's large enough for the worst case.
* libguile/print.c (display_character): When `result != NULL && strategy
== SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', make
LOCALE_ENCODED large enough to hold an R6RS escape.
* doc/ref/api-data.texi: Document scm_to_stringn, scm_from_stringn,
scm_to_latin1_stringn, and scm_from_latin1_stringn.
* libguile/strings.h (scm_to_stringn): Make public.
(scm_to_latin1_stringn): New declaration.
(scm_from_latin1_stringn): New declaration.
* libguile/strings.c (scm_to_latin1_stringn): New function.
(scm_from_latin1_stringn): New function.
* libguile/strings.c (STRINGBUF_CONTENTS): New macro.
(STRINGBUF_CHARS, STRINGBUF_WIDE_CHARS): Use it.
(scm_i_string_data): New function.
* libguile/strings.h (scm_i_string_data): New declaration.
* libguile/strings.c (scm_encoding_error): Change arguments to convey
more information. Raise the error with `scm_throw ()', passing all
the information to the handler.
(scm_from_stringn, scm_to_stringn): Update accordingly.
* test-suite/tests/ports.test ("string ports")["wrong encoding"]: Check
the arguments passed to the `throw' handler.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Likewise.
scm_to_stringn failed to do the necessary escape conversion for
R6RS hex escapes.
* libguile/strings.c (unistring_escapes_to_r6rs_escapes): New function.
(scm_to_stringn): Use the new function when R6RS hex escapes are enabled.
* test-suite/tests/reader.test: New test for string display.
* libguile/strings.c (SCM_ARRAY_IMPLEMENTATION): The mask for the string
array implementation should be 0x7f, without masking out 0x2.
Otherwise numbers were being thought to be vectors!
* test-suite/tests/unif.test: Add test.
* libguile/vectors.c (SCM_ARRAY_IMPLEMENTATION): Only register one
implementation, because weak vectors can be checked with the mask &
~2, and the functions are the same.
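The effect of the two masks can be seen in a small sketch. The tag values below are hypothetical and chosen only so that they differ in bit 0x2; the real type codes are defined in libguile and are not reproduced here.

```c
/* Hypothetical 7-bit type tags.  TAG_OTHER differs from TAG_STRING
   only in bit 0x2.  */
enum { TAG_STRING = 0x15, TAG_OTHER = 0x17 };

/* Return nonzero if the bits of WORD selected by MASK equal the
   corresponding bits of TAG.  */
int
tag_matches (unsigned long word, unsigned long mask, unsigned long tag)
{
  return (word & mask) == (tag & mask);
}
```

With the full mask 0x7f, a word tagged TAG_OTHER does not match TAG_STRING. With the mask 0x7f & ~2, bit 0x2 is ignored and the same word does match, which is the kind of false positive the strings.c entry describes. Ignoring that bit is only appropriate where two tags that differ in it are meant to share one implementation, as the vectors.c entry states for vectors and weak vectors.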