Previously they were unaligned, unlike their parent strings, and so
could end up with the wrong pointer tag. Observed on i686-linux-gnu,
where they ended up tagged as immediates (SCM_IMP()), causing failures
in TYP7 related checks.
* libguile/strings.h (SCM_IMMUTABLE_STRINGBUF): align resulting buffer
via SCM_ALIGNED(8).
As the FSF advises, 'There is no legal significance to using the
three-character sequence “(C)”, but it does no harm.' It does take up
space though! For that reason, we remove it here from our C files.
* libguile/array-handle.c (initialize_vector_handle): Add mutable_p
argument. Unless the vector handle is mutable, null out its
writable_elements member.
(scm_array_get_handle): Adapt to determine mutability of the various
arrays.
(scm_array_handle_elements, scm_array_handle_writable_elements):
Reverse the sense: instead of implementing read-only in terms of
read-write, go the other way around, adding an assertion in the
read-write case that the array handle is mutable.
* libguile/array-map.c (racp): Assert that the destination is mutable.
* libguile/bitvectors.c (SCM_F_BITVECTOR_IMMUTABLE, IS_BITVECTOR):
(IS_MUTABLE_BITVECTOR): Add a flag to indicate immutability.
(scm_i_bitvector_bits): Fix indentation.
(scm_i_is_mutable_bitvector): New helper.
(scm_array_handle_bit_elements)
((scm_array_handle_bit_writable_elements): Build writable_elements in
terms of elements.
(scm_bitvector_elements, scm_bitvector_writable_elements): Likewise.
(scm_c_bitvector_set_x): Require a mutable bitvector for the
fast-path.
(scm_bitvector_to_list, scm_bit_count): Use read-only elements()
function.
* libguile/bitvectors.h (scm_i_is_mutable_bitvector): New decl.
* libguile/bytevectors.c (INTEGER_ACCESSOR_PROLOGUE):
(INTEGER_GETTER_PROLOGUE, INTEGER_SETTER_PROLOGUE):
(INTEGER_REF, INTEGER_NATIVE_REF, INTEGER_SET, INTEGER_NATIVE_SET):
(GENERIC_INTEGER_ACCESSOR_PROLOGUE):
(GENERIC_INTEGER_GETTER_PROLOGUE, GENERIC_INTEGER_SETTER_PROLOGUE):
(LARGE_INTEGER_NATIVE_REF, LARGE_INTEGER_NATIVE_SET):
(IEEE754_GETTER_PROLOGUE, IEEE754_SETTER_PROLOGUE):
(IEEE754_REF, IEEE754_NATIVE_REF, IEEE754_SET, IEEE754_NATIVE_SET):
Setters require a mutable bytevector.
(SCM_BYTEVECTOR_SET_FLAG): New helper.
(SCM_BYTEVECTOR_SET_CONTIGUOUS_P, SCM_BYTEVECTOR_SET_ELEMENT_TYPE):
Remove helpers.
(SCM_VALIDATE_MUTABLE_BYTEVECTOR): New helper.
(make_bytevector, make_bytevector_from_buffer): Use
SCM_SET_BYTEVECTOR_FLAGS.
(scm_c_bytevector_set_x, scm_bytevector_fill_x)
(scm_bytevector_copy_x): Require a mutable bytevector.
* libguile/bytevectors.h (SCM_F_BYTEVECTOR_CONTIGUOUS)
(SCM_F_BYTEVECTOR_IMMUTABLE, SCM_MUTABLE_BYTEVECTOR_P): New
definitions.
* libguile/bytevectors.h (SCM_BYTEVECTOR_CONTIGUOUS_P): Just access one
bit.
* libguile/srfi-4.c (DEFINE_SRFI_4_C_FUNCS): Implement
writable_elements() in terms of elements().
* libguile/strings.c (scm_i_string_is_mutable): New helper.
* libguile/uniform.c (scm_array_handle_uniform_elements):
(scm_array_handle_uniform_writable_elements): Implement
writable_elements in terms of elements.
* libguile/vectors.c (SCM_VALIDATE_MUTABLE_VECTOR): New helper.
(scm_vector_elements, scm_vector_writable_elements): Implement
writable_elements in terms of elements.
(scm_c_vector_set_x): Require a mutable vector.
* libguile/vectors.h (SCM_F_VECTOR_IMMUTABLE, SCM_I_IS_MUTABLE_VECTOR):
New definitions.
* libguile/vm-engine.c (VM_VALIDATE_MUTABLE_BYTEVECTOR):
(VM_VALIDATE_MUTABLE_VECTOR, vector-set!, vector-set!/immediate)
(BV_BOUNDED_SET, BV_SET): Require mutable bytevector/vector.
* libguile/vm.c (vm_error_not_a_mutable_bytevector):
(vm_error_not_a_mutable_vector): New definitions.
* module/system/vm/assembler.scm (link-data): Mark residualized vectors,
bytevectors, and bitvectors as being read-only.
* libguile/snarf.h (SCM_IMMUTABLE_STRINGBUF): Remove shared flag.
Stringbufs are immutable by default.
* libguile/strings.c: Rewrite blurb. Change to have stringbufs be
immutable by default and mutable only when marked as such. Going
mutable means making a private copy.
(STRINGBUF_MUTABLE, STRINGBUF_F_MUTABLE): New definitions.
(SET_STRINGBUF_SHARED): Remove.
(scm_i_print_stringbuf): Simplify to just alias the stringbuf as-is.
(substring_with_immutable_stringbuf): New helper.
(scm_i_substring, scm_i_substring_read_only, scm_i_substring_copy):
use new helper.
(scm_i_string_ensure_mutable_x): New helper.
(scm_i_substring_shared): Use scm_i_string_ensure_mutable_x.
(stringbuf_write_mutex): Remove; yaaaaaaaay.
(scm_i_string_start_writing): Use scm_i_string_ensure_mutable_x. No
more mutex.
(scm_i_string_stop_writing): Now a no-op.
(scm_i_make_symbol): Use substring/copy.
(scm_sys_string_dump, scm_sys_symbol_dump): Update.
* libguile/strings.h (SCM_I_STRINGBUF_F_SHARED): Remove.
(SCM_I_STRINGBUF_F_MUTABLE): Add.
* module/system/vm/assembler.scm (link-data): Don't add shared flag any
more. Existing compiled flags are harmless tho.
* test-suite/tests/strings.test ("string internals"): Update.
* libguile/ports.h (scm_t_port): Represent the conversion strategy as a
symbol, to make things easier for Scheme. Rename to
"conversion_strategy".
(scm_c_make_port_with_encoding): Change to take encoding and
conversion_strategy arguments as symbols.
(scm_i_string_failed_conversion_handler): New internal helper, to turn
a symbol to a scm_t_string_failed_conversion_handler.
(scm_i_default_port_encoding): Return the default port encoding as a
symbol.
(scm_i_default_port_conversion_strategy)
(scm_i_set_default_port_conversion_strategy): Rename from
scm_i_default_port_conversion_handler et al. Take and return Scheme
symbols.
* libguile/foreign.c (scm_string_to_pointer, scm_pointer_to_string): Use
scm_i_default_string_failed_conversion_handler instead of
scm_i_default_port_conversion_handler.
* libguile/print.c (PORT_CONVERSION_HANDLER): Update definition.
(print_normal_symbol): Use PORT_CONVERSION_HANDLER.
* libguile/r6rs-ports.c (make_bytevector_input_port):
(make_custom_binary_input_port, make_bytevector_output_port): Adapt to
changes in scm_c_make_port_with_encoding.
* libguile/strings.h:
* libguile/strings.c (scm_i_default_string_failed_conversion_handler):
New helper.
(scm_from_locale_stringn, scm_from_port_stringn):
(scm_to_locale_stringn, scm_to_port_stringn): Adapt to interface
changes.
* libguile/strports.c (scm_mkstrport): Adapt to
scm_c_make_port_with_encoding change.
* libguile/ports.c (scm_c_make_port): Adapt to
scm_c_make_port_with_encoding change.
(ascii_toupper, encoding_matches, canonicalize_encoding): Move down in
the file.
(peek_codepoint, get_codepoint, scm_ungetc): Adapt to port conversion
strategy change. Remove duplicate case in get_codepoint.
(scm_init_ports): Move symbol initializations to the same place.
* libguile/strings.h:
* libguile/strings.c (scm_i_print_stringbuf):
* libguile/print.c (iprin1): Add a printer for stringbufs. The
disassembler can print a stringbuf.
* libguile/bytevectors.c (STRING_TO_UTF, scm_string_to_utf8)
(UTF_TO_STRING):
* libguile/ports.c (open_iconv_descriptors, close_iconv_descriptors):
* libguile/strings.c (scm_from_stringn, scm_to_stringn): Wrap operations
that acquire and destroy iconv contexts with a mutex. While iconv is
threadsafe, internally it uses a lock, and we need to make sure when
we fork() that no one has that lock -- so we surround it with another
one. Gross.
* libguile/async.c:
* libguile/async.h:
* libguile/debug.h:
* libguile/deprecated.c:
* libguile/deprecated.h:
* libguile/evalext.h:
* libguile/gc-malloc.c:
* libguile/gc.h:
* libguile/gen-scmconfig.c:
* libguile/numbers.c:
* libguile/ports.c:
* libguile/ports.h:
* libguile/procprop.c:
* libguile/procprop.h:
* libguile/read.c:
* libguile/socket.c:
* libguile/srfi-4.h:
* libguile/strings.c:
* libguile/strings.h:
* libguile/tags.h:
* module/ice-9/boot-9.scm:
* module/ice-9/deprecated.scm: Remove all deprecated code. CPP defines
that were not previously issuing warnings were changed so that their
expansions would indicate the replacement forms to use,
e.g. scm_sizet__GONE__REPLACE_WITH__size_t.
The two exceptions were SCM_LISTN, which did not produce warnings
before, and the string-filter argument order stuff.
Drops the initial dirty memory usage of Guile down to 2.8 MB on my
machine, from 4.4 MB.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/strings.h:
* libguile/strings.c (scm_from_latin1_string, scm_to_latin1_string): New
functions, in terms of the latin1_stringn variants.
(scm_from_utf8_string, scm_from_utf8_stringn)
(scm_to_utf8_string, scm_to_utf8_stringn): New functions.
(scm_i_from_utf8_string, scm_i_to_utf8_string): Removed these internal
functions.
(scm_from_stringn): Handle -1 as a length. Unlike the previous
behavior of scm_from_locale_string (NULL), which returned the empty
string, we now raise an error. The null pointer is not the same as
the empty string.
* libguile/stime.c (scm_strftime, scm_strptime): Adapt to publishing of
utf8 functions.
* doc/ref/api-data.texi: document scm_to_stringn, scm_from_stringn,
scm_to_latin1_stringn, and scm_from_latin1_stringn
* libguile/strings.h (scm_to_stringn): make public
(scm_to_latin1_stringn): new declaration
(scm_from_latin1_stringn): new declaration
* libguile/strings.c (scm_to_latin1_stringn): new function
(scm_from_latin1_stringn): new function
* libguile/strings.c (STRINGBUF_CONTENTS): New macro.
(STRINGBUF_CHARS, STRINGBUF_WIDE_CHARS): Use it.
(scm_i_string_data): New function.
* libguile/strings.h (scm_i_string_data): New declaration.
* libguile/root.h
* libguile/root.c (scm_sys_protects): It used to be that for some reason
we'd define a special array of "protected" values. This was a little
silly, always, but with the BDW GC it's completely unnecessary. Also
many of these variables were unused, and none of them were good API.
So remove this array, and either eliminate, make static, or make
internal the various values.
* libguile/snarf.h: No need to generate calls to scm_permanent_object.
* guile-readline/readline.c (scm_init_readline): No need to call
scm_permanent_object.
* libguile/array-map.c (ramap, rafe): Remove the dubious nullvect
optimizations.
* libguile/async.c (scm_init_async): No need to init scm_asyncs, it is
no more.
* libguile/eval.c (scm_init_eval): No need to init scm_listofnull, it is
no more.
* libguile/gc.c: Make scm_protects a static var.
(scm_storage_prehistory): Change the sanity check to use the address
of protects.
(scm_init_gc_protect_object): No need to clear the scm_sys_protects,
as it is no more.
* libguile/keywords.c: Make the keyword obarray a static var.
* libguile/numbers.c: Make flo0 a static var.
* libguile/objprop.c: Make object_whash a static var.
* libguile/properties.c: Make properties_whash a static var.
* libguile/srcprop.h:
* libguile/srcprop.c: Make scm_source_whash a global with internal
linkage.
* libguile/strings.h:
* libguile/strings.c: Make scm_nullstr a global with internal linkage.
* libguile/vectors.c (scm_init_vectors): No need to init scm_nullvect,
it's unused.
The intent is to allow compilation with `-Wundef', which in turn should
make it easier to catch erroneous uses of nonexistent macros.
* libguile/__scm.h: Don't assume `BUILDING_LIBGUILE' is defined.
* libguile/conv-uinteger.i.c (SCM_TO_TYPE_PROTO): Remove unneeded CPP
conditional on `TYPE_MIN == 0'.
* libguile/fports.c: Check for the definition of `HAVE_CHSIZE' and
`HAVE_FTRUNCATE', not for their value.
* libguile/ports.c: Likewise.
* libguile/numbers.c (guile_ieee_init): Likewise with `HAVE_DINFINITY'
and `HAVE_DQNAN'.
* test-suite/standalone/test-conversion.c (ieee_init): Likewise.
* libguile/strings.c: Likewise with `SCM_STRING_LENGTH_HISTOGRAM'.
* libguile/strings.h: Likewise.
* libguile/tags.h: Likewise with `HAVE_INTTYPES_H' and `HAVE_STDINT_H'.
* libguile/threads.c: Likewise with `HAVE_PTHREAD_GET_STACKADDR_NP'.
* libguile/vm-engine.c (VM_NAME): Likewise with `VM_CHECK_IP'.
* libguile/gen-scmconfig.c (main): Use "#ifdef HAVE_", not "#if HAVE_".
* libguile/socket.c (scm_setsockopt): Likewise.
This requires separate small fixes.
Readline has internal logic to deal with multi-byte characters, so
it wants bytes, not characters.
scm_c_read gets called by the vm when readline is activated, and it was
truncating multi-byte characters because soft ports didn't have the
UCS-4 capability.
Soft ports need the capability to read UCS-4 characters. Since soft ports
may have a single byte buffer, full characters need to be stored into the
pushback buffer.
This broke the optimizations in scm_c_read for using an alternate buffer
for single-byte-buffered ports, because the opimization wasn't expecting
anything in the pushback buffer.
* libguile/vports.c (sf_fill_input): store complete chars, not single bytes
* libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1.
Add debug prints.
* libguile/string.h: make scm_i_from_stringn and scm_i_string_ref public
so that readline can use them
* guile-readline/readline.c: read bytes, not complete chars, from the
input port. Convert output to the output port's locale
String ports, being 8-bit, store strings using the character encoding
of the port. This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.
* libguile/strports.c: extra comments
(scm_strport_to_string): use port's encoding when converting port data
to a string
* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
and made internal. All callers changed.
(scm_from_stringn): renamed to scm_i_from_stringn.
* libguile/strings.h: declaration for scm_i_from_stringn
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/socket.c (scm_recv): receive the message without holding the
stringbuf writing lock
(scm_send): try to narrow a string before using it
* libguile/stime.c (strftime): convert string to UTF-8 so that it can
be safely passed to strftime
(strptime): convert input string to UTF-8 so that it can be safely
passed through strptime
* libguile/strings.c (narrow_stringbuf): new function
(scm_i_try_narrow_string): new function
* libguile/strings.h: new declaration for scm_i_try_narrow_string