* libguile/strports.c (scm_mkstrport): Use UTF-8; ignore
%default-port-encoding. Rename 'str_len' and 'c_pos' to
'num_bytes' and 'c_byte_pos'. Interpret 'pos' argument
as a character index instead of a byte index.
* module/ice-9/boot-9.scm (%cond-expand-features): Add srfi-6 to the
list of core features.
* module/srfi/srfi-6.scm (open-input-string, open-output-string): Simply
re-export these, since the core versions are now compliant.
* doc/ref/api-io.texi (String Ports): Remove text that describes
non-compliant behavior of string ports with regard to encoding.
* doc/ref/srfi-modules.texi (SRFI-0): Add srfi-6 to the list of
core features.
(SRFI-6): Remove text that mentions non-compliant behavior of
core string ports.
* module/ice-9/format.scm (format):
* module/ice-9/pretty-print.scm (truncated-print):
* module/rnrs/io/ports.scm (open-string-input-port,
open-string-output-port):
* test-suite/test-suite/lib.scm (format-test-name):
* test-suite/tests/chars.test ("combining accent is pretty-printed",
"combining X is pretty-printed"):
* test-suite/tests/ecmascript.test (eread, eread/1):
* test-suite/tests/rdelim.test:
* test-suite/tests/reader.test (read-string):
* test-suite/tests/regexp.test:
* test-suite/tests/srfi-105.test (read-string): Don't set
%default-port-encoding before creating string ports.
* benchmark-suite/benchmarks/ports.bm (%latin1-port): Use
'set-port-encoding!' to set the string port encoding.
(%utf8/ascii-port, %utf8/wide-port, "rdelim"): Don't set
%default-port-encoding before creating string ports.
* test-suite/tests/r6rs-ports.test ("lookahead-u8 non-ASCII"): Don't set
%default-port-encoding before creating string ports.
("put-bytevector with UTF-16 string port", "put-bytevector with
wrong-encoding string port"): Use 'set-port-encoding!' to set the
string port encoding.
* test-suite/tests/print.test (tprint): Use 'set-port-encoding!' to set
the string port encoding.
("truncated-print"): Use 'pass-if-equal'.
* test-suite/tests/ports.test ("encoding failure leads to exception",
"%default-port-encoding is honored", "peek-char [latin-1]", "peek-char
[utf-8]", "peek-char [utf-16]"): Remove tests.
("%default-port-encoding is ignored", "peek-char"): Add tests.
("suitable encoding [latin-1]", "suitable encoding [latin-3]",
"wrong encoding, error", "wrong encoding, substitute",
"wrong encoding, escape"): Use 'set-port-encoding!' to set the
string port encoding.
("%default-port-encoding, wrong encoding"): Rewrite to use
a file port instead of a string port.
* libguile/debug.c (scm_local_eval):
libguile/ports.c (scm_current_warning_port):
libguile/strports.c (scm_eval_string_in_module): Perform
lazy-initialization while holding a mutex. Use SCM_UNDEFINED as the
uninitialized value. Use 'scm_c_*_variable'.
* doc/ref/api-modules.texi (Accessing Modules from C): Fix
'my_eval_string' example to be thread-safe.
Partly addresses <http://bugs.gnu.org/11197>.
* libguile/strports.c (scm_mkstrport): Call `scm_port_non_buffer', set
Z's cell type and stream, and release `scm_i_port_table_mutex' early.
Reacquire `scm_i_port_table_mutex' once BUF, C_BUF, and STR_LEN are
initialized.
* test-suite/tests/ports.test ("string ports")["encoding failure leads
to exception"]: New test.
* libguile/strports.c (scm_mkstrport): Remove initialization of
`pt->ilseq_handler'.
* module/ice-9/pretty-print.scm (truncated-print)[ellipsis]: Set
%DEFAULT-PORT-CONVERSION-STRATEGY to 'error.
* test-suite/tests/ports.test ("string
ports")["%default-port-conversion-strategy is honored"]: New test.
["wrong encoding"]: Rename to...
["wrong encoding, error"]: ... this. Explicitly set
%DEFAULT-PORT-CONVERSION-STRATEGY to 'error. Return #f when no
exception is raised.
* libguile/print.c (PORT_CONVERSION_HANDLER): New macro.
(print_extended_symbol, iprin1, write_character, scm_write_char): Use
it instead of `scm_i_get_conversion_strategy'.
* libguile/strports.c (scm_mkstrport): Assign `pt->ilseq_handler'
directly instead of via `scm_i_set_conversion_strategy_x'.
* libguile/strports.c (st_fill_input): Rename from stfill_buffer, and
remove an unneeded scm_return_first_int.
(st_resize_port): Minor variable renaming.
(st_write): Keep read_pos updated to be the same as write_pos. If we
need to resize, do so only once.
(st_seek): No more need to flush.
(st_truncate): Update read_pos here too.
(scm_mkstrport): No need to flush here.
(scm_strport_to_string): Just call scm_from_stringn; rely on it to
detect the latin1 case.
(scm_make_stptob): No more flush function.
* libguile/ports.c (scm_c_make_port_with_encoding, scm_c_make_port): New
functions, to replace scm_new_port_table_entry. Use a weak set
instead of a weak table.
(scm_i_remove_port):
(scm_c_port_for_each, scm_port_for_each): Adapt to use weak set.
(scm_i_void_port): Use scm_c_make_port.
(scm_init_ports): Make a weak set.
* libguile/fports.c:
* libguile/ioext.c:
* libguile/r6rs-ports.c:
* libguile/strports.c:
* libguile/vports.c: Adapt to use the new scm_c_make_port API.
* libguile/strports.c (INITIAL_BUFFER_SIZE): New macro.
(scm_mkstrport): If STR is false, allocate a bytevector on the
caller's behalf.
(scm_object_to_string, scm_call_with_output_string,
scm_open_output_string): Pass SCM_BOOL_F as the STR argument of
`scm_mkstrport'.
* libguile/backtrace.c (scm_display_application,
display_backtrace_body): Likewise.
* libguile/gdbint.c (scm_init_gdbint): Likewise.
* libguile/print.c (scm_simple_format): Likewise.
* libguile/strports.c (st_resize_port): Adjust to deal with OLD_STREAM
and NEW_STREAM as bytevectors.
(scm_mkstrport): Store a bytevector in the port's stream rather than a
string.
* libguile/strports.c (scm_i_mkstrport): Remove.
(scm_mkstrport): Don't change the port's encoding to UTF-8; convert
STR to the default port encoding.
(scm_strport_to_string): Fix documentation & indentation.
* libguile/strports.h (scm_i_mkstrport): Remove.
* test-suite/lib.scm (exception:encoding-error): New variable.
(format-test-name): Set `%default-port-encoding' to "UTF-8".
* test-suite/tests/ports.test ("string ports")["%default-port-encoding
is honored", "suitable encoding [latin-1]", "suitable encoding
[latin-3]", "wrong encoding"]: New tests.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with UTF-16 string port", "put-bytevector
with wrong-encoding string port"]: New tests.
* test-suite/tests/reader.test (read-string): Set
`%default-port-encoding' to `#f'.
("reading")["unprintable symbol"]: Use a string that doesn't contain
zeros.
* doc/ref/api-io.texi (String Ports): Document encoding issues with
`call-with-output-string' and `with-output-to-string'.
String ports should be able to accept any string characters, regardless
of the current locale. Setting it to UTF-8 achieves that.
* libguile/strports.c (scm_i_mkstrport): set port's locale to UTF-8
(scm_mkstrport): convert input string to UTF-8
String ports, being 8-bit, store strings using the character encoding
of the port. This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.
* libguile/strports.c: extra comments
(scm_strport_to_string): use port's encoding when converting port data
to a string
* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
and made internal. All callers changed.
(scm_from_stringn): renamed to scm_i_from_stringn.
* libguile/strings.h: declaration for scm_i_from_stringn
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/gen-scmconfig.c (main): Produce a definition for
`scm_t_off'.
* libguile/ports.h (scm_t_port)[read_buf_size, saved_read_buf_size,
write_buf_size, seek, truncate]: Use `scm_t_off' instead of `off_t' so
that the layout and size of the structure does not depend on the
application's `_FILE_OFFSET_BITS' value. Reported by Bill
Schottstaedt, see
http://lists.gnu.org/archive/html/bug-guile/2009-06/msg00018.html.
(scm_set_port_seek, scm_set_port_truncate): Update.
* libguile/ports.c (scm_set_port_seek, scm_set_port_truncate): Use
`scm_t_off' and `off_t_or_off64_t'.
* libguile/fports.c (fport_seek, fport_truncate): Use `scm_t_off'
instead of `off_t'.
* libguile/r6rs-ports.c (bip_seek, cbp_seek, bop_seek): Use `scm_t_off'
instead of `off_t'.
* libguile/rw.c (scm_write_string_partial): Likewise.
* libguile/strports.c (st_resize_port, st_seek, st_truncate): Likewise.
* doc/ref/api-io.texi (Port Implementation): Update prototype of
`scm_set_port_seek ()' and `scm_set_port_truncate ()'.
* NEWS: Update.
(SCM_VALIDATE_STRING_COPY): Deprecated. Replaced all uses with
SCM_VALIDATE_STRING plus SCM_I_STRING_CHARS or
scm_to_locale_string, etc.
(SCM_VALIDATE_SUBSTRING_SPEC_COPY): Deprecated. Replaced as
above, plus scm_i_get_substring_spec.
* regex-posix.c, read.c, random.c, ramap.c, print.c, numbers.c,
hash.c, gc.c, gc-card.c, convert.i.c, backtrace.c, strop.c,
strorder.c, strports.c, struct.c, symbols.c, unif.c, ports.c: Use
SCM_I_STRING_CHARS, SCM_I_STRING_UCHARS, and SCM_I_STRING_LENGTH
instead of SCM_STRING_CHARS, SCM_STRING_UCHARS, and
SCM_STRING_LENGTH, respectively. Also, replaced scm_return_first
with more explicit scm_remember_upto_here_1, etc, or introduced
them in the first place.
SCM_INUM): Deprecated by reenaming them to SCM_I_INUMP, SCM_I_NINUMP
and SCM_I_INUM, respectively and adding deprecated versions to
deprecated.h and deprecated.c. Changed all uses to either use the
SCM_I_ variants or scm_is_*, scm_to_*, or scm_from_*, as appropriate.