* libguile/ports-internal.h (struct scm_port_internal): Add new members
'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
(SCM_UNICODE_BOM): New macro.
(scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.
* libguile/ports.c (scm_new_port_table_entry): Initialize
'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
(get_iconv_codepoint): Pass new 'mode' parameter to
'scm_i_port_iconv_descriptors'.
(get_codepoint): After reading a codepoint at stream start, record
that we're no longer at stream start, and consume a BOM where
appropriate.
(scm_seek): Set the stream start flags according to the new position.
(looking_at_bytes): New static function.
(scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
scm_utf32le_bom): New static const arrays.
(decide_utf16_encoding, decide_utf32_encoding): New static functions.
(scm_i_port_iconv_descriptors): Add new 'mode' parameter. If the
specified encoding is UTF-16 or UTF-32, make that precise by deciding
what byte order to use, and construct iconv descriptors based on the
precise encoding.
(scm_i_set_port_encoding_x): Record that we are now at stream start.
Do not open the new iconv descriptors immediately; let them be
initialized lazily.
* libguile/print.c (display_string_using_iconv): Record that we're no
longer at stream start. Write a BOM if appropriate.
* doc/ref/api-io.texi (BOM Handling): New node.
* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
Adapt test to cope with the fact that 'set-port-encoding!' does not
immediately open the iconv descriptors.
(bv-read-test): New procedure.
("unicode byte-order marks (BOMs)"): New test prefix.
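The byte-order decision described above can be illustrated with a minimal sketch. The names 'starts_with_bytes' and 'pick_utf16_encoding' are hypothetical stand-ins for 'looking_at_bytes' and 'decide_utf16_encoding', not the libguile code, and the fallback to big-endian when no BOM is present is an assumption of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* BOM byte sequences, as defined by the Unicode Standard.  */
static const unsigned char utf16be_bom[2] = { 0xFE, 0xFF };
static const unsigned char utf16le_bom[2] = { 0xFF, 0xFE };

/* Return 1 if BUF (LEN bytes) starts with the PREFIX_LEN bytes at PREFIX.  */
static int
starts_with_bytes (const unsigned char *buf, size_t len,
                   const unsigned char *prefix, size_t prefix_len)
{
  assert (buf != NULL && prefix != NULL);
  return len >= prefix_len && memcmp (buf, prefix, prefix_len) == 0;
}

/* Hypothetical read-mode decision: return a precise encoding name and
   store in *BOM_LEN how many BOM bytes the caller should skip.
   Assumption: with no BOM, fall back to big-endian.  */
static const char *
pick_utf16_encoding (const unsigned char *buf, size_t len, size_t *bom_len)
{
  if (starts_with_bytes (buf, len, utf16le_bom, sizeof utf16le_bom))
    {
      *bom_len = sizeof utf16le_bom;
      return "UTF-16LE";
    }
  if (starts_with_bytes (buf, len, utf16be_bom, sizeof utf16be_bom))
    {
      *bom_len = sizeof utf16be_bom;
      return "UTF-16BE";
    }
  *bom_len = 0;
  return "UTF-16BE";
}
```

The precise name returned here is what would then be handed to 'iconv_open', so that iconv itself never has to guess the byte order.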
* libguile/ports.c (scm_i_set_port_encoding_x): Always copy the
user-provided port encoding string, so that its case will be preserved
and returned exactly by subsequent calls to 'port-encoding'.
* libguile/ports.c (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof):
Rename to 'scm_slow_get_byte_or_eof' and 'scm_slow_peek_byte_or_eof',
respectively.
* libguile/ports.h (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof):
Rename to 'scm_slow_get_byte_or_eof' and 'scm_slow_peek_byte_or_eof',
respectively, and mark them as SCM_API.
* libguile/inline.h (scm_get_byte_or_eof, scm_peek_byte_or_eof): Adjust
to use the new names.
* libguile/ports.c (get_iconv_codepoint): Rewrite to fix a bug and
improve efficiency and clarity. Previously, it incorrectly assumed
that iconv would never consume input without producing output, which
led to a buffer overrun and subsequent assertion failure. This
happens when a byte-order mark is consumed by iconv at the beginning
of the stream when using the UTF-16 or UTF-32 encodings.
* test-suite/tests/ports.test ("unicode byte-order marks (BOMs)"):
Add tests.
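The failure mode fixed here can be illustrated with a minimal sketch that feeds iconv one more byte at a time. The function 'decode_first_codepoint' is hypothetical and is not the rewritten 'get_iconv_codepoint'; it assumes an iconv implementation that silently strips a leading BOM for "UTF-16" and that supports "UTF-32BE" as an output encoding.

```c
#include <assert.h>
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: decode the first codepoint of BYTES, which is
   LEN bytes long and encoded as ENCODING.  Returns 0 and stores the
   codepoint in *CP on success, -1 otherwise.  */
static int
decode_first_codepoint (const char *encoding, const unsigned char *bytes,
                        size_t len, uint32_t *cp)
{
  unsigned char pending[8];     /* bytes handed to iconv but not yet used */
  size_t pending_len = 0, next = 0;
  int result = -1;
  iconv_t cd;

  assert (encoding != NULL && bytes != NULL && cp != NULL);
  cd = iconv_open ("UTF-32BE", encoding);
  if (cd == (iconv_t) -1)
    return -1;

  while (next < len && pending_len < sizeof pending)
    {
      unsigned char out_buf[4];
      char *in, *out = (char *) out_buf;
      size_t in_left, out_left = sizeof out_buf, rc;

      pending[pending_len++] = bytes[next++];
      in = (char *) pending;
      in_left = pending_len;
      rc = iconv (cd, &in, &in_left, &out, &out_left);

      if (out_left == 0)
        {
          /* One full UTF-32BE code unit was produced.  */
          *cp = ((uint32_t) out_buf[0] << 24) | ((uint32_t) out_buf[1] << 16)
                | ((uint32_t) out_buf[2] << 8) | (uint32_t) out_buf[3];
          result = 0;
          break;
        }
      if (rc == (size_t) -1 && errno != EINVAL)
        break;                  /* a real error such as EILSEQ */

      /* Either the input is still incomplete (EINVAL), or iconv consumed
         input without producing output, as it does for a BOM.  Keep only
         the bytes iconv has not consumed.  */
      memmove (pending, in, in_left);
      pending_len = in_left;
    }

  iconv_close (cd);
  return result;
}
```

The key point is that the number of pending bytes is recomputed from what iconv reports as unconsumed after every call, rather than assuming that consumed input always comes with output.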
Suggested by Andy Wingo.
* libguile/inline.h (scm_get_byte_or_eof, scm_peek_byte_or_eof): Keep
only the fast path here, with fallback to 'scm_i_get_byte_or_eof' and
'scm_i_peek_byte_or_eof'.
* libguile/ports.c (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof):
New internal functions.
* libguile/ports.h (scm_i_get_byte_or_eof, scm_i_peek_byte_or_eof): Add
prototypes.
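The split between an inlined fast path and an out-of-line fallback can be sketched as follows. The 'mini_port' structure and both functions are hypothetical simplifications, not the libguile definitions; the 4-byte refill size and the 'slow_calls' counter exist only for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>              /* EOF */

/* Hypothetical, simplified port: a read buffer plus a backing store.  */
struct mini_port
{
  const unsigned char *read_pos;  /* next unread buffered byte */
  const unsigned char *read_end;  /* one past the last buffered byte */
  const unsigned char *backing;   /* input not yet buffered */
  size_t backing_len;
  unsigned slow_calls;            /* how often the slow path ran */
};

/* Slow path, kept out of line: refill the buffer, then return a byte.  */
static int
mini_slow_get_byte (struct mini_port *p)
{
  size_t chunk;

  assert (p != NULL);
  p->slow_calls++;
  if (p->backing_len == 0)
    return EOF;

  /* "Refill" by exposing up to 4 more bytes of the backing store.  */
  chunk = p->backing_len < 4 ? p->backing_len : 4;
  p->read_pos = p->backing;
  p->read_end = p->backing + chunk;
  p->backing += chunk;
  p->backing_len -= chunk;
  return *p->read_pos++;
}

/* Fast path, small enough to be inlined at every call site.  */
static inline int
mini_get_byte (struct mini_port *p)
{
  if (p->read_pos != p->read_end)
    return *p->read_pos++;
  return mini_slow_get_byte (p);
}
```

With six bytes of input, only three of the seven calls (two refills and the final EOF) leave the inlined path.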
* libguile/ports.c (scm_i_fill_input): New static function, containing
the code that was previously in 'scm_fill_input'.
(scm_fill_input): Simply call 'scm_i_fill_input'.
(scm_c_read): Use 'scm_i_fill_input'.
* libguile/ports-internal.h (struct scm_port_internal): Add 'alist'
member.
* libguile/ports.c (scm_i_port_alist, scm_i_set_port_alist_x): New
internal functions.
(scm_i_port_weak_hash): Update comment: the hash table is no longer
used to store the port's alist.
(scm_new_port_table_entry): Initialize 'alist'. Store SCM_BOOL_F in
the port weak hash, not SCM_EOL.
* libguile/ports.h (scm_i_port_alist, scm_i_set_port_alist_x): Add
prototypes.
* libguile/read.c (set_port_read_option, init_read_options): Access the
port's alist via 'scm_i_port_alist' and 'scm_i_set_port_alist_x'.
Based on 6c98257f2e by Andy Wingo.
* libguile/ports-internal.h (struct scm_port_internal): Add a flag
for the port encoding mode: UTF8 or iconv. The iconv descriptors
are now in a separate structure so that we can avoid attaching
finalizers to the ports themselves in the future.
(enum scm_port_encoding_mode): New enum.
(struct scm_iconv_descriptors): New struct.
(scm_i_port_iconv_descriptors): Add prototype.
* libguile/ports.c (finalize_port): Don't close iconv descriptors here.
(scm_new_port_table_entry): Adapt to the iconv descriptors being
moved. Initialize 'encoding_mode'.
(scm_i_remove_port): Adapt to call 'close_iconv_descriptors'.
(close_iconv_descriptors): New static function.
(get_iconv_codepoint): Use 'scm_i_port_iconv_descriptors'.
(get_codepoint): Check the port 'encoding_mode'.
(finalize_iconv_descriptors, open_iconv_descriptors,
close_iconv_descriptors, scm_i_port_iconv_descriptors): New static
functions.
(scm_i_set_port_encoding_x): Adapt to iconv descriptors being moved
to separate structure, to set the 'encoding_mode' flag, and to use
'open_iconv_descriptors' and 'close_iconv_descriptors'.
* libguile/print.c (display_string_using_iconv): Use
'scm_i_port_iconv_descriptors'.
(display_string): Use 'encoding_mode' flag.
* libguile/ports-internal.h: New file.
* libguile/Makefile.am (noinst_HEADERS): Add ports-internal.h.
* libguile/ports.h (scm_t_port): Add a comment mentioning that the
'input_cd' and 'output_cd' fields of the public structure are no
longer what they seem to be.
* libguile/ports.c: Include ports-internal.h.
(finalize_port, scm_i_remove_port, get_iconv_codepoint, get_codepoint,
scm_i_set_port_encoding_x): Access 'input_cd' and 'output_cd' via the
new internal port structure.
(scm_new_port_table_entry): Allocate and initialize the internal port
structure.
* libguile/print.c: Include ports-internal.h.
(display_string_using_iconv, display_string): Access 'input_cd' and
'output_cd' via 'internal' pointer.
* libguile/debug.c (scm_local_eval):
* libguile/ports.c (scm_current_warning_port):
* libguile/strports.c (scm_eval_string_in_module): Perform
lazy-initialization while holding a mutex. Use SCM_UNDEFINED as the
uninitialized value. Use 'scm_c_*_variable'.
* doc/ref/api-modules.texi (Accessing Modules from C): Fix
'my_eval_string' example to be thread-safe.
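The lazy-initialization pattern named above can be sketched in plain C with POSIX threads. This is a generic illustration, not the libguile code: NULL plays the role that SCM_UNDEFINED plays there, and 'expensive_lookup' is a hypothetical stand-in for a variable lookup.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t init_lock = PTHREAD_MUTEX_INITIALIZER;
static const int *cached_value = NULL;  /* NULL means "not yet initialized" */
static int lookup_count = 0;            /* counts lookups, for illustration */

/* Hypothetical expensive lookup that should run at most once.  */
static const int *
expensive_lookup (void)
{
  static const int value = 42;
  lookup_count++;
  return &value;
}

/* Both the check and the initialization happen while holding the
   mutex, so two threads can never both see the uninitialized value
   and both perform the lookup.  */
static const int *
get_cached_value (void)
{
  const int *result;

  pthread_mutex_lock (&init_lock);
  if (cached_value == NULL)
    cached_value = expensive_lookup ();
  result = cached_value;
  pthread_mutex_unlock (&init_lock);

  assert (result != NULL);
  return result;
}
```

Using a dedicated "uninitialized" sentinel matters when the initialized value could legitimately be a false-like value; the check then cannot mistake a real result for a missing one.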
* libguile/ports.h:
* libguile/ports.c (scm_consume_byte_order_mark): New procedure.
* libguile/fports.c (scm_open_file): Call consume-byte-order-mark if we
are opening a file in "r" mode.
* libguile/read.c (scm_i_scan_for_encoding): Don't do anything about
byte-order marks.
* libguile/load.c (scm_primitive_load): Add a note about the duplicate
encoding scan.
* test-suite/tests/filesys.test: Add tests for UTF-8, UTF-16BE, and
UTF-16LE BOM handling.
* libguile/ports.c (scm_i_port_weak_hash): Document that the values in
this hash table will now be alists. Previously the value slots were
unused.
(scm_new_port_table_entry): Change the initial value of the entry in
scm_i_port_weak_hash from SCM_BOOL_F to SCM_EOL.
Fixes <http://bugs.gnu.org/12033>.
Reported by nalaginrut <nalaginrut@gmail.com>.
* libguile/print.c (scm_i_display_substring): New function.
* libguile/print.h (scm_i_display_substring): New internal declaration.
* libguile/ports.c (scm_lfwrite_substr): Use it instead of `scm_display'
+ `scm_c_substring'.
Fixes <http://bugs.gnu.org/11468>.
* libguile/ports.c (scm_conversion_strategy): Remove.
(default_conversion_strategy_var, sym_error, sym_substitute,
sym_escape): New variables.
(scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x):
Remove.
(scm_i_default_port_conversion_handler,
scm_i_set_default_port_conversion_handler): New functions.
(scm_port_conversion_strategy): Use
`scm_i_default_port_conversion_handler' when PORT is #f.
(scm_set_port_conversion_strategy_x): Use SYM_ERROR, SYM_SUBSTITUTE,
and SYM_ESCAPE. Use `scm_i_set_default_port_conversion_handler' when
PORT is #f.
(scm_init_ports): Initialize DEFAULT_CONVERSION_STRATEGY_VAR.
* libguile/ports.h: Update declarations accordingly.
* libguile/foreign.c: Change
`scm_i_get_conversion_strategy (SCM_BOOL_F)' to
`scm_i_default_port_conversion_handler ()'.
* libguile/strings.c: Likewise.
* test-suite/tests/ports.test ("%default-port-conversion-strategy"): New
test prefix.
* test-suite/tests/foreign.test ("pointer<->string")["%default-port-conversion-strategy
is error", "%default-port-conversion-strategy is soft"]: New tests.
* test-suite/test-suite/lib.scm (exception:encoding-error): Allow the
regexp to match `scm_to_stringn' error messages.
* doc/ref/api-io.texi (Ports): Document `%default-port-conversion-strategy'.
* libguile/fports.c (scm_setvbuf): Use `scm_take_from_input_buffers'
directly instead of `scm_drain_input'; use `scm_unget_byte' instead of
`scm_unread_string' to put the drained input back to PORT. This
leaves PORT's line/column numbers unchanged, whereas they'd previously
be decreased by the `scm_unread_string' call.
* libguile/ports.c (scm_take_from_input_buffers): Update description and
variable names to refer to "bytes", not "chars".
* test-suite/tests/ports.test ("setvbuf"): New test prefix.
* libguile/ports.c (scm_init_ports): Export the port fluids to Scheme,
temporarily.
* module/ice-9/boot-9.scm (fluid->parameter): Turn `current-input-port'
et al into srfi-39 parameters, backed by the exported fluids, then
remove the fluids from the guile module.
(%cond-expand-features): Add srfi-39.
* module/srfi/srfi-39.scm: Re-export features from boot-9.
* test-suite/tests/parameters.test: Add tests.
* libguile/ports.h:
* libguile/ports.c (scm_current_warning_port)
(scm_set_current_warning_port): New functions, wrapping the Scheme
parameter.
* module/ice-9/boot-9.scm (current-warning-port): New parameter,
defining a port for warnings.
* libguile/numbers.c (scm_logand): Fix a type error (comparing a SCM
against an int, when we really wanted to compare the unpacked
fixnum).
* libguile/ports.c (scm_i_set_conversion_strategy_x): Check
scm_conversion_strategy_init, not scm_conversion_strategy.
* libguile/read.c (recsexpr): Fix loops to avoid strange test of SCM
values.
Thanks to Mark H. Weaver for pointing this out.
* libguile/ports.c (CONSUME_PEEKED_BYTE): New macro.
(get_utf8_codepoint): New variable `pt'. Use
`scm_peek_byte_or_eof'/`CONSUME_PEEKED_BYTE' pairs instead of
`scm_get_byte_or_eof'.
* test-suite/tests/ports.test ("string ports")[#xc2 #x41 #x42, #xe0 #xa0
#x41 #x42, #xf0 #x88 #x88 #x88]: Fix to conform to Unicode 6.0.0.
[#xe0 #x88 #x88]: Remove test.
[#xf0 #x80 #x80 #x41]: New test.
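The peek-then-consume idea can be sketched over a plain byte buffer. The function 'decode_utf8_codepoint' is hypothetical and is not 'get_utf8_codepoint'; it follows the well-formed byte ranges of the Unicode Standard and leaves a byte unconsumed whenever it cannot continue the current sequence.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: read one codepoint from BUF starting at *POS.
   Returns 0 and stores the codepoint in *CP on success.  Returns -1 on
   an ill-formed sequence, having consumed only the bytes that were
   valid so far, so the offending byte is decoded again on its own.  */
static int
decode_utf8_codepoint (const unsigned char *buf, size_t len,
                       size_t *pos, uint32_t *cp)
{
  unsigned char lead, lo = 0x80, hi = 0xBF;
  uint32_t value;
  int extra, i;

  assert (buf != NULL && pos != NULL && cp != NULL && *pos < len);
  lead = buf[(*pos)++];

  if (lead < 0x80)
    {
      *cp = lead;
      return 0;
    }
  else if (lead >= 0xC2 && lead <= 0xDF)
    {
      extra = 1;
      value = lead & 0x1F;
    }
  else if (lead >= 0xE0 && lead <= 0xEF)
    {
      extra = 2;
      value = lead & 0x0F;
      if (lead == 0xE0)
        lo = 0xA0;              /* reject overlong forms */
      else if (lead == 0xED)
        hi = 0x9F;              /* reject surrogates */
    }
  else if (lead >= 0xF0 && lead <= 0xF4)
    {
      extra = 3;
      value = lead & 0x07;
      if (lead == 0xF0)
        lo = 0x90;              /* reject overlong forms */
      else if (lead == 0xF4)
        hi = 0x8F;              /* reject values above U+10FFFF */
    }
  else
    return -1;                  /* not a valid lead byte */

  for (i = 0; i < extra; i++)
    {
      unsigned char next;

      if (*pos >= len)
        return -1;              /* truncated sequence */
      next = buf[*pos];         /* peek */
      if (next < lo || next > hi)
        return -1;              /* leave NEXT unconsumed */
      (*pos)++;                 /* consume the peeked byte */
      value = (value << 6) | (next & 0x3F);
      lo = 0x80;                /* later bytes use the full range */
      hi = 0xBF;
    }

  *cp = value;
  return 0;
}
```

For the input #xc2 #x41 #x42, the first call fails after consuming only #xc2, and the next two calls return #x41 and #x42.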
* libguile/ports.c (update_port_lf): Handle EOF.
(get_utf8_codepoint, get_iconv_codepoint): New functions.
(get_codepoint): Use them.
(scm_i_set_port_encoding_x): Don't open conversion descriptors when
ENCODING is "UTF-8".
* libguile/print.c (display_string_as_utf8, display_string_using_iconv):
New functions.
(display_string): Use them.
* test-suite/tests/ports.test ("string ports")[#xc2 #x41 #x42]: Add a
note that this is not the wrong behavior per Unicode 6.0.0.
* libguile/arrays.c (scm_i_read_array):
* libguile/backtrace.c (display_backtrace_body):
* libguile/filesys.c (scm_readdir):
* libguile/i18n.c (chr_to_case):
* libguile/ports.c (register_finalizer_for_port):
* libguile/posix.c (scm_nice):
* libguile/stacks.c (scm_make_stack): Clean up a number of
set-but-unused vars. Thanks to Douglas Mencken for the report.
* libguile/numbers.c (scm_log, scm_exp): Fix a few #if cases that should
be #ifdef.
* libguile/ports.c (scm_i_remove_port): Fix a case in which ports
explicitly closed via close-port would leak their iconv_t data.
(scm_set_port_encoding_x): scm_i_set_port_encoding_x strdups its
argument, so we need to free the locale encoding of the incoming str.
* libguile/ports.h (scm_i_remove_port): Remove declaration, as it was
SCM_INTERNAL.
* libguile/ports.c (scm_add_to_port_table): Issue a deprecation
warning if this function is called. Remove needless SCM_API
declaration; it was already declared as such in ports.h. Safely
access the port table.
(scm_i_remove_port): Remove bogus comment about lack of need for
threadsafety. Take the port table mutex.
(scm_close_port): No need to take port table mutex around calling
scm_i_remove_port.
* libguile/ports.c (scm_i_set_default_port_encoding,
scm_i_default_port_encoding): New functions. Replace
`scm_i_set_port_encoding_x' and `scm_i_get_port_encoding' with
PORT == SCM_BOOL_F.
(scm_i_set_port_encoding_x): Assume PORT is a port.
(scm_i_get_port_encoding): Remove.
(scm_port_encoding): Adjust accordingly.
(scm_new_port_table_entry): Use `scm_i_default_port_encoding'.
* libguile/ports.h (scm_i_get_port_encoding): Remove declaration.
(scm_i_default_port_encoding, scm_i_set_default_port_encoding): New
declarations.
* libguile/posix.c (setlocale): Use `scm_i_set_default_port_encoding'.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/ports.c (get_codepoint): Reset `pt->input_cd' upon failure.
If `pt->ilseq_handler' is `SCM_ICONVEH_QUESTION_MARK', then return a
question mark.
[failure]: Use `scm_encoding_error' when raising an error.
* test-suite/lib.scm (exception:encoding-error): Adjust regexp.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error", "read-char, wrong encoding, escape", "read-char,
wrong encoding, substitute"]: New tests.
* libguile/ports.c (scm_i_set_port_encoding_x): Always initialize
PT->encoding to something non-NULL. This fixes callers of
`scm_encoding_error' such that they always pass a non-NULL encoding
name. Reported by Matei Conovici.
Thanks to Bruno Haible for his suggestions. See
<http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>,
for details.
* libguile/ports.c (register_finalizer_for_port): Always register a
finalizer for PORT.
(finalize_port): Close ENTRY->input_cd and ENTRY->output_cd.
(scm_new_port_table_entry): Initialize the `input_cd' and `output_cd'
fields.
(utf8_to_codepoint): New function.
(get_codepoint): Rewrite to use `iconv' instead of libunistring.
(scm_i_set_port_encoding_x): Initialize the `input_cd' and `output_cd'
fields.
(update_port_lf): Move upward. Use `switch' instead of `if's.
* libguile/ports.h (scm_t_port)[input_cd, output_cd]: New fields.
* libguile/print.c (codepoint_to_utf8, display_string): New functions.
(display_character): Use `display_string'.
(write_combining_character): Likewise.
(iprin1): Use `display_string' instead of `scm_lfwrite_str', and
`display_character' instead of `scm_putc'.
(write_character): Likewise.
(write_character_escaped): New function.
* test-suite/tests/encoding-escapes.test ("display output
escapes")["Rashomon"]: Use lower-case escapes.
["fake escape"]: New test.
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/ports.c (scm_drain_input): Slight optimization.
* libguile/fports.c (scm_setvbuf): If there is buffered output, flush
it. If there is input, drain it, and then unread it after updating
the buffers. Much more sensible than dropping it silently...