Fixes <http://bugs.gnu.org/12033>.
Reported by nalaginrut <nalaginrut@gmail.com>.
* libguile/print.c (scm_i_display_substring): New function.
* libguile/print.h (scm_i_display_substring): New internal declaration.
* libguile/ports.c (scm_lfwrite_substr): Use it instead of `scm_display'
+ `scm_c_substring'.
Fixes <http://bugs.gnu.org/11468>.
* libguile/ports.c (scm_conversion_strategy): Remove.
(default_conversion_strategy_var, sym_error, sym_substitute,
sym_escape): New variables.
(scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x):
Remove.
(scm_i_default_port_conversion_handler,
scm_i_set_default_port_conversion_handler): New functions.
(scm_port_conversion_strategy): Use
`scm_i_default_port_conversion_handler' when PORT is #f.
(scm_set_port_conversion_strategy_x): Use SYM_ERROR, SYM_SUBSTITUTE,
and SYM_ESCAPE. Use `scm_i_set_default_port_conversion_handler' when
PORT is #f.
(scm_init_ports): Initialize DEFAULT_CONVERSION_STRATEGY_VAR.
* libguile/ports.h: Update declarations accordingly.
* libguile/foreign.c: Change
`scm_i_get_conversion_strategy (SCM_BOOL_F)' to
`scm_i_default_port_conversion_handler ()'.
* libguile/strings.c: Likewise.
* test-suite/tests/ports.test ("%default-port-conversion-strategy"): New
test prefix.
* test-suite/tests/foreign.test ("pointer<->string")["%default-port-conversion-strategy
is error", "%default-port-conversion-strategy is soft"]: New tests.
* test-suite/test-suite/lib.scm (exception:encoding-error): Allow the
regexp to match `scm_to_stringn' error messages.
* doc/ref/api-io.texi (Ports): Document `%default-port-conversion-strategy'.
* libguile/fports.c (scm_setvbuf): Use `scm_take_from_input_buffers'
directly instead of `scm_drain_input'; use `scm_unget_byte' instead of
`scm_unread_string' to put the drained input back to PORT. This
leaves PORT's line/column numbers unchanged, whereas they'd previously
be decreased by the `scm_unread_string' call.
* libguile/ports.c (scm_take_from_input_buffers): Update description and
variable names to refer to "bytes", not "chars".
* test-suite/tests/ports.test ("setvbuf"): New test prefix.
* libguile/ports.c (scm_init_ports): Export the port fluids to Scheme,
temporarily.
* module/ice-9/boot-9.scm (fluid->parameter): Turn `current-input-port'
et al into srfi-39 parameters, backed by the exported fluids, then
remove the fluids from the guile module.
(%cond-expand-features): Add srfi-39.
* module/srfi/srfi-39.scm: Re-export features from boot-9.
* test-suite/tests/parameters.test: Add tests.
* libguile/ports.h:
* libguile/ports.c (scm_current_warning_port)
(scm_set_current_warning_port): New functions, wrapping the Scheme
parameter.
* module/ice-9/boot-9.scm (current-warning-port): New parameter,
defining a port for warnings.
* libguile/numbers.c (scm_logand): Fix a type error (comparing a SCM
against an int, when we really wanted to compare the unpacked
fixnum).
* libguile/ports.c (scm_i_set_conversion_strategy_x): Check
scm_conversion_strategy_init, not scm_conversion_strategy.
* libguile/read.c (recsexpr): Fix loops to avoid strange test of SCM
values.
Thanks to Mark H. Weaver for pointing this out.
* libguile/ports.c (CONSUME_PEEKED_BYTE): New macro.
(get_utf8_codepoint): New variable `pt'. Use
`scm_peek_byte_or_eof'/`CONSUME_PEEKED_BYTE' pairs instead of
`scm_get_byte_or_eof'.
* test-suite/tests/ports.test ("string ports")[#xc2 #x41 #x42, #xe0 #xa0
#x41 #x42, #xf0 #x88 #x88 #x88]: Fix to conform to Unicode 6.0.0.
[#xe0 #x88 #x88]: Remove test.
[#xf0 #x80 #x80 #x41]: New test.
* libguile/ports.c (update_port_lf): Handle EOF.
(get_utf8_codepoint, get_iconv_codepoint): New functions.
(get_codepoint): Use them.
(scm_i_set_port_encoding_x): Don't open conversion descriptors when
ENCODING is "UTF-8".
* libguile/print.c (display_string_as_utf8, display_string_using_iconv):
New functions.
(display_string): Use them.
* test-suite/tests/ports.test ("string ports")[#xc2 #x41 #x42]: Add a
note that this is not the wrong behavior per Unicode 6.0.0.
* libguile/arrays.c (scm_i_read_array):
* libguile/backtrace.c (display_backtrace_body):
* libguile/filesys.c (scm_readdir)
* libguile/i18n.c (chr_to_case):
* libguile/ports.c (register_finalizer_for_port):
* libguile/posix.c (scm_nice):
* libguile/stacks.c (scm_make_stack): Clean up a number of
set-but-unused vars. Thanks to Douglas Mencken for the report.
* libguile/numbers.c (scm_log, scm_exp): Fix a few #if cases that should
be #ifdef.
* libguile/ports.c (scm_i_remove_port): Fix a case in which ports
explictly closed via close-port would leak their iconv_t data.
(scm_set_port_encoding_x): scm_i_set_port_encoding_x strdups its
argument, so we need to free the locale encoding of the incoming str.
* libguile/ports.h (scm_i_remove_port): Remove declaration, as it was
SCM_INTERNAL.
* libguile/ports.c (scm_add_to_port_table): Issue a deprecation
warning if this function is called. Remove needless SCM_API
declaration, it was already declared as such in ports.h. Safely
access the port table.
(scm_i_remove_port): Remove bogus comment about lack of need for
threadsafety. Take the port table mutex.
(scm_close_port): No need to take port table mutex around calling
scm_i_remove_port.
* libguile/ports.c (scm_i_set_default_port_encoding,
scm_i_default_port_encoding): New function. Replace
`scm_i_set_port_encoding_x' and `scm_i_get_port_encoding' with
PORT == SCM_BOOL_F.
(scm_i_set_port_encoding_x): Assume PORT is a port.
(scm_i_get_port_encoding): Remove.
(scm_port_encoding): Adjust accordingly.
(scm_new_port_table_entry): Use `scm_i_default_port_encoding'.
* libguile/ports.h (scm_i_get_port_encoding): Remove declarations.
(scm_i_default_port_encoding, scm_i_set_default_port_encoding): New
declarations.
* libguile/posix.c (setlocale): Use `scm_i_set_default_port_encoding'.
* libguile/ports.c (scm_read_char): Mention `decoding-error' in the
docstring.
(get_codepoint): Change to return an error code; add `codepoint'
output parameter. Don't raise an error from here.
(scm_getc): Raise an error with `scm_decoding_error' if
`get_codepoint' returns an error.
(scm_peek_char): Likewise. Update docstring.
* libguile/strings.c (scm_decoding_error_key): New variable.
(scm_decoding_error): New function.
(scm_from_stringn): Use `scm_decoding_error' instead of
`scm_encoding_error'.
* libguile/strings.h (scm_decoding_error): New declaration.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error"]: Change to expect `decoding-error'. Make sure PORT
points past the error.
["read-char, wrong encoding, escape"]: Likewise.
["peek-char, wrong encoding, error"]: New test.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with wrong-encoding string port"]: Change to
expect `decoding-error'.
("8.2.6 Input and output ports")["transcoded-port [error handling
mode = raise]"]: Likewise.
* test-suite/tests/rdelim.test ("read-line")["decoding error", "decoding
error, substitute"]: New tests.
* doc/ref/api-io.texi (Reading): Update documentation of `read-char' and
`peek-char'.
(Line/Delimited): Update documentation of `read-line'.
* libguile/ports.c (get_codepoint): Reset `pt->input_cd' upon failure.
If `pt->ilseq_handler' is `SCM_ICONVEH_QUESTION_MARK', then return a
question mark.
[failure]: Use `scm_encoding_error' when raising an error.
* test-suite/lib.scm (exception:encoding-error): Adjust regexp.
* test-suite/tests/ports.test ("string ports")["read-char, wrong
encoding, error", "read-char, wrong encoding, escape", "read-char,
wrong encoding, substitute"]: New tests.
* libguile/ports.c (scm_i_set_port_encoding_x): Always initialize
PT->encoding to something non-NULL. This fixes callers of
`scm_encoding_error' such that they always pass a non-NULL encoding
name. Reported by Matei Conovici.
Thanks to Bruno Haible for his suggestions. See
<http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>,
for details.
* libguile/ports.c (register_finalizer_for_port): Always register a
finalizer for PORT.
(finalize_port): Close ENTRY->input_cd and ENTRY->output_cd.
(scm_new_port_table_entry): Initialize the `input_cd' and `output_cd'
fields.
(utf8_to_codepoint): New function.
(get_codepoint): Rewrite to use `iconv' instead of libunistring.
(scm_i_set_port_encoding_x): Initialize the `input_cd' and `output_cd'
fields.
(update_port_lf): Move upward. Use `switch' instead of `if's.
* libguile/ports.h (scm_t_port)[input_cd, output_cd]: New fields.
* libguile/print.c (codepoint_to_utf8, display_string): New functions.
(display_character): Use `display_string'.
(write_combining_character): Likewise.
(iprin1): Use `display_string' instead of `scm_lfwrite_str', and
`display_character' instead of `scm_putc'.
(write_character): Likewise.
(write_character_escaped): New function.
* test-suite/tests/encoding-escapes.test ("display output
escapes")["Rashomon"]: Use lower-case escapes.
["fake escape"]: New test.
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/ports.c (scm_drain_input): Slight optimization.
* libguile/fports.c (scm_setvbuf): If there is buffered output, flush
it. If there is input, drain it, and then unread it after updating
the buffers. Much more sensible than dropping it silently...
* libguile/ports.c (scm_char_ready_p, scm_peek_char, scm_unread_char)
(scm_unread_string): Always validate the port, even in the case that
we get it the default current-input-port. Otherwise the following
causes a segfault:
(begin (close-port (current-input-port)) (peek-char))
This makes `peek-char' 40x faster on a port whose encoding is
faster on a UTF-8 port containing multi-byte codepoints.
The `xml->sxml' procedure is 4x faster on a 2.7 MiB XML file.
* libguile/ports.c (get_codepoint): New procedure, moved here from
`scm_getc', with the additional BUF and LEN parameters.
(scm_getc): Use it.
(scm_peek_char): Use it instead of the `scm_getc'/`scm_ungetc'
sequence.
* test-suite/tests/ports.test ("string ports")["peek-char [latin-1]",
"peek-char [utf-8]"]: New tests.
* benchmark-suite/Makefile.am (SCM_BENCHMARKS): Add
`benchmarks/ports.bm'.
* benchmark-suite/benchmarks/ports.bm: New file.
* libguile/ports.c (scm_getc): Provide `u32_conv_from_encoding' with the
RESULT_BUF stack-allocated buffer to avoid heap allocation.
(find_valid_encoding): Likewise.
(scm_ungetc): Ditto with `u32_conv_to_encoding'.
* libguile/ports.c (scm_port_encoding): Instead of returning "NONE" if
we don't know the encoding, return #f. Allows truncated-print to work
if you don't have a locale set.
* libguile/ports.h:
* libguile/ports.c (scm_ports_prehistory): Remove this function. Just
initialize global vars to NULL/0 instead of needing a prehistory
function.
The intent is to allow compilation with `-Wundef', which in turn should
make it easier to catch erroneous uses of nonexistent macros.
* libguile/__scm.h: Don't assume `BUILDING_LIBGUILE' is defined.
* libguile/conv-uinteger.i.c (SCM_TO_TYPE_PROTO): Remove unneeded CPP
conditional on `TYPE_MIN == 0'.
* libguile/fports.c: Check for the definition of `HAVE_CHSIZE' and
`HAVE_FTRUNCATE', not for their value.
* libguile/ports.c: Likewise.
* libguile/numbers.c (guile_ieee_init): Likewise with `HAVE_DINFINITY'
and `HAVE_DQNAN'.
* test-suite/standalone/test-conversion.c (ieee_init): Likewise.
* libguile/strings.c: Likewise with `SCM_STRING_LENGTH_HISTOGRAM'.
* libguile/strings.h: Likewise.
* libguile/tags.h: Likewise with `HAVE_INTTYPES_H' and `HAVE_STDINT_H'.
* libguile/threads.c: Likewise with `HAVE_PTHREAD_GET_STACKADDR_NP'.
* libguile/vm-engine.c (VM_NAME): Likewise with `VM_CHECK_IP'.
* libguile/gen-scmconfig.c (main): Use "#ifdef HAVE_", not "#if HAVE_".
* libguile/socket.c (scm_setsockopt): Likewise.
This requires separate small fixes.
Readline has internal logic to deal with multi-byte characters, so
it wants bytes, not characters.
scm_c_read gets called by the vm when readline is activated, and it was
truncating multi-byte characters because soft ports didn't have the
UCS-4 capability.
Soft ports need the capability to read UCS-4 characters. Since soft ports
may have a single byte buffer, full characters need to be stored into the
pushback buffer.
This broke the optimizations in scm_c_read for using an alternate buffer
for single-byte-buffered ports, because the opimization wasn't expecting
anything in the pushback buffer.
* libguile/vports.c (sf_fill_input): store complete chars, not single bytes
* libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1.
Add debug prints.
* libguile/string.h: make scm_i_from_stringn and scm_i_string_ref public
so that readline can use them
* guile-readline/readline.c: read bytes, not complete chars, from the
input port. Convert output to the output port's locale