This requires separate small fixes.
Readline has internal logic to deal with multi-byte characters, so
it wants bytes, not characters.
scm_c_read gets called by the vm when readline is activated, and it was
truncating multi-byte characters because soft ports didn't have the
UCS-4 capability.
Soft ports need the capability to read UCS-4 characters. Since soft ports
may have a single byte buffer, full characters need to be stored into the
pushback buffer.
This broke the optimizations in scm_c_read for using an alternate buffer
for single-byte-buffered ports, because the opimization wasn't expecting
anything in the pushback buffer.
* libguile/vports.c (sf_fill_input): store complete chars, not single bytes
* libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1.
Add debug prints.
* libguile/string.h: make scm_i_from_stringn and scm_i_string_ref public
so that readline can use them
* guile-readline/readline.c: read bytes, not complete chars, from the
input port. Convert output to the output port's locale
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
Avoid possible mutex hang when scm_lfwrite_substr is used in error
message output and when an error has caused the stringbuf write
mutex to not be unlocked. scm_lfwrite_substr makes a substring:
making a substring requires that mutex.
Hopefully, all cases of non-local jumps when the stringbuf write
lock is held have been eliminated anyway, making this O.B.E.
* libguile/ports.c (scm_lfwrite_str): include functionality in this
function instead of making this a special case of scm_lfwrite_substr
This requres the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_hander constants as #define's
The port position macros incorrectly required enclosing braces
when used within if statements.
* libguile/ports.h (SCM_INCLINE, SCM_ZEROCOL, SCM_INCCOL)
(SCM_DECCOL, SCM_TABCOL): enclose macro in do/while
* libguile/ports.c (update_port_lf): remove extra braces
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to sting-like asm tests
* libguile/gen-scmconfig.c (main): Produce a definition for
`scm_t_off'.
* libguile/ports.h (scm_t_port)[read_buf_size, saved_read_buf_size,
write_buf_size, seek, truncate]: Use `scm_t_off' instead of `off_t' so
that the layout and size of the structure does not depend on the
application's `_FILE_OFFSET_BITS' value. Reported by Bill
Schottstaedt, see
http://lists.gnu.org/archive/html/bug-guile/2009-06/msg00018.html.
(scm_set_port_seek, scm_set_port_truncate): Update.
* libguile/ports.c (scm_set_port_seek, scm_set_port_truncate): Use
`scm_t_off' and `off_t_or_off64_t'.
* libguile/fports.c (fport_seek, fport_truncate): Use `scm_t_off'
instead of `off_t'.
* libguile/r6rs-ports.c (bip_seek, cbp_seek, bop_seek): Use `scm_t_off'
instead of `off_t'.
* libguile/rw.c (scm_write_string_partial): Likewise.
* libguile/strports.c (st_resize_port, st_seek, st_truncate): Likewise.
* doc/ref/api-io.texi (Port Implementation): Update prototype of
`scm_set_port_seek ()' and `scm_set_port_truncate ()'.
* NEWS: Update.
* libguile/goops.c (scm_port_class): Statically allocate it.
(create_port_classes): Don't use `scm_calloc ()'.
* libguile/goops.h (scm_port_class): Update declaration.
* libguile/ports.c (scm_make_port_type): When checking whether
GOOPS is initialized, check whether the first element of
SCM_PORT_CLASS is non-zero.
* libguile/goops.c (scm_port_class): Statically allocate it.
(create_port_classes): Don't use `scm_calloc ()'.
* libguile/goops.h (scm_port_class): Update declaration.
* libguile/ports.c (scm_make_port_type): When checking whether
GOOPS is initialized, check whether the first element of
SCM_PORT_CLASS is non-zero.
We recently modified scm_c_read so that it temporarily swaps the
caller's buffer with the port's normal read buffer, in order to
improve performance in the case where the port is unbuffered (which
actually means having a single-byte buffer) - but we implemented the
swap in the buffered case too. The latter turns out to be a bad idea
- because it means that the C code of a custom port implementation
cannot rely on a port's buffer always being the same as when it was
first set up - and so this commit reverts that. The buffer swapping
trick now applies to unbuffered ports only.
* libguile/ports.c (scm_c_read): Only do swapping of port and caller
buffer for unbuffered ports.
Idea and original patch were by Ludovic Courts, this is Neil Jerram's
reworking of it.
* libguile/srfi-4.c (scm_uniform_vector_read_x): Use scm_c_read,
instead of equivalent code here.
* libguile/ports.c (scm_fill_input): Add assertion that read
buffer is empty when called.
(port_and_swap_buffer, swap_buffer): New, for...
(scm_c_read): Use caller's buffer for reading, to avoid making N
1-byte low-level read calls, in the case where the port is
unbuffered (or has a very small buffer).
* libguile/ports.c (register_finalizer_for_port): New.
(finalize_port): New.
(scm_new_port_table_entry): Call `register_finalizer_for_port ()'
before returning the new port.
(scm_ports_prehistory): Use `scm_gc_malloc_pointerless ()' instead of
`scm_gc_malloc ()' when allocating room for SCM_I_PORT_TABLE.
git-archimport-id: lcourtes@laas.fr--2005-libre/guile-core--boehm-gc--1.9--patch-37
* libguile/gc.c (scm_init_storage): Do not initialize SCM_I_PORT_TABLE
here: this is done in `scm_ports_prehistory ()'. This fixes the bug
mentioned in the previous patch log.
* libguile/ports.c (scm_new_port_table_entry): Slightly clarified the
code.
git-archimport-id: lcourtes@laas.fr--2005-libre/guile-core--boehm-gc--1.9--patch-4
* libguile/gc.c (scm_gc_stats): Fixed so that it returns a relevant
result instead of just `SCM_EOL'.
* libguile/ports.c: Include `assert.h'. Don't include `malloc.h'.
(scm_make_port_type): Use `scm_gc_realloc ()' instead of `realloc ()'.
(scm_new_port_table_entry): Likewise.
(scm_flush): Added an assertion on the port number.
(scm_ports_prehistory): Use `scm_gc_malloc ()' instead of `scm_malloc ()'.
git-archimport-id: lcourtes@laas.fr--2005-libre/guile-core--boehm-gc--1.9--patch-3