Conversion from char to scm_t_wchar require an intermediate cast to
unsigned char. By changing the return type of SCM_STRINGBUF_INLINE_CHARS
to unsigned char *, doublecasts in the code can be avoided. Also,
some clarification of return types.
* libguile/strings.c (STRINGBUF_OUTLINE_CHARS)
(STRINGBUF_INLINE_CHARS): now returns unsigned char *; all callers changed.
This requres the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_hander constants as #define's
* libguile/string.c (scm_string): Restores the functionality
where scm_string tests for circular lists
* test-suite/tests/strings.test: add test for circular lists
* libguile/strings.c (unistring_escapes_to_guile_escapes): cast
tolower's parameter to int
* libguile/read.c (CHAR_DOWNCASE): cast tolower's parameter to int
%string-dump and %symbol-dump are modified to return assocation lists
of string and symbol attributes instead of printing to stderr. They
are no longer conditional on SCM_DEBUG.
* libguile/strings.c (scm_sys_string_dump)
(scm_sys_symbol_dump): now returns alist of properties. No longer
require that SCM_DEBUG be defined.
(scm_sys_stringbuf_hist): now conditional on
SCM_STRING_LENGTH_HISTOGRAM
* libguile/strings.h: scm_sys_string_dump and scm_sys_symbol dump
are now declared as API
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to sting-like asm tests
* libguile/generalized-vectors.h:
* libguile/generalized-vectors.c: Add a registry of vector constructors.
(scm_make_generalized_vector): New public function, constructs a
vector of a given type.
* libguile/bitvectors.c:
* libguile/bytevectors.c:
* libguile/srfi-4.c:
* libguile/strings.c:
* libguile/vectors.c: Register vector constructors.
* libguile/extensions.c (scm_init_extensions): No need to NULL the list
of registered extensions here, the static init does it for us. Allows
scm_c_register_extension to be called before scm_init_extensions.
* libguile/init.c (scm_i_init_guile): Move array initialization earlier,
so e.g. scm_init_strings has access to a valid list of array element
types when registering its vector constructor.
* libguile/array-handle.c (scm_i_register_array_implementation):
(scm_i_array_implementation_for_obj): Add generic array facility,
which will (in a few commits) detangle the array code.
(scm_array_get_handle): Use the generic array facility. Note that
scm_t_array_handle no longer has ref and set function pointers;
instead it has a pointer to the array implementation. It is unlikely
that code out there used these functions, however, as the supported
way was through scm_array_handle_ref/set_x.
(scm_array_handle_pos): Move this function here from arrays.c.
(scm_array_handle_element_type): New function, returns a Scheme value
representing the type of element stored in this array.
* libguile/array-handle.h (scm_t_array_element_type): New enum, for
generically determining the type of an array.
(scm_array_handle_rank):
(scm_array_handle_dims): These are now just #defines.
* libguile/arrays.c:
* libguile/bitvectors.c:
* libguile/bytevectors.c:
* libguile/srfi-4.c:
* libguile/strings.c:
* libguile/vectors.c: Register array implementations for all of these.
* libguile/inline.h: Update for array_handle_ref/set change.
* libguile/deprecated.h: Need to include arrays.h now.
* libguile/strings.c (SET_STRINGBUF_SHARED): Don't modify BUF if it's
already marked as shared since it might be a read-only stringbuf.
This error can be caught when linking with GNU ld with "-z relro".
* libguile/strings.h (scm_tc7_ro_string, SCM_I_STRINGBUF_F_SHARED,
SCM_I_STRINGBUF_F_INLINE): New macros.
* libguile/strings.c (STRINGBUF_F_SHARED): Alias for
`SCM_I_STRINGBUF_F_SHARED'.
(STRINGBUF_F_INLINE): Alias for `SCM_I_STRINGBUF_F_INLINE'.
(RO_STRING_TAG): Alias for `scm_tc7_ro_string'.
* libguile/strings.c (scm_string_ref): Add proper range checking for the
empty string.
(scm_string_set_x): Likewise.
Reported by Bill Schottstaedt <bil@ccrma.Stanford.EDU>.
* test-suite/tests/strings.test ("string-ref"): New test prefix.
("string-set!")["empty string", "empty string and non-zero index",
"out of range", "negative index", "regular string"]: New tests.
* NEWS: Update.
* libguile/read.c (scm_read_string): Use `scm_i_make_read_only_string ()' to
return a read-only string, as mandated by R5RS. Reported by Bill
Schottstaedt <bil@ccrma.Stanford.EDU>.
* libguile/strings.c (scm_i_make_read_only_string): New function.
(scm_i_shared_substring_read_only): Special-case the empty string
so that the read-only and read-write empty strings are `eq?'. This
optimization is relied on by the `substring/shared' `empty string'
test case in `srfi-13.test'.
* libguile/strings.h (scm_i_make_read_only_string): New declaration.
* test-suite/tests/strings.test ("string-set!")["literal string"]: New test.
* NEWS: Update.
* libguile/strings.c (scm_i_symbol_substring): Return a read-only string
since R5RS requires `symbol->string' to return a read-only string.
Reported by Bill Schottstaedt <bil@ccrma.Stanford.EDU>.
* test-suite/tests/symbols.test: Add `define-module' clause.
(exception:immutable-string): Adjust to current exception.
("symbol->string")["result is an immutable string"]: Use
`pass-if-exception' instead of `expect-fail-exception'.
* NEWS: Update.
* libguile/strings.c (scm_i_string_writable_chars): Remove use of
`scm_i_thread_put_to_sleep ()'. This leaves a race condition,
which is hopefully not harmful.