Suggested by Noah Lavine <noah.b.lavine@gmail.com>.
* doc/ref/api-data.texi (String Searching): Mention the return value of
`string-index', `string-index-right', and `string-rindex' when no
match is found.
* libguile/srfi-13.c (scm_string_index, scm_string_index_right,
scm_string_rindex): Adjust docstring accordingly.
* libguile/srfi-13.h:
* libguile/srfi-13.c (scm_string_filter, scm_string_delete): Swap
char_pred and s argument order, to comply with SRFI-13. There is a
back-compat shim that will detect programs that used the old,
erroneous interface, while giving a warning.
* doc/ref/api-data.texi: Update docs.
* doc/ref/api-data.texi (Characters): Documentation for `char-titlecase'.
* doc/ref/api-i18n.texi (Character Case Mapping): Documentation for
`char-locale-titlecase' and `string-locale-titlecase'.
* libguile/chars.c, libguile/chars.h (scm_char_titlecase, scm_c_titlecase): New
functions.
* libguile/i18n.c, libguile/i18n.h (chr_to_case, scm_char_locale_titlecase,
str_to_case, scm_string_locale_titlecase): New functions.
* libguile/i18n.c (scm_char_locale_downcase, scm_char_locale_upcase,
scm_string_locale_downcase, scm_string_locale_upcase): Refactor to share code
via chr_to_case and str_to_case, as appropriate.
* module/ice-9/i18n.scm (char-locale-title-case, string-locale-titlecase): New
functions.
* libguile/srfi-13.c (string_titlecase_x): Use uc_totitle instead of uc_toupper.
* test-suite/tests/chars.test: Tests for `char-titlecase'.
* test-suite/tests/i18n.test: Tests for `char-locale-titlecase' and
`string-locale-titlecase'.
* test-suite/tests/srfi-13.test: Tests for `string-titlecase'.
* libguile/srfi-13.c (compare_strings): Switch the "longer" and
"shorter" arguments. All the callers of this function assumed that
shorter came first. Fixes (string<? "abc" "abcd").
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to sting-like asm tests
charset cases, count chars kept and build a string in a second pass,
rather than using a cons cell for every char kept. Use a shared
substring when nothing removed (such sharing is allowed by the srfi).
scm_string_tabulate, string_upcase_x, string_down_case_x,
string_titlecase_x, string_reverse_x, scm_string_tokenize): Use
size_t instead of int for indices into strings. Make sure that no
over- or underflow occurs. Thanks to Andreas Vögele!
(scm_xsubstring, scm_string_xcopy_x): Use ints for 'extended'
indices, which can also be negative.
scm_i_string_chars et al. Copious scm_remember_upto_heres have
been inserted. Made sure that no internal string pointer is used
across a SCM_TICK or a possible GC.