Since the regex library expects 8-bit clean characters and
an 8-bit locale, tests of 8-bit characters need to occur within
the context of an 8-bit locale.
* test-suite/tests/regexp.test (regexp-quote tests): wrap them in an
ISO-8859-1 locale
This requires separate small fixes.
Readline has internal logic to deal with multi-byte characters, so
it wants bytes, not characters.
scm_c_read gets called by the vm when readline is activated, and it was
truncating multi-byte characters because soft ports didn't have the
UCS-4 capability.
Soft ports need the capability to read UCS-4 characters. Since soft ports
may have a single byte buffer, full characters need to be stored into the
pushback buffer.
This broke the optimizations in scm_c_read for using an alternate buffer
for single-byte-buffered ports, because the optimization wasn't expecting
anything in the pushback buffer.
* libguile/vports.c (sf_fill_input): store complete chars, not single bytes
* libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1.
Add debug prints.
* libguile/strings.h: make scm_i_from_stringn and scm_i_string_ref public
so that readline can use them
* guile-readline/readline.c: read bytes, not complete chars, from the
input port. Convert output to the output port's locale
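The truncation described above for soft ports can be illustrated with a
minimal sketch, assuming UTF-8 encoded input. This is not the code in
vports.c; utf8_sequence_length and copy_complete_char are hypothetical
names. A reader that hands over one byte at a time splits a multi-byte
character, so the sketch only copies a character once all of its bytes
are available.

```c
#include <stddef.h>

/* Return the number of bytes in the UTF-8 sequence that starts with
   LEAD, or 0 if LEAD cannot start a sequence (continuation byte or
   invalid lead byte).  */
static size_t
utf8_sequence_length (unsigned char lead)
{
  if (lead < 0x80)
    return 1;
  if (lead >= 0xC2 && lead <= 0xDF)
    return 2;
  if (lead >= 0xE0 && lead <= 0xEF)
    return 3;
  if (lead >= 0xF0 && lead <= 0xF4)
    return 4;
  return 0;
}

/* Hypothetical helper: copy one complete character from SRC, of which
   LEN bytes are available, into OUT (at least 4 bytes).  Return the
   number of bytes copied, or 0 if the character is incomplete or
   malformed, in which case nothing is copied.  */
static size_t
copy_complete_char (const unsigned char *src, size_t len,
                    unsigned char *out)
{
  size_t need, i;

  if (len == 0)
    return 0;
  need = utf8_sequence_length (src[0]);
  if (need == 0 || need > len)
    return 0;
  /* Every trailing byte must look like 10xxxxxx.  */
  for (i = 1; i < need; i++)
    if ((src[i] & 0xC0) != 0x80)
      return 0;
  for (i = 0; i < need; i++)
    out[i] = src[i];
  return need;
}
```

With the two bytes 0xC3 0xA9 (e with acute accent), a call that sees only
the first byte copies nothing, while a call that sees both copies the
whole character.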
* NEWS
* doc/ref/scheme-scripts.texi: doc updates for character encoding of
source code
* doc/ref/api-evaluation.texi: doc updates for character encoding of
source code
String ports should be able to accept any string characters, regardless
of the current locale. Setting it to UTF-8 achieves that.
* libguile/strports.c (scm_i_mkstrport): set port's locale to UTF-8
(scm_mkstrport): convert input string to UTF-8
* libguile/unidata_to_charset.pl (designated): renamed from full
* libguile/srfi-14.c (scm_char_set_designated): new char-set
* libguile/srfi-14.i.c (cs_designated): renamed from cs_full
Combining characters, such as accents, modify the appearance of the
preceding character, so a combining character looks awkward in character
literal form (#\name): it modifies the backslash. Instead, print the
combining character on a dotted circle.
* libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define
* libguile/print.c (iprint1): print combining characters on dotted circles
* libguile/read.c (scm_read_character): parse the combination of combining
characters and dotted circles
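The printing rule above can be sketched as follows. This is a minimal
illustration, not the code in print.c: is_combining, encode_utf8 and
format_char_literal are hypothetical names, output is assumed to be
UTF-8, and only the Combining Diacritical Marks block (U+0300..U+036F) is
treated as combining, which is narrower than a full Unicode property
lookup.

```c
#include <stddef.h>
#include <stdint.h>

#define CODEPOINT_DOTTED_CIRCLE 0x25CCu /* U+25CC DOTTED CIRCLE */

/* Assumption: only U+0300..U+036F counts as combining here.  */
static int
is_combining (uint32_t c)
{
  return c >= 0x0300 && c <= 0x036F;
}

/* Encode the scalar value C as UTF-8 into BUF (at least 4 bytes) and
   return the number of bytes written.  C is assumed to be valid.  */
static size_t
encode_utf8 (uint32_t c, unsigned char *buf)
{
  if (c < 0x80)
    {
      buf[0] = (unsigned char) c;
      return 1;
    }
  if (c < 0x800)
    {
      buf[0] = (unsigned char) (0xC0 | (c >> 6));
      buf[1] = (unsigned char) (0x80 | (c & 0x3F));
      return 2;
    }
  if (c < 0x10000)
    {
      buf[0] = (unsigned char) (0xE0 | (c >> 12));
      buf[1] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
      buf[2] = (unsigned char) (0x80 | (c & 0x3F));
      return 3;
    }
  buf[0] = (unsigned char) (0xF0 | (c >> 18));
  buf[1] = (unsigned char) (0x80 | ((c >> 12) & 0x3F));
  buf[2] = (unsigned char) (0x80 | ((c >> 6) & 0x3F));
  buf[3] = (unsigned char) (0x80 | (c & 0x3F));
  return 4;
}

/* Write the character literal for C into BUF (at least 10 bytes),
   NUL-terminated, and return its length without the NUL.  A combining
   character is placed on a dotted circle so that it does not attach
   to the backslash.  */
static size_t
format_char_literal (uint32_t c, char *buf)
{
  size_t n = 0;

  buf[n++] = '#';
  buf[n++] = '\\';
  if (is_combining (c))
    n += encode_utf8 (CODEPOINT_DOTTED_CIRCLE, (unsigned char *) buf + n);
  n += encode_utf8 (c, (unsigned char *) buf + n);
  buf[n] = '\0';
  return n;
}
```

For U+0301 (combining acute accent) the sketch emits "#\", then the
dotted circle, then the accent; for a plain letter such as "A" it emits
"#\A" unchanged.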
* libguile/srfi-14.c (scm_i_ucs_range_to_char_set): new function that
contains the functionality of ucs_range_to_char_set, fixes
off-by-one, and doesn't store surrogates
(scm_ucs_range_to_char_set, scm_ucs_range_to_char_set_x): call
scm_i_ucs_range_to_char_set
(scm_i_charset_set_range): new helper function
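As an illustration of the range handling above, here is a minimal sketch,
not the code in srfi-14.c. It assumes the SRFI-14 convention that
ucs-range->char-set takes a half-open range [lower, upper); the entry
does not say which bound had the off-by-one. struct cp_range and
ucs_range_without_surrogates are hypothetical names.

```c
#include <stdint.h>

#define SURROGATE_FIRST 0xD800u
#define SURROGATE_LAST  0xDFFFu

/* An inclusive range of code points.  */
struct cp_range
{
  uint32_t lo;
  uint32_t hi;
};

/* Split the half-open range [LOWER, UPPER) into at most two inclusive
   ranges that contain no surrogates.  Store them in OUT and return how
   many were stored (0, 1 or 2).  */
static int
ucs_range_without_surrogates (uint32_t lower, uint32_t upper,
                              struct cp_range out[2])
{
  int n = 0;
  uint32_t last;

  if (lower >= upper)
    return 0;
  last = upper - 1; /* UPPER itself is excluded.  */

  if (lower < SURROGATE_FIRST)
    {
      out[n].lo = lower;
      out[n].hi = last < SURROGATE_FIRST ? last : SURROGATE_FIRST - 1;
      n++;
    }
  if (last > SURROGATE_LAST)
    {
      out[n].lo = lower > SURROGATE_LAST ? lower : SURROGATE_LAST + 1;
      out[n].hi = last;
      n++;
    }
  return n;
}
```

A request for [0xD000, 0xE001) yields the two ranges 0xD000..0xD7FF and
0xE000..0xE000; a request that lies entirely within the surrogates yields
none.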
char-set-xor! was not modifying its input parameter. The spec does not
technically require it to do so, but the other similar functions
do it.
* libguile/srfi-14.c (scm_char_set_xor_x): modify the input parameter
* module/language/tree-il/compile-glil.scm (flatten): Fix compilation of
loops within loops in non-tail positions. Will add a test case soon,
but one way to reproduce it was with the following function:
(define (test)
  (let lp ()
    (pk 'zero)
    (let ((fk (lambda ()
                (let ((fk2 (lambda () (pk 'two))))
                  (let ((fk3 (lambda () (if #t (pk 'three) (fk2)))))
                    (if #t
                        (fk3)
                        (fk2)))))))
      (pk 'one)
      (fk))
    (lp)))
One would expect to see the sequence "zero one three" repeated, but in
fact "zero" was printed only once.
This should fix simplex as well.
* test-suite/tests/encoding-iso88591.test: tests for writing and display
of characters
* test-suite/tests/encoding-iso88597.test: tests for writing and display
of characters
* test-suite/tests/encoding-utf8.test: tests for writing and display
of characters
String ports, being 8-bit, store strings using the character encoding
of the port. This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.
* libguile/strports.c: extra comments
(scm_strport_to_string): use port's encoding when converting port data
to a string
* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
and made internal. All callers changed.
(scm_from_stringn): renamed to scm_i_from_stringn.
* libguile/strings.h: declaration for scm_i_from_stringn
* libguile/srfi-14.c (charsets_complement): use surrogate #defines instead
of hardcoded numbers
* libguile/srfi-14.i.c (cs_full_ranges): remove surrogates from full
charset
* libguile/unidata_to_charset.pl (full): test for surrogates
* libguile/gc_os_dep.c (GC_linux_stack_base) [LINUX_STACKBOTTOM]: cast
input of ctype functions to int
* libguile/inet_aton.c (inet_aton): cast input of ctype functions to int
* libguile/read.c (scm_scan_for_encoding): cast input of isalnum to int
* libguile/win32-socket.c (scm_i_socket_uncomment): cast input of isspace
to int
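The casts above follow a common pattern, sketched here with a
hypothetical count_spaces function. The entries only mention casting to
int; converting through unsigned char first is an addition in this
sketch, since the C standard requires the argument of the ctype functions
to be representable as unsigned char or equal to EOF.

```c
#include <ctype.h>
#include <stddef.h>

/* Count the whitespace characters in the NUL-terminated string S.  */
static size_t
count_spaces (const char *s)
{
  size_t n = 0;

  for (; *s != '\0'; s++)
    /* A plain char may be negative for bytes >= 0x80, so convert it
       to unsigned char before widening to int.  */
    if (isspace ((int) (unsigned char) *s))
      n++;
  return n;
}
```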
* libguile/load.c (scm_primitive_load_path): If the compiled path was
out of date, but the fallback path was current, we correctly detected
that case, but loaded the wrong file. So here fix the typo.
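The intended selection can be sketched as follows. This is a hypothetical
illustration, not the code in load.c: it assumes "fallback" means a
second compiled file, that a compiled file is current when it is not
older than the source, and that a timestamp of 0 means the file does not
exist. is_fresh and choose_file are made-up names.

```c
enum load_choice
{
  LOAD_COMPILED,
  LOAD_FALLBACK,
  LOAD_SOURCE
};

/* Assumption: a compiled file is current if it exists (nonzero
   timestamp) and is not older than the source.  */
static int
is_fresh (long source_mtime, long compiled_mtime)
{
  return compiled_mtime != 0 && compiled_mtime >= source_mtime;
}

/* Decide which file to load given the three modification times.  */
static enum load_choice
choose_file (long source_mtime, long compiled_mtime, long fallback_mtime)
{
  if (is_fresh (source_mtime, compiled_mtime))
    return LOAD_COMPILED;
  if (is_fresh (source_mtime, fallback_mtime))
    /* The stale compiled file must not be returned here.  */
    return LOAD_FALLBACK;
  return LOAD_SOURCE;
}
```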
* test-suite/lib.scm (with-locale, with-locale*): new test functions
* test-suite/tests/encoding-escapes: don't fail if en_US.utf8 doesn't exist
* test-suite/tests/encoding-iso88591.test: set and restore locale, if
possible
* test-suite/tests/encoding-iso88597.test: set and restore locale, if
possible
* test-suite/tests/encoding-utf8.test: set and restore locale, if possible
* test-suite/tests/srfi-14.test: don't need to setlocale to Latin-1 to
test Latin-1 since string conversion is handled at read/compile time.
Set and restore locale, if possible.
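The "set and restore locale, if possible" pattern used by these tests can
be sketched in C with setlocale. This is only an analogue of the Scheme
helpers with-locale and with-locale*, not their implementation;
call_with_locale and count_call are hypothetical names.

```c
#include <locale.h>
#include <stdlib.h>
#include <string.h>

/* Example thunk: increment the int that DATA points to.  */
static void
count_call (void *data)
{
  ++*(int *) data;
}

/* Run THUNK with DATA under LOCALE_NAME, then restore the previous
   locale.  Return 1 if the thunk ran, 0 if the locale is unavailable
   (the thunk is skipped), or -1 if the current locale could not be
   saved.  */
static int
call_with_locale (const char *locale_name, void (*thunk) (void *),
                  void *data)
{
  const char *current = setlocale (LC_ALL, NULL);
  char *saved;
  size_t len;

  if (current == NULL)
    return -1;
  /* Copy the name: later setlocale calls may overwrite the string.  */
  len = strlen (current) + 1;
  saved = malloc (len);
  if (saved == NULL)
    return -1;
  memcpy (saved, current, len);

  if (setlocale (LC_ALL, locale_name) == NULL)
    {
      free (saved);
      return 0;
    }

  thunk (data);
  setlocale (LC_ALL, saved);
  free (saved);
  return 1;
}
```

Returning 0 instead of failing lets a test report itself as skipped when
the requested locale is not installed.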
This script was used to generate srfi-14.i.c from the UnicodeData.txt
file supplied by ftp://www.unicode.org/Public/UNIDATA/
* libguile/unidata_to_charset.pl
* emacs/gds-scheme.el (gds-start-utility-guile): Use buffer-local
variable gds-client instead of client, as client is actually unbound
when the process-filter lambda runs (this is Emacs Lisp, not Scheme, so
the lambda does not close over client).
* emacs/gds-scheme.el (gds-start-utility-guile): Make the extraction
of client number more robust; in particular when the client emits
comments (about auto compilation) before the number.
i.e. put the extensions where they need to be, and delete
ice-9-debugger-extensions.scm.
* doc/ref/api-debug.texi (Single Stepping through a Procedure's Code):
Change mentions of (ice-9 debugging ice-9-debugger-extensions)
module to whatever is appropriate now (or just remove them).
* module/Makefile.am (NOCOMP_SOURCES): Remove
ice-9-debugger-extensions.scm.
* module/ice-9/debugger.scm (debug-trap): Move here from
ice-9-debugger-extensions.scm.
* module/ice-9/debugger/command-loop.scm ("continue", "finish",
"step", "next"): Move here from ice-9-debugger-extensions.scm.
* module/ice-9/debugger/commands.scm (assert-continuable, continue,
finish, step, next): Move here from ice-9-debugger-extensions.scm.
* module/ice-9/debugging/breakpoints.scm: Don't use
ice-9-debugger-extensions module.
* module/ice-9/debugging/ice-9-debugger-extensions.scm: Removed.
* module/ice-9/debugging/trace.scm, module/ice-9/debugging/traps.scm:
Remove more old version code.
* module/ice-9/debugging/traps.scm (guile-trap-features): Hardcoded as
'(tweaking).
* module/ice-9/debugging/ice-9-debugger-extensions.scm: Remove all
code checking for version < 1.7, and move code for versions >= 1.7
up to top level. Comment out dummy mutex definitions for now, as
I'm not sure how to rewrite them correctly for psyntax.