* libguile/root.h
* libguile/root.c (scm_sys_protects): It used to be that for some reason
we'd define a special array of "protected" values. This was a little
silly, always, but with the BDW GC it's completely unnecessary. Also
many of these variables were unused, and none of them were good API.
So remove this array, and either eliminate, make static, or make
internal the various values.
* libguile/snarf.h: No need to generate calls to scm_permanent_object.
* guile-readline/readline.c (scm_init_readline): No need to call
scm_permanent_object.
* libguile/array-map.c (ramap, rafe): Remove the dubious nullvect
optimizations.
* libguile/async.c (scm_init_async): No need to init scm_asyncs, it is
no more.
* libguile/eval.c (scm_init_eval): No need to init scm_listofnull, it is
no more.
* libguile/gc.c: Make scm_protects a static var.
(scm_storage_prehistory): Change the sanity check to use the address
of protects.
(scm_init_gc_protect_object): No need to clear the scm_sys_protects,
as it is no more.
* libguile/keywords.c: Make the keyword obarray a static var.
* libguile/numbers.c: Make flo0 a static var.
* libguile/objprop.c: Make object_whash a static var.
* libguile/properties.c: Make properties_whash a static var.
* libguile/srcprop.h:
* libguile/srcprop.c: Make scm_source_whash a global with internal
linkage.
* libguile/strings.h:
* libguile/strings.c: Make scm_nullstr a global with internal linkage.
* libguile/vectors.c (scm_init_vectors): No need to init scm_nullvect,
it's unused.
The intent is to allow compilation with `-Wundef', which in turn should
make it easier to catch erroneous uses of nonexistent macros.
* libguile/__scm.h: Don't assume `BUILDING_LIBGUILE' is defined.
* libguile/conv-uinteger.i.c (SCM_TO_TYPE_PROTO): Remove unneeded CPP
conditional on `TYPE_MIN == 0'.
* libguile/fports.c: Check for the definition of `HAVE_CHSIZE' and
`HAVE_FTRUNCATE', not for their value.
* libguile/ports.c: Likewise.
* libguile/numbers.c (guile_ieee_init): Likewise with `HAVE_DINFINITY'
and `HAVE_DQNAN'.
* test-suite/standalone/test-conversion.c (ieee_init): Likewise.
* libguile/strings.c: Likewise with `SCM_STRING_LENGTH_HISTOGRAM'.
* libguile/strings.h: Likewise.
* libguile/tags.h: Likewise with `HAVE_INTTYPES_H' and `HAVE_STDINT_H'.
* libguile/threads.c: Likewise with `HAVE_PTHREAD_GET_STACKADDR_NP'.
* libguile/vm-engine.c (VM_NAME): Likewise with `VM_CHECK_IP'.
* libguile/gen-scmconfig.c (main): Use "#ifdef HAVE_", not "#if HAVE_".
* libguile/socket.c (scm_setsockopt): Likewise.
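The difference between testing a macro's definition and testing its value can be sketched as follows. `HAVE_EXAMPLE_FEATURE' is a hypothetical macro used only for illustration; it is not one of the libguile macros listed above.

```c
/* HAVE_EXAMPLE_FEATURE is a hypothetical configure-style macro.  It is
   deliberately left undefined, as it would be on a platform that lacks
   the feature.  */

/* Testing the definition is silent under -Wundef.  Writing
   "#if HAVE_EXAMPLE_FEATURE" instead would evaluate the undefined
   identifier as 0 and produce a warning.  */
#ifdef HAVE_EXAMPLE_FEATURE
# define EXAMPLE_FEATURE_AVAILABLE 1
#else
# define EXAMPLE_FEATURE_AVAILABLE 0
#endif

/* Return 1 if the hypothetical feature was detected at build time.  */
int
example_feature_available (void)
{
  return EXAMPLE_FEATURE_AVAILABLE;
}
```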
* libguile/strings.c (STRINGBUF_HEADER_SIZE, STRINGBUF_HEADER_BYTES):
New macros.
(STRINGBUF_F_INLINE, STRINGBUF_INLINE, STRINGBUF_OUTLINE_CHARS,
STRINGBUF_OUTLINE_LENGTH, STRINGBUF_INLINE_CHARS,
STRINGBUF_INLINE_LENGTH, STRINGBUF_MAX_INLINE_LEN): Remove.
(STRINGBUF_CHARS, STRINGBUF_WIDE_CHARS): Adjust to return a fixed
location.
(STRINGBUF_LENGTH): Get the length from word 1.
(make_stringbuf, make_wide_stringbuf): Adjust to use a contiguous
memory region.
(wide_stringbuf): Renamed from `widen_stringbuf'. Adjust similarly.
Return the new stringbuf. Callers updated.
(narrow_stringbuf): Likewise.
(scm_sys_string_dump, scm_sys_symbol_dump): Remove `stringbuf-inline'
pair.
* test-suite/tests/strings.test ("string internals")["null strings are
inlined", "short Latin-1 encoded strings are inlined", "long Latin-1
encoded strings are not inlined", "short UCS-4 encoded strings are not
inlined", "long UCS-4 encoded strings are not inlined"]: Remove.
* test-suite/tests/symbols.test ("symbol internals")["null symbols are
inlined", "short Latin-1 encoded symbols are inlined", "long Latin-1
encoded symbols are not inlined", "short UCS-4 encoded symbols are not
inlined", "long UCS-4 encoded symbols are not inlined"]: Remove.
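A minimal sketch of a stringbuf kept in one contiguous region, with the length in word 1 and the characters at a fixed offset. The two-word header and all `example_' names are assumptions made for illustration; the actual layout is the one in libguile/strings.c.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical two-word header: word 0 for flags, word 1 for length.  */
#define EXAMPLE_HEADER_WORDS 2
#define EXAMPLE_HEADER_BYTES (EXAMPLE_HEADER_WORDS * sizeof (uintptr_t))

/* The length is read from word 1 of the header.  */
size_t
example_stringbuf_length (const uintptr_t *buf)
{
  return (size_t) buf[1];
}

/* The characters always start right after the header.  */
char *
example_stringbuf_chars (uintptr_t *buf)
{
  return (char *) buf + EXAMPLE_HEADER_BYTES;
}

/* Allocate header and characters as a single block.  Return NULL if
   the allocation fails.  */
uintptr_t *
example_make_stringbuf (const char *src, size_t len)
{
  uintptr_t *buf = malloc (EXAMPLE_HEADER_BYTES + len + 1);

  if (buf == NULL)
    return NULL;
  buf[0] = 0;                   /* flags word, unused in this sketch */
  buf[1] = (uintptr_t) len;
  memcpy (example_stringbuf_chars (buf), src, len);
  example_stringbuf_chars (buf)[len] = '\0';
  return buf;
}
```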
String ports, being 8-bit, store strings using the character encoding
of the port. This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.
* libguile/strports.c: extra comments
(scm_strport_to_string): use port's encoding when converting port data
to a string
* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
and made internal. All callers changed.
(scm_from_stringn): renamed to scm_i_from_stringn.
* libguile/strings.h: declaration for scm_i_from_stringn
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
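As an illustration only, scanning for a `coding:' declaration and treating a null pointer as Latin-1 might look like the sketch below. The `example_' functions are hypothetical and do not reproduce scm_scan_for_encoding; the set of characters accepted in an encoding name is an assumption.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Search the first LEN bytes of BUF for "coding:" and copy the name
   that follows into OUT.  Return 1 if a name was found, else 0.  */
int
example_scan_for_encoding (const char *buf, size_t len,
                           char *out, size_t out_size)
{
  static const char key[] = "coding:";
  const size_t klen = sizeof key - 1;
  size_t i, j, n;

  if (out_size == 0)
    return 0;
  for (i = 0; i + klen <= len; i++)
    {
      if (memcmp (buf + i, key, klen) != 0)
        continue;
      j = i + klen;
      while (j < len && (buf[j] == ' ' || buf[j] == '\t'))
        j++;                    /* skip blanks after the colon */
      n = 0;
      while (j < len && n + 1 < out_size
             && (isalnum ((unsigned char) buf[j])
                 || buf[j] == '-' || buf[j] == '_' || buf[j] == '.'))
        out[n++] = buf[j++];
      out[n] = '\0';
      if (n > 0)
        return 1;
    }
  out[0] = '\0';
  return 0;
}

/* A null pointer stands for the native Latin-1 encoding.  */
const char *
example_encoding_name (const char *encoding)
{
  return encoding != NULL ? encoding : "ISO-8859-1";
}
```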
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_sharp)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorporating the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being used in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/socket.c (scm_recv): receive the message without holding the
stringbuf writing lock
(scm_send): try to narrow a string before using it
* libguile/stime.c (strftime): convert string to UTF-8 so that it can
be safely passed to strftime
(strptime): convert input string to UTF-8 so that it can be safely
passed through strptime
* libguile/strings.c (narrow_stringbuf): new function
(scm_i_try_narrow_string): new function
* libguile/strings.h: new declaration for scm_i_try_narrow_string
* libguile/numbers.c (scm_i_print_fraction): use string accessors
(XDIGIT2UINT): use libunistring function
(mem2uinteger, mem2integer, mem2decimal_from_point, mem2ureal)
(mem2complex): take scheme string instead of c string; use accessors
(scm_i_string_to_number): new function
(scm_c_locale_string_to_number): use scm_i_string_to_number
* libguile/numbers.h: declaration for scm_i_string_to_number
* libguile/strings.c (scm_i_string_strcmp): new function
* libguile/strings.h: declaration for scm_i_string_strcmp
Conversion from char to scm_t_wchar requires an intermediate cast to
unsigned char. By changing the return type of SCM_STRINGBUF_INLINE_CHARS
to unsigned char *, doublecasts in the code can be avoided. Also,
some clarification of return types.
* libguile/strings.c (STRINGBUF_OUTLINE_CHARS)
(STRINGBUF_INLINE_CHARS): now returns unsigned char *; all callers changed.
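The need for the intermediate cast can be shown with a small sketch. `example_wchar' is a stand-in for scm_t_wchar, assumed here to be a signed 32-bit integer; `signed char' is used so that the behavior does not depend on whether plain char is signed on the platform.

```c
#include <stdint.h>

typedef int32_t example_wchar;  /* stand-in for scm_t_wchar */

/* Direct widening sign-extends, so Latin-1 bytes of 0x80 and above
   become negative values.  */
example_wchar
example_widen_naive (signed char c)
{
  return (example_wchar) c;
}

/* Going through unsigned char first keeps the value in 0..255.  */
example_wchar
example_widen (signed char c)
{
  return (example_wchar) (unsigned char) c;
}
```

With character data already typed as unsigned char *, the inner cast is no longer needed at each use.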
This requires the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_failed_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_handler constants as #define's
* libguile/strings.c (scm_string): Restores the functionality
where scm_string tests for circular lists
* test-suite/tests/strings.test: add test for circular lists
* libguile/strings.c (unistring_escapes_to_guile_escapes): cast
tolower's parameter to int
* libguile/read.c (CHAR_DOWNCASE): cast tolower's parameter to int
%string-dump and %symbol-dump are modified to return association lists
of string and symbol attributes instead of printing to stderr. They
are no longer conditional on SCM_DEBUG.
* libguile/strings.c (scm_sys_string_dump)
(scm_sys_symbol_dump): now returns alist of properties. No longer
require that SCM_DEBUG be defined.
(scm_sys_stringbuf_hist): now conditional on
SCM_STRING_LENGTH_HISTOGRAM
* libguile/strings.h: scm_sys_string_dump and scm_sys_symbol_dump
are now declared as API
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
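A sketch of choosing among the three escape forms by code point size. The function name is hypothetical, and the use of lowercase hex digits is an assumption.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the escape for code point CP into OUT and return the number
   of characters in the escape.  OUT should hold at least 9 bytes.  */
int
example_format_escape (uint32_t cp, char *out, size_t out_size)
{
  if (cp <= 0xFF)
    return snprintf (out, out_size, "\\x%02x", (unsigned int) cp);
  else if (cp <= 0xFFFF)
    return snprintf (out, out_size, "\\u%04x", (unsigned int) cp);
  else
    return snprintf (out, out_size, "\\U%06x", (unsigned int) cp);
}
```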
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
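The copy-on-write step can be pictured with the following sketch. The structures and `example_start_writing' are hypothetical simplifications, not the definitions used by scm_i_string_start_writing.

```c
#include <stdlib.h>
#include <string.h>

struct example_buf
{
  int shared;                   /* nonzero if other strings use it */
  size_t len;
  char *chars;                  /* LEN characters plus a NUL */
};

struct example_string
{
  struct example_buf *buf;
};

/* Call before mutating S.  If the buffer is shared, give S a private
   copy first.  Return the writable characters, or NULL if allocation
   fails.  */
char *
example_start_writing (struct example_string *s)
{
  if (s->buf->shared)
    {
      struct example_buf *copy = malloc (sizeof *copy);

      if (copy == NULL)
        return NULL;
      copy->chars = malloc (s->buf->len + 1);
      if (copy->chars == NULL)
        {
          free (copy);
          return NULL;
        }
      memcpy (copy->chars, s->buf->chars, s->buf->len + 1);
      copy->len = s->buf->len;
      copy->shared = 0;
      s->buf = copy;            /* the old buffer stays with its other users */
    }
  return s->buf->chars;
}
```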
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to string-like asm tests
* libguile/generalized-vectors.h:
* libguile/generalized-vectors.c: Add a registry of vector constructors.
(scm_make_generalized_vector): New public function, constructs a
vector of a given type.
* libguile/bitvectors.c:
* libguile/bytevectors.c:
* libguile/srfi-4.c:
* libguile/strings.c:
* libguile/vectors.c: Register vector constructors.
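A registry of constructors keyed by element type might be sketched as below. All names, the fixed-size table, and the string keys are assumptions for illustration; they are not the interface of scm_make_generalized_vector.

```c
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

typedef void *(*example_ctor) (size_t len);

struct example_entry
{
  const char *type;
  example_ctor ctor;
};

#define EXAMPLE_MAX_CTORS 16

/* Static storage is zero-initialized, so registration can happen
   before any explicit init function runs.  */
static struct example_entry example_registry[EXAMPLE_MAX_CTORS];
static size_t example_count;

/* Register CTOR for TYPE.  Return 1 on success, 0 if the table is full.  */
int
example_register_ctor (const char *type, example_ctor ctor)
{
  if (example_count >= EXAMPLE_MAX_CTORS)
    return 0;
  example_registry[example_count].type = type;
  example_registry[example_count].ctor = ctor;
  example_count++;
  return 1;
}

/* Construct a vector of TYPE with LEN elements, or return NULL if no
   constructor is registered for TYPE.  */
void *
example_make_vector (const char *type, size_t len)
{
  size_t i;

  for (i = 0; i < example_count; i++)
    if (strcmp (example_registry[i].type, type) == 0)
      return example_registry[i].ctor (len);
  return NULL;
}

/* Sample constructor: LEN zeroed bytes.  */
void *
example_make_bytes (size_t len)
{
  return calloc (len, 1);
}
```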
* libguile/extensions.c (scm_init_extensions): No need to NULL the list
of registered extensions here, the static init does it for us. Allows
scm_c_register_extension to be called before scm_init_extensions.
* libguile/init.c (scm_i_init_guile): Move array initialization earlier,
so e.g. scm_init_strings has access to a valid list of array element
types when registering its vector constructor.
* libguile/array-handle.c (scm_i_register_array_implementation):
(scm_i_array_implementation_for_obj): Add generic array facility,
which will (in a few commits) detangle the array code.
(scm_array_get_handle): Use the generic array facility. Note that
scm_t_array_handle no longer has ref and set function pointers;
instead it has a pointer to the array implementation. It is unlikely
that code out there used these functions, however, as the supported
way was through scm_array_handle_ref/set_x.
(scm_array_handle_pos): Move this function here from arrays.c.
(scm_array_handle_element_type): New function, returns a Scheme value
representing the type of element stored in this array.
* libguile/array-handle.h (scm_t_array_element_type): New enum, for
generically determining the type of an array.
(scm_array_handle_rank):
(scm_array_handle_dims): These are now just #defines.
* libguile/arrays.c:
* libguile/bitvectors.c:
* libguile/bytevectors.c:
* libguile/srfi-4.c:
* libguile/strings.c:
* libguile/vectors.c: Register array implementations for all of these.
* libguile/inline.h: Update for array_handle_ref/set change.
* libguile/deprecated.h: Need to include arrays.h now.
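The shape of a handle that points at an array implementation, rather than carrying its own ref and set pointers, could look roughly like this. The types and functions are hypothetical and use plain ints instead of Scheme values.

```c
#include <stddef.h>

/* One table of operations per kind of array.  */
struct example_array_impl
{
  int (*ref) (const void *base, size_t pos);
  void (*set) (void *base, size_t pos, int val);
};

/* The handle stores a pointer to the implementation only.  */
struct example_array_handle
{
  const struct example_array_impl *impl;
  void *base;
};

static int
example_byte_ref (const void *base, size_t pos)
{
  return ((const unsigned char *) base)[pos];
}

static void
example_byte_set (void *base, size_t pos, int val)
{
  ((unsigned char *) base)[pos] = (unsigned char) val;
}

/* Implementation for arrays of bytes.  */
const struct example_array_impl example_byte_impl =
  { example_byte_ref, example_byte_set };

/* Generic accessors dispatch through the implementation.  */
int
example_handle_ref (const struct example_array_handle *h, size_t pos)
{
  return h->impl->ref (h->base, pos);
}

void
example_handle_set (struct example_array_handle *h, size_t pos, int val)
{
  h->impl->set (h->base, pos, val);
}
```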
* libguile/strings.c (SET_STRINGBUF_SHARED): Don't modify BUF if it's
already marked as shared since it might be a read-only stringbuf.
This error can be caught when linking with GNU ld with "-z relro".
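A sketch of the guarded flag update, assuming a hypothetical flag bit and a bare flags word in place of a real stringbuf: the store is skipped when the bit is already set, so an object in read-only memory is never written.

```c
#include <stdint.h>

#define EXAMPLE_F_SHARED ((uintptr_t) 0x100)    /* hypothetical flag bit */

/* Mark the buffer as shared, writing only if the bit is not yet set.  */
void
example_set_shared (uintptr_t *flags)
{
  if ((*flags & EXAMPLE_F_SHARED) == 0)
    *flags |= EXAMPLE_F_SHARED;
}
```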
* libguile/strings.h (scm_tc7_ro_string, SCM_I_STRINGBUF_F_SHARED,
SCM_I_STRINGBUF_F_INLINE): New macros.
* libguile/strings.c (STRINGBUF_F_SHARED): Alias for
`SCM_I_STRINGBUF_F_SHARED'.
(STRINGBUF_F_INLINE): Alias for `SCM_I_STRINGBUF_F_INLINE'.
(RO_STRING_TAG): Alias for `scm_tc7_ro_string'.
* libguile/strings.c (scm_string_ref): Add proper range checking for the
empty string.
(scm_string_set_x): Likewise.
Reported by Bill Schottstaedt <bil@ccrma.Stanford.EDU>.
* test-suite/tests/strings.test ("string-ref"): New test prefix.
("string-set!")["empty string", "empty string and non-zero index",
"out of range", "negative index", "regular string"]: New tests.
* NEWS: Update.
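For illustration, an index check that stays correct for the empty string can be written without ever computing LEN - 1, which would wrap around for an unsigned length of zero. The function is hypothetical and is not the check used in scm_string_ref.

```c
#include <stddef.h>

/* Return 1 if IDX is a valid index into a string of LEN characters.
   No index is valid when LEN is 0.  */
int
example_index_in_range (size_t len, long idx)
{
  return idx >= 0 && (size_t) idx < len;
}
```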
* libguile/read.c (scm_read_string): Use `scm_i_make_read_only_string ()' to
return a read-only string, as mandated by R5RS. Reported by Bill
Schottstaedt <bil@ccrma.Stanford.EDU>.
* libguile/strings.c (scm_i_make_read_only_string): New function.
(scm_i_shared_substring_read_only): Special-case the empty string
so that the read-only and read-write empty strings are `eq?'. This
optimization is relied on by the `substring/shared' `empty string'
test case in `srfi-13.test'.
* libguile/strings.h (scm_i_make_read_only_string): New declaration.
* test-suite/tests/strings.test ("string-set!")["literal string"]: New test.
* NEWS: Update.
* libguile/strings.c (scm_i_symbol_substring): Return a read-only string
since R5RS requires `symbol->string' to return a read-only string.
Reported by Bill Schottstaedt <bil@ccrma.Stanford.EDU>.
* test-suite/tests/symbols.test: Add `define-module' clause.
(exception:immutable-string): Adjust to current exception.
("symbol->string")["result is an immutable string"]: Use
`pass-if-exception' instead of `expect-fail-exception'.
* NEWS: Update.
* libguile/strings.c (scm_i_string_writable_chars): Remove use of
`scm_i_thread_put_to_sleep ()'. This leaves a race condition,
which is hopefully not harmful.