This requres the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_hander constants as #define's
Also, scm_charprint is renamed to scm_i_charprint.
* libguile/strings.h: make scm_i_string_wide_chars internal.
* libguile/print.h: rename scm_charprint to scm_i_charprint. Make
internal.
* libguile/print.c (scm_i_charprint): renamed from scm_charprint
(scm_charprint): renamed to scm_i_charprint. All callers changed.
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to sting-like asm tests
This adds the 32-bit standalone characters. Strings are still
8-bit. Characters larger than 8-bit can only be entered or
displayed in octal format at this point. At this point, the
terminal's display encoding is expected to be Latin-1.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
add 32-bit char
* module/language/assembly.scm (object->assembly): add 32-bit char
(assembly->object): add 32-bit char
* libguile/vm-i-system.c (make-char32): new op
* libguile/print.c (iprin1): print 32-bit char
* libguile/numbers.h: add type scm_t_wchar
* libguile/numbers.c: add type scm_t_wchar
* libguile/chars.h: new type scm_t_wchar
(SCM_CODEPOINT_MAX): new
(SCM_IS_UNICODE_CHAR): new
(SCM_MAKE_CHAR): operate on 32-bit char
* libguile/chars.c: comparison operators now use Unicode
codepoints
(scm_c_upcase): now receives and returns scm_t_wchar
(scm_c_downcase): now receives and returns scm_t_wchar
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_num.
Also, the incomplete and broken EBCDIC support is removed.
* libguile/print.c (iprin1): use new func scm_i_charname
* libguile/read.c (scm_read_character): use new func
scm_i_charname_to_num
* libguile/chars.c (scm_i_charname): new function
(scm_i_charname_to_char): new function
(scm_charnames, scm_charnums): removed
* libguile/chars.h: new declarations
The idea is to introduce `gsubrs' whose arity is encoded in their type
(more precisely in the sizeof (void *) - 8 MSBs). This removes the
indirection introduced by cclos and simplifies the code.
* libguile/__scm.h (CCLO): Remove.
* libguile/debug.c (scm_procedure_source, scm_procedure_environment):
Remove references to `scm_tc7_cclo'.
* libguile/eval.c (scm_trampoline_0, scm_trampoline_1,
scm_trampoline_2): Replace `scm_tc7_cclo' with `scm_tc7_gsubr'.
* libguile/eval.i.c (CEVAL): Likewise. No longer make PROC the first
argument. Directly invoke `scm_gsubr_apply ()' instead of jump to the
`evap(N+1)' label or call to `SCM_APPLY ()'.
* libguile/evalext.c (scm_self_evaluating_p): Remove reference to
`scm_tc7_cclo'.
* libguile/gc-card.c (scm_i_sweep_card, scm_i_tag_name): Likewise.
* libguile/gc-mark.c (scm_gc_mark_dependencies): Likewise.
* libguile/goops.c (scm_class_of): Likewise.
* libguile/print.c (iprin1): Likewise.
* libguile/gsubr.c (create_gsubr): Use `unsigned int's for REQ, OPT and
RST. Use `scm_tc7_gsubr' instead of `scm_makcclo ()' in the default
case.
(scm_gsubr_apply): Remove calls to `SCM_GSUBR_PROC ()'.
(scm_f_gsubr_apply): Remove.
* libguile/gsubr.h (SCM_GSUBR_TYPE): New definition.
(SCM_GSUBR_MAX): Changed to 33.
(SCM_SET_GSUBR_TYPE, SCM_GSUBR_PROC, SCM_SET_GSUBR_PROC,
scm_f_gsubr_apply): Remove.
* libguile/procprop.c (scm_i_procedure_arity): Remove reference to
`scm_tc7_cclo'; add proper handling of `scm_tc7_gsubr'.
* libguile/procs.c (scm_makcclo, scm_make_cclo): Remove.
(scm_procedure_p): Remove reference to `scm_tc7_cclo'.
(scm_thunk_p): Likewise, plus add proper `scm_tc7_gsubr' handling.
* libguile/procs.h (SCM_CCLO_LENGTH, SCM_MAKE_CCLO_TAG,
SCM_SET_CCLO_LENGTH, SCM_CCLO_BASE, SCM_SET_CCLO_BASE, SCM_CCLO_REF,
SCM_CCLO_SET, SCM_CCLO_SUBR, SCM_SET_CCLO_SUBR, scm_makcclo,
scm_make_cclo): Remove.
* libguile/stacks.c (read_frames): Remove reference to `scm_f_gsubr_apply'.
* libguile/tags.h (scm_tc7_cclo): Remove.
(scm_tc7_gsubr): New.
(scm_tcs_subrs): Add `scm_tc7_gsubr'.
the value at its top. This fixes a reference leak.
(PUSH_REF): Perform `pstate->top++' after calling
`PSTATE_STACK_SET ()' in order to avoid undesired potential side
effects.
New.
(sym_reader): New.
(scm_print_opts): Added "quote-keywordish-symbols" option.
(quote_keywordish_symbol): New, for evaluating the option.
(scm_print_symbol_name): Use it.
(scm_init_print): Initialize new option to sym_reader.
Removed ref_stack field.
(PSTATE_STACK_REF, PSTATE_STACK_SET): New, for accessing the stack
of a print state. Use them everywhere instead of ref_stack.
print.c, ports.c, mallocs.c, hooks.c, hashtab.c, fports.c,
guardians.c, filesys.c, coop-pthreads.c, continuations.c: Use
scm_uintprint to print unsigned integers, raw heap words, and
adresses, using a cast to scm_t_bits to turn pointers into
integers.
SCM_PRINT_HIGHLIGHT_SUFFIX): New printer options.
(scm_iprin1): Use them instead of the previoulsy hardcoded
strings.
(scm_init_print): Initialize them.
* print.c (make_print_state, scm_free_print_state): Initialize it
to SCM_EOL.
(scm_iprin1): Wrap output in '{...}' when object is contained in
highlight_objects.
(SCM_VALIDATE_STRING_COPY): Deprecated. Replaced all uses with
SCM_VALIDATE_STRING plus SCM_I_STRING_CHARS or
scm_to_locale_string, etc.
(SCM_VALIDATE_SUBSTRING_SPEC_COPY): Deprecated. Replaced as
above, plus scm_i_get_substring_spec.
* regex-posix.c, read.c, random.c, ramap.c, print.c, numbers.c,
hash.c, gc.c, gc-card.c, convert.i.c, backtrace.c, strop.c,
strorder.c, strports.c, struct.c, symbols.c, unif.c, ports.c: Use
SCM_I_STRING_CHARS, SCM_I_STRING_UCHARS, and SCM_I_STRING_LENGTH
instead of SCM_STRING_CHARS, SCM_STRING_UCHARS, and
SCM_STRING_LENGTH, respectively. Also, replaced scm_return_first
with more explicit scm_remember_upto_here_1, etc, or introduced
them in the first place.
SCM_INUM): Deprecated by reenaming them to SCM_I_INUMP, SCM_I_NINUMP
and SCM_I_INUM, respectively and adding deprecated versions to
deprecated.h and deprecated.c. Changed all uses to either use the
SCM_I_ variants or scm_is_*, scm_to_*, or scm_from_*, as appropriate.
SCM_NEGATE_BOOL, SCM_BOOLP): Deprecated by moving into "deprecated.h".
Replaced all uses with scm_is_false, scm_is_true, scm_from_bool, and
scm_is_bool, respectively.
scm_i_unmemoize_expr for unmemoizing a memoized object holding a
single memoized expression.
* debug.c (memoized_print): Don't try to unmemoize the memoized
object, since we can't know whether it holds a single expression
or a body.
(scm_mem_to_proc): Removed check for lambda expression, since it
was moot anyway. Whoever uses these functions for debugging
purposes should know what they do: Creating invalid memoized code
will cause crashes, independent of whether this check is present
or not.
(scm_proc_to_mem): Take the closure's code as it is and don't
append a SCM_IM_LAMBDA isym. To allow easier debugging, the
memoized code should not be modified.
* debug.[ch] (scm_unmemoize, scm_i_unmemoize_expr): Removed
scm_unmemoize from public use, but made scm_i_unmemoize_expr
available as a guile internal function instead. However,
scm_i_unmemoize_expr will only work on memoized objects that hold
a single memoized expression. It won't work with bodies.
* debug.c (scm_procedure_source), macros.c (macro_print), print.c
(scm_iprin1): Call scm_i_unmemocopy_body for unmemoizing a body,
i. e. a list of expressions.
* eval.c (unmemoize_exprs): Drop internal body markers from the
output during unmemoization.
* eval.[ch] (scm_unmemocopy, scm_i_unmemocopy_expr,
scm_i_unmemocopy_body): Removed scm_unmemocopy from public use,
but made scm_i_unmemocopy_expr and scm_i_unmemocopy_body available
as guile internal functions instead. scm_i_unmemoize_expr will
only work on a single memoized expression, while
scm_i_unmemocopy_body will only work on bodies.
* deprecated.h (SCM_IFRINC, SCM_ICDR, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP), eval.c (SCM_IFRINC, SCM_ICDR, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP), eval.h (SCM_ICDR, SCM_IFRINC, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP): Deprecated and added to deprecated.h. Moved from
eval.h to eval.c.
* deprecated.c (scm_isymnames), deprecated.h (scm_isymnames,
SCM_ISYMNUM, SCM_ISYMCHARS), eval.c (SCM_ISYMNUM, isymnames,
scm_unmemocopy, CEVAL), print.c (scm_isymnames), tags.h
(SCM_ISYMNUM, scm_isymnames, SCM_ISYMCHARS): Deprecated
scm_isymnames, SCM_ISYMNUM and SCM_ISYMCHARS and added to
deprecated.[hc]. Moved scm_isymnames from print.c to eval.c and
renamed to isymnames. Moved SCM_ISYMNUM from tags.h to eval.c and
renamed to ISYMNUM.
* eval.c (scm_i_print_iloc, scm_i_print_isym), eval.h
(scm_i_print_iloc, scm_i_print_isym), print.c (scm_iprin1):
Extracted printing of ilocs and isyms to guile internal functions
scm_i_print_iloc, scm_i_print_isym of eval.c.
* libguile/print.c (scm_isymnames): Add names for the new memoizer
codes.
* libguile/eval.c (s_missing_clauses, s_bad_case_clause,
s_extra_case_clause, s_bad_case_labels, s_duplicate_case_label,
literal_p): New static identifiers.
(scm_m_case): Use ASSERT_SYNTAX to signal syntax errors. Be more
specific about the kind of error that was detected. Check for
duplicate case labels. Handle bound 'else. Avoid unnecessary
consing when creating the memoized code.
(scm_m_case, unmemocopy, SCM_CEVAL): Use SCM_IM_ELSE to memoize
the syntactic keyword 'else.
* test-suite/tests/syntax.test (exception:bad-expression,
exception:missing-clauses, exception:bad-case-clause,
exception:extra-case-clause, exception:bad-case-labels): New.
Added some tests and adapted tests for 'case' to the new way of
error reporting.
implementation of evaluator specific memoization codes and special
constants like #f, '() etc. ('flags'), which are not evaluator
specific. The goal is to remove definitions of evaluator
memoization codes completely from the public interface. This will
make it possible to experiment more freely with optimizations of
guile's internal representation of memoized code.
* objects.c (scm_class_of): Eliminate dependency on SCM_ISYMNUM.
* print.c (iflagnames): New array, holding the printed names of
guile's special constants ('flags').
(scm_isymnames): Now holds only the printed names of the
memoization codes.
(scm_iprin1): Separate the handling of memoization codes and
guile's special constants.
* tags.h (scm_tc9_flag, SCM_ITAG9, SCM_MAKE_ITAG9, SCM_ITAG9_DATA,
SCM_IFLAGNUM): new
(scm_tc8_char, scm_tc8_iloc, SCM_BOOL_F, SCM_BOOL_T,
SCM_UNDEFINED, SCM_EOF_VAL, SCM_EOL, SCM_UNSPECIFIED, SCM_UNBOUND,
SCM_ELISP_NIL, SCM_IM_DISPATCH, SCM_IM_SLOT_REF,
SCM_IM_SLOT_SET_X, SCM_IM_DELAY, SCM_IM_FUTURE,
SCM_IM_CALL_WITH_VALUES, SCM_IM_NIL_COND, SCM_IM_BIND): Changed
values.
(SCM_IFLAGP): SCM_IFLAGP now only tests for flags.
(SCM_IFLAGP, SCM_MAKIFLAG, SCM_IFLAGNUM): Generalized to use the
tc9 macros and scm_tc9_flag.
The typecode of the former 14th short instruction is now used to
represent long instructions. Changed some comments to reflect
this fact.
(SCM_MAKISYM): ISYMs get a new tc7 code, namely the one that was
previously used by SCM_IM_DEFINE.
(SCM_IM_DEFINE): Turned into a long instruction.
* eval.c (unmemocopy, SCM_CEVAL): Treat SCM_IM_DEFINE as a long
instruction.
* eval.c (SCM_CEVAL): Since characters and iflags have now a tc7
code that is separate from all instructions, one level of dispatch
for long instructions can be eliminated.
* print.c (scm_isymnames): Removed some commented code.
printing with max level 7 and length 10. (Purpose: avoid printing
gigantic objects in error messages.)
* print.c, print.h (scm_i_port_with_print_state): New function.
* print.c (scm_iprin1, scm_printer_apply,
scm_port_with_print_state): Use scm_i_port_with_print_state.
(scm_simple_format): Modified not to destroy print states.
(print_state_mutex): New mutex.
(scm_make_print_state, scm_free_print_state, scm_prin1):
Lock/unlock print_state_mutex.