Thanks to Bruno Haible for his suggestions. See
<http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>,
for details.
* libguile/ports.c (register_finalizer_for_port): Always register a
finalizer for PORT.
(finalize_port): Close ENTRY->input_cd and ENTRY->output_cd.
(scm_new_port_table_entry): Initialize the `input_cd' and `output_cd'
fields.
(utf8_to_codepoint): New function.
(get_codepoint): Rewrite to use `iconv' instead of libunistring.
(scm_i_set_port_encoding_x): Initialize the `input_cd' and `output_cd'
fields.
(update_port_lf): Move upward. Use `switch' instead of `if's.
* libguile/ports.h (scm_t_port)[input_cd, output_cd]: New fields.
* libguile/print.c (codepoint_to_utf8, display_string): New functions.
(display_character): Use `display_string'.
(write_combining_character): Likewise.
(iprin1): Use `display_string' instead of `scm_lfwrite_str', and
`display_character' instead of `scm_putc'.
(write_character): Likewise.
(write_character_escaped): New function.
* test-suite/tests/encoding-escapes.test ("display output
escapes")["Rashomon"]: Use lower-case escapes.
["fake escape"]: New test.
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/debug.c:
* libguile/eval.c:
* libguile/frames.c:
* libguile/objcodes.c:
* libguile/print.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/struct.c:
* libguile/vm.c: Fix a number of instances in which we assumed we could
fit a pointer into a long.
The characters U+0007 to U+000D have non-hex forms for their
escapes when in written strings.
* libguile/print.c (write_character): use non-hex escapes
* test-suite/tests/reader.test (write R6RS string escapes): adjust test
Reported by Mike Gran <spk121@yahoo.com>.
* libguile/strings.c (scm_i_unistring_escapes_to_guile_escapes,
scm_i_unistring_escapes_to_r6rs_escapes): Augment comments.
(scm_to_stringn): When `handler ==
SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', realloc
BUF so that it's large enough for the worst case.
* libguile/print.c (display_character): When `result != NULL && strategy
== SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', make
LOCALE_ENCODED large enough to hold an R6RS escape.
This had been removed by commit 07f49ac786
("Factorize and optimize `write' for strings and characters.").
Thanks Mike!
* libguile/print.c (write_combining_character): New procedure.
(write_character): Use it.
* test-suite/tests/chars.test ("basic char handling")["combining accent
is pretty-printed", "combining X is pretty-printed"]: New tests.
* test-suite/tests/encoding-iso88591.test ("characters")["write A
followed by combining accent"]: New test.
* test-suite/tests/encoding-utf8.test ("characters")["write A followed
by combining accent"]: New test.
According to `write.bm', this makes `write' 2.6 times faster for strings.
* libguile/print.c (iprin1): Use `write_character' when
`SCM_WRITINGP (pstate)' and `SCM_CHARP (exp)' or `scm_is_string (exp)'.
(scm_i_charprint): Remove.
(display_character, write_character): New functions.
(scm_write_char): Use `display_character' instead of
`scm_i_charprint'.
* libguile/print.h (scm_i_charprint): Remove declaration.
* benchmark-suite/benchmarks/write.bm: New file.
* benchmark-suite/Makefile.am (SCM_BENCHMARKS): Add
`benchmarks/write.bm'.
This enables more efficient implementations of several operations,
e.g. scm_is_lisp_bool, canonicalize_boolean, fast_boolean_not,
converting SCM booleans to C booleans, etc.
* libguile/tags.h: Renumber IFLAGs.
* libguile/print.c: Renumber iflagnames to match.
* libguile/boolean.c:
* libguile/boolean.h:
SCM_XXX_ANOTHER_BOOLEAN_DONT_USE --> SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_0
This adds a reader option 'r6rs-hex-escapes that modifies the
behavior of numeric escapes in characters and strings. When enabled,
variable-length character hex escapes (#\xNNN) are allowed and become
the default output format for numerically-escaped characters. Also,
string hex escapes switch to a semicolon terminated hex escape (\xNNNN;).
* libguile/print.c (PRINT_CHAR_ESCAPE): new macro
(iprin1): use new macro PRINT_CHAR_ESCAPE
* libguile/private-options.h (SCM_R6RS_ESCAPES_P): new #define
* libguile/read.c (scm_read_opts): add new option r6rs-hex-escapes
(SCM_READ_HEX_ESCAPE): modify to take a terminator parameter
(scm_read_string): parse R6RS hex string escapes
(scm_read_character): parse R6RS hex character escapes
* test-suite/tests/chars.test (with-read-options): new procedure
(R6RS hex escapes): new tests
* test-suite/tests/strings.test (with-read-options): new procedure
(R6RS hex escapes): new tests
* libguile/tags.h (scm_tc7_gsubr): Return to the pool of unused tc7s, as
there are no more gsubrs. Yay :)
* libguile/programs.h (SCM_F_PROGRAM_IS_PRIMITIVE):
(SCM_PROGRAM_IS_PRIMITIVE): New flag and accessor.
* libguile/gsubr.c (create_gsubr):
* libguile/snarf.h (SCM_STATIC_PROGRAM): Give subrs a PRIMITIVE flag.
* libguile/smob.h:
* libguile/smob.c (scm_i_smob_arity): New internal procedure. Uses the
old GSUBR type macros, local to the file.
* libguile/procprop.c (scm_i_procedure_arity): Call out to
scm_i_smob_arity, and remove a gsubr case.
* libguile/gc.c (scm_i_tag_name):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/goops.c (scm_class_of):
* libguile/vm.c (apply_foreign):
* libguile/hash.c (scm_hasher):
* libguile/debug.c (scm_procedure_name):
* libguile/print.c (iprin1): Remove gsubr cases.
* libguile/gsubr.h (SCM_PRIMITIVE_P): Fix to work with the new VM
program regimen.
(SCM_GSUBR_TYPE, SCM_GSUBR_MAKTYPE, SCM_GSUBR_MAX, SCM_GSUBR_REQ)
(SCM_GSUBR_OPT, SCM_GSUBR_REST): Remove these macros, that are no
longer useful.
* libguile/gsubr.c (scm_i_gsubr_apply, scm_i_gsubr_apply_list)
(scm_i_gsubr_apply_array): Remove internal gsubr application
functions.
* libguile/tags.h (scm_tc7_frame, scm_tc7_objcode, scm_tc7_vm)
(scm_tc7_vm_cont): Take more tc7s for VM-related data structures.
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/gc.c (scm_i_tag_name):
* libguile/goops.c (scm_class_of, create_standard_classes):
* libguile/print.c (iprin1): Add cases for the new tc7s.
* libguile/frames.c:
* libguile/frames.h:
* libguile/objcodes.c:
* libguile/objcodes.h:
* libguile/vm.c:
* libguile/vm.h: Desmobify.
* libguile/vm.c (scm_vm_apply): Export to Scheme, because VM objects are
no longer applicable.
* module/system/repl/command.scm (profile):
* module/system/vm/trace.scm (vm-trace):
* module/system/vm/vm.scm (vm-load): Call vm-apply to run a program in a
VM instead of treating the VM as applicable.
* libguile/foreign.h:
* libguile/foreign.c: New files, implementing simple wrappers around
foreign values, such as those that one might link in dynamically from
a library.
* libguile/tags.h (scm_tc7_foreign): Take a tc7 for foreign values.
* libguile.h:
* libguile/init.c: Add foreign.h to headers and init.
* libguile/print.c (iprin1): Add printer for foreign values.
* libguile/gc.c (scm_i_tag_name): Case for foreign values.
* libguile/goops.c (scm_class_of, create_standard_classes): Add a class
for foreign values.
* libguile/evalext.c (scm_self_evaluating_p): Add case for foreign
values.
* libguile/Makefile.am: Add foreign.[ch] to the build.
They made Sun C 5.8 emit a warning such as:
line 71: warning: dead part of constant expression is nonconstant
* libguile/print.c (scm_print_opts): Don't use `SCM_UNPACK ()' here.
* libguile/read.c (scm_read_opts): Likewise.
If you're wondering what I'm doing, I'm trying to eventually reimplement
smobs in terms of structs, so that applicable smobs can just follow the
applicable struct dispatch path. But to do that I have to get structs
initialized before things that use smobs, which means transforming a
bunch of smobby things to tc7 things. But this transformation is good
for performance anyway, and we currently have a glut of unused tc7s,
so here we go...
* libguile/tags.h (scm_tc7_fluid, scm_tc7_dynamic_state): Fluids (and
dynamic states) now have tc7s.
* libguile/fluids.h: Remove scm_fluids_prehistory, and add internal
scm_i_fluid_print. Update a comment.
* libguile/fluids.c: Update for tc7 representation. Also remove the next
pointers while we're at it, as they aren't used in the new BDW GC.
* libguile/eq.c (scm_equal_p): Remove the hashtable case. Hashtables
could never be equal? before, I don't see why to add stubs doing the
same thing now.
* libguile/print.c (iprin1):
* libguile/gc.c (scm_i_tag_name):
* libguile/evalext.c (scm_self_evaluating_p): Add fluid and
dynamic_state cases.
* libguile/goops.h: Remove scm_class_hashtable; it will be static.
* libguile/goops.c: Make <hashtable> static, and add <fluid> and
<dynamic-state> classes.
* libguile/hashtab.h:
* libguile/hashtab.c: Remove scm_i_hashtable_equal_p.
* libguile/init.c (scm_i_init_guile): Remove call to fluids_prehistory.
* libguile/tags.h (scm_tc7_hashtable): Allocate a tc7 for hashtables.
* libguile/hashtab.h: Adjust macros accordingly.
(scm_i_hashtable_print, scm_i_hashtable_equal_p): New internal
functions.
(scm_hashtab_prehistory): Remove, no more need for this.
* libguile/hashtab.c (scm_hash_fn_remove_x): Fix a longstanding bug.
(make_hash_table): Adapt to the new hash table representation.
* libguile/eq.c (scm_equal_p)
* libguile/evalext.c (scm_self_evaluating_p)
* libguile/print.c (iprin1)
* libguile/gc.c (scm_i_tag_name): Add some tc7_hashtab cases.
* libguile/init.c: Remove unused environments init functions. Remove
call to hashtab_prehistory.
* libguile/goops.h (scm_class_hashtable)
* libguile/goops.c (scm_class_of, create_standard_classes): Have to
make a class for hash tables manually, because they aren't smobs any
more.
* libguile/debug.c (scm_procedure_name): Remove a SCM_CLOSUREP case and
some dead code.
(scm_procedure_module): Remove. This was introduced a few months ago
for the hygienic expander, but now it is no longer needed, as the
expander keeps track of this information itself.
* libguile/debug.h: Remove scm_procedure_module.
* libguile/eval.c: Instead of using tc3 closures, define a "boot
closure" applicable smob type, and represent closures with that. The
advantage is that after eval.scm is compiled, boot closures take up no
address space (besides a smob number) in the runtime, and require no
special cases in procedure dispatch.
* libguile/eval.h: Remove the internal functions scm_i_call_closure_0
and scm_closure_apply, and the public function scm_closure.
* libguile/gc.c (scm_storage_prehistory): No tc3_closure displacement
registration.
(scm_i_tag_name): Remove closure case, and a dead cclo case.
* libguile/vm.c (apply_foreign):
* libguile/print.c (iprin1):
* libguile/procs.c (scm_procedure_p, scm_procedure_documentation);
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/goops.c (scm_class_of): Remove tc3_closure/tcs_closure cases.
* libguile/hash.c (scm_hasher):
* libguile/hooks.c (scm_add_hook_x): Use new scm_i_procedure_arity.
* libguile/macros.c (macro_print): Print all macros using the same code.
(scm_macro_transformer): Return any procedure, not just programs.
* libguile/procprop.h:
* libguile/procprop.c (scm_i_procedure_arity): Instead of returning a
list that the caller has to parse, have the same prototype as
scm_i_program_arity. An incompatible change, but it's an internal
function anyway.
(scm_procedure_properties, scm_set_procedure_properties)
(scm_procedure_property, scm_set_procedure_property): Remove closure
cases, and use scm_i_program_arity for arity.
* libguile/procs.h (SCM_CLOSUREP, SCM_CLOSCAR, SCM_CODE)
(SCM_CLOSURE_NUM_REQUIRED_ARGS, SCM_CLOSURE_HAS_REST_ARGS)
(SCM_CLOSURE_BODY, SCM_PROCPROPS, SCM_SETPROCPROPS, SCM_ENV)
(SCM_TOP_LEVEL): Remove these macros that pertain to boot closures
only. Only eval.c should know abut boot closures.
* libguile/procs.c (scm_closure_p): Remove this function. There is a
simple stub in deprecated.scm now.
(scm_thunk_p): Use scm_i_program_arity.
* libguile/tags.h (scm_tc3_closure): Remove. Yay, another tc3 to play
with!
(scm_tcs_closures): Remove.
* libguile/validate.h (SCM_VALIDATE_CLOSURE): Remove.
* module/ice-9/deprecated.scm (closure?): Add stub.
* module/ice-9/documentation.scm (object-documentation)
* module/ice-9/session.scm (help-doc, arity)
* module/oop/goops.scm (compute-getters-n-setters)
* module/oop/goops/describe.scm (describe)
* module/system/repl/describe.scm (display-object, display-type):
Remove calls to closure?.
* libguile/pairs.h:
* libguile/pairs.c: Previously scm_cdadr et al were implemented as
#defines that called scm_i_chase_pairs, and the Scheme-exposed
functions themselves were cxr subrs, which got special help in the
interpreter. Since now the special help is unnecessary (because the
compiler inlines and expands calls to car, cdadr, etc), the complexity
is a loss. So just implement cdadr etc using normal functions. There's
an advantage too, in that the compiler can unroll the cxring, reducing
branches.
* libguile/tags.h (scm_tc7_cxr): Remove this tag.
(scm_tcs_subrs): Now there's only one kind of subr, yay!
* libguile/debug.c (scm_procedure_name)
* libguile/evalext.c (scm_self_evaluating_p)
* libguile/gc.c (scm_i_tag_name)
* libguile/goops.c (scm_class_of)
* libguile/hash.c (scm_hasher)
* libguile/print.c (iprin1)
* libguile/procprop.c (scm_i_procedure_arity)
* libguile/procs.c (scm_procedure_p, scm_subr_p)
(scm_make_procedure_with_setter)
* libguile/vm.c (apply_foreign): Remove cxr cases. Replace uses of
scm_tcs_subrs with scm_tc7_gsubr.
* libguile/eval.c: So, ladies & gents, a new evaluator. It's similar to
the old one, in that we memoize and then evaluate, but in this
incarnation, memoization of an expression happens before evaluation,
not lazily as the expression is evaluated. This makes the evaluation
itself much cleaner, in addition to being threadsafe. In addition,
since this C evaluator will in the future just serve to bootstrap the
Scheme evaluator, we don't have to pay much concern for debugging
conveniences. So the environment is just a list of values, and the
memoizer pre-computes where it's going to find each individual value
in the environment.
Interface changes are commented below, with eval.h.
(scm_evaluator_traps): No need to reset the debug mode after rnning te
traps thing. But really, the whole traps system needs some love.
* libguile/memoize.h:
* libguile/memoize.c: New memoizer, which runs before evaluation,
checking all syntax before evaluation begins. Significantly, no
debugging information is left for lexical variables, which is not so
great for interactive debugging; perhaps we should change this to have
a var list in the future as per the classic interpreters. But it's
quite fast, and the resulting code is quite good. Also note that it
doesn't produce ilocs, memoized code is a smob whose type is in the
first word of the smob itself.
* libguile/eval.h (scm_sym_and, scm_sym_begin, scm_sym_case)
(scm_sym_cond, scm_sym_define, scm_sym_do, scm_sym_if, scm_sym_lambda)
(scm_sym_let, scm_sym_letstar, scm_sym_letrec, scm_sym_quote)
(scm_sym_quasiquote, scm_sym_unquote, scm_sym_uq_splicing, scm_sym_at)
(scm_sym_atat, scm_sym_atapply, scm_sym_atcall_cc)
(scm_sym_at_call_with_values, scm_sym_delay, scm_sym_eval_when)
(scm_sym_arrow, scm_sym_else, scm_sym_apply, scm_sym_set_x)
(scm_sym_args): Remove public declaration of these symbols.
(scm_ilookup, scm_lookupcar, scm_eval_car, scm_eval_body)
(scm_eval_args, scm_i_eval_x, scm_i_eval): Remove public declaration
of these functions.
(scm_ceval, scm_deval, scm_ceval_ptr): Remove declarations of these
deprecated functions.
(scm_i_print_iloc, scm_i_print_isym, scm_i_unmemocopy_expr)
(scm_i_unmemocopy_body): Remove declarations of these internal
functions.
(scm_primitive_eval_x, scm_eval_x): Redefine as macros for their less
destructive siblings.
* libguile/Makefile.am: Add memoize.[ch] to the build.
* libguile/debug.h (scm_debug_mode_p, scm_check_entry_p)
(scm_check_apply_p, scm_check_exit_p, scm_check_memoize_p)
(scm_debug_eframe_size): Remove these vars that were tied to the old
evaluator's execution model.
(SCM_RESET_DEBUG_MODE): Remove, no more need for this.
(SCM_MEMOIZEDP, SCM_MEMOIZED_EXP, SCM_MEMOIZED_ENV): Remove macros
referring to old memoized code representation.
(scm_local_eval, scm_procedure_environment, scm_memoized_environment)
(scm_make_memoized, scm_memoized_p): Remove functions operating on old
memoized code representation.
(scm_memcons, scm_mem_to_proc, scm_proc_to_mem): Remove debug-only
code for old evaluator.
* libguile/debug.c: Remove code to correspond with debug.h removals.
(scm_debug_options): No need to set the debug mode or frame limit
here, as we don't have C stack limits any more. Perhaps this is a bug,
but as long as we can compile eval.scm, we should be fine.
* libguile/init.c (scm_i_init_guile): Init memoize.c.
* libguile/modules.c (scm_top_level_env, scm_env_top_level)
(scm_env_module, scm_system_module_env_p): Remove these functions.
* libguile/print.c (iprin1): No more need to handle isyms. Adapt to new
form of interpreted procedures.
* libguile/procprop.c (scm_i_procedure_arity): Adapt to new form of
interpreted procedures.
* libguile/procs.c (scm_thunk_p): Adapt to new form of interpreted
procedures.
* libguile/procs.h (SCM_CLOSURE_FORMALS): Removed, this exists no more.
(SCM_CLOSURE_NUM_REQUIRED_ARGS, SCM_CLOSURE_HAS_REST_ARGS): New
accessors.
* libguile/srcprop.c (scm_source_properties, scm_source_property)
(scm_set_source_property_x): Remove special cases for memoized code.
* libguile/stacks.c (read_frame): Remove a source-property case for
interpreted code.
(NEXT_FRAME): Remove a case that I don't fully understand, that seems
to be designed to skip over apply frames. Will be obsolete in the
futures.
(read_frames): Default source value for interpreted frames to #f.
(narrow_stack): Don't pay attention to the system_module thing.
* libguile/tags.h: Remove isyms and ilocs. Whee!
* libguile/validate.h (SCM_VALIDATE_MEMOIZED): Fix to use the new
MEMOIZED_P formulation.
* module/ice-9/psyntax-pp.scm (do, quasiquote, case): Adapt for these no
longer being primitive macros.
* module/ice-9/boot-9.scm: Whitespace change, but just a poke to force a
rebuild due to and/or/cond/... not being primitives any more.
* module/ice-9/deprecated.scm (unmemoize-expr): Deprecate, it's
unmemoize-expression now.
* test-suite/tests/eval.test ("define set procedure-name"): XFAIL a
couple of tests here; I don't know what to do about them. I reckon the
expander should ensure that defined values are named.
* test-suite/tests/chars.test ("basic char handling"): Fix expected
exception when trying to apply a char.
* Renumbers the IFLAG constants.
* Adds several macros related to boolean type tests, null tests, and
boolean-truth testing (including lisp-style boolean-truth tests).
* Adds compile-time checks to verify the necessary IFLAG numbering
properties needed for the checks to work properly.
* Changes some existing code to use the new optimized macros, without
changing the semantics of the code at all (except that scm_is_bool
is changed from a function to a macro).
I added the following macros, whose names explicitly state how %nil
should be handled. See the comments in the patch for more information
about these.
scm_is_false_assume_not_lisp_nil scm_is_true_assume_not_lisp_nil
scm_is_false_and_not_lisp_nil scm_is_true_or_lisp_nil
scm_is_false_or_lisp_nil scm_is_true_and_not_lisp_nil
scm_is_lisp_false scm_is_lisp_true
scm_is_null_assume_not_lisp_nil
scm_is_null_and_not_lisp_nil
scm_is_null_or_lisp_nil
scm_is_bool_and_not_lisp_nil
scm_is_bool_or_lisp_nil
The following already-existing macros are defined as aliases, such
that their semantics is unchanged (although scm_is_bool used to be a
function and is now a macro).
scm_is_null --> scm_is_null_and_not_lisp_nil
scm_is_false --> scm_is_false_and_not_lisp_nil
scm_is_true --> scm_is_true_or_lisp_nil
scm_is_bool --> scm_is_bool_and_not_lisp_nil
(I still believe that these should be changed to versions that handle
%nil properly, but await approval on that point, so these patches do
not make those changes)
Also, if the preprocessor macro SCM_ENABLE_ELISP is not true (this
macro already existed and was used in lang.h), all overheads
associated with %nil handling are eliminated from the above macros.
* libguile/tags.h (SCM_BOOL_F, SCM_BOOL_T, SCM_UNSPECIFIED)
(SCM_UNDEFINED, SCM_UNBOUND, SCM_ELISP_NIL): Renumber, so that a
number of important distinctions (false versus true, end-of-list, etc)
can be made by masking a single bit. Also define a number of
build-time tests to assert that this condition holds.
* libguile/boolean.h (scm_is_false_and_not_nil, scm_is_true_or_nil)
(scm_is_false_assume_not_nil, scm_is_true_assume_not_nil):
(scm_is_false_or_nil, scm_is_true_and_not_nil)
(scm_is_bool_or_nil, scm_is_bool_and_not_nil): New exciting macros to
test certain boolean/end-of-list properties.
(scm_is_false, scm_is_true): Use a restrictive definition, where only
SCM_BOOL_F is false. Should probably change in the future.
(scm_is_bool): Incompatible change: changed to be a macro. Was a
function before. Probably should allow nil as a boolean, but that will
be for a later patch.
(scm_is_lisp_false, scm_is_lisp_true): New macros, implementing the
standard Lisp boolean predicates, where '() is actually false.
* libguile/eval.i.c (CEVAL): Fix a number of false-or-nil and similar
tests to use the new macros.
* libguile/lang.h (SCM_NULL_OR_NIL_P): Use scm_is_null_or_nil.
* libguile/pairs.c: Add a compile-time check that null and nil differ by
only one bit.
* libguile/pairs.h (scm_is_null_and_not_nil, scm_is_null_assume_not_nil)
(scm_is_null_or_nil): New exciting macros!
(scm_is_null): Just be scm_is_null_and_not_nil, for now.
* libguile/print.c: Adapt to the reordering, and print suitably nasty
things for the not-to-be-used values.
Since combining characters, such as accents, modify the appearance of the
previous letter, it looks awkward in its character literal form (#\name)
since it modified the backslash. This instead prints the combining
character on a small circle.
* libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define
* libguile/print.c (iprint1): print combining characters on dotted circles
* libguile/read.c (scm_read_character): parse the combination of combining
characters and dotted circles
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/hash.c (scm_i_string_hash): new function
(scm_hasher): don't unpack string: use scm_i_string_hash
* libguile/hash.h: new declaration for scm_i_string_hash
* libguile/print.c (quote_keywordish_symbol): use symbol accessors
(scm_i_print_symbol_name): new function
(scm_print_symbol_name): call scm_i_print_symbol_name
(iprin1): use scm_i_print_symbol_name to print symbols
* libguile/print.h: new declaration for scm_i_print_symbol_name
* libguile/symbols.c (lookup_interned_symbol): now takes scheme string
instead of c string; callers changed
(lookup_interned_symbol): add wide symbol support
(scm_i_c_mem2symbol): removed
(scm_i_mem2symbol): removed and replaced with scm_i_str2symbol
(scm_i_str2symbol): new function
(scm_i_mem2uninterned_symbol): removed and replaced with
scm_i_str2uninterned_symbol
(scm_i_str2uninterned_symbol): new function
(scm_make_symbol, scm_string_to_symbol, scm_from_locale_symbol)
(scm_from_locale_symboln): use scm_i_str2symbol
* test-suite/tests/symbols.test: new tests
* libguile/tags.h (scm_tc7_program):
* libguile/programs.h: Programs now have their own tc7 code. Fix up the
macros appropriately.
* libguile/programs.c: Remove smobby bits, leaving marking, printing,
and application for other parts of Guile.
* libguile/debug.c (scm_procedure_source):
* libguile/eval.c (scm_trampoline_0, scm_trampoline_1)
(scm_trampoline_2): Add cases for tc7_program.
* libguile/eval.i.c (CEVAL, SCM_APPLY):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/gc-card.c (scm_i_sweep_card, scm_i_tag_name):
* libguile/gc-mark.c (1):
* libguile/print.c (iprin1):
* libguile/procs.c (scm_procedure_p, scm_thunk_p)
* libguile/vm-i-system.c (make-closure): Adapt to new procedure
representation.
* libguile/procprop.c (scm_i_procedure_arity): Do the right thing for
programs.
* test-suite/tests/procprop.test ("procedure-arity"): Arity test now
succeeds.
* libguile/goops.c (scm_class_of): Programs now belong to the class
<procedure>, not a smob class.
* libguile/vm.h (struct vm, struct vm_cont):
* libguile/vm-engine.c (vm_engine):
* libguile/frames.h (SCM_FRAME_BYTE_CAST, struct vm_frame):
* libguile/frames.c (scm_c_make_vm_frame): Fix usages of scm_byte_t,
changing them to scm_t_uint8.
This requres the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_hander constants as #define's
Also, scm_charprint is renamed to scm_i_charprint.
* libguile/strings.h: make scm_i_string_wide_chars internal.
* libguile/print.h: rename scm_charprint to scm_i_charprint. Make
internal.
* libguile/print.c (scm_i_charprint): renamed from scm_charprint
(scm_charprint): renamed to scm_i_charprint. All callers changed.
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to sting-like asm tests
This adds the 32-bit standalone characters. Strings are still
8-bit. Characters larger than 8-bit can only be entered or
displayed in octal format at this point. At this point, the
terminal's display encoding is expected to be Latin-1.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
add 32-bit char
* module/language/assembly.scm (object->assembly): add 32-bit char
(assembly->object): add 32-bit char
* libguile/vm-i-system.c (make-char32): new op
* libguile/print.c (iprin1): print 32-bit char
* libguile/numbers.h: add type scm_t_wchar
* libguile/numbers.c: add type scm_t_wchar
* libguile/chars.h: new type scm_t_wchar
(SCM_CODEPOINT_MAX): new
(SCM_IS_UNICODE_CHAR): new
(SCM_MAKE_CHAR): operate on 32-bit char
* libguile/chars.c: comparison operators now use Unicode
codepoints
(scm_c_upcase): now receives and returns scm_t_wchar
(scm_c_downcase): now receives and returns scm_t_wchar
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_num.
Also, the incomplete and broken EBCDIC support is removed.
* libguile/print.c (iprin1): use new func scm_i_charname
* libguile/read.c (scm_read_character): use new func
scm_i_charname_to_num
* libguile/chars.c (scm_i_charname): new function
(scm_i_charname_to_char): new function
(scm_charnames, scm_charnums): removed
* libguile/chars.h: new declarations