When the length is zero, the previous code would include the byte after
the end of the string in the hash. Fix that (the wide and narrow
hashers also guard against it via "case 0"), and don't bother mutating
length for the trailing bytes.
Since we already compute the char length, use that to detect all ASCII
strings and follow the same narrow string path that we do for latin-1.
libguile/hash.c (scm_i_utf8_string_hash): avoid overrun when len == 0.
Also rework so that the symbol hash uses the low bits instead of high
bits. We can do this because, for the guile-vm target, now we compute
the full target hash.
* module/language/cps/guile-vm.scm (jenkins-lookup3-hashword2):
(target-symbol-hash, target-symbol-hash-bits): New exported functions..
* module/language/cps/switch.scm (optimize-branch-chain): Change to use
target-symbol-hash and target-symbol-hash-bits from the current
target-runtime.
Noticed while investigating a migration to utf-8 strings. After making
changes that routed non-ascii symbol hashing through this function,
encoding-iso88597.test began intermittently failing because it would
traverse trailing garbage when u8_strnlen reported 8 chars instead of 4.
Change the scm_i_str2symbol and scm_i_str2uninterned_symbol internal
hash type to unsigned long to explicitly match the scm_i_string_hash
result type.
* libguile/hash.c (scm_i_utf8_string_hash): Call u8_mbsnlen not u8_strnlen.
* libguile/symbols.c (scm_i_str2symbol, scm_i_str2uninterned_symbol):
Use unsigned long for scm_i_string_hash result.
* test-suite/standalone/.gitignore: Add test-hashing.
* test-suite/standalone/Makefile.am: Add test-hashing.
* test-suite/standalone/test-hashing.c: Add.
Add keyword handling to (hash ...). Previously it would just return the
same value for all keywords.
* libguile/hash.c (scm_raw_ihash): Add scm_tc7_keyword case.
* libguile/keywords.h (SCM_I_KEYWORD_HASH): New macro.
As the FSF advises, 'There is no legal significance to using the
three-character sequence “(C)”, but it does no harm.' It does take up
space though! For that reason, we remove it here from our C files.
* libguile/struct.h (SCM_VTABLE_BASE_LAYOUT): Layout is a "pr" field.
* module/ice-9/boot-9.scm (record-type-vtable): Record vtable fields are
writable.
(<parameter>): "pw" fields.
* module/oop/goops.scm (<class>, %compute-layout): <read-only> fields
are "pw" underneath.
* module/rnrs/records/procedural.scm (record-type-vtable)
(record-constructor-vtable, make-record-type-descriptor): Use "pw"
fields in vtables.
* module/srfi/srfi-35.scm (%condition-type-vtable)
(struct-layout-for-condition): "pw" fields in vtables.
* test-suite/tests/goops.test:
* test-suite/tests/structs.test: Use "pw" fields only.
* benchmark-suite/benchmarks/structs.bm: Update for make-struct/no-tail,
to use pw fields, and also to remove useless tests that the compiler
would optimize away.
* doc/ref/api-data.texi (Vtables): Add a note about the now-vestigial
permissions character and update documentation.
(Structure Basics, Meta-Vtables): Update examples.
* libguile/hash.c (scm_i_struct_hash): Remove code that would handle
opaque/self fields.
* libguile/print.h (SCM_PRINT_STATE_LAYOUT): Use "pw" fields.
* libguile/struct.c (scm_struct_init): Simplify check for hidden
fields.
* libguile/values.c (scm_init_values): Field is "pw".
* libguile/eq.c (scm_equal_p, scm_raw_ihash): Add cases for syntax
objects, which should be comparable with equal?.
* test-suite/tests/syntax.test ("syntax objects"): Add tests.
This yields a 50% improvement on the "narrow string" benchmark of
'hash.bm', 41% on "wide string", and 76% on "long string".
* libguile/hash.c (scm_i_string_hash): Rewrite to avoid
'scm_i_string_ref' calls.
This function has been unused internally for some time and is undocumented.
* libguile/hash.c (scm_string_hash): Wrap if #if SCM_ENABLE_DEPRECATED
== 1.
* libguile/hash.h (scm_string_hash): Likewise, and replace SCM_API with
SCM_DEPRECATED.
Fixes a bug introduced in cc1cd04f81
"Fix hashing of vectors to run in bounded time."
* libguile/hash.c (scm_hasher): Avoid division by zero.
* test-suite/tests/hash.test ("hash"): Add tests.
* libguile/hash.c (SCM_MIN): New macro.
(scm_hasher): In vector case, do nothing if d is 0. Make sure to
recurse with a reduced d. Move the loop out of the 'if'.
Moved scm_i_struct_hash from struct.c to hash.c and made it static.
The port's alist is now a field of 'scm_t_port'.
Conflicts:
libguile/arrays.c
libguile/hash.c
libguile/ports.c
libguile/print.h
libguile/read.c
* libguile/hash.c (scm_hasher): Call `scm_i_struct_hash' upon
`scm_tcs_struct'.
* libguile/struct.c (scm_i_struct_hash): New function.
* libguile/struct.h (scm_i_struct_hash): New declaration.
* test-suite/tests/structs.test ("hash"): New test prefix.
* libguile/hash.c (scm_raw_ihash): Rename from `hasher'. Remove the
modulo argument; we expect the caller to deal with that. Use
scm_i_hashq for immediates and non-immediate integers. Use
scm_raw_ihashq on pointers too. Update the vector and pairs hashing
code. There is still some work to do here.
(scm_ihashv, scm_ihash): Adapt.
* libguile/hash.c (JENKINS_LOOKUP3_HASHWORD2, narrow_string_hash)
(wide_string_hash, scm_string_hash, scm_i_string_hash)
(scm_i_latin1_string_hash): Replace our lame string hash with Bob
Jenkins' hash, treating each codepoint as a word, for the purposes of
the algorithm. There are probably more optimal hashes for our use
cases.
(scm_i_locale_string_hash): Remove optimization, as it wasn't used.
(scm_i_utf8_string_hash): Add a specialized implementation for utf8.
It's tricky but mostly just cut-and-paste.
* libguile/tags.h (SCM_MAKE_ITAG8_BITS): New helper, produces a
scm_t_bits instead of a SCM, because SCM_UNPACK is not a constant
expression with SCM_DEBUG_TYPING_STRICTNESS==2.
(SCM_MAKIFLAG_BITS): Remove SCM_MAKIFLAG, and replace with this, which
returns bits.
(SCM_BOOL_F_BITS, SCM_ELISP_NIL_BITS, SCM_EOL_BITS, SCM_BOOL_T_BITS):
(SCM_UNSPECIFIED_BITS, SCM_UNDEFINED_BITS, SCM_EOF_VAL_BITS):
(SCM_UNBOUND_BITS): New definitions. Defined SCM_BOOL_F, etc in terms
of them.
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_0):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_1):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_2):
(SCM_XXX_ANOTHER_LISP_FALSE_DONT_USE): Be bits instead of SCM values.
(SCM_BITS_DIFFER_IN_EXACTLY_ONE_BIT_POSITION):
(SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS): Rename from
SCM_VALUES_DIFFER_..., and take unpacked bits as the args.
* libguile/boolean.c: Update verify block to use
SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS et al.
* libguile/debug.c (scm_debug_opts):
* libguile/print.c (scm_print_opts):
* libguile/read.c (scm_read_opts): Use iflags bits for initializers.
* libguile/hash.c (scm_hasher): Use _BITS for iflags as case labels.
* libguile/pairs.c: Nil/null compile-time check uses
SCM_ELISP_NIL_BITS.
* libguile/tags.h (scm_tc7_gsubr): Return to the pool of unused tc7s, as
there are no more gsubrs. Yay :)
* libguile/programs.h (SCM_F_PROGRAM_IS_PRIMITIVE):
(SCM_PROGRAM_IS_PRIMITIVE): New flag and accessor.
* libguile/gsubr.c (create_gsubr):
* libguile/snarf.h (SCM_STATIC_PROGRAM): Give subrs a PRIMITIVE flag.
* libguile/smob.h:
* libguile/smob.c (scm_i_smob_arity): New internal procedure. Uses the
old GSUBR type macros, local to the file.
* libguile/procprop.c (scm_i_procedure_arity): Call out to
scm_i_smob_arity, and remove a gsubr case.
* libguile/gc.c (scm_i_tag_name):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/goops.c (scm_class_of):
* libguile/vm.c (apply_foreign):
* libguile/hash.c (scm_hasher):
* libguile/debug.c (scm_procedure_name):
* libguile/print.c (iprin1): Remove gsubr cases.
* libguile/gsubr.h (SCM_PRIMITIVE_P): Fix to work with the new VM
program regimen.
(SCM_GSUBR_TYPE, SCM_GSUBR_MAKTYPE, SCM_GSUBR_MAX, SCM_GSUBR_REQ)
(SCM_GSUBR_OPT, SCM_GSUBR_REST): Remove these macros, that are no
longer useful.
* libguile/gsubr.c (scm_i_gsubr_apply, scm_i_gsubr_apply_list)
(scm_i_gsubr_apply_array): Remove internal gsubr application
functions.
* libguile/debug.c (scm_procedure_name): Remove a SCM_CLOSUREP case and
some dead code.
(scm_procedure_module): Remove. This was introduced a few months ago
for the hygienic expander, but now it is no longer needed, as the
expander keeps track of this information itself.
* libguile/debug.h: Remove scm_procedure_module.
* libguile/eval.c: Instead of using tc3 closures, define a "boot
closure" applicable smob type, and represent closures with that. The
advantage is that after eval.scm is compiled, boot closures take up no
address space (besides a smob number) in the runtime, and require no
special cases in procedure dispatch.
* libguile/eval.h: Remove the internal functions scm_i_call_closure_0
and scm_closure_apply, and the public function scm_closure.
* libguile/gc.c (scm_storage_prehistory): No tc3_closure displacement
registration.
(scm_i_tag_name): Remove closure case, and a dead cclo case.
* libguile/vm.c (apply_foreign):
* libguile/print.c (iprin1):
* libguile/procs.c (scm_procedure_p, scm_procedure_documentation);
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/goops.c (scm_class_of): Remove tc3_closure/tcs_closure cases.
* libguile/hash.c (scm_hasher):
* libguile/hooks.c (scm_add_hook_x): Use new scm_i_procedure_arity.
* libguile/macros.c (macro_print): Print all macros using the same code.
(scm_macro_transformer): Return any procedure, not just programs.
* libguile/procprop.h:
* libguile/procprop.c (scm_i_procedure_arity): Instead of returning a
list that the caller has to parse, have the same prototype as
scm_i_program_arity. An incompatible change, but it's an internal
function anyway.
(scm_procedure_properties, scm_set_procedure_properties)
(scm_procedure_property, scm_set_procedure_property): Remove closure
cases, and use scm_i_program_arity for arity.
* libguile/procs.h (SCM_CLOSUREP, SCM_CLOSCAR, SCM_CODE)
(SCM_CLOSURE_NUM_REQUIRED_ARGS, SCM_CLOSURE_HAS_REST_ARGS)
(SCM_CLOSURE_BODY, SCM_PROCPROPS, SCM_SETPROCPROPS, SCM_ENV)
(SCM_TOP_LEVEL): Remove these macros that pertain to boot closures
only. Only eval.c should know abut boot closures.
* libguile/procs.c (scm_closure_p): Remove this function. There is a
simple stub in deprecated.scm now.
(scm_thunk_p): Use scm_i_program_arity.
* libguile/tags.h (scm_tc3_closure): Remove. Yay, another tc3 to play
with!
(scm_tcs_closures): Remove.
* libguile/validate.h (SCM_VALIDATE_CLOSURE): Remove.
* module/ice-9/deprecated.scm (closure?): Add stub.
* module/ice-9/documentation.scm (object-documentation)
* module/ice-9/session.scm (help-doc, arity)
* module/oop/goops.scm (compute-getters-n-setters)
* module/oop/goops/describe.scm (describe)
* module/system/repl/describe.scm (display-object, display-type):
Remove calls to closure?.
* libguile/pairs.h:
* libguile/pairs.c: Previously scm_cdadr et al were implemented as
#defines that called scm_i_chase_pairs, and the Scheme-exposed
functions themselves were cxr subrs, which got special help in the
interpreter. Since now the special help is unnecessary (because the
compiler inlines and expands calls to car, cdadr, etc), the complexity
is a loss. So just implement cdadr etc using normal functions. There's
an advantage too, in that the compiler can unroll the cxring, reducing
branches.
* libguile/tags.h (scm_tc7_cxr): Remove this tag.
(scm_tcs_subrs): Now there's only one kind of subr, yay!
* libguile/debug.c (scm_procedure_name)
* libguile/evalext.c (scm_self_evaluating_p)
* libguile/gc.c (scm_i_tag_name)
* libguile/goops.c (scm_class_of)
* libguile/hash.c (scm_hasher)
* libguile/print.c (iprin1)
* libguile/procprop.c (scm_i_procedure_arity)
* libguile/procs.c (scm_procedure_p, scm_subr_p)
(scm_make_procedure_with_setter)
* libguile/vm.c (apply_foreign): Remove cxr cases. Replace uses of
scm_tcs_subrs with scm_tc7_gsubr.
* libguile/hash.c (scm_i_string_hash): new function
(scm_hasher): don't unpack string: use scm_i_string_hash
* libguile/hash.h: new declaration for scm_i_string_hash
* libguile/print.c (quote_keywordish_symbol): use symbol accessors
(scm_i_print_symbol_name): new function
(scm_print_symbol_name): call scm_i_print_symbol_name
(iprin1): use scm_i_print_symbol_name to print symbols
* libguile/print.h: new declaration for scm_i_print_symbol_name
* libguile/symbols.c (lookup_interned_symbol): now takes scheme string
instead of c string; callers changed
(lookup_interned_symbol): add wide symbol support
(scm_i_c_mem2symbol): removed
(scm_i_mem2symbol): removed and replaced with scm_i_str2symbol
(scm_i_str2symbol): new function
(scm_i_mem2uninterned_symbol): removed and replaced with
scm_i_str2uninterned_symbol
(scm_i_str2uninterned_symbol): new function
(scm_make_symbol, scm_string_to_symbol, scm_from_locale_symbol)
(scm_from_locale_symboln): use scm_i_str2symbol
* test-suite/tests/symbols.test: new tests