* libguile/read.c (scm_read_sharp_extension): Attach source properties
to the result of a custom token reader if the returned datum is not
immediate. Previously, source properties were added to pairs only.
* libguile/read.c (enum t_keyword_style, struct t_read_opts,
scm_t_read_opts): New types.
(init_read_options): New function.
(CHAR_IS_DELIMITER): Look up square-brackets option via local 'opts'.
(scm_read): Call 'init_read_options', and pass 'opts' to helpers.
(flush_ws, maybe_annotate_source, read_complete_token, read_token,
scm_read_bytevector, scm_read_character,
scm_read_commented_expression, scm_read_expression,
scm_read_guile_bit_vector, scm_read_keyword,
scm_read_mixed_case_symbol, scm_read_nil, scm_read_number,
scm_read_number_and_radix, scm_read_quote, scm_read_sexp,
scm_read_sharp, scm_read_sharp_extension, scm_read_shebang,
scm_read_srfi4_vector, scm_read_string, scm_read_syntax,
scm_read_vector, scm_read_array): Add 'opts' as an additional
parameter, and use it to look up read options. Previously the global
read options were consulted directly.
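The shape of this change can be sketched in standalone C. This is only an illustration: the `read_opts` fields, `global_read_opts`, and `token_length` are hypothetical stand-ins, not the actual libguile definitions.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified counterpart of the per-read options struct.  */
enum keyword_style { KEYWORD_STYLE_HASH_PREFIX, KEYWORD_STYLE_PREFIX };

struct read_opts
{
  enum keyword_style keyword_style;
  bool record_positions_p;
  bool square_brackets_p;
};

/* Stand-in for the global reader options.  */
static struct read_opts global_read_opts =
  { KEYWORD_STYLE_HASH_PREFIX, true, true };

/* Snapshot the global options once, at the start of a read.  */
static void
init_read_options (struct read_opts *opts)
{
  assert (opts != NULL);
  *opts = global_read_opts;
}

/* Helpers consult only the snapshot they are handed.  */
static bool
char_is_delimiter (const struct read_opts *opts, int c)
{
  if (c == ' ' || c == '\t' || c == '\n' || c == '(' || c == ')'
      || c == '"' || c == ';')
    return true;
  return (c == '[' || c == ']') && opts->square_brackets_p;
}

/* Length of the token at the start of S, up to the next delimiter.  */
static size_t
token_length (const struct read_opts *opts, const char *s)
{
  size_t len = 0;
  while (s[len] != '\0' && !char_is_delimiter (opts, s[len]))
    len++;
  return len;
}
```

Because each helper sees only its `opts` argument, changing the global options in the middle of a read does not affect that read.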
* libguile/read.c (CHAR_IS_R5RS_DELIMITER, CHAR_IS_DELIMITER): Move the
'[' and ']' delimiters from CHAR_IS_R5RS_DELIMITER to
CHAR_IS_DELIMITER. Parenthesize all references to the macro
parameter. Don't check the global square-brackets read option until
after we know the character is '[' or ']'.
(scm_read_sexp): Don't check the global square-brackets read option
until after we know the character is ']'.
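A minimal sketch of the macro arrangement described above, in standalone C. The `square_brackets_enabled` function and the `option_lookups` counter are hypothetical; they stand in for the global option lookup so that the short-circuit ordering is observable.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the global square-brackets read option.  */
static bool square_brackets_p = true;
static int option_lookups = 0;

static bool
square_brackets_enabled (void)
{
  assert (option_lookups >= 0);
  option_lookups++;             /* count how often the option is consulted */
  return square_brackets_p;
}

/* Every use of the parameter is parenthesized, so an expression
   argument expands safely.  Note that C is still evaluated more than
   once, so arguments with side effects must be avoided.  */
#define CHAR_IS_R5RS_DELIMITER(c)                                   \
  ((c) == ' ' || (c) == '\t' || (c) == '\n' || (c) == '\r'          \
   || (c) == '\f' || (c) == '(' || (c) == ')' || (c) == '"'         \
   || (c) == ';')

/* The option is consulted only once we know C is '[' or ']'.  */
#define CHAR_IS_DELIMITER(c)                                        \
  (CHAR_IS_R5RS_DELIMITER (c)                                       \
   || (((c) == '[' || (c) == ']') && square_brackets_enabled ()))
```

With this ordering, ordinary characters never trigger an option lookup; only square brackets do.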
* libguile/arrays.c (read_decimal_integer): Move to read.c.
(scm_i_read_array): Remove. Incorporate the code into the
'scm_read_array' static function in read.c.
* libguile/arrays.h (scm_i_read_array): Remove prototype.
* libguile/read.c (read_decimal_integer): Move here from arrays.c.
(scm_read_array): Incorporate the code from 'scm_i_read_array'. Call
'scm_read_vector' and 'scm_read_sexp' instead of 'scm_read'.
According to the new benchmarks, this leads to a 5% speed improvement when
reading small strings, and a 27% improvement when reading large strings.
* libguile/read.c (READER_STRING_BUFFER_SIZE): Change to 128; update
comment to mention codepoints.
(scm_read_string): Make `str' a list of strings, instead of a string.
Store characters read in buffer `c_str'. Cons to STR when C_STR is
full, and concatenate/reverse at the end.
* benchmark-suite/benchmarks/read.bm (small, large): New variables.
Set %DEFAULT-PORT-ENCODING to "UTF-8".
("read")["small strings", "large strings"]: New benchmarks.
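The buffering strategy above (fill a fixed buffer, cons it onto a list when full, concatenate in reverse at the end) can be sketched in standalone C. This is an assumption-laden illustration: it reads from a C string rather than a port, uses `malloc` instead of Guile strings, and `struct chunk`, `cons_chunk`, and `read_string_body` are hypothetical names.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

#define BUFFER_SIZE 128

/* One full buffer's worth of characters, linked newest-first.  */
struct chunk
{
  struct chunk *next;
  size_t len;
  char data[BUFFER_SIZE];
};

/* Push LEN characters from BUF onto the front of HEAD.  */
static struct chunk *
cons_chunk (struct chunk *head, const char *buf, size_t len)
{
  struct chunk *c = malloc (sizeof *c);
  assert (len <= BUFFER_SIZE);
  if (c == NULL)
    abort ();
  memcpy (c->data, buf, len);
  c->len = len;
  c->next = head;
  return c;
}

/* Copy INPUT up to a closing '"' or the end of input.  Returns a
   freshly allocated NUL-terminated string that the caller frees.  */
static char *
read_string_body (const char *input)
{
  char buf[BUFFER_SIZE];
  size_t used = 0, total = 0;
  struct chunk *chunks = NULL;
  char *result, *end;

  for (; *input != '\0' && *input != '"'; input++)
    {
      if (used == BUFFER_SIZE)
        {
          /* Buffer full: save it and start over.  */
          chunks = cons_chunk (chunks, buf, used);
          total += used;
          used = 0;
        }
      buf[used++] = *input;
    }
  total += used;

  result = malloc (total + 1);
  if (result == NULL)
    abort ();
  result[total] = '\0';

  /* Fill from the end: the partial buffer first, then the chunks,
     which are stored newest-first.  */
  end = result + total - used;
  memcpy (end, buf, used);
  while (chunks != NULL)
    {
      struct chunk *next = chunks->next;
      end -= chunks->len;
      memcpy (end, chunks->data, chunks->len);
      free (chunks);
      chunks = next;
    }
  return result;
}
```

Strings that fit in one buffer never allocate a chunk, which is the case the small-string benchmark exercises.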
* libguile/read.c (read_token): Remove unneeded `const' before `size_t'.
(read_complete_token): Remove `overflow_buffer' parameter; return
`char *' instead of `int'. Allocate the overflow buffer with
`scm_gc_malloc_pointerless' instead of `scm_malloc'. Return either
the overflow buffer or BUFFER.
(scm_read_number, scm_read_mixed_case_symbol,
scm_read_number_and_radix): Rename `buffer' to `local_buffer', and
`overflow_buffer' to `buffer'. Remove `overflow'. Adjust code to new
`read_complete_token'.
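The new calling convention (return either the caller's buffer or a separately allocated overflow buffer) might look like the standalone sketch below. It is not the libguile code: it scans a C string instead of a port, uses `malloc` where the real function uses `scm_gc_malloc_pointerless`, and `is_delimiter` is a hypothetical helper.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical, simplified delimiter test.  */
static int
is_delimiter (int c)
{
  return c == '\0' || c == ' ' || c == '(' || c == ')';
}

/* Read the token at the start of INPUT and store its length in *READ.
   Returns LOCAL when the token fits in LOCAL_SIZE bytes; otherwise
   returns a malloc'd overflow buffer that the caller must free.  The
   result is not NUL-terminated.  */
static char *
read_complete_token (const char *input, char *local, size_t local_size,
                     size_t *read)
{
  size_t len = 0;
  char *overflow;

  assert (input != NULL && local != NULL && read != NULL);
  while (!is_delimiter ((unsigned char) input[len]))
    len++;
  *read = len;

  if (len <= local_size)
    {
      memcpy (local, input, len);
      return local;
    }

  overflow = malloc (len);
  if (overflow == NULL)
    abort ();
  memcpy (overflow, input, len);
  return overflow;
}
```

A caller compares the returned pointer with its local buffer to tell which case occurred; with a garbage-collected overflow buffer, as in the entry above, no explicit free is needed.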
* libguile/read.c (scm_read_number): Set source properties on
non-immediate numbers if the 'positions' reader option is set.
* doc/ref/api-debug.texi (Source Properties): Update manual.
* libguile/read.c (scm_read_array): New internal helper that
calls scm_i_read_array and sets its source property if the
'positions' reader option is set.
(scm_read_string): Set source properties on strings if the 'positions'
reader option is set.
(scm_read_vector, scm_read_srfi4_vector, scm_read_bytevector,
scm_read_guile_bit_vector, scm_read_sharp): Add new arguments for the
'line' and 'column' of the first character of the datum being read.
Set source properties if the 'positions' reader option is set.
(scm_read_expression): Pass 'line' and 'column' to scm_read_sharp.
* doc/ref/api-debug.texi (Source Properties): Update manual.
* libguile/read.c (scm_read_string): Return a freshly allocated string
every time, even for empty strings. The motivation is to allow source
properties to be added to all strings. Previously, the shared global
'scm_nullstr' was returned for empty strings. Note that empty strings
still share a common global 'null_stringbuf'.
* test-suite/tests/srfi-13.test (substring/shared): Fix tests to reflect
the fact that empty string literals are no longer guaranteed to be
'eq?' to each other.
* libguile/read.c (scm_read_r6rs_block_comment):
* test-suite/tests/reader.test ("reading"): Fix reading of #||||#,
originally reported in bug debbugs.gnu.org/9672, by Bruno Haible.
Thanks, Bruno!
* libguile/read.c (scm_read_sexp): Don't confuse `#{.}#' with `.' for
the purpose of reading dotted pairs. Thanks to CRLF0710 for the
report.
* test-suite/tests/reader.test ("#{}#"): Add test.
* libguile/srcprop.h: Remove internal scm_source_whash declaration.
* libguile/srcprop.c (scm_i_set_source_properties_x)
(scm_i_has_source_properties): New helpers.
(scm_source_whash): Make static.
* libguile/read.c (scm_read_sexp): Remove register declarations here;
let's trust the compiler. Remove code to incrementally build up a
copy; instead let's let scm_i_set_source_properties_x handle copying
the expression if needed.
(scm_read_quote, scm_read_syntax): Use scm_i_set_source_properties_x.
(recsexpr): Remove this helper from 1996.
(scm_read_sharp_extension): Instead of trying to recursively label
sharp-read subforms with source properties, just label the outside
form and rely on the macro-expander to propagate it down.
* libguile/numbers.c (scm_logand): Fix a type error (comparing a SCM
against an int, when we really wanted to compare the unpacked
fixnum).
* libguile/ports.c (scm_i_set_conversion_strategy_x): Check
scm_conversion_strategy_init, not scm_conversion_strategy.
* libguile/read.c (recsexpr): Fix loops to avoid strange test of SCM
values.
* libguile/tags.h (SCM_MAKE_ITAG8_BITS): New helper, produces a
scm_t_bits instead of a SCM, because SCM_UNPACK is not a constant
expression with SCM_DEBUG_TYPING_STRICTNESS==2.
(SCM_MAKIFLAG_BITS): Remove SCM_MAKIFLAG, and replace with this, which
returns bits.
(SCM_BOOL_F_BITS, SCM_ELISP_NIL_BITS, SCM_EOL_BITS, SCM_BOOL_T_BITS):
(SCM_UNSPECIFIED_BITS, SCM_UNDEFINED_BITS, SCM_EOF_VAL_BITS):
(SCM_UNBOUND_BITS): New definitions. Define SCM_BOOL_F, etc. in terms
of them.
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_0):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_1):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_2):
(SCM_XXX_ANOTHER_LISP_FALSE_DONT_USE): Be bits instead of SCM values.
(SCM_BITS_DIFFER_IN_EXACTLY_ONE_BIT_POSITION):
(SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS): Rename from
SCM_VALUES_DIFFER_..., and take unpacked bits as the args.
* libguile/boolean.c: Update verify block to use
SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS et al.
* libguile/debug.c (scm_debug_opts):
* libguile/print.c (scm_print_opts):
* libguile/read.c (scm_read_opts): Use iflags bits for initializers.
* libguile/hash.c (scm_hasher): Use _BITS for iflags as case labels.
* libguile/pairs.c: Nil/null compile-time check uses
SCM_ELISP_NIL_BITS.
* libguile/deprecated.h:
* libguile/deprecated.c (scm_whash_get_handle, SCM_WHASHFOUNDP)
(SCM_WHASHREF, SCM_WHASHSET, scm_whash_create_handle)
(scm_whash_lookup, scm_whash_insert): Deprecate this API.
* libguile/srcprop.c:
* libguile/srcprop.h:
* libguile/read.c (scm_read_sexp): Use the hashq API instead of the
whash API.
* libguile/read.c (scm_read_extended_symbol): Interpret '\' as an escape
character. Due to some historical oddities we have to support '\'
before any character, but since we never emitted '\' in front of
"normal" characters like 'x' we can interpret "\x..;" to be an R6RS
hex escape.
* test-suite/tests/reader.test ("#{}#"): Add tests.
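The escape rule described above can be sketched in standalone C. This sketch makes several assumptions: it decodes only the text between `#{` and `}#`, handles code points below 256 only, and reports malformed escapes through a return value; `decode_symbol_body` is a hypothetical name, not a libguile function.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Decode BODY into OUT, which must hold at least strlen (BODY) bytes.
   "\x<hex>;" is a hex escape; a backslash before any other character
   yields that character literally.  Returns the decoded length, or
   (size_t) -1 on a malformed or out-of-range escape.  */
static size_t
decode_symbol_body (const char *body, unsigned char *out)
{
  size_t n = 0;

  assert (body != NULL && out != NULL);
  while (*body != '\0')
    {
      if (*body != '\\')
        {
          out[n++] = (unsigned char) *body++;
          continue;
        }
      body++;                   /* skip the backslash */
      if (*body == '\0')
        return (size_t) -1;     /* dangling backslash */
      if (*body == 'x')
        {
          unsigned int code = 0;
          int digits = 0;
          body++;
          while (isxdigit ((unsigned char) *body))
            {
              int c = (unsigned char) *body++;
              code = code * 16
                + (isdigit (c) ? c - '0' : tolower (c) - 'a' + 10);
              digits++;
              if (code > 0xFF)
                return (size_t) -1;   /* outside this sketch's range */
            }
          if (digits == 0 || *body != ';')
            return (size_t) -1;
          body++;               /* skip the terminating ';' */
          out[n++] = (unsigned char) code;
        }
      else
        out[n++] = (unsigned char) *body++;
    }
  return n;
}
```

For example, the body `a\x20;b` decodes to the three characters `a`, space, `b`, while `\}` decodes to a literal `}`.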
* libguile/read.c (scm_read_sharp): Move the "#c..." case outside of
#if SCM_ENABLE_DEPRECATED, and to the same section which handles
"#s...", "#u..." and "#f...".
Thanks to Andreas Rottmann <a.rottmann@gmx.at> for the bug report.
* libguile/read.c (scm_i_scan_for_encoding): Fix the scan for a coding
declaration on a first line starting with #!, and for a !# immediately
following the coding declaration.
* test-suite/Makefile.am:
* test-suite/tests/coding.test: Add tests.
* libguile/read.c (scm_i_scan_for_encoding): If possible, just use the
read buffer for the encoding scan, and avoid seeking. Fixes
`(open-input-file "/dev/urandom")', because /dev/urandom does not
support seeking backwards.
* libguile/read.c (scm_read_scsh_block_comment): Use `scm_getc' instead
of `scm_get_byte_or_eof'.
* test-suite/tests/reader.test ("read-options")["position of SCSH block
comment"]: New test.
* libguile/read.c (scm_read_opts): Default "positions" to #t. The
compiler was already turning it on anyway, and this allows
primitive-load without --auto-compile to also propagate source
information through the expander, for better errors and to let macros
know their source.
* module/language/scheme/spec.scm: No need to enable positions here
now.
* libguile/private-options.h (SCM_HUNGRY_EOL_ESCAPES_P): New private
option.
* libguile/read.c: Define SCM_HUNGRY_EOL_ESCAPES_P, defaulting to #f.
(skip_intraline_whitespace): New helper.
(scm_read_string): If SCM_HUNGRY_EOL_ESCAPES_P,
skip_intraline_whitespace after an escaped EOL.
* test-suite/tests/reader.test ("read-options"): Add test.
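The effect of the option on an escaped end-of-line can be sketched in standalone C. This is a simplified illustration, assuming only spaces and tabs count as intraline whitespace and operating on a C string rather than a port; `unescape_eol` is a hypothetical name.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Copy IN to OUT, dropping each backslash-newline pair.  When HUNGRY
   is true, also drop the spaces and tabs that start the next line.
   OUT must hold at least strlen (IN) + 1 bytes.  Returns the length
   of the result.  */
static size_t
unescape_eol (const char *in, char *out, bool hungry)
{
  size_t n = 0;

  assert (in != NULL && out != NULL);
  while (*in != '\0')
    {
      if (in[0] == '\\' && in[1] == '\n')
        {
          in += 2;              /* skip the escaped end-of-line */
          if (hungry)
            while (*in == ' ' || *in == '\t')
              in++;             /* skip intraline whitespace */
        }
      else
        out[n++] = *in++;
    }
  out[n] = '\0';
  return n;
}
```

With the option off, the indentation that follows an escaped newline is kept in the string; with it on, the continuation line joins the previous text directly.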
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/bytevectors.c:
* libguile/goops.c:
* libguile/instructions.c:
* libguile/numbers.c:
* libguile/random.c:
* libguile/read.c:
* libguile/vm-i-scheme.c: Fix a number of assumptions that a long could
hold an inum. This is not the case on platforms whose void* is larger
than their long.
* libguile/numbers.c (scm_i_inum2big): New helper, only implemented for
sizeof(void*) == sizeof(long); produces a compile error on other
platforms. Basically gmp doesn't have a nice interface for converting
between mpz values and intmax_t.
* libguile/debug.c:
* libguile/eval.c:
* libguile/frames.c:
* libguile/objcodes.c:
* libguile/print.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/struct.c:
* libguile/vm.c: Fix a number of instances in which we assumed we could
fit a pointer into a long.
This allows customizing the reader behavior for a dynamic extent more easily.
* libguile/read.c (scm_read_hash_procedures): Renamed to
`scm_i_read_hash_procedures'.
(scm_i_read_hash_procedures_ref, scm_i_read_hash_procedures_set_x):
New (internal) accessor functions for the fluid.
(scm_read_hash_extend, scm_get_hash_procedure): Use these accessor
functions.
(scm_init_read): Create the fluid, named `%read-hash-procedures' instead of
the previous plain list `read-hash-procedures'.
* test-suite/tests/reader.test: Adapt the "R6RS/SRFI-30 block comment
syntax overridden" test to make use of the fluid.
* module/ice-9/deprecated.scm (read-hash-procedures):
New identifier macro -- backward-compatibility shim.
Signed-off-by: Ludovic Courtès <ludo@gnu.org>
* libguile/private-options.h (SCM_ELISP_VECTORS_P, SCM_ESCAPED_PARENS_P):
* libguile/read.c (scm_read_opts): Remove unused elisp-vectors option,
and the elisp-strings option (which allowed \( and \) escapes in
strings).
(scm_read_string): Remove the elisp-strings case.
* doc/ref/api-options.texi (Reader options): Update, and update wording
of the case-insensitive bit.
R6RS character hex escapes do not conflict with legacy Guile octal
character escapes, so they can be enabled by default.
* libguile/read.c (scm_read_character): Enable R6RS character hex
escapes by default.
* test-suite/tests/reader.test: Modify character escape tests.
* doc/ref/api-data.texi: Update.
* doc/ref/api-options.texi: Update.
Note especially that the variable 'i' has two different uses in this
function, and they get confused.
* libguile/read.c (scm_i_scan_for_encoding): Clean up.
* libguile/read.c (scm_read_shebang): New function;
(scm_read_sharp): Call scm_read_shebang on '!', which delegates to
scm_read_scsh_block_comment as necessary.
* test-suite/tests/reader.test ("R6RS lexeme comment", "partial R6RS
lexeme comment"): New tests.
* libguile/read.c (read_token): Now takes a C buffer instead of a SCM
string. All callers changed.
(read_complete_token): Now takes C buffers, not SCM strings. No longer
does port position updates or encoding processing. All callers changed.
(scm_read_number, scm_read_mixed_case_symbol, scm_read_number_and_radix)
(scm_read_character): Do port updates and string processing no longer
done by read_complete_token. Some reordering for optimization.
* libguile/private-options.h:
* libguile/read.c (scm_read_opts, SCM_SQUARE_BRACKETS_P): Add an option
for treating [ and ] as parentheses, on by default. Note that this
makes them delimiters also, so [ and ] cannot appear in a symbol name,
with this read option on.
(scm_read_sexp): If we start with [, we end with ].
(scm_read_expression): Add case for [.