* libguile/print.c (display_string_using_iconv): If encoding the full
UTF-8 buffer would overflow the output buffer, keep going instead of
signalling an error. Fixes #22667.
* test-suite/tests/iconv.test ("round-trip"): Add some tests.
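As a rough illustration of the "keep going" behaviour described above, here
is a minimal sketch (not the actual print.c code) of an iconv loop that
treats E2BIG as "flush the staging buffer and continue" rather than as a
failure. The helper name `sketch_convert_chunked' and the 8-byte staging
buffer are assumptions made for the example; only `iconv_open', `iconv' and
`iconv_close' are real API.

```c
#include <errno.h>
#include <iconv.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: convert UTF-8 input to TO_CODE through a small
   staging buffer.  Returns the number of bytes stored in DST, or
   (size_t) -1 on a real conversion error or if DST is too small.  */
size_t
sketch_convert_chunked (const char *to_code, const char *src, size_t src_len,
                        char *dst, size_t dst_cap)
{
  iconv_t cd = iconv_open (to_code, "UTF-8");
  if (cd == (iconv_t) -1)
    return (size_t) -1;

  char *in = (char *) src;      /* iconv does not modify the input bytes */
  size_t in_left = src_len;
  size_t total = 0;

  while (in_left > 0)
    {
      char stage[8];            /* deliberately tiny, to force E2BIG */
      char *out = stage;
      size_t out_left = sizeof stage;

      size_t rc = iconv (cd, &in, &in_left, &out, &out_left);
      int err = (rc == (size_t) -1) ? errno : 0;
      size_t produced = sizeof stage - out_left;

      if (total + produced > dst_cap)
        {
          iconv_close (cd);
          return (size_t) -1;
        }
      memcpy (dst + total, stage, produced);
      total += produced;

      /* E2BIG only means the staging buffer filled up: loop again.
         Anything else (EILSEQ, EINVAL) is a genuine error.  */
      if (err != 0 && err != E2BIG)
        {
          iconv_close (cd);
          return (size_t) -1;
        }
    }

  iconv_close (cd);
  return total;
}
```

The point of the design is that a full output buffer is a normal condition
for long strings, so the caller drains what was produced and resumes from
the updated input pointer.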
This shows a 19% improvement on the "string without escapes"
micro-benchmark of 'write.bm', and 12% on "string with escapes".
* libguile/print.c (iprin1) <scm_tc7_string>: Replace 'scm_i_string_ref'
loop with a call to 'write_string'.
(display_character): Adjust description of return value in comment.
(write_string): New function.
This is a follow-up to e26ab06.
* libguile/print.c (scm_simple_format): Pass 1 to
SCM_VALIDATE_OPORT_VALUE, for 'destination'.
* test-suite/tests/format.test ("simple-format"): Add test.
* libguile/print.c (print_r7rs_extended_symbol): Print any unicode
graphic character other than '|' or '\' unescaped. Escape any spacing
character other than ASCII space.
* libguile/print.c (scm_print_opts): Add 'r7rs-symbols' print option.
(symbol_has_extended_read_syntax): If the 'r7rs-symbols' option is
enabled, then disallow '|' and '\' from bare symbols.
(print_extended_symbol): Use 'scm_lfwrite' and 'scm_putc' instead of
'display_string' and 'display_character' when printing ASCII literals.
(print_r7rs_extended_symbol): New static function.
(scm_i_print_symbol_name): If the 'r7rs-symbols' option is enabled,
use 'print_r7rs_extended_symbol' instead of 'print_extended_symbol'.
* libguile/private-options.h (SCM_PRINT_R7RS_SYMBOLS_P): New macro.
(SCM_N_PRINT_OPTIONS): Increment.
* doc/ref/api-evaluation.texi (Scheme Write): Mention 'r7rs-symbols'
print option.
* test-suite/tests/print.test ("write"): Add tests.
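The escaping rule for bar-delimited symbols described above can be sketched
roughly as follows. This is a simplified, ASCII-only illustration, not the
code of `print_r7rs_extended_symbol': the function name
`sketch_write_bar_symbol' is hypothetical, and the real printer works on
Unicode code points rather than bytes.

```c
#include <ctype.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: write NAME (assumed to be ASCII) into OUT as a
   |...| symbol.  '|' and '\' get a backslash, graphic characters and
   the ASCII space are copied, everything else becomes \xN; in hex.
   Returns the length written, or 0 if OUT is too small.  */
size_t
sketch_write_bar_symbol (const char *name, char *out, size_t cap)
{
  size_t n = 0;

  if (cap < 3)                  /* need room for "||" and the NUL */
    return 0;
  out[n++] = '|';

  for (const unsigned char *p = (const unsigned char *) name; *p; p++)
    {
      char piece[8];
      int len;

      if (*p == '|' || *p == '\\')
        len = snprintf (piece, sizeof piece, "\\%c", *p);
      else if (*p == ' ' || isgraph (*p))
        len = snprintf (piece, sizeof piece, "%c", *p);
      else
        len = snprintf (piece, sizeof piece, "\\x%x;", (unsigned) *p);

      /* Keep room for the closing bar and the terminating NUL.  */
      if (n + (size_t) len + 2 > cap)
        return 0;
      memcpy (out + n, piece, (size_t) len);
      n += (size_t) len;
    }

  out[n++] = '|';
  out[n] = '\0';
  return n;
}
```

For instance, under these assumptions a symbol named `a|b' would come out
as `|a\|b|', and a tab inside a name would be written as `\x9;'.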
* libguile/ports-internal.h (struct scm_port_internal): Add new members
'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
(SCM_UNICODE_BOM): New macro.
(scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.
* libguile/ports.c (scm_new_port_table_entry): Initialize
'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
(get_iconv_codepoint): Pass new 'mode' parameter to
'scm_i_port_iconv_descriptors'.
(get_codepoint): After reading a codepoint at stream start, record
that we're no longer at stream start, and consume a BOM where
appropriate.
(scm_seek): Set the stream start flags according to the new position.
(looking_at_bytes): New static function.
(scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
scm_utf32le_bom): New static const arrays.
(decide_utf16_encoding, decide_utf32_encoding): New static functions.
(scm_i_port_iconv_descriptors): Add new 'mode' parameter. If the
specified encoding is UTF-16 or UTF-32, make that precise by deciding
what byte order to use, and construct iconv descriptors based on the
precise encoding.
(scm_i_set_port_encoding_x): Record that we are now at stream start.
Do not open the new iconv descriptors immediately; let them be
initialized lazily.
* libguile/print.c (display_string_using_iconv): Record that we're no
longer at stream start. Write a BOM if appropriate.
* doc/ref/api-io.texi (BOM Handling): New node.
* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
Adapt test to cope with the fact that 'set-port-encoding!' does not
immediately open the iconv descriptors.
(bv-read-test): New procedure.
("unicode byte-order marks (BOMs)"): New test prefix.
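To make the byte-order decision above concrete, here is a small sketch of
BOM sniffing for UTF-16 input. All names (`sketch_looking_at_bytes',
`sketch_decide_utf16', the enum) are hypothetical stand-ins, and the
big-endian fallback when no BOM is present is an assumption of the sketch,
not something this log entry states.

```c
#include <stddef.h>
#include <string.h>

enum sketch_byte_order { SKETCH_BIG_ENDIAN, SKETCH_LITTLE_ENDIAN };

static const unsigned char sketch_utf16be_bom[2] = { 0xFE, 0xFF };
static const unsigned char sketch_utf16le_bom[2] = { 0xFF, 0xFE };

/* Return nonzero if BUF (LEN bytes) starts with the N bytes in BYTES.  */
static int
sketch_looking_at_bytes (const unsigned char *buf, size_t len,
                         const unsigned char *bytes, size_t n)
{
  return len >= n && memcmp (buf, bytes, n) == 0;
}

/* Decide the byte order of a UTF-16 stream from its first bytes and
   store in *BOM_LEN how many bytes the BOM occupies (0 if none).
   Assumption: without a BOM, fall back to big-endian.  */
enum sketch_byte_order
sketch_decide_utf16 (const unsigned char *buf, size_t len, size_t *bom_len)
{
  if (sketch_looking_at_bytes (buf, len, sketch_utf16le_bom, 2))
    {
      *bom_len = 2;
      return SKETCH_LITTLE_ENDIAN;
    }
  if (sketch_looking_at_bytes (buf, len, sketch_utf16be_bom, 2))
    {
      *bom_len = 2;
      return SKETCH_BIG_ENDIAN;
    }
  *bom_len = 0;
  return SKETCH_BIG_ENDIAN;
}
```

A caller would open the precise "UTF-16LE" or "UTF-16BE" converter based on
the result and skip `*bom_len' bytes before decoding.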
Based on 6c98257f2e by Andy Wingo.
* libguile/ports-internal.h (struct scm_port_internal): Add a flag
for the port encoding mode: UTF8 or iconv. The iconv descriptors
are now in a separate structure so that we can avoid attaching
finalizers to the ports themselves in the future.
(enum scm_port_encoding_mode): New enum.
(struct scm_iconv_descriptors): New struct.
(scm_i_port_iconv_descriptors): Add prototype.
* libguile/ports.c (finalize_port): Don't close iconv descriptors here.
(scm_new_port_table_entry): Adapt to the iconv descriptors being
moved. Initialize 'encoding_mode'.
(scm_i_remove_port): Adapt to call 'close_iconv_descriptors'.
(close_iconv_descriptors): New static function.
(get_iconv_codepoint): Use 'scm_i_port_iconv_descriptors'.
(get_codepoint): Check the port 'encoding_mode'.
(finalize_iconv_descriptors, open_iconv_descriptors,
close_iconv_descriptors, scm_i_port_iconv_descriptors): New static
functions.
(scm_i_set_port_encoding_x): Adapt to iconv descriptors being moved
to separate structure, to set the 'encoding_mode' flag, and to use
'open_iconv_descriptors' and 'close_iconv_descriptors'.
* libguile/print.c (display_string_using_iconv): Use
'scm_i_port_iconv_descriptors'.
(display_string): Use 'encoding_mode' flag.
* libguile/ports-internal.h: New file.
* libguile/Makefile.am (noinst_HEADERS): Add ports-internal.h.
* libguile/ports.h (scm_t_port): Add a comment mentioning that the
'input_cd' and 'output_cd' fields of the public structure are no
longer what they seem to be.
* libguile/ports.c: Include ports-internal.h.
(finalize_port, scm_i_remove_port, get_iconv_codepoint, get_codepoint,
scm_i_set_port_encoding_x): Access 'input_cd' and 'output_cd' via the
new internal port structure.
(scm_new_port_table_entry): Allocate and initialize the internal port
structure.
* libguile/print.c: Include ports-internal.h.
(display_string_using_iconv, display_string): Access 'input_cd' and
'output_cd' via 'internal' pointer.
Fixes <http://bugs.gnu.org/12033>.
Reported by nalaginrut <nalaginrut@gmail.com>.
* libguile/print.c (scm_i_display_substring): New function.
* libguile/print.h (scm_i_display_substring): New internal declaration.
* libguile/ports.c (scm_lfwrite_substr): Use it instead of `scm_display'
+ `scm_c_substring'.
* libguile/print.c (PORT_CONVERSION_HANDLER): New macro.
(print_extended_symbol, iprin1, write_character, scm_write_char): Use
it instead of `scm_i_get_conversion_strategy'.
* libguile/strports.c (scm_mkstrport): Assign `pt->ilseq_handler'
directly instead of via `scm_i_set_conversion_strategy_x'.
* libguile/tags.h (scm_tc7_array): Allocate a tag for arrays.
* libguile/arrays.h (SCM_I_ARRAYP): Change to use scm_tc7_array. The
previous definition was not externally usable because scm_i_tc16_array
was internal.
(scm_i_print_array): Declare, though internally.
* libguile/arrays.c (scm_i_make_array): Use scm_cell with the tc7
instead of NEWSMOB.
(scm_i_print_array): Make not static.
(SCM_ARRAY_IMPLEMENTATION): Adapt.
(scm_init_arrays): Remove array smob declaration.
* libguile/eq.c (scm_equal_p): Refactor to put the string, pointer, and
bytevector cases in the switch. Add a case for arrays.
* libguile/goops.c: Add <array> declarations.
* libguile/print.c (iprin1): Call scm_i_print_array as needed.
* libguile/evalext.c (scm_self_evaluating_p): Add a case for arrays.
* libguile/private-options.h (SCM_PRINT_ESCAPE_NEWLINES_P):
* libguile/print.c: Add new escape-newlines print option, defaulting to
on.
(write_character): For newlines, if SCM_PRINT_ESCAPE_NEWLINES_P, then
print them as \n.
(scm_init_print): Refactor print options initialization.
* libguile/tags.h (SCM_MAKE_ITAG8_BITS): New helper, produces a
scm_t_bits instead of a SCM, because SCM_UNPACK is not a constant
expression with SCM_DEBUG_TYPING_STRICTNESS==2.
(SCM_MAKIFLAG_BITS): Remove SCM_MAKIFLAG, and replace with this, which
returns bits.
(SCM_BOOL_F_BITS, SCM_ELISP_NIL_BITS, SCM_EOL_BITS, SCM_BOOL_T_BITS):
(SCM_UNSPECIFIED_BITS, SCM_UNDEFINED_BITS, SCM_EOF_VAL_BITS):
(SCM_UNBOUND_BITS): New definitions. Define SCM_BOOL_F, etc. in terms
of them.
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_0):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_1):
(SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_2):
(SCM_XXX_ANOTHER_LISP_FALSE_DONT_USE): Be bits instead of SCM values.
(SCM_BITS_DIFFER_IN_EXACTLY_ONE_BIT_POSITION):
(SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS): Rename from
SCM_VALUES_DIFFER_..., and take unpacked bits as the args.
* libguile/boolean.c: Update verify block to use
SCM_BITS_DIFFER_IN_EXACTLY_TWO_BIT_POSITIONS et al.
* libguile/debug.c (scm_debug_opts):
* libguile/print.c (scm_print_opts):
* libguile/read.c (scm_read_opts): Use iflags bits for initializers.
* libguile/hash.c (scm_hasher): Use _BITS for iflags as case labels.
* libguile/pairs.c: Nil/null compile-time check uses
SCM_ELISP_NIL_BITS.
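The bit-difference checks mentioned above can be written as constant
expressions over plain integers. The following is only a sketch with
hypothetical macro names and made-up bit patterns; it does not reproduce
the definitions in tags.h. It also shows why such a layout is useful: when
two constants differ in exactly one bit, "is X one of them?" becomes a
single mask and compare.

```c
#include <stdint.h>

/* Clear the lowest set bit of X.  */
#define SKETCH_CLEAR_LOWEST(x) ((x) & ((x) - 1))

/* True if A and B differ in exactly one bit position.  */
#define SKETCH_DIFFER_IN_ONE_BIT(a, b)                          \
  ((((a) ^ (b)) != 0) && SKETCH_CLEAR_LOWEST ((a) ^ (b)) == 0)

/* True if A and B differ in exactly two bit positions.  */
#define SKETCH_DIFFER_IN_TWO_BITS(a, b)                                 \
  (SKETCH_CLEAR_LOWEST ((a) ^ (b)) != 0                                 \
   && SKETCH_CLEAR_LOWEST (SKETCH_CLEAR_LOWEST ((a) ^ (b))) == 0)

/* If A and B differ in exactly one bit, X equals A or B precisely when
   X agrees with both on every other bit.  */
static inline int
sketch_matches_either (uintptr_t x, uintptr_t a, uintptr_t b)
{
  uintptr_t diff = a ^ b;
  return (x & ~diff) == (a & b);
}
```

Because the macros only use integer operators on unpacked bits, they remain
valid in static assertions and as initializers, which is the property the
`_BITS' variants are meant to provide.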
* libguile/ports.c (update_port_lf): Handle EOF.
(get_utf8_codepoint, get_iconv_codepoint): New functions.
(get_codepoint): Use them.
(scm_i_set_port_encoding_x): Don't open conversion descriptors when
ENCODING is "UTF-8".
* libguile/print.c (display_string_as_utf8, display_string_using_iconv):
New functions.
(display_string): Use them.
* test-suite/tests/ports.test ("string ports")[#xc2 #x41 #x42]: Add a
note that this is not the wrong behavior per Unicode 6.0.0.
* libguile/print.c (symbol_has_extended_read_syntax): Use a more
general, unicode-appropriate algorithm. Hopefully doesn't cause
any current #{}# cases to be unescaped.
(print_extended_symbol): Use more appropriate unicode algorithm, and
emit unicode hex escapes instead of our own lame escapes.
* test-suite/tests/symbols.test: Add tests.
* libguile/print.c (symbol_has_extended_read_syntax)
(print_normal_symbol, print_extended_symbol, scm_i_print_symbol_name):
Factor scm_i_print_symbol_name into separate routines. Add comments.
There are a number of bugs here.
* libguile/strports.c (INITIAL_BUFFER_SIZE): New macro.
(scm_mkstrport): If STR is false, allocate a bytevector on the
caller's behalf.
(scm_object_to_string, scm_call_with_output_string,
scm_open_output_string): Pass SCM_BOOL_F as the STR argument of
`scm_mkstrport'.
* libguile/backtrace.c (scm_display_application,
display_backtrace_body): Likewise.
* libguile/gdbint.c (scm_init_gdbint): Likewise.
* libguile/print.c (scm_simple_format): Likewise.
Thanks to Bruno Haible for his suggestions. See
<http://lists.gnu.org/archive/html/bug-libunistring/2010-09/msg00007.html>
for details.
* libguile/ports.c (register_finalizer_for_port): Always register a
finalizer for PORT.
(finalize_port): Close ENTRY->input_cd and ENTRY->output_cd.
(scm_new_port_table_entry): Initialize the `input_cd' and `output_cd'
fields.
(utf8_to_codepoint): New function.
(get_codepoint): Rewrite to use `iconv' instead of libunistring.
(scm_i_set_port_encoding_x): Initialize the `input_cd' and `output_cd'
fields.
(update_port_lf): Move upward. Use `switch' instead of `if's.
* libguile/ports.h (scm_t_port)[input_cd, output_cd]: New fields.
* libguile/print.c (codepoint_to_utf8, display_string): New functions.
(display_character): Use `display_string'.
(write_combining_character): Likewise.
(iprin1): Use `display_string' instead of `scm_lfwrite_str', and
`display_character' instead of `scm_putc'.
(write_character): Likewise.
(write_character_escaped): New function.
* test-suite/tests/encoding-escapes.test ("display output
escapes")["Rashomon"]: Use lower-case escapes.
["fake escape"]: New test.
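For readers unfamiliar with what a helper such as `codepoint_to_utf8' has
to do, here is a self-contained sketch of the standard UTF-8 encoding of a
single code point. The name `sketch_codepoint_to_utf8' and its signature
are hypothetical; the real function in print.c may differ.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: encode CP as UTF-8 into OUT (at least 4 bytes).
   Returns the number of bytes written, or 0 if CP is a surrogate or
   lies beyond U+10FFFF.  */
size_t
sketch_codepoint_to_utf8 (uint32_t cp, unsigned char out[4])
{
  if (cp < 0x80)
    {
      out[0] = (unsigned char) cp;
      return 1;
    }
  if (cp < 0x800)
    {
      out[0] = (unsigned char) (0xC0 | (cp >> 6));
      out[1] = (unsigned char) (0x80 | (cp & 0x3F));
      return 2;
    }
  if (cp < 0x10000)
    {
      if (cp >= 0xD800 && cp <= 0xDFFF)
        return 0;               /* surrogates are not valid scalar values */
      out[0] = (unsigned char) (0xE0 | (cp >> 12));
      out[1] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[2] = (unsigned char) (0x80 | (cp & 0x3F));
      return 3;
    }
  if (cp <= 0x10FFFF)
    {
      out[0] = (unsigned char) (0xF0 | (cp >> 18));
      out[1] = (unsigned char) (0x80 | ((cp >> 12) & 0x3F));
      out[2] = (unsigned char) (0x80 | ((cp >> 6) & 0x3F));
      out[3] = (unsigned char) (0x80 | (cp & 0x3F));
      return 4;
    }
  return 0;
}
```

Encoding to UTF-8 first gives a single intermediate form that can then be
handed to `iconv' for whatever encoding the port uses.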
* libguile/bytevectors.c:
* libguile/eval.c:
* libguile/goops.c:
* libguile/i18n.c:
* libguile/load.c:
* libguile/memoize.c:
* libguile/modules.c:
* libguile/ports.c:
* libguile/print.c:
* libguile/procs.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/script.c:
* libguile/srfi-14.c:
* libguile/stacks.c:
* libguile/strings.c:
* libguile/throw.c:
* libguile/vm.c: Use scm_from_latin1_symboln to make symbols from string
literals, because they aren't in the user's locale -- they are in
ASCII, and we can optimize this case.
* libguile/vm-i-loader.c: Also use scm_from_latin1_symboln when loading
narrow symbols.
* libguile/debug.c:
* libguile/eval.c:
* libguile/frames.c:
* libguile/objcodes.c:
* libguile/print.c:
* libguile/programs.c:
* libguile/read.c:
* libguile/struct.c:
* libguile/vm.c: Fix a number of instances in which we assumed we could
fit a pointer into a long.
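The underlying issue is that `long' can be narrower than a pointer (for
example on 64-bit Windows, where `long' is 32 bits). A minimal sketch of
the safer pattern, using the standard `uintptr_t' type; the helper name
`sketch_format_address' is hypothetical and not taken from the files
listed above:

```c
#include <inttypes.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: format the address P as hexadecimal.  Casting
   through uintptr_t keeps every bit of the pointer, whereas a cast to
   `long' may truncate it on platforms where long is 32 bits wide.  */
int
sketch_format_address (char *buf, size_t cap, const void *p)
{
  return snprintf (buf, cap, "0x%" PRIxPTR, (uintptr_t) p);
}

/* Round-trip a pointer through an integer without losing bits.  */
const void *
sketch_roundtrip (const void *p)
{
  uintptr_t bits = (uintptr_t) p;
  return (const void *) bits;
}
```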
The characters U+0007 to U+000D have non-hex forms for their
escapes when in written strings.
* libguile/print.c (write_character): use non-hex escapes
* test-suite/tests/reader.test (write R6RS string escapes): adjust test
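The seven characters in that range correspond to the usual short string
escapes. A small lookup sketch follows; the function name
`sketch_short_escape' is hypothetical and this is not the code of
`write_character'.

```c
#include <stddef.h>

/* Hypothetical helper: return the non-hex escape used for CP inside a
   written string, or NULL if CP is outside U+0007..U+000D.  */
const char *
sketch_short_escape (unsigned int cp)
{
  switch (cp)
    {
    case 0x07: return "\\a";    /* alarm */
    case 0x08: return "\\b";    /* backspace */
    case 0x09: return "\\t";    /* tab */
    case 0x0A: return "\\n";    /* linefeed */
    case 0x0B: return "\\v";    /* vertical tab */
    case 0x0C: return "\\f";    /* form feed */
    case 0x0D: return "\\r";    /* carriage return */
    default:   return NULL;     /* caller falls back to a hex escape */
    }
}
```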
Reported by Mike Gran <spk121@yahoo.com>.
* libguile/strings.c (scm_i_unistring_escapes_to_guile_escapes,
scm_i_unistring_escapes_to_r6rs_escapes): Augment comments.
(scm_to_stringn): When `handler ==
SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', realloc
BUF so that it's large enough for the worst case.
* libguile/print.c (display_character): When `result != NULL && strategy
== SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE && SCM_R6RS_ESCAPES_P', make
LOCALE_ENCODED large enough to hold an R6RS escape.
This had been removed by commit 07f49ac786
("Factorize and optimize `write' for strings and characters.").
Thanks Mike!
* libguile/print.c (write_combining_character): New procedure.
(write_character): Use it.
* test-suite/tests/chars.test ("basic char handling")["combining accent
is pretty-printed", "combining X is pretty-printed"]: New tests.
* test-suite/tests/encoding-iso88591.test ("characters")["write A
followed by combining accent"]: New test.
* test-suite/tests/encoding-utf8.test ("characters")["write A followed
by combining accent"]: New test.
According to `write.bm', this makes `write' 2.6 times faster for strings.
* libguile/print.c (iprin1): Use `write_character' when
`SCM_WRITINGP (pstate)' and `SCM_CHARP (exp)' or `scm_is_string (exp)'.
(scm_i_charprint): Remove.
(display_character, write_character): New functions.
(scm_write_char): Use `display_character' instead of
`scm_i_charprint'.
* libguile/print.h (scm_i_charprint): Remove declaration.
* benchmark-suite/benchmarks/write.bm: New file.
* benchmark-suite/Makefile.am (SCM_BENCHMARKS): Add
`benchmarks/write.bm'.
This enables more efficient implementations of several operations,
e.g. scm_is_lisp_bool, canonicalize_boolean, fast_boolean_not,
converting SCM booleans to C booleans, etc.
* libguile/tags.h: Renumber IFLAGs.
* libguile/print.c: Renumber iflagnames to match.
* libguile/boolean.c:
* libguile/boolean.h: Rename SCM_XXX_ANOTHER_BOOLEAN_DONT_USE to
SCM_XXX_ANOTHER_BOOLEAN_DONT_USE_0.
This adds a reader option 'r6rs-hex-escapes that modifies the
behavior of numeric escapes in characters and strings. When enabled,
variable-length character hex escapes (#\xNNN) are allowed and become
the default output format for numerically-escaped characters. Also,
string hex escapes switch to a semicolon terminated hex escape (\xNNNN;).
* libguile/print.c (PRINT_CHAR_ESCAPE): new macro
(iprin1): use new macro PRINT_CHAR_ESCAPE
* libguile/private-options.h (SCM_R6RS_ESCAPES_P): new #define
* libguile/read.c (scm_read_opts): add new option r6rs-hex-escapes
(SCM_READ_HEX_ESCAPE): modify to take a terminator parameter
(scm_read_string): parse R6RS hex string escapes
(scm_read_character): parse R6RS hex character escapes
* test-suite/tests/chars.test (with-read-options): new procedure
(R6RS hex escapes): new tests
* test-suite/tests/strings.test (with-read-options): new procedure
(R6RS hex escapes): new tests
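The two output shapes described above (variable-length `#\xNNN' for
characters and semicolon-terminated `\xNNNN;' inside strings) can be
sketched with plain `snprintf'. The function names are hypothetical, and
lower-case hex digits are an assumption of this sketch.

```c
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: R6RS-style escape for CP inside a string,
   i.e. backslash, 'x', variable-length hex, then a semicolon.  */
int
sketch_r6rs_string_escape (char *buf, size_t cap, unsigned long cp)
{
  return snprintf (buf, cap, "\\x%lx;", cp);
}

/* Hypothetical helper: R6RS-style written form of the character CP,
   i.e. "#\x" followed by variable-length hex.  */
int
sketch_r6rs_char_escape (char *buf, size_t cap, unsigned long cp)
{
  return snprintf (buf, cap, "#\\x%lx", cp);
}
```

The terminator is what makes the variable length unambiguous in strings:
without the semicolon, a reader could not tell where the hex digits end and
ordinary text resumes.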
* libguile/tags.h (scm_tc7_gsubr): Return to the pool of unused tc7s, as
there are no more gsubrs. Yay :)
* libguile/programs.h (SCM_F_PROGRAM_IS_PRIMITIVE):
(SCM_PROGRAM_IS_PRIMITIVE): New flag and accessor.
* libguile/gsubr.c (create_gsubr):
* libguile/snarf.h (SCM_STATIC_PROGRAM): Give subrs a PRIMITIVE flag.
* libguile/smob.h:
* libguile/smob.c (scm_i_smob_arity): New internal procedure. Uses the
old GSUBR type macros, local to the file.
* libguile/procprop.c (scm_i_procedure_arity): Call out to
scm_i_smob_arity, and remove a gsubr case.
* libguile/gc.c (scm_i_tag_name):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/goops.c (scm_class_of):
* libguile/vm.c (apply_foreign):
* libguile/hash.c (scm_hasher):
* libguile/debug.c (scm_procedure_name):
* libguile/print.c (iprin1): Remove gsubr cases.
* libguile/gsubr.h (SCM_PRIMITIVE_P): Fix to work with the new VM
program regimen.
(SCM_GSUBR_TYPE, SCM_GSUBR_MAKTYPE, SCM_GSUBR_MAX, SCM_GSUBR_REQ)
(SCM_GSUBR_OPT, SCM_GSUBR_REST): Remove these macros, which are no
longer useful.
* libguile/gsubr.c (scm_i_gsubr_apply, scm_i_gsubr_apply_list)
(scm_i_gsubr_apply_array): Remove internal gsubr application
functions.
* libguile/tags.h (scm_tc7_frame, scm_tc7_objcode, scm_tc7_vm)
(scm_tc7_vm_cont): Take more tc7s for VM-related data structures.
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/gc.c (scm_i_tag_name):
* libguile/goops.c (scm_class_of, create_standard_classes):
* libguile/print.c (iprin1): Add cases for the new tc7s.
* libguile/frames.c:
* libguile/frames.h:
* libguile/objcodes.c:
* libguile/objcodes.h:
* libguile/vm.c:
* libguile/vm.h: Desmobify.
* libguile/vm.c (scm_vm_apply): Export to Scheme, because VM objects are
no longer applicable.
* module/system/repl/command.scm (profile):
* module/system/vm/trace.scm (vm-trace):
* module/system/vm/vm.scm (vm-load): Call vm-apply to run a program in a
VM instead of treating the VM as applicable.
* libguile/foreign.h:
* libguile/foreign.c: New files, implementing simple wrappers around
foreign values, such as those that one might link in dynamically from
a library.
* libguile/tags.h (scm_tc7_foreign): Take a tc7 for foreign values.
* libguile.h:
* libguile/init.c: Add foreign.h to headers and init.
* libguile/print.c (iprin1): Add printer for foreign values.
* libguile/gc.c (scm_i_tag_name): Case for foreign values.
* libguile/goops.c (scm_class_of, create_standard_classes): Add a class
for foreign values.
* libguile/evalext.c (scm_self_evaluating_p): Add case for foreign
values.
* libguile/Makefile.am: Add foreign.[ch] to the build.
Uses of `SCM_UNPACK ()' in these initializers made Sun C 5.8 emit a
warning such as:
line 71: warning: dead part of constant expression is nonconstant
* libguile/print.c (scm_print_opts): Don't use `SCM_UNPACK ()' here.
* libguile/read.c (scm_read_opts): Likewise.