Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/goops.c (scm_make_extended_class_from_symbol): new function
(scm_class_of): don't unpack symbol chars
(wrap_init): don't unpack symbol chars
(make_class_from_symbol): new function
(make_struct_class): don't unpack symbol chars
* libguile/socket.c (scm_recv): receive the message without holding the
stringbuf writing lock
(scm_send): try to narrow a string before using it
* libguile/stime.c (strftime): convert string to UTF-8 so that it can
be safely passed to strftime
(strptime): convert input string to UTF-8 so that it can be safely
passed through strptime
* libguile/strings.c (narrow_stringbuf): new function
(scm_i_try_narrow_string): new function
* libguile/strings.h: new declaration for scm_i_try_narrow_string
* libguile/struct.c (scm_make_struct_layout, scm_struct_init)
(scm_struct_vtable_p, scm_struct_ref, scm_struct_set_x): use string
and symbol accessors and avoid unpacking strings and symbols
* libguile/throw.c (scm_ithrow): allow wide symbols in the error message
* libguile/unif.c (scm_enclose_array, scm_istr2bve): use string
accessors and avoid unpacking strings
Problem was that if an application includes both libguile.h and the
system's setjmp.h, and is compiled on IA64, it gets compile errors
because of jmp_buf, setjmp and longjmp being multiply defined.
* libguile/__scm.h (__ia64__): Define scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP instead of jmp_buf, setjmp and longjmp.
(all other platforms): Map scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP to jmp_buf, setjmp and longjmp.
* libguile/continuations.c (scm_make_continuation): Use `SCM_I_SETJMP'
instead of `setjmp'.
(copy_stack_and_call): Use `SCM_I_LONJMP' instead of `longjmp'.
(scm_ia64_longjmp): Use type `scm_i_jmp_buf' instead of `jmp_buf'.
* libguile/continuations.h (scm_t_contregs): Use type `scm_i_jmp_buf'
instead of `jmp_buf'.
* libguile/threads.c (suspend): Use `SCM_I_SETJMP' instead of
`setjmp'.
* libguile/threads.h (scm_i_thread): Use type `scm_i_jmp_buf' instead
of `jmp_buf'.
* libguile/throw.c (JBJMPBUF, make_jmpbuf, jmp_buf_and_retval): Use
type `scm_i_jmp_buf' instead of `jmp_buf'.
(scm_c_catch): Use `SCM_I_SETJMP' instead of `setjmp'.
(scm_ithrow): Use `SCM_I_LONGJMP' instead of `longjmp'.
* libguile/srcprop.c (scm_set_source_properties_x): Look for the special
source properties, save them off, and then construct a srcprops object
using them.
As with the previous commit, this is to avoid any suggestion that
the source properties API uses the property list format, i.e.
(key1 value1 key2 value2 ...).
Also remove scm_srcprops_to_plist () from the API. It doesn't have any
external usefulness and has never documented.
* libguile/numbers.c (scm_i_print_fraction): use string accessors
(XDIGIT2UINT): use libunistring function
(mem2uinteger, mem2integer, mem2decimal_from_point, mem2ureal)
(mem2complex): take scheme string instead of c string; use accessors
(scm_i_string_to_number): new function
(scm_c_locale_string_to_number): use scm_i_string_to_number
* libguile/numbers.h: declaration for scm_i_string_to_number
* libguile/strings.c (scm_i_string_strcmp): new function
* libguile/strings.h: declaration for scm_i_string_strcmp
* libguile/hash.c (scm_i_string_hash): new function
(scm_hasher): don't unpack string: use scm_i_string_hash
* libguile/hash.h: new declaration for scm_i_string_hash
* libguile/print.c (quote_keywordish_symbol): use symbol accessors
(scm_i_print_symbol_name): new function
(scm_print_symbol_name): call scm_i_print_symbol_name
(iprin1): use scm_i_print_symbol_name to print symbols
* libguile/print.h: new declaration for scm_i_print_symbol_name
* libguile/symbols.c (lookup_interned_symbol): now takes scheme string
instead of c string; callers changed
(lookup_interned_symbol): add wide symbol support
(scm_i_c_mem2symbol): removed
(scm_i_mem2symbol): removed and replaced with scm_i_str2symbol
(scm_i_str2symbol): new function
(scm_i_mem2uninterned_symbol): removed and replaced with
scm_i_str2uninterned_symbol
(scm_i_str2uninterned_symbol): new function
(scm_make_symbol, scm_string_to_symbol, scm_from_locale_symbol)
(scm_from_locale_symboln): use scm_i_str2symbol
* test-suite/tests/symbols.test: new tests
The symbol's characters are only accessed in case they are needed
for an error message. This can be avoided by passing the symbol
all the way to a error message function.
* libguile/__scm.h (SCM_WTA_DISPATCH_1_SUBR): new macro
* libguile/error.c (scm_i_wrong_type_arg_symbol): new error function
* libguile/error.h: declaration of scm_i_wrong_type_arg_symbol
* libguile/eval.c (call_dsubr_1): use new macro SCM_WTA_DISPATCH_1_SUBR
to avoid having to unpack the symbol's chars
* libguile/eval.i.c: use new macro SCM_WTA_DISPATCH_1_SUBR
* libguile/deprecated.c (intern_obarray_soft): new function
(scm_intern_obarray_soft, scm_string_to_obarray_symbol): use
intern_obarray_soft
(scm_gentemp): don't unpack string chars, use intern_obarray_soft
* libguile/discouraged.c (scm_make_keyword_from_dash_symbol): use
symbol accessor
* libguile/tags.h (scm_tc7_program):
* libguile/programs.h: Programs now have their own tc7 code. Fix up the
macros appropriately.
* libguile/programs.c: Remove smobby bits, leaving marking, printing,
and application for other parts of Guile.
* libguile/debug.c (scm_procedure_source):
* libguile/eval.c (scm_trampoline_0, scm_trampoline_1)
(scm_trampoline_2): Add cases for tc7_program.
* libguile/eval.i.c (CEVAL, SCM_APPLY):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/gc-card.c (scm_i_sweep_card, scm_i_tag_name):
* libguile/gc-mark.c (1):
* libguile/print.c (iprin1):
* libguile/procs.c (scm_procedure_p, scm_thunk_p)
* libguile/vm-i-system.c (make-closure): Adapt to new procedure
representation.
* libguile/procprop.c (scm_i_procedure_arity): Do the right thing for
programs.
* test-suite/tests/procprop.test ("procedure-arity"): Arity test now
succeeds.
* libguile/goops.c (scm_class_of): Programs now belong to the class
<procedure>, not a smob class.
* libguile/vm.h (struct vm, struct vm_cont):
* libguile/vm-engine.c (vm_engine):
* libguile/frames.h (SCM_FRAME_BYTE_CAST, struct vm_frame):
* libguile/frames.c (scm_c_make_vm_frame): Fix usages of scm_byte_t,
changing them to scm_t_uint8.
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION): Bump
* libguile/vm-engine.c (vm_engine): Push a frame corresponding to the
mv-call.
* libguile/vm-i-system.c: Renumber ops.
(new-frame): New op, pushes a frame.
(call, mv-call): No need to shuffle args, though we do need to pop the
frame in the non-vm call case.
(goto/args): Inconsequential tweaks.
(call/cc): Push a frame if needed.
* module/language/tree-il/compile-glil.scm (flatten): Emit `new-frame'
as appropriate.
* test-suite/tests/tree-il.test: Fix to expect new-frame.
* libguile/frames.h: Reorder the frame layout so the return address
comes below the arguments.working
(SCM_FRAME_SET_RETURN_ADDRESS, SCM_FRAME_SET_MV_RETURN_ADDRESS): New
macros.
* libguile/frames.c (scm_vm_frame_arguments): Use the macros to access
the arguments.
* libguile/vm-engine.c (vm_engine): Fix for new calling convention.
* libguile/vm-engine.h (INIT_FRAME): New macro. Does part of what
NEW_FRAME used to do.
* libguile/vm-i-system.c (call, mv-call): Shuffle args up to make room
for the stack, and adapt to new calling convention.
(goto/args): Shuffling down is easier now.
(return, return/args): Adapt to new frame layout.
* libguile/vm.c (vm_mark_stack): Adapt to new frame layout, and the
possibility of there being crap on the stack.
(really_make_boot_program): Remove extraneous comment.
* libguile/vm-i-system.c: Remove mark, list-mark, cons-mark,
vector-mark, and list-break, as they are no longer used.
(call, goto/args, mv-call): Remove bits about trampolines, which was
slower, and VM continuations, which are not used (we use Guile's
continuations as the applicable objects).
Renumber ops.
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION): Bump.
* libguile/load.c (scm_init_load_path): Append a slash after
XDG_CACHE_HOME.
* meta/gdb-uninstalled-guile.in:
* meta/guile.in (XDG_CACHE_HOME): Export this var so we write to a cache
within the build directory. Probably we should have a GUILE_CACHE_DIR
to be more specific, though.
* Makefile.am (clean-local): Clear the cache when making clean.
Avoid possible mutex hang when scm_lfwrite_substr is used in error
message output and when an error has caused the stringbuf write
mutex to not be unlocked. scm_lfwrite_substr makes a substring:
making a substring requires that mutex.
Hopefully, all cases of non-local jumps when the stringbuf write
lock is held have been eliminated anyway, making this O.B.E.
* libguile/ports.c (scm_lfwrite_str): include functionality in this
function instead of making this a special case of scm_lfwrite_substr
Conversion from char to scm_t_wchar require an intermediate cast to
unsigned char. By changing the return type of SCM_STRINGBUF_INLINE_CHARS
to unsigned char *, doublecasts in the code can be avoided. Also,
some clarification of return types.
* libguile/strings.c (STRINGBUF_OUTLINE_CHARS)
(STRINGBUF_INLINE_CHARS): now returns unsigned char *; all callers changed.
* libguile/load.h:
* libguile/load.c (scm_sys_warn_autocompilation_enabled): New primitive,
not exported. Since `load' autocompiles now, it should warn in the
same way that the bits hardcoded into C warn.
(scm_try_autocompile): Use scm_sys_warn_autocompilation_enabled.
* module/ice-9/boot-9.scm (autocompiled-file-name): New helper.
(load): Try autocompiling the argument, if appropriate. Will
autocompile files passed on Guile's command line. `primitive-load' is
unaffected.
This should now work thanks to the changes in
28b119ee3d ("make sure all programs are
8-byte aligned"). This commit is a follow-up to
ec99fe8ecb ("Add FIXMEs about misaligned
objcode-metas.").
* libguile/objcodes.c (scm_c_make_objcode_slice): Uncomment assertion
that checks for proper alignment of PTR.
* module/language/assembly/compile-bytecode.scm (write-bytecode): Update
comment about META's alignment.
* module/language/tree-il/compile-glil.scm (compile-glil): Compute
warnings before optimizing, as unreferenced variables will be
optimized out.
* libguile/_scm.h: Fix C99 comment.
* module/language/tree-il/fix-letrec.scm (partition-vars): Also analyze
let-bound vars.
(fix-letrec!): Fix a bug whereby a set! to an unreffed var would be
called for value, not effect. Also "fix" <let>-bound lambda
expressions -- really speeds up pmatch.
* test-suite/tests/tree-il.test ("lexical sets", "the or hack"): Update
to take into account the new optimizations.
This requres the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_hander constants as #define's
* libguile/string.c (scm_string): Restores the functionality
where scm_string tests for circular lists
* test-suite/tests/strings.test: add test for circular lists
* libguile/_scm.h (SCM_OBJCODE_MINOR_VERSION): Bump.
* libguile/vm-engine.c (vm_error_bad_wide_string_length): New error
case.
* libguile/vm-i-loader.c (load-unsigned-integer, load-integer)
(load-keyword): Remove these instructions. The former two are
obsoleted by make-int64/make-uint64, the latter via make-keyword.
(load-string): Only handle narrow strings.
(load-symbol): Only handle narrow symbols. The wide case is handled
via make-symbol.
(load-wide-string): New instruction, for wide strings.
* libguile/vm-i-system.c (define): Move here from loaders.c, as now it
just takes a sym on the stack.
(make-keyword, make-symbol): New instructions.
* module/language/assembly.scm: Remove removed instructions. No more
width byte in load-string etc.
* module/language/assembly/compile-bytecode.scm (write-bytecode): Adapt
to change in instruction set.
* module/language/glil/compile-assembly.scm (glil->assembly): Compile
define by pushing the sym then emitting (define).
(dump-object): Dump narrow and wide strings differently. Use
make-keyword and make-symbol as appropriate.
* module/language/tree-il/compile-glil.scm (flatten): When compiling a
ref to a primitive (not a call), first see if the primitive is
actually bound in the root module. (That's not the case with e.g.
bytevector-u8-ref).
* module/system/xref.scm (program-callee-rev-vars): Don't parse out
"nexts".
* test-suite/tests/asm-to-bytecode.test ("compiler"): Adapt to bytecode
format change.
To avoid leaving Guile in a broken state, the conversion
from locale-dependent case modification to Unicode case modification
should be an atomic commit
* libguile/chars.c (scm_c_upcase): revert to locale-dependent
toupper and tolower
* libguile/strings.c (unistring_escapes_to_guile_escapes): cast
tolower's parameter to int
* libguile/read.c (CHAR_DOWNCASE): cast tolower's parameter to int