Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
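
As a rough illustration of the magic-comment lookup described above, here is a minimal sketch in plain C. It is not the code of scm_scan_for_encoding; the helper name, its signature, and the set of characters accepted in an encoding name are assumptions made for this example.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper, not Guile's scm_scan_for_encoding: look for
   "coding:" in the first LEN bytes of HEADER and copy the encoding
   name that follows into OUT.  Returns 1 on success, 0 otherwise.  */
static int
find_coding_declaration (const char *header, size_t len,
                         char *out, size_t out_size)
{
  static const char key[] = "coding:";
  const size_t key_len = sizeof key - 1;
  size_t i, n = 0;

  assert (out != NULL && out_size > 0);

  for (i = 0; i + key_len <= len; i++)
    if (memcmp (header + i, key, key_len) == 0)
      break;
  if (i + key_len > len)
    return 0;                   /* no declaration found */

  i += key_len;
  while (i < len && (header[i] == ' ' || header[i] == '\t'))
    i++;                        /* skip blanks after the colon */

  /* Assumption: names consist of letters, digits, '-', '_' and '.'.  */
  while (i < len && n + 1 < out_size
         && (isalnum ((unsigned char) header[i])
             || header[i] == '-' || header[i] == '_' || header[i] == '.'))
    out[n++] = header[i++];
  out[n] = '\0';

  return n > 0;
}
```

A caller would pass only the first few hundred bytes of the file, since the declaration is expected at the top.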
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
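
The null-pointer convention can be pictured with the sketch below. The function name and signature are hypothetical and only the Latin-1 fast path is shown; a real implementation would hand any named encoding to a conversion library.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical decoder: ENCODING == NULL stands for Latin-1, in which
   every byte is already its own code point, so no conversion library
   is needed.  Returns the number of code points written, or
   (size_t) -1 for an encoding this sketch does not handle.  */
static size_t
decode_bytes (const char *encoding, const unsigned char *in, size_t len,
              uint32_t *out)
{
  size_t i;

  assert (in != NULL && out != NULL);

  if (encoding == NULL)
    {
      /* Fast path: widen each byte directly.  */
      for (i = 0; i < len; i++)
        out[i] = in[i];
      return len;
    }

  /* Named encodings would be converted here; omitted in this sketch.  */
  return (size_t) -1;
}
```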
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_sharp)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a C
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now use the port's encoding
(scm_unget_byte): new function, incorporating the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being used in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/hash.c (scm_i_string_hash): new function
(scm_hasher): don't unpack string: use scm_i_string_hash
* libguile/hash.h: new declaration for scm_i_string_hash
* libguile/print.c (quote_keywordish_symbol): use symbol accessors
(scm_i_print_symbol_name): new function
(scm_print_symbol_name): call scm_i_print_symbol_name
(iprin1): use scm_i_print_symbol_name to print symbols
* libguile/print.h: new declaration for scm_i_print_symbol_name
* libguile/symbols.c (lookup_interned_symbol): now takes scheme string
instead of C string; callers changed
(lookup_interned_symbol): add wide symbol support
(scm_i_c_mem2symbol): removed
(scm_i_mem2symbol): removed and replaced with scm_i_str2symbol
(scm_i_str2symbol): new function
(scm_i_mem2uninterned_symbol): removed and replaced with
scm_i_str2uninterned_symbol
(scm_i_str2uninterned_symbol): new function
(scm_make_symbol, scm_string_to_symbol, scm_from_locale_symbol)
(scm_from_locale_symboln): use scm_i_str2symbol
* test-suite/tests/symbols.test: new tests
* libguile/tags.h (scm_tc7_program):
* libguile/programs.h: Programs now have their own tc7 code. Fix up the
macros appropriately.
* libguile/programs.c: Remove smobby bits, leaving marking, printing,
and application for other parts of Guile.
* libguile/debug.c (scm_procedure_source):
* libguile/eval.c (scm_trampoline_0, scm_trampoline_1)
(scm_trampoline_2): Add cases for tc7_program.
* libguile/eval.i.c (CEVAL, SCM_APPLY):
* libguile/evalext.c (scm_self_evaluating_p):
* libguile/gc-card.c (scm_i_sweep_card, scm_i_tag_name):
* libguile/gc-mark.c (1):
* libguile/print.c (iprin1):
* libguile/procs.c (scm_procedure_p, scm_thunk_p):
* libguile/vm-i-system.c (make-closure): Adapt to new procedure
representation.
* libguile/procprop.c (scm_i_procedure_arity): Do the right thing for
programs.
* test-suite/tests/procprop.test ("procedure-arity"): Arity test now
succeeds.
* libguile/goops.c (scm_class_of): Programs now belong to the class
<procedure>, not a smob class.
* libguile/vm.h (struct vm, struct vm_cont):
* libguile/vm-engine.c (vm_engine):
* libguile/frames.h (SCM_FRAME_BYTE_CAST, struct vm_frame):
* libguile/frames.c (scm_c_make_vm_frame): Fix usages of scm_byte_t,
changing them to scm_t_uint8.
This requires the creation of a new type
scm_t_string_failed_conversion_handler to replace libunistring's
enum iconveh_ilseq_handler.
* libguile/strings.h: don't include <uniconv.h>
(scm_t_string_failed_conversion_handler): new enum type
(SCM_FAILED_CONVERSION_ERROR, SCM_FAILED_CONVERSION_QUESTION_MARK):
(SCM_FAILED_CONVERSION_ESCAPE_SEQUENCE): new enum type values
* libguile/strings.c (scm_to_stringn): now takes type
scm_t_string_failed_conversion_handler. All callers changed.
* libguile/print.c: include <uniconv.h>
* libguile/ports.c (scm_lfwrite_substr): use
scm_t_string_failed_conversion_handler's constants
* libguile/gen-scmconfig.c (SCM_ICONVEH_ERROR):
(SCM_ICONVEH_QUESTION_MARK, SCM_ICONVEH_ESCAPE_SEQUENCE): store
iconveh_ilseq_handler constants as #define's
Also, scm_charprint is renamed to scm_i_charprint.
* libguile/strings.h: make scm_i_string_wide_chars internal.
* libguile/print.h: rename scm_charprint to scm_i_charprint. Make
internal.
* libguile/print.c (scm_i_charprint): renamed from scm_charprint
(scm_charprint): renamed to scm_i_charprint. All callers changed.
This adds full Unicode strings as a datatype, and it adds some
minimal functionality. The terminal and port encoding is assumed
to be ISO-8859-1. Non-ISO-8859-1 characters are written or
input as string character escapes.
The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
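
To make the three escape forms concrete, the following sketch picks the shortest form that fits a code point. The helper name is hypothetical, and the use of lowercase hex digits is an assumption; the text above does not say which case the printer emits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: write the string escape for code point C into
   BUF.  Uses \xXX up to 0xFF, \uXXXX up to 0xFFFF and \UXXXXXX above
   that.  Returns the value of snprintf, i.e. the escape's length.  */
static int
format_char_escape (uint32_t c, char *buf, size_t size)
{
  assert (c <= 0x10FFFF);

  if (c <= 0xFF)
    return snprintf (buf, size, "\\x%02lx", (unsigned long) c);
  if (c <= 0xFFFF)
    return snprintf (buf, size, "\\u%04lx", (unsigned long) c);
  return snprintf (buf, size, "\\U%06lx", (unsigned long) c);
}
```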
The process for writing to strings has been modified. There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
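
The copy-on-write step can be sketched as follows. This is a simplified model with hypothetical types and names, not the layout of Guile's stringbufs; it assumes a plain reference count marks a buffer as shared.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical types: several strings may share one reference-counted
   character buffer until one of them is written to.  */
struct char_buffer
{
  size_t refs;                  /* number of strings sharing this buffer */
  size_t len;
  char *chars;
};

struct string
{
  struct char_buffer *buf;
};

static struct char_buffer *
buffer_new (const char *chars, size_t len)
{
  struct char_buffer *b = malloc (sizeof *b);
  if (b == NULL)
    return NULL;
  b->chars = malloc (len + 1);
  if (b->chars == NULL)
    {
      free (b);
      return NULL;
    }
  memcpy (b->chars, chars, len);
  b->chars[len] = '\0';
  b->refs = 1;
  b->len = len;
  return b;
}

/* Give S a private buffer if it currently shares one, then return the
   writable characters.  Returns NULL if the copy cannot be allocated.  */
static char *
string_start_writing (struct string *s)
{
  assert (s != NULL && s->buf != NULL);

  if (s->buf->refs > 1)
    {
      struct char_buffer *copy = buffer_new (s->buf->chars, s->buf->len);
      if (copy == NULL)
        return NULL;
      s->buf->refs--;           /* drop our share of the old buffer */
      s->buf = copy;
    }
  return s->buf->chars;
}
```

Writers call the start function before touching the characters, so a string that does not share its buffer pays no copying cost.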
To compile strings that may be wide, the VM storage of strings and
string-likes has changed.
Most string-using functions have not yet been updated and may break
when used with wide strings.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
use variable width string bytecode format
* module/language/assembly.scm (byte-length): use variable width
bytecode format
* libguile/vm-i-loader.c (load-string, load-symbol):
(load-keyword, define): use variable-width bytecode format
* libguile/vm-engine.h (FETCH_WIDTH): new macro
* libguile/strings.h: new declarations
* libguile/strings.c (make_wide_stringbuf): new function
(widen_stringbuf): new function
(scm_i_make_wide_string): new function
(scm_i_is_narrow_string): new function
(scm_i_string_wide_chars): new function
(scm_i_string_start_writing): new function
(scm_i_string_ref): new function
(scm_i_string_set_x): new function
(scm_i_is_narrow_symbol): new function
(scm_i_symbol_wide_chars, scm_i_symbol_ref): new function
(scm_string_width): new function
(unistring_escapes_to_guile_escapes): new function
(scm_to_stringn): new function
(scm_i_stringbuf_free): modify for wide strings
(scm_i_substring_copy): modify for wide strings
(scm_i_string_chars, scm_string_append): modify for wide strings
(scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
(scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
(scm_string, scm_i_deprecated_string_chars): modify for wide strings
(scm_from_locale_string, scm_from_locale_stringn): add null test
* libguile/srfi-13.c: add calls for scm_i_string_start_writing for
each call of scm_i_string_stop_writing
(scm_string_for_each): modify for wide strings
* libguile/socket.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/rw.c: add calls for scm_i_string_start_writing for each
call of scm_i_string_stop_writing
* libguile/read.c (scm_read_string): allow reading of wide strings
* libguile/print.h: add declaration for scm_charprint
* libguile/print.c (iprin1): print wide strings and add new string
escapes
(scm_charprint): new function
* libguile/ports.h: new declarations for scm_lfwrite_substr and
scm_lfwrite_str
* libguile/ports.c (update_port_lf): new function
(scm_lfwrite): use update_port_lf
(scm_lfwrite_substr): new function
(scm_lfwrite_str): new function
* test-suite/tests/asm-to-bytecode.test ("compiler"): add string
width byte to string-like asm tests
This adds the 32-bit standalone characters. Strings are still
8-bit. For now, characters larger than 8 bits can only be entered or
displayed in octal format, and the terminal's display encoding is
expected to be Latin-1.
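
A minimal sketch of what a 32-bit character check and an octal rendering might look like is given below. The names are hypothetical stand-ins rather than the SCM_ macros listed in this entry; the validity test follows the Unicode definition of a scalar value (at most 0x10FFFF, excluding the surrogate range), and the `#\` prefix before the octal digits is an assumption about the printed form.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

#define CODEPOINT_MAX 0x10FFFFUL        /* largest Unicode code point */

/* Hypothetical check: nonzero if C is a Unicode scalar value, i.e.
   within range and not a UTF-16 surrogate.  */
static int
is_unicode_char (uint32_t c)
{
  return c <= CODEPOINT_MAX && !(c >= 0xD800 && c <= 0xDFFF);
}

/* Hypothetical printer: render C as "#\" followed by octal digits.
   Returns the value of snprintf.  */
static int
format_char_octal (uint32_t c, char *buf, size_t size)
{
  assert (is_unicode_char (c));
  return snprintf (buf, size, "#\\%lo", (unsigned long) c);
}
```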
* module/language/assembly/compile-bytecode.scm (write-bytecode):
add 32-bit char
* module/language/assembly.scm (object->assembly): add 32-bit char
(assembly->object): add 32-bit char
* libguile/vm-i-system.c (make-char32): new op
* libguile/print.c (iprin1): print 32-bit char
* libguile/numbers.h: add type scm_t_wchar
* libguile/numbers.c: add type scm_t_wchar
* libguile/chars.h: new type scm_t_wchar
(SCM_CODEPOINT_MAX): new
(SCM_IS_UNICODE_CHAR): new
(SCM_MAKE_CHAR): operate on 32-bit char
* libguile/chars.c: comparison operators now use Unicode
codepoints
(scm_c_upcase): now receives and returns scm_t_wchar
(scm_c_downcase): now receives and returns scm_t_wchar
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_num.
Also, the incomplete and broken EBCDIC support is removed.
* libguile/print.c (iprin1): use new func scm_i_charname
* libguile/read.c (scm_read_character): use new func
scm_i_charname_to_num
* libguile/chars.c (scm_i_charname): new function
(scm_i_charname_to_char): new function
(scm_charnames, scm_charnums): removed
* libguile/chars.h: new declarations
The idea is to introduce `gsubrs' whose arity is encoded in their type
(more precisely in the sizeof (void *) - 8 MSBs). This removes the
indirection introduced by cclos and simplifies the code.
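
One way to picture arity stored in the bits above the type tag is the sketch below. The tag value, the field widths, and all names are assumptions chosen for illustration; they are not taken from gsubr.h or tags.h.

```c
#include <assert.h>
#include <stdint.h>

#define TAG_BITS  8u            /* low bits reserved for the type tag */
#define TAG_MASK  0xffu
#define TAG_GSUBR 0x7fu         /* made-up tag value for this sketch */

/* Pack REQ required, OPT optional and RST (0 or 1) rest arguments
   into the bits above the tag.  Hypothetical layout: 4 bits each for
   REQ and OPT, 1 bit for RST.  */
static uintptr_t
make_gsubr_type (unsigned req, unsigned opt, unsigned rst)
{
  uintptr_t arity;

  assert (req < 16 && opt < 16 && rst < 2);
  arity = (uintptr_t) req | ((uintptr_t) opt << 4) | ((uintptr_t) rst << 8);
  return (arity << TAG_BITS) | TAG_GSUBR;
}

static unsigned
gsubr_req (uintptr_t type)
{
  return (unsigned) ((type >> TAG_BITS) & 0xf);
}

static unsigned
gsubr_opt (uintptr_t type)
{
  return (unsigned) ((type >> (TAG_BITS + 4)) & 0xf);
}

static unsigned
gsubr_rst (uintptr_t type)
{
  return (unsigned) ((type >> (TAG_BITS + 8)) & 0x1);
}
```

With the arity readable from the type word itself, a dispatcher can decode the argument counts without following a pointer to a separate closure object.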
* libguile/__scm.h (CCLO): Remove.
* libguile/debug.c (scm_procedure_source, scm_procedure_environment):
Remove references to `scm_tc7_cclo'.
* libguile/eval.c (scm_trampoline_0, scm_trampoline_1,
scm_trampoline_2): Replace `scm_tc7_cclo' with `scm_tc7_gsubr'.
* libguile/eval.i.c (CEVAL): Likewise. No longer make PROC the first
argument. Directly invoke `scm_gsubr_apply ()' instead of jump to the
`evap(N+1)' label or call to `SCM_APPLY ()'.
* libguile/evalext.c (scm_self_evaluating_p): Remove reference to
`scm_tc7_cclo'.
* libguile/gc-card.c (scm_i_sweep_card, scm_i_tag_name): Likewise.
* libguile/gc-mark.c (scm_gc_mark_dependencies): Likewise.
* libguile/goops.c (scm_class_of): Likewise.
* libguile/print.c (iprin1): Likewise.
* libguile/gsubr.c (create_gsubr): Use `unsigned int's for REQ, OPT and
RST. Use `scm_tc7_gsubr' instead of `scm_makcclo ()' in the default
case.
(scm_gsubr_apply): Remove calls to `SCM_GSUBR_PROC ()'.
(scm_f_gsubr_apply): Remove.
* libguile/gsubr.h (SCM_GSUBR_TYPE): New definition.
(SCM_GSUBR_MAX): Changed to 33.
(SCM_SET_GSUBR_TYPE, SCM_GSUBR_PROC, SCM_SET_GSUBR_PROC,
scm_f_gsubr_apply): Remove.
* libguile/procprop.c (scm_i_procedure_arity): Remove reference to
`scm_tc7_cclo'; add proper handling of `scm_tc7_gsubr'.
* libguile/procs.c (scm_makcclo, scm_make_cclo): Remove.
(scm_procedure_p): Remove reference to `scm_tc7_cclo'.
(scm_thunk_p): Likewise, plus add proper `scm_tc7_gsubr' handling.
* libguile/procs.h (SCM_CCLO_LENGTH, SCM_MAKE_CCLO_TAG,
SCM_SET_CCLO_LENGTH, SCM_CCLO_BASE, SCM_SET_CCLO_BASE, SCM_CCLO_REF,
SCM_CCLO_SET, SCM_CCLO_SUBR, SCM_SET_CCLO_SUBR, scm_makcclo,
scm_make_cclo): Remove.
* libguile/stacks.c (read_frames): Remove reference to `scm_f_gsubr_apply'.
* libguile/tags.h (scm_tc7_cclo): Remove.
(scm_tc7_gsubr): New.
(scm_tcs_subrs): Add `scm_tc7_gsubr'.
* libguile/print.c (iprin1): When displaying a weak vector, access
elements via `scm_c_vector_ref ()', not via the macro.
git-archimport-id: lcourtes@laas.fr--2005-libre/guile-core--boehm-gc--1.9--patch-14
the value at its top. This fixes a reference leak.
(PUSH_REF): Perform `pstate->top++' after calling
`PSTATE_STACK_SET ()' in order to avoid undesired potential side
effects.
New.
(sym_reader): New.
(scm_print_opts): Added "quote-keywordish-symbols" option.
(quote_keywordish_symbol): New, for evaluating the option.
(scm_print_symbol_name): Use it.
(scm_init_print): Initialize new option to sym_reader.
Removed ref_stack field.
(PSTATE_STACK_REF, PSTATE_STACK_SET): New, for accessing the stack
of a print state. Use them everywhere instead of ref_stack.
print.c, ports.c, mallocs.c, hooks.c, hashtab.c, fports.c,
guardians.c, filesys.c, coop-pthreads.c, continuations.c: Use
scm_uintprint to print unsigned integers, raw heap words, and
addresses, using a cast to scm_t_bits to turn pointers into
integers.
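
The idea of printing addresses through an unsigned-integer printer can be sketched as below. The formatter is a hypothetical stand-in that writes to a buffer rather than a port, and uintptr_t plays the role that the entry above gives to scm_t_bits.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical formatter: write N in RADIX (2..36) into BUF as a
   NUL-terminated string.  Returns the number of digits written, or 0
   if BUF is too small.  */
static size_t
format_uint (uintmax_t n, int radix, char *buf, size_t size)
{
  static const char digits[] = "0123456789abcdefghijklmnopqrstuvwxyz";
  char tmp[sizeof (uintmax_t) * CHAR_BIT];      /* enough for radix 2 */
  size_t i = 0, len;

  assert (radix >= 2 && radix <= 36);

  do
    {
      tmp[i++] = digits[n % (uintmax_t) radix];
      n /= (uintmax_t) radix;
    }
  while (n != 0);

  if (i + 1 > size)
    return 0;

  len = i;
  while (i > 0)
    *buf++ = tmp[--i];          /* digits were produced in reverse */
  *buf = '\0';
  return len;
}

/* Print a pointer by first turning it into an integer.  */
static size_t
format_address (const void *p, char *buf, size_t size)
{
  return format_uint ((uintmax_t) (uintptr_t) p, 16, buf, size);
}
```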
SCM_PRINT_HIGHLIGHT_SUFFIX): New printer options.
(scm_iprin1): Use them instead of the previously hardcoded
strings.
(scm_init_print): Initialize them.
* print.c (make_print_state, scm_free_print_state): Initialize it
to SCM_EOL.
(scm_iprin1): Wrap output in '{...}' when object is contained in
highlight_objects.
(SCM_VALIDATE_STRING_COPY): Deprecated. Replaced all uses with
SCM_VALIDATE_STRING plus SCM_I_STRING_CHARS or
scm_to_locale_string, etc.
(SCM_VALIDATE_SUBSTRING_SPEC_COPY): Deprecated. Replaced as
above, plus scm_i_get_substring_spec.
* regex-posix.c, read.c, random.c, ramap.c, print.c, numbers.c,
hash.c, gc.c, gc-card.c, convert.i.c, backtrace.c, strop.c,
strorder.c, strports.c, struct.c, symbols.c, unif.c, ports.c: Use
SCM_I_STRING_CHARS, SCM_I_STRING_UCHARS, and SCM_I_STRING_LENGTH
instead of SCM_STRING_CHARS, SCM_STRING_UCHARS, and
SCM_STRING_LENGTH, respectively. Also, replaced scm_return_first
with more explicit scm_remember_upto_here_1, etc, or introduced
them in the first place.
SCM_INUM): Deprecated by renaming them to SCM_I_INUMP, SCM_I_NINUMP
and SCM_I_INUM, respectively and adding deprecated versions to
deprecated.h and deprecated.c. Changed all uses to either use the
SCM_I_ variants or scm_is_*, scm_to_*, or scm_from_*, as appropriate.
SCM_NEGATE_BOOL, SCM_BOOLP): Deprecated by moving into "deprecated.h".
Replaced all uses with scm_is_false, scm_is_true, scm_from_bool, and
scm_is_bool, respectively.
scm_i_unmemoize_expr for unmemoizing a memoized object holding a
single memoized expression.
* debug.c (memoized_print): Don't try to unmemoize the memoized
object, since we can't know whether it holds a single expression
or a body.
(scm_mem_to_proc): Removed check for lambda expression, since it
was moot anyway. Whoever uses these functions for debugging
purposes should know what they do: Creating invalid memoized code
will cause crashes, independent of whether this check is present
or not.
(scm_proc_to_mem): Take the closure's code as it is and don't
append a SCM_IM_LAMBDA isym. To allow easier debugging, the
memoized code should not be modified.
* debug.[ch] (scm_unmemoize, scm_i_unmemoize_expr): Removed
scm_unmemoize from public use, but made scm_i_unmemoize_expr
available as a guile internal function instead. However,
scm_i_unmemoize_expr will only work on memoized objects that hold
a single memoized expression. It won't work with bodies.
* debug.c (scm_procedure_source), macros.c (macro_print), print.c
(scm_iprin1): Call scm_i_unmemocopy_body for unmemoizing a body,
i.e. a list of expressions.
* eval.c (unmemoize_exprs): Drop internal body markers from the
output during unmemoization.
* eval.[ch] (scm_unmemocopy, scm_i_unmemocopy_expr,
scm_i_unmemocopy_body): Removed scm_unmemocopy from public use,
but made scm_i_unmemocopy_expr and scm_i_unmemocopy_body available
as guile internal functions instead. scm_i_unmemocopy_expr will
only work on a single memoized expression, while
scm_i_unmemocopy_body will only work on bodies.
* deprecated.h (SCM_IFRINC, SCM_ICDR, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP), eval.c (SCM_IFRINC, SCM_ICDR, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP), eval.h (SCM_ICDR, SCM_IFRINC, SCM_IFRAME, SCM_IDIST,
SCM_ICDRP): Deprecated and added to deprecated.h. Moved from
eval.h to eval.c.
* deprecated.c (scm_isymnames), deprecated.h (scm_isymnames,
SCM_ISYMNUM, SCM_ISYMCHARS), eval.c (SCM_ISYMNUM, isymnames,
scm_unmemocopy, CEVAL), print.c (scm_isymnames), tags.h
(SCM_ISYMNUM, scm_isymnames, SCM_ISYMCHARS): Deprecated
scm_isymnames, SCM_ISYMNUM and SCM_ISYMCHARS and added to
deprecated.[hc]. Moved scm_isymnames from print.c to eval.c and
renamed to isymnames. Moved SCM_ISYMNUM from tags.h to eval.c and
renamed to ISYMNUM.
* eval.c (scm_i_print_iloc, scm_i_print_isym), eval.h
(scm_i_print_iloc, scm_i_print_isym), print.c (scm_iprin1):
Extracted printing of ilocs and isyms to guile internal functions
scm_i_print_iloc, scm_i_print_isym of eval.c.