This reverts the change to SCM_MAKE_CHAR made in the previous commit
63818453ad, which used an arithmetic trick
to avoid evaluating its argument more than once.
Here, we restore the previous implementation of SCM_MAKE_CHAR, which
evaluates its argument twice. Instead, we introduce a new inlinable
function 'scm_c_make_char' and replace uses of SCM_MAKE_CHAR with calls
to 'scm_c_make_char' where appropriate.
* libguile/chars.h (scm_c_make_char): New inline function.
* libguile/inline.c: Include chars.h.
* libguile/srfi-13.c (REF_IN_CHARSET, scm_string_any, scm_string_every)
(scm_string_trim, scm_string_trim_right, scm_string_trim_both)
(scm_string_index, scm_string_index_right, scm_string_skip)
(scm_string_skip_right, scm_string_count, string_titlecase_x)
(string_reverse_x, scm_string_fold, scm_string_fold_right)
(scm_string_for_each, scm_string_filter, scm_string_delete):
Use 'scm_c_make_char' instead of 'SCM_MAKE_CHAR' in cases where the
argument calls a function.
* libguile/chars.c (scm_char_upcase, scm_char_downcase, scm_char_titlecase),
libguile/ports.c (scm_port_decode_char),
libguile/print.c (scm_simple_format),
libguile/read.c (scm_read_character),
libguile/strings.c (scm_string_ref, scm_c_string_ref),
The motivation for this change is that SCM_MAKE_CHAR is sometimes passed
an expression that involves a procedure call that is not always trivial.
In other cases, the results are not guaranteed to be the same both
times, which could lead to the creation of invalid SCM objects.
* libguile/chars.h (SCM_MAKE_CHAR): Reimplement.
As the FSF advises, 'There is no legal significance to using the
three-character sequence “(C)”, but it does no harm.' It does take up
space though! For that reason, we remove it here from our C files.
* libguile/chars.c, libguile/chars.h (scm_char_general_category): New function.
* test-suite/tests/chars.test: Unit tests for `char-general-category'.
* doc/ref/api-data.texi (Characters): Documentation for
`char-general-category'.
* doc/ref/api-data.texi (Characters): Documentation for `char-titlecase'.
* doc/ref/api-i18n.texi (Character Case Mapping): Documentation for
`char-locale-titlecase' and `string-locale-titlecase'.
* libguile/chars.c, libguile/chars.h (scm_char_titlecase, scm_c_titlecase): New
functions.
* libguile/i18n.c, libguile/i18n.h (chr_to_case, scm_char_locale_titlecase,
str_to_case, scm_string_locale_titlecase): New functions.
* libguile/i18n.c (scm_char_locale_downcase, scm_char_locale_upcase,
scm_string_locale_downcase, scm_string_locale_upcase): Refactor to share code
via chr_to_case and str_to_case, as appropriate.
* module/ice-9/i18n.scm (char-locale-title-case, string-locale-titlecase): New
functions.
* libguile/srfi-13.c (string_titlecase_x): Use uc_totitle instead of uc_toupper.
* test-suite/tests/chars.test: Tests for `char-titlecase'.
* test-suite/tests/i18n.test: Tests for `char-locale-titlecase' and
`string-locale-titlecase'.
* test-suite/tests/srfi-13.test: Tests for `string-titlecase'.
Since combining characters, such as accents, modify the appearance of the
previous letter, it looks awkward in its character literal form (#\name)
since it modified the backslash. This instead prints the combining
character on a small circle.
* libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define
* libguile/print.c (iprint1): print combining characters on dotted circles
* libguile/read.c (scm_read_character): parse the combination of combining
characters and dotted circles
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
Since the contents of SCM_MAKE_CHAR are evaluated more than once,
don't use it in situations where this could cause side-effects.
* libguile/vm-i-system.c (make-char8): avoid side-effects with
SCM_MAKE_CHAR call
* libguile/chars.h (SCM_MAKE_CHAR): modified
This adds the 32-bit standalone characters. Strings are still
8-bit. Characters larger than 8-bit can only be entered or
displayed in octal format at this point. At this point, the
terminal's display encoding is expected to be Latin-1.
* module/language/assembly/compile-bytecode.scm (write-bytecode):
add 32-bit char
* module/language/assembly.scm (object->assembly): add 32-bit char
(assembly->object): add 32-bit char
* libguile/vm-i-system.c (make-char32): new op
* libguile/print.c (iprin1): print 32-bit char
* libguile/numbers.h: add type scm_t_wchar
* libguile/numbers.c: add type scm_t_wchar
* libguile/chars.h: new type scm_t_wchar
(SCM_CODEPOINT_MAX): new
(SCM_IS_UNICODE_CHAR): new
(SCM_MAKE_CHAR): operate on 32-bit char
* libguile/chars.c: comparison operators now use Unicode
codepoints
(scm_c_upcase): now receives and returns scm_t_wchar
(scm_c_downcase): now receives and returns scm_t_wchar
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_num.
Also, the incomplete and broken EBCDIC support is removed.
* libguile/print.c (iprin1): use new func scm_i_charname
* libguile/read.c (scm_read_character): use new func
scm_i_charname_to_num
* libguile/chars.c (scm_i_charname): new function
(scm_i_charname_to_char): new function
(scm_charnames, scm_charnums): removed
* libguile/chars.h: new declarations
before converting to unsigned int. This prevents hi-bit ascii as
being converted large integers.
(string_upcase_x): change caller for scm_{up,down}case to
scm_c_{up,down}case
* chars.h (scm_init_chars): change scm_{upcase,downcase} to
scm_c_{up,down}case.
(SCM_MAKE_CHAR): add (unsigned char) cast. This prevents havoc
when hi-bit ASCII is subjected to SCM_MAKE_CHAR().
added append docs from R4RS.
* strings.c: Docstring typo fix, + eliminate unneeded IMP tests.
Thanks Dirk Hermann!
* chars.h: Provide SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR and
deprecate SCM_ICHRP, SCM_ICHR, SCM_MAKICHR. Thanks Dirk Hermann!
* *.h, *.c: Use SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR throughout.
Drop use of SCM_P for function prototypes... assume an ANSI C
compiler. Thanks Dirk Hermann!
main #include path; put most of them in a subdirectory called
'libguile'. This avoids naming conflicts between Guile header
files and system header files (of which there were a few).
* Makefile.in (pkgincludedir): Deleted.
(innerincludedir): New variable; this and $(includedir) are enough.
(INCLUDE_CFLAGS): Search for headers in "-I$(srcdir)/..".
(installed_h_files): Divide this up. Now this variable lists
those header files which should go into $(includedir) (i.e. appear
directly in the #include path), and ...
(inner_h_files): ... this new variable says which files appear in
a subdirectory, and are referred to as <libguile/mumble.h>.
(h_files): List them both.
(install): Create innerincludedir, not pkgincludedir. Put
the installed_h_files and inner_h_files in their proper places.
(uninstall): Corresponding changes.
* alist.h, append.h, arbiters.h, async.h, boolean.h, chars.h,
continuations.h, debug.h, dynwind.h, error.h, eval.h, fdsocket.h,
feature.h, fports.h, gc.h, genio.h, gsubr.h, hash.h, init.h,
ioext.h, kw.h, libguile.h, list.h, markers.h, marksweep.h,
mbstrings.h, numbers.h, options.h, pairs.h, ports.h, posix.h,
print.h, procprop.h, procs.h, ramap.h, read.h, root.h,
sequences.h, smob.h, socket.h, srcprop.h, stackchk.h, stime.h,
strings.h, strop.h, strorder.h, strports.h, struct.h, symbols.h,
tag.h, throw.h, unif.h, variable.h, vectors.h, version.h,
vports.h, weaks.h: Find __scm.h in its new location.
* __scm.h: Find scmconfig.h and tags.h in their new locations
(they're both "inner" files).