* libguile/strports.c (scm_i_mkstrport): Remove.
(scm_mkstrport): Don't change the port's encoding to UTF-8; convert
STR to the default port encoding.
(scm_strport_to_string): Fix documentation & indentation.
* libguile/strports.h (scm_i_mkstrport): Remove.
* test-suite/lib.scm (exception:encoding-error): New variable.
(format-test-name): Set `%default-port-encoding' to "UTF-8".
* test-suite/tests/ports.test ("string ports")["%default-port-encoding
is honored", "suitable encoding [latin-1]", "suitable encoding
[latin-3]", "wrong encoding"]: New tests.
* test-suite/tests/r6rs-ports.test ("7.2.11 Binary
Output")["put-bytevector with UTF-16 string port", "put-bytevector
with wrong-encoding string port"]: New tests.
* test-suite/tests/reader.test (read-string): Set
`%default-port-encoding' to `#f'.
("reading")["unprintable symbol"]: Use a string that doesn't contain
zeros.
* doc/ref/api-io.texi (String Ports): Document encoding issues with
`call-with-output-string' and `with-output-to-string'.
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
scm_eval_string_in_module): New prototypes.
* strports.c (scm_eval_string_in_module): New, but use
"eval-string" as the Scheme name and make second parameter
optional.
(scm_eval_string): Implement using scm_eval_string_in_module.
(scm_c_eval_string_in_module): New.
Thanks to Ralf Mattes for the suggestion!
added append docs from R4RS.
* strings.c: Docstring typo fix, + eliminate unneeded IMP tests.
Thanks Dirk Hermann!
* chars.h: Provide SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR and
deprecate SCM_ICHRP, SCM_ICHR, SCM_MAKICHR. Thanks Dirk Hermann!
* *.h, *.c: Use SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR throughout.
Drop use of SCM_P for function prototypes... assume an ANSI C
compiler. Thanks Dirk Hermann!
* strports.c (scm_strprint_obj): simplify. start with initial
buffer size of 0.
(st_seek): don't allow string to be extended if seeking past
the end of a read-only port.
1999-07-12 Gary Houston <ghouston@easynet.co.uk>
* strports.c (st_seek): change the resize checks.
* ports.c (scm_ftruncate): throw error if offset works out negative.
* strports.c (st_flush): increase string size in blocks of
SCM_WRITE_BLOCK instead of 1. set read_end to read_pos if
it's greater and reset read_buf_size.
(scm_mkstrport): set rw_randow if only writing, since read_buf needs
to be maintained for output ports too (it holds the written
part of the string, while write_buf may have unwritten buffer
chars.)
(st_truncate): rewritten.
(top of file): added a few notes.
1999-07-06 Gary Houston <ghouston@easynet.co.uk>
* strports.c (st_grow_port): set pt->read_pos. set
pt->read_buf_size one less than pt->write_buf_size if there's
an unwritten char at the end of the string. similarly for
pt->read_end.
(st_resize_port): renamed from st_grow_port.
(st_seek): simplify by assuming that pt->write_pos == pt->read_pos.
seek from read_end instead of write_end for SEEK_END.
(st_ftruncate): calculate current length using readbuf, not write
buf.
(scm_strport_to_string): use read_buf_size for length.
(stfill_buffer): don't re-initialise the readbuf.
1999-07-05 Gary Houston <ghouston@easynet.co.uk>
* strports.c (scm_strport_to_string): new procedure.
(scm_call_with_output_string, scm_strprint_obj): use
scm_strport_to_string.
used SCM_INUM0 instead of SCM_MAKINUM (0) in a few places.
main #include path; put most of them in a subdirectory called
'libguile'. This avoids naming conflicts between Guile header
files and system header files (of which there were a few).
* Makefile.in (pkgincludedir): Deleted.
(innerincludedir): New variable; this and $(includedir) are enough.
(INCLUDE_CFLAGS): Search for headers in "-I$(srcdir)/..".
(installed_h_files): Divide this up. Now this variable lists
those header files which should go into $(includedir) (i.e. appear
directly in the #include path), and ...
(inner_h_files): ... this new variable says which files appear in
a subdirectory, and are referred to as <libguile/mumble.h>.
(h_files): List them both.
(install): Create innerincludedir, not pkgincludedir. Put
the installed_h_files and inner_h_files in their proper places.
(uninstall): Corresponding changes.
* alist.h, append.h, arbiters.h, async.h, boolean.h, chars.h,
continuations.h, debug.h, dynwind.h, error.h, eval.h, fdsocket.h,
feature.h, fports.h, gc.h, genio.h, gsubr.h, hash.h, init.h,
ioext.h, kw.h, libguile.h, list.h, markers.h, marksweep.h,
mbstrings.h, numbers.h, options.h, pairs.h, ports.h, posix.h,
print.h, procprop.h, procs.h, ramap.h, read.h, root.h,
sequences.h, smob.h, socket.h, srcprop.h, stackchk.h, stime.h,
strings.h, strop.h, strorder.h, strports.h, struct.h, symbols.h,
tag.h, throw.h, unif.h, variable.h, vectors.h, version.h,
vports.h, weaks.h: Find __scm.h in its new location.
* __scm.h: Find scmconfig.h and tags.h in their new locations
(they're both "inner" files).