mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 03:40:34 +02:00

Doc updates for Unicode string escapes and port encodings

* NEWS: string and port changes

* doc/ref/api-data.texi: string escapes and string-ci

* doc/ref/api-io.texi: port encoding functions
Michael Gran 2009-09-04 07:55:05 -07:00
parent 18d8fcd43c
commit 28cc8dac2f
3 changed files with 103 additions and 11 deletions

NEWS

@@ -10,6 +10,17 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
Changes in 1.9.3 (since the 1.9.2 prerelease):
** Ports do transcoding
Ports now have an associated character encoding, and port read/write
operations automatically convert to and from that encoding. Ports also
have an associated strategy for handling characters that cannot be
converted. Four functions support this: set-port-encoding!,
port-encoding, set-port-conversion-strategy!, and
port-conversion-strategy.
** String and SRFI-13 functions can operate on Unicode strings
** SRFI-14 char-sets are modified for Unicode
The default char-sets are no longer locale dependent and contain

doc/ref/api-data.texi

@@ -2690,6 +2690,14 @@ Vertical tab character (ASCII 11).
@item @nicode{\xHH}
Character code given by two hexadecimal digits. For example,
@nicode{\x7f} for an ASCII DEL (127).
@item @nicode{\uHHHH}
Character code given by four hexadecimal digits. For example,
@nicode{\u0100} for a capital A with macron (U+0100).
@item @nicode{\UHHHHHH}
Character code given by six hexadecimal digits. For example,
@nicode{\U010402} (U+10402).
@end table
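The three hex escape forms in the table above can be modeled outside Guile to make the digit counts concrete. The following is a minimal Python sketch, not Guile's reader: the function name decode_hex_escapes is hypothetical, it assumes exactly 2, 4, or 6 hex digits as listed, and it recognizes only those escapes plus an escaped backslash, leaving other escapes such as \n untouched.

```python
import re

# Assumed forms, taken from the escape table:
#   \xHH (2 hex digits), \uHHHH (4), \UHHHHHH (6).
# An escaped backslash (\\) is matched too, so that a literal
# backslash followed by "x41" is not misread as an escape.
_HEX_ESCAPE = re.compile(
    r"\\(?:x([0-9A-Fa-f]{2})|u([0-9A-Fa-f]{4})|U([0-9A-Fa-f]{6})|(\\))"
)


def decode_hex_escapes(text):
    """Replace each hex escape in TEXT with the character it names."""

    def to_char(match):
        if match.group(4):  # escaped backslash stands for one backslash
            return "\\"
        digits = match.group(1) or match.group(2) or match.group(3)
        # chr() raises ValueError for values above U+10FFFF.
        return chr(int(digits, 16))

    return _HEX_ESCAPE.sub(to_char, text)
```

For example, decode_hex_escapes(r"A\u0100") returns the two-character string "AĀ", and r"\U010402" decodes to the single character U+10402.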
@noindent
@@ -3110,9 +3118,14 @@ The procedures in this section are similar to the character ordering
predicates (@pxref{Characters}), but are defined on character sequences.
The first set is specified in R5RS and has names that end in @code{?}.
The second set is specified in SRFI-13 and the names do not end in
@code{?}.
The predicates ending in @code{-ci} ignore the character case
when comparing strings. For now, case-insensitive comparison is done
using the R5RS rules, where every lower-case character that has a
single-character upper-case form is converted to uppercase before
comparison. @xref{Text Collation, the @code{(ice-9
i18n)} module}, for locale-dependent string comparison.
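The per-character rule described above (upcase a character only when its upper-case form is a single character, then compare) can be sketched in Python. This illustrates the stated rule only and is not Guile's code; r5rs_upcase, string_ci_eq, and string_ci_lt are hypothetical names, and Python's str.upper stands in for Guile's own Unicode case tables.

```python
def r5rs_upcase(ch):
    """Upcase CH only if its upper-case form is exactly one character."""
    upper = ch.upper()
    return upper if len(upper) == 1 else ch


def _fold(s):
    # Each character maps to exactly one character, so length is preserved.
    return "".join(r5rs_upcase(ch) for ch in s)


def string_ci_eq(a, b):
    """Case-insensitive equality in the spirit of string-ci=?."""
    return _fold(a) == _fold(b)


def string_ci_lt(a, b):
    """Case-insensitive ordering by code point, in the spirit of string-ci<?."""
    return _fold(a) < _fold(b)
```

With this rule string_ci_eq("Hello", "hELLO") is true, while ß is left unchanged because its upper-case form "SS" has two characters, so it never compares equal to "SS".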
@rnindex string=?

doc/ref/api-io.texi

@@ -47,7 +47,7 @@ are two interesting and powerful examples of this technique.
Ports are garbage collected in the usual way (@pxref{Memory
Management}), and will be closed at that time if not already closed.
In this case any errors occurring in the close will not be reported.
Usually a program will want to explicitly close so as to be sure all
its operations have been successful. Of course if a program has
abandoned something due to an error or other condition then closing
@@ -70,6 +70,18 @@ All file access uses the ``LFS'' large file support functions when
available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
read and written on a 32-bit system.
Each port has an associated character encoding that controls how bytes
read from the port are converted to characters and strings, and how
characters and strings written to the port are converted to bytes.
When ports are created, they inherit their character encoding from the
current locale, but it can be changed after the port is created.
Each port also has an associated conversion strategy: what to do when
a Guile character cannot be represented in the port's encoding on
output. There are three possible strategies: to raise an error, to
replace the character with a hex escape, or to replace the character
with a substitute character.
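The three strategies correspond closely to the error handlers of Python's codecs, which gives a quick way to see their effect. The mapping below is an analogy chosen for illustration, not Guile's implementation: Python's "replace" handler always writes "?" (Guile's substitute strategy may pick an approximate character instead), and the exact escape syntax Guile writes may differ from Python's \u0100 form. STRATEGY_TO_ERRORS and encode_for_port are hypothetical names.

```python
# Hypothetical mapping from the strategy symbols to Python codec error
# handlers (an analogy, not Guile's implementation).
STRATEGY_TO_ERRORS = {
    "error": "strict",             # raise UnicodeEncodeError
    "substitute": "replace",       # write "?" for the character
    "escape": "backslashreplace",  # write a \uHHHH-style hex escape
}


def encode_for_port(text, encoding, strategy):
    """Encode TEXT as a port with ENCODING and STRATEGY might on output."""
    return text.encode(encoding, errors=STRATEGY_TO_ERRORS[strategy])


# U+0100 (A with macron) is not representable in ISO-8859-1.
sample = "A\u0100"
for name in ("substitute", "escape"):
    print(name, encode_for_port(sample, "iso-8859-1", name))
try:
    encode_for_port(sample, "iso-8859-1", "error")
except UnicodeEncodeError as exc:
    print("error", exc.reason)
```

The same text encodes without loss under UTF-8, which is why the strategy only matters when the port's encoding is narrower than the characters being written.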
@rnindex input-port?
@deffn {Scheme Procedure} input-port? x
@deffnx {C Function} scm_input_port_p (x)
@@ -93,6 +105,55 @@ Equivalent to @code{(or (input-port? @var{x}) (output-port?
@var{x}))}.
@end deffn
@deffn {Scheme Procedure} set-port-encoding! port enc
@deffnx {C Function} scm_set_port_encoding_x (port, enc)
Sets the character encoding that will be used to interpret all port
I/O. @var{enc} is a string containing the name of an encoding.
@end deffn
New ports are created with the encoding appropriate for the current
locale if @code{setlocale} has been called, or ISO-8859-1 otherwise;
@code{set-port-encoding!} can be used to change that encoding.
@deffn {Scheme Procedure} port-encoding port
@deffnx {C Function} scm_port_encoding (port)
Returns, as a string, the character encoding that @var{port} uses to
interpret its input and output.
@end deffn
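Changing a port's encoding after the port exists, as set-port-encoding! allows, has a close analog in Python's io.TextIOWrapper.reconfigure (Python 3.7 and later). The sketch below uses that analog and is not Guile code: a BytesIO stands in for the port's underlying byte sink, ISO-8859-1 plays the role of the initial encoding, and write_then_switch_encoding is a hypothetical name.

```python
import io


def write_then_switch_encoding(first, second):
    """Write FIRST as ISO-8859-1, switch to UTF-8, then write SECOND.

    Returns the final encoding name and the bytes that reached the sink.
    """
    sink = io.BytesIO()
    port = io.TextIOWrapper(sink, encoding="iso-8859-1")
    port.write(first)
    # Analogous to set-port-encoding!: later output uses the new encoding.
    # reconfigure() flushes text still pending in the old encoding first.
    port.reconfigure(encoding="utf-8")
    port.write(second)
    port.flush()
    # port.encoding plays the role of port-encoding.
    return port.encoding, sink.getvalue()


name, data = write_then_switch_encoding("caf\u00e9", "\u00e9")
print(name, data)
```

The same character U+00E9 appears as one byte before the switch and as two UTF-8 bytes after it, so the encoding in effect at write time is what determines the output.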
@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
@deffnx {C Function} scm_set_port_conversion_strategy_x (port, sym)
Sets how a port behaves when outputting a character that is not
representable in its current encoding. @var{sym} must be one of
@code{'error}, @code{'substitute}, or @code{'escape}. If it is
@code{'error}, an error will be thrown when a nonconvertible character
is encountered. If it is @code{'substitute}, then nonconvertible
characters will be replaced with approximate characters, or with
question marks if no approximately correct character is available. If
it is @code{'escape}, nonconvertible characters will be written as hex
escapes.
If @var{port} is an open port, the conversion error behavior
is set for that port. If it is @code{#f}, it is set as the
default behavior for any future ports that get created in
this thread.
@end deffn
@deffn {Scheme Procedure} port-conversion-strategy port
@deffnx {C Function} scm_port_conversion_strategy (port)
Returns the behavior of @var{port} when outputting a character that is
not representable in the port's current encoding. It returns the
symbol @code{error} if unrepresentable characters should cause
exceptions, @code{substitute} if the port should try to replace
unrepresentable characters with question marks or approximate
characters, or @code{escape} if unrepresentable characters should be
converted to hex escapes.
If @var{port} is @code{#f}, then the current default behavior will be
returned. New ports will have this default behavior when they are
created.
@end deffn
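The split between a per-port strategy and a per-thread default for future ports, with #f selecting the default, can be modeled with thread-local storage. This Python model is a hypothetical sketch of the semantics only: Port, set_conversion_strategy, and port_conversion_strategy are invented names, None plays the role of #f, and the initial default of "error" is an assumption, since the text above does not state it.

```python
import threading

_VALID = ("error", "substitute", "escape")
_thread_defaults = threading.local()  # each thread sees its own default


def _current_default():
    # Assumed initial default; the documentation does not name one.
    return getattr(_thread_defaults, "strategy", "error")


class Port:
    """Hypothetical stand-in for a port; copies the thread's default."""

    def __init__(self):
        self.strategy = _current_default()


def set_conversion_strategy(port, sym):
    """With PORT=None, set the default for ports created later in this thread."""
    if sym not in _VALID:
        raise ValueError("unknown strategy: %r" % (sym,))
    if port is None:
        _thread_defaults.strategy = sym
    else:
        port.strategy = sym


def port_conversion_strategy(port):
    """With PORT=None, return the current thread's default."""
    return _current_default() if port is None else port.strategy
```

In this model, ports that already exist keep their own strategy; only ports created afterwards in the same thread pick up a new default, and other threads are unaffected.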
@node Reading
@subsection Reading
@@ -238,7 +299,7 @@ output port if not given.
The output is designed to be machine readable, and can be read back
with @code{read} (@pxref{Reading}). Strings are printed in
double quotes, with escapes if necessary, and characters are printed in
@samp{#\} notation.
@end deffn
@@ -248,7 +309,7 @@ Send a representation of @var{obj} to @var{port} or to the current
output port if not given.
The output is designed for human readability; it differs from
@code{write} in that strings are printed without double quotes and
escapes, and characters are printed as per @code{write-char}, not in
@samp{#\} form.
@end deffn
@@ -496,7 +557,7 @@ used. This function is equivalent to:
@end lisp
@end deffn
Some of the aforementioned I/O functions rely on the following C
primitives. These will mainly be of interest to people hacking Guile
internals.
@@ -815,11 +876,11 @@ Open @var{filename} for output. Equivalent to
Open @var{filename} for input or output, and call @code{(@var{proc}
port)} with the resulting port. Return the value returned by
@var{proc}. @var{filename} is opened as per @code{open-input-file} or
@code{open-output-file} respectively, and an error is signaled if it
cannot be opened.
When @var{proc} returns, the port is closed. If @var{proc} does not
return (e.g.@: if it throws an error), then the port might not be
closed automatically, though it will be garbage collected in the usual
way if not otherwise referenced.
@end deffn
@@ -834,7 +895,7 @@ setup as respectively the @code{current-input-port},
@code{current-output-port}, or @code{current-error-port}. Return the
value returned by @var{thunk}. @var{filename} is opened as per
@code{open-input-file} or @code{open-output-file} respectively, and an
error is signaled if it cannot be opened.
When @var{thunk} returns, the port is closed and the previous setting
of the respective current port is restored.
@@ -891,6 +952,13 @@ Determine whether @var{obj} is a port that is related to a file.
The following allow string ports to be opened by analogy to R4R*
file port facilities:
With string ports, the port encoding is treated differently than with
other types of ports. When string ports are created, they do not
inherit a character encoding from the current locale. Instead, they
are given a default encoding that can handle all valid string
characters. Typically one should not change a string port's character
encoding away from its default.
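A string port should not inherit a locale encoding because many locale encodings cannot represent every character a string may hold. A quick Python check illustrates the point; round_trips is a hypothetical helper, and UTF-8 is used here only as an example of an encoding that covers every Unicode scalar value, not as a claim about which encoding Guile gives string ports.

```python
def round_trips(text, encoding):
    """True if TEXT survives encoding to bytes and decoding back."""
    try:
        return text.encode(encoding).decode(encoding) == text
    except UnicodeEncodeError:
        return False


sample = "caf\u00e9 \u0100 \U00010402"
for enc in ("iso-8859-1", "utf-8"):
    print(enc, round_trips(sample, enc))
```

A Latin-1 locale handles "café" but not U+0100 or U+10402, whereas a Unicode encoding keeps all three intact.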
@deffn {Scheme Procedure} call-with-output-string proc
@deffnx {C Function} scm_call_with_output_string (proc)
Calls the one-argument procedure @var{proc} with a newly created output
@@ -1409,7 +1477,7 @@ is set.
@node Port Implementation
@subsubsection Port Implementation
@cindex Port implementation
This section describes how to implement a new port type in C.