mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 11:50:28 +02:00
Doc updates for Unicode string escapes and port encodings
* NEWS: string and port changes
* doc/ref/api-data.texi: string escapes and string-ci
* doc/ref/api-io.texi: port encoding functions
parent
18d8fcd43c
commit
28cc8dac2f
3 changed files with 103 additions and 11 deletions
11
NEWS
|
@ -10,6 +10,17 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
|
||||||
|
|
||||||
Changes in 1.9.3 (since the 1.9.2 prerelease):
|
Changes in 1.9.3 (since the 1.9.2 prerelease):
|
||||||
|
|
||||||
|
** Ports do transcoding
|
||||||
|
|
||||||
|
Ports now have an associated character encoding, and port read/write
|
||||||
|
operations do conversion to/from locales automatically. Ports also
|
||||||
|
have an associated strategy for how to deal with locale conversion
|
||||||
|
failures. Four functions support this: set-port-encoding!,
|
||||||
|
port-encoding, set-port-conversion-strategy!,
|
||||||
|
port-conversion-strategy.
|
||||||
|
|
||||||
|
** String and SRFI-13 functions can operate on Unicode strings
|
||||||
|
|
||||||
** SRFI-14 char-sets are modified for Unicode
|
** SRFI-14 char-sets are modified for Unicode
|
||||||
|
|
||||||
The default char-sets are no longer locale dependent and contain
|
The default char-sets are no longer locale dependent and contain
|
||||||
|
|
|
@ -2690,6 +2690,14 @@ Vertical tab character (ASCII 11).
|
||||||
@item @nicode{\xHH}
|
@item @nicode{\xHH}
|
||||||
Character code given by two hexadecimal digits. For example
|
Character code given by two hexadecimal digits. For example
|
||||||
@nicode{\x7f} for an ASCII DEL (127).
|
@nicode{\x7f} for an ASCII DEL (127).
|
||||||
|
|
||||||
|
@item @nicode{\uHHHH}
|
||||||
|
Character code given by four hexadecimal digits. For example
|
||||||
|
@nicode{\u0100} for a capital A with macron (U+0100).
|
||||||
|
|
||||||
|
@item @nicode{\UHHHHHH}
|
||||||
|
Character code given by six hexadecimal digits. For example
|
||||||
|
@nicode{\U010402}.
|
||||||
@end table
|
@end table
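The three hexadecimal escape forms listed above can be compared side by side. The following is a minimal sketch, assuming a Guile build that includes this change; the variable names and the helper `first-code-point` are illustrative and do not come from the manual.

```scheme
;; Sketch: one string literal per hexadecimal escape form.
(define del      "\x7f")      ; two hex digits: ASCII DEL
(define a-macron "\u0100")    ; four hex digits: U+0100
(define deseret  "\U010402")  ; six hex digits: U+10402

;; Hypothetical helper: code point of the first character of a string.
(define (first-code-point s)
  (char->integer (string-ref s 0)))

;; Each literal holds exactly one character, whatever escape produced it.
(display (map string-length (list del a-macron deseret)))
(newline)
(display (map first-code-point (list del a-macron deseret)))
(newline)
```

The escape form only affects how the literal is written in source code; once read, the string holds ordinary characters.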
|
||||||
|
|
||||||
@noindent
|
@noindent
|
||||||
|
@ -3110,9 +3118,14 @@ The procedures in this section are similar to the character ordering
|
||||||
predicates (@pxref{Characters}), but are defined on character sequences.
|
predicates (@pxref{Characters}), but are defined on character sequences.
|
||||||
|
|
||||||
The first set is specified in R5RS and has names that end in @code{?}.
|
The first set is specified in R5RS and has names that end in @code{?}.
|
||||||
The second set is specified in SRFI-13 and the names have no ending
|
The second set is specified in SRFI-13 and the names have no ending
|
||||||
@code{?}. The predicates ending in @code{-ci} ignore the character case
|
@code{?}.
|
||||||
when comparing strings. @xref{Text Collation, the @code{(ice-9
|
|
||||||
|
The predicates ending in @code{-ci} ignore the character case
|
||||||
|
when comparing strings. For now, case-insensitive comparison is done
|
||||||
|
using the R5RS rules, where every lower-case character that has a
|
||||||
|
single character upper-case form is converted to uppercase before
|
||||||
|
comparison. @xref{Text Collation, the @code{(ice-9
|
||||||
i18n)} module}, for locale-dependent string comparison.
|
i18n)} module}, for locale-dependent string comparison.
|
||||||
|
|
||||||
@rnindex string=?
|
@rnindex string=?
|
||||||
|
|
|
@ -47,7 +47,7 @@ are two interesting and powerful examples of this technique.
|
||||||
|
|
||||||
Ports are garbage collected in the usual way (@pxref{Memory
|
Ports are garbage collected in the usual way (@pxref{Memory
|
||||||
Management}), and will be closed at that time if not already closed.
|
Management}), and will be closed at that time if not already closed.
|
||||||
In this case any errors occuring in the close will not be reported.
|
In this case any errors occurring in the close will not be reported.
|
||||||
Usually a program will want to explicitly close so as to be sure all
|
Usually a program will want to explicitly close so as to be sure all
|
||||||
its operations have been successful. Of course if a program has
|
its operations have been successful. Of course if a program has
|
||||||
abandoned something due to an error or other condition then closing
|
abandoned something due to an error or other condition then closing
|
||||||
|
@ -70,6 +70,18 @@ All file access uses the ``LFS'' large file support functions when
|
||||||
available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
|
available, so files bigger than 2 Gbytes (@math{2^31} bytes) can be
|
||||||
read and written on a 32-bit system.
|
read and written on a 32-bit system.
|
||||||
|
|
||||||
|
Each port has an associated character encoding that controls how bytes
|
||||||
|
read from the port are converted to characters and strings, and controls
|
||||||
|
how characters and strings written to the port are converted to bytes.
|
||||||
|
When ports are created, they inherit their character encoding from the
|
||||||
|
current locale, but that can be modified after the port is created.
|
||||||
|
|
||||||
|
Each port also has an associated conversion strategy: what to do when
|
||||||
|
a Guile character can't be converted to the port's encoded character
|
||||||
|
representation for output. There are three possible strategies: to
|
||||||
|
raise an error, to replace the character with a hex escape, or to
|
||||||
|
replace the character with a substitute character.
|
||||||
|
|
||||||
@rnindex input-port?
|
@rnindex input-port?
|
||||||
@deffn {Scheme Procedure} input-port? x
|
@deffn {Scheme Procedure} input-port? x
|
||||||
@deffnx {C Function} scm_input_port_p (x)
|
@deffnx {C Function} scm_input_port_p (x)
|
||||||
|
@ -93,6 +105,55 @@ Equivalent to @code{(or (input-port? @var{x}) (output-port?
|
||||||
@var{x}))}.
|
@var{x}))}.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} set-port-encoding! port enc
|
||||||
|
@deffnx {C Function} scm_set_port_encoding_x (port, enc)
|
||||||
|
Sets the character encoding that will be used to interpret all port
|
||||||
|
I/O. @var{enc} is a string containing the name of an encoding.
|
||||||
|
@end deffn
|
||||||
|
|
||||||
|
New ports are created with the encoding appropriate for the current
|
||||||
|
locale if @code{setlocale} has been called or ISO-8859-1 otherwise,
|
||||||
|
and this procedure can be used to modify that encoding.
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} port-encoding port
|
||||||
|
@deffnx {C Function} scm_port_encoding (port)
|
||||||
|
Returns, as a string, the character encoding that @var{port} uses to
|
||||||
|
interpret its input and output.
|
||||||
|
@end deffn
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} set-port-conversion-strategy! port sym
|
||||||
|
@deffnx {C Function} scm_set_port_conversion_strategy_x (port, sym)
|
||||||
|
Sets the behavior of the interpreter when outputting a character that
|
||||||
|
is not representable in the port's current encoding. @var{sym} can be
|
||||||
|
either @code{'error}, @code{'substitute}, or @code{'escape}. If it is
|
||||||
|
@code{'error}, an error will be thrown when a nonconvertible character
|
||||||
|
is encountered. If it is @code{'substitute}, then nonconvertible
|
||||||
|
characters will be replaced with approximate characters, or with
|
||||||
|
question marks if no approximately correct character is available. If
|
||||||
|
it is @code{'escape}, it will appear as a hex escape when output.
|
||||||
|
|
||||||
|
If @var{port} is an open port, the conversion error behavior
|
||||||
|
is set for that port. If it is @code{#f}, it is set as the
|
||||||
|
default behavior for any future ports that get created in
|
||||||
|
this thread.
|
||||||
|
@end deffn
|
||||||
|
|
||||||
|
@deffn {Scheme Procedure} port-conversion-strategy port
|
||||||
|
@deffnx {C Function} scm_port_conversion_strategy (port)
|
||||||
|
Returns the behavior of the port when outputting a character that is
|
||||||
|
not representable in the port's current encoding. It returns the
|
||||||
|
symbol @code{error} if unrepresentable characters should cause
|
||||||
|
exceptions, @code{substitute} if the port should try to replace
|
||||||
|
unrepresentable characters with question marks or approximate
|
||||||
|
characters, or @code{escape} if unrepresentable characters should be
|
||||||
|
converted to string escapes.
|
||||||
|
|
||||||
|
If @var{port} is @code{#f}, then the current default behavior will be
|
||||||
|
returned. New ports will have this default behavior when they are
|
||||||
|
created.
|
||||||
|
@end deffn
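The four procedures documented above might be used together roughly as follows. This is a sketch under stated assumptions, not text from the manual: the file name is a made-up example path, and the choice of ISO-8859-1 with the escape strategy is only for illustration.

```scheme
;; Sketch only: "/tmp/port-encoding-demo.txt" is a hypothetical path.
(define port (open-output-file "/tmp/port-encoding-demo.txt"))

;; Override the encoding the port received when it was created.
(set-port-encoding! port "ISO-8859-1")

;; Ask for hex escapes, rather than an error, for characters the
;; encoding cannot represent.
(set-port-conversion-strategy! port 'escape)

;; Query the settings back: an encoding name and a strategy symbol.
(display (port-encoding port))
(newline)
(display (port-conversion-strategy port))
(newline)

;; U+0100 has no ISO-8859-1 representation, so the strategy applies.
(display "\u0100" port)
(close-port port)
```

Passing `#f` instead of a port to the two strategy procedures would, per the descriptions above, set or query the default used for ports created later.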
|
||||||
|
|
||||||
|
|
||||||
|
|
||||||
@node Reading
|
@node Reading
|
||||||
@subsection Reading
|
@subsection Reading
|
||||||
|
@ -238,7 +299,7 @@ output port if not given.
|
||||||
|
|
||||||
The output is designed to be machine readable, and can be read back
|
The output is designed to be machine readable, and can be read back
|
||||||
with @code{read} (@pxref{Reading}). Strings are printed in
|
with @code{read} (@pxref{Reading}). Strings are printed in
|
||||||
doublequotes, with escapes if necessary, and characters are printed in
|
double quotes, with escapes if necessary, and characters are printed in
|
||||||
@samp{#\} notation.
|
@samp{#\} notation.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
|
@ -248,7 +309,7 @@ Send a representation of @var{obj} to @var{port} or to the current
|
||||||
output port if not given.
|
output port if not given.
|
||||||
|
|
||||||
The output is designed for human readability, it differs from
|
The output is designed for human readability, it differs from
|
||||||
@code{write} in that strings are printed without doublequotes and
|
@code{write} in that strings are printed without double quotes and
|
||||||
escapes, and characters are printed as per @code{write-char}, not in
|
escapes, and characters are printed as per @code{write-char}, not in
|
||||||
@samp{#\} form.
|
@samp{#\} form.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
@ -496,7 +557,7 @@ used. This function is equivalent to:
|
||||||
@end lisp
|
@end lisp
|
||||||
@end deffn
|
@end deffn
|
||||||
|
|
||||||
Some of the abovementioned I/O functions rely on the following C
|
Some of the aforementioned I/O functions rely on the following C
|
||||||
primitives. These will mainly be of interest to people hacking Guile
|
primitives. These will mainly be of interest to people hacking Guile
|
||||||
internals.
|
internals.
|
||||||
|
|
||||||
|
@ -815,11 +876,11 @@ Open @var{filename} for output. Equivalent to
|
||||||
Open @var{filename} for input or output, and call @code{(@var{proc}
|
Open @var{filename} for input or output, and call @code{(@var{proc}
|
||||||
port)} with the resulting port. Return the value returned by
|
port)} with the resulting port. Return the value returned by
|
||||||
@var{proc}. @var{filename} is opened as per @code{open-input-file} or
|
@var{proc}. @var{filename} is opened as per @code{open-input-file} or
|
||||||
@code{open-output-file} respectively, and an error is signalled if it
|
@code{open-output-file} respectively, and an error is signaled if it
|
||||||
cannot be opened.
|
cannot be opened.
|
||||||
|
|
||||||
When @var{proc} returns, the port is closed. If @var{proc} does not
|
When @var{proc} returns, the port is closed. If @var{proc} does not
|
||||||
return (eg.@: if it throws an error), then the port might not be
|
return (e.g.@: if it throws an error), then the port might not be
|
||||||
closed automatically, though it will be garbage collected in the usual
|
closed automatically, though it will be garbage collected in the usual
|
||||||
way if not otherwise referenced.
|
way if not otherwise referenced.
|
||||||
@end deffn
|
@end deffn
|
||||||
|
@ -834,7 +895,7 @@ setup as respectively the @code{current-input-port},
|
||||||
@code{current-output-port}, or @code{current-error-port}. Return the
|
@code{current-output-port}, or @code{current-error-port}. Return the
|
||||||
value returned by @var{thunk}. @var{filename} is opened as per
|
value returned by @var{thunk}. @var{filename} is opened as per
|
||||||
@code{open-input-file} or @code{open-output-file} respectively, and an
|
@code{open-input-file} or @code{open-output-file} respectively, and an
|
||||||
error is signalled if it cannot be opened.
|
error is signaled if it cannot be opened.
|
||||||
|
|
||||||
When @var{thunk} returns, the port is closed and the previous setting
|
When @var{thunk} returns, the port is closed and the previous setting
|
||||||
of the respective current port is restored.
|
of the respective current port is restored.
|
||||||
|
@ -891,6 +952,13 @@ Determine whether @var{obj} is a port that is related to a file.
|
||||||
The following allow string ports to be opened by analogy to R4R*
|
The following allow string ports to be opened by analogy to R4R*
|
||||||
file port facilities:
|
file port facilities:
|
||||||
|
|
||||||
|
With string ports, the port-encoding is treated differently than other
|
||||||
|
types of ports. When string ports are created, they do not inherit a
|
||||||
|
character encoding from the current locale. They are given a
|
||||||
|
default encoding that allows them to handle all valid string characters.
|
||||||
|
Typically one should not modify a string port's character encoding
|
||||||
|
away from its default.
|
||||||
|
|
||||||
@deffn {Scheme Procedure} call-with-output-string proc
|
@deffn {Scheme Procedure} call-with-output-string proc
|
||||||
@deffnx {C Function} scm_call_with_output_string (proc)
|
@deffnx {C Function} scm_call_with_output_string (proc)
|
||||||
Calls the one-argument procedure @var{proc} with a newly created output
|
Calls the one-argument procedure @var{proc} with a newly created output
|
||||||
|
@ -1409,7 +1477,7 @@ is set.
|
||||||
|
|
||||||
@node Port Implementation
|
@node Port Implementation
|
||||||
@subsubsection Port Implementation
|
@subsubsection Port Implementation
|
||||||
@cindex Port implemenation
|
@cindex Port implementation
|
||||||
|
|
||||||
This section describes how to implement a new port type in C.
|
This section describes how to implement a new port type in C.
|
||||||
|
|
||||||
|
|