mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-06-10 05:50:26 +02:00
(Strings): Document copy-on-write behavior and
mutation-sharing substrings. (Symbols): Document scm_from_locale_symbol and scm_from_locale_symboln.
parent cd505b38ad
commit c48c62d085
1 changed files with 93 additions and 19 deletions
@@ -1859,10 +1859,38 @@ entered at the @acronym{REPL} or in Scheme source files.
 Strings always carry the information about how many characters they are
 composed of with them, so there is no special end-of-string character,
 like in C. That means that Scheme strings can contain any character,
-even the @samp{NUL} character @samp{\0}. But note: Since most operating
-system calls dealing with strings (such as for file operations) expect
-strings to be zero-terminated, they might do unexpected things when
-called with string containing unusual characters.
+even the @samp{#\nul} character @samp{\0}.
+
+To use strings efficiently, you need to know a bit about how Guile
+implements them. In Guile, a string consists of two parts, a head and
+the actual memory where the characters are stored. When a string (or
+a substring of it) is copied, only a new head gets created, the memory
+is usually not copied. The two heads start out pointing to the same
+memory.
+
+When one of these two strings is modified, as with @code{string-set!},
+their common memory does get copied so that each string has its own
+memory and modifying one does not accidentally modify the other as well.
+Thus, Guile's strings are `copy on write'; the actual copying of their
+memory is delayed until one string is written to.
+
+This implementation makes functions like @code{substring} very
+efficient in the common case that no modifications are done to the
+involved strings.
+
+If you do know that your strings are getting modified right away, you
+can use @code{substring/copy} instead of @code{substring}. This
+function performs the copy immediately at the time of creation. This
+is more efficient, especially in a multi-threaded program. Also,
+@code{substring/copy} can avoid the problem that a short substring
+holds on to the memory of a very large original string that could
+otherwise be recycled.
+
+If you want to avoid the copy altogether, so that modifications of one
+string show up in the other, you can use @code{substring/shared}. The
+strings created by this procedure are called @dfn{mutation sharing
+substrings} since the substring and the original string share
+modifications to each other.

 @menu
 * String Syntax:: Read syntax for strings.
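Note on the hunk above: the head/memory split and the copy-on-write rule it documents can be modeled in a few lines of plain C. This is only a toy sketch under stated assumptions; `buf_t`, `str_t`, `str_new`, `str_substring`, `str_set`, and `str_free` are hypothetical names invented for illustration and are not Guile's internal data structures or API.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Toy model only: hypothetical names, not Guile's internals. */
typedef struct {
    char *chars;   /* the memory where the characters are stored */
    size_t refs;   /* number of heads currently pointing at it */
} buf_t;

typedef struct {
    buf_t *buf;    /* possibly shared character memory */
    size_t start;  /* offset of this string inside buf */
    size_t len;    /* explicit length, so no terminator is needed */
} str_t;

/* Create a string with its own memory from len bytes of src. */
static str_t str_new(const char *src, size_t len) {
    buf_t *b = malloc(sizeof *b);
    assert(b != NULL);
    b->chars = malloc(len ? len : 1);
    assert(b->chars != NULL);
    memcpy(b->chars, src, len);
    b->refs = 1;
    return (str_t){ b, 0, len };
}

/* Like substring: only a new head is created; the memory is shared. */
static str_t str_substring(const str_t *s, size_t start, size_t end) {
    assert(start <= end && end <= s->len);
    s->buf->refs++;
    return (str_t){ s->buf, s->start + start, end - start };
}

/* Like string-set!: copy the memory first if another head still uses it. */
static void str_set(str_t *s, size_t k, char c) {
    assert(k < s->len);
    if (s->buf->refs > 1) {
        buf_t *b = malloc(sizeof *b);
        assert(b != NULL);
        b->chars = malloc(s->len);
        assert(b->chars != NULL);
        memcpy(b->chars, s->buf->chars + s->start, s->len);
        b->refs = 1;
        s->buf->refs--;   /* detach from the shared memory */
        s->buf = b;
        s->start = 0;
    }
    s->buf->chars[s->start + k] = c;
}

/* Drop one head; free the memory when the last head goes away. */
static void str_free(str_t *s) {
    if (--s->buf->refs == 0) {
        free(s->buf->chars);
        free(s->buf);
    }
    s->buf = NULL;
}
```

In this model, taking `str_substring` of an 11-character string costs one head and no character copying; the first `str_set` on either string pays for the copy, after which the two heads no longer share memory. The model also shows why a short substring can keep a large buffer alive: the buffer is freed only when its last head is released.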
@@ -1887,9 +1915,7 @@ called with string containing unusual characters.
 @c special in a string (they're not).

 The read syntax for strings is an arbitrarily long sequence of
-characters enclosed in double quotes (@nicode{"}). @footnote{Actually,
-the current implementation restricts strings to a length of
-@math{2^24}, or 16,777,216, characters. Sorry.}
+characters enclosed in double quotes (@nicode{"}).

 Backslash is an escape character and can be used to insert the
 following special characters. @nicode{\"} and @nicode{\\} are R5RS
@@ -1972,7 +1998,9 @@ y @result{} "foo"
 @subsubsection String Constructors

 The string constructor procedures create new string objects, possibly
-initializing them with some specified character data.
+initializing them with some specified character data. See also
+@xref{String Selection}, for ways to create strings from existing
+strings.

 @c FIXME::martin: list->string belongs into `List/String Conversion'

@@ -1994,6 +2022,11 @@ the string are initialized to @var{chr}, otherwise the contents
 of the @var{string} are unspecified.
 @end deffn

+@deftypefn {C Function} SCM scm_c_make_string (size_t len, SCM chr)
+Like @code{scm_make_string}, but expects the length as a
+@code{size_t}.
+@end deftypefn
+
 @node List/String Conversion
 @subsubsection List/String conversion

@@ -2047,6 +2080,10 @@ Portions of strings can be extracted by these procedures.
 Return the number of characters in @var{string}.
 @end deffn

+@deftypefn {C Function} size_t scm_c_string_length (SCM str)
+Return the number of characters in @var{str} as a @code{size_t}.
+@end deftypefn
+
 @rnindex string-ref
 @deffn {Scheme Procedure} string-ref str k
 @deffnx {C Function} scm_string_ref (str, k)
@@ -2054,24 +2091,54 @@ Return character @var{k} of @var{str} using zero-origin
 indexing. @var{k} must be a valid index of @var{str}.
 @end deffn

+@deftypefn {C Function} SCM scm_c_string_ref (SCM str, size_t k)
+Return character @var{k} of @var{str} using zero-origin
+indexing. @var{k} must be a valid index of @var{str}.
+@end deftypefn
+
 @rnindex string-copy
 @deffn {Scheme Procedure} string-copy str
 @deffnx {C Function} scm_string_copy (str)
-Return a newly allocated copy of the given @var{string}.
+Return a copy of the given @var{string}.
+
+The returned string shares storage with @var{str} initially, but it is
+copied as soon as one of the two strings is modified.
 @end deffn

 @rnindex substring
 @deffn {Scheme Procedure} substring str start [end]
 @deffnx {C Function} scm_substring (str, start, end)
-Return a newly allocated string formed from the characters
+Return a new string formed from the characters
 of @var{str} beginning with index @var{start} (inclusive) and
 ending with index @var{end} (exclusive).
 @var{str} must be a string, @var{start} and @var{end} must be
 exact integers satisfying:

 0 <= @var{start} <= @var{end} <= @code{(string-length @var{str})}.
+
+The returned string shares storage with @var{str} initially, but it is
+copied as soon as one of the two strings is modified.
 @end deffn

+@deffn {Scheme Procedure} substring/shared str start [end]
+@deffnx {C Function} scm_substring_shared (str, start, end)
+Like @code{substring}, but the strings continue to share their storage
+even if they are modified. Thus, modifications to @var{str} show up
+in the new string, and vice versa.
+@end deffn
+
+@deffn {Scheme Procedure} substring/copy str start [end]
+@deffnx {C Function} scm_substring_copy (str, start, end)
+Like @code{substring}, but the storage for the new string is copied
+immediately.
+@end deffn
+
+@deftypefn {C Function} SCM scm_c_substring (SCM str, size_t start, size_t end)
+@deftypefnx {C Function} SCM scm_c_substring_shared (SCM str, size_t start, size_t end)
+@deftypefnx {C Function} SCM scm_c_substring_copy (SCM str, size_t start, size_t end)
+Like @code{scm_substring}, etc. but the bounds are given as a @code{size_t}.
+@end deftypefn
+
 @node String Modification
 @subsubsection String Modification

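Note on the hunk above: the difference between `substring/shared` and `substring/copy` comes down to aliasing versus an immediate copy. The following plain C sketch illustrates that contrast, including the documented bounds condition 0 <= start <= end <= length. It is an assumption-laden toy, not libguile code: `view_t`, `sub_shared`, and `sub_copy` are hypothetical names made up for this illustration.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical type: a window of len characters starting at chars. */
typedef struct {
    char *chars;
    size_t len;
} view_t;

/* Like substring/shared: the result aliases the original's memory,
   so writes through either one are visible through the other. */
static view_t sub_shared(view_t s, size_t start, size_t end) {
    assert(start <= end && end <= s.len);  /* size_t makes 0 <= start implicit */
    return (view_t){ s.chars + start, end - start };
}

/* Like substring/copy: the result gets its own memory immediately,
   so later writes to the original do not affect it. */
static view_t sub_copy(view_t s, size_t start, size_t end) {
    assert(start <= end && end <= s.len);
    size_t n = end - start;
    char *mem = malloc(n ? n : 1);
    assert(mem != NULL);
    memcpy(mem, s.chars + start, n);
    return (view_t){ mem, n };
}
```

With an 11-character buffer holding `hello world`, writing through `sub_shared(s, 6, 11)` changes the original buffer, while the result of `sub_copy(s, 6, 11)` keeps its own characters and has to be released with `free` by the caller. Plain `substring` sits between the two: it starts out shared and copies on the first write.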
@@ -2087,6 +2154,10 @@ an unspecified value. @var{k} must be a valid index of
 @var{str}.
 @end deffn

+@deftypefn {C Function} void scm_c_string_set_x (SCM str, size_t k, SCM chr)
+Like @code{scm_string_set_x}, but the index is given as a @code{size_t}.
+@end deftypefn
+
 @rnindex string-fill!
 @deffn {Scheme Procedure} string-fill! str chr
 @deffnx {C Function} scm_string_fill_x (str, chr)
@@ -2338,9 +2409,9 @@ the bytes, only the characters.
 Well, ideally, anyway. Right now, Guile simply equates Scheme
 characters and bytes, ignoring the possibility of multi-byte encodings
 completely. This will change in the future, where Guile will use
-Unicode codepoints as its characters and UTF-8 (or maybe UCS-4) as its
-internal encoding. When you exclusively use the functions listed in
-this section, you are `future-proof'.
+Unicode codepoints as its characters and UTF-8 or some other encoding
+as its internal encoding. When you exclusively use the functions
+listed in this section, you are `future-proof'.

 Converting a Scheme string to a C string will often allocate fresh
 memory to hold the result. You must take care that this memory is
@@ -3194,14 +3265,17 @@ the case-sensitivity of symbols:
 @end lisp

 From C, there are lower level functions that construct a Scheme symbol
-from a null terminated C string or from a sequence of bytes whose length
-is specified explicitly.
+from a C string in the current locale encoding.

-@deffn {C Function} scm_str2symbol (const char * name)
-@deffnx {C Function} scm_mem2symbol (const char * name, size_t len)
+When you want to do more from C, you should convert between symbols
+and strings using @code{scm_symbol_to_string} and
+@code{scm_string_to_symbol} and work with the strings.
+
+@deffn {C Function} scm_from_locale_symbol (const char *name)
+@deffnx {C Function} scm_from_locale_symboln (const char *name, size_t len)
 Construct and return a Scheme symbol whose name is specified by
-@var{name}. For @code{scm_str2symbol} @var{name} must be null
-terminated; For @code{scm_mem2symbol} the length of @var{name} is
+@var{name}. For @code{scm_from_locale_symbol}, @var{name} must be null
+terminated; for @code{scm_from_locale_symboln} the length of @var{name} is
 specified explicitly by @var{len}.
 @end deffn
