@c guile/doc/srfi-13-14.texi

@node SRFI-13/14
@chapter SRFI-13 and SRFI-14

This chapter documents the SRFI-13/14 library, which provides the string
utility procedures defined in SRFI-13 and the character-set procedures
defined in SRFI-14 for Guile.

@menu
* Introduction::                What is this all about?
* Loading SRFI-13/14::          Loading the module into a running Guile.
* String Functions::            Available string processing procedures.
* Character-set Procedures::    Procedures for manipulating character sets.
@end menu


@c ===================================================================

@node Introduction
@section Introduction

The SRFI-13/14 library is a shared library which provides the procedures
defined in SRFI-13 (string library) and the procedures defined in
SRFI-14 (character-set library).  You should also refer to the SRFI
documents, which provide some details that are not documented here.

If you don't know what SRFI means, and what all the numbers are about,
you may want to refer to the SRFI home page at
@url{http://srfi.schemers.org}.

Note that only those SRFI-13 procedures which are not already contained
in Guile are documented here.  For procedures not documented here,
please refer to the relevant chapters in the Guile Reference Manual, for
example the documentation of strings and string procedures (REFFIXME).

The SRFI-14 procedures are documented completely.

@menu
* What can be done::            What is possible with SRFI-13/14
* What cannot be done::         and what is not?
@end menu


@c ===================================================================

@node What can be done
@subsection What can be done

All of the procedures defined in SRFI-13, which are not already included
in the Guile core library, are implemented in the module @code{(srfi
srfi-13)}.  The procedures which are both in Guile and in SRFI-13, but
which are slightly extended, have been implemented in this module, and
the bindings overwrite those in the Guile core.

All procedures from SRFI-14 (character-set library) are implemented in
the module @code{(srfi srfi-14)}, as well as the standard variables
@code{char-set:letter}, @code{char-set:digit} etc.


@c ===================================================================

@node What cannot be done
@subsection What cannot be done

The procedures defined in the section @emph{Low-level procedures} of
SRFI-13 for parsing optional string indices, checking substring
specifications and Knuth-Morris-Pratt searching are not implemented.

The procedures @code{string-contains} and @code{string-contains-ci} are
not implemented very efficiently at the moment.  This will be changed as
soon as possible.


@c ===================================================================

@node Loading SRFI-13/14
@section Loading SRFI-13/14

When Guile is properly installed, the SRFI-13 procedures can be loaded
into a running Guile by using the @code{(srfi srfi-13)} module.

@example
$ guile
guile> (use-modules (srfi srfi-13))
guile>
@end example

If this step causes any errors, Guile is not properly installed.

One possible reason is that Guile cannot find either the Scheme module
file @file{srfi-13.scm} or the shared object file
@file{libguile-srfi-srfi-13-14.so}.  Make sure that the former is in the
Guile load path and that the latter is either installed in some default
location like @file{/usr/local/lib} or that the directory it was
installed to is in your @code{LTDL_LIBRARY_PATH}.  The same applies to
@file{srfi-14.scm}.

Now you can test whether the SRFI-13 procedures are working by calling
the @code{string-concatenate} procedure.

@example
guile> (string-concatenate '("Hello" " " "World!"))
"Hello World!"
@end example

The same goes for the SRFI-14 module, of course.

@example
$ guile
guile> (use-modules (srfi srfi-14))
guile> (char-set-union (char-set #\f #\o #\o) (string->char-set "bar"))
#<charset @{#\a #\b #\f #\o #\r@}>
guile>
@end example


@c ===================================================================

@node String Functions
@section String Functions

In this section, we will describe all procedures defined in SRFI-13
(string library) and implemented by the module @code{(srfi srfi-13)}.

Except for the procedures in the section @emph{Low-level procedures} of
SRFI-13, all string procedures defined there are implemented completely.

@menu
* Predicates::                  Testing strings.
* SRFI-13 Constructors::        Constructing strings.
* SRFI-13 List/String Conversion::  Conversion from/to character lists.
* SRFI-13 Selection::           Selecting portions from strings.
* SRFI-13 Modification::        Modifying strings in-place.
* SRFI-13 Comparison::          Comparing strings.
* Prefixes/Suffixes::           Checking for common pre-/suffixes.
* Searching::                   Searching in strings.
* Case Mapping::                Changing the case of strings.
* Reverse/Append::              Append, concatenate and reverse strings.
* Fold/Unfold/Map::             Fold/Unfold/Map over strings.
* Replicate/Rotate::            String replication and rotation.
* Miscellaneous::               Miscellaneous string procedures.
* Filtering/Deleting::          Deleting characters from strings.
@end menu


@c ===================================================================

@node Predicates
@subsection Predicates

In addition to the primitives @code{string?} and @code{string-null?},
which are already in the Guile core, the string predicates
@code{string-any} and @code{string-every} are defined by SRFI-13.

@deffn primitive string-any pred s [start end]
Check if the predicate @var{pred} is true for any character in
the string @var{s}, proceeding from left (index @var{start}) to
right (index @var{end}).  If @code{string-any} returns true,
the returned true value is the one produced by the first
successful application of @var{pred}.
@end deffn

@deffn primitive string-every pred s [start end]
Check if the predicate @var{pred} is true for every character
in the string @var{s}, proceeding from left (index @var{start})
to right (index @var{end}).  If @code{string-every} returns
true, the returned true value is the one produced by the final
application of @var{pred} to the last character of @var{s}.
@end deffn
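
As a brief illustration of the two predicates (the sample strings and
the anonymous procedure are arbitrary choices made for this sketch, not
taken from SRFI-13):

@lisp
(string-any char-numeric? "abc1")         @result{} #t
(string-any char-numeric? "abc")          @result{} #f
(string-every char-alphabetic? "abc")     @result{} #t
(string-every char-alphabetic? "ab1")     @result{} #f

;; The true value is whatever the predicate itself returned.
(string-any (lambda (c) (and (char-numeric? c) c)) "ab7c")
@result{} #\7
@end lisp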


@c ===================================================================

@node SRFI-13 Constructors
@subsection Constructors

SRFI-13 defines several procedures for constructing new strings.  In
addition to @code{make-string} and @code{string} (available in the Guile
core library), the procedure @code{string-tabulate} is provided.

@deffn primitive string-tabulate proc len
@var{proc} is an integer->char procedure.  Construct a string
of size @var{len} by applying @var{proc} to each index to
produce the corresponding string element.  The order in which
@var{proc} is applied to the indices is not specified.
@end deffn
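
A minimal sketch of @code{string-tabulate}, assuming an ASCII-compatible
character encoding so that code 97 is the letter @samp{a}:

@lisp
;; Index 0 maps to #\a, index 1 to #\b, and so on.
(string-tabulate (lambda (i) (integer->char (+ i 97))) 5)
@result{} "abcde"
@end lisp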


@c ===================================================================

@node SRFI-13 List/String Conversion
@subsection List/String Conversion

The procedure @code{string->list} is extended by SRFI-13, which is why
it is included in @code{(srfi srfi-13)}.  The other procedures are new.
The Guile core already contains the procedure @code{list->string} for
converting a list of characters into a string (REFFIXME).

@deffn primitive string->list str [start end]
Convert the string @var{str} into a list of characters.
@end deffn

@deffn primitive reverse-list->string chrs
An efficient implementation of @code{(compose list->string
reverse)}:

@smalllisp
(reverse-list->string '(#\a #\B #\c)) @result{} "cBa"
@end smalllisp
@end deffn

@deffn primitive string-join ls [delimiter grammar]
Append the strings in the string list @var{ls}, using the string
@var{delimiter} as a delimiter between the elements of @var{ls}.
@var{grammar} is a symbol which specifies how the delimiter is
placed between the strings, and defaults to the symbol
@code{infix}.

@table @code
@item infix
Insert the separator between list elements.  An empty list
will produce an empty string.

@item strict-infix
Like @code{infix}, but will raise an error if given the empty
list.

@item suffix
Insert the separator after every list element.

@item prefix
Insert the separator before each list element.
@end table
@end deffn
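
The grammars of @code{string-join} can be compared side by side.  This
is only a sketch with arbitrary sample strings:

@lisp
(string-join '("foo" "bar" "baz") ":")           @result{} "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'infix)    @result{} "foo:bar:baz"
(string-join '("foo" "bar" "baz") ":" 'suffix)   @result{} "foo:bar:baz:"
(string-join '("foo" "bar" "baz") ":" 'prefix)   @result{} ":foo:bar:baz"
(string-join '() ":")                            @result{} ""
@end lisp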


@c ===================================================================

@node SRFI-13 Selection
@subsection Selection

These procedures are called @dfn{selectors}, because they access
information about the string or select pieces of a given string.

Additional selector procedures are documented in the Strings section
(REFFIXME), like @code{string-length} or @code{string-ref}.

@code{string-copy} is also available in core Guile, but this version
accepts additional start/end indices.

@deffn primitive string-copy str [start end]
Return a freshly allocated copy of the string @var{str}.  If
given, @var{start} and @var{end} delimit the portion of
@var{str} which is copied.
@end deffn

@deffn primitive substring/shared str start [end]
Like @code{substring}, but the result may share memory with the
argument @var{str}.
@end deffn

@deffn primitive string-copy! target tstart s [start end]
Copy the sequence of characters from index range [@var{start},
@var{end}) in string @var{s} to string @var{target}, beginning
at index @var{tstart}.  The characters are copied left-to-right
or right-to-left as needed -- the copy is guaranteed to work,
even if @var{target} and @var{s} are the same string.  It is an
error if the copy operation runs off the end of the target
string.
@end deffn
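
The following sketch shows the optional indices of @code{string-copy}
and a simple use of @code{string-copy!}.  The variable name
@code{target} and the strings are made up for this example:

@lisp
(string-copy "hello world" 0 5)    @result{} "hello"
(string-copy "hello world" 6)      @result{} "world"

;; Copy "abc" into a mutable five-character string at index 1.
(define target (make-string 5 #\-))
(string-copy! target 1 "abc")
target
@result{} "-abc-"
@end lisp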

@deffn primitive string-take s n
@deffnx primitive string-take-right s n
Return the @var{n} first/last characters of @var{s}.
@end deffn

@deffn primitive string-drop s n
@deffnx primitive string-drop-right s n
Return all but the first/last @var{n} characters of @var{s}.
@end deffn
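
For instance, with an arbitrary five-character sample string:

@lisp
(string-take "hello" 2)          @result{} "he"
(string-take-right "hello" 2)    @result{} "lo"
(string-drop "hello" 2)          @result{} "llo"
(string-drop-right "hello" 2)    @result{} "hel"
@end lisp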

@deffn primitive string-pad s len [chr start end]
@deffnx primitive string-pad-right s len [chr start end]
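Take the characters from @var{start} to @var{end} from the
string @var{s} and return a new string of length @var{len}.  If the
substring is shorter than @var{len}, @code{string-pad} pads it on the
left and @code{string-pad-right} pads it on the right with the
character @var{chr}, which defaults to a space.  If the substring is
longer than @var{len}, it is truncated on the same side (on the left
for @code{string-pad}, on the right for @code{string-pad-right}).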
@end deffn
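
A short sketch of the padding and truncation behaviour as specified by
SRFI-13; the inputs are arbitrary:

@lisp
(string-pad "x" 3)              @result{} "  x"
(string-pad "42" 5 #\0)         @result{} "00042"
(string-pad "abcde" 3)          @result{} "cde"
(string-pad-right "x" 3)        @result{} "x  "
(string-pad-right "abcde" 3)    @result{} "abc"
@end lisp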

@deffn primitive string-trim s [char_pred start end]
@deffnx primitive string-trim-right s [char_pred start end]
@deffnx primitive string-trim-both s [char_pred start end]
Trim @var{s} by skipping over all characters on the left/right/both
sides of the string that satisfy the parameter @var{char_pred}:

@itemize @bullet
@item
if it is the character @var{ch}, characters equal to
@var{ch} are trimmed,

@item
if it is a procedure @var{pred} characters that
satisfy @var{pred} are trimmed,

@item
if it is a character set, characters in that set are trimmed.
@end itemize

If called without a @var{char_pred} argument, all whitespace is
trimmed.
@end deffn
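
The three trimming procedures might be used as follows; the sample
strings are arbitrary:

@lisp
(string-trim "  foo  ")                    @result{} "foo  "
(string-trim-right "  foo  ")              @result{} "  foo"
(string-trim-both "  foo  ")               @result{} "foo"
(string-trim-both "xxfooxx" #\x)           @result{} "foo"
(string-trim "007" (lambda (c) (char=? c #\0)))
@result{} "7"
@end lisp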


@c ===================================================================

@node SRFI-13 Modification
@subsection Modification

The procedure @code{string-fill!} is extended from R5RS because it
accepts optional start/end indices.  This binding shadows the procedure
of the same name in the Guile core.  The second modification procedure
@code{string-set!} is documented in the Strings section (REFFIXME).

@deffn primitive string-fill! str chr [start end]
Stores @var{chr} in every element of the given @var{str} and
returns an unspecified value.
@end deffn
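
A minimal sketch of the optional indices, using a freshly made (and
therefore mutable) string; the variable name @code{s} is chosen for
illustration only:

@lisp
(define s (make-string 5 #\a))
;; Fill only the half-open index range [1, 3).
(string-fill! s #\x 1 3)
s
@result{} "axxaa"
@end lisp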


@c ===================================================================

@node SRFI-13 Comparison
@subsection Comparison

The procedures in this section are used for comparing strings in
different ways.  The comparison predicates differ from those in R5RS in
that they do not simply return @code{#t} or @code{#f}: when the
comparison succeeds, the true value returned is the mismatch index.

@code{string-hash} and @code{string-hash-ci} are for calculating hash
values for strings, useful for implementing fast lookup mechanisms.

@deffn primitive string-compare s1 s2 proc_lt proc_eq proc_gt [start1 end1 start2 end2]
@deffnx primitive string-compare-ci s1 s2 proc_lt proc_eq proc_gt [start1 end1 start2 end2]
Apply @var{proc_lt}, @var{proc_eq}, @var{proc_gt} to the
mismatch index, depending upon whether @var{s1} is less than,
equal to, or greater than @var{s2}.  The mismatch index is the
largest index @var{i} such that for every 0 <= @var{j} <
@var{i}, @var{s1}[@var{j}] = @var{s2}[@var{j}] -- that is,
@var{i} is the first position that does not match.
@code{string-compare-ci} is the case-insensitive variant.
@end deffn
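
The following sketch makes the mismatch index visible.  The helper
@code{tag} is hypothetical and exists only for this example; it builds
a procedure that pairs a label with the index it receives:

@lisp
(define (tag label)
  (lambda (i) (list label i)))

;; "abc" and "abd" first differ at index 2, and #\c sorts before #\d.
(string-compare "abc" "abd" (tag 'less) (tag 'equal) (tag 'greater))
@result{} (less 2)

(string-compare "abd" "abc" (tag 'less) (tag 'equal) (tag 'greater))
@result{} (greater 2)

(string-compare-ci "ABC" "abd" (tag 'less) (tag 'equal) (tag 'greater))
@result{} (less 2)
@end lisp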

@deffn primitive string= s1 s2 [start1 end1 start2 end2]
@deffnx primitive string<> s1 s2 [start1 end1 start2 end2]
@deffnx primitive string< s1 s2 [start1 end1 start2 end2]
@deffnx primitive string> s1 s2 [start1 end1 start2 end2]
@deffnx primitive string<= s1 s2 [start1 end1 start2 end2]
@deffnx primitive string>= s1 s2 [start1 end1 start2 end2]
Compare @var{s1} and @var{s2} and return @code{#f} if the predicate
fails.  Otherwise, the mismatch index is returned (or @var{end1} in the
case of @code{string=}).
@end deffn

@deffn primitive string-ci= s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-ci<> s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-ci< s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-ci> s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-ci<= s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-ci>= s1 s2 [start1 end1 start2 end2]
Compare @var{s1} and @var{s2} and return @code{#f} if the predicate
fails.  Otherwise, the mismatch index is returned (or @var{end1} in the
case of @code{string-ci=}).  These are the case-insensitive variants.
@end deffn

@deffn primitive string-hash s [bound start end]
@deffnx primitive string-hash-ci s [bound start end]
Return a hash value of the string @var{s} in the range 0 @dots{}
@var{bound} - 1.  @code{string-hash-ci} is the case-insensitive variant.
@end deffn


@c ===================================================================

@node Prefixes/Suffixes
@subsection Prefixes/Suffixes

Using these procedures you can determine whether a given string is a
prefix or suffix of another string or how long a common prefix/suffix
is.

@deffn primitive string-prefix-length s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-prefix-length-ci s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-suffix-length s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-suffix-length-ci s1 s2 [start1 end1 start2 end2]
Return the length of the longest common prefix/suffix of the two
strings. @code{string-prefix-length-ci} and
@code{string-suffix-length-ci} are the case-insensitive variants.
@end deffn

@deffn primitive string-prefix? s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-prefix-ci? s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-suffix? s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-suffix-ci? s1 s2 [start1 end1 start2 end2]
Is @var{s1} a prefix/suffix of @var{s2}?  @code{string-prefix-ci?} and
@code{string-suffix-ci?} are the case-insensitive variants.
@end deffn
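
A few sample calls, with strings chosen arbitrarily for illustration:

@lisp
(string-prefix-length "foobar" "foobaz")    @result{} 5
(string-suffix-length "foobar" "bazbar")    @result{} 3
(string-prefix? "foo" "foobar")             @result{} #t
(string-suffix? "bar" "foobar")             @result{} #t
(string-prefix? "bar" "foobar")             @result{} #f
(string-prefix-ci? "FOO" "foobar")          @result{} #t
@end lisp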


@c ===================================================================

@node Searching
@subsection Searching

Use these procedures to find out whether a string contains a given
character or a given substring, or a character from a set of characters.

@deffn primitive string-index s char_pred [start end]
@deffnx primitive string-index-right s char_pred [start end]
Search through the string @var{s} from left to right (right to left),
returning the index of the first (last) occurrence of a character which

@itemize
@item
equals @var{char_pred}, if it is a character,

@item
satisfies the predicate @var{char_pred}, if it is a
procedure,

@item
is in the set @var{char_pred}, if it is a character set.
@end itemize
@end deffn

@deffn primitive string-skip s char_pred [start end]
@deffnx primitive string-skip-right s char_pred [start end]
Search through the string @var{s} from left to right (right to left),
returning the index of the first (last) occurrence of a character which

@itemize
@item
does not equal @var{char_pred}, if it is a character,

@item
does not satisfy the predicate @var{char_pred}, if it is
a procedure,

@item
is not in the set @var{char_pred}, if it is a character set.
@end itemize
@end deffn

@deffn primitive string-count s char_pred [start end]
Return the number of characters in the string @var{s} which

@itemize @bullet
@item
equal @var{char_pred}, if it is a character,

@item
satisfy the predicate @var{char_pred}, if it is a procedure,

@item
are in the set @var{char_pred}, if it is a character set.
@end itemize
@end deffn

@deffn primitive string-contains s1 s2 [start1 end1 start2 end2]
@deffnx primitive string-contains-ci s1 s2 [start1 end1 start2 end2]
Does string @var{s1} contain string @var{s2}?  Return the index
in @var{s1} where @var{s2} occurs as a substring, or false.
The optional start/end indices restrict the operation to the
indicated substrings.

@code{string-contains-ci} is the case-insensitive variant.
@end deffn
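
The searching procedures of this section can be sketched together on a
few arbitrary sample strings.  Indices are zero-based:

@lisp
(string-index "hello world" #\o)              @result{} 4
(string-index-right "hello world" #\o)        @result{} 7
(string-index "hello world" char-whitespace?) @result{} 5
(string-skip "   foo" #\space)                @result{} 3
(string-count "banana" #\a)                   @result{} 3
(string-contains "hello world" "wor")         @result{} 6
(string-contains "hello world" "xyz")         @result{} #f
(string-contains-ci "hello world" "WOR")      @result{} 6
@end lisp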


@c ===================================================================

@node Case Mapping
@subsection Alphabetic Case Mapping

These procedures convert the alphabetic case of strings.  They are
similar to the procedures in the Guile core, but are extended to handle
optional start/end indices.

@deffn primitive string-upcase s [start end]
@deffnx primitive string-upcase! s [start end]
Upcase every character in @var{s}.  @code{string-upcase!} is the
side-effecting variant.
@end deffn

@deffn primitive string-downcase s [start end]
@deffnx primitive string-downcase! s [start end]
Downcase every character in @var{s}.  @code{string-downcase!} is the
side-effecting variant.
@end deffn

@deffn primitive string-titlecase s [start end]
@deffnx primitive string-titlecase! s [start end]
Upcase the first character of every word in @var{s} and downcase the
other characters.  @code{string-titlecase!} is the side-effecting
variant.
@end deffn
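
As an illustration, with arbitrary sample strings.  The side-effecting
variant is applied to a copy, since string literals should not be
modified:

@lisp
(string-upcase "Hello World")       @result{} "HELLO WORLD"
(string-downcase "Hello World")     @result{} "hello world"
(string-titlecase "hello wORLD")    @result{} "Hello World"

(define s (string-copy "hello"))
(string-upcase! s)
s
@result{} "HELLO"
@end lisp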


@c ===================================================================

@node Reverse/Append
@subsection Reverse/Append

One appending procedure, @code{string-append}, is the same in R5RS and
in SRFI-13, so it is not redefined.

@deffn primitive string-reverse str [start end]
@deffnx primitive string-reverse! str [start end]
Reverse the string @var{str}.  The optional arguments
@var{start} and @var{end} delimit the region of @var{str} to
operate on.

@code{string-reverse!} modifies the argument string and returns an
unspecified value.
@end deffn

@deffn primitive string-append/shared ls @dots{}
Like @code{string-append}, but the result may share memory
with the argument strings.
@end deffn

@deffn primitive string-concatenate ls
Append the elements of @var{ls} (which must be strings)
together into a single string.  Guaranteed to return a freshly
allocated string.
@end deffn

@deffn primitive string-concatenate/shared ls
Like @code{string-concatenate}, but the result may share memory
with the strings in the list @var{ls}.
@end deffn

@deffn primitive string-concatenate-reverse ls [final_string end]
Without optional arguments, this procedure is equivalent to

@smalllisp
(string-concatenate (reverse ls))
@end smalllisp

If the optional argument @var{final_string} is specified, it is
consed onto the beginning of @var{ls} before performing the
list-reverse and string-concatenate operations.  If @var{end}
is given, only the characters of @var{final_string} up to index
@var{end} are used.

Guaranteed to return a freshly allocated string.
@end deffn
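
A sketch of @code{string-concatenate-reverse}, first without and then
with the optional arguments.  The second call follows the example given
in the SRFI-13 document, where only the first seven characters of the
final string are used:

@lisp
(string-concatenate-reverse '("c" "b" "a"))
@result{} "abc"

(string-concatenate-reverse '(" must be" "Hello, I") " going.XXXX" 7)
@result{} "Hello, I must be going."
@end lisp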

@deffn primitive string-concatenate-reverse/shared ls [final_string end]
Like @code{string-concatenate-reverse}, but the result may
share memory with the strings in the @var{ls} arguments.
@end deffn


@c ===================================================================

@node Fold/Unfold/Map
@subsection Fold/Unfold/Map

@code{string-map}, @code{string-for-each} etc. are for iterating over
the characters a string is composed of.  The fold and unfold procedures
are string iterators and constructors.

@deffn primitive string-map proc s [start end]
@var{proc} is a char->char procedure which is mapped over
@var{s}.  The order in which the procedure is applied to the
string elements is not specified.
@end deffn

@deffn primitive string-map! proc s [start end]
@var{proc} is a char->char procedure which is mapped over
@var{s}.  The order in which the procedure is applied to the
string elements is not specified.  The string @var{s} is
modified in-place; the return value is not specified.
@end deffn

@deffn primitive string-fold kons knil s [start end]
@deffnx primitive string-fold-right kons knil s [start end]
Fold @var{kons} over the characters of @var{s}, with @var{knil} as the
terminating element, from left to right (or right to left, for
@code{string-fold-right}).  @var{kons} must expect two arguments: The
actual character and the last result of @var{kons}' application.
@end deffn
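
The argument order expected by @var{kons} (character first, accumulated
result second) can be seen in the following sketch; the sample strings
are arbitrary:

@lisp
(string-map char-upcase "abc")          @result{} "ABC"

;; Folding with cons collects the characters into a list.
(string-fold cons '() "abc")            @result{} (#\c #\b #\a)
(string-fold-right cons '() "abc")      @result{} (#\a #\b #\c)

;; Count the occurrences of #\a.
(string-fold (lambda (c n) (if (char=? c #\a) (+ n 1) n)) 0 "banana")
@result{} 3
@end lisp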

@deffn primitive string-unfold p f g seed [base make_final]
@deffnx primitive string-unfold-right p f g seed [base make_final]
These are the fundamental string constructors.
@itemize
@item @var{g} is used to generate a series of @emph{seed}
values from the initial @var{seed}: @var{seed}, (@var{g}
@var{seed}), (@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}),
@dots{}
@item @var{p} tells us when to stop -- when it returns true
when applied to one of these seed values.
@item @var{f} maps each seed value to the corresponding
character in the result string.  These chars are assembled into the
string in a left-to-right (right-to-left) order.
@item @var{base} is the optional initial/leftmost (rightmost)
portion of the constructed string; it defaults to the empty string.
@item @var{make_final} is applied to the terminal seed
value (on which @var{p} returns true) to produce the final/rightmost
(leftmost) portion of the constructed string.  It defaults to
@code{(lambda (x) "")}.
@end itemize
@end deffn
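
As a minimal sketch, a list of characters can serve as the seed: the
stop test @var{p} is @code{null?}, @var{f} is @code{car} and @var{g} is
@code{cdr}.  The concrete lists and the strings @code{"<"} and
@code{">"} are arbitrary choices for illustration:

@lisp
(string-unfold null? car cdr '(#\a #\b #\c))          @result{} "abc"
(string-unfold-right null? car cdr '(#\a #\b #\c))    @result{} "cba"

;; With a base string and a make_final procedure.
(string-unfold null? car cdr '(#\a #\b) "<" (lambda (x) ">"))
@result{} "<ab>"
@end lisp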

@deffn primitive string-for-each proc s [start end]
@var{proc} is mapped over @var{s} in left-to-right order.  The
return value is not specified.
@end deffn


@c ===================================================================

@node Replicate/Rotate
@subsection Replicate/Rotate

These procedures are special substring procedures, which can also be
used for replicating strings.  They are a bit tricky to use, but
consider this code fragment, which replicates the input string
@code{"foo"} so often that the resulting string has a length of six.

@lisp
(xsubstring "foo" 0 6)
@result{}
"foofoo"
@end lisp

@deffn primitive xsubstring s from [to start end]
This is the @emph{extended substring} procedure that implements
replicated copying of a substring of some string.

@var{s} is a string, @var{start} and @var{end} are optional
arguments that demarcate a substring of @var{s}, defaulting to
0 and the length of @var{s}.  Replicate this substring up and
down index space, in both the positive and negative directions.
@code{xsubstring} returns the substring of this string
beginning at index @var{from}, and ending at @var{to}, which
defaults to @var{from} + (@var{end} - @var{start}).
@end deffn
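
The examples below are adapted from the SRFI-13 document.  They treat
the string as if it were replicated infinitely in both directions, so
that negative indices and indices beyond the string length are valid:

@lisp
(xsubstring "abcdef" 2 8)     @result{} "cdefab"
(xsubstring "abcdef" -2 4)    @result{} "efabcd"
(xsubstring "abc" 0 7)        @result{} "abcabca"
@end lisp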

@deffn primitive string-xcopy! target tstart s sfrom [sto start end]
Exactly the same as @code{xsubstring}, but the extracted text
is written into the string @var{target} starting at index
@var{tstart}.  The operation is not defined if @code{(eq?
@var{target} @var{s})} or these arguments share storage -- you
cannot copy a string on top of itself.
@end deffn


@c ===================================================================

@node Miscellaneous
@subsection Miscellaneous

@code{string-replace} is for replacing a portion of a string with
another string and @code{string-tokenize} splits a string into a list of
strings, breaking it up at a specified character.

@deffn primitive string-replace s1 s2 [start1 end1 start2 end2]
Return the string @var{s1}, but with the characters
@var{start1} @dots{} @var{end1} replaced by the characters
@var{start2} @dots{} @var{end2} from @var{s2}.
@end deffn
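
For example, replacing the characters at indices 6 up to (but not
including) 11 of an arbitrary sample string:

@lisp
(string-replace "hello world" "there" 6 11)
@result{} "hello there"

;; Only the first three characters of the second string are used.
(string-replace "hello world" "there" 6 11 0 3)
@result{} "hello the"
@end lisp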

@deffn primitive string-tokenize s [token_char start end]
Split the string @var{s} into a list of substrings, where each
substring is a maximal non-empty contiguous sequence of
characters equal to the character @var{token_char}, or
whitespace, if @var{token_char} is not given.  If
@var{token_char} is a character set, it is used for finding the
token borders.
@end deffn


@c ===================================================================

@node Filtering/Deleting
@subsection Filtering/Deleting

@dfn{Filtering} means to remove all characters from a string which do
not match a given criterion; @dfn{deleting} means the opposite.

@deffn primitive string-filter s char_pred [start end]
Filter the string @var{s}, retaining only those characters that
satisfy the @var{char_pred} argument.  If the argument is a
procedure, it is applied to each character as a predicate, if
it is a character, it is tested for equality and if it is a
character set, it is tested for membership.
@end deffn

@deffn primitive string-delete s char_pred [start end]
Filter the string @var{s}, retaining only those characters that
do not satisfy the @var{char_pred} argument.  If the argument
is a procedure, it is applied to each character as a predicate,
if it is a character, it is tested for equality and if it is a
character set, it is tested for membership.
@end deffn


@c ===================================================================

@node Character-set Procedures
@section Character-set Procedures

SRFI-14 defines the data type @dfn{character set}, and also defines a
lot of procedures for handling this data type, and a few standard
character sets like whitespace, alphabetic characters and others.

@menu
* Character Set Data Type::     Description of the character set data type.
* Predicates/Comparison::       Testing character sets.
* Iterating Over Character Sets::  Iterating over the members of a set.
* Creating Character Sets::     Creating new character sets.
* Querying Character Sets::     Extracting information from character sets.
* Character-Set Algebra::       Set-algebra on character sets.
* Standard Character Sets::     Variables containing standard character sets.
@end menu


@c ===================================================================

@node Character Set Data Type
@subsection Character Set Data Type

The data type @dfn{charset} implements sets of characters (REFFIXME).
Because the internal representation of character sets is not visible to
the user, a lot of procedures for handling them are provided.

Character sets can be created, extended, tested for the membership of a
character and compared to other character sets.

The Guile implementation of character sets deals with 8-bit characters.
In the standard variables, only the ASCII part of the character range is
really used, so that for example @dfn{Umlaute} and other accented
characters are not considered to be letters.  In the future, as Guile
may get support for international character sets, this will change, so
don't rely on these ``features''.


@c ===================================================================

@node Predicates/Comparison
@subsection Predicates/Comparison

Use these procedures for testing whether an object is a character set,
or whether several character sets are equal or subsets of each other.
@code{char-set-hash} can be used for calculating a hash value, maybe for
usage in fast lookup procedures.

@deffn primitive char-set? obj
Return @code{#t} if @var{obj} is a character set, @code{#f}
otherwise.
@end deffn

@deffn primitive char-set= cs1 @dots{}
Return @code{#t} if all given character sets are equal.
@end deffn

@deffn primitive char-set<= cs1 @dots{}
Return @code{#t} if every character set @var{cs}i is a subset
of character set @var{cs}i+1.
@end deffn

@deffn primitive char-set-hash cs [bound]
Compute a hash value for the character set @var{cs}.  If
@var{bound} is given and not @code{#f}, it restricts the
returned value to the range 0 @dots{} @var{bound} - 1.
@end deffn
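
A brief sketch of the predicates, using small character sets built
with @code{char-set} and @code{string->char-set}:

@lisp
(char-set? (char-set #\a))                                @result{} #t
(char-set? "a")                                           @result{} #f
(char-set= (char-set #\a #\b) (string->char-set "ba"))    @result{} #t
(char-set<= (char-set #\a) (char-set #\a #\b))            @result{} #t
(char-set<= (char-set #\a #\b) (char-set #\a))            @result{} #f
@end lisp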


@c ===================================================================

@node Iterating Over Character Sets
@subsection Iterating Over Character Sets

Character set cursors are a means for iterating over the members of a
character set.  After creating a character set cursor with
@code{char-set-cursor}, a cursor can be dereferenced with
@code{char-set-ref} and advanced to the next member with
@code{char-set-cursor-next}.  Whether a cursor has moved past the last
element of the set can be checked with @code{end-of-char-set?}.

Additionally, mapping and (un-)folding procedures for character sets are
provided.

@deffn primitive char-set-cursor cs
Return a cursor into the character set @var{cs}.
@end deffn

@deffn primitive char-set-ref cs cursor
Return the character at the current cursor position
@var{cursor} in the character set @var{cs}.  It is an error to
pass a cursor for which @code{end-of-char-set?} returns true.
@end deffn

@deffn primitive char-set-cursor-next cs cursor
Advance the character set cursor @var{cursor} to the next
character in the character set @var{cs}.  It is an error if the
cursor given satisfies @code{end-of-char-set?}.
@end deffn

@deffn primitive end-of-char-set? cursor
Return @code{#t} if @var{cursor} has reached the end of a
character set, @code{#f} otherwise.
@end deffn
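
The four cursor procedures are typically combined into a loop.  The
following is a sketch only; the helper name @code{char-set-members} is
hypothetical and not part of SRFI-14.  It collects the members of a set
into a list in the order in which the cursor visits them, which is not
specified here:

@lisp
(define (char-set-members cs)
  (let loop ((cursor (char-set-cursor cs))
             (result '()))
    (if (end-of-char-set? cursor)
        (reverse result)
        (loop (char-set-cursor-next cs cursor)
              (cons (char-set-ref cs cursor) result)))))

(length (char-set-members (char-set #\a #\b #\c)))
@result{} 3
@end lisp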

@deffn primitive char-set-fold kons knil cs
Fold the procedure @var{kons} over the character set @var{cs},
initializing it with @var{knil}.
@end deffn

@deffn primitive char-set-unfold p f g seed [base_cs]
@deffnx primitive char-set-unfold! p f g seed base_cs
This is a fundamental constructor for character sets.
@itemize
@item @var{g} is used to generate a series of ``seed'' values
from the initial seed: @var{seed}, (@var{g} @var{seed}),
(@var{g}^2 @var{seed}), (@var{g}^3 @var{seed}), @dots{}
@item @var{p} tells us when to stop -- when it returns true
when applied to one of the seed values.
@item @var{f} maps each seed value to a character. These
characters are added to the base character set @var{base_cs} to
form the result; @var{base_cs} defaults to the empty set.
@end itemize

@code{char-set-unfold!} is the side-effecting variant.
@end deffn

@deffn primitive char-set-for-each proc cs
Apply @var{proc} to every character in the character set
@var{cs}.  The return value is not specified.
@end deffn

@deffn primitive char-set-map proc cs
Map the procedure @var{proc} over every character in @var{cs}.
@var{proc} must be a character -> character procedure.
@end deffn
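
The folding, unfolding and mapping procedures can be sketched as
follows.  Because the order of the elements in a character set is not
specified, the results are compared with @code{char-set=} instead of
being printed; the sample sets are arbitrary:

@lisp
;; Count the members by folding; kons receives the character first.
(char-set-fold (lambda (c n) (+ n 1)) 0 (char-set #\a #\b #\c))
@result{} 3

;; Unfold a list of characters into a set.
(char-set= (char-set-unfold null? car cdr '(#\a #\b #\c))
           (char-set #\a #\b #\c))
@result{} #t

(char-set= (char-set-map char-upcase (char-set #\a #\b))
           (char-set #\A #\B))
@result{} #t
@end lisp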


@c ===================================================================

@node Creating Character Sets
@subsection Creating Character Sets

New character sets are produced with these procedures.

@deffn primitive char-set-copy cs
Return a newly allocated character set containing all
characters in @var{cs}.
@end deffn

@deffn primitive char-set char1 @dots{}
Return a character set containing all given characters.
@end deffn

@deffn primitive list->char-set char_list [base_cs]
@deffnx primitive list->char-set! char_list base_cs
Convert the character list @var{char_list} to a character set.  If
the character set @var{base_cs} is given, the characters in this
set are also included in the result.

@code{list->char-set!} is the side-effecting variant.
@end deffn

@deffn primitive string->char-set s [base_cs]
@deffnx primitive string->char-set! s base_cs
Convert the string @var{s} to a character set.  If the
character set @var{base_cs} is given, the characters in this
set are also included in the result.

@code{string->char-set!} is the side-effecting variant.
@end deffn

@deffn primitive char-set-filter pred cs [base_cs]
@deffnx primitive char-set-filter! pred cs base_cs
Return a character set containing every character from @var{cs}
that satisfies @var{pred}.  If provided, the characters
from @var{base_cs} are added to the result.

@code{char-set-filter!} is the side-effecting variant.
@end deffn

@deffn primitive ucs-range->char-set lower upper [error? base_cs]
@deffnx primitive ucs-range->char-set! lower upper error? base_cs
Return a character set containing all characters whose
character codes lie in the half-open range
[@var{lower},@var{upper}).

If @var{error?} is a true value, an error is signalled if the
specified range contains characters which are not contained in
the implemented character range.  If @var{error?} is @code{#f},
these characters are silently left out of the resulting
character set.

The characters in @var{base_cs} are added to the result, if
given.

@code{ucs-range->char-set!} is the side-effecting variant.
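
Because the range is half-open, the upper bound itself is excluded.
The following sketch, assuming @code{(use-modules (srfi srfi-14))},
builds the set of ASCII lower-case letters from their character codes
97 to 122; the name @code{lower-ascii} is hypothetical.

@example
(use-modules (srfi srfi-14))

;; Codes 97 (a) up to, but not including, 123.
(define lower-ascii (ucs-range->char-set 97 123))

(char-set-size lower-ascii)                            ; @result{} 26
(char-set-contains? lower-ascii #\z)                   ; @result{} #t
(char-set-contains? lower-ascii (integer->char 123))   ; @result{} #f
@end example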
@end deffn

@deffn procedure ->char-set x
Coerce @var{x} into a character set.  @var{x} may be a string, a
character or a character set.
@end deffn


@c ===================================================================

@node Querying Character Sets
@subsection Querying Character Sets

Access the elements and other information of a character set with these
procedures.

@deffn primitive char-set-size cs
Return the number of elements in character set @var{cs}.
@end deffn

@deffn primitive char-set-count pred cs
Return the number of elements in the character set
@var{cs} which satisfy the predicate @var{pred}.
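
The difference between @code{char-set-size} and
@code{char-set-count} can be sketched like this, assuming
@code{(use-modules (srfi srfi-14))}.  The name @code{greeting} is
hypothetical.

@example
(use-modules (srfi srfi-14))

;; Distinct characters: H e l o , space W r d
(define greeting (string->char-set "Hello, World"))

(char-set-size greeting)                      ; @result{} 9
;; Only H and W are upper-case.
(char-set-count char-upper-case? greeting)    ; @result{} 2
@end example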
@end deffn

@deffn primitive char-set->list cs
Return a list containing the elements of the character set
@var{cs}.
@end deffn

@deffn primitive char-set->string cs
Return a string containing the elements of the character set
@var{cs}.  The order in which the characters are placed in the
string is not defined.
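
Because the order is not defined, code that needs a predictable result
may sort the elements itself.  A sketch, assuming
@code{(use-modules (srfi srfi-14))} and Guile's core @code{sort}
procedure:

@example
(use-modules (srfi srfi-14))

(define cs (string->char-set "cab"))

;; Only the length of the unsorted string is predictable.
(string-length (char-set->string cs))             ; @result{} 3
;; Sorting the element list gives a stable order.
(list->string (sort (char-set->list cs) char<?))  ; @result{} "abc"
@end example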
@end deffn

@deffn primitive char-set-contains? cs char
Return @code{#t} if and only if the character @var{char} is
contained in the character set @var{cs}.
@end deffn

@deffn primitive char-set-every pred cs
Return a true value if every character in the character set
@var{cs} satisfies the predicate @var{pred}.
@end deffn

@deffn primitive char-set-any pred cs
Return a true value if any character in the character set
@var{cs} satisfies the predicate @var{pred}.
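
For illustration, assuming @code{(use-modules (srfi srfi-14))}, both
@code{char-set-every} and @code{char-set-any} can be tried on a small
set of digits.  The name @code{digits} is hypothetical.

@example
(use-modules (srfi srfi-14))

(define digits (string->char-set "0123"))

(char-set-every char-numeric? digits)               ; @result{} #t
(char-set-any char-alphabetic? digits)              ; @result{} #f
(char-set-any (lambda (c) (char=? c #\3)) digits)   ; @result{} #t
@end example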
@end deffn


@c ===================================================================

@node Character-Set Algebra
@subsection Character-Set Algebra

Character sets can be manipulated with the common set algebra
operations, such as union, complement and intersection.  All of these
procedures provide side-effecting variants, which may modify their
character set argument(s).

@deffn primitive char-set-adjoin cs char1 @dots{}
@deffnx primitive char-set-adjoin! cs char1 @dots{}
Add all character arguments to the first argument, which must
be a character set.
@end deffn

@deffn primitive char-set-delete cs char1 @dots{}
@deffnx primitive char-set-delete! cs char1 @dots{}
Delete all character arguments from the first argument, which
must be a character set.
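
A short sketch of the non-destructive variants of
@code{char-set-adjoin} and @code{char-set-delete}, assuming
@code{(use-modules (srfi srfi-14))}; the variable names are
hypothetical.

@example
(use-modules (srfi srfi-14))

(define abc  (char-set #\a #\b #\c))
(define abcd (char-set-adjoin abc #\d))
(define ac   (char-set-delete abc #\b))

(char-set-size abcd)   ; @result{} 4
(char-set-size ac)     ; @result{} 2
;; The original set is not modified by either call.
(char-set-size abc)    ; @result{} 3
@end example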
@end deffn

@deffn primitive char-set-complement cs
@deffnx primitive char-set-complement! cs
Return the complement of the character set @var{cs}.
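
As a sketch, assuming @code{(use-modules (srfi srfi-14))}, the
complement of the predefined set @code{char-set:digit} contains
everything except the digits.  The name @code{non-digit} is
hypothetical.

@example
(use-modules (srfi srfi-14))

(define non-digit (char-set-complement char-set:digit))

(char-set-contains? non-digit #\a)   ; @result{} #t
(char-set-contains? non-digit #\5)   ; @result{} #f
@end example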
@end deffn

@deffn primitive char-set-union cs1 @dots{}
@deffnx primitive char-set-union! cs1 @dots{}
Return the union of all argument character sets.
@end deffn

@deffn primitive char-set-intersection cs1 @dots{}
@deffnx primitive char-set-intersection! cs1 @dots{}
Return the intersection of all argument character sets.
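
Union and intersection can be compared on two overlapping sets.  This
sketch assumes @code{(use-modules (srfi srfi-14))} and uses the
SRFI-14 equality predicate @code{char-set=} to check the results; the
names @code{a} and @code{b} are hypothetical.

@example
(use-modules (srfi srfi-14))

(define a (string->char-set "abc"))
(define b (string->char-set "bcd"))

(char-set= (char-set-union a b)
           (string->char-set "abcd"))   ; @result{} #t
(char-set= (char-set-intersection a b)
           (string->char-set "bc"))     ; @result{} #t
@end example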
@end deffn

@deffn primitive char-set-difference cs1 @dots{}
@deffnx primitive char-set-difference! cs1 @dots{}
Return the difference of the argument character sets, that is,
the characters of @var{cs1} which are not contained in any of the
remaining arguments.
@end deffn

@deffn primitive char-set-xor cs1 @dots{}
@deffnx primitive char-set-xor! cs1 @dots{}
Return the exclusive-or of all argument character sets.
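
To contrast @code{char-set-difference} with @code{char-set-xor}, the
following sketch applies both to the same pair of sets.  It assumes
@code{(use-modules (srfi srfi-14))}; the names @code{a} and @code{b}
are hypothetical.

@example
(use-modules (srfi srfi-14))

(define a (string->char-set "abc"))
(define b (string->char-set "bcd"))

;; Difference: members of a that are not in b.
(char-set= (char-set-difference a b)
           (char-set #\a))              ; @result{} #t
;; Exclusive-or: members of exactly one of the two sets.
(char-set= (char-set-xor a b)
           (char-set #\a #\d))          ; @result{} #t
@end example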
@end deffn

@deffn primitive char-set-diff+intersection cs1 @dots{}
@deffnx primitive char-set-diff+intersection! cs1 @dots{}
Return two values: the difference and the intersection of
@var{cs1} with the union of the remaining argument character
sets.
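
Receiving both results can be sketched with
@code{call-with-values}, assuming
@code{(use-modules (srfi srfi-14))}.  The names @code{a}, @code{b} and
@code{results} are hypothetical.

@example
(use-modules (srfi srfi-14))

(define a (string->char-set "abc"))
(define b (string->char-set "bcd"))

;; Check each of the two returned sets against the expected set.
(define results
  (call-with-values
      (lambda () (char-set-diff+intersection a b))
    (lambda (diff int)
      (list (char-set= diff (char-set #\a))
            (char-set= int (string->char-set "bc"))))))

results   ; @result{} (#t #t)
@end example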
@end deffn


@c ===================================================================

@node Standard Character Sets
@subsection Standard Character Sets

To make the character set data type and its procedures convenient to
use, several predefined character set variables are provided.

@defvar char-set:lower-case
All lower-case characters.
@end defvar

@defvar char-set:upper-case
All upper-case characters.
@end defvar

@defvar char-set:title-case
This is empty, because ASCII has no titlecase characters.
@end defvar

@defvar char-set:letter
All letters, that is, the union of @code{char-set:lower-case} and
@code{char-set:upper-case}.
@end defvar

@defvar char-set:digit
All digits.
@end defvar

@defvar char-set:letter+digit
The union of @code{char-set:letter} and @code{char-set:digit}.
@end defvar

@defvar char-set:graphic
All characters which would put ink on the paper.
@end defvar

@defvar char-set:printing
The union of @code{char-set:graphic} and @code{char-set:whitespace}.
@end defvar

@defvar char-set:whitespace
All whitespace characters.
@end defvar

@defvar char-set:blank
All horizontal whitespace characters, that is @code{#\space} and
@code{#\tab}.
@end defvar

@defvar char-set:iso-control
The ISO control characters with the codes 0--31 and 127.
@end defvar

@defvar char-set:punctuation
The characters @code{!"#%&'()*,-./:;?@@[\]_@{@}}.
@end defvar

@defvar char-set:symbol
The characters @code{$+<=>^`|~}.
@end defvar

@defvar char-set:hex-digit
The hexadecimal digits @code{0123456789abcdefABCDEF}.
@end defvar

@defvar char-set:ascii
All ASCII characters.
@end defvar

@defvar char-set:empty
The empty character set.
@end defvar

@defvar char-set:full
This character set contains all possible characters.
@end defvar
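
As a brief sketch of how the predefined sets combine with the query
procedures, assuming @code{(use-modules (srfi srfi-14))}:

@example
(use-modules (srfi srfi-14))

(char-set-contains? char-set:hex-digit #\F)         ; @result{} #t
(char-set-contains? char-set:hex-digit #\g)         ; @result{} #f
(char-set-contains? char-set:blank #\tab)           ; @result{} #t
(char-set-contains? char-set:whitespace #\newline)  ; @result{} #t
(char-set-contains? char-set:letter+digit #\_)      ; @result{} #f
(char-set-size char-set:empty)                      ; @result{} 0
@end example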