Doc updates for srfi-14 character sets

* NEWS: updates for srfi-14 character sets
* doc/ref/api-data.texi: update char-set section and some spellchecking
2025-06-29 14:30:34 +02:00 · 2009-09-03 09:03:53 -07:00 · 2009-09-03 09:03:53 -07:00 · be3eb25c64
commit be3eb25c64
parent bb15a36c25
2 changed files with 78 additions and 39 deletions
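The behavior documented by this commit can be tried at a Guile REPL. The following is a minimal sketch, assuming a Guile version that includes these Unicode char-set changes (2.0 or later); the variable names `greek-lambda`, `non-letters`, and `assigned-non-letters` are illustrative and do not appear in the commit.

```scheme
;; Sketch only: assumes a Guile with Unicode-aware SRFI-14 char-sets.
(use-modules (srfi srfi-14))

;; The predefined sets are locale independent and span Unicode, so a
;; non-ASCII letter such as Greek small lambda (U+03BB) is a member.
(define greek-lambda (integer->char #x3BB))
(display (char-set-contains? char-set:letter greek-lambda))
(newline)

;; The raw complement of a set also contains reserved code points.
(define non-letters (char-set-complement char-set:letter))

;; Intersecting with char-set:designated keeps only code points to
;; which Unicode has assigned a character or other meaning.
(define assigned-non-letters
  (char-set-intersection non-letters char-set:designated))

;; The filtered set can never be larger than the raw complement.
(display (<= (char-set-size assigned-non-letters)
             (char-set-size non-letters)))
(newline)

;; ->char-set accepts a string, a character, or a char-set.
;; "hello" has four distinct characters: h, e, l, o.
(display (char-set-size (->char-set "hello")))
(newline)
```

Filtering the complement through `char-set:designated` follows the note this commit adds after `char-set-complement`; whether that filtering is wanted depends on whether reserved code points matter to the caller.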
--- a/7
+++ b/7
@ -10,6 +10,13 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 Changes in 1.9.3 (since the 1.9.2 prerelease):
+** SRFI-14 char-sets are modified for Unicode
+The default char-sets are no longer locale dependent and contain
+characters from the whole Unicode range.  There is a new char-set,
+char-set:designated, which contains all assigned Unicode characters.
+There is a new debugging function: %char-set-dump.
 ** Character functions operate on Unicode characters
 char-upcase and char-downcase use default Unicode casing rules.
--- a/doc/ref/api-data.texi
+++ b/doc/ref/api-data.texi
@ -539,7 +539,7 @@ error.  Instead, the result of the division is either plus or minus
 infinity, depending on the sign of the divided number.
 The infinities are written @samp{+inf.0} and @samp{-inf.0},
-respectivly.  This syntax is also recognized by @code{read} as an
+respectively.  This syntax is also recognized by @code{read} as an
 extension to the usual Scheme syntax.
 Dividing zero by zero yields something that is not a number at all:
@ -637,7 +637,7 @@ magnitude.  The argument @var{val} must be a real number.
@end deftypefn
@deftypefn {C Function} SCM scm_from_double (double val)
-Return the @code{SCM} value that representats @var{val}.  The returned
+Return the @code{SCM} value that represents @var{val}.  The returned
 value is inexact according to the predicate @code{inexact?}, but it
 will be exactly equal to @var{val}.
@end deftypefn
@ -1834,7 +1834,7 @@ the backslash of @code{#\}.
 Many of the non-printing characters, such as whitespace characters and
 control characters, also have names.
-The most commonly used non-printing chararacters are space and
+The most commonly used non-printing characters are space and
 newline.  Their character names are @code{#\space} and
@code{#\newline}.  There are also names for all of the ``C0 control
 characters'' (those with code points below 32).  The following table
@ -2059,12 +2059,6 @@ handling them are provided.
 Character sets can be created, extended, tested for the membership of a
 character, and compared to other character sets.
-The Guile implementation of character sets currently deals only with
-8-bit characters.  In the future, when Guile gets support for
-international character sets, this will change, but the functions
-provided here will always then be able to efficiently cope with very
-large character sets.
@menu
 * Character Set Predicates/Comparison::
 * Iterating Over Character Sets::  Enumerate charset elements.
@ -2263,7 +2257,7 @@ character codes lie in the half-open range
 If @var{error} is a true value, an error is signalled if the
 specified range contains characters which are not contained in
 the implemented character range.  If @var{error} is @code{#f},
-these characters are silently left out of the resultung
+these characters are silently left out of the resulting
 character set.
 The characters in @var{base_cs} are added to the result, if
@ -2279,7 +2273,7 @@ character codes lie in the half-open range
 If @var{error} is a true value, an error is signalled if the
 specified range contains characters which are not contained in
 the implemented character range.  If @var{error} is @code{#f},
-these characters are silently left out of the resultung
+these characters are silently left out of the resulting
 character set.
 The characters are added to @var{base_cs} and @var{base_cs} is
@ -2288,7 +2282,10 @@ returned.
@deffn {Scheme Procedure} ->char-set x
@deffnx {C Function} scm_to_char_set (x)
-Coerces x into a char-set. @var{x} may be a string, character or char-set. A string is converted to the set of its constituent characters; a character is converted to a singleton set; a char-set is returned as-is.
+Coerces x into a char-set. @var{x} may be a string, character or
+char-set. A string is converted to the set of its constituent
+characters; a character is converted to a singleton set; a char-set is
+returned as-is.
@end deffn
@c ===================================================================
@ -2299,6 +2296,23 @@ Coerces x into a char-set. @var{x} may be a string, character or char-set. A str
 Access the elements and other information of a character set with these
 procedures.
+@deffn {Scheme Procedure} %char-set-dump cs
+Returns an association list containing debugging information
+for @var{cs}. The association list has the following entries.
+@table @code
+@item char-set
+The char-set itself
+@item len
+The number of groups of contiguous code points the char-set
+contains
+@item ranges
+A list of lists where each sublist is a range of code points
+and their associated characters
+@end table
+The return value of this function cannot be relied upon to be
+consistent between versions of Guile and should not be used in code.
+@end deffn
@deffn {Scheme Procedure} char-set-size cs
@deffnx {C Function} scm_char_set_size (cs)
 Return the number of elements in character set @var{cs}.
@ -2380,6 +2394,12 @@ must be a character set.
 Return the complement of the character set @var{cs}.
@end deffn
+Note that the complement of a character set is likely to contain many
+reserved code points (code points that are not associated with
+characters).  It may be helpful to modify the output of
+@code{char-set-complement} by computing its intersection with the set
+of designated code points, @code{char-set:designated}.
@deffn {Scheme Procedure} char-set-union . rest
@deffnx {C Function} scm_char_set_union (rest)
 Return the union of all argument character sets.
@ -2449,12 +2469,10 @@ useful, several predefined character set variables exist.
@cindex charset
@cindex locale
-Currently, the contents of these character sets are recomputed upon a
-successful @code{setlocale} call (@pxref{Locales}) in order to reflect
-the characters available in the current locale's codeset.  For
-instance, @code{char-set:letter} contains 52 characters under an ASCII
-locale (e.g., the default @code{C} locale) and 117 characters under an
-ISO-8859-1 (``Latin-1'') locale.
+These character sets are locale independent and are not recomputed
+upon a @code{setlocale} call.  They contain characters from the whole
+range of Unicode code points. For instance, @code{char-set:letter}
+contains about 94,000 characters.
@defvr {Scheme Variable} char-set:lower-case
@defvrx {C Variable} scm_char_set_lower_case
@ -2468,13 +2486,16 @@ All upper-case characters.
@defvr {Scheme Variable} char-set:title-case
@defvrx {C Variable} scm_char_set_title_case
-This is empty, because ASCII has no titlecase characters.
+All single characters that function as if they were an upper-case
+letter followed by a lower-case letter.
@end defvr
@defvr {Scheme Variable} char-set:letter
@defvrx {C Variable} scm_char_set_letter
-All letters, e.g. the union of @code{char-set:lower-case} and
-@code{char-set:upper-case}.
+All letters.  This includes @code{char-set:lower-case},
+@code{char-set:upper-case}, @code{char-set:title-case}, and many
+letters that have no case at all.  For example, Chinese and Japanese
+characters typically have no concept of case.
@end defvr
@defvr {Scheme Variable} char-set:digit
@ -2504,23 +2525,26 @@ All whitespace characters.
@defvr {Scheme Variable} char-set:blank
@defvrx {C Variable} scm_char_set_blank
-All horizontal whitespace characters, that is @code{#\space} and
-@code{#\tab}.
+All horizontal whitespace characters, which notably includes
+@code{#\space} and @code{#\tab}.
@end defvr
@defvr {Scheme Variable} char-set:iso-control
@defvrx {C Variable} scm_char_set_iso_control
-The ISO control characters with the codes 0--31 and 127.
+The ISO control characters are the C0 control characters (U+0000 to
+U+001F), delete (U+007F), and the C1 control characters (U+0080 to
+U+009F).
@end defvr
@defvr {Scheme Variable} char-set:punctuation
@defvrx {C Variable} scm_char_set_punctuation
-The characters @code{!"#%&'()*,-./:;?@@[\\]_@{@}}
+All punctuation characters, such as the characters
+@code{!"#%&'()*,-./:;?@@[\\]_@{@}}
@end defvr
@defvr {Scheme Variable} char-set:symbol
@defvrx {C Variable} scm_char_set_symbol
-The characters @code{$+<=>^`|~}.
+All symbol characters, such as the characters @code{$+<=>^`|~}.
@end defvr
@defvr {Scheme Variable} char-set:hex-digit
@ -2538,9 +2562,17 @@ All ASCII characters.
 The empty character set.
@end defvr
+@defvr {Scheme Variable} char-set:designated
+@defvrx {C Variable} scm_char_set_designated
+This character set contains all designated code points.  This includes
+all the code points to which Unicode has assigned a character or other
+meaning.
+@end defvr
@defvr {Scheme Variable} char-set:full
@defvrx {C Variable} scm_char_set_full
-This character set contains all possible characters.
+This character set contains all possible code points.  This includes
+both designated and reserved code points.
@end defvr
@node Strings
@ -2568,7 +2600,7 @@ memory.
 When one of these two strings is modified, as with @code{string-set!},
 their common memory does get copied so that each string has its own
-memory and modifying one does not accidently modify the other as well.
+memory and modifying one does not accidentally modify the other as well.
 Thus, Guile's strings are `copy on write'; the actual copying of their
 memory is delayed until one string is written to.
@ -2988,7 +3020,7 @@ characters.
@deffnx {C Function} scm_string_trim (s, char_pred, start, end)
@deffnx {C Function} scm_string_trim_right (s, char_pred, start, end)
@deffnx {C Function} scm_string_trim_both (s, char_pred, start, end)
-Trim occurrances of @var{char_pred} from the ends of @var{s}.
+Trim occurrences of @var{char_pred} from the ends of @var{s}.
@code{string-trim} trims @var{char_pred} characters from the left
 (start) of the string, @code{string-trim-right} trims them from the
@ -3270,14 +3302,14 @@ Compute a hash value for @var{S}.  the optional argument @var{bound} is a non-ne
@deffn {Scheme Procedure} string-index s char_pred [start [end]]
@deffnx {C Function} scm_string_index (s, char_pred, start, end)
 Search through the string @var{s} from left to right, returning
-the index of the first occurence of a character which
+the index of the first occurrence of a character which
@itemize @bullet
@item
 equals @var{char_pred}, if it is character,
@item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
@item
 is in the set @var{char_pred}, if it is a character set.
@ -3287,14 +3319,14 @@ is in the set @var{char_pred}, if it is a character set.
@deffn {Scheme Procedure} string-rindex s char_pred [start [end]]
@deffnx {C Function} scm_string_rindex (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
@itemize @bullet
@item
 equals @var{char_pred}, if it is character,
@item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
@item
 is in the set if @var{char_pred} is a character set.
@ -3348,14 +3380,14 @@ Is @var{s1} a suffix of @var{s2}, ignoring character case?
@deffn {Scheme Procedure} string-index-right s char_pred [start [end]]
@deffnx {C Function} scm_string_index_right (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
@itemize @bullet
@item
 equals @var{char_pred}, if it is character,
@item
-satisifies the predicate @var{char_pred}, if it is a procedure,
+satisfies the predicate @var{char_pred}, if it is a procedure,
@item
 is in the set if @var{char_pred} is a character set.
@ -3365,14 +3397,14 @@ is in the set if @var{char_pred} is a character set.
@deffn {Scheme Procedure} string-skip s char_pred [start [end]]
@deffnx {C Function} scm_string_skip (s, char_pred, start, end)
 Search through the string @var{s} from left to right, returning
-the index of the first occurence of a character which
+the index of the first occurrence of a character which
@itemize @bullet
@item
 does not equal @var{char_pred}, if it is character,
@item
-does not satisify the predicate @var{char_pred}, if it is a
+does not satisfy the predicate @var{char_pred}, if it is a
 procedure,
@item
@ -3383,7 +3415,7 @@ is not in the set if @var{char_pred} is a character set.
@deffn {Scheme Procedure} string-skip-right s char_pred [start [end]]
@deffnx {C Function} scm_string_skip_right (s, char_pred, start, end)
 Search through the string @var{s} from right to left, returning
-the index of the last occurence of a character which
+the index of the last occurrence of a character which
@itemize @bullet
@item
@ -3408,7 +3440,7 @@ Return the count of the number of characters in the string
 equals @var{char_pred}, if it is character,
@item
-satisifies the predicate @var{char_pred}, if it is a procedure.
+satisfies the predicate @var{char_pred}, if it is a procedure.
@item
 is in the set @var{char_pred}, if it is a character set.