mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-05-20 11:40:18 +02:00

Doc: clarification on regexes and encodings

* doc/ref/api-regex.texi: make it more obviously clear that regexp
  matching supports only characters supported by the locale encoding.
Jean Abou Samra 2022-12-11 12:28:02 +01:00 committed by Arne Babenhauserheide
parent 7d5ab8fa40
commit ff165ec904

@@ -57,7 +57,11 @@ locale's encoding, and then passed to the C library's regular expression
 routines (@pxref{Regular Expressions,,, libc, The GNU C Library
 Reference Manual}). The returned match structures always point to
 characters in the strings, not to individual bytes, even in the case of
-multi-byte encodings.
+multi-byte encodings. This ensures that the match structures are
+correct when performing matching with characters that have a multi-byte
+representation in the locale encoding. Note, however, that using
+characters which cannot be represented in the locale encoding can
+lead to surprising results.
 
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare
@@ -325,7 +329,7 @@ example the following is the date example from
 @code{string-match} call.
 @lisp
 (define date-regex
 "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
 (regexp-substitute/global #f date-regex s
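
The behaviour described in the first hunk can be illustrated with a minimal sketch. It is not part of the commit. It assumes a Guile session running in a UTF-8 locale, and the sample string and the variable names `text` and `m` are hypothetical. Under that assumption the match offsets should be character indices (here, the position after the single character "ä"), not byte offsets into the encoded string.

```scheme
(use-modules (ice-9 regex))

;; Assumption: the environment selects a UTF-8 locale. Install it
;; explicitly so the regexp routines see the intended encoding.
(setlocale LC_ALL "")

;; Hypothetical sample: "ä" takes two bytes in UTF-8 but is one character.
(define text "äbc")
(define m (string-match "b" text))

(when m
  ;; match:start and match:end are character indices into TEXT,
  ;; so they can be passed straight to substring.
  (display (match:start m))
  (newline)
  (display (substring text (match:start m) (match:end m)))
  (newline))
```

If the locale encoding cannot represent a character in the pattern or the string (for example "ä" under the plain C locale), the result may differ from what this sketch suggests, which is the caveat the new documentation text adds.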