mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-12 23:00:22 +02:00

doc: Fix description of regexp/locale encoding interaction.

* doc/ref/api-regex.texi (Regexp Functions): Update paragraph that
  mentions locale encoding and strings-as-bytes.

* test-suite/tests/regexp.test ("nonascii locales")["match structures
  refer to char offsets, non-ASCII pattern"]: New test.
Ludovic Courtès 2012-08-27 00:09:30 +02:00
parent fd99e505d7
commit 7aa394b53c
2 changed files with 19 additions and 9 deletions


@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
+@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010, 2012
 @c Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
@@ -54,11 +54,12 @@ Zero bytes (@code{#\nul}) cannot be used in regex patterns or input
 strings, since the underlying C functions treat that as the end of
 string. If there's a zero byte an error is thrown.
 
-Patterns and input strings are treated as being in the locale
-character set if @code{setlocale} has been called (@pxref{Locales}),
-and in a multibyte locale this includes treating multi-byte sequences
-as a single character. (Guile strings are currently merely bytes,
-though this may change in the future, @xref{Conversion to/from C}.)
+Internally, patterns and input strings are converted to the current
+locale's encoding, and then passed to the C library's regular expression
+routines (@pxref{Regular Expressions,,, libc, The GNU C Library
+Reference Manual}). The returned match structures always point to
+characters in the strings, not to individual bytes, even in the case of
+multi-byte encodings.
 
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare

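As an illustration of the updated paragraph, the following sketch mirrors the new test case as a standalone Guile script. It assumes a Guile release that includes this change, built with regex support, and an installed `en_US.utf8` locale (the locale name is taken from the test suite; `setlocale` raises an error if the locale is not available). It shows that `match:start` reports a character offset even though "λ" takes two bytes in UTF-8.

```scheme
;;; -*- coding: utf-8 -*-
;;; Illustrative sketch, not part of the commit.  Assumes a Guile with
;;; regex support and an installed "en_US.utf8" locale.
(use-modules (ice-9 regex))

;; Switch to a UTF-8 locale so that "λ" can be converted to the locale
;; encoding before the pattern and input reach the C regex routines.
(setlocale LC_ALL "en_US.utf8")

(define m
  (string-match "λ: The Ultimate (.*)" "λ: The Ultimate GOTO"))

;; "λ" is one character but two bytes in UTF-8.  The match structure
;; holds character offsets, so group 1 starts at index 16; a byte
;; offset would have been 17.
(display (match:start m 1))        ; prints 16
(newline)
(display (match:substring m 1))    ; prints GOTO
(newline)
```

Because the offsets are character-based, they can be passed directly to `substring` and similar string procedures on the original Scheme string.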

@@ -1,8 +1,9 @@
 ;;;; regexp.test --- test Guile's regexps -*- coding: utf-8; mode: scheme -*-
 ;;;; Jim Blandy <jimb@red-bean.com> --- September 1999
 ;;;;
-;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010,
+;;;; 2012 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
 ;;;; License as published by the Free Software Foundation; either
@@ -280,4 +281,12 @@
     (with-locale "en_US.utf8"
       ;; bug #31650
       (equal? (match:substring (string-match ".*" "calçot") 0)
-              "calçot"))))
+              "calçot")))
+
+  (pass-if "match structures refer to char offsets, non-ASCII pattern"
+    (with-locale "en_US.utf8"
+      ;; bug #31650
+      (equal? (match:substring (string-match "λ: The Ultimate (.*)"
+                                             "λ: The Ultimate GOTO")
+                               1)
+              "GOTO"))))