mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-11 22:31:12 +02:00

doc: Fix description of regexp/locale encoding interaction.

* doc/ref/api-regex.texi (Regexp Functions): Update paragraph that
  mentions locale encoding and strings-as-bytes.

* test-suite/tests/regexp.test ("nonascii locales")["match structures
  refer to char offsets, non-ASCII pattern"]: New test.
Ludovic Courtès 2012-08-27 00:09:30 +02:00
parent fd99e505d7
commit 7aa394b53c
2 changed files with 19 additions and 9 deletions

doc/ref/api-regex.texi

@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
+@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010, 2012
 @c Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
@@ -54,11 +54,12 @@ Zero bytes (@code{#\nul}) cannot be used in regex patterns or input
 strings, since the underlying C functions treat that as the end of
 string. If there's a zero byte an error is thrown.
-Patterns and input strings are treated as being in the locale
-character set if @code{setlocale} has been called (@pxref{Locales}),
-and in a multibyte locale this includes treating multi-byte sequences
-as a single character. (Guile strings are currently merely bytes,
-though this may change in the future, @xref{Conversion to/from C}.)
+Internally, patterns and input strings are converted to the current
+locale's encoding, and then passed to the C library's regular expression
+routines (@pxref{Regular Expressions,,, libc, The GNU C Library
+Reference Manual}). The returned match structures always point to
+characters in the strings, not to individual bytes, even in the case of
+multi-byte encodings.
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare

test-suite/tests/regexp.test

@@ -1,8 +1,9 @@
 ;;;; regexp.test --- test Guile's regexps -*- coding: utf-8; mode: scheme -*-
 ;;;; Jim Blandy <jimb@red-bean.com> --- September 1999
 ;;;;
-;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
-;;;;
+;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010,
+;;;; 2012 Free Software Foundation, Inc.
+;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
 ;;;; License as published by the Free Software Foundation; either
@@ -280,4 +281,12 @@
 (with-locale "en_US.utf8"
 ;; bug #31650
 (equal? (match:substring (string-match ".*" "calçot") 0)
-"calçot"))))
+"calçot")))
+(pass-if "match structures refer to char offsets, non-ASCII pattern"
+(with-locale "en_US.utf8"
+;; bug #31650
+(equal? (match:substring (string-match "λ: The Ultimate (.*)"
+"λ: The Ultimate GOTO")
+1)
+"GOTO"))))