Mirror of https://git.savannah.gnu.org/git/guile.git (synced 2025-06-11 22:31:12 +02:00)
doc: Fix description of regexp/locale encoding interaction.
* doc/ref/api-regex.texi (Regexp Functions): Update paragraph that mentions locale encoding and strings-as-bytes.
* test-suite/tests/regexp.test ("nonascii locales")["match structures refer to char offsets, non-ASCII pattern"]: New test.
parent fd99e505d7
commit 7aa394b53c
2 changed files with 19 additions and 9 deletions
--- a/doc/ref/api-regex.texi
+++ b/doc/ref/api-regex.texi
@@ -1,6 +1,6 @@
 @c -*-texinfo-*-
 @c This is part of the GNU Guile Reference Manual.
-@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010
+@c Copyright (C) 1996, 1997, 2000, 2001, 2002, 2003, 2004, 2007, 2009, 2010, 2012
 @c Free Software Foundation, Inc.
 @c See the file guile.texi for copying conditions.
 
@@ -54,11 +54,12 @@ Zero bytes (@code{#\nul}) cannot be used in regex patterns or input
 strings, since the underlying C functions treat that as the end of
 string. If there's a zero byte an error is thrown.
 
-Patterns and input strings are treated as being in the locale
-character set if @code{setlocale} has been called (@pxref{Locales}),
-and in a multibyte locale this includes treating multi-byte sequences
-as a single character. (Guile strings are currently merely bytes,
-though this may change in the future, @xref{Conversion to/from C}.)
+Internally, patterns and input strings are converted to the current
+locale's encoding, and then passed to the C library's regular expression
+routines (@pxref{Regular Expressions,,, libc, The GNU C Library
+Reference Manual}). The returned match structures always point to
+characters in the strings, not to individual bytes, even in the case of
+multi-byte encodings.
 
 @deffn {Scheme Procedure} string-match pattern str [start]
 Compile the string @var{pattern} into a regular expression and compare
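The revised paragraph states that match structures point to characters, not bytes. The following is a minimal sketch of what that means in practice. It assumes a Guile release that includes this fix and a UTF-8 locale installed under one of the names tried below; those locale names are assumptions about the host system, not something the commit specifies. `match:start` and `match:end` report positions counted in characters. The same position counted in UTF-8 bytes is one larger, because `λ` encodes as two bytes.

```scheme
;;; -*- coding: utf-8 -*-
(use-modules (ice-9 regex)       ; string-match, match:start, match:end
             (rnrs bytevectors)) ; string->utf8, bytevector-length

;; Assumption: some UTF-8 locale is installed under one of these names.
;; setlocale signals an error for a missing locale, so try each in turn.
(let loop ((names '("en_US.utf8" "en_US.UTF-8" "C.UTF-8")))
  (cond ((null? names) (error "no UTF-8 locale available"))
        ((false-if-exception (setlocale LC_ALL (car names))) #t)
        (else (loop (cdr names)))))

(define str "λ: The Ultimate GOTO")
(define m (string-match "λ: The Ultimate (.*)" str))

;; Offsets stored in the match structure count characters.
(define char-start (match:start m 1))
(define char-end (match:end m 1))

;; For comparison: the same start position measured in UTF-8 bytes.
;; "λ" is two bytes in UTF-8, so this is one more than char-start.
(define byte-start
  (bytevector-length (string->utf8 (substring str 0 char-start))))

(display (list char-start char-end byte-start)) ; (16 20 17)
(newline)
```

A byte-based result would report 17 as the start of group 1. That kind of mismatch is what the new "non-ASCII pattern" test guards against.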
--- a/test-suite/tests/regexp.test
+++ b/test-suite/tests/regexp.test
@@ -1,8 +1,9 @@
 ;;;; regexp.test --- test Guile's regexps -*- coding: utf-8; mode: scheme -*-
 ;;;; Jim Blandy <jimb@red-bean.com> --- September 1999
 ;;;;
-;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010 Free Software Foundation, Inc.
+;;;; Copyright (C) 1999, 2004, 2006, 2007, 2008, 2009, 2010,
+;;;; 2012 Free Software Foundation, Inc.
 ;;;;
 ;;;; This library is free software; you can redistribute it and/or
 ;;;; modify it under the terms of the GNU Lesser General Public
 ;;;; License as published by the Free Software Foundation; either
@@ -280,4 +281,12 @@
     (with-locale "en_US.utf8"
       ;; bug #31650
       (equal? (match:substring (string-match ".*" "calçot") 0)
-              "calçot"))))
+              "calçot")))
+
+  (pass-if "match structures refer to char offsets, non-ASCII pattern"
+    (with-locale "en_US.utf8"
+      ;; bug #31650
+      (equal? (match:substring (string-match "λ: The Ultimate (.*)"
+                                             "λ: The Ultimate GOTO")
+                               1)
+              "GOTO"))))
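`pass-if` and `with-locale` come from Guile's test-suite library, so the new test cannot be pasted directly into a plain REPL. The sketch below reproduces both "nonascii locales" checks as a standalone script. `call-with-locale` and `check-char-offsets` are hypothetical helpers written for this illustration and are not part of Guile. `call-with-locale` only approximates the harness's `with-locale`: it returns the symbol `unsupported` when the requested locale is not installed.

```scheme
;;; -*- coding: utf-8 -*-
(use-modules (ice-9 regex))  ; string-match, match:substring

;; Hypothetical helper, loosely modelled on the test suite's `with-locale':
;; run THUNK with LC_ALL set to LOCALE, restoring the previous setting on
;; exit.  Returns the symbol `unsupported' if LOCALE is not installed
;; (setlocale signals an error in that case).
(define (call-with-locale locale thunk)
  (let ((saved (setlocale LC_ALL)))
    (if (false-if-exception (setlocale LC_ALL locale))
        (dynamic-wind
          (lambda () (setlocale LC_ALL locale))  ; (re)install on entry
          thunk
          (lambda () (setlocale LC_ALL saved)))  ; restore on any exit
        'unsupported)))

;; Hypothetical: the two "nonascii locales" checks, without pass-if.
(define (check-char-offsets)
  (list
   ;; ASCII pattern, non-ASCII input.
   (equal? (match:substring (string-match ".*" "calçot") 0)
           "calçot")
   ;; Non-ASCII pattern: group 1 must still be located by characters.
   (equal? (match:substring (string-match "λ: The Ultimate (.*)"
                                          "λ: The Ultimate GOTO")
                            1)
           "GOTO")))

(display (call-with-locale "en_US.utf8" check-char-offsets))
(newline)
```

With the fix in place, this prints `(#t #t)`. On a system without the `en_US.utf8` locale, it prints `unsupported`.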