mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-11 14:21:10 +02:00

(Regexp Functions): Revise regex-substitute and
regex-substitute/global for clarity, add some examples.
Kevin Ryde 2004-12-13 22:37:31 +00:00
parent edcd3e83d3
commit a13befdcd3

@@ -3809,86 +3809,117 @@ Return @code{#t} if @var{obj} is a compiled regular expression,
 or @code{#f} otherwise.
 @end deffn
-Regular expressions are commonly used to find patterns in one string and
-replace them with the contents of another string.
+@sp 1
+Regular expressions are commonly used to find patterns in one string
+and replace them with the contents of another string. The following
+functions are convenient ways to do this.
 @c begin (scm-doc-string "regex.scm" "regexp-substitute")
 @deffn {Scheme Procedure} regexp-substitute port match [item@dots{}]
-Write to the output port @var{port} selected contents of the match
-structure @var{match}. Each @var{item} specifies what should be
-written, and may be one of the following arguments:
+Write to @var{port} selected parts of the match structure @var{match}.
+Or if @var{port} is @code{#f} then form a string from those parts and
+return that.
+Each @var{item} specifies a part to be written, and may be one of the
+following,
 @itemize @bullet
 @item
 A string. String arguments are written out verbatim.
 @item
-An integer. The submatch with that number is written.
+An integer. The submatch with that number is written
+(@code{match:substring}). Zero is the entire match.
 @item
 The symbol @samp{pre}. The portion of the matched string preceding
-the regexp match is written.
+the regexp match is written (@code{match:prefix}).
 @item
 The symbol @samp{post}. The portion of the matched string following
-the regexp match is written.
+the regexp match is written (@code{match:suffix}).
 @end itemize
-The @var{port} argument may be @code{#f}, in which case nothing is
-written; instead, @code{regexp-substitute} constructs a string from the
-specified @var{item}s and returns that.
-@end deffn
-The following example takes a regular expression that matches a standard
-@sc{yyyymmdd}-format date such as @code{"20020828"}. The
-@code{regexp-substitute} call returns a string computed from the
-information in the match structure, consisting of the fields and text
-from the original string reordered and reformatted.
+For example, changing a match and retaining the text before and after,
+@example
+(regexp-substitute #f (string-match "[0-9]+" "number 25 is good")
+'pre "37" 'post)
+@result{} "number 37 is good"
+@end example
+Or matching a @sc{yyyymmdd} format date such as @samp{20020828} and
+re-ordering and hyphenating the fields.
 @lisp
 (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
-(define sm (string-match date-regex s))
-(regexp-substitute #f sm 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
+(regexp-substitute #f (string-match date-regex s)
+'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
 @result{} "Date 04-29-2002 12am. (20020429)"
 @end lisp
+@end deffn
 @c begin (scm-doc-string "regex.scm" "regexp-substitute")
 @deffn {Scheme Procedure} regexp-substitute/global port regexp target [item@dots{}]
-Similar to @code{regexp-substitute}, but can be used to perform global
-substitutions on @var{str}. Instead of taking a match structure as an
-argument, @code{regexp-substitute/global} takes two string arguments: a
-@var{regexp} string describing a regular expression, and a @var{target}
-string which should be matched against this regular expression.
-Each @var{item} behaves as in @code{regexp-substitute}, with the
-following exceptions:
+@cindex search and replace
+Write to @var{port} selected parts of matches of @var{regexp} in
+@var{target}. If @var{port} is @code{#f} then form a string from
+those parts and return that. @var{regexp} can be a string or a
+compiled regex.
+This is similar to @code{regexp-substitute}, but allows global
+substitutions on @var{target}. Each @var{item} behaves as per
+@code{regexp-substitute}, with the following differences,
 @itemize @bullet
 @item
-A function may be supplied. When this function is called, it will be
-passed one argument: a match structure for a given regular expression
-match. It should return a string to be written out to @var{port}.
+A function. Called as @code{(@var{item} match)} with the match
+structure for the @var{regexp} match, it should return a string to be
+written to @var{port}.
 @item
-The @samp{post} symbol causes @code{regexp-substitute/global} to recurse
-on the unmatched portion of @var{str}. This @emph{must} be supplied in
-order to perform global search-and-replace on @var{str}; if it is not
-present among the @var{item}s, then @code{regexp-substitute/global} will
-return after processing a single match.
-@end itemize
-@end deffn
-The example above for @code{regexp-substitute} could be rewritten as
-follows to remove the @code{string-match} stage:
+The symbol @samp{post}. This doesn't output anything, but instead
+causes @code{regexp-substitute/global} to recurse on the unmatched
+portion of @var{target}.
+This @emph{must} be supplied to perform a global search and replace on
+@var{target}; without it @code{regexp-substitute/global} returns after
+a single match and output.
+@end itemize
+For example, to collapse runs of tabs and spaces to a single hyphen
+each,
+@example
+(regexp-substitute/global #f "[ \t]+" "this is the text"
+'pre "-" 'post)
+@result{} "this-is-the-text"
+@end example
+Or using a function to reverse the letters in each word,
+@example
+(regexp-substitute/global #f "[a-z]+" "to do and not-do"
+'pre (lambda (m) (string-reverse (match:substring m))) 'post)
+@result{} "ot od dna ton-od"
+@end example
+Without the @code{post} symbol, just one regexp match is made. For
+example the following is the date example from
+@code{regexp-substitute} above, without the need for the separate
+@code{string-match} call.
 @lisp
 (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
 (regexp-substitute/global #f date-regex s
 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
 @result{} "Date 04-29-2002 12am. (20020429)"
 @end lisp
+@end deffn
 @node Match Structures
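
The behaviour described in the revised text can be tried with a short Guile session along the following lines. This is only a sketch: it assumes the `(ice-9 regex)` module is loaded, and the sample strings, the variable names `single`, `global` and `first-only`, and the digit-counting lambda are made up for illustration and are not part of the commit.

```scheme
;; Illustrative sketch; sample data and variable names are hypothetical.
(use-modules (ice-9 regex))

;; regexp-substitute works on an existing match structure.
;; Item 0 is the entire match; 'pre and 'post are the text around it.
(define m (string-match "[0-9]+" "number 25 is good"))
(define single (regexp-substitute #f m 'pre "<" 0 ">" 'post))
(display single) (newline)      ; number <25> is good

;; regexp-substitute/global with a function item: each run of digits
;; is replaced by its length. 'post makes it carry on with the rest
;; of the target string.
(define global
  (regexp-substitute/global
   #f "[0-9]+" "a1 b22 c333"
   'pre
   (lambda (m) (number->string (string-length (match:substring m))))
   'post))
(display global) (newline)      ; a1 b2 c3

;; Without 'post only the first match is processed, and the text
;; after that match is not output.
(define first-only
  (regexp-substitute/global #f "[0-9]+" "a1 b22" 'pre "#"))
(display first-only) (newline)  ; a#
```

Passing `#f` as the port keeps each result as a string, which makes the three cases easy to compare; a real port such as `(current-output-port)` would write the same text directly instead.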