mirror of https://git.savannah.gnu.org/git/guile.git
synced 2025-06-11 14:21:10 +02:00
(Regexp Functions): Revise regex-substitute and
regex-substitute/global for clarity, add some examples.
parent edcd3e83d3
commit a13befdcd3
1 changed files with 70 additions and 39 deletions
@@ -3809,86 +3809,117 @@ Return @code{#t} if @var{obj} is a compiled regular expression,
 or @code{#f} otherwise.
 @end deffn
 
-Regular expressions are commonly used to find patterns in one string and
-replace them with the contents of another string.
+@sp 1
+Regular expressions are commonly used to find patterns in one string
+and replace them with the contents of another string. The following
+functions are convenient ways to do this.
 
 @c begin (scm-doc-string "regex.scm" "regexp-substitute")
 @deffn {Scheme Procedure} regexp-substitute port match [item@dots{}]
-Write to the output port @var{port} selected contents of the match
-structure @var{match}. Each @var{item} specifies what should be
-written, and may be one of the following arguments:
+Write to @var{port} selected parts of the match structure @var{match}.
+Or if @var{port} is @code{#f} then form a string from those parts and
+return that.
+
+Each @var{item} specifies a part to be written, and may be one of the
+following,
 
 @itemize @bullet
 @item
 A string. String arguments are written out verbatim.
 
 @item
-An integer. The submatch with that number is written.
+An integer. The submatch with that number is written
+(@code{match:substring}). Zero is the entire match.
 
 @item
 The symbol @samp{pre}. The portion of the matched string preceding
-the regexp match is written.
+the regexp match is written (@code{match:prefix}).
 
 @item
 The symbol @samp{post}. The portion of the matched string following
-the regexp match is written.
+the regexp match is written (@code{match:suffix}).
 @end itemize
 
-The @var{port} argument may be @code{#f}, in which case nothing is
-written; instead, @code{regexp-substitute} constructs a string from the
-specified @var{item}s and returns that.
-@end deffn
+For example, changing a match and retaining the text before and after,
 
-The following example takes a regular expression that matches a standard
-@sc{yyyymmdd}-format date such as @code{"20020828"}. The
-@code{regexp-substitute} call returns a string computed from the
-information in the match structure, consisting of the fields and text
-from the original string reordered and reformatted.
+@example
+(regexp-substitute #f (string-match "[0-9]+" "number 25 is good")
+                   'pre "37" 'post)
+@result{} "number 37 is good"
+@end example
+
+Or matching a @sc{yyyymmdd} format date such as @samp{20020828} and
+re-ordering and hyphenating the fields.
 
 @lisp
 (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
-(define sm (string-match date-regex s))
-(regexp-substitute #f sm 'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
+(regexp-substitute #f (string-match date-regex s)
+                   'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
 @result{} "Date 04-29-2002 12am. (20020429)"
 @end lisp
+@end deffn
 
 
 @c begin (scm-doc-string "regex.scm" "regexp-substitute")
 @deffn {Scheme Procedure} regexp-substitute/global port regexp target [item@dots{}]
-Similar to @code{regexp-substitute}, but can be used to perform global
-substitutions on @var{str}. Instead of taking a match structure as an
-argument, @code{regexp-substitute/global} takes two string arguments: a
-@var{regexp} string describing a regular expression, and a @var{target}
-string which should be matched against this regular expression.
+@cindex search and replace
+Write to @var{port} selected parts of matches of @var{regexp} in
+@var{target}. If @var{port} is @code{#f} then form a string from
+those parts and return that. @var{regexp} can be a string or a
+compiled regex.
 
-Each @var{item} behaves as in @code{regexp-substitute}, with the
-following exceptions:
+This is similar to @code{regexp-substitute}, but allows global
+substitutions on @var{target}. Each @var{item} behaves as per
+@code{regexp-substitute}, with the following differences,
 
 @itemize @bullet
 @item
-A function may be supplied. When this function is called, it will be
-passed one argument: a match structure for a given regular expression
-match. It should return a string to be written out to @var{port}.
+A function. Called as @code{(@var{item} match)} with the match
+structure for the @var{regexp} match, it should return a string to be
+written to @var{port}.
 
 @item
-The @samp{post} symbol causes @code{regexp-substitute/global} to recurse
-on the unmatched portion of @var{str}. This @emph{must} be supplied in
-order to perform global search-and-replace on @var{str}; if it is not
-present among the @var{item}s, then @code{regexp-substitute/global} will
-return after processing a single match.
-@end itemize
-@end deffn
+The symbol @samp{post}. This doesn't output anything, but instead
+causes @code{regexp-substitute/global} to recurse on the unmatched
+portion of @var{target}.
 
-The example above for @code{regexp-substitute} could be rewritten as
-follows to remove the @code{string-match} stage:
+This @emph{must} be supplied to perform a global search and replace on
+@var{target}; without it @code{regexp-substitute/global} returns after
+a single match and output.
+@end itemize
+
+For example, to collapse runs of tabs and spaces to a single hyphen
+each,
+
+@example
+(regexp-substitute/global #f "[ \t]+" "this is the text"
+                          'pre "-" 'post)
+@result{} "this-is-the-text"
+@end example
+
+Or using a function to reverse the letters in each word,
+
+@example
+(regexp-substitute/global #f "[a-z]+" "to do and not-do"
+  'pre (lambda (m) (string-reverse (match:substring m))) 'post)
+@result{} "ot od dna ton-od"
+@end example
+
+Without the @code{post} symbol, just one regexp match is made. For
+example the following is the date example from
+@code{regexp-substitute} above, without the need for the separate
+@code{string-match} call.
 
 @lisp
 (define date-regex "([0-9][0-9][0-9][0-9])([0-9][0-9])([0-9][0-9])")
 (define s "Date 20020429 12am.")
 (regexp-substitute/global #f date-regex s
                           'pre 2 "-" 3 "-" 1 'post " (" 0 ")")
 
 @result{} "Date 04-29-2002 12am. (20020429)"
 @end lisp
+@end deffn
 
 
 @node Match Structures