mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 03:40:34 +02:00
* doc/ref/sxml-match.texi: Replace all square brackets with round brackets in order to be consistent with the rest of the documentation. Signed-off-by: Ludovic Courtès <ludo@gnu.org>
377 lines
12 KiB
Text
377 lines
12 KiB
Text
@c -*-texinfo-*-
|
|
@c This is part of the GNU Guile Reference Manual.
|
|
@c Copyright (C) 2010, 2013 Free Software Foundation, Inc.
|
|
@c See the file guile.texi for copying conditions.
|
|
@c
|
|
@c Based on the documentation at
|
|
@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
|
|
@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
|
|
@c rest of `sxml-match'.)
|
|
@c
|
|
@c Converted to Texinfo and modified by Ludovic Courtès, 2010.
|
|
|
|
@node sxml-match
|
|
@section @code{sxml-match}: Pattern Matching of SXML
|
|
|
|
@cindex pattern matching (SXML)
|
|
@cindex SXML pattern matching
|
|
|
|
The @code{(sxml match)} module provides syntactic forms for pattern
|
|
matching of SXML trees, in a ``by example'' style reminiscent of the
|
|
pattern matching of the @code{syntax-rules} and @code{syntax-case} macro
|
|
systems. @xref{SXML}, for more information on SXML.
|
|
|
|
The following example@footnote{This example is taken from a paper by
|
|
Krishnamurthi et al. Their paper was the first to show the usefulness of the
|
|
@code{syntax-rules} style of pattern matching for transformation of XML, though
|
|
the language described, XT3D, is an XML language.} provides a brief
|
|
illustration, transforming a music album catalog language into HTML.
|
|
|
|
@lisp
|
|
(define (album->html x)
|
|
(sxml-match x
|
|
((album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
|
|
`(ul (li ,t)
|
|
(li (b ,n) (i ,f)) ...))))
|
|
@end lisp
|
|
|
|
Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
|
|
@code{sxml-match-let*}.
|
|
|
|
Compared to a standard s-expression pattern matcher (@pxref{Pattern
|
|
Matching}), @code{sxml-match} provides the following benefits:
|
|
|
|
@itemize
|
|
@item
|
|
matching of SXML elements does not depend on any degree of normalization of the
|
|
SXML;
|
|
@item
|
|
matching of SXML attributes (within an element) is under-ordered; the order of
|
|
the attributes specified within the pattern need not match the ordering with the
|
|
element being matched;
|
|
@item
|
|
all attributes specified in the pattern must be present in the element being
|
|
matched; in the spirit that XML is 'extensible', the element being matched may
|
|
include additional attributes not specified in the pattern.
|
|
@end itemize
|
|
|
|
The present module is a descendant of WebIt!, and was inspired by an
|
|
s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
|
|
Dybvig at Indiana University.
|
|
|
|
@unnumberedsubsec Syntax
|
|
|
|
@code{sxml-match} provides @code{case}-like form for pattern matching of XML
|
|
nodes.
|
|
|
|
@deffn {Scheme Syntax} sxml-match input-expression clause1 clause2 @dots{}
|
|
Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
|
|
(one or more), each consisting of a pattern and one or more expressions to be
|
|
evaluated if the pattern match succeeds. Optionally, each @var{clause} within
|
|
@code{sxml-match} may include a @dfn{guard expression}.
|
|
@end deffn
|
|
|
|
The pattern notation is based on that of Scheme's @code{syntax-rules} and
|
|
@code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
|
|
is given below:
|
|
|
|
@verbatim
|
|
match-form ::= (sxml-match input-expression
|
|
clause+)
|
|
|
|
clause ::= [node-pattern action-expression+]
|
|
| [node-pattern (guard expression*) action-expression+]
|
|
|
|
node-pattern ::= literal-pattern
|
|
| pat-var-or-cata
|
|
| element-pattern
|
|
| list-pattern
|
|
|
|
literal-pattern ::= string
|
|
| character
|
|
| number
|
|
| #t
|
|
| #f
|
|
|
|
attr-list-pattern ::= (@ attribute-pattern*)
|
|
| (@ attribute-pattern* . pat-var-or-cata)
|
|
|
|
attribute-pattern ::= (tag-symbol attr-val-pattern)
|
|
|
|
attr-val-pattern ::= literal-pattern
|
|
| pat-var-or-cata
|
|
| (pat-var-or-cata default-value-expr)
|
|
|
|
element-pattern ::= (tag-symbol attr-list-pattern?)
|
|
| (tag-symbol attr-list-pattern? nodeset-pattern)
|
|
| (tag-symbol attr-list-pattern?
|
|
nodeset-pattern? . pat-var-or-cata)
|
|
|
|
list-pattern ::= (list nodeset-pattern)
|
|
| (list nodeset-pattern? . pat-var-or-cata)
|
|
| (list)
|
|
|
|
nodeset-pattern ::= node-pattern
|
|
| node-pattern ...
|
|
| node-pattern nodeset-pattern
|
|
| node-pattern ... nodeset-pattern
|
|
|
|
pat-var-or-cata ::= (unquote var-symbol)
|
|
| (unquote [var-symbol*])
|
|
| (unquote [cata-expression -> var-symbol*])
|
|
@end verbatim
|
|
|
|
Within a list or element body pattern, ellipses may appear only once, but may be
|
|
followed by zero or more node patterns.
|
|
|
|
Guard expressions cannot refer to the return values of catamorphisms.
|
|
|
|
Ellipses in the output expressions must appear only in an expression context;
|
|
ellipses are not allowed in a syntactic form.
|
|
|
|
The sections below illustrate specific aspects of the @code{sxml-match} pattern
|
|
matcher.
|
|
|
|
@unnumberedsubsec Matching XML Elements
|
|
|
|
The example below illustrates the pattern matching of an XML element:
|
|
|
|
@lisp
|
|
(sxml-match '(e (@@ (i 1)) 3 4 5)
|
|
((e (@@ (i ,d)) ,a ,b ,c) (list d a b c))
|
|
(,otherwise #f))
|
|
@end lisp
|
|
|
|
Each clause in @code{sxml-match} contains two parts: a pattern and one or more
|
|
expressions which are evaluated if the pattern is successfully match. The
|
|
example above matches an element @code{e} with an attribute @code{i} and three
|
|
children.
|
|
|
|
Pattern variables must be ``unquoted'' in the pattern. The above expression
|
|
binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
|
|
to @code{5}.
|
|
|
|
@unnumberedsubsec Ellipses in Patterns
|
|
|
|
As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
|
|
Note that the pattern @code{item ...} specifies zero-or-more matches of the
|
|
pattern @code{item}.
|
|
|
|
The use of ellipses in a pattern is illustrated in the code fragment below,
|
|
where nested ellipses are used to match the children of repeated instances of an
|
|
@code{a} element, within an element @code{d}.
|
|
|
|
@lisp
|
|
(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
|
|
|
|
(sxml-match x
|
|
((d (a ,b ...) ...)
|
|
(list (list b ...) ...)))
|
|
@end lisp
|
|
|
|
The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
|
|
|
|
@unnumberedsubsec Ellipses in Quasiquote'd Output
|
|
|
|
Within the body of an @code{sxml-match} form, a slightly extended version of
|
|
quasiquote is provided, which allows the use of ellipses. This is illustrated
|
|
in the example below.
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 4 5 6 7)
|
|
((e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end"))
|
|
(,otherwise #f))
|
|
@end lisp
|
|
|
|
The general pattern is that @code{`(something ,i ...)} is rewritten as
|
|
@code{`(something ,@@i)}.
|
|
|
|
@unnumberedsubsec Matching Nodesets
|
|
|
|
A nodeset pattern is designated by a list in the pattern, beginning the
|
|
identifier list. The example below illustrates matching a nodeset.
|
|
|
|
@lisp
|
|
(sxml-match '("i" "j" "k" "l" "m")
|
|
((list ,a ,b ,c ,d ,e)
|
|
`((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))))
|
|
@end lisp
|
|
|
|
This example wraps each nodeset item in an HTML paragraph element. This example
|
|
can be rewritten and simplified through using ellipsis:
|
|
|
|
@lisp
|
|
(sxml-match '("i" "j" "k" "l" "m")
|
|
((list ,i ...)
|
|
`((p ,i) ...)))
|
|
@end lisp
|
|
|
|
This version will match nodesets of any length, and wrap each item in the
|
|
nodeset in an HTML paragraph element.
|
|
|
|
@unnumberedsubsec Matching the ``Rest'' of a Nodeset
|
|
|
|
Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
|
|
at the end of an element or nodeset pattern.
|
|
|
|
This is illustrated in the example below:
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 (f 4 5 6) 7)
|
|
((e ,a (f . ,y) ,d)
|
|
(list a y d)))
|
|
@end lisp
|
|
|
|
The above expression returns @code{(3 (4 5 6) 7)}.
|
|
|
|
@unnumberedsubsec Matching the Unmatched Attributes
|
|
|
|
Sometimes it is useful to bind a list of attributes present in the element being
|
|
matched, but which do not appear in the pattern. This is achieved by using a
|
|
@code{. rest)} pattern at the end of the attribute list pattern. This is
|
|
illustrated in the example below:
|
|
|
|
@lisp
|
|
(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
|
|
((a (@@ (y ,www) . ,qqq) ,t ,u ,v)
|
|
(list www qqq t u v)))
|
|
@end lisp
|
|
|
|
The above expression matches the attribute @code{y} and binds a list of the
|
|
remaining attributes to the variable @var{qqq}. The result of the above
|
|
expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
|
|
|
|
This type of pattern also allows the binding of all attributes:
|
|
|
|
@lisp
|
|
(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
|
|
((a (@@ . ,qqq))
|
|
qqq))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Default Values in Attribute Patterns
|
|
|
|
It is possible to specify a default value for an attribute which is used if the
|
|
attribute is not present in the element being matched. This is illustrated in
|
|
the following example:
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 4 5)
|
|
((e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)))
|
|
@end lisp
|
|
|
|
The value @code{1} is used when the attribute @code{z} is absent from the
|
|
element @code{e}.
|
|
|
|
@unnumberedsubsec Guards in Patterns
|
|
|
|
Guards may be added to a pattern clause via the @code{guard} keyword. A guard
|
|
expression may include zero or more expressions which are evaluated only if the
|
|
pattern is matched. The body of the clause is only evaluated if the guard
|
|
expressions evaluate to @code{#t}.
|
|
|
|
The use of guard expressions is illustrated below:
|
|
|
|
@lisp
|
|
(sxml-match '(a 2 3)
|
|
((a ,n) (guard (number? n)) n)
|
|
((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Catamorphisms
|
|
|
|
The example below illustrates the use of explicit recursion within an
|
|
@code{sxml-match} form. This example implements a simple calculator for the
|
|
basic arithmetic operations, which are represented by the XML elements
|
|
@code{plus}, @code{minus}, @code{times}, and @code{div}.
|
|
|
|
@lisp
|
|
(define simple-eval
|
|
(lambda (x)
|
|
(sxml-match x
|
|
(,i (guard (integer? i)) i)
|
|
((plus ,x ,y) (+ (simple-eval x) (simple-eval y)))
|
|
((times ,x ,y) (* (simple-eval x) (simple-eval y)))
|
|
((minus ,x ,y) (- (simple-eval x) (simple-eval y)))
|
|
((div ,x ,y) (/ (simple-eval x) (simple-eval y)))
|
|
(,otherwise (error "simple-eval: invalid expression" x)))))
|
|
@end lisp
|
|
|
|
Using the catamorphism feature of @code{sxml-match}, a more concise version of
|
|
@code{simple-eval} can be written. The pattern @code{,(x)} recursively invokes
|
|
the pattern matcher on the value bound in this position.
|
|
|
|
@lisp
|
|
(define simple-eval
|
|
(lambda (x)
|
|
(sxml-match x
|
|
(,i (guard (integer? i)) i)
|
|
((plus ,(x) ,(y)) (+ x y))
|
|
((times ,(x) ,(y)) (* x y))
|
|
((minus ,(x) ,(y)) (- x y))
|
|
((div ,(x) ,(y)) (/ x y))
|
|
(,otherwise (error "simple-eval: invalid expression" x)))))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Named-Catamorphisms
|
|
|
|
It is also possible to explicitly name the operator in the ``cata'' position.
|
|
Where @code{,(id*)} recurs to the top of the current @code{sxml-match},
|
|
@code{,(cata -> id*)} recurs to @code{cata}. @code{cata} must evaluate to a
|
|
procedure which takes one argument, and returns as many values as there are
|
|
identifiers following @code{->}.
|
|
|
|
Named catamorphism patterns allow processing to be split into multiple, mutually
|
|
recursive procedures. This is illustrated in the example below: a
|
|
transformation that formats a ``TV Guide'' into HTML.
|
|
|
|
@lisp
|
|
(define (tv-guide->html g)
|
|
(define (cast-list cl)
|
|
(sxml-match cl
|
|
((CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
|
|
`(div (ul (li ,ch ": " ,a) ...)))))
|
|
(define (prog p)
|
|
(sxml-match p
|
|
((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
|
(Description ,desc ...))
|
|
`(div (p ,start-time
|
|
(br) ,series-title
|
|
(br) ,desc ...)))
|
|
((Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
|
(Description ,desc ...)
|
|
,(cast-list -> cl))
|
|
`(div (p ,start-time
|
|
(br) ,series-title
|
|
(br) ,desc ...)
|
|
,cl))))
|
|
(sxml-match g
|
|
((TVGuide (@@ (start ,start-date)
|
|
(end ,end-date))
|
|
(Channel (Name ,nm) ,(prog -> p) ...) ...)
|
|
`(html (head (title "TV Guide"))
|
|
(body (h1 "TV Guide")
|
|
(div (h2 ,nm) ,p ...) ...)))))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
|
|
|
|
@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
|
|
@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
|
|
These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
|
|
an XML pattern in the binding position, rather than a simple variable.
|
|
@end deffn
|
|
|
|
For example, the expression below:
|
|
|
|
@lisp
|
|
(sxml-match-let (((a ,i ,j) '(a 1 2)))
|
|
(+ i j))
|
|
@end lisp
|
|
|
|
binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
|
|
value given.
|
|
|
|
@c Local Variables:
|
|
@c coding: utf-8
|
|
@c End:
|