mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 11:50:28 +02:00
377 lines
12 KiB
Text
377 lines
12 KiB
Text
@c -*-texinfo-*-
|
|
@c This is part of the GNU Guile Reference Manual.
|
|
@c Copyright (C) 2010 Free Software Foundation, Inc.
|
|
@c See the file guile.texi for copying conditions.
|
|
@c
|
|
@c Based on the documentation at
|
|
@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
|
|
@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
|
|
@c rest of `sxml-match'.)
|
|
@c
|
|
@c Converted to Texinfo and modified by Ludovic Courtès, 2010.
|
|
|
|
@node sxml-match
|
|
@section @code{sxml-match}: Pattern Matching of SXML
|
|
|
|
@cindex pattern matching (SXML)
|
|
@cindex SXML pattern matching
|
|
|
|
The @code{(sxml match)} module provides syntactic forms for pattern matching of
|
|
SXML trees, in a ``by example'' style reminiscent of the pattern matching of the
|
|
@code{syntax-rules} and @code{syntax-case} macro systems. @xref{sxml simple,
|
|
the @code{(sxml simple)} module}, for more information on SXML.
|
|
|
|
The following example@footnote{This example is taken from a paper by
|
|
Krishnamurthi et al. Their paper was the first to show the usefulness of the
|
|
@code{syntax-rules} style of pattern matching for transformation of XML, though
|
|
the language described, XT3D, is an XML language.} provides a brief
|
|
illustration, transforming a music album catalog language into HTML.
|
|
|
|
@lisp
|
|
(define (album->html x)
|
|
(sxml-match x
|
|
[(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
|
|
`(ul (li ,t)
|
|
(li (b ,n) (i ,f)) ...)]))
|
|
@end lisp
|
|
|
|
Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
|
|
@code{sxml-match-let*}.
|
|
|
|
Compared to a standard s-expression pattern matcher, @code{sxml-match} provides
|
|
the following benefits:
|
|
|
|
@itemize
|
|
@item
|
|
matching of SXML elements does not depend on any degree of normalization of the
|
|
SXML;
|
|
@item
|
|
matching of SXML attributes (within an element) is under-ordered; the order of
|
|
the attributes specified within the pattern need not match the ordering with the
|
|
element being matched;
|
|
@item
|
|
all attributes specified in the pattern must be present in the element being
|
|
matched; in the spirit that XML is 'extensible', the element being matched may
|
|
include additional attributes not specified in the pattern.
|
|
@end itemize
|
|
|
|
The present module is a descendant of WebIt!, and was inspired by an
|
|
s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
|
|
Dybvig at Indiana University.
|
|
|
|
@unnumberedsubsec Syntax
|
|
|
|
@code{sxml-match} provides @code{case}-like form for pattern matching of XML
|
|
nodes.
|
|
|
|
@deffn {Scheme Syntax} sxml-match input-expression clause ...
|
|
Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
|
|
(one or more), each consisting of a pattern and one or more expressions to be
|
|
evaluated if the pattern match succeeds. Optionally, each @var{clause} within
|
|
@code{sxml-match} may include a @dfn{guard expression}.
|
|
@end deffn
|
|
|
|
The pattern notation is based on that of Scheme's @code{syntax-rules} and
|
|
@code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
|
|
is given below:
|
|
|
|
@verbatim
|
|
match-form ::= (sxml-match input-expression
|
|
clause+)
|
|
|
|
clause ::= [node-pattern action-expression+]
|
|
| [node-pattern (guard expression*) action-expression+]
|
|
|
|
node-pattern ::= literal-pattern
|
|
| pat-var-or-cata
|
|
| element-pattern
|
|
| list-pattern
|
|
|
|
literal-pattern ::= string
|
|
| character
|
|
| number
|
|
| #t
|
|
| #f
|
|
|
|
attr-list-pattern ::= (@ attribute-pattern*)
|
|
| (@ attribute-pattern* . pat-var-or-cata)
|
|
|
|
attribute-pattern ::= (tag-symbol attr-val-pattern)
|
|
|
|
attr-val-pattern ::= literal-pattern
|
|
| pat-var-or-cata
|
|
| (pat-var-or-cata default-value-expr)
|
|
|
|
element-pattern ::= (tag-symbol attr-list-pattern?)
|
|
| (tag-symbol attr-list-pattern? nodeset-pattern)
|
|
| (tag-symbol attr-list-pattern?
|
|
nodeset-pattern? . pat-var-or-cata)
|
|
|
|
list-pattern ::= (list nodeset-pattern)
|
|
| (list nodeset-pattern? . pat-var-or-cata)
|
|
| (list)
|
|
|
|
nodeset-pattern ::= node-pattern
|
|
| node-pattern ...
|
|
| node-pattern nodeset-pattern
|
|
| node-pattern ... nodeset-pattern
|
|
|
|
pat-var-or-cata ::= (unquote var-symbol)
|
|
| (unquote [var-symbol*])
|
|
| (unquote [cata-expression -> var-symbol*])
|
|
@end verbatim
|
|
|
|
Within a list or element body pattern, ellipses may appear only once, but may be
|
|
followed by zero or more node patterns.
|
|
|
|
Guard expressions cannot refer to the return values of catamorphisms.
|
|
|
|
Ellipses in the output expressions must appear only in an expression context;
|
|
ellipses are not allowed in a syntactic form.
|
|
|
|
The sections below illustrate specific aspects of the @code{sxml-match} pattern
|
|
matcher.
|
|
|
|
@unnumberedsubsec Matching XML Elements
|
|
|
|
The example below illustrates the pattern matching of an XML element:
|
|
|
|
@lisp
|
|
(sxml-match '(e (@@ (i 1)) 3 4 5)
|
|
[(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
|
|
[,otherwise #f])
|
|
@end lisp
|
|
|
|
Each clause in @code{sxml-match} contains two parts: a pattern and one or more
|
|
expressions which are evaluated if the pattern is successfully match. The
|
|
example above matches an element @code{e} with an attribute @code{i} and three
|
|
children.
|
|
|
|
Pattern variables are must be ``unquoted'' in the pattern. The above expression
|
|
binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
|
|
to @code{5}.
|
|
|
|
@unnumberedsubsec Ellipses in Patterns
|
|
|
|
As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
|
|
Note that the pattern @code{item ...} specifies zero-or-more matches of the
|
|
pattern @code{item}.
|
|
|
|
The use of ellipses in a pattern is illustrated in the code fragment below,
|
|
where nested ellipses are used to match the children of repeated instances of an
|
|
@code{a} element, within an element @code{d}.
|
|
|
|
@lisp
|
|
(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
|
|
|
|
(sxml-match x
|
|
[(d (a ,b ...) ...)
|
|
(list (list b ...) ...)])
|
|
@end lisp
|
|
|
|
The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
|
|
|
|
@unnumberedsubsec Ellipses in Quasiquote'd Output
|
|
|
|
Within the body of an @code{sxml-match} form, a slightly extended version of
|
|
quasiquote is provided, which allows the use of ellipses. This is illustrated
|
|
in the example below.
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 4 5 6 7)
|
|
[(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
|
|
[,otherwise #f])
|
|
@end lisp
|
|
|
|
The general pattern is that @code{`(something ,i ...)} is rewritten as
|
|
@code{`(something ,@@i)}.
|
|
|
|
@unnumberedsubsec Matching Nodesets
|
|
|
|
A nodeset pattern is designated by a list in the pattern, beginning the
|
|
identifier list. The example below illustrates matching a nodeset.
|
|
|
|
@lisp
|
|
(sxml-match '("i" "j" "k" "l" "m")
|
|
[(list ,a ,b ,c ,d ,e)
|
|
`((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))])
|
|
@end lisp
|
|
|
|
This example wraps each nodeset item in an HTML paragraph element. This example
|
|
can be rewritten and simplified through using ellipsis:
|
|
|
|
@lisp
|
|
(sxml-match '("i" "j" "k" "l" "m")
|
|
[(list ,i ...)
|
|
`((p ,i) ...)])
|
|
@end lisp
|
|
|
|
This version will match nodesets of any length, and wrap each item in the
|
|
nodeset in an HTML paragraph element.
|
|
|
|
@unnumberedsubsec Matching the ``Rest'' of a Nodeset
|
|
|
|
Matching the ``rest'' of a nodeset is achieved by using a @code{. rest)} pattern
|
|
at the end of an element or nodeset pattern.
|
|
|
|
This is illustrated in the example below:
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 (f 4 5 6) 7)
|
|
[(e ,a (f . ,y) ,d)
|
|
(list a y d)])
|
|
@end lisp
|
|
|
|
The above expression returns @code{(3 (4 5 6) 7)}.
|
|
|
|
@unnumberedsubsec Matching the Unmatched Attributes
|
|
|
|
Sometimes it is useful to bind a list of attributes present in the element being
|
|
matched, but which do not appear in the pattern. This is achieved by using a
|
|
@code{. rest)} pattern at the end of the attribute list pattern. This is
|
|
illustrated in the example below:
|
|
|
|
@lisp
|
|
(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
|
|
[(a (@@ (y ,www) . ,qqq) ,t ,u ,v)
|
|
(list www qqq t u v)])
|
|
@end lisp
|
|
|
|
The above expression matches the attribute @code{y} and binds a list of the
|
|
remaining attributes to the variable @var{qqq}. The result of the above
|
|
expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
|
|
|
|
This type of pattern also allows the binding of all attributes:
|
|
|
|
@lisp
|
|
(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
|
|
[(a (@@ . ,qqq))
|
|
qqq])
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Default Values in Attribute Patterns
|
|
|
|
It is possible to specify a default value for an attribute which is used if the
|
|
attribute is not present in the element being matched. This is illustrated in
|
|
the following example:
|
|
|
|
@lisp
|
|
(sxml-match '(e 3 4 5)
|
|
[(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)])
|
|
@end lisp
|
|
|
|
The value @code{1} is used when the attribute @code{z} is absent from the
|
|
element @code{e}.
|
|
|
|
@unnumberedsubsec Guards in Patterns
|
|
|
|
Guards may be added to a pattern clause via the @code{guard} keyword. A guard
|
|
expression may include zero or more expressions which are evaluated only if the
|
|
pattern is matched. The body of the clause is only evaluated if the guard
|
|
expressions evaluate to @code{#t}.
|
|
|
|
The use of guard expressions is illustrated below:
|
|
|
|
@lisp
|
|
(sxml-match '(a 2 3)
|
|
((a ,n) (guard (number? n)) n)
|
|
((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Catamorphisms
|
|
|
|
The example below illustrates the use of explicit recursion within an
|
|
@code{sxml-match} form. This example implements a simple calculator for the
|
|
basic arithmetic operations, which are represented by the XML elements
|
|
@code{plus}, @code{minus}, @code{times}, and @code{div}.
|
|
|
|
@lisp
|
|
(define simple-eval
|
|
(lambda (x)
|
|
(sxml-match x
|
|
[,i (guard (integer? i)) i]
|
|
[(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
|
|
[(times ,x ,y) (* (simple-eval x) (simple-eval y))]
|
|
[(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
|
|
[(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
|
|
[,otherwise (error "simple-eval: invalid expression" x)])))
|
|
@end lisp
|
|
|
|
Using the catamorphism feature of @code{sxml-match}, a more concise version of
|
|
@code{simple-eval} can be written. The pattern @code{,[x]} recusively invokes
|
|
the pattern matcher on the value bound in this position.
|
|
|
|
@lisp
|
|
(define simple-eval
|
|
(lambda (x)
|
|
(sxml-match x
|
|
[,i (guard (integer? i)) i]
|
|
[(plus ,[x] ,[y]) (+ x y)]
|
|
[(times ,[x] ,[y]) (* x y)]
|
|
[(minus ,[x] ,[y]) (- x y)]
|
|
[(div ,[x] ,[y]) (/ x y)]
|
|
[,otherwise (error "simple-eval: invalid expression" x)])))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec Named-Catamorphisms
|
|
|
|
It is also possible to explicitly name the operator in the ``cata'' position.
|
|
Where @code{,[id*]} recurs to the top of the current @code{sxml-match},
|
|
@code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a
|
|
procedure which takes one argument, and returns as many values as there are
|
|
identifiers following @code{->}.
|
|
|
|
Named catamorphism patterns allow processing to be split into multiple, mutually
|
|
recursive procedures. This is illustrated in the example below: a
|
|
transformation that formats a ``TV Guide'' into HTML.
|
|
|
|
@lisp
|
|
(define (tv-guide->html g)
|
|
(define (cast-list cl)
|
|
(sxml-match cl
|
|
[(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
|
|
`(div (ul (li ,ch ": " ,a) ...))]))
|
|
(define (prog p)
|
|
(sxml-match p
|
|
[(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
|
(Description ,desc ...))
|
|
`(div (p ,start-time
|
|
(br) ,series-title
|
|
(br) ,desc ...))]
|
|
[(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
|
(Description ,desc ...)
|
|
,[cast-list -> cl])
|
|
`(div (p ,start-time
|
|
(br) ,series-title
|
|
(br) ,desc ...)
|
|
,cl)]))
|
|
(sxml-match g
|
|
[(TVGuide (@@ (start ,start-date)
|
|
(end ,end-date))
|
|
(Channel (Name ,nm) ,[prog -> p] ...) ...)
|
|
`(html (head (title "TV Guide"))
|
|
(body (h1 "TV Guide")
|
|
(div (h2 ,nm) ,p ...) ...))]))
|
|
@end lisp
|
|
|
|
@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
|
|
|
|
@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...)
|
|
@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...)
|
|
These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
|
|
an XML pattern in the binding position, rather than a simple variable.
|
|
@end deffn
|
|
|
|
For example, the expression below:
|
|
|
|
@lisp
|
|
(sxml-match-let ([(a ,i ,j) '(a 1 2)])
|
|
(+ i j))
|
|
@end lisp
|
|
|
|
binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
|
|
value given.
|
|
|
|
@c Local Variables:
|
|
@c coding: utf-8
|
|
@c End:
|