mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-06-16 08:40:19 +02:00
Add (sxml match).
* module/Makefile.am (LIB_SOURCES): Add `sxml/match.scm'.
  (NOCOMP_SOURCES): Add `sxml/sxml-match.ss'.
* module/sxml/match.scm, module/sxml/sxml-match.ss: New files.
* test-suite/Makefile.am (SCM_TESTS): Add `tests/sxml.match.test'.
  (EXTRA_DIST): Add `tests/sxml-match-tests.ss'.
* test-suite/tests/sxml-match-tests.ss,
  test-suite/tests/sxml.match.test: New files.
* doc/ref/guile.texi (Guile Modules): Include `sxml-match.texi'.
* doc/ref/sxml-match.texi: New file.
* doc/ref/Makefile.am (guile_TEXINFOS): Add `sxml-match.texi'.
parent
adb8f30600
commit
400a5dcb8b
9 changed files with 2003 additions and 1 deletions
|
@ -58,6 +58,7 @@ guile_TEXINFOS = preface.texi \
|
|||
posix.texi \
|
||||
expect.texi \
|
||||
scsh.texi \
|
||||
sxml-match.texi \
|
||||
scheme-scripts.texi \
|
||||
api-overview.texi \
|
||||
api-discdepr.texi \
|
||||
|
|
|
@ -359,6 +359,7 @@ available through both Scheme and C interfaces.
|
|||
* Streams:: Sequences of values.
|
||||
* Buffered Input:: Ports made from a reader function.
|
||||
* Expect:: Controlling interactive programs with Guile.
|
||||
* sxml-match:: Pattern matching of SXML.
|
||||
* The Scheme shell (scsh):: Using scsh interfaces in Guile.
|
||||
* Tracing:: Tracing program execution.
|
||||
@end menu
|
||||
|
@ -370,6 +371,10 @@ available through both Scheme and C interfaces.
|
|||
@include repl-modules.texi
|
||||
@include misc-modules.texi
|
||||
@include expect.texi
|
||||
|
||||
@c XXX: Would be nicer if it were close to the (sxml simple) documentation.
|
||||
@include sxml-match.texi
|
||||
|
||||
@include scsh.texi
|
||||
@include scheme-debugging.texi
|
||||
|
||||
|
|
377
doc/ref/sxml-match.texi
Normal file
|
@ -0,0 +1,377 @@
|
|||
@c -*-texinfo-*-
|
||||
@c This is part of the GNU Guile Reference Manual.
|
||||
@c Copyright (C) 2010 Free Software Foundation, Inc.
|
||||
@c See the file guile.texi for copying conditions.
|
||||
@c
|
||||
@c Based on the documentation at
|
||||
@c <http://planet.plt-scheme.org/package-source/jim/sxml-match.plt/1/1/doc.txt>,
|
||||
@c copyright 2005 Jim Bender, and released under the MIT/X11 license (like the
|
||||
@c rest of `sxml-match'.)
|
||||
@c
|
||||
@c Converted to Texinfo and modified by Ludovic Courtès, 2010.
|
||||
|
||||
@node sxml-match
|
||||
@section @code{sxml-match}: Pattern Matching of SXML
|
||||
|
||||
@cindex pattern matching (SXML)
|
||||
@cindex SXML pattern matching
|
||||
|
||||
The @code{(sxml match)} module provides syntactic forms for pattern matching of
|
||||
SXML trees, in a ``by example'' style reminiscent of the pattern matching of the
|
||||
@code{syntax-rules} and @code{syntax-case} macro systems. @xref{sxml simple,
|
||||
the @code{(sxml simple)} module}, for more information on SXML.
|
||||
|
||||
The following example@footnote{This example is taken from a paper by
|
||||
Krishnamurthi et al. Their paper was the first to show the usefulness of the
|
||||
@code{syntax-rules} style of pattern matching for transformation of XML, though
|
||||
the language described, XT3D, is an XML language.} provides a brief
|
||||
illustration, transforming a music album catalog language into HTML.
|
||||
|
||||
@lisp
|
||||
(define (album->html x)
|
||||
(sxml-match x
|
||||
    [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
|
||||
`(ul (li ,t)
|
||||
(li (b ,n) (i ,f)) ...)]))
|
||||
@end lisp
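
As a usage sketch, assuming Guile with the @code{(sxml match)} module
available, calling @code{album->html} on a small catalog might look as follows.
The album data is invented for illustration.

@lisp
(use-modules (sxml match))

(define (album->html x)
  (sxml-match x
    [(album (@@ (title ,t)) (catalog (num ,n) (fmt ,f)) ...)
     `(ul (li ,t)
          (li (b ,n) (i ,f)) ...)]))

;; Hypothetical input: one album with two catalog entries.
(album->html
 '(album (@@ (title "Kind of Blue"))
         (catalog (num "1") (fmt "CD"))
         (catalog (num "2") (fmt "LP"))))
;; => (ul (li "Kind of Blue") (li (b "1") (i "CD")) (li (b "2") (i "LP")))
@end lisp

Each @code{catalog} child contributes one @code{li} element to the output, in
document order.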
|
||||
|
||||
Three macros are provided: @code{sxml-match}, @code{sxml-match-let}, and
|
||||
@code{sxml-match-let*}.
|
||||
|
||||
Compared to a standard s-expression pattern matcher, @code{sxml-match} provides
|
||||
the following benefits:
|
||||
|
||||
@itemize
|
||||
@item
|
||||
matching of SXML elements does not depend on any degree of normalization of the
|
||||
SXML;
|
||||
@item
|
||||
matching of SXML attributes (within an element) is unordered; the order of
|
||||
the attributes specified within the pattern need not match the ordering within the
|
||||
element being matched;
|
||||
@item
|
||||
all attributes specified in the pattern must be present in the element being
|
||||
matched; in the spirit that XML is ``extensible'', the element being matched may
|
||||
include additional attributes not specified in the pattern.
|
||||
@end itemize
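
The attribute rules above can be sketched with a small example, assuming the
@code{(sxml match)} module is loaded. The element and attribute names are
arbitrary.

@lisp
(use-modules (sxml match))

;; The pattern lists a before b; the element lists b first and also
;; carries an attribute c that the pattern does not mention.
(sxml-match '(e (@@ (b 2) (a 1) (c 3)) "text")
  [(e (@@ (a ,x) (b ,y)) ,body) (list x y body)]
  [,otherwise #f])
;; => (1 2 "text")
@end lisp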
|
||||
|
||||
The present module is a descendant of WebIt!, and was inspired by an
|
||||
s-expression pattern matcher developed by Erik Hilsdale, Dan Friedman, and Kent
|
||||
Dybvig at Indiana University.
|
||||
|
||||
@unnumberedsubsec Syntax
|
||||
|
||||
@code{sxml-match} provides a @code{case}-like form for pattern matching of XML
|
||||
nodes.
|
||||
|
||||
@deffn {Scheme Syntax} sxml-match input-expression clause ...
|
||||
Match @var{input-expression}, an SXML tree, according to the given @var{clause}s
|
||||
(one or more), each consisting of a pattern and one or more expressions to be
|
||||
evaluated if the pattern match succeeds. Optionally, each @var{clause} within
|
||||
@code{sxml-match} may include a @dfn{guard expression}.
|
||||
@end deffn
|
||||
|
||||
The pattern notation is based on that of Scheme's @code{syntax-rules} and
|
||||
@code{syntax-case} macro systems. The grammar for the @code{sxml-match} syntax
|
||||
is given below:
|
||||
|
||||
@verbatim
|
||||
match-form ::= (sxml-match input-expression
|
||||
clause+)
|
||||
|
||||
clause ::= [node-pattern action-expression+]
|
||||
| [node-pattern (guard expression*) action-expression+]
|
||||
|
||||
node-pattern ::= literal-pattern
|
||||
| pat-var-or-cata
|
||||
| element-pattern
|
||||
| list-pattern
|
||||
|
||||
literal-pattern ::= string
|
||||
| character
|
||||
| number
|
||||
| #t
|
||||
| #f
|
||||
|
||||
attr-list-pattern ::= (@ attribute-pattern*)
|
||||
| (@ attribute-pattern* . pat-var-or-cata)
|
||||
|
||||
attribute-pattern ::= (tag-symbol attr-val-pattern)
|
||||
|
||||
attr-val-pattern ::= literal-pattern
|
||||
| pat-var-or-cata
|
||||
| (pat-var-or-cata default-value-expr)
|
||||
|
||||
element-pattern ::= (tag-symbol attr-list-pattern?)
|
||||
| (tag-symbol attr-list-pattern? nodeset-pattern)
|
||||
| (tag-symbol attr-list-pattern?
|
||||
nodeset-pattern? . pat-var-or-cata)
|
||||
|
||||
list-pattern ::= (list nodeset-pattern)
|
||||
| (list nodeset-pattern? . pat-var-or-cata)
|
||||
| (list)
|
||||
|
||||
nodeset-pattern ::= node-pattern
|
||||
| node-pattern ...
|
||||
| node-pattern nodeset-pattern
|
||||
| node-pattern ... nodeset-pattern
|
||||
|
||||
pat-var-or-cata ::= (unquote var-symbol)
|
||||
| (unquote [var-symbol*])
|
||||
| (unquote [cata-expression -> var-symbol*])
|
||||
@end verbatim
|
||||
|
||||
Within a list or element body pattern, ellipses may appear only once, but may be
|
||||
followed by zero or more node patterns.
|
||||
|
||||
Guard expressions cannot refer to the return values of catamorphisms.
|
||||
|
||||
Ellipses in the output expressions must appear only in an expression context;
|
||||
ellipses are not allowed in a syntactic form.
|
||||
|
||||
The sections below illustrate specific aspects of the @code{sxml-match} pattern
|
||||
matcher.
|
||||
|
||||
@unnumberedsubsec Matching XML Elements
|
||||
|
||||
The example below illustrates the pattern matching of an XML element:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(e (@@ (i 1)) 3 4 5)
|
||||
  [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
|
||||
[,otherwise #f])
|
||||
@end lisp
|
||||
|
||||
Each clause in @code{sxml-match} contains two parts: a pattern and one or more
|
||||
expressions which are evaluated if the pattern is successfully matched. The
|
||||
example above matches an element @code{e} with an attribute @code{i} and three
|
||||
children.
|
||||
|
||||
Pattern variables must be ``unquoted'' in the pattern. The above expression
|
||||
binds @var{d} to @code{1}, @var{a} to @code{3}, @var{b} to @code{4}, and @var{c}
|
||||
to @code{5}.
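
To see how clauses are tried in order, the same clauses can be wrapped in a
procedure and applied to several inputs. This is only a sketch;
@code{match-e} is a name made up for this illustration.

@lisp
(use-modules (sxml match))

;; match-e is a hypothetical helper, not part of (sxml match).
(define (match-e node)
  (sxml-match node
    [(e (@@ (i ,d)) ,a ,b ,c) (list d a b c)]
    [,otherwise #f]))

(match-e '(e (@@ (i 1)) 3 4 5))   ; => (1 3 4 5)
(match-e '(e (@@ (i 1)) 3 4))     ; => #f, only two children
(match-e '(f (@@ (i 1)) 3 4 5))   ; => #f, the tag is f rather than e
@end lisp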
|
||||
|
||||
@unnumberedsubsec Ellipses in Patterns
|
||||
|
||||
As in @code{syntax-rules}, ellipses may be used to specify a repeated pattern.
|
||||
Note that the pattern @code{item ...} specifies zero-or-more matches of the
|
||||
pattern @code{item}.
|
||||
|
||||
The use of ellipses in a pattern is illustrated in the code fragment below,
|
||||
where nested ellipses are used to match the children of repeated instances of an
|
||||
@code{a} element, within an element @code{d}.
|
||||
|
||||
@lisp
|
||||
(define x '(d (a 1 2 3) (a 4 5) (a 6 7 8) (a 9 10)))
|
||||
|
||||
(sxml-match x
|
||||
[(d (a ,b ...) ...)
|
||||
(list (list b ...) ...)])
|
||||
@end lisp
|
||||
|
||||
The above expression returns a value of @code{((1 2 3) (4 5) (6 7 8) (9 10))}.
|
||||
|
||||
@unnumberedsubsec Ellipses in Quasiquote'd Output
|
||||
|
||||
Within the body of an @code{sxml-match} form, a slightly extended version of
|
||||
quasiquote is provided, which allows the use of ellipses. This is illustrated
|
||||
in the example below.
|
||||
|
||||
@lisp
|
||||
(sxml-match '(e 3 4 5 6 7)
|
||||
[(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
|
||||
[,otherwise #f])
|
||||
@end lisp
|
||||
|
||||
The general pattern is that @code{`(something ,i ...)} is rewritten as
|
||||
@code{`(something ,@@i)}.
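
As a sketch of what this rewriting amounts to, the ellipsis form can be
compared with plain quasiquote plus @code{map}. The second expression is
ordinary Scheme and does not use @code{sxml-match} at all; the variable
@code{is} is made up for the comparison.

@lisp
(use-modules (sxml match))

(sxml-match '(e 3 4 5 6 7)
  [(e ,i ... 6 7) `("start" ,(list 'wrap i) ... "end")]
  [,otherwise #f])
;; => ("start" (wrap 3) (wrap 4) (wrap 5) "end")

;; The same list built by hand from the values bound to i:
(let ((is '(3 4 5)))
  `("start" ,@@(map (lambda (i) (list 'wrap i)) is) "end"))
;; => ("start" (wrap 3) (wrap 4) (wrap 5) "end")
@end lisp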
|
||||
|
||||
@unnumberedsubsec Matching Nodesets
|
||||
|
||||
A nodeset pattern is designated by a list in the pattern, beginning with the
|
||||
identifier list. The example below illustrates matching a nodeset.
|
||||
|
||||
@lisp
|
||||
(sxml-match '("i" "j" "k" "l" "m")
|
||||
[(list ,a ,b ,c ,d ,e)
|
||||
`((p ,a) (p ,b) (p ,c) (p ,d) (p ,e))])
|
||||
@end lisp
|
||||
|
||||
This example wraps each nodeset item in an HTML paragraph element. This example
|
||||
can be rewritten and simplified through using ellipsis:
|
||||
|
||||
@lisp
|
||||
(sxml-match '("i" "j" "k" "l" "m")
|
||||
[(list ,i ...)
|
||||
`((p ,i) ...)])
|
||||
@end lisp
|
||||
|
||||
This version will match nodesets of any length, and wrap each item in the
|
||||
nodeset in an HTML paragraph element.
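
For instance, the ellipsis version can be applied to nodesets of different
lengths. This is a sketch; @code{wrap-paragraphs} is a name invented here.

@lisp
(use-modules (sxml match))

;; wrap-paragraphs is a hypothetical helper, not part of (sxml match).
(define (wrap-paragraphs nodes)
  (sxml-match nodes
    [(list ,i ...)
     `((p ,i) ...)]))

(wrap-paragraphs '("i" "j" "k"))   ; => ((p "i") (p "j") (p "k"))
(wrap-paragraphs '("a"))           ; => ((p "a"))
@end lisp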
|
||||
|
||||
@unnumberedsubsec Matching the ``Rest'' of a Nodeset
|
||||
|
||||
Matching the ``rest'' of a nodeset is achieved by using a @code{. ,rest)} pattern
|
||||
at the end of an element or nodeset pattern.
|
||||
|
||||
This is illustrated in the example below:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(e 3 (f 4 5 6) 7)
|
||||
[(e ,a (f . ,y) ,d)
|
||||
(list a y d)])
|
||||
@end lisp
|
||||
|
||||
The above expression returns @code{(3 (4 5 6) 7)}.
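
The same idea applies at the top level of an element pattern. The following
sketch, with made-up list data, binds the first child separately and collects
the remaining children in a list:

@lisp
(use-modules (sxml match))

(sxml-match '(ul (li "a") (li "b") (li "c"))
  [(ul ,first . ,rest)
   (list first (length rest))])
;; => ((li "a") 2)
@end lisp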
|
||||
|
||||
@unnumberedsubsec Matching the Unmatched Attributes
|
||||
|
||||
Sometimes it is useful to bind a list of attributes present in the element being
|
||||
matched, but which do not appear in the pattern. This is achieved by using a
|
||||
@code{. ,rest)} pattern at the end of the attribute list pattern. This is
|
||||
illustrated in the example below:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(a (@@ (z 1) (y 2) (x 3)) 4 5 6)
|
||||
  [(a (@@ (y ,www) . ,qqq) ,t ,u ,v)
|
||||
(list www qqq t u v)])
|
||||
@end lisp
|
||||
|
||||
The above expression matches the attribute @code{y} and binds a list of the
|
||||
remaining attributes to the variable @var{qqq}. The result of the above
|
||||
expression is @code{(2 ((z 1) (x 3)) 4 5 6)}.
|
||||
|
||||
This type of pattern also allows the binding of all attributes:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(a (@@ (z 1) (y 2) (x 3)))
|
||||
  [(a (@@ . ,qqq))
|
||||
qqq])
|
||||
@end lisp
|
||||
|
||||
@unnumberedsubsec Default Values in Attribute Patterns
|
||||
|
||||
It is possible to specify a default value for an attribute which is used if the
|
||||
attribute is not present in the element being matched. This is illustrated in
|
||||
the following example:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(e 3 4 5)
|
||||
  [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)])
|
||||
@end lisp
|
||||
|
||||
The value @code{1} is used when the attribute @code{z} is absent from the
|
||||
element @code{e}.
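
A short sketch contrasting the two cases, with the clause wrapped in a
procedure (@code{z-and-children} is a name made up for this illustration):

@lisp
(use-modules (sxml match))

;; z-and-children is a hypothetical helper, not part of (sxml match).
(define (z-and-children node)
  (sxml-match node
    [(e (@@ (z (,d 1))) ,a ,b ,c) (list d a b c)]))

(z-and-children '(e 3 4 5))              ; => (1 3 4 5), default used
(z-and-children '(e (@@ (z 9)) 3 4 5))    ; => (9 3 4 5), attribute present
@end lisp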
|
||||
|
||||
@unnumberedsubsec Guards in Patterns
|
||||
|
||||
Guards may be added to a pattern clause via the @code{guard} keyword. A guard
|
||||
expression may include zero or more expressions which are evaluated only if the
|
||||
pattern is matched. The body of the clause is only evaluated if the guard
|
||||
expressions evaluate to @code{#t}.
|
||||
|
||||
The use of guard expressions is illustrated below:
|
||||
|
||||
@lisp
|
||||
(sxml-match '(a 2 3)
|
||||
((a ,n) (guard (number? n)) n)
|
||||
((a ,m ,n) (guard (number? m) (number? n)) (+ m n)))
|
||||
@end lisp
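
When a pattern matches but its guard fails, matching continues with the next
clause. The sketch below adds a catch-all clause to make this visible;
@code{sum-or-single} is a made-up name.

@lisp
(use-modules (sxml match))

;; sum-or-single is a hypothetical helper, not part of (sxml match).
(define (sum-or-single node)
  (sxml-match node
    [(a ,n) (guard (number? n)) n]
    [(a ,m ,n) (guard (number? m) (number? n)) (+ m n)]
    [,otherwise 'no-match]))

(sum-or-single '(a 2 3))   ; => 5
(sum-or-single '(a 7))     ; => 7
(sum-or-single '(a "x"))   ; => no-match, the guard of the first clause fails
@end lisp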
|
||||
|
||||
@unnumberedsubsec Catamorphisms
|
||||
|
||||
The example below illustrates the use of explicit recursion within an
|
||||
@code{sxml-match} form. This example implements a simple calculator for the
|
||||
basic arithmetic operations, which are represented by the XML elements
|
||||
@code{plus}, @code{minus}, @code{times}, and @code{div}.
|
||||
|
||||
@lisp
|
||||
(define simple-eval
|
||||
(lambda (x)
|
||||
(sxml-match x
|
||||
[,i (guard (integer? i)) i]
|
||||
[(plus ,x ,y) (+ (simple-eval x) (simple-eval y))]
|
||||
[(times ,x ,y) (* (simple-eval x) (simple-eval y))]
|
||||
[(minus ,x ,y) (- (simple-eval x) (simple-eval y))]
|
||||
[(div ,x ,y) (/ (simple-eval x) (simple-eval y))]
|
||||
[,otherwise (error "simple-eval: invalid expression" x)])))
|
||||
@end lisp
|
||||
|
||||
Using the catamorphism feature of @code{sxml-match}, a more concise version of
|
||||
@code{simple-eval} can be written. The pattern @code{,[x]} recursively invokes
|
||||
the pattern matcher on the value bound in this position.
|
||||
|
||||
@lisp
|
||||
(define simple-eval
|
||||
(lambda (x)
|
||||
(sxml-match x
|
||||
[,i (guard (integer? i)) i]
|
||||
[(plus ,[x] ,[y]) (+ x y)]
|
||||
[(times ,[x] ,[y]) (* x y)]
|
||||
[(minus ,[x] ,[y]) (- x y)]
|
||||
[(div ,[x] ,[y]) (/ x y)]
|
||||
[,otherwise (error "simple-eval: invalid expression" x)])))
|
||||
@end lisp
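
A usage sketch for the catamorphism version follows. The definition is
repeated so that the block is self-contained; the input expressions are made
up.

@lisp
(use-modules (sxml match))

(define simple-eval
  (lambda (x)
    (sxml-match x
      [,i (guard (integer? i)) i]
      [(plus ,[x] ,[y]) (+ x y)]
      [(times ,[x] ,[y]) (* x y)]
      [(minus ,[x] ,[y]) (- x y)]
      [(div ,[x] ,[y]) (/ x y)]
      [,otherwise (error "simple-eval: invalid expression" x)])))

(simple-eval '(plus 1 (times 2 3)))           ; => 7
(simple-eval '(minus (div 8 2) (plus 1 1)))   ; => 2
@end lisp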
|
||||
|
||||
@unnumberedsubsec Named-Catamorphisms
|
||||
|
||||
It is also possible to explicitly name the operator in the ``cata'' position.
|
||||
Where @code{,[id*]} recurs to the top of the current @code{sxml-match},
|
||||
@code{,[cata -> id*]} recurs to @code{cata}. @code{cata} must evaluate to a
|
||||
procedure which takes one argument, and returns as many values as there are
|
||||
identifiers following @code{->}.
|
||||
|
||||
Named catamorphism patterns allow processing to be split into multiple, mutually
|
||||
recursive procedures. This is illustrated in the example below: a
|
||||
transformation that formats a ``TV Guide'' into HTML.
|
||||
|
||||
@lisp
|
||||
(define (tv-guide->html g)
|
||||
(define (cast-list cl)
|
||||
(sxml-match cl
|
||||
[(CastList (CastMember (Character (Name ,ch)) (Actor (Name ,a))) ...)
|
||||
`(div (ul (li ,ch ": " ,a) ...))]))
|
||||
(define (prog p)
|
||||
(sxml-match p
|
||||
[(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
||||
(Description ,desc ...))
|
||||
`(div (p ,start-time
|
||||
(br) ,series-title
|
||||
(br) ,desc ...))]
|
||||
[(Program (Start ,start-time) (Duration ,dur) (Series ,series-title)
|
||||
(Description ,desc ...)
|
||||
,[cast-list -> cl])
|
||||
`(div (p ,start-time
|
||||
(br) ,series-title
|
||||
(br) ,desc ...)
|
||||
,cl)]))
|
||||
(sxml-match g
|
||||
    [(TVGuide (@@ (start ,start-date)
|
||||
(end ,end-date))
|
||||
(Channel (Name ,nm) ,[prog -> p] ...) ...)
|
||||
`(html (head (title "TV Guide"))
|
||||
(body (h1 "TV Guide")
|
||||
(div (h2 ,nm) ,p ...) ...))]))
|
||||
@end lisp
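
A much smaller sketch of the named form may help isolate the mechanism. The
procedures @code{double} and @code{name-info} are invented for this
illustration; the second returns two values, one for each identifier that
follows @code{->}.

@lisp
(use-modules (sxml match))

;; Hypothetical one-argument procedures used in the cata position.
(define (double n) (* 2 n))
(define (name-info s)
  (values (string-upcase s) (string-length s)))

(sxml-match '(pair 1 2)
  [(pair ,[double -> a] ,[double -> b]) (list a b)])
;; => (2 4)

(sxml-match '(name "guile")
  [(name ,[name-info -> up len]) (list up len)])
;; => ("GUILE" 5)
@end lisp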
|
||||
|
||||
@unnumberedsubsec @code{sxml-match-let} and @code{sxml-match-let*}
|
||||
|
||||
@deffn {Scheme Syntax} sxml-match-let ((pat expr) ...) expression0 expression ...
|
||||
@deffnx {Scheme Syntax} sxml-match-let* ((pat expr) ...) expression0 expression ...
|
||||
These forms generalize the @code{let} and @code{let*} forms of Scheme to allow
|
||||
an XML pattern in the binding position, rather than a simple variable.
|
||||
@end deffn
|
||||
|
||||
For example, the expression below:
|
||||
|
||||
@lisp
|
||||
(sxml-match-let ([(a ,i ,j) '(a 1 2)])
|
||||
(+ i j))
|
||||
@end lisp
|
||||
|
||||
binds the variables @var{i} and @var{j} to @code{1} and @code{2} in the XML
|
||||
value given.
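
As with @code{let*}, the bindings of @code{sxml-match-let*} are established
sequentially, so a later expression may refer to variables bound by an earlier
pattern. A brief sketch with arbitrary element names:

@lisp
(use-modules (sxml match))

(sxml-match-let ([(a ,i ,j) '(a 1 2)])
  (+ i j))
;; => 3

;; The second binding expression uses i and j from the first pattern.
(sxml-match-let* ([(a ,i ,j) '(a 1 2)]
                  [(b ,k) `(b ,(+ i j))])
  (* k 10))
;; => 30
@end lisp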
|
||||
|
||||
@c Local Variables:
|
||||
@c coding: utf-8
|
||||
@c End:
|