mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 03:40:34 +02:00
add #:doctype-handler to xml->sxml
* module/sxml/simple.scm (read-internal-doctype-as-string): New helper. (xml->sxml): Add #:doctype-handler argument. * doc/ref/sxml.texi (Reading and Writing XML): Document #:doctype-handler. Fix some other examples, and fix the default value of #:declare-namespaces?. * test-suite/tests/sxml.simple.test: Add all tests from the manual here.
This commit is contained in:
parent
1260fd0b2c
commit
e10c250928
3 changed files with 170 additions and 23 deletions
|
@ -57,7 +57,8 @@ to text.
|
||||||
|
|
||||||
@deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
|
@deffn {Scheme Procedure} xml->sxml [string-or-port] [#:namespaces='()] @
|
||||||
[#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
|
[#:declare-namespaces?=#t] [#:trim-whitespace?=#f] @
|
||||||
[#:entities='()] [#:default-entity-handler=#f]
|
[#:entities='()] [#:default-entity-handler=#f] @
|
||||||
|
[#:doctype-handler=#f]
|
||||||
Use SSAX to parse an XML document into SXML. Takes one optional
|
Use SSAX to parse an XML document into SXML. Takes one optional
|
||||||
argument, @var{string-or-port}, which defaults to the current input
|
argument, @var{string-or-port}, which defaults to the current input
|
||||||
port. Returns the resulting SXML document. If @var{string-or-port} is
|
port. Returns the resulting SXML document. If @var{string-or-port} is
|
||||||
|
@ -99,18 +100,19 @@ for certain namespaces with the @code{#:namespaces} keyword argument to
|
||||||
@result{} (*TOP* (foo (ns2:baz)))
|
@result{} (*TOP* (foo (ns2:baz)))
|
||||||
@end example
|
@end example
|
||||||
|
|
||||||
Passing a true @code{#:declare-namespaces?} argument will cause the
|
By default, namespaces passed to @code{xml->sxml} are treated as if they
|
||||||
user-given @code{#:namespaces} to be treated as if they were declared on
|
were declared on the root element. Passing a false
|
||||||
the root element.
|
@code{#:declare-namespaces?} argument will disable this behavior,
|
||||||
|
requiring in-document declarations of namespaces before use..
|
||||||
|
|
||||||
@example
|
@example
|
||||||
(xml->sxml "<foo><ns2:baz/></foo>"
|
(xml->sxml "<foo><ns2:baz/></foo>"
|
||||||
#:namespaces '((ns2 . "http://example.org/ns2")))
|
#:namespaces '((ns2 . "http://example.org/ns2")))
|
||||||
@result{} error: undeclared namespace: `bar'
|
@result{} (*TOP* (foo (ns2:baz)))
|
||||||
(xml->sxml "<foo><ns2:baz/></foo>"
|
(xml->sxml "<foo><ns2:baz/></foo>"
|
||||||
#:namespaces '((ns2 . "http://example.org/ns2"))
|
#:namespaces '((ns2 . "http://example.org/ns2"))
|
||||||
#:declare-namespaces? #t)
|
#:declare-namespaces? #f)
|
||||||
@result{} (*TOP* (foo (ns2:baz)))
|
@result{} error: undeclared namespace: `bar'
|
||||||
@end example
|
@end example
|
||||||
|
|
||||||
By default, all whitespace in XML is significant. Passing the
|
By default, all whitespace in XML is significant. Passing the
|
||||||
|
@ -120,10 +122,10 @@ whitespace in front, behind and between elements, treating it as
|
||||||
|
|
||||||
@example
|
@example
|
||||||
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
|
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
|
||||||
@result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")
|
@result{} (*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n"))
|
||||||
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
|
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
|
||||||
#:trim-whitespace? #t)
|
#:trim-whitespace? #t)
|
||||||
@result{} (*TOP* (foo (bar " Alfie the parrot! "))
|
@result{} (*TOP* (foo (bar " Alfie the parrot! ")))
|
||||||
@end example
|
@end example
|
||||||
|
|
||||||
Parsed entities may be declared with the @code{#:entities} keyword
|
Parsed entities may be declared with the @code{#:entities} keyword
|
||||||
|
@ -159,6 +161,38 @@ numeric character entities.
|
||||||
@result{} (*TOP* (foo "\xa0 foo"))
|
@result{} (*TOP* (foo "\xa0 foo"))
|
||||||
@end example
|
@end example
|
||||||
|
|
||||||
|
By default, @code{xml->sxml} skips over the @code{<!DOCTYPE>}
|
||||||
|
declaration, if any. This behavior can be overridden with the
|
||||||
|
@code{#:doctype-handler} argument, which should be a procedure of three
|
||||||
|
arguments: the @dfn{docname} (a symbol), @dfn{systemid} (a string), and
|
||||||
|
the internal doctype subset (as a string or @code{#f} if not present).
|
||||||
|
|
||||||
|
The handler should return keyword arguments as multiple values, as if it
|
||||||
|
were calling its continuation with keyword arguments. The continuation
|
||||||
|
accepts the @code{#:entities} and @code{#:namespaces} keyword arguments,
|
||||||
|
in the same format that @code{xml->sxml} itself takes. These entities
|
||||||
|
and namespaces will be prepended to those given to the @code{xml->sxml}
|
||||||
|
invocation.
|
||||||
|
|
||||||
|
@example
|
||||||
|
(define (handle-foo docname systemid internal-subset)
|
||||||
|
(case docname
|
||||||
|
((foo)
|
||||||
|
(values #:entities '((greets . "<i>Hello, world!</i>"))))
|
||||||
|
(else
|
||||||
|
(values))))
|
||||||
|
|
||||||
|
(xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
|
||||||
|
#:doctype-handler handle-foo)
|
||||||
|
@result{} (*TOP* (p (i "Hello, world!")))
|
||||||
|
@end example
|
||||||
|
|
||||||
|
If the document has no doctype declaration, the @var{doctype-handler} is
|
||||||
|
invoked with @code{#f} for the three arguments.
|
||||||
|
|
||||||
|
In the future, the continuation may accept other keyword arguments, for
|
||||||
|
example to validate the parsed SXML against the doctype.
|
||||||
|
|
||||||
@deffn {Scheme Procedure} sxml->xml tree [port]
|
@deffn {Scheme Procedure} sxml->xml tree [port]
|
||||||
Serialize the SXML tree @var{tree} as XML. The output will be written to
|
Serialize the SXML tree @var{tree} as XML. The output will be written to
|
||||||
the current output port, unless the optional argument @var{port} is
|
the current output port, unless the optional argument @var{port} is
|
||||||
|
|
|
@ -26,6 +26,7 @@
|
||||||
;;; Code:
|
;;; Code:
|
||||||
|
|
||||||
(define-module (sxml simple)
|
(define-module (sxml simple)
|
||||||
|
#:use-module (sxml ssax input-parse)
|
||||||
#:use-module (sxml ssax)
|
#:use-module (sxml ssax)
|
||||||
#:use-module (sxml transform)
|
#:use-module (sxml transform)
|
||||||
#:use-module (ice-9 match)
|
#:use-module (ice-9 match)
|
||||||
|
@ -35,10 +36,6 @@
|
||||||
;; Helpers from upstream/SSAX.scm.
|
;; Helpers from upstream/SSAX.scm.
|
||||||
;;
|
;;
|
||||||
|
|
||||||
(define (ssax:warn port msg . args)
|
|
||||||
(format (current-ssax-error-port)
|
|
||||||
";;; SSAX warning: ~a ~a\n" msg args))
|
|
||||||
|
|
||||||
; ssax:reverse-collect-str LIST-OF-FRAGS -> LIST-OF-FRAGS
|
; ssax:reverse-collect-str LIST-OF-FRAGS -> LIST-OF-FRAGS
|
||||||
; given the list of fragments (some of which are text strings)
|
; given the list of fragments (some of which are text strings)
|
||||||
; reverse the list and concatenate adjacent text strings.
|
; reverse the list and concatenate adjacent text strings.
|
||||||
|
@ -65,6 +62,17 @@
|
||||||
(cons (string-concatenate/shared strs) result)))
|
(cons (string-concatenate/shared strs) result)))
|
||||||
'())))))))
|
'())))))))
|
||||||
|
|
||||||
|
(define (read-internal-doctype-as-string port)
|
||||||
|
(string-concatenate/shared
|
||||||
|
(let loop ()
|
||||||
|
(let ((fragment
|
||||||
|
(next-token '() '(#\]) "reading internal DOCTYPE" port)))
|
||||||
|
(if (eqv? #\> (peek-next-char port))
|
||||||
|
(begin
|
||||||
|
(read-char port)
|
||||||
|
(cons fragment '()))
|
||||||
|
(cons* fragment "]" (loop)))))))
|
||||||
|
|
||||||
;; Ideas for the future for this interface:
|
;; Ideas for the future for this interface:
|
||||||
;;
|
;;
|
||||||
;; * Allow doctypes to provide parsed entities
|
;; * Allow doctypes to provide parsed entities
|
||||||
|
@ -81,7 +89,8 @@
|
||||||
(declare-namespaces? #t)
|
(declare-namespaces? #t)
|
||||||
(trim-whitespace? #f)
|
(trim-whitespace? #f)
|
||||||
(entities '())
|
(entities '())
|
||||||
(default-entity-handler #f))
|
(default-entity-handler #f)
|
||||||
|
(doctype-handler #f))
|
||||||
"Use SSAX to parse an XML document into SXML. Takes one optional
|
"Use SSAX to parse an XML document into SXML. Takes one optional
|
||||||
argument, @var{string-or-port}, which defaults to the current input
|
argument, @var{string-or-port}, which defaults to the current input
|
||||||
port."
|
port."
|
||||||
|
@ -96,7 +105,7 @@ port."
|
||||||
;; NAMESPACES: list of (DOC-PREFIX . (USER-PREFIX . URI)).
|
;; NAMESPACES: list of (DOC-PREFIX . (USER-PREFIX . URI)).
|
||||||
;; A DOC-PREFIX of #f indicates that it comes from the user.
|
;; A DOC-PREFIX of #f indicates that it comes from the user.
|
||||||
;; Otherwise, prefixes are symbols.
|
;; Otherwise, prefixes are symbols.
|
||||||
(define (user-namespaces)
|
(define (munge-namespaces namespaces)
|
||||||
(map (lambda (el)
|
(map (lambda (el)
|
||||||
(match el
|
(match el
|
||||||
((prefix . uri-string)
|
((prefix . uri-string)
|
||||||
|
@ -105,6 +114,9 @@ port."
|
||||||
(ssax:uri-string->symbol uri-string)))))
|
(ssax:uri-string->symbol uri-string)))))
|
||||||
namespaces))
|
namespaces))
|
||||||
|
|
||||||
|
(define (user-namespaces)
|
||||||
|
(munge-namespaces namespaces))
|
||||||
|
|
||||||
(define (user-entities)
|
(define (user-entities)
|
||||||
(if (and default-entity-handler
|
(if (and default-entity-handler
|
||||||
(not (assq '*DEFAULT* entities)))
|
(not (assq '*DEFAULT* entities)))
|
||||||
|
@ -117,6 +129,13 @@ port."
|
||||||
(symbol-append prefix (string->symbol ":") local-part))
|
(symbol-append prefix (string->symbol ":") local-part))
|
||||||
(_ name)))
|
(_ name)))
|
||||||
|
|
||||||
|
(define (doctype-continuation seed)
|
||||||
|
(lambda* (#:key (entities '()) (namespaces '()))
|
||||||
|
(values #f
|
||||||
|
(append entities (user-entities))
|
||||||
|
(append (munge-namespaces namespaces) (user-namespaces))
|
||||||
|
seed)))
|
||||||
|
|
||||||
;; The SEED in this parser is the SXML: initialized to '() at each new
|
;; The SEED in this parser is the SXML: initialized to '() at each new
|
||||||
;; level by the fdown handlers; built in reverse by the fhere parsers;
|
;; level by the fdown handlers; built in reverse by the fhere parsers;
|
||||||
;; and reverse-collected by the fup handlers.
|
;; and reverse-collected by the fup handlers.
|
||||||
|
@ -159,18 +178,29 @@ port."
|
||||||
;;
|
;;
|
||||||
;; SEED builds up the content.
|
;; SEED builds up the content.
|
||||||
(lambda (port docname systemid internal-subset? seed)
|
(lambda (port docname systemid internal-subset? seed)
|
||||||
(when internal-subset?
|
(call-with-values
|
||||||
(ssax:warn port "Internal DTD subset is not currently handled ")
|
(lambda ()
|
||||||
(ssax:skip-internal-dtd port))
|
(cond
|
||||||
(ssax:warn port "DOCTYPE DECL " docname " "
|
(doctype-handler
|
||||||
systemid " found and skipped")
|
(doctype-handler docname systemid
|
||||||
(values #f (user-entities) (user-namespaces) seed))
|
(and internal-subset?
|
||||||
|
(read-internal-doctype-as-string port))))
|
||||||
|
(else
|
||||||
|
(when internal-subset?
|
||||||
|
(ssax:skip-internal-dtd port))
|
||||||
|
(values))))
|
||||||
|
(doctype-continuation seed)))
|
||||||
|
|
||||||
UNDECL-ROOT
|
UNDECL-ROOT
|
||||||
;; This is like the DOCTYPE handler, but for documents that do not
|
;; This is like the DOCTYPE handler, but for documents that do not
|
||||||
;; have a <!DOCTYPE!> entry.
|
;; have a <!DOCTYPE!> entry.
|
||||||
(lambda (elem-gi seed)
|
(lambda (elem-gi seed)
|
||||||
(values #f (user-entities) (user-namespaces) seed))
|
(call-with-values
|
||||||
|
(lambda ()
|
||||||
|
(if doctype-handler
|
||||||
|
(doctype-handler #f #f #f)
|
||||||
|
(values)))
|
||||||
|
(doctype-continuation seed)))
|
||||||
|
|
||||||
PI
|
PI
|
||||||
((*DEFAULT*
|
((*DEFAULT*
|
||||||
|
|
|
@ -1,6 +1,6 @@
|
||||||
;;;; sxml.simple.test --- (sxml simple) -*- mode: scheme; coding: utf-8; -*-
|
;;;; sxml.simple.test --- (sxml simple) -*- mode: scheme; coding: utf-8; -*-
|
||||||
;;;;
|
;;;;
|
||||||
;;;; Copyright (C) 2010 Free Software Foundation, Inc.
|
;;;; Copyright (C) 2010, 2013 Free Software Foundation, Inc.
|
||||||
;;;;
|
;;;;
|
||||||
;;;; This library is free software; you can redistribute it and/or
|
;;;; This library is free software; you can redistribute it and/or
|
||||||
;;;; modify it under the terms of the GNU Lesser General Public
|
;;;; modify it under the terms of the GNU Lesser General Public
|
||||||
|
@ -20,6 +20,8 @@
|
||||||
#:use-module (test-suite lib)
|
#:use-module (test-suite lib)
|
||||||
#:use-module (sxml simple))
|
#:use-module (sxml simple))
|
||||||
|
|
||||||
|
(define parser-error '(parser-error . ""))
|
||||||
|
|
||||||
(define %xml-sample
|
(define %xml-sample
|
||||||
;; An XML sample without any space in between tags, to make it easier.
|
;; An XML sample without any space in between tags, to make it easier.
|
||||||
(string-append "<?xml version='1.0' encoding='utf-8'?>"
|
(string-append "<?xml version='1.0' encoding='utf-8'?>"
|
||||||
|
@ -50,3 +52,84 @@
|
||||||
(lambda ()
|
(lambda ()
|
||||||
(sxml->xml
|
(sxml->xml
|
||||||
(xml->sxml (open-input-string %xml-sample))))))))))
|
(xml->sxml (open-input-string %xml-sample))))))))))
|
||||||
|
|
||||||
|
(with-test-prefix "namespaces"
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>")
|
||||||
|
'(*TOP* (http://example.org/ns1:foo "text")))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo xmlns=\"http://example.org/ns1\">text</foo>"
|
||||||
|
#:namespaces '((ns1 . "http://example.org/ns1")))
|
||||||
|
'(*TOP* (ns1:foo "text")))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo xmlns:bar=\"http://example.org/ns2\"><bar:baz/></foo>"
|
||||||
|
#:namespaces '((ns2 . "http://example.org/ns2")))
|
||||||
|
'(*TOP* (foo (ns2:baz))))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo><ns2:baz/></foo>"
|
||||||
|
#:namespaces '((ns2 . "http://example.org/ns2")))
|
||||||
|
'(*TOP* (foo (ns2:baz))))
|
||||||
|
|
||||||
|
(pass-if-exception "namespace undeclared" parser-error
|
||||||
|
(xml->sxml "<foo><ns2:baz/></foo>"
|
||||||
|
#:namespaces '((ns2 . "http://example.org/ns2"))
|
||||||
|
#:declare-namespaces? #f)))
|
||||||
|
|
||||||
|
(with-test-prefix "whitespace"
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>")
|
||||||
|
'(*TOP* (foo "\n" (bar " Alfie the parrot! ") "\n")))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo>\n<bar> Alfie the parrot! </bar>\n</foo>"
|
||||||
|
#:trim-whitespace? #t)
|
||||||
|
'(*TOP* (foo (bar " Alfie the parrot! ")))))
|
||||||
|
|
||||||
|
(with-test-prefix "parsed entities"
|
||||||
|
(pass-if-equal
|
||||||
|
'(*TOP* (foo "&"))
|
||||||
|
(xml->sxml "<foo>&</foo>"))
|
||||||
|
|
||||||
|
(pass-if-exception "nbsp undefined" parser-error
|
||||||
|
(xml->sxml "<foo> </foo>"))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
'(*TOP* (foo "\xA0"))
|
||||||
|
(xml->sxml "<foo> </foo>"
|
||||||
|
#:entities '((nbsp . "\xA0"))))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
'(*TOP* (foo "\xA0"))
|
||||||
|
(xml->sxml "<foo> </foo>"))
|
||||||
|
|
||||||
|
(let ((ents '()))
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<foo> &foo;</foo>"
|
||||||
|
#:default-entity-handler
|
||||||
|
(lambda (port name)
|
||||||
|
(case name
|
||||||
|
((nbsp) "\xa0")
|
||||||
|
(else
|
||||||
|
(set! ents (cons name ents))
|
||||||
|
"qux"))))
|
||||||
|
'(*TOP* (foo "\xa0 qux")))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
ents
|
||||||
|
'(foo))))
|
||||||
|
|
||||||
|
(with-test-prefix "doctype handlers"
|
||||||
|
(define (handle-foo docname systemid internal-subset)
|
||||||
|
(case docname
|
||||||
|
((foo)
|
||||||
|
(values #:entities '((greets . "<i>Hello, world!</i>"))))
|
||||||
|
(else
|
||||||
|
(values))))
|
||||||
|
|
||||||
|
(pass-if-equal
|
||||||
|
(xml->sxml "<!DOCTYPE foo><p>&greets;</p>"
|
||||||
|
#:doctype-handler handle-foo)
|
||||||
|
'(*TOP* (p (i "Hello, world!")))))
|
||||||
|
|
Loading…
Add table
Add a link
Reference in a new issue