mirror of
https://git.savannah.gnu.org/git/guile.git
synced 2025-06-12 06:41:13 +02:00
rewrite web.texi intro
* doc/ref/web.texi (Web): Rewrite the intro.
(Types and the Web): New subsection, a mini-rant.
parent
8a41c56af1
commit
d75a81b128
1 changed files with 139 additions and 17 deletions
156
doc/ref/web.texi
|
@ -9,28 +9,31 @@
|
||||||
@cindex WWW
|
@cindex WWW
|
||||||
@cindex HTTP
|
@cindex HTTP
|
||||||
|
|
||||||
When Guile started back in the mid-nineties, the GNU system was still
|
It has always been possible to connect computers together and share
|
||||||
focused on producing a good POSIX implementation. This is why Guile's
|
information between them, but the rise of the World-Wide Web over the
|
||||||
POSIX support is good, and has been so for a while.
|
last couple of decades has made it much easier to do so. The result is
|
||||||
|
a richly connected network of computation, in which Guile forms a part.
|
||||||
|
|
||||||
But times change, and in a way these days the web is the new POSIX: a
|
By ``the web'', we mean the HTTP protocol@footnote{Yes, the P is for
|
||||||
standard and a motley set of implementations on which much computing is
|
protocol, but this phrase appears repeatedly in RFC 2616.} as handled by
|
||||||
done. So today's Guile also supports the web at the programming
|
servers, clients, proxies, caches, and the various kinds of messages and
|
||||||
language level, by defining common data types and operations for the
|
message components that can be sent and received by that protocol,
|
||||||
technologies underpinning the web: URIs, HTTP, and XML.
|
notably HTML.
|
||||||
|
|
||||||
It is particularly important to define native web data types. Though
|
On one level, the web is text in motion: the protocols themselves are
|
||||||
the web is text in motion, programming the web in text is like
|
textual (though the payload may be binary), and it's possible to create
|
||||||
programming with @code{goto}: muddy, and error-prone. Most current
|
a socket and speak text to the web. But such an approach is obviously
|
||||||
security problems on the web are due to treating the web as text instead
|
primitive. This section details the higher-level data types and
|
||||||
of as instances of the proper data types.
|
operations provided by Guile: URIs, HTTP request and response records,
|
||||||
|
and a conventional web server implementation.
|
||||||
|
|
||||||
In addition, common web data types help programmers to share code.
|
The material in this section is arranged in ascending order, in which
|
||||||
|
later concepts build on previous ones. If you prefer to start with the
|
||||||
Well. That's all very nice and opinionated and such, but how do I use
|
highest-level perspective, @pxref{Web Examples}, and work your way
|
||||||
the thing? Read on!
|
back.
|
||||||
|
|
||||||
@menu
|
@menu
|
||||||
|
* Types and the Web:: Types prevent bugs and security problems.
|
||||||
* URIs:: Universal Resource Identifiers.
|
* URIs:: Universal Resource Identifiers.
|
||||||
* HTTP:: The Hyper-Text Transfer Protocol.
|
* HTTP:: The Hyper-Text Transfer Protocol.
|
||||||
* HTTP Headers:: How Guile represents specific header values.
|
* HTTP Headers:: How Guile represents specific header values.
|
||||||
|
@ -40,6 +43,125 @@ the thing? Read on!
|
||||||
* Web Examples:: How to use this thing.
|
* Web Examples:: How to use this thing.
|
||||||
@end menu
|
@end menu
|
||||||
|
|
||||||
|
@node Types and the Web
|
||||||
|
@subsection Types and the Web
|
||||||
|
|
||||||
|
It is a truth universally acknowledged, that a program with good use of
|
||||||
|
data types, will be free from many common bugs. Unfortunately, the
|
||||||
|
common practice in web programming seems to ignore this maxim. This
|
||||||
|
subsection makes the case for expressive data types in web programming.
|
||||||
|
|
||||||
|
By ``expressive data types'', we mean that the data types @emph{say}
|
||||||
|
something about how a program solves a problem. For example, if we
|
||||||
|
choose to represent dates using SRFI 19 date records (@pxref{SRFI-19}),
|
||||||
|
this indicates that there is a part of the program that will always have
|
||||||
|
valid dates. Error handling for a number of basic cases, like invalid
|
||||||
|
dates, occurs on the boundary in which we produce a SRFI 19 date record
|
||||||
|
from other types, like strings.
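
The boundary described above can be sketched in a few lines. This is only an illustration, assuming the `(srfi srfi-19)` module; `parse-iso-date` is a hypothetical helper name, not part of Guile.

```scheme
(use-modules (srfi srfi-19))

;; Boundary: the only place that handles raw strings.
;; `parse-iso-date' is a hypothetical helper, not a Guile procedure.
(define (parse-iso-date str)
  (string->date str "~Y-~m-~d"))

;; Past the boundary, code works with date records, not strings.
(define d (parse-iso-date "2011-01-08"))

(date-year d)   ; => 2011
(date-month d)  ; => 1
(date-day d)    ; => 8
```

Input that does not match the template is expected to be rejected inside `parse-iso-date`, so callers further in only ever see date records.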
|
||||||
|
|
||||||
|
With regard to the web, data types help in the two broad phases of
|
||||||
|
HTTP messages: parsing and generation.
|
||||||
|
|
||||||
|
Consider a server, which has to parse a request, and produce a response.
|
||||||
|
Guile will parse the request into an HTTP request object
|
||||||
|
(@pxref{Requests}), with each header parsed into an appropriate Scheme
|
||||||
|
data type. This transition from an incoming stream of characters to
|
||||||
|
typed data is a state change in a program---the strings might parse, or
|
||||||
|
they might not, and something has to happen if they do not. (Guile
|
||||||
|
throws an error in this case.) But after you have the parsed request,
|
||||||
|
``client'' code (code built on top of the Guile web framework) will not
|
||||||
|
have to check for syntactic validity. The types already make this
|
||||||
|
information manifest.
|
||||||
|
|
||||||
|
This state change on the parsing boundary makes programs more robust,
|
||||||
|
as they themselves are freed from the need to do a number of common
|
||||||
|
error checks, and they can use normal Scheme procedures to handle a
|
||||||
|
request instead of ad-hoc string parsers.
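
As a rough sketch of that parsing boundary, the following reads a request from a string port. It assumes the `(web request)` and `(web uri)` modules as found in current Guile releases; the request text itself is made up for illustration.

```scheme
(use-modules (web request) (web uri))

;; A made-up request, for illustration only.
(define raw
  (string-append "GET /hello?name=world HTTP/1.1\r\n"
                 "Host: example.com\r\n"
                 "Content-Length: 0\r\n"
                 "\r\n"))

;; Parsing happens once, here; malformed input signals an error.
(define req (call-with-input-string raw read-request))

;; After the boundary, the pieces are ordinary Scheme values.
(request-method req)            ; => GET, a symbol
(uri-path (request-uri req))    ; => "/hello"
(uri-query (request-uri req))   ; => "name=world"
(request-content-length req)    ; => 0, an integer rather than "0"
```

Code that receives `req` can dispatch on the method symbol or compare the content length numerically without re-parsing any text.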
|
||||||
|
|
||||||
|
The need for types on the response generation side (in a server) is more
|
||||||
|
subtle, though not less important. Consider the example of a POST
|
||||||
|
handler, which prints out the text that a user submits from a form.
|
||||||
|
Such a handler might include a procedure like this:
|
||||||
|
|
||||||
|
@example
|
||||||
|
;; First, a helper procedure
|
||||||
|
(define (para . contents)
|
||||||
|
(string-append "<p>" (string-concatenate contents) "</p>"))
|
||||||
|
|
||||||
|
;; Now the meat of our simple web application
|
||||||
|
(define (you-said text)
|
||||||
|
(para "You said: " text))
|
||||||
|
|
||||||
|
(display (you-said "Hi!"))
|
||||||
|
@print{} <p>You said: Hi!</p>
|
||||||
|
@end example
|
||||||
|
|
||||||
|
This is a perfectly valid implementation, provided that the incoming
|
||||||
|
text does not contain the special HTML characters @samp{<}, @samp{>}, or
|
||||||
|
@samp{&}. But this provision of a restricted character set is not
|
||||||
|
reflected anywhere in the program itself: we must @emph{assume} that the
|
||||||
|
programmer understands this, and performs the check elsewhere.
|
||||||
|
|
||||||
|
Unfortunately, the short history of the practice of programming does not
|
||||||
|
bear out this assumption. A @dfn{cross-site scripting} (@acronym{XSS})
|
||||||
|
vulnerability is just such a common error in which unfiltered user input
|
||||||
|
is allowed into the output. A user could submit a crafted comment to
|
||||||
|
your web site which results in visitors running malicious JavaScript,
|
||||||
|
within the security context of your domain:
|
||||||
|
|
||||||
|
@example
|
||||||
|
(display (you-said "<script src=\"http://bad.com/nasty.js\" />"))
|
||||||
|
@print{} <p>You said: <script src="http://bad.com/nasty.js" /></p>
|
||||||
|
@end example
|
||||||
|
|
||||||
|
The fundamental problem here is that both user data and the program
|
||||||
|
template are represented using strings. This identity means that types
|
||||||
|
can't help the programmer to make a distinction between these two, so
|
||||||
|
they get confused.
|
||||||
|
|
||||||
|
There are a number of possible solutions, but perhaps the best is to
|
||||||
|
treat HTML not as strings, but as native s-expressions: as SXML. The
|
||||||
|
basic idea is that HTML is either text, represented by a string, or an
|
||||||
|
element, represented as a tagged list. So @samp{foo} becomes
|
||||||
|
@samp{"foo"}, and @samp{<b>foo</b>} becomes @samp{(b "foo")}.
|
||||||
|
Attributes, if present, go in a tagged list headed by @samp{@@}, like
|
||||||
|
@samp{(img (@@ (src "http://example.com/foo.png")))}. @xref{sxml
|
||||||
|
simple}, for more information.
|
||||||
|
|
||||||
|
The good thing about SXML is that HTML elements cannot be confused with
|
||||||
|
text. Let's make a new definition of @code{para}:
|
||||||
|
|
||||||
|
@example
|
||||||
|
(define (para . contents)
|
||||||
|
`(p ,@@contents))
|
||||||
|
|
||||||
|
(use-modules (sxml simple))
|
||||||
|
(sxml->xml (you-said "Hi!"))
|
||||||
|
@print{} <p>You said: Hi!</p>
|
||||||
|
|
||||||
|
(sxml->xml (you-said "<i>Rats, foiled again!</i>"))
|
||||||
|
@print{} <p>You said: &lt;i&gt;Rats, foiled again!&lt;/i&gt;</p>
|
||||||
|
@end example
|
||||||
|
|
||||||
|
So we see in the second example that HTML elements cannot be unwittingly
|
||||||
|
introduced into the output. However it is perfectly acceptable to pass
|
||||||
|
SXML to @code{you-said}; in fact, that is the big advantage of SXML over
|
||||||
|
everything-as-a-string.
|
||||||
|
|
||||||
|
@example
|
||||||
|
(sxml->xml (you-said (you-said "<Hi!>")))
|
||||||
|
@print{} <p>You said: <p>You said: &lt;Hi!&gt;</p></p>
|
||||||
|
@end example
|
||||||
|
|
||||||
|
The SXML types allow procedures to @emph{compose}. The types make
|
||||||
|
manifest which parts are HTML elements, and which are text. So you
|
||||||
|
needn't worry about escaping user input; the type transition back to a
|
||||||
|
string handles that for you. This class of @acronym{XSS} vulnerability becomes a thing
|
||||||
|
of the past.
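
To see the escaping at the type transition, one can capture the serialized output as a string and compare the two kinds of input. This is a minimal sketch assuming `(sxml simple)`; `render` is a hypothetical helper introduced here, while `para` and `you-said` repeat the definitions from the text.

```scheme
(use-modules (sxml simple))

(define (para . contents)
  `(p ,@contents))

(define (you-said text)
  (para "You said: " text))

;; Hypothetical helper: serialize an SXML tree to a string.
(define (render sxml)
  (call-with-output-string
    (lambda (port) (sxml->xml sxml port))))

;; A string is text, so its markup characters are escaped.
(render (you-said "<b>hi</b>"))
;; => "<p>You said: &lt;b&gt;hi&lt;/b&gt;</p>"

;; A tagged list is an element, so it is emitted as markup.
(render (you-said '(b "hi")))
;; => "<p>You said: <b>hi</b></p>"
```

The only difference between the two calls is the type of the argument, which is what lets the serializer decide what to escape.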
|
||||||
|
|
||||||
|
Well. That's all very nice and opinionated and such, but how do I use
|
||||||
|
the thing? Read on!
|
||||||
|
|
||||||
@node URIs
|
@node URIs
|
||||||
@subsection Universal Resource Identifiers
|
@subsection Universal Resource Identifiers
|
||||||
|
|
||||||
|
|