mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-05-31 09:20:23 +02:00
guile/doc/ref/web.texi
Andy Wingo b3f9444892 clarify uri fragment discussion
* doc/ref/web.texi (URIs): Clarify the discussion of URI fragments.
2011-01-07 09:18:36 -08:00

@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C) 2010 Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.
@node Web
@section @acronym{HTTP}, the Web, and All That
@cindex Web
@cindex WWW
@cindex HTTP
It has always been possible to connect computers together and share
information between them, but the rise of the World-Wide Web over the
last couple of decades has made it much easier to do so. The result is
a richly connected network of computation, in which Guile forms a part.

By ``the web'', we mean the HTTP protocol@footnote{Yes, the P is for
protocol, but this phrase appears repeatedly in RFC 2616.} as handled by
servers, clients, proxies, caches, and the various kinds of messages and
message components that can be sent and received by that protocol,
notably HTML.

On one level, the web is text in motion: the protocols themselves are
textual (though the payload may be binary), and it's possible to create
a socket and speak text to the web. But such an approach is obviously
primitive. This section details the higher-level data types and
operations provided by Guile: URIs, HTTP request and response records,
and a conventional web server implementation.

The material in this section is arranged in ascending order, in which
later concepts build on previous ones. If you prefer to start with the
highest-level perspective, @pxref{Web Examples}, and work your way
back.
@menu
* Types and the Web:: Types prevent bugs and security problems.
* URIs:: Uniform Resource Identifiers.
* HTTP:: The Hyper-Text Transfer Protocol.
* HTTP Headers:: How Guile represents specific header values.
* Requests:: HTTP requests.
* Responses:: HTTP responses.
* Web Server:: Serving HTTP to the internet.
* Web Examples:: How to use this thing.
@end menu
@node Types and the Web
@subsection Types and the Web
It is a truth universally acknowledged, that a program with good use of
data types, will be free from many common bugs. Unfortunately, the
common practice in web programming seems to ignore this maxim. This
subsection makes the case for expressive data types in web programming.

By ``expressive data types'', we mean that the data types @emph{say}
something about how a program solves a problem. For example, if we
choose to represent dates using SRFI 19 date records (@pxref{SRFI-19}),
this indicates that there is a part of the program that will always have
valid dates. Error handling for a number of basic cases, like invalid
dates, occurs on the boundary in which we produce a SRFI 19 date record
from other types, like strings.

With regard to the web, data types are helpful in the two broad phases
of HTTP messages: parsing and generation.

Consider a server, which has to parse a request, and produce a response.
Guile will parse the request into an HTTP request object
(@pxref{Requests}), with each header parsed into an appropriate Scheme
data type. This transition from an incoming stream of characters to
typed data is a state change in a program: the strings might parse, or
they might not, and something has to happen if they do not. (Guile
throws an error in this case.) But after you have the parsed request,
``client'' code (code built on top of the Guile web framework) will not
have to check for syntactic validity. The types already make this
information manifest.

This state change on the parsing boundary makes programs more robust,
as they themselves are freed from the need to do a number of common
error checks, and they can use normal Scheme procedures to handle a
request instead of ad-hoc string parsers.

The need for types on the response generation side (in a server) is more
subtle, though not less important. Consider the example of a POST
handler, which prints out the text that a user submits from a form.
Such a handler might include a procedure like this:

@example
;; First, a helper procedure
(define (para . contents)
  (string-append "<p>" (string-concatenate contents) "</p>"))

;; Now the meat of our simple web application
(define (you-said text)
  (para "You said: " text))

(display (you-said "Hi!"))
@print{} <p>You said: Hi!</p>
@end example
This is a perfectly valid implementation, provided that the incoming
text does not contain the special HTML characters @samp{<}, @samp{>}, or
@samp{&}. But this provision of a restricted character set is not
reflected anywhere in the program itself: we must @emph{assume} that the
programmer understands this, and performs the check elsewhere.
Unfortunately, the short history of the practice of programming does not
bear out this assumption. A @dfn{cross-site scripting} (@acronym{XSS})
vulnerability is just such a common error in which unfiltered user input
is allowed into the output. A user could submit a crafted comment to
your web site which results in visitors running malicious JavaScript,
within the security context of your domain:
@example
(display (you-said "<script src=\"http://bad.com/nasty.js\" />"))
@print{} <p>You said: <script src="http://bad.com/nasty.js" /></p>
@end example
The fundamental problem here is that both user data and the program
template are represented using strings. This identity means that types
can't help the programmer to make a distinction between these two, so
they get confused.
There are a number of possible solutions, but perhaps the best is to
treat HTML not as strings, but as native s-expressions: as SXML. The
basic idea is that HTML is either text, represented by a string, or an
element, represented as a tagged list. So @samp{foo} becomes
@samp{"foo"}, and @samp{<b>foo</b>} becomes @samp{(b "foo")}.
Attributes, if present, go in a tagged list headed by @samp{@@}, like
@samp{(img (@@ (src "http://example.com/foo.png")))}. @xref{sxml
simple}, for more information.
The good thing about SXML is that HTML elements cannot be confused with
text. Let's make a new definition of @code{para}:
@example
(define (para . contents)
  `(p ,@@contents))

(use-modules (sxml simple))
(sxml->xml (you-said "Hi!"))
@print{} <p>You said: Hi!</p>

(sxml->xml (you-said "<i>Rats, foiled again!</i>"))
@print{} <p>You said: &lt;i&gt;Rats, foiled again!&lt;/i&gt;</p>
@end example
So we see in the second example that HTML elements cannot be unwittingly
introduced into the output. However it is perfectly acceptable to pass
SXML to @code{you-said}; in fact, that is the big advantage of SXML over
everything-as-a-string.
@example
(sxml->xml (you-said (you-said "<Hi!>")))
@print{} <p>You said: <p>You said: &lt;Hi!&gt;</p></p>
@end example
The SXML types allow procedures to @emph{compose}. The types make
manifest which parts are HTML elements, and which are text. So you
needn't worry about escaping user input; the type transition back to a
string handles that for you. @acronym{XSS} vulnerabilities are a thing
of the past.
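
Composition works the same way for procedures that build elements with
attributes. As a minimal sketch, assume a hypothetical helper named
@code{link-to} (it is not part of Guile) that wraps user-supplied text
in a link:

@example
(use-modules (sxml simple))

;; Hypothetical helper: the label is data, so it can never become markup.
(define (link-to url label)
  `(a (@@ (href ,url)) ,label))

(sxml->xml (link-to "http://example.com/" "<b>click</b>"))
@print{} <a href="http://example.com/">&lt;b&gt;click&lt;/b&gt;</a>
@end example
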
Well. That's all very nice and opinionated and such, but how do I use
the thing? Read on!
@node URIs
@subsection Uniform Resource Identifiers

Guile provides a standard data type for Uniform Resource Identifiers
(URIs), as defined in RFC 3986.
The generic URI syntax is as follows:
@example
URI := scheme ":" ["//" [userinfo "@@"] host [":" port]] path \
[ "?" query ] [ "#" fragment ]
@end example
For example, in the URI, @indicateurl{http://www.gnu.org/help/}, the
scheme is @code{http}, the host is @code{www.gnu.org}, the path is
@code{/help/}, and there is no userinfo, port, query, or fragment. All URIs
have a scheme and a path (though the path might be empty). Some URIs
have a host, and some of those have ports and userinfo. Any URI might
have a query part or a fragment.
Userinfo is something of an abstraction, as some legacy URI schemes
allowed userinfo of the form @code{@var{username}:@var{passwd}}. But
since passwords do not belong in URIs, the RFC does not want to condone
this practice, so it calls anything before the @code{@@} sign
@dfn{userinfo}.
Properly speaking, a fragment is not part of a URI. For example, when a
web browser follows a link to @indicateurl{http://example.com/#foo}, it
sends a request for @indicateurl{http://example.com/}, then looks in the
resulting page for the fragment identified by @code{foo}. A
fragment identifies a part of a resource, not the resource itself. But
it is useful to have a fragment field in the URI record itself, so we
hope you will forgive the inconsistency.
@example
(use-modules (web uri))
@end example
The following procedures can be found in the @code{(web uri)}
module. Load it into your Guile, using a form like the above, to have
access to them.
@defun build-uri scheme [#:userinfo] [#:host] [#:port] [#:path] [#:query] [#:fragment] [#:validate?]
Construct a URI object. If @var{validate?} is true, also run some
consistency checks to make sure that the constructed URI is valid.
@end defun
@defun uri? x
@defunx uri-scheme uri
@defunx uri-userinfo uri
@defunx uri-host uri
@defunx uri-port uri
@defunx uri-path uri
@defunx uri-query uri
@defunx uri-fragment uri
A predicate and field accessors for the URI record type.
@end defun
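
As an illustration, the example URI discussed earlier could be built and
taken apart roughly as follows. This is a sketch based on the
signatures documented here; it assumes that the scheme is given as a
symbol and that fields which are not supplied read back as @code{#f}.

@example
(use-modules (web uri))

;; Build http://www.gnu.org/help/ from its components.
(define u
  (build-uri 'http #:host "www.gnu.org" #:path "/help/"))

(uri? u)
@result{} #t
(uri-scheme u)
@result{} http
(uri-host u)
@result{} "www.gnu.org"
(uri-path u)
@result{} "/help/"
(uri-port u)
@result{} #f
@end example
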
@defun declare-default-port! scheme port
Declare a default port for the given URI scheme.
Default ports are for printing URI objects: a default port is not
printed.
@end defun
@defun parse-uri string
Parse @var{string} into a URI object. Returns @code{#f} if the string
could not be parsed.
@end defun
@defun unparse-uri uri
Serialize @var{uri} to a string.
@end defun
@defun uri-decode str [#:charset]
Percent-decode the given @var{str}, according to @var{charset}.
Note that this function should not generally be applied to a full URI
string. For paths, use @code{split-and-decode-uri-path} instead. For query
strings, split the query on @code{&} and @code{=} boundaries, and decode
the components separately.
Note that percent-encoded strings encode @emph{bytes}, not characters.
There is no guarantee that a given byte sequence is a valid string
encoding. Therefore this routine may signal an error if the decoded
bytes are not valid for the given encoding. Pass @code{#f} for
@var{charset} if you want decoded bytes as a bytevector directly.
@end defun
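
The advice about query strings can be sketched as follows. The helper
@code{decode-query} is hypothetical and not part of @code{(web uri)};
it splits first, then decodes each key and value on its own, so that an
encoded @code{&} or @code{=} inside a value is not mistaken for a
delimiter.

@example
(use-modules (web uri))

;; Hypothetical helper: "k1=v1&k2=v2" -> alist of decoded strings.
(define (decode-query query)
  (map (lambda (pair)
         (let ((idx (string-index pair #\=)))
           (if idx
               (cons (uri-decode (substring pair 0 idx))
                     (uri-decode (substring pair (+ idx 1))))
               ;; A key without a value, as in "?debug".
               (cons (uri-decode pair) ""))))
       (string-split query #\&)))

(decode-query "q=guile%20scheme&page=2")
@result{} (("q" . "guile scheme") ("page" . "2"))
@end example
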
@defun uri-encode str [#:charset] [#:unescaped-chars]
Percent-encode any character not in @var{unescaped-chars}.
Percent-encoding first writes out the given character to a bytevector
within the given @var{charset}, then encodes each byte as
@code{%@var{HH}}, where @var{HH} is the hexadecimal representation of
the byte.
@end defun
@defun split-and-decode-uri-path path
Split @var{path} into its components, and decode each component,
removing empty components.
For example, @code{"/foo/bar/"} decodes to the two-element list,
@code{("foo" "bar")}.
@end defun
@defun encode-and-join-uri-path parts
URI-encode each element of @var{parts}, which should be a list of
strings, and join the parts together with @code{/} as a delimiter.
@end defun
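
A short sketch of the two path procedures working together. Note that,
as described above, the joined result has no leading @code{/}; a caller
that needs an absolute path is assumed to add it.

@example
(use-modules (web uri))

(split-and-decode-uri-path "/docs/hello%20world/")
@result{} ("docs" "hello world")

(encode-and-join-uri-path '("docs" "hello world"))
@result{} "docs/hello%20world"

;; Rebuild an absolute path from the components.
(string-append "/" (encode-and-join-uri-path '("docs" "hello world")))
@result{} "/docs/hello%20world"
@end example
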
@node HTTP
@subsection The Hyper-Text Transfer Protocol
The initial motivation for including web functionality in Guile, rather
than rely on an external package, was to establish a standard base on
which people can share code. To that end, we continue the focus on data
types by providing a number of low-level parsers and unparsers for
elements of the HTTP protocol.
If you want to skip the low-level details for now and move on to web
pages, @pxref{Web Server}. Otherwise, load the HTTP module, and read
on.
@example
(use-modules (web http))
@end example
The focus of the @code{(web http)} module is to parse and unparse
standard HTTP headers, representing them to Guile as native data
structures. For example, a @code{Date:} header will be represented as a
SRFI-19 date record (@pxref{SRFI-19}), rather than as a string.
Guile tries to follow RFCs fairly strictly---the road to perdition being
paved with compatibility hacks---though some allowances are made for
not-too-divergent texts.
The first bit is to define a registry of parsers, validators, and
unparsers, keyed by header name. That is the function of the
@code{<header-decl>} object.
@defun make-header-decl sym name multiple? parser validator writer
@defunx header-decl? x
@defunx header-decl-sym decl
@defunx header-decl-name decl
@defunx header-decl-multiple? decl
@defunx header-decl-parser decl
@defunx header-decl-validator decl
@defunx header-decl-writer decl
A constructor, predicate, and field accessors for the
@code{<header-decl>} type. The fields are as follows:
@table @code
@item sym
The symbol name for this header field, always in lower-case. For
example, @code{"Content-Length"} has a symbolic name of
@code{content-length}.
@item name
The string name of the header, in its preferred capitalization.
@item multiple?
@code{#t} iff this header may appear multiple times in a message.
@item parser
A procedure which takes a string and returns a parsed value.
@item validator
A predicate, returning @code{#t} iff the value is valid for this header.
@item writer
A writer, which writes a value to the port given in the second argument.
@end table
@end defun
@defun declare-header! sym name [#:multiple?] [#:parser] [#:validator] [#:writer]
Make a header declaration, as above, and register it by symbol and by
name.
@end defun
@defun lookup-header-decl name
Return the @code{<header-decl>} object registered for the given @var{name}.
@var{name} may be a symbol or a string. Strings are mapped to headers in
a case-insensitive fashion.
@end defun
@defun valid-header? sym val
Returns a true value iff @var{val} is a valid Scheme value for the
header with name @var{sym}.
@end defun
Now that we have a generic interface for reading and writing headers, we
do just that.
@defun read-header port
Reads one HTTP header from @var{port}. Returns two values: the header
name and the parsed Scheme value. May raise an exception if the header
was known but the value was invalid.
Returns @code{#f} for both values if the end of the message body was
reached (i.e., a blank line).
@end defun
@defun parse-header name val
Parse @var{val}, a string, with the parser for the header named
@var{name}.
Returns two values, the header name and parsed value. If a parser was
found, the header name will be returned as a symbol. If a parser was not
found, both the header name and the value are returned as strings.
@end defun
@defun write-header name val port
Writes the given header name and value to @var{port}. If @var{name} is a
symbol, looks up a declared header and uses that writer. Otherwise the
value is written using @code{display}.
@end defun
@defun read-headers port
Read an HTTP message from @var{port}, returning the headers as an
ordered alist.
@end defun
@defun write-headers headers port
Write the given header alist to @var{port}. Doesn't write the final
\r\n, as the user might want to add another header.
@end defun
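
As a sketch of the reading side, the following feeds two header lines to
@code{read-headers} through a string port. The header text is made up
for illustration; the parsed values follow the formats described in
@ref{HTTP Headers}.

@example
(use-modules (web http))

(define headers
  (call-with-input-string
      "Content-Length: 42\r\nHost: example.com\r\n\r\n"
    read-headers))

;; Known headers are keyed by symbol and hold typed values.
(assq-ref headers 'content-length)
@result{} 42
(assq-ref headers 'host)
@result{} ("example.com" . #f)
@end example
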
The @code{(web http)} module also has some utility procedures to read
and write request and response lines.
@defun parse-http-method str [start] [end]
Parse an HTTP method from @var{str}. The result is an upper-case symbol,
like @code{GET}.
@end defun
@defun parse-http-version str [start] [end]
Parse an HTTP version from @var{str}, returning it as a major-minor
pair. For example, @code{HTTP/1.1} parses as the pair of integers,
@code{(1 . 1)}.
@end defun
@defun parse-request-uri str [start] [end]
Parse a URI from an HTTP request line. Note that URIs in requests do not
have to have a scheme or host name. The result is a URI object.
@end defun
@defun read-request-line port
Read the first line of an HTTP request from @var{port}, returning three
values: the method, the URI, and the version.
@end defun
@defun write-request-line method uri version port
Write the first line of an HTTP request to @var{port}.
@end defun
@defun read-response-line port
Read the first line of an HTTP response from @var{port}, returning three
values: the HTTP version, the response code, and the ``reason phrase''.
@end defun
@defun write-response-line version code reason-phrase port
Write the first line of an HTTP response to @var{port}.
@end defun
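
A minimal sketch of these procedures, reading a made-up request line
from a string port and writing a response line to the current output
port:

@example
(use-modules (web http) (web uri))

;; read-request-line returns three values; collect them in a list.
(call-with-values
    (lambda ()
      (call-with-input-string "GET /index.html HTTP/1.1\r\n"
        read-request-line))
  (lambda (method uri version)
    (list method (uri-path uri) version)))
@result{} (GET "/index.html" (1 . 1))

(write-response-line '(1 . 1) 200 "OK" (current-output-port))
@print{} HTTP/1.1 200 OK
@end example
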
@node HTTP Headers
@subsection HTTP Headers
The @code{(web http)} module defines parsers and unparsers for all
headers defined in the HTTP/1.1 standard. This section describes the
parsed format of the various headers.
We cannot describe the function of all of these headers, however, in
sufficient detail. The interested reader would do well to download a
copy of RFC 2616 and have it on hand.
To begin with, we should make a few definitions:
@table @dfn
@item key-value list
A key-value list is a list of values. Each value may be a string,
a symbol, or a pair. Known keys are parsed to symbols; otherwise keys
are left as strings. Keys with values are parsed to pairs, the car of
which is the symbol or string key, and the cdr is the parsed value.
Parsed values for known keys have key-dependent formats. Parsed values
for unknown keys are strings.
@item param list
A param list is a list of key-value lists. When serialized to a string,
items in the inner lists are separated by semicolons. Again, known keys
are parsed to symbols.
@item quality
A number of headers have quality values in them, which are decimal
fractions between zero and one indicating a preference for various kinds
of responses, which the server may choose to heed. Given that only
three digits are allowed in the fractional part, Guile parses quality
values to integers between 0 and 1000 instead of inexact numbers between
0.0 and 1.0.
@item quality list
A list of pairs, the car of which is a quality value.
@item entity tag
A pair, the car of which is an opaque string, and the cdr of which is
true iff the entity tag is a ``strong'' entity tag.
@end table
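
To make the integer representation of quality values concrete, here is a
small sketch that works on a quality list. The data is written out by
hand to mimic what a header such as
@code{Accept-Language: en;q=0.8, de, fr;q=0.5} might parse to; it is
illustrative, not the output of a parser run.

@example
;; Illustrative quality list: the car of each pair is 0..1000.
(define langs '((800 . "en") (1000 . "de") (500 . "fr")))

;; Most preferred first.
(map cdr (sort langs (lambda (a b) (> (car a) (car b)))))
@result{} ("de" "en" "fr")

;; The original fraction can be recovered exactly.
(/ (car (car langs)) 1000)
@result{} 4/5
@end example
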
@subsubsection General Headers
@table @code
@item cache-control
A key-value list of cache-control directives. Known keys are
@code{max-age}, @code{max-stale}, @code{min-fresh},
@code{must-revalidate}, @code{no-cache}, @code{no-store},
@code{no-transform}, @code{only-if-cached}, @code{private},
@code{proxy-revalidate}, @code{public}, and @code{s-maxage}.
If present, parameters to @code{max-age}, @code{max-stale},
@code{min-fresh}, and @code{s-maxage} are all parsed as non-negative
integers.
If present, parameters to @code{private} and @code{no-cache} are parsed
as lists of header names, represented as symbols if they are known
headers or strings otherwise.
@item connection
A list of connection tokens. A connection token is a string.
@item date
A SRFI-19 date record.
@item pragma
A key-value list of pragma directives. @code{no-cache} is the only
known key.
@item trailer
A list of header names. Known header names are parsed to symbols,
otherwise they are left as strings.
@item transfer-encoding
A param list of transfer codings. @code{chunked} is the only known key.
@item upgrade
A list of strings.
@item via
A list of strings. There may be multiple @code{via} headers in one
message.
@item warning
A list of warnings. Each warning is itself a list of four elements: a
code, as an exact integer between 0 and 1000, a host as a string, the
warning text as a string, and either @code{#f} or a SRFI-19 date.
There may be multiple @code{warning} headers in one message.
@end table
@subsubsection Entity Headers
@table @code
@item allow
A list of methods, as strings. Methods are parsed as strings instead of
@code{parse-http-method} so as to allow for new methods.
@item content-encoding
A list of content codings, as strings.
@item content-language
A list of language tags, as strings.
@item content-length
An exact, non-negative integer.
@item content-location
A URI record.
@item content-md5
A string.
@item content-range
A list of three elements: the symbol @code{bytes}, either the symbol
@code{*} or a pair of integers, indicating the byte range, and either
@code{*} or an integer, for the instance length.
@item content-type
A pair, the car of which is the media type as a string, and the cdr is
an alist of parameters, with strings as keys and values.
For example, @code{"text/plain"} parses as @code{("text/plain")}, and
@code{"text/plain;charset=utf-8"} parses as @code{("text/plain"
("charset" . "utf-8"))}.
@item expires
A SRFI-19 date.
@item last-modified
A SRFI-19 date.
@end table
@subsubsection Request Headers
@table @code
@item accept
A param list. Each element in the list indicates one media-range
with accept-params. The only known key is @code{q}, whose value is
parsed as a quality value.
@item accept-charset
A quality-list of charsets, as strings.
@item accept-encoding
A quality-list of content codings, as strings.
@item accept-language
A quality-list of languages, as strings.
@item authorization
A string.
@item expect
A param list of expectations. The only known key is
@code{100-continue}.
@item from
A string.
@item host
A pair of the host, as a string, and the port, as an integer. If no port
is given, port is @code{#f}.
@item if-match
Either the symbol @code{*}, or a list of entity tags (see above).
@item if-modified-since
A SRFI-19 date.
@item if-none-match
Either the symbol @code{*}, or a list of entity tags (see above).
@item if-range
Either an entity tag, or a SRFI-19 date.
@item if-unmodified-since
A SRFI-19 date.
@item max-forwards
An exact non-negative integer.
@item proxy-authorization
A string.
@item range
A pair whose car is the symbol @code{bytes}, and whose cdr is a list of
pairs. Each element of the cdr indicates a range; the car is the first
byte position and the cdr is the last byte position, as integers, or
@code{#f} if not given.
@item referer
A URI.
@item te
A param list of transfer-codings. The only known key is
@code{trailers}.
@item user-agent
A string.
@end table
@subsubsection Response Headers
@table @code
@item accept-ranges
A list of strings.
@item age
An exact, non-negative integer.
@item etag
An entity tag.
@item location
A URI.
@item proxy-authenticate
A string.
@item retry-after
Either an exact, non-negative integer, or a SRFI-19 date.
@item server
A string.
@item vary
Either the symbol @code{*}, or a list of headers, with known headers
parsed to symbols.
@item www-authenticate
A string.
@end table
@node Requests
@subsection HTTP Requests
@example
(use-modules (web request))
@end example
The request module contains a data type for HTTP requests. Note that
the body is not part of the request, but the port is. Once you have
read a request, you may read the body separately, and likewise for
writing requests.
@defun build-request [#:method] [#:uri] [#:version] [#:headers] [#:port] [#:meta] [#:validate-headers?]
Construct an HTTP request object. If @var{validate-headers?} is true,
the headers are each run through their respective validators.
@end defun
@defun request?
@defunx request-method
@defunx request-uri
@defunx request-version
@defunx request-headers
@defunx request-meta
@defunx request-port
A predicate and field accessors for the request type. The fields are as
follows:
@table @code
@item method
The HTTP method, for example, @code{GET}.
@item uri
The URI as a URI record.
@item version
The HTTP version pair, like @code{(1 . 1)}.
@item headers
The request headers, as an alist of parsed values.
@item meta
An arbitrary alist of other data, for example information returned in
the @code{sockaddr} from @code{accept} (@pxref{Network Sockets and
Communication}).
@item port
The port on which to read or write a request body, if any.
@end table
@end defun
@defun read-request port [meta]
Read an HTTP request from @var{port}, optionally attaching the given
metadata, @var{meta}.
As a side effect, sets the encoding on @var{port} to ISO-8859-1
(latin-1), so that reading one character reads one byte. See the
discussion of character sets in "HTTP Requests" in the manual, for more
information.
@end defun
@defun write-request r port
Write the given HTTP request to @var{port}.
Returns a new request, whose @code{request-port} will continue writing
on @var{port}, perhaps using some transfer encoding.
@end defun
@defun read-request-body/latin-1 r
Reads the request body from @var{r}, as a string.
Assumes that the request port has ISO-8859-1 encoding, so that the
number of characters to read is the same as the
@code{request-content-length}. Returns @code{#f} if there was no request
body.
@end defun
@defun write-request-body/latin-1 r body
Write @var{body}, a string encodable in ISO-8859-1, to the port
corresponding to the HTTP request @var{r}.
@end defun
@defun read-request-body/bytevector r
Reads the request body from @var{r}, as a bytevector. Returns @code{#f}
if there was no request body.
@end defun
@defun write-request-body/bytevector r bv
Write @var{bv}, a bytevector, to the port corresponding to the HTTP
request @var{r}.
@end defun
The various headers that are typically associated with HTTP requests may
be accessed with these dedicated accessors. @xref{HTTP Headers}, for
more information on the format of parsed headers.
@defun request-accept request [default='()]
@defunx request-accept-charset request [default='()]
@defunx request-accept-encoding request [default='()]
@defunx request-accept-language request [default='()]
@defunx request-allow request [default='()]
@defunx request-authorization request [default=#f]
@defunx request-cache-control request [default='()]
@defunx request-connection request [default='()]
@defunx request-content-encoding request [default='()]
@defunx request-content-language request [default='()]
@defunx request-content-length request [default=#f]
@defunx request-content-location request [default=#f]
@defunx request-content-md5 request [default=#f]
@defunx request-content-range request [default=#f]
@defunx request-content-type request [default=#f]
@defunx request-date request [default=#f]
@defunx request-expect request [default='()]
@defunx request-expires request [default=#f]
@defunx request-from request [default=#f]
@defunx request-host request [default=#f]
@defunx request-if-match request [default=#f]
@defunx request-if-modified-since request [default=#f]
@defunx request-if-none-match request [default=#f]
@defunx request-if-range request [default=#f]
@defunx request-if-unmodified-since request [default=#f]
@defunx request-last-modified request [default=#f]
@defunx request-max-forwards request [default=#f]
@defunx request-pragma request [default='()]
@defunx request-proxy-authorization request [default=#f]
@defunx request-range request [default=#f]
@defunx request-referer request [default=#f]
@defunx request-te request [default=#f]
@defunx request-trailer request [default='()]
@defunx request-transfer-encoding request [default='()]
@defunx request-upgrade request [default='()]
@defunx request-user-agent request [default=#f]
@defunx request-via request [default='()]
@defunx request-warning request [default='()]
Return the given request header, or @var{default} if none was present.
@end defun
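
As a sketch, a request can be read from a string port and inspected with
the accessors above. The request text is made up for illustration.

@example
(use-modules (web request) (web uri))

(define req
  (call-with-input-string
      "GET /hello?name=world HTTP/1.1\r\nHost: example.com\r\n\r\n"
    read-request))

(request-method req)
@result{} GET
(request-version req)
@result{} (1 . 1)
(uri-path (request-uri req))
@result{} "/hello"
(uri-query (request-uri req))
@result{} "name=world"
;; Dedicated header accessor; no port was given, so the cdr is #f.
(request-host req)
@result{} ("example.com" . #f)
;; A header that is absent yields the accessor's default.
(request-user-agent req)
@result{} #f
@end example
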
@defun request-absolute-uri r [default-host] [default-port]
A helper routine to determine the absolute URI of a request, using the
@code{host} header and the default host and port.
@end defun
@node Responses
@subsection HTTP Responses
@example
(use-modules (web response))
@end example
As with requests (@pxref{Requests}), Guile offers a data type for HTTP
responses. Again, the body is represented separately from the response.
@defun response?
@defunx response-version
@defunx response-code
@defunx response-reason-phrase response
@defunx response-headers
@defunx response-port
A predicate and field accessors for the response type. The fields are as
follows:
@table @code
@item version
The HTTP version pair, like @code{(1 . 1)}.
@item code
The HTTP response code, like @code{200}.
@item reason-phrase
The reason phrase, or the standard reason phrase for the response's
code.
@item headers
The response headers, as an alist of parsed values.
@item port
The port on which to read or write a response body, if any.
@end table
@end defun
@defun read-response port
Read an HTTP response from @var{port}.
As a side effect, sets the encoding on @var{port} to ISO-8859-1
(latin-1), so that reading one character reads one byte. See the
discussion of character sets in "HTTP Responses" in the manual, for more
information.
@end defun
@defun build-response [#:version] [#:code] [#:reason-phrase] [#:headers] [#:port] [#:validate-headers?]
Construct an HTTP response object. If @var{validate-headers?} is true,
the headers are each run through their respective validators.
@end defun
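
A minimal sketch of building a response and reading its fields back. It
assumes that the version defaults to HTTP/1.1 when not given, and relies
on the documented fallback to the standard reason phrase for the code.

@example
(use-modules (web response))

(define r (build-response #:code 404))

(response? r)
@result{} #t
(response-code r)
@result{} 404
(response-version r)
@result{} (1 . 1)
;; No #:reason-phrase was given, so the standard one is used.
(response-reason-phrase r)
@result{} "Not Found"
@end example
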
@defun extend-response r k v . additional
Extend an HTTP response by setting additional HTTP headers @var{k},
@var{v}. Returns a new HTTP response.
@end defun
@defun adapt-response-version response version
Adapt the given response to a different HTTP version. Returns a new HTTP
response.
The idea is that many applications might just build a response for the
default HTTP version, and this method could handle a number of
programmatic transformations to respond to older HTTP versions (0.9 and
1.0). But currently this function is a bit heavy-handed, just updating
the version field.
@end defun
@defun write-response r port
Write the given HTTP response to @var{port}.
Returns a new response, whose @code{response-port} will continue writing
on @var{port}, perhaps using some transfer encoding.
@end defun
@defun read-response-body/latin-1 r
Reads the response body from @var{r}, as a string.
Assumes that the response port has ISO-8859-1 encoding, so that the
number of characters to read is the same as the
@code{response-content-length}. Returns @code{#f} if there was no
response body.
@end defun
@defun write-response-body/latin-1 r body
Write @var{body}, a string encodable in ISO-8859-1, to the port
corresponding to the HTTP response @var{r}.
@end defun
@defun read-response-body/bytevector r
Reads the response body from @var{r}, as a bytevector. Returns @code{#f}
if there was no response body.
@end defun
@defun write-response-body/bytevector r bv
Write @var{bv}, a bytevector, to the port corresponding to the HTTP
response @var{r}.
@end defun
As with requests, the various headers that are typically associated with
HTTP responses may be accessed with these dedicated accessors.
@xref{HTTP Headers}, for more information on the format of parsed
headers.
@defun response-accept-ranges response [default=#f]
@defunx response-age response [default='()]
@defunx response-allow response [default='()]
@defunx response-cache-control response [default='()]
@defunx response-connection response [default='()]
@defunx response-content-encoding response [default='()]
@defunx response-content-language response [default='()]
@defunx response-content-length response [default=#f]
@defunx response-content-location response [default=#f]
@defunx response-content-md5 response [default=#f]
@defunx response-content-range response [default=#f]
@defunx response-content-type response [default=#f]
@defunx response-date response [default=#f]
@defunx response-etag response [default=#f]
@defunx response-expires response [default=#f]
@defunx response-last-modified response [default=#f]
@defunx response-location response [default=#f]
@defunx response-pragma response [default='()]
@defunx response-proxy-authenticate response [default=#f]
@defunx response-retry-after response [default=#f]
@defunx response-server response [default=#f]
@defunx response-trailer response [default='()]
@defunx response-transfer-encoding response [default='()]
@defunx response-upgrade response [default='()]
@defunx response-vary response [default='()]
@defunx response-via response [default='()]
@defunx response-warning response [default='()]
@defunx response-www-authenticate response [default=#f]
Return the given response header, or @var{default} if none was present.
@end defun
@node Web Server
@subsection Web Server
@code{(web server)} is a generic web server interface, along with a main
loop implementation for web servers controlled by Guile.
@example
(use-modules (web server))
@end example
The lowest layer is the @code{<server-impl>} object, which binds
together a set of hooks: @code{open}, to open a server; @code{read}, to
read a request from a client; @code{write}, to write a response to a
client; and @code{close}, to close a server. Procedures in this module
take a @code{<server-impl>} object, if needed.
A @code{<server-impl>} may also be looked up by name. If you pass the
@code{http} symbol to @code{run-server}, Guile looks for a variable
named @code{http} in the @code{(web server http)} module, which should
be bound to a @code{<server-impl>} object. Such a binding is made by
instantiation of the @code{define-server-impl} syntax. In this way the
@code{run-server} loop can automatically load other backends if available.
The life cycle of a server goes as follows:
@enumerate
@item
The @code{open} hook is called, to open the server. @code{open} takes 0 or
more arguments, depending on the backend, and returns an opaque
server socket object, or signals an error.
@item
The @code{read} hook is called, to read a request from a new client.
The @code{read} hook takes one argument, the server socket. It should
return three values: an opaque client socket, the request, and the
request body. The request should be a @code{<request>} object, from
@code{(web request)}. The body should be a string or a bytevector, or
@code{#f} if there is no body.
If the read failed, the @code{read} hook may return @code{#f} for the
client socket, request, and body.
@item
A user-provided handler procedure is called, with the request
and body as its arguments. The handler should return two
values: the response, as a @code{<response>} record from @code{(web
response)}, and the response body as a string, bytevector, or
@code{#f} if not present. We also allow the response to be simply an
alist of headers, in which case a default response object is
constructed with those headers.
@item
The @code{write} hook is called with three arguments: the client
socket, the response, and the body. The @code{write} hook returns no
values.
@item
At this point the request handling is complete. When running in a loop,
the server goes back and tries to read a new request.
@item
If the user interrupts the loop, the @code{close} hook is called on
the server socket.
@end enumerate
A user may define a server implementation with the following form:
@deffn {Scheme Syntax} define-server-impl name open read write close
Make a @code{<server-impl>} object with the hooks @var{open},
@var{read}, @var{write}, and @var{close}, and bind it to the symbol
@var{name} in the current module.
@end deffn
@defun lookup-server-impl impl
Look up a server implementation. If @var{impl} is a server
implementation already, it is returned directly. If it is a symbol, the
binding named @var{impl} in the @code{(web server @var{impl})} module is
looked up. Otherwise an error is signaled.
Currently a server implementation is a somewhat opaque type, useful only
for passing to other procedures in this module, like @code{read-client}.
@end defun
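
To make the shape of the hooks concrete, here is a sketch of a
do-nothing backend. The name @code{dummy} and its hook procedures are
hypothetical, and the backend never produces a client, so it is useful
only as an illustration and not for passing to @code{run-server}.

@example
(use-modules (web server))

;; Hypothetical hooks.  Rest arguments are used where the exact
;; arity depends on the caller.
(define (dummy-open . open-params) 'dummy-server)
(define (dummy-read server) (values #f #f #f)) ; always "read failed"
(define (dummy-write . args) (values))         ; returns no values
(define (dummy-close server) #t)

(define-server-impl dummy
  dummy-open dummy-read dummy-write dummy-close)

;; An implementation object is returned as-is.
(eq? (lookup-server-impl dummy) dummy)
@result{} #t
@end example
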
The @code{(web server)} module defines a number of routines that use
@code{<server-impl>} objects to implement parts of a web server. Because
the accessors for the fields of a @code{<server-impl>} are not exported,
these routines are the only procedures with any access to the impl
objects.
@defun open-server impl open-params
Open a server for the given implementation. Returns one value, the new
server object. The implementation's @code{open} procedure is applied to
@var{open-params}, which should be a list.
@end defun
@defun read-client impl server
Read a new client from @var{server}, by applying the implementation's
@code{read} procedure to the server. If successful, returns three
values: an object corresponding to the client, a request object, and the
request body. If any exception occurs, returns @code{#f} for all three
values.
@end defun
@defun handle-request handler request body state
Handle a given request, returning the response and body.
The response and response body are produced by calling the given
@var{handler} with @var{request} and @var{body} as arguments.
The elements of @var{state} are also passed to @var{handler} as
arguments, and may be returned as additional values. The new
@var{state}, collected from the @var{handler}'s return values, is then
returned as a list. The idea is that a server loop receives a handler
from the user, along with whatever state values the user is interested
in, allowing the user's handler to explicitly manage its state.
@end defun
@defun sanitize-response request response body
``Sanitize'' the given response and body, making them appropriate for the
given request.
As a convenience to web handler authors, @var{response} may be given as
an alist of headers, in which case it is used to construct a default
response. Ensures that the response version corresponds to the request
version. If @var{body} is a string, encodes the string to a bytevector,
in an encoding appropriate for @var{response}. Adds a
@code{content-length} and @code{content-type} header, as necessary.
If @var{body} is a procedure, it is called with a port as an argument,
and the output collected as a bytevector. In the future we might
instead use a compressing, chunk-encoded port, and call this procedure
later, in the @code{write-client} procedure. Authors are advised not to
rely on the procedure being called at any particular time.
@end defun
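
For illustration, a handler may return a procedure as its body, as in
the sketch below. The handler name is hypothetical, and the header alist
follows the format used elsewhere in this section.

@example
(define (streaming-handler request body)
  (values '((content-type . ("text/plain")))
          ;; Called with a port; whatever is written to the port
          ;; becomes the response body.
          (lambda (port)
            (display "Hello, " port)
            (display "World!\n" port))))
@end example
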
@defun write-client impl server client response body
Write an HTTP response and body to @var{client}. If the server and
client support persistent connections, it is the implementation's
responsibility to keep track of the client thereafter, presumably by
attaching it to the @var{server} argument somehow.
@end defun
@defun close-server impl server
Release resources allocated by a previous invocation of
@code{open-server}.
@end defun
Given the procedures above, it is a small matter to make a web server:
@defun serve-one-client handler impl server state
Read one request from @var{server}, call @var{handler} on the request
and body, and write the response to the client. Returns the new state
produced by the handler procedure.
@end defun
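
As a sketch of how these pieces fit together, the following hypothetical
@code{serve-n} procedure opens the @code{http} backend, calls
@code{serve-one-client} a fixed number of times, and then closes the
server. The procedure name, the port number, and the iteration count
are illustrative assumptions; note that a failed read also counts as an
iteration.

@example
(use-modules (web server))

(define (serve-n handler n)
  (let* ((impl (lookup-server-impl 'http))
         (server (open-server impl '(#:port 8081))))
    (let loop ((i 0) (state '()))
      (if (< i n)
          ;; Each iteration reads one request, calls the handler,
          ;; writes the response, and yields the new state.
          (loop (1+ i) (serve-one-client handler impl server state))
          (close-server impl server)))))
@end example
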
@defun run-server handler [impl] [open-params] . state
Run Guile's built-in web server.
@var{handler} should be a procedure that takes two or more arguments,
the HTTP request and request body, and returns two or more values, the
response and response body.
For example, here is a simple ``Hello, World!'' server:
@example
(define (handler request body)
  (values '((content-type . ("text/plain")))
          "Hello, World!"))
(run-server handler)
@end example
The response and body will be run through @code{sanitize-response}
before sending back to the client.
Additional arguments to @var{handler} are taken from @var{state}.
Additional return values are accumulated into a new @var{state}, which
will be used for subsequent requests. In this way a handler can
explicitly manage its state.
The default server implementation is @code{http}, which accepts
@var{open-params} like @code{(#:port 8081)}, among others.
@end defun
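
To illustrate the state-passing convention, here is a sketch of a
handler that counts the requests it has served. The handler name and
the initial count are assumptions made for the example. The count
arrives as a third argument and the updated count is returned as a third
value.

@example
(use-modules (web server))

(define (counting-handler request body count)
  (values '((content-type . ("text/plain")))
          (string-append "Request number "
                         (number->string count) "\n")
          (1+ count)))   ; becomes the state for the next request

;; The state values follow the optional impl and open-params.
(run-server counting-handler 'http '() 1)
@end example
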
@node Web Examples
@subsection Web Examples
Well, enough about the tedious internals. Let's make a web application!
@subsubsection Hello, World!
The first program we have to write, of course, is ``Hello, World!''.
This means that we have to implement a web handler that does what we
want.
A handler is a function of two arguments and two return values:
@example
(define (handler request request-body)
  (values @var{response} @var{response-body}))
@end example
In this first example, we take advantage of a short-cut, returning an
alist of headers instead of a proper response object. The response body
is our payload:
@example
(define (hello-world-handler request request-body)
  (values '((content-type . ("text/plain")))
          "Hello World!"))
@end example
Now let's test it, by running a server with this handler. Load up the
web server module if you haven't yet done so, and run a server with this
handler:
@example
(use-modules (web server))
(run-server hello-world-handler)
@end example
By default, the web server listens for requests on
@code{localhost:8080}. Visit that address in your web browser to
test. If you see the string, @code{Hello World!}, sweet!
@subsubsection Inspecting the Request
The Hello World program above is a general greeter, responding to all
URIs. To make a more exclusive greeter, we need to inspect the request
object, and conditionally produce different results. So let's load up
the request, response, and URI modules, and do just that.
@example
(use-modules (web server)) ; you probably did this already
(use-modules (web request)
             (web response)
             (web uri))

(define (request-path-components request)
  (split-and-decode-uri-path (uri-path (request-uri request))))

(define (hello-hacker-handler request body)
  (if (equal? (request-path-components request)
              '("hacker"))
      (values '((content-type . ("text/plain")))
              "Hello hacker!")
      (not-found request)))

(run-server hello-hacker-handler)
@end example
Here we see that we have defined a helper to return the components of
the URI path as a list of strings, and used that to check for a request
to @code{/hacker/}. Then the success case is just as before -- visit
@code{http://localhost:8080/hacker/} in your browser to check.
You should always match against URI path components as decoded by
@code{split-and-decode-uri-path}. The above example will work for
@code{/hacker/}, @code{//hacker///}, and @code{/h%61ck%65r}.
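
The following REPL sketch shows why: empty path components are dropped
and percent-escapes are decoded, so all three spellings yield the same
list.

@example
(use-modules (web uri))

(split-and-decode-uri-path "/hacker/")
@result{} ("hacker")
(split-and-decode-uri-path "//hacker///")
@result{} ("hacker")
(split-and-decode-uri-path "/h%61ck%65r")
@result{} ("hacker")
@end example
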
But we forgot to define @code{not-found}! If you are pasting these
examples into a REPL, accessing any other URI in your web browser will
drop your Guile console into the debugger:
@example
<unnamed port>:38:7: In procedure module-lookup:
<unnamed port>:38:7: Unbound variable: not-found
Entering a new prompt. Type `,bt' for a backtrace or `,q' to continue.
scheme@@(guile-user) [1]>
@end example
So let's define the function, right there in the debugger. As you
probably know, we'll want to return a 404 response.
@example
;; Paste this in your REPL
(define (not-found request)
  (values (build-response #:code 404)
          (string-append "Resource not found: "
                         (unparse-uri (request-uri request)))))

;; Now paste this to let the web server keep going:
,continue
@end example
Now if you access @code{http://localhost:8080/foo/}, you get this error
message. (Note that some popular web browsers won't show
server-generated 404 messages, showing their own instead, unless the 404
message body is long enough.)
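
The same approach extends to more than one route. As a sketch, and
assuming the @code{request-path-components} and @code{not-found}
procedures defined above, the hypothetical handler below uses
@code{match} from @code{(ice-9 match)} to greet whoever is named in a
path of the form @code{/hello/@var{name}}:

@example
(use-modules (ice-9 match))

(define (greeter-handler request body)
  (match (request-path-components request)
    ;; Exactly two components, the first being "hello".
    (("hello" name)
     (values '((content-type . ("text/plain")))
             (string-append "Hello, " name "!")))
    ;; Anything else is a 404.
    (_ (not-found request))))
@end example
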
@subsubsection Higher-Level Interfaces
The web handler interface is a common baseline that all kinds of Guile
web applications can use. You will usually want to build something on
top of it, however, especially when producing HTML. Here is a simple
example that builds up HTML output using SXML (@pxref{sxml simple}).
First, load up the modules:
@example
(use-modules (web server)
             (web request)
             (web response)
             (sxml simple))
@end example
Now we define a simple templating function that takes a list of HTML
body elements, as SXML, and puts them in our super template:
@example
(define (templatize title body)
  `(html (head (title ,title))
         (body ,@@body)))
@end example
For example, the simplest Hello HTML can be produced like this:
@example
(sxml->xml (templatize "Hello!" '((b "Hi!"))))
@print{}
<html><head><title>Hello!</title></head><body><b>Hi!</b></body></html>
@end example
Much better to work with Scheme data types than to work with HTML as
strings. Now we define a little response helper:
@example
(define* (respond #:optional body #:key
                  (status 200)
                  (title "Hello hello!")
                  (doctype "<!DOCTYPE html>\n")
                  (content-type-params '(("charset" . "utf-8")))
                  (content-type "text/html")
                  (extra-headers '())
                  (sxml (and body (templatize title body))))
  (values (build-response
           #:code status
           #:headers `((content-type
                        . (,content-type ,@@content-type-params))
                       ,@@extra-headers))
          (lambda (port)
            (if sxml
                (begin
                  (if doctype (display doctype port))
                  (sxml->xml sxml port))))))
@end example
Here we see the power of keyword arguments with default initializers. By
the time the arguments are fully parsed, the @code{sxml} local variable
will hold the templated SXML, ready for sending out to the client.
Instead of returning the body as a string, here we give a procedure,
which will be called by the web server to write out the response to the
client.
Now, a simple example using this responder, which lays out the incoming
headers in an HTML table.
@example
(define (debug-page request body)
  (respond
   `((h1 "hello world!")
     (table
      (tr (th "header") (th "value"))
      ,@@(map (lambda (pair)
               `(tr (td (tt ,(with-output-to-string
                               (lambda () (display (car pair))))))
                    (td (tt ,(with-output-to-string
                               (lambda ()
                                 (write (cdr pair))))))))
             (request-headers request))))))

(run-server debug-page)
@end example
Now if you visit any local address in your web browser, you will
actually see some HTML, finally.
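
The keyword arguments of @code{respond} make other kinds of pages just
as short. As a sketch, the hypothetical handler below produces an HTML
404 page, overriding only the status and the title; it assumes the
@code{respond} and @code{templatize} definitions above.

@example
(use-modules (web request)
             (web uri))

(define (not-found-page request body)
  (respond `((h1 "Not found")
             (p "No such resource: "
                ,(uri-path (request-uri request))))
           #:status 404
           #:title "Not found"))
@end example
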
@subsubsection Conclusion
Well, this is about as far as Guile's built-in web support goes, for
now. There are many ways to make a web application, but hopefully by
standardizing the most fundamental data types, users will be able to
choose the approach that suits them best, while also being able to
switch between implementations of the server. This is a relatively new
part of Guile, so if you have feedback, let us know, and we can take it
into account. Happy hacking on the web!
@c Local Variables:
@c TeX-master: "guile.texi"
@c End: