mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-29 19:30:36 +02:00

Add sandboxed evaluation facility

* module/ice-9/sandbox.scm: New file.
* module/Makefile.am (SOURCES): Add new file.
* doc/ref/api-evaluation.texi (Sandboxed Evaluation): New section.
* NEWS: Update.
* test-suite/tests/sandbox.test: New file.
* test-suite/Makefile.am: Add new file.
Andy Wingo 2017-04-18 20:39:40 +02:00
parent 622abec1d2
commit 7c71be0c7e
6 changed files with 1768 additions and 0 deletions

NEWS

@@ -10,6 +10,13 @@ Changes in 2.2.1 (since 2.2.0):
* Notable changes
** New sandboxed evaluation facility
Guile now has a facility for evaluating untrusted code safely. See
"Sandboxed Evaluation" in the manual for full details, including some
important notes on the limits of the sandbox's ability to prevent
resource exhaustion.
** All literal constants are read-only
According to the Scheme language definition, it is an error to attempt

doc/ref/api-evaluation.texi

@@ -22,6 +22,7 @@ loading, evaluating, and compiling Scheme code at run time.
* Delayed Evaluation:: Postponing evaluation until it is needed.
* Local Evaluation:: Evaluation in a local lexical environment.
* Local Inclusion:: Compile-time inclusion of one file in another.
* Sandboxed Evaluation:: Evaluation with limited capabilities.
* REPL Servers:: Serving a REPL over a socket.
* Cooperative REPL Servers:: REPL server for single-threaded applications.
@end menu
@@ -1227,6 +1228,270 @@ the source files for a package (as you should!). It makes it possible
to evaluate an installed file from source, instead of relying on the
@code{.go} file being up to date.
@node Sandboxed Evaluation
@subsection Sandboxed Evaluation
Sometimes you would like to evaluate code that comes from an untrusted
party. The safest way to do this is to buy a new computer, evaluate the
code on that computer, then throw the machine away. However, if you are
unwilling to take this simple approach, Guile does include a limited
``sandbox'' facility that can allow untrusted code to be evaluated with
some confidence.
To use the sandboxed evaluator, load its module:
@example
(use-modules (ice-9 sandbox))
@end example
Guile's sandboxing facility starts with the ability to restrict the time
and space used by a piece of code.
@deffn {Scheme Procedure} call-with-time-limit limit thunk limit-reached
Call @var{thunk}, but cancel it if @var{limit} seconds of wall-clock
time have elapsed. If the computation is cancelled, call
@var{limit-reached} in tail position. @var{thunk} must not disable
interrupts or prevent an abort via a @code{dynamic-wind} unwind handler.
@end deffn
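The following is a minimal sketch of the intended use. A hypothetical procedure, @code{spin}, stands in for untrusted code that never returns. The time limit cancels it, and the value of @var{limit-reached} becomes the value of the whole call. A thunk that finishes in time simply returns its own value. The fallback symbol @code{timed-out} is an arbitrary choice.
@example
(use-modules (ice-9 sandbox))

;; Hypothetical stand-in for code that loops forever.
(define (spin) (spin))

;; Cancelled after 0.1 seconds; the LIMIT-REACHED thunk supplies the result.
(call-with-time-limit 0.1 spin (lambda () 'timed-out))   ; => timed-out

;; Finishes well within the limit, so the thunk's own value is returned.
(call-with-time-limit 1 (lambda () (+ 1 2)) (lambda () 'timed-out)) ; => 3
@end example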
@deffn {Scheme Procedure} call-with-allocation-limit limit thunk limit-reached
Call @var{thunk}, but cancel it if @var{limit} bytes have been
allocated. If the computation is cancelled, call @var{limit-reached} in
tail position. @var{thunk} must not disable interrupts or prevent an
abort via a @code{dynamic-wind} unwind handler.
This limit applies to both stack and heap allocation. The computation
will not be aborted before @var{limit} bytes have been allocated, but
for the heap allocation limit, the check may be postponed until the next garbage collection.
Note that, as a current shortcoming, the heap allocation limit applies to all
threads; concurrent allocation by other unrelated threads counts towards
the allocation limit.
@end deffn
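As a sketch, the hypothetical procedure @code{cons-forever} below allocates a fresh pair on every iteration and discards the previous one. Here it is given a budget of a million bytes. Because the heap is only checked after a garbage collection, somewhat more than @var{limit} bytes may be allocated before the computation is cancelled.
@example
(use-modules (ice-9 sandbox))

;; Hypothetical stand-in for code that allocates without bound.
(define (cons-forever)
  (let lp ((pair #t))
    (and pair
         (lp (cons #t #t)))))

(call-with-allocation-limit #e1e6 cons-forever
                            (lambda () 'too-much-allocation))
;; => too-much-allocation
@end example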
@deffn {Scheme Procedure} call-with-time-and-allocation-limits time-limit allocation-limit thunk
Invoke @var{thunk} in a dynamic extent in which its execution is limited
to @var{time-limit} seconds of wall-clock time, and its allocation to
@var{allocation-limit} bytes. @var{thunk} must not disable interrupts
or prevent an abort via a @code{dynamic-wind} unwind handler.
If successful, return all values produced by invoking @var{thunk}. Any
uncaught exception thrown by the thunk will propagate out. If the time
or allocation limit is exceeded, an exception will be thrown to the
@code{limit-exceeded} key.
@end deffn
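Callers will usually want to handle the @code{limit-exceeded} exception themselves. The sketch below does this with a hypothetical helper, @code{run-limited}. It maps an exceeded limit to @code{#f} and lets every other exception propagate. The limit values are arbitrary examples.
@example
(use-modules (ice-9 sandbox))

;; Hypothetical helper: 0.1 s of wall-clock time, 10^7 bytes of allocation.
(define (run-limited thunk)
  (catch 'limit-exceeded
    (lambda ()
      (call-with-time-and-allocation-limits 0.1 #e10e6 thunk))
    (lambda (key . args)
      #f)))

(run-limited (lambda () (* 6 7)))           ; => 42
(run-limited (lambda () (let lp () (lp))))  ; => #f
@end example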
The time limit and stack limit are both very precise, but the heap limit
only gets checked asynchronously, after a garbage collection. In
particular, if the heap is already very large, the number of allocated
bytes between garbage collections will be large, and therefore the
precision of the check is reduced.
Additionally, due to the mechanism used by the allocation limit (the
@code{after-gc-hook}), large single allocations like @code{(make-vector
#e1e7)} are only detected after the allocation completes, even if the
allocation itself causes garbage collection. It is therefore possible
for user code not only to exceed the allocation limit, but also to
exhaust all available memory, causing out-of-memory conditions at any
allocation site. Failure to allocate memory in Guile itself should be
safe and cause an exception to be thrown, but most systems are not
designed to handle @code{malloc} failures. An allocation failure may
therefore exercise unexpected code paths in your system, so it is a
weakness of the sandbox (and therefore an interesting point of attack).
The main sandbox interface is @code{eval-in-sandbox}.
@deffn {Scheme Procedure} eval-in-sandbox exp [#:time-limit 0.1] @
[#:allocation-limit #e10e6] @
[#:bindings all-pure-bindings] @
[#:module (make-sandbox-module bindings)] @
[#:sever-module? #t]
Evaluate the Scheme expression @var{exp} within an isolated
``sandbox''. Limit its execution to @var{time-limit} seconds of
wall-clock time, and limit its allocation to @var{allocation-limit}
bytes.
The evaluation will occur in @var{module}, which defaults to the result
of calling @code{make-sandbox-module} on @var{bindings}, which itself
defaults to @code{all-pure-bindings}. This is the core of the
sandbox: creating a scope for the expression that is @dfn{safe}.
A safe sandbox module has two characteristics. Firstly, it will not
allow the expression being evaluated to evade cancellation by the
time or allocation limits. This ensures that the expression terminates
in a timely fashion.
Secondly, a safe sandbox module will prevent the evaluation from
receiving information from previous evaluations, or from affecting
future evaluations. All combinations of binding sets exported by
@code{(ice-9 sandbox)} form safe sandbox modules.
The @var{bindings} should be given as a list of import sets. One import
set is a list whose car names an interface, like @code{(ice-9 q)}, and
whose cdr is a list of imports. An import is either a bare symbol or a
pair of @code{(@var{out} . @var{in})}, where @var{out} and @var{in} are
both symbols and denote the name under which a binding is exported from
the module, and the name under which to make the binding available,
respectively. Note that @var{bindings} is only used as an input to the
default initializer for the @var{module} argument; if you pass
@code{#:module}, @var{bindings} is unused. If @var{sever-module?} is
true (the default), the module will be unlinked from the global module
tree after the evaluation returns, to allow @var{module} to be
garbage-collected.
If successful, return all values produced by @var{exp}. Any uncaught
exception thrown by the expression will propagate out. If the time or
allocation limit is exceeded, an exception will be thrown to the
@code{limit-exceeded} key.
@end deffn
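A few typical calls are sketched below. Pure expressions evaluate normally. Anything outside the binding set is simply unbound. Runaway code raises @code{limit-exceeded}. The fallback symbols @code{denied} and @code{stopped} and the file name are arbitrary illustrative values.
@example
(use-modules (ice-9 sandbox))

(eval-in-sandbox '(cons 1 2))               ; => (1 . 2)

;; open-file is not part of all-pure-bindings, so it is unbound.
(catch 'unbound-variable
  (lambda () (eval-in-sandbox '(open-file "data.txt" "r")))
  (lambda (key . args) 'denied))            ; => denied

;; A tighter time limit than the default of 0.1 seconds.
(catch 'limit-exceeded
  (lambda () (eval-in-sandbox '(let lp () (lp)) #:time-limit 0.05))
  (lambda (key . args) 'stopped))           ; => stopped
@end example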
Constructing a safe sandbox module is tricky in general. Guile defines
an easy way to construct safe modules from predefined sets of bindings.
Before getting to that interface, here are some general notes on safety.
@enumerate
@item The time and allocation limits rely on the ability to interrupt
and cancel a computation. For this reason, no binding included in a
sandbox module should be able to indefinitely postpone interrupt
handling, nor should a binding be able to prevent an abort. In practice
this second consideration means that @code{dynamic-wind} should not be
included in any binding set.
@item The time and allocation limits apply only to the
@code{eval-in-sandbox} call. If the call returns a procedure which is
later called, no limit is ``automatically'' in place. Users of
@code{eval-in-sandbox} have to be very careful to reimpose limits when
calling procedures that escape from sandboxes.
@item Similarly, the dynamic environment of the @code{eval-in-sandbox}
call is not necessarily in place when any procedure that escapes from
the sandbox is later called.
This detail prevents us from exposing @code{primitive-eval} to the
sandbox, for two reasons. The first is that it's possible for legacy
code to forge references to any binding, if the
@code{allow-legacy-syntax-objects?} parameter is true. The default for
this parameter is true (@pxref{Syntax Transformer Helpers}, for
details). The parameter is bound to @code{#f} for the duration of the
@code{eval-in-sandbox} call itself, but that will not be in place during
calls to escaped procedures.
The second reason we don't expose @code{primitive-eval} is that
@code{primitive-eval} implicitly works in the current module, which for
an escaped procedure will probably be different from the module that is
current for the @code{eval-in-sandbox} call itself.
The common denominator here is that if an interface exposed to the
sandbox relies on dynamic environments, it is easy to mistakenly grant
the sandboxed procedure additional capabilities in the form of bindings
that it should not have access to. For this reason, the default sets of
predefined bindings do not depend on any dynamically scoped value.
@item Mutation may allow a sandboxed evaluation to break some invariant
in users of data supplied to it. A lot of code culturally doesn't
expect mutation, but if you hand mutable data to a sandboxed evaluation
and you also grant mutating capabilities to that evaluation, then the
sandboxed code may indeed mutate that data. The default set of bindings
available to the sandbox does not include any mutating primitives.
Relatedly, @code{set!} may allow a sandbox to mutate a primitive,
invalidating many system-wide invariants. Guile is currently quite
permissive when it comes to imported bindings and mutability. Although
@code{set!} to a module-local or lexically bound variable would be fine,
we don't currently have an easy way to disallow @code{set!} to an
imported binding, so currently no binding set includes @code{set!}.
@item Mutation may allow a sandboxed evaluation to keep state, or
to set up a communication channel with other code. On the one hand this
sounds cool, but on the other hand maybe this is part of your threat
model. Again, the default set of bindings doesn't include mutating
primitives, preventing sandboxed evaluations from keeping state.
@item The sandbox should probably not be able to open a network
connection, or write to a file, or open a file from disk. The default
binding set includes no interaction with the operating system.
@end enumerate
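The second point above can be made concrete with a sketch. Here the sandbox returns a procedure, and the limits that applied during @code{eval-in-sandbox} no longer apply once that procedure escapes. The caller therefore reimposes them around each call. The name @code{count-up} and the limit values are hypothetical.
@example
(use-modules (ice-9 sandbox))

;; The lambda is created inside the sandbox but returned to the caller.
(define count-up
  (eval-in-sandbox
   '(lambda (n)
      (let lp ((i 0))
        (if (< i n) (lp (+ i 1)) i)))))

;; Calling (count-up 1000000000000) directly would run with no limits,
;; so wrap every call to the escaped procedure in fresh limits.
(call-with-time-and-allocation-limits 0.1 #e10e6
  (lambda () (count-up 1000)))              ; => 1000
@end example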
If you, dear reader, find the above discussion interesting, you will
enjoy Jonathan Rees' dissertation, ``A Security Kernel Based on the
Lambda Calculus''.
@defvr {Scheme Variable} all-pure-bindings
All ``pure'' bindings that together form a safe subset of those bindings
available by default to Guile user code.
@end defvr
@defvr {Scheme Variable} all-pure-and-impure-bindings
Like @code{all-pure-bindings}, but additionally including mutating
primitives like @code{vector-set!}. This set is still safe in the sense
mentioned above, with the caveats about mutation.
@end defvr
The components of these composite sets are as follows:
@defvr {Scheme Variable} alist-bindings
@defvrx {Scheme Variable} array-bindings
@defvrx {Scheme Variable} bit-bindings
@defvrx {Scheme Variable} bitvector-bindings
@defvrx {Scheme Variable} char-bindings
@defvrx {Scheme Variable} char-set-bindings
@defvrx {Scheme Variable} clock-bindings
@defvrx {Scheme Variable} core-bindings
@defvrx {Scheme Variable} error-bindings
@defvrx {Scheme Variable} fluid-bindings
@defvrx {Scheme Variable} hash-bindings
@defvrx {Scheme Variable} iteration-bindings
@defvrx {Scheme Variable} keyword-bindings
@defvrx {Scheme Variable} list-bindings
@defvrx {Scheme Variable} macro-bindings
@defvrx {Scheme Variable} nil-bindings
@defvrx {Scheme Variable} number-bindings
@defvrx {Scheme Variable} pair-bindings
@defvrx {Scheme Variable} predicate-bindings
@defvrx {Scheme Variable} procedure-bindings
@defvrx {Scheme Variable} promise-bindings
@defvrx {Scheme Variable} prompt-bindings
@defvrx {Scheme Variable} regexp-bindings
@defvrx {Scheme Variable} sort-bindings
@defvrx {Scheme Variable} srfi-4-bindings
@defvrx {Scheme Variable} string-bindings
@defvrx {Scheme Variable} symbol-bindings
@defvrx {Scheme Variable} unspecified-bindings
@defvrx {Scheme Variable} variable-bindings
@defvrx {Scheme Variable} vector-bindings
@defvrx {Scheme Variable} version-bindings
The components of @code{all-pure-bindings}.
@end defvr
@defvr {Scheme Variable} mutating-alist-bindings
@defvrx {Scheme Variable} mutating-array-bindings
@defvrx {Scheme Variable} mutating-bitvector-bindings
@defvrx {Scheme Variable} mutating-fluid-bindings
@defvrx {Scheme Variable} mutating-hash-bindings
@defvrx {Scheme Variable} mutating-list-bindings
@defvrx {Scheme Variable} mutating-pair-bindings
@defvrx {Scheme Variable} mutating-sort-bindings
@defvrx {Scheme Variable} mutating-srfi-4-bindings
@defvrx {Scheme Variable} mutating-string-bindings
@defvrx {Scheme Variable} mutating-variable-bindings
@defvrx {Scheme Variable} mutating-vector-bindings
The additional components of @code{all-pure-and-impure-bindings}.
@end defvr
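For example, @code{vector-set!} is unbound under the default bindings. It becomes available only when @code{all-pure-and-impure-bindings} is passed explicitly. In the sketch below, the vector is created inside the sandbox with @code{vector}, which is assumed here to be part of @code{vector-bindings}. The fallback symbol @code{no-mutation} is arbitrary.
@example
(use-modules (ice-9 sandbox))

;; Default bindings: no mutating primitives, so vector-set! is unbound.
(catch 'unbound-variable
  (lambda () (eval-in-sandbox '(vector-set! (vector 1 2) 0 'x)))
  (lambda (key . args) 'no-mutation))       ; => no-mutation

;; Opting in to the impure binding set; the mutated vector is returned.
(eval-in-sandbox '(let ((v (vector 1 2)))
                    (vector-set! v 0 'x)
                    v)
                 #:bindings all-pure-and-impure-bindings)
@end example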
Finally, what do you do with a binding set? What is a binding set
anyway? @code{make-sandbox-module} is here for you.
@deffn {Scheme Procedure} make-sandbox-module bindings
Return a fresh module that contains only @var{bindings}.
The @var{bindings} should be given as a list of import sets. One import
set is a list whose car names an interface, like @code{(ice-9 q)}, and
whose cdr is a list of imports. An import is either a bare symbol or a
pair of @code{(@var{out} . @var{in})}, where @var{out} and @var{in} are
both symbols and denote the name under which a binding is exported from
the module, and the name under which to make the binding available,
respectively.
@end deffn
So you see that binding sets are just lists, and
@code{all-pure-and-impure-bindings} is really just the result of
appending all of the component binding sets.
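The sketch below builds a custom binding set in the same way. The hypothetical @code{my-bindings} combines two of the predefined sets. It also imports @code{string-append} from @code{(guile)} under the name @code{concat}, using the @code{(@var{out} . @var{in})} form. The safety guarantee above covers only combinations of the predefined sets. Once you add bindings of your own, checking that they are safe is up to you.
@example
(use-modules (ice-9 sandbox))

;; Core syntax and arithmetic, plus one renamed import.
(define my-bindings
  (append core-bindings
          number-bindings
          '(((guile) (string-append . concat)))))

(eval-in-sandbox '(concat "foo" "bar")
                 #:module (make-sandbox-module my-bindings))
;; => "foobar"

;; The original name is not imported, so it is unbound in the sandbox.
(catch 'unbound-variable
  (lambda ()
    (eval-in-sandbox 'string-append #:bindings my-bindings))
  (lambda (key . args) 'unbound))           ; => unbound
@end example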
@node REPL Servers
@subsection REPL Servers

module/Makefile.am

@@ -103,6 +103,7 @@ SOURCES = \
ice-9/rw.scm \
ice-9/safe-r5rs.scm \
ice-9/safe.scm \
ice-9/sandbox.scm \
ice-9/save-stack.scm \
ice-9/scm-style-repl.scm \
ice-9/serialize.scm \

module/ice-9/sandbox.scm

File diff suppressed because it is too large

test-suite/Makefile.am

@@ -125,6 +125,7 @@ SCM_TESTS = tests/00-initial-env.test \
tests/regexp.test \
tests/rtl.test \
tests/rtl-compilation.test \
tests/sandbox.test \
tests/session.test \
tests/signals.test \
tests/sort.test \

test-suite/tests/sandbox.test

@@ -0,0 +1,95 @@
;;;; sandbox.test --- tests guile's sandboxed evaluator -*- scheme -*-
;;;; Copyright (C) 2017 Free Software Foundation, Inc.
;;;;
;;;; This library is free software; you can redistribute it and/or
;;;; modify it under the terms of the GNU Lesser General Public
;;;; License as published by the Free Software Foundation; either
;;;; version 3 of the License, or (at your option) any later version.
;;;;
;;;; This library is distributed in the hope that it will be useful,
;;;; but WITHOUT ANY WARRANTY; without even the implied warranty of
;;;; MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU
;;;; Lesser General Public License for more details.
;;;;
;;;; You should have received a copy of the GNU Lesser General Public
;;;; License along with this library; if not, write to the Free Software
;;;; Foundation, Inc., 51 Franklin Street, Fifth Floor, Boston, MA 02110-1301 USA
(define-module (test-suite sandbox)
  #:use-module (test-suite lib)
  #:use-module (ice-9 sandbox))
(define exception:bad-expression
  (cons 'syntax-error "Bad expression"))
(define exception:failed-match
  (cons 'syntax-error "failed to match any pattern"))
(define exception:not-a-list
  (cons 'wrong-type-arg "Not a list"))
(define exception:wrong-length
  (cons 'wrong-type-arg "wrong length"))
(define (usleep-loop usecs)
  (unless (zero? usecs)
    (usleep-loop (usleep usecs))))
(define (busy-loop)
  (busy-loop))
(with-test-prefix "time limit"
  (pass-if "0 busy loop"
    (call-with-time-limit 0 busy-loop (lambda () #t)))
  (pass-if "0.001 busy loop"
    (call-with-time-limit 0.001 busy-loop (lambda () #t)))
  (pass-if "0 sleep"
    (call-with-time-limit 0 (lambda () (usleep-loop #e1e6) #f)
                          (lambda () #t)))
  (pass-if "0.001 sleep"
    (call-with-time-limit 0.001 (lambda () (usleep-loop #e1e6) #f)
                          (lambda () #t))))
(define (alloc-loop)
  (let lp ((ret #t))
    (and ret
         (lp (cons #t #t)))))
(define (recur-loop)
  (1+ (recur-loop)))
(with-test-prefix "allocation limit"
  (pass-if "0 alloc loop"
    (call-with-allocation-limit 0 alloc-loop (lambda () #t)))
  (pass-if "1e6 alloc loop"
    (call-with-allocation-limit #e1e6 alloc-loop (lambda () #t)))
  (pass-if "0 recurse"
    (call-with-allocation-limit 0 recur-loop (lambda () #t)))
  (pass-if "1e6 recurse"
    (call-with-allocation-limit #e1e6 recur-loop (lambda () #t))))
(define-syntax-rule (pass-if-unbound foo)
  (pass-if-exception (format #f "~a unavailable" 'foo)
    exception:unbound-var (eval-in-sandbox 'foo)))
(with-test-prefix "eval-in-sandbox"
  (pass-if-equal 42
      (eval-in-sandbox 42))
  (pass-if-equal 'foo
      (eval-in-sandbox ''foo))
  (pass-if-equal '(1 . 2)
      (eval-in-sandbox '(cons 1 2)))
  (pass-if-unbound @@)
  (pass-if-unbound foo)
  (pass-if-unbound set!)
  (pass-if-unbound open-file)
  (pass-if-unbound current-input-port)
  (pass-if-unbound call-with-output-file)
  (pass-if-unbound vector-set!)
  (pass-if-equal vector-set!
      (eval-in-sandbox 'vector-set!
                       #:bindings all-pure-and-impure-bindings))
  (pass-if-exception "limit exceeded"
      '(limit-exceeded . "")
    (eval-in-sandbox '(let lp () (lp)))))