guile/doc/ref/misc-modules.texi

@page
@node Pretty Printing
@chapter Pretty Printing

@c FIXME::martin: Review me!

@cindex pretty printing
The module @code{(ice-9 pretty-print)} provides the procedure
@code{pretty-print}, which provides nicely formatted output of Scheme
objects.  This is especially useful for deeply nested or complex data
structures, such as lists and vectors.

The module is loaded by simply saying.

@lisp
(use-modules (ice-9 pretty-print))
@end lisp

This makes the procedure @code{pretty-print} available.  As an example
how @code{pretty-print} will format the output, see the following:

@lisp
(pretty-print '(define (foo) (lambda (x)
(cond ((zero? x) #t) ((negative? x) -x) (else
(if (= x 1) 2 (* x x x)))))))
@print{}
(define (foo)
  (lambda (x)
    (cond ((zero? x) #t)
          ((negative? x) -x)
          (else (if (= x 1) 2 (* x x x))))))
@end lisp

@deffn {Scheme Procedure} pretty-print obj [port] [keyword-options]
Print the textual representation of the Scheme object @var{obj} to
@var{port}.  @var{port} defaults to the current output port, if not
given.

The further @var{keyword-options} are keywords and parameters as
follows,

@table @asis
@item @nicode{#:display?} @var{flag}
If @var{flag} is true then print using @code{display}.  The default is
@code{#f} which means use @code{write} style.  (@pxref{Writing})

@item @nicode{#:per-line-prefix} @var{string}
Print the given @var{string} as a prefix on each line.  The default is
no prefix.

@item @nicode{#:width} @var{columns}
Print within the given @var{columns}.  The default is 79.
@end table
@end deffn

Beware: Since @code{pretty-print} uses it's own write procedure, it's
output will not be the same as for example the output of @code{write}.
Consider the following example.

@lisp
(write (lambda (x) x))
@print{}
#<procedure #f (x)>

(pretty-print (lambda (x) x))
@print{}
#[procedure]
@end lisp

The reason is that @code{pretty-print} does not know as much about
Guile's object types as the builtin procedures.  This is particularly
important for smobs, for which a write procedure can be defined and be
used by @code{write}, but not by @code{pretty-print}.


@page
@node Formatted Output
@chapter Formatted Output

@c FIXME::martin: Review me!

@cindex format
@cindex formatted output
Outputting messages or other texts which are composed of literal
strings, variable contents, newlines and other formatting can be
cumbersome, when only the standard procedures like @code{display},
@code{write} and @code{newline} are available.  Additionally, one
often wants to collect the output in strings.  With the standard
routines, the user is required to set up a string port, add this port
as a parameter to the output procedure calls and then retrieve the
resulting string from the string port.

The @code{format} procedure, to be found in module @code{(ice-9
format)}, can do all this, and even more.  If you are a C programmer,
you can think of this procedure as Guile's @code{fprintf}.

@deffn {Scheme Procedure} format destination format-string args @dots{}
The first parameter is the @var{destination}, it determines where the
output of @code{format} will go.

@table @asis
@item @code{#t}
Send the formatted output to the current output port and return
@code{#t}.

@item @code{#f}
Return the formatted output as a string.

@item Any number value
Send the formatted output to the current error port and return
@code{#t}.

@item A valid output port
Send the formatted output to the port @var{destination} and return
@code{#t}.
@end table

The second parameter is the format string.  It has a similar function
to the format string in calls to @code{printf} or @code{fprintf} in C.
It is output to the specified destination, but all escape sequences
are replaced by the results of formatting the corresponding sequence.

Note that escape sequences are marked with the character @code{~}
(tilde), and not with a @code{%} (percent sign), as in C.

The escape sequences in the following table are supported.  When there
appears ``corresponding @var{arg}', that means any of the additional
arguments, after dropping all arguments which have been used up by
escape sequences which have been processed earlier.  Some of the
format characters (the characters following the tilde) can be prefixed
by @code{:}, @code{@@}, or @code{:@@}, to modify the behaviour of the
format character.  How the modified behaviour differs from the default
behaviour is described for every character in the table where
appropriate.

@table @code
@item ~~
Output a single @code{~} (tilde) character.

@item ~%
Output a newline character, thus advancing to the next output line.

@item ~&
Start a new line, that is, output a newline character if not already
at the start of a line.

@item ~_
Output a single space character.

@item ~/
Output a single tabulator character.

@item ~|
Output a page separator (formfeed) character.

@item ~t
Advance to the next tabulator position.

@item ~y
Pretty-print the corresponding @var{arg}.

@item ~a
Output the corresponding @var{arg} like @code{display}.

@item ~s
Output the corresponding @var{arg} like @code{write}.

@item ~d
Output the corresponding @var{arg} as a decimal number.

@item ~x
Output the corresponding @var{arg} as a hexadecimal number.

@item ~o
Output the corresponding @var{arg} as an octal number.

@item ~b
Output the corresponding @var{arg} as a binary number.

@item ~r
Output the corresponding @var{arg} as a number word, e.g. 10 prints as
@code{ten}.  If prefixed with @code{:}, @code{tenth} is printed, if
prefixed with @code{:@@}, Roman numbers are printed.

@item ~f
Output the corresponding @var{arg} as a fixed format floating point
number, such as @code{1.34}.

@item ~e
Output the corresponding @var{arg} in exponential notation, such as
@code{1.34E+0}.

@item ~g
This works either like @code{~f} or like @code{~e}, whichever produces
less characters to be written.

@item ~$
Like @code{~f}, but only with two digits after the decimal point.

@item ~i
Output the corresponding @var{arg} as a complex number.

@item ~c
Output the corresponding @var{arg} as a character.  If prefixed with
@code{@@}, it is printed like with @code{write}.  If prefixed with
@code{:}, control characters are treated specially, for example
@code{#\newline} will be printed as @code{^J}.

@item ~p
``Plural''.  If the corresponding @var{arg} is 1, nothing is printed
(or @code{y} if prefixed with @code{@@} or @code{:@@}), otherwise
@code{s} is printed (or @code{ies} if prefixed with @code{@@} or
@code{:@@}).

@item ~?, ~k
Take the corresponding argument as a format string, and the following
argument as a list of values.  Then format the values with respect to
the format string.

@item ~!
Flush the output to the output port.

@item ~#\newline (tilde-newline)
@c FIXME::martin: I don't understand this from the source.
Continuation lines.

@item ~*
Argument jumping. Navigate in the argument list as specified by the
corresponding argument.  If prefixed with @code{:}, jump backwards in
the argument list, if prefixed by @code{:@@}, jump to the parameter
with the absolute index, otherwise jump forward in the argument list.

@item ~(
Case conversion begin.  If prefixed by @code{:}, the following output
string will be capitalized, if prefixed by @code{@@}, the first
character will be capitalized, if prefixed by @code{:@@} it will be
upcased and otherwise it will be downcased.  Conversion stops when the
``Case conversion end'' @code{~)}sequence is encountered.

@item ~)
Case conversion end.  Stop any case conversion currently in effect.

@item ~[
@c FIXME::martin: I don't understand this from the source.
Conditional begin.

@item ~;
@c FIXME::martin: I don't understand this from the source.
Conditional separator.

@item ~]
@c FIXME::martin: I don't understand this from the source.
Conditional end.

@item ~@{
@c FIXME::martin: I don't understand this from the source.
Iteration begin.

@item ~@}
@c FIXME::martin: I don't understand this from the source.
Iteration end.

@item ~^
@c FIXME::martin: I don't understand this from the source.
Up and out.

@item ~'
@c FIXME::martin: I don't understand this from the source.
Character parameter.

@item ~0 @dots{} ~9, ~-, ~+
@c FIXME::martin: I don't understand this from the source.
Numeric parameter.

@item ~v
@c FIXME::martin: I don't understand this from the source.
Variable parameter from next argument.

@item ~#
Parameter is number of remaining args.  The number of the remaining
arguments is prepended to the list of unprocessed arguments.

@item ~,
@c FIXME::martin: I don't understand this from the source.
Parameter separators.

@item ~q
Inquiry message.  Insert a copyright message into the output.
@end table

If any type conversions should fail (for example when using an escape
sequence for number output, but the argument is a string), an error
will be signalled.
@end deffn

You may have noticed that Guile contains a @code{format} procedure
even when the module @code{(ice-9 format)} is not loaded.  The default
@code{format} procedure does not support all escape sequences
documented in this chapter, and will signal an error if you try to use
one of them.  The reason for providing two versions of @code{format}
is that the full-featured module is fairly large and requires some
time to get loaded.  So the Guile maintainers decided not to load the
large version of @code{format} by default, so that the start-up time
of the interpreter is not unnecessarily increased.


@page
@node Rx Regexps
@chapter The Rx Regular Expression Library

[FIXME: this is taken from Gary and Mark's quick summaries and should be
reviewed and expanded.  Rx is pretty stable, so could already be done!]

@cindex rx
@cindex finite automaton

The @file{guile-lang-allover} package provides an interface to Tom
Lord's Rx library (currently only to POSIX regular expressions).  Use of
the library requires a two step process: compile a regular expression
into an efficient structure, then use the structure in any number of
string comparisons.

For example, given the regular expression @samp{abc.} (which matches any
string containing @samp{abc} followed by any single character):

@smalllisp
guile> @kbd{(define r (regcomp "abc."))}
guile> @kbd{r}
#<rgx abc.>
guile> @kbd{(regexec r "abc")}
#f
guile> @kbd{(regexec r "abcd")}
#((0 . 4))
guile>
@end smalllisp

The definitions of @code{regcomp} and @code{regexec} are as follows:

@deffn {Scheme Procedure} regcomp pattern [flags]
Compile the regular expression pattern using POSIX rules.  Flags is
optional and should be specified using symbolic names:
@defvar REG_EXTENDED
use extended POSIX syntax
@end defvar
@defvar REG_ICASE
use case-insensitive matching
@end defvar
@defvar REG_NEWLINE
allow anchors to match after newline characters in the
string and prevents @code{.} or @code{[^...]} from matching newlines.
@end defvar

The @code{logior} procedure can be used to combine multiple flags.
The default is to use
POSIX basic syntax, which makes @code{+} and @code{?}  literals and @code{\+}
and @code{\?}
operators.  Backslashes in @var{pattern} must be escaped if specified in a
literal string e.g., @code{"\\(a\\)\\?"}.
@end deffn

@deffn {Scheme Procedure} regexec regex string [match-pick] [flags]
Match @var{string} against the compiled POSIX regular expression
@var{regex}.
@var{match-pick} and @var{flags} are optional.  Possible flags (which can be
combined using the logior procedure) are:

@defvar REG_NOTBOL
The beginning of line operator won't match the beginning of
@var{string} (presumably because it's not the beginning of a line)
@end defvar

@defvar REG_NOTEOL
Similar to REG_NOTBOL, but prevents the end of line operator
from matching the end of @var{string}.
@end defvar

If no match is possible, regexec returns #f.  Otherwise @var{match-pick}
determines the return value:

@code{#t} or unspecified: a newly-allocated vector is returned,
containing pairs with the indices of the matched part of @var{string} and any
substrings.

@code{""}: a list is returned: the first element contains a nested list
with the matched part of @var{string} surrounded by the the unmatched parts.
Remaining elements are matched substrings (if any).  All returned
substrings share memory with @var{string}.

@code{#f}: regexec returns #t if a match is made, otherwise #f.

vector: the supplied vector is returned, with the first element replaced
by a pair containing the indices of the matched portion of @var{string} and
further elements replaced by pairs containing the indices of matched
substrings (if any).

list: a list will be returned, with each member of the list
specified by a code in the corresponding position of the supplied list:

a number: the numbered matching substring (0 for the entire match).

@code{#\<}: the beginning of @var{string} to the beginning of the part matched
by regex.

@code{#\>}: the end of the matched part of @var{string} to the end of
@var{string}.

@code{#\c}: the "final tag", which seems to be associated with the "cut
operator", which doesn't seem to be available through the posix
interface.

e.g., @code{(list #\< 0 1 #\>)}.  The returned substrings share memory with
@var{string}.
@end deffn

Here are some other procedures that might be used when using regular
expressions:

@deffn {Scheme Procedure} compiled-regexp? obj
Test whether obj is a compiled regular expression.
@end deffn

@deffn {Scheme Procedure} regexp->dfa regex [flags]
@end deffn

@deffn {Scheme Procedure} dfa-fork dfa
@end deffn

@deffn {Scheme Procedure} reset-dfa! dfa
@end deffn

@deffn {Scheme Procedure} dfa-final-tag dfa
@end deffn

@deffn {Scheme Procedure} dfa-continuable? dfa
@end deffn

@deffn {Scheme Procedure} advance-dfa! dfa string
@end deffn


@node File Tree Walk
@chapter File Tree Walk
@cindex file tree walk

The functions in this section traverse a tree of files and
directories, in a fashion similar to the C @code{ftw} and @code{nftw}
routines (@pxref{Working with Directory Trees,,, libc, GNU C Library
Reference Manual}).

@example
(use-modules (ice-9 ftw))
@end example
@sp 1

@defun ftw startname proc ['hash-size n]
Walk the filesystem tree descending from @var{startname}, calling
@var{proc} for each file and directory.

Hard links and symbolic links are followed.  A file or directory is
reported to @var{proc} only once, and skipped if seen again in another
place.  One consequence of this is that @code{ftw} is safe against
circularly linked directory structures.

Each @var{proc} call is @code{(@var{proc} filename statinfo flag)} and
it should return @code{#t} to continue, or any other value to stop.

@var{filename} is the item visited, being @var{startname} plus a
further path and the name of the item.  @var{statinfo} is the return
from @code{stat} (@pxref{File System}) on @var{filename}.  @var{flag}
is one of the following symbols,

@table @code
@item regular
@var{filename} is a file, this includes special files like devices,
named pipes, etc.

@item directory
@var{filename} is a directory.

@item invalid-stat
An error occurred when calling @code{stat}, so nothing is known.
@var{statinfo} is @code{#f} in this case.

@item directory-not-readable
@var{filename} is a directory, but one which cannot be read and hence
won't be recursed into.

@item symlink
@var{filename} is a dangling symbolic link.  Symbolic links are
normally followed and their target reported, the link itself is
reported if the target does not exist.
@end table

The return value from @code{ftw} is @code{#t} if it ran to completion,
or otherwise the non-@code{#t} value from @var{proc} which caused the
stop.

Optional argument symbol @code{hash-size} and an integer can be given
to set the size of the hash table used to track items already visited.
(@pxref{Hash Table Reference})

@c  Actually, it's probably safe to escape from ftw, just need to
@c  check it.
@c
In the current implementation, returning non-@code{#t} from @var{proc}
is the only valid way to terminate @code{ftw}.  @var{proc} must not
use @code{throw} or similar to escape.
@end defun


@defun nftw startname proc ['chdir] ['depth] ['hash-size n] ['mount] ['physical]
Walk the filesystem tree starting at @var{startname}, calling
@var{proc} for each file and directory.  @code{nftw} has extra
features over the basic @code{ftw} described above.

Hard links and symbolic links are followed, but a file or directory is
reported to @var{proc} only once, and skipped if seen again in another
place.  One consequence of this is that @code{nftw} is safe against
circular linked directory structures.

Each @var{proc} call is @code{(@var{proc} filename statinfo flag
basename level)} and it should return @code{#t} to continue, or any
other value to stop.

@var{filename} is the item visited, being @var{startname} plus a
further path and the name of the item.  @var{statinfo} is the return
from @code{stat} on @var{filename} (@pxref{File System}).
@var{basename} it the item name without any path.  @var{level} is an
integer giving the directory nesting level, starting from 0 for the
contents of @var{startname} (or that item itself if it's a file).
@var{flag} is one of the following symbols,

@table @code
@item regular
@var{filename} is a file, this includes special files like devices,
named pipes, etc.

@item directory
@var{filename} is a directory.

@item directory-processed
@var{filename} is a directory, and its contents have all been visited.
This flag is given instead of @code{directory} when the @code{depth}
option below is used.

@item invalid-stat
An error occurred when applying @code{stat} to @var{filename}, so
nothing is known about it.  @var{statinfo} is @code{#f} in this case.

@item directory-not-readable
@var{filename} is a directory, but one which cannot be read and hence
won't be recursed into.

@item symlink
@var{filename} is a dangling symbolic link.  Symbolic links are
normally followed and their target reported, the link itself is
reported if the target does not exist.

Under the @code{physical} option described below, @code{symlink} is
instead given for symbolic links whose target does exist.

@item stale-symlink
Under the @code{physical} option described below, this indicates
@var{filename} is a dangling symbolic link, meaning its target does
not exist.  Without the @code{physical} option plain @code{symlink}
indicates this.
@end table

The following optional arguments can be given to modify the way
@code{nftw} works.  Each is passed as a symbol (and @code{hash-size}
takes a following integer value).

@table @asis
@item @code{chdir}
Change to the directory containing the item before calling @var{proc}.
When @code{nftw} returns the original current directory is restored.

Under this option, generally the @var{basename} parameter should be
used to access the item in each @var{proc} call.  The @var{filename}
parameter still has a path as normal and this will only be valid if
the @var{startname} directory was absolute.

@item @code{depth}
Visit files ``depth first'', meaning @var{proc} is called for the
contents of each directory before it's called for the directory
itself.  Normally a directory is reported first, then its contents.

Under this option, the @var{flag} to @var{proc} for a directory is
@code{directory-processed} instead of @code{directory}.

@item @code{hash-size @var{n}}
Set the size of the hash table used to track items already visited.
(@pxref{Hash Table Reference})

@item @code{mount}
Don't cross a mount point, meaning only visit items on the same
filesystem as @var{startname}.  (Ie.@: the same @code{stat:dev}.)

@item @code{physical}
Don't follow symbolic links, instead report them to @var{proc} as
@code{symlink}, and report dangling links as @code{stale-symlink}.
@end table

The return value from @code{nftw} is @code{#t} if it ran to
completion, or otherwise the non-@code{#t} value from @var{proc} which
caused the stop.

@c  For reference, one reason not to esacpe is that the current
@c  directory is not saved and restored with dynamic-wind.  Maybe
@c  changing that would be enough to allow escaping.
@c
In the current implementation, returning non-@code{#t} from @var{proc}
is the only valid way to terminate @code{ftw}.  @var{proc} must not
use @code{throw} or similar to escape.
@end defun


@c Local Variables:
@c TeX-master: "guile.texi"
@c End: