mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 11:50:28 +02:00

Doc updates for character encoding of source code files

* NEWS

* doc/ref/scheme-scripts.texi: doc updates for character encoding of
  source code

* doc/ref/api-evaluation.texi: doc updates for character encoding of
  source code
Michael Gran 2009-09-05 10:42:15 -07:00
parent 28cc8dac2f
commit 8748ffeaa7
3 changed files with 88 additions and 0 deletions

NEWS

@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
Changes in 1.9.3 (since the 1.9.2 prerelease):
** Non-ASCII source code files can be read, but require coding
declarations
The default reader now handles source code files for some of the
non-ASCII character encodings, such as UTF-8. A non-ASCII source file
should have an encoding declaration near the top of the file. Also,
there is a new function file-encoding that scans a port for a coding
declaration.
The pre-1.9.3 reader handled 8-bit clean but otherwise unspecified source
code. This use is now discouraged.
** Ports do transcoding
Ports now have an associated character encoding, and port read/write

doc/ref/api-evaluation.texi

@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
* Fly Evaluation:: Procedures for on the fly evaluation.
* Compilation:: How to compile Scheme files and procedures.
* Loading:: Loading Scheme code from file.
* Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
* Delayed Evaluation:: Postponing evaluation until it is needed.
* Local Evaluation:: Evaluation in a local environment.
* Evaluator Behaviour:: Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
More details on Guile scripting can be found in the scripting section
(@pxref{Guile Scripting}).
There is one special case where the contents of a comment can actually
affect the interpretation of code. When a character encoding
declaration, such as @code{coding: utf-8}, appears in one of the first
few lines of a source file, it indicates to Guile's default reader
that this source code file is not ASCII. For details see @ref{Character
Encoding of Source Files}.
@node Case Sensitivity
@subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load. By default, @code{%load-extensions} is bound to the
list @code{("" ".scm")}.
@end defvar
@node Character Encoding of Source Files
@subsection Character Encoding of Source Files
@cindex primitive-load
@cindex load
Scheme source code files are usually encoded in ASCII, but the
built-in reader can interpret other character encodings. The
procedure @code{primitive-load}, and by extension the functions that
call it, such as @code{load}, first scan the top 500 characters of the
file for a coding declaration.
A coding declaration has the form @code{coding: XXXXXX}, where
@code{XXXXXX} is the name of a character encoding in which the source
code file has been encoded. The coding declaration must appear in a
Scheme comment. It can be either a semicolon-initiated comment or a block
@code{#!} comment.
The name of the character encoding in the coding declaration is
typically lower case and contains only letters, numbers, and
hyphens. The most common examples of character encodings are
@code{utf-8} and @code{iso-8859-1}. This form keeps the coding
declaration compatible with Emacs.
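
As an illustration only (the file contents below are invented for this
example and are not taken from Guile), a UTF-8 encoded source file
might begin as follows. The @code{-*-} markers are the Emacs
file-variable convention; this sketch assumes that the reader only
needs the @code{coding:} text itself to appear inside the comment.

@example
;;; -*- coding: utf-8 -*-
;; The string below contains non-ASCII characters, so the
;; declaration above tells the reader how to decode them.
(display "été")
(newline)
@end example
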
For source code, only a subset of all possible character encodings can
be interpreted by the built-in source code reader. Only those
character encodings in which ASCII text appears unmodified can be
used. This includes @code{UTF-8} and @code{ISO-8859-1} through
@code{ISO-8859-15}. The multi-byte character encodings @code{UTF-16}
and @code{UTF-32} may not be used because they are not compatible with
ASCII.
@cindex read
@cindex set-port-encoding!
There might be a scenario in which one would want to read non-ASCII
code from a port, such as with the function @code{read}, instead of
with @code{load}. If the port's character encoding is the same as the
encoding of the code to be read by the port, no other special
handling is necessary. The port will automatically do the character
encoding conversion. The functions @code{setlocale} and
@code{set-port-encoding!} are used to set port encodings.
If a port is used to read code of unknown character encoding, this can
be accomplished in three steps. First, the character encoding of the
port should be set to ISO-8859-1 using @code{set-port-encoding!}.
Then, the procedure @code{file-encoding}, described below, is used to
scan for a coding declaration when reading from the port. As a side
effect, it rewinds the port after its scan is complete. After that,
the port's character encoding should be set to the encoding returned
by @code{file-encoding}, if any, again by using
@code{set-port-encoding!}. Then the code can be read as normal.
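
The three steps above might be sketched as follows. This is a minimal
sketch under stated assumptions, not a definitive implementation:
@code{read-all-forms} is a hypothetical helper name introduced for
this example, and the file is assumed to be an ordinary file whose
port can be rewound.

@example
;; Hypothetical helper: read every form from FILENAME, honoring
;; a coding declaration if the file has one.
(define (read-all-forms filename)
  (let ((port (open-input-file filename)))
    ;; Step 1: start in ISO-8859-1 so the scan can read any bytes.
    (set-port-encoding! port "ISO-8859-1")
    ;; Step 2: scan for a declaration; the port is rewound afterwards.
    (let ((enc (file-encoding port)))
      ;; Step 3: switch to the declared encoding, if any was found.
      (if enc
          (set-port-encoding! port enc)))
    ;; Read forms until end of file, then close the port.
    (let loop ((form (read port)) (forms '()))
      (if (eof-object? form)
          (begin
            (close-port port)
            (reverse forms))
          (loop (read port) (cons form forms))))))
@end example

If no declaration is found, the port simply stays in ISO-8859-1, which
matches the fallback described above.
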
@deffn {Scheme Procedure} file-encoding port
@deffnx {C Function} scm_file_encoding port
Scans the port for an Emacs-like character coding declaration near the
top of the contents of a port with randomly accessible contents. The
coding declaration is of the form @code{coding: XXXXX} and must appear
in a Scheme comment.
Returns a string containing the character encoding of the file
if a declaration was found, or @code{#f} otherwise. The port is
rewound.
@end deffn
@node Delayed Evaluation
@subsection Delayed Evaluation

doc/ref/scheme-scripts.texi

@@ -63,6 +63,12 @@ The second line of the script should contain only the characters
operating system never reads this far, but Guile treats this as the end
of the comment begun on the first line by the @samp{#!} characters.
@item
If this source code file is not ASCII or ISO-8859-1 encoded, a coding
declaration such as @code{coding: utf-8} should appear in a comment
somewhere in the first five lines of the file: see @ref{Character
Encoding of Source Files}.
@item
The rest of the file should be a Scheme program.