Doc updates for character encoding of source code files

* NEWS
* doc/ref/scheme-scripts.texi: doc updates for character encoding of source code
* doc/ref/api-evaluation.texi: doc updates for character encoding of source code
2025-07-04 08:40:21 +02:00 · 2009-09-05 10:42:15 -07:00 · 2009-09-05 10:42:15 -07:00 · 8748ffeaa7
commit 8748ffeaa7
parent 28cc8dac2f
3 changed files with 88 additions and 0 deletions
--- a/NEWS
+++ b/NEWS
@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)
 Changes in 1.9.3 (since the 1.9.2 prerelease):
 ** Non-ASCII source code files can be read, but require coding
   declarations
 The default reader now handles source code files for some of the
 non-ASCII character encodings, such as UTF-8.  A non-ASCII source file
 should have an encoding declaration near the top of the file.  Also,
 there is a new function file-encoding that scans a port for a coding
 declaration.
 The pre-1.9.3 reader handled 8-bit clean but otherwise unspecified source
 code.  This use is now discouraged.
 ** Ports do transcoding
 Ports now have an associated character encoding, and port read/write
--- a/doc/ref/api-evaluation.texi
+++ b/doc/ref/api-evaluation.texi
@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
 * Fly Evaluation::              Procedures for on the fly evaluation.
 * Compilation::                 How to compile Scheme files and procedures.
 * Loading::                     Loading Scheme code from file.
 * Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
 * Delayed Evaluation::          Postponing evaluation until it is needed.
 * Local Evaluation::            Evaluation in a local environment.
 * Evaluator Behaviour::         Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
 More details on Guile scripting can be found in the scripting section
 (@pxref{Guile Scripting}).
 There is one special case where the contents of a comment can actually
 affect the interpretation of code.  When a character encoding
 declaration, such as @code{coding: utf-8}, appears in one of the first
 few lines of a source file, it indicates to Guile's default reader
 that this source code file is not ASCII.  For details see @ref{Character
 Encoding of Source Files}.
@node Case Sensitivity
@subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load.  By default, @code{%load-extensions} is bound to the
 list @code{("" ".scm")}.
@end defvar
@node Character Encoding of Source Files
@subsection Character Encoding of Source Files
@cindex primitive-load
@cindex load
 Scheme source code files are usually encoded in ASCII, but the
 built-in reader can interpret other character encodings.  The
 procedure @code{primitive-load}, and by extension the functions that
 call it, such as @code{load}, first scans the top 500 characters of the
 file for a coding declaration.
 A coding declaration has the form @code{coding: XXXXXX}, where
@code{XXXXXX} is the name of a character encoding in which the source
 code file has been encoded.  The coding declaration must appear in a
 Scheme comment.  It can be either a semicolon-initiated comment or a block
@code{#!} comment.
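 As an illustration, a UTF-8 encoded file might begin as in the
 following sketch, which puts the declaration in a semicolon comment on
 the first line.  The surrounding @code{-*-} markers are the Emacs
 file-variable convention and are an assumption here, since only the
@code{coding: XXXXXX} part is required by the description above.  The
 definition of @code{greeting} is hypothetical and serves only to show
 non-ASCII text in the source.
@example
 ;;; -*- coding: utf-8 -*-
 ;; Hypothetical definition whose string literal contains
 ;; non-ASCII characters.
 (define greeting "¡Hola, señor!")
@end example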
 The name of the character encoding in the coding declaration is
 typically lower case and contains only letters, numbers, and
 hyphens, which keeps the coding declaration compatible with Emacs.
 The most common examples of character encodings are
@code{utf-8} and @code{iso-8859-1}.
 For source code, only a subset of all possible character encodings can
 be interpreted by the built-in source code reader.  Only those
 character encodings in which ASCII text appears unmodified can be
 used.  This includes @code{UTF-8} and @code{ISO-8859-1} through
@code{ISO-8859-15}.  The multi-byte character encodings @code{UTF-16}
 and @code{UTF-32} may not be used because they are not compatible with
 ASCII.
@cindex read
@cindex set-port-encoding!
 There might be a scenario in which one would want to read non-ASCII
 code from a port, such as with the function @code{read}, instead of
 with @code{load}.  If the port's character encoding is the same as the
 encoding of the code to be read from the port, no other special
 handling is necessary.  The port will automatically do the character
 encoding conversion.  Port encodings are set with @code{setlocale} or
@code{set-port-encoding!}.
 Code of unknown character encoding can be read from a port in three
 steps.  First, the character encoding of the port should be set to
 ISO-8859-1 using @code{set-port-encoding!}.  Then, the procedure
@code{file-encoding}, described below, is used to scan the port for a
 coding declaration.  As a side effect, it rewinds the port after its
 scan is complete.  After that, the port's character encoding should be
 set to the encoding returned by @code{file-encoding}, if any, again by
 using @code{set-port-encoding!}.  The code can then be read as normal.
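 The three steps can be sketched as follows.  This is a minimal sketch
 under stated assumptions, not a definitive implementation: the helper
 name @code{read-with-declared-encoding} is hypothetical and not part of
 Guile, and only the first datum in the file is read.
@example
 ;; Hypothetical helper: read the first datum from FILENAME,
 ;; honoring a coding declaration if the file contains one.
 (define (read-with-declared-encoding filename)
   (let ((port (open-input-file filename)))
     ;; Step 1: scan the file as ISO-8859-1.
     (set-port-encoding! port "ISO-8859-1")
     ;; Step 2: look for a coding declaration.  file-encoding
     ;; rewinds the port when its scan is complete.
     (let ((encoding (file-encoding port)))
       ;; Step 3: switch encodings only if a declaration was found.
       (if encoding
           (set-port-encoding! port encoding))
       ;; The code can now be read as normal.
       (let ((datum (read port)))
         (close-port port)
         datum))))
@end example
 When no declaration is found, @code{file-encoding} returns @code{#f}
 and the sketch leaves the port set to ISO-8859-1.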
@deffn {Scheme Procedure} file-encoding port
@deffnx {C Function} scm_file_encoding port
 Scans for an Emacs-like character coding declaration near the top of
 the contents of a port with random-accessible contents.  The
 coding declaration is of the form @code{coding: XXXXX} and must appear
 in a Scheme comment.
 Returns a string containing the character encoding of the file
 if a declaration was found, or @code{#f} otherwise.  The port is
 rewound.
@end deffn
@node Delayed Evaluation
@subsection Delayed Evaluation
--- a/doc/ref/scheme-scripts.texi
+++ b/doc/ref/scheme-scripts.texi
@@ -63,6 +63,12 @@ The second line of the script should contain only the characters
 operating system never reads this far, but Guile treats this as the end
 of the comment begun on the first line by the @samp{#!} characters.
@item
 If this source code file is not ASCII or ISO-8859-1 encoded, a coding
 declaration such as @code{coding: utf-8} should appear in a comment
 somewhere in the first five lines of the file: see @ref{Character
 Encoding of Source Files}.
@item
 The rest of the file should be a Scheme program.