mirror of https://git.savannah.gnu.org/git/guile.git
synced 2025-04-30 11:50:28 +02:00
Doc updates for character encoding of source code files
* NEWS
* doc/ref/scheme-scripts.texi: doc updates for character encoding of source code
* doc/ref/api-evaluation.texi: doc updates for character encoding of source code
parent 28cc8dac2f
commit 8748ffeaa7
3 changed files with 88 additions and 0 deletions
12
NEWS
@@ -10,6 +10,18 @@ prerelease, and a full NEWS corresponding to 1.8 -> 2.0.)

 Changes in 1.9.3 (since the 1.9.2 prerelease):

+** Non-ASCII source code files can be read, but require coding
+declarations
+
+The default reader now handles source code files for some of the
+non-ASCII character encodings, such as UTF-8. A non-ASCII source file
+should have an encoding declaration near the top of the file. Also,
+there is a new function file-encoding that scans a port for a coding
+declaration.
+
+The pre-1.9.3 reader handled 8-bit clean but otherwise unspecified source
+code. This use is now discouraged.
+
 ** Ports do transcoding

 Ports now have an associated character encoding, and port read/write
70
doc/ref/api-evaluation.texi
@@ -17,6 +17,7 @@ loading, evaluating, and compiling Scheme code at run time.
 * Fly Evaluation:: Procedures for on the fly evaluation.
 * Compilation:: How to compile Scheme files and procedures.
 * Loading:: Loading Scheme code from file.
+* Character Encoding of Source Files:: Loading non-ASCII Scheme code from file.
 * Delayed Evaluation:: Postponing evaluation until it is needed.
 * Local Evaluation:: Evaluation in a local environment.
 * Evaluator Behaviour:: Modifying Guile's evaluator.
@@ -229,6 +230,12 @@ Thus a Guile script often starts like this.
 More details on Guile scripting can be found in the scripting section
 (@pxref{Guile Scripting}).

+There is one special case where the contents of a comment can actually
+affect the interpretation of code. When a character encoding
+declaration, such as @code{coding: utf-8}, appears in one of the first
+few lines of a source file, it indicates to Guile's default reader
+that this source code file is not ASCII. For details see @ref{Character
+Encoding of Source Files}.

 @node Case Sensitivity
 @subsubsection Case Sensitivity
@@ -590,6 +597,69 @@ a file to load. By default, @code{%load-extensions} is bound to the
 list @code{("" ".scm")}.
 @end defvar

+@node Character Encoding of Source Files
+@subsection Character Encoding of Source Files
+
+@cindex primitive-load
+@cindex load
+Scheme source code files are usually encoded in ASCII, but the
+built-in reader can interpret other character encodings. The
+procedure @code{primitive-load}, and by extension the functions that
+call it, such as @code{load}, first scan the top 500 characters of the
+file for a coding declaration.
+
+A coding declaration has the form @code{coding: XXXXXX}, where
+@code{XXXXXX} is the name of a character encoding in which the source
+code file has been encoded. The coding declaration must appear in a
+Scheme comment. It can either be a semicolon-initiated comment or a block
+@code{#!} comment.
+
+The name of the character encoding in the coding declaration is
+typically lower case and contains only letters, numbers, and
+hyphens. The most common examples of character encodings are
+@code{utf-8} and @code{iso-8859-1}. This allows the coding
+declaration to be compatible with EMACS.
+
+For source code, only a subset of all possible character encodings can
+be interpreted by the built-in source code reader. Only those
+character encodings in which ASCII text appears unmodified can be
+used. This includes @code{UTF-8} and @code{ISO-8859-1} through
+@code{ISO-8859-15}. The multi-byte character encodings @code{UTF-16}
+and @code{UTF-32} may not be used because they are not compatible with
+ASCII.
+
+@cindex read
+@cindex set-port-encoding!
+There might be a scenario in which one would want to read non-ASCII
+code from a port, such as with the function @code{read}, instead of
+with @code{load}. If the port's character encoding is the same as the
+encoding of the code to be read by the port, no other special
+handling is necessary. The port will automatically do the character
+encoding conversion. The functions @code{setlocale} or
+@code{set-port-encoding!} are used to set port encodings.
+
+If a port is used to read code of unknown character encoding, it can
+accomplish this in three steps. First, the character encoding of the
+port should be set to ISO-8859-1 using @code{set-port-encoding!}.
+Then, the procedure @code{file-encoding}, described below, is used to
+scan for a coding declaration when reading from the port. As a side
+effect, it rewinds the port after its scan is complete. After that,
+the port's character encoding should be set to the encoding returned
+by @code{file-encoding}, if any, again by using
+@code{set-port-encoding!}. Then the code can be read as normal.
+
+@deffn {Scheme Procedure} file-encoding port
+@deffnx {C Function} scm_file_encoding port
+Scans the port for an EMACS-like character coding declaration near the
+top of the contents of a port with random-accessible contents. The
+coding declaration is of the form @code{coding: XXXXX} and must appear
+in a Scheme comment.
+
+Returns a string containing the character encoding of the file
+if a declaration was found, or @code{#f} otherwise. The port is
+rewound.
+@end deffn
+

 @node Delayed Evaluation
 @subsection Delayed Evaluation
6
doc/ref/scheme-scripts.texi
@@ -63,6 +63,12 @@ The second line of the script should contain only the characters
 operating system never reads this far, but Guile treats this as the end
 of the comment begun on the first line by the @samp{#!} characters.

+@item
+If this source code file is not ASCII or ISO-8859-1 encoded, a coding
+declaration such as @code{coding: utf-8} should appear in a comment
+somewhere in the first five lines of the file: see @ref{Character
+Encoding of Source Files}.
+
 @item
 The rest of the file should be a Scheme program.
