mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-05-02 04:40:29 +02:00

32 commits

Author SHA1 Message Date
Michael Gran
889975e51a Add full Unicode capability to ports and the default reader
Ports are given two additional properties: a character encoding and
a conversion failure strategy.  These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.

If unspecified, ports use a default value. The default value of these
properties is held in a fluid.  The default character encoding can be
modified by calling setlocale.

ISO-8859-1 is treated specially.  Since it is a native encoding of
strings, it can be processed more quickly.  Source code is assumed to be
ISO-8859-1 unless otherwise specified.  The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
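As a rough illustration of the 'coding: XXXXX' convention described above,
the following sketch looks for the marker in a buffer holding the top of a
file and copies out the encoding name.  It is not the scm_scan_for_encoding
added by this commit: the helper name find_coding_declaration, its
signature, and the set of characters accepted in an encoding name are all
assumptions made for the example, and unlike a real scanner it does not
check that the marker sits inside a comment.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of libguile): search the first LEN bytes
   of BUF for "coding:" and copy the name that follows into OUT, which has
   room for OUTSZ bytes including the terminating NUL.  Returns 1 if a
   non-empty name was found, 0 otherwise.  */
static int
find_coding_declaration (const char *buf, size_t len,
                         char *out, size_t outsz)
{
  static const char key[] = "coding:";
  const size_t keylen = sizeof key - 1;
  size_t i, n = 0;

  if (outsz == 0)
    return 0;

  /* Locate the marker.  */
  for (i = 0; i + keylen <= len; i++)
    if (memcmp (buf + i, key, keylen) == 0)
      break;
  if (i + keylen > len)
    return 0;
  i += keylen;

  /* Skip blanks between the colon and the name.  */
  while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
    i++;

  /* Assumption: names consist of letters, digits, '-', '_' and '.'.  */
  while (i < len && n + 1 < outsz
         && (isalnum ((unsigned char) buf[i])
             || buf[i] == '-' || buf[i] == '_' || buf[i] == '.'))
    out[n++] = buf[i++];
  out[n] = '\0';

  return n > 0;
}
```

A caller would typically read only the first few hundred bytes of the file,
pass them to the helper, and fall back to ISO-8859-1 when it returns 0.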

The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.

* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding

* module/system/base/compile.scm (compile-file): use source-code
  file's self-declared encoding when compiling files

* libguile/strports.c: store string ports in locale encoding
  (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
  (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
  new functions

* libguile/strings.h: new declaration for scm_i_string_contains_char

* libguile/strings.c (scm_i_string_contains_char): new function
  (scm_from_stringn, scm_to_stringn):  use NULL for Latin-1
  (scm_from_locale_stringn, scm_to_locale_stringn): respect character
  encoding of input and output ports

* libguile/read.h: declaration for scm_scan_for_encoding

* libguile/read.c:
  (read_token): now takes scheme string instead of C string/length
  (read_complete_token): new function
  (scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
  (scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
  (scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
  (scm_read_scsh_block_comment, scm_read_commented_expression)
  (scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
  (scm_read_expression): use scm_t_wchar for char type, use read_complete_token
  (scm_scan_for_encoding): new function to find a file's character encoding
  (scm_file_encoding): new function to find a port's character encoding

* libguile/rdelim.c: don't unpack strings

* libguile/print.h: declaration for modified function
  scm_i_charprint

* libguile/print.c: use locale when printing characters and
  strings
  (scm_i_charprint): input parameter is now scm_t_wchar
  (scm_simple_format): don't unpack strings

* libguile/posix.h: new declaration for scm_setbinary.

* libguile/posix.c (scm_setlocale): set default and stdio port
  encodings based on the locale's character encoding
  (scm_setbinary): new function

* libguile/ports.h (scm_t_port): add encoding and failed
  conversion handler to port type.  Declarations for new or modified
  functions scm_getc, scm_unget_byte, scm_ungetc,
  scm_i_get_port_encoding, scm_i_set_port_encoding_x,
  scm_port_encoding, scm_set_port_encoding_x,
  scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
  scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.

* libguile/ports.c: assign the current ports to zero on startup so
  we can see if they've been set.
  (scm_current_input_port, scm_current_output_port,
  scm_current_error_port): return #f if the port is not yet
  initialized
  (scm_new_port_table_entry): set up a new port's encoding and
  illegal sequence handler based on the thread's current defaults
  (scm_i_remove_port): free port encoding name when port is removed
  (scm_i_mode_bits_n): now takes a Scheme string instead of a C
  string and length.  All callers changed.
  (SCM_MBCHAR_BUF_SIZE): new const
  (scm_getc): new function, since the scm_getc in inline.h is now
  scm_get_byte_or_eof.  This pulls one codepoint from a port.
  (scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
  (scm_unget_byte): new function, incorporating the low-level functionality
  of scm_ungetc
  (scm_ungetc): uses scm_unget_byte

* libguile/numbers.h (scm_t_wchar): compilation order problem with
  scm_t_wchar being used in functions in multiple headers.  Forward
  declare scm_t_wchar.

* libguile/load.c (scm_primitive_load): scan for file encoding at
  top of file and use it to set the load port's encoding

* libguile/inline.h (scm_get_byte_or_eof): new function
  incorporating most of the functionality of scm_getc.

* libguile/fports.c (fport_fill_input): now returns scm_t_wchar

* libguile/chars.h (scm_t_wchar): avoid compilation order problem
  with declaration of scm_t_wchar
2009-08-25 07:54:37 -07:00
Michael Gran
8ef6962953 Avoid compilation warnings in SCM_MAKE_CHAR
* libguile/chars.h (SCM_MAKE_CHAR): change inequality
2009-08-18 21:13:38 -07:00
Michael Gran
744c8724a7 Quiet signed/unsigned comparison warnings in chars.[ch]
* libguile/chars.h (SCM_MAKE_CHAR): quiet signed/unsigned
  comparison warnings

* libguile/chars.c (scm_i_charname):
  (scm_i_charname_to_char): quiet signed/unsigned comparison
  warnings
2009-08-11 22:56:18 -07:00
Michael Gran
4847dc172f Missing parentheses in SCM_MAKE_CHAR macro
* libguile/chars.h (SCM_MAKE_CHAR): missing parentheses
2009-08-09 16:58:14 -07:00
Michael Gran
a876e7dcea Don't doubly define scm_t_wchar
* libguile/chars.h: don't define scm_t_wchar
        * libguile/numbers.h: define scm_t_wchar here
2009-08-01 11:21:46 -07:00
Michael Gran
f7118e3552 Fix coding style compliance for recent 32-bit char changes
* libguile/print.c (iprin1): extra braces

        * libguile/chars.h (SCM_IS_UNICODE_CHAR): coding style
2009-08-01 11:04:43 -07:00
Michael Gran
4c402b889e Don't use GNU extensions for SCM_MAKE_CHAR macro
Since the contents of SCM_MAKE_CHAR are evaluated more than once,
don't use it in situations where this could cause side-effects.
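The hazard can be shown with a small stand-alone sketch.  The macro below
is not the real SCM_MAKE_CHAR; MAKE_CHAR_SKETCH, TAG_CHAR_SKETCH, the tag
value and the helper functions are all made up for the example.  What it
shares with the situation described above is that the argument appears
twice in the expansion, so an argument with side effects runs twice.

```c
#include <stdint.h>

#define TAG_CHAR_SKETCH 0x0c    /* made-up tag value */

/* Hypothetical macro: X is evaluated once in the test and once more in
   whichever branch is selected.  */
#define MAKE_CHAR_SKETCH(x)                                          \
  (((x) < 0)                                                         \
   ? ((((uintptr_t) (unsigned char) (x)) << 8) | TAG_CHAR_SKETCH)    \
   : ((((uintptr_t) (x)) << 8) | TAG_CHAR_SKETCH))

static int reads = 0;

/* Stand-in for any expression with a side effect, such as reading the
   next value from a stream.  */
static int32_t
next_codepoint (void)
{
  reads++;
  return 0x41;
}

/* Unsafe: the call is expanded twice, so two values are consumed.  */
static uintptr_t
make_unsafe (void)
{
  return MAKE_CHAR_SKETCH (next_codepoint ());
}

/* Safe: evaluate once into a local, then hand the local to the macro.  */
static uintptr_t
make_safe (void)
{
  int32_t c = next_codepoint ();
  return MAKE_CHAR_SKETCH (c);
}
```

Storing the value in a local first is plain ISO C and needs no statement
expressions or other GNU extensions.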

        * libguile/vm-i-system.c (make-char8): avoid side-effects with
        SCM_MAKE_CHAR call

        * libguile/chars.h (SCM_MAKE_CHAR): modified
2009-08-01 10:15:20 -07:00
Michael Gran
904a78f11d Add 32-bit characters
This adds 32-bit standalone characters.  Strings are still
8-bit.  For now, characters larger than 8 bits can only be entered
or displayed in octal format, and the terminal's display encoding
is expected to be Latin-1.

        * module/language/assembly/compile-bytecode.scm (write-bytecode):
        add 32-bit char

        * module/language/assembly.scm (object->assembly): add 32-bit char
        (assembly->object): add 32-bit char

        * libguile/vm-i-system.c (make-char32): new op

        * libguile/print.c (iprin1): print 32-bit char

        * libguile/numbers.h: add type scm_t_wchar

        * libguile/numbers.c: add type scm_t_wchar

        * libguile/chars.h: new type scm_t_wchar
        (SCM_CODEPOINT_MAX): new
        (SCM_IS_UNICODE_CHAR): new
        (SCM_MAKE_CHAR): operate on 32-bit char

        * libguile/chars.c: comparison operators now use Unicode
        codepoints
        (scm_c_upcase): now receives and returns scm_t_wchar
        (scm_c_downcase): now receives and returns scm_t_wchar
2009-07-29 06:38:32 -07:00
Michael Gran
77332b21a0 Replace global charnames variables with accessors
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_char.
Also, the incomplete and broken EBCDIC support is removed.

        * libguile/print.c (iprin1): use new func scm_i_charname

        * libguile/read.c (scm_read_character): use new func
        scm_i_charname_to_char

        * libguile/chars.c (scm_i_charname): new function
        (scm_i_charname_to_char): new function
        (scm_charnames, scm_charnums): removed

        * libguile/chars.h: new declarations
2009-07-27 21:02:23 -07:00
Neil Jerram
53befeb700 Change Guile license to LGPLv3+
(Not quite finished, the following will be done tomorrow.
   module/srfi/*.scm
   module/rnrs/*.scm
   module/scripts/*.scm
   testsuite/*.scm
   guile-readline/*
)
2009-06-17 00:22:09 +01:00
Ludovic Courtès
102dbb6f6c Add `SCM_INTERNAL' macro, use it. 2008-05-31 23:21:02 +02:00
Kevin Ryde
2b829bbb3d merge from 1.8 branch 2006-04-17 00:05:42 +00:00
Marius Vollmer
92205699d0 The FSF has a new address. 2005-05-23 19:57:22 +00:00
Kevin Ryde
d044f345c9 (scm_tables_prehistory): Remove. 2004-04-24 22:04:51 +00:00
Han-Wen Nienhuys
84fad13058 * srfi-13.c (s_scm_string_map): convert character to unsigned char
before converting to unsigned int. This prevents hi-bit ASCII from
being converted to large integers.
(string_upcase_x): change caller for scm_{up,down}case to
scm_c_{up,down}case

* chars.h (scm_init_chars): change scm_{upcase,downcase} to
scm_c_{up,down}case.
(SCM_MAKE_CHAR): add (unsigned char) cast. This prevents havoc
when hi-bit ASCII is subjected to SCM_MAKE_CHAR().
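The effect of the (unsigned char) cast can be seen in a short sketch.  The
two helper functions are hypothetical and exist only for this example; they
are not taken from chars.h or srfi-13.c.  On platforms where plain char is
signed, widening a byte of 0x80 or above directly sign-extends it, while
going through unsigned char first keeps the value in the range 0 to 255.

```c
#include <limits.h>

/* Direct conversion: a negative char becomes a very large unsigned
   value on platforms where plain char is signed.  */
static unsigned int
widen_direct (char c)
{
  return (unsigned int) c;
}

/* Conversion through unsigned char: the result is the byte value,
   whatever the signedness of plain char.  */
static unsigned int
widen_via_uchar (char c)
{
  return (unsigned int) (unsigned char) c;
}
```

For 7-bit characters both functions agree.  They differ only for hi-bit
bytes, and only where CHAR_MIN from <limits.h> is negative.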
2004-04-06 21:48:02 +00:00
Marius Vollmer
73be1d9e8e Changed license terms to the plain LGPL thru-out. 2003-04-05 19:15:35 +00:00
Marius Vollmer
33b001fd89 Prefixed each exported symbol with SCM_API. 2001-11-02 00:19:12 +00:00
Thien-Thi Nguyen
f5dd4f5419 (SCM_MAKE_CHAR): Use `scm_t_bits' instead of `intptr_t'.
Thanks to Golubev I. N.
2001-09-26 03:14:15 +00:00
Rob Browning
f6b115d97e * chars.h (SCM_MAKE_CHAR): coerce value to intptr_t. 2001-09-21 17:56:39 +00:00
Dirk Herrmann
0527e68763 * Renamed header macros to the SCM_<filename>_H format. 2001-08-31 10:42:19 +00:00
Martin Grabmüller
58ade1022c * alist.c, arbiters.c, async.h, backtrace.h, boolean.c, chars.c,
chars.h, continuations.h, debug-malloc.h, dynl.c, feature.c,
	feature.h, filesys.h, fluids.h, fports.h, gc_os_dep.c,
	gdb_interface.h, gh_eval.c, gh_funcs.c, gh_io.c, gh_list.c,
	gh_predicates.c, gsubr.c, gsubr.h, guardians.h,
	guile-func-name-check.in, guile-snarf-docs-texi.in,
	guile-snarf-docs.in, guile-snarf.awk.in, guile-snarf.in,
	hashtab.h, iselect.h, keywords.h, lang.c, list.h, load.h,
	objprop.c, objprop.h, options.c, options.h, random.h,
	regex-posix.h, root.c, root.h, script.c, snarf.h, stackchk.c,
	strerror.c, strop.h, strports.h, threads.h, values.c, values.h,
	version.c, version.h: Updated copyright notice.
2001-07-19 21:08:49 +00:00
Rob Browning
8a7fb63c90 * error.h (scm_sysmissing): deprecation expired - removed. 2001-04-27 21:08:44 +00:00
Mikael Djurfeldt
f2c9fcb07e Updated copyrights 2000-06-12 12:28:24 +00:00
Dirk Herrmann
f5f2dcffbe * Wrapped deprecated code between #if (SCM_DEBUG_DEPRECATED == 0) #endif.
* Replace use of deprecated macros SCM_INPORTP, SCM_OUTPORTP, SCM_ICHRP.
2000-05-15 11:47:48 +00:00
Michael Livshin
89e00824a0 * *.[hc]: add Emacs magic at the end of file, to ensure GNU
indentation style.
2000-03-19 19:01:16 +00:00
Greg J. Badros
7866a09b5b * list.c: Moved append docs to append! Thanks Dirk Herrmann. Also,
added append docs from R4RS.

* strings.c: Docstring typo fix, + eliminate unneeded IMP tests.
Thanks Dirk Herrmann!

* chars.h: Provide SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR and
deprecate SCM_ICHRP, SCM_ICHR, SCM_MAKICHR.  Thanks Dirk Herrmann!

* *.h, *.c: Use SCM_CHARP, SCM_CHAR, SCM_MAKE_CHAR throughout.
Drop use of SCM_P for function prototypes... assume an ANSI C
compiler.  Thanks Dirk Herrmann!
2000-03-02 20:54:43 +00:00
Jim Blandy
3ec93c46a1 * chars.h, error.h, feature.h, filesys.h, gc.h, gsubr.h, macros.h,
numbers.h, options.h, procs.h, ramap.h, read.h, smob.h,
strports.h, symbols.h, unif.h: Update variable declarations and
function prototypes for above changes.
1999-02-06 12:29:08 +00:00
Jim Blandy
82892beda5 * Lots of files: New address for FSF. 1997-05-26 22:34:48 +00:00
Jim Blandy
1cc91f1b29 * __scm.h, alist.c, alist.h, append.c, append.h, appinit.c,
arbiters.c, arbiters.h, async.c, async.h, boolean.c, boolean.h,
chars.c, chars.h, continuations.c, continuations.h, debug.c,
debug.h, dynwind.c, dynwind.h, eq.c, eq.h, error.c, eval.c,
eval.h, extchrs.c, extchrs.h, fdsocket.c, fdsocket.h, filesys.c,
filesys.h, fports.c, fports.h, gc.c, gdb_interface.h, gdbint.c,
gdbint.h, genio.c, genio.h, gscm.c, gscm.h, gsubr.c, gsubr.h,
hash.c, hash.h, hashtab.c, hashtab.h, init.c, ioext.c, ioext.h,
kw.c, kw.h, libguile.h, mallocs.c, mallocs.h, markers.c,
markers.h, mbstrings.c, mbstrings.h, numbers.c, numbers.h,
objprop.c, objprop.h, options.c, options.h, pairs.c, pairs.h,
ports.c, ports.h, posix.c, posix.h, print.c, print.h, procprop.c,
procprop.h, procs.c, procs.h, ramap.c, ramap.h, read.c, read.h,
root.c, scmsigs.c, scmsigs.h, sequences.c, sequences.h, simpos.c,
simpos.h, smob.c, socket.c, socket.h, srcprop.c, srcprop.h,
stackchk.c, stackchk.h, stime.c, stime.h, strings.c, strings.h,
strop.c, strop.h, strorder.c, strorder.h, strports.c, strports.h,
struct.c, struct.h, symbols.c, symbols.h, tag.c, tag.h, unif.c,
unif.h, variable.c, variable.h, vectors.c, vectors.h, version.c,
version.h, vports.c, vports.h, weaks.c, weaks.h: Use SCM_P to
declare functions with prototypes.  (Patch thanks to Marius
Vollmer.)
1996-10-14 01:33:50 +00:00
Jim Blandy
b4309c3c5a * alist.h, append.h, arbiters.h, async.h, boolean.h, chars.h,
continuations.h, debug.h, dynwind.h, error.h, eval.h, fdsocket.h,
feature.h, filesys.h, fports.h, gc.h, gdbint.h, genio.h, gsubr.h,
hash.h, init.h, ioext.h, kw.h, list.h, markers.h, marksweep.h,
mbstrings.h, numbers.h, objprop.h, options.h, pairs.h, ports.h,
posix.h, print.h, procprop.h, procs.h, ramap.h, read.h, root.h,
sequences.h, smob.h, socket.h, srcprop.h, stackchk.h, stime.h,
strings.h, strop.h, strorder.h, strports.h, struct.h, symbols.h,
tag.h, throw.h, unif.h, variable.h, vectors.h, version.h,
vports.h, weaks.h: #include "libguile/__scm.h", not
<libguile/__scm.h>.  This allows 'gcc -MM' to determine which
dependencies are within libguile properly.
1996-09-05 21:19:08 +00:00
Jim Blandy
cd4f61de2f Don't install the unwashed masses of Guile header files in the
main #include path; put most of them in a subdirectory called
'libguile'.  This avoids naming conflicts between Guile header
files and system header files (of which there were a few).
* Makefile.in (pkgincludedir): Deleted.
(innerincludedir): New variable; this and $(includedir) are enough.
(INCLUDE_CFLAGS): Search for headers in "-I$(srcdir)/..".
(installed_h_files): Divide this up.  Now this variable lists
those header files which should go into $(includedir) (i.e. appear
directly in the #include path), and ...
(inner_h_files): ... this new variable says which files appear in
a subdirectory, and are referred to as <libguile/mumble.h>.
(h_files): List them both.
(install): Create innerincludedir, not pkgincludedir.  Put
the installed_h_files and inner_h_files in their proper places.
(uninstall): Corresponding changes.
* alist.h, append.h, arbiters.h, async.h, boolean.h, chars.h,
continuations.h, debug.h, dynwind.h, error.h, eval.h, fdsocket.h,
feature.h, fports.h, gc.h, genio.h, gsubr.h, hash.h, init.h,
ioext.h, kw.h, libguile.h, list.h, markers.h, marksweep.h,
mbstrings.h, numbers.h, options.h, pairs.h, ports.h, posix.h,
print.h, procprop.h, procs.h, ramap.h, read.h, root.h,
sequences.h, smob.h, socket.h, srcprop.h, stackchk.h, stime.h,
strings.h, strop.h, strorder.h, strports.h, struct.h, symbols.h,
tag.h, throw.h, unif.h, variable.h, vectors.h, version.h,
vports.h, weaks.h: Find __scm.h in its new location.
* __scm.h: Find scmconfig.h and tags.h in their new locations
(they're both "inner" files).
1996-09-04 06:21:08 +00:00
Jim Blandy
0f2d19dd46 maintainer changed: was lord, now jimb; first import 1996-07-25 22:56:11 +00:00