mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-04-30 20:00:19 +02:00

250 commits

Author SHA1 Message Date
Michael Gran
026ed23911 Always cast input to toupper as int
* libguile/read.c (scm_scan_for_encoding): add cast to int
2009-08-27 07:44:18 -07:00
Andy Wingo
4769c9db2c fix uninitialized variable in scm_read_character
* libguile/read.c (scm_read_character): Fix uninitialized variable.
2009-08-26 13:15:07 +02:00
Andy Wingo
c6a1380bde Merge commit 'origin/master'
Conflicts:
	libguile/unif.c
2009-08-25 21:43:00 +02:00
Andy Wingo
108e18b18a Merge wip-array refactor, up to cd43fdc5b7
Conflicts:
	NEWS
	libguile/print.c
2009-08-25 18:04:02 +02:00
Michael Gran
889975e51a Add full Unicode capability to ports and the default reader
Ports are given two additional properties: a character encoding and
a conversion failure strategy.  These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
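A minimal C sketch of the decision a per-port conversion failure strategy makes. It assumes a Latin-1 target and three hypothetical strategies: fail, substitute '?', or emit a hex escape. The commit does not name Guile's actual strategies, and none of the identifiers below are Guile's API.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical strategy names, for illustration only. */
enum conversion_strategy {
  STRATEGY_ERROR,       /* give up on the first unencodable character */
  STRATEGY_SUBSTITUTE,  /* write '?' in its place */
  STRATEGY_ESCAPE       /* write a backslash hex escape (u or U form) */
};

/* Encode code points as ISO-8859-1 into out (capacity outsize, counting
   the terminating NUL).  Returns the number of bytes written, or -1 if
   the strategy is STRATEGY_ERROR and a code point is above 0xFF, or if
   out is too small. */
static int encode_latin1(const uint32_t *cps, size_t n,
                         enum conversion_strategy strategy,
                         char *out, size_t outsize)
{
  size_t used = 0;

  if (outsize == 0)
    return -1;
  for (size_t i = 0; i < n; i++) {
    char tmp[16];
    size_t len = 1;

    if (cps[i] <= 0xFF)
      tmp[0] = (char) cps[i];          /* representable as-is */
    else if (strategy == STRATEGY_SUBSTITUTE)
      tmp[0] = '?';
    else if (strategy == STRATEGY_ESCAPE && cps[i] <= 0xFFFF)
      len = (size_t) snprintf(tmp, sizeof tmp, "\\u%04X", (unsigned) cps[i]);
    else if (strategy == STRATEGY_ESCAPE)
      len = (size_t) snprintf(tmp, sizeof tmp, "\\U%06X", (unsigned) cps[i]);
    else
      return -1;                       /* STRATEGY_ERROR */
    if (used + len >= outsize)
      return -1;                       /* no room for output plus NUL */
    memcpy(out + used, tmp, len);
    used += len;
  }
  out[used] = '\0';
  return (int) used;
}
```

Keeping the strategy as a separate per-port property, rather than hard-wiring one behavior, lets the same encoder either fail loudly or degrade gracefully.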

If unspecified, ports use a default value. The default value of these
properties is held in a fluid.  The default character encoding can be
modified by calling setlocale.

ISO-8859-1 is treated specially.  Since it is a native encoding of
strings, it can be processed more quickly.  Source code is assumed to be
ISO-8859-1 unless otherwise specified.  The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
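A rough sketch of finding a 'coding: XXXXX' magic comment. It assumes the caller passes only the head of the file and that the declaration appears after a ';' on some line. It is not Guile's scm_scan_for_encoding, and scan_coding_comment is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Find "coding:" inside a ';' comment in buf and copy the encoding
   name that follows into out.  Returns 1 if a name was found, else 0. */
static int scan_coding_comment(const char *buf, size_t len,
                               char *out, size_t outsize)
{
  const char *end = buf + len;
  const char *line = buf;

  if (outsize == 0)
    return 0;
  while (line < end) {
    const char *eol = memchr(line, '\n', (size_t) (end - line));
    const char *line_end = eol ? eol : end;
    const char *q = memchr(line, ';', (size_t) (line_end - line));

    /* Only text after a ';' on this line counts as a comment. */
    for (; q != NULL && line_end - q >= 7; q++) {
      if (memcmp(q, "coding:", 7) == 0) {
        size_t n = 0;
        q += 7;
        while (q < line_end && (*q == ' ' || *q == '\t'))
          q++;
        /* Encoding names are taken to be alphanumerics, '-' and '_'. */
        while (q < line_end && n + 1 < outsize
               && (isalnum((unsigned char) *q) || *q == '-' || *q == '_'))
          out[n++] = *q++;
        out[n] = '\0';
        return n > 0;
      }
    }
    if (eol == NULL)
      break;
    line = eol + 1;
  }
  return 0;
}
```

A file starting with ";; -*- coding: utf-8 -*-" yields "utf-8". A file with no such comment leaves the caller on the ISO-8859-1 default described above.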

The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.

* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding

* module/system/base/compile.scm (compile-file): use source-code
  file's self-declared encoding when compiling files

* libguile/strports.c: store string ports in locale encoding
  (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
  (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
  new functions

* libguile/strings.h: new declaration for scm_i_string_contains_char

* libguile/strings.c (scm_i_string_contains_char): new function
  (scm_from_stringn, scm_to_stringn):  use NULL for Latin-1
  (scm_from_locale_stringn, scm_to_locale_stringn): respect character
  encoding of input and output ports

* libguile/read.h: declaration for scm_scan_for_encoding

* libguile/read.c:
  (read_token): now takes scheme string instead of C string/length
  (read_complete_token): new function
  (scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
  (scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
  (scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
  (scm_read_scsh_block_comment, scm_read_commented_expression)
  (scm_read_extended_symbol, scm_read_sharp_extension, scm_read_sharp)
  (scm_read_expression): use scm_t_wchar for char type, use read_complete_token
  (scm_scan_for_encoding): new function to find a file's character encoding
  (scm_file_encoding): new function to find a port's character encoding

* libguile/rdelim.c: don't unpack strings

* libguile/print.h: declaration for modified function
  scm_i_charprint

* libguile/print.c: use locale when printing characters and
  strings
  (scm_i_charprint): input parameter is now scm_t_wchar
  (scm_simple_format): don't unpack strings

* libguile/posix.h: new declaration for scm_setbinary.

* libguile/posix.c (scm_setlocale): set default and stdio port
  encodings based on the locale's character encoding
  (scm_setbinary): new function

* libguile/ports.h (scm_t_port): add encoding and failed
  conversion handler to port type.  Declarations for new or modified
  functions scm_getc, scm_unget_byte, scm_ungetc,
  scm_i_get_port_encoding, scm_i_set_port_encoding_x,
  scm_port_encoding, scm_set_port_encoding_x,
  scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
  scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.

* libguile/ports.c: assign the current ports to zero on startup so
  we can see if they've been set.
  (scm_current_input_port, scm_current_output_port,
  scm_current_error_port): return #f if the port is not yet
  initialized
  (scm_new_port_table_entry): set up a new port's encoding and
  illegal sequence handler based on the thread's current defaults
  (scm_i_remove_port): free port encoding name when port is removed
  (scm_i_mode_bits_n): now takes a scheme string instead of a C
  string and length.  All callers changed.
  (SCM_MBCHAR_BUF_SIZE): new const
  (scm_getc): new function, since the scm_getc in inline.h is now
  scm_get_byte_or_eof.  This pulls one codepoint from a port.
  (scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
  (scm_unget_byte): new function, incorporating the low-level functionality
  of scm_ungetc
  (scm_ungetc): uses scm_unget_byte

* libguile/numbers.h (scm_t_wchar): compilation order problem with
  scm_t_wchar being used in functions in multiple headers.  Forward
  declare scm_t_wchar.

* libguile/load.c (scm_primitive_load): scan for file encoding at
  top of file and use it to set the load port's encoding

* libguile/inline.h (scm_get_byte_or_eof): new function
  incorporating most of the functionality of scm_getc.

* libguile/fports.c (fport_fill_input): now returns scm_t_wchar

* libguile/chars.h (scm_t_wchar): avoid compilation order problem
  with declaration of scm_t_wchar
2009-08-25 07:54:37 -07:00
Michael Gran
30a6b9caa9 Only pass ints to tolower and toupper
* libguile/strings.c (unistring_escapes_to_guile_escapes): cast
  tolower's parameter to int

* libguile/read.c (CHAR_DOWNCASE): cast tolower's parameter to int
2009-08-11 21:12:52 -07:00
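This commit and the toupper one cast the argument of the ctype functions to int. The following is a general C note, not a description of the exact change. The <ctype.h> functions accept only EOF or a value representable as unsigned char, and a plain char can be negative for bytes >= 0x80 where char is signed. The usual idiom therefore converts through unsigned char first. The helper name is hypothetical.

```c
#include <ctype.h>
#include <stddef.h>

/* Upcase a byte string in place.  Converting through unsigned char
   keeps every argument to toupper in the valid range, even for bytes
   >= 0x80 on platforms where char is signed. */
static void upcase_bytes(char *s, size_t len)
{
  for (size_t i = 0; i < len; i++)
    s[i] = (char) toupper((unsigned char) s[i]);
}
```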
Michael Gran
9c44cd4559 Add Unicode strings and symbols
This adds full Unicode strings as a datatype, and it adds some
minimal functionality.  The terminal and port encoding is assumed
to be ISO-8859-1.  Non-ISO-8859-1 characters are written or
input as string character escapes.

The string character escapes now have 3 forms: \xXX \uXXXX and
\UXXXXXX, for unprintable characters that have 2, 4 or 6 hex digits.
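A small sketch of choosing among the three escape forms by code point range. The upper-case hex digits are an assumption, and format_char_escape is a hypothetical name, not the Guile printer.

```c
#include <stdint.h>
#include <stdio.h>

/* Write the shortest escape that fits code point cp into buf:
   2 hex digits up to 0xFF, 4 up to 0xFFFF, otherwise 6.
   Returns the snprintf result (the escape's length). */
static int format_char_escape(char *buf, size_t size, uint32_t cp)
{
  if (cp <= 0xFF)
    return snprintf(buf, size, "\\x%02X", (unsigned) cp);
  if (cp <= 0xFFFF)
    return snprintf(buf, size, "\\u%04X", (unsigned) cp);
  return snprintf(buf, size, "\\U%06X", (unsigned) cp);
}
```

For example, U+0007 becomes a 4-character escape, U+03BB a 6-character one, and U+1F600 an 8-character one.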

The process for writing to strings has been modified.  There is now a
function scm_i_string_start_writing that does the copy-on-write
conversion if necessary.
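A hedged sketch of what a "start writing" step does in a copy-on-write scheme. Reference counting stands in here for Guile's own sharing mechanism, and the struct and function names are hypothetical. Before a mutation, a string that shares its buffer gets a private copy, so other strings sharing the old buffer are unaffected.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical structures; Guile's real stringbufs differ. */
struct buf {
  size_t refcount;   /* number of strings sharing this buffer */
  size_t len;
  char *chars;       /* len chars plus a terminating NUL */
};

struct str {
  struct buf *buf;
};

/* Make s the sole owner of its buffer before it is written to.
   Returns 0 on success, -1 if allocation fails. */
static int start_writing(struct str *s)
{
  struct buf *old = s->buf;
  struct buf *copy;

  if (old->refcount == 1)
    return 0;                      /* already private: write in place */
  copy = malloc(sizeof *copy);
  if (copy == NULL)
    return -1;
  copy->chars = malloc(old->len + 1);
  if (copy->chars == NULL) {
    free(copy);
    return -1;
  }
  memcpy(copy->chars, old->chars, old->len + 1);
  copy->len = old->len;
  copy->refcount = 1;
  old->refcount--;                 /* this string no longer shares old */
  s->buf = copy;
  return 0;
}
```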

To compile strings that may be wide, the VM storage of strings and
string-likes has changed.

Most string-using functions have not yet been updated and may break
when used with wide strings.


        * module/language/assembly/compile-bytecode.scm (write-bytecode):
        use variable width string bytecode format

        * module/language/assembly.scm (byte-length): use variable width
        bytecode format

        * libguile/vm-i-loader.c (load-string, load-symbol):
        (load-keyword, define): use variable-width bytecode format

        * libguile/vm-engine.h (FETCH_WIDTH): new macro

        * libguile/strings.h: new declarations

        * libguile/strings.c (make_wide_stringbuf): new function
        (widen_stringbuf): new function
        (scm_i_make_wide_string): new function
        (scm_i_is_narrow_string): new function
        (scm_i_string_wide_chars): new function
        (scm_i_string_start_writing): new function
        (scm_i_string_ref): new function
        (scm_i_string_set_x): new function
        (scm_i_is_narrow_symbol): new function
        (scm_i_symbol_wide_chars, scm_i_symbol_ref): new functions
        (scm_string_width): new function
        (unistring_escapes_to_guile_escapes): new function
        (scm_to_stringn): new function
        (scm_i_stringbuf_free): modify for wide strings
        (scm_i_substring_copy): modify for wide strings
        (scm_i_string_chars, scm_string_append): modify for wide strings
        (scm_i_make_symbol, scm_to_locale_stringn): modify for wide strings
        (scm_string_dump, scm_symbol_dump, scm_to_locale_stringbuf):
        (scm_string, scm_i_deprecated_string_chars): modify for wide strings
        (scm_from_locale_string, scm_from_locale_stringn): add null test

        * libguile/srfi-13.c: add calls for scm_i_string_start_writing for
        each call of scm_i_string_stop_writing
        (scm_string_for_each): modify for wide strings

        * libguile/socket.c: add calls for scm_i_string_start_writing for each
        call of scm_i_string_stop_writing

        * libguile/rw.c: add calls for scm_i_string_start_writing for each
        call of scm_i_string_stop_writing

        * libguile/read.c (scm_read_string): allow reading of wide strings

        * libguile/print.h: add declaration for scm_charprint

        * libguile/print.c (iprin1): print wide strings and add new string
        escapes
        (scm_charprint): new function

        * libguile/ports.h: new declarations for scm_lfwrite_substr and
        scm_lfwrite_str

        * libguile/ports.c (update_port_lf): new function
        (scm_lfwrite): use update_port_lf
        (scm_lfwrite_substr): new function
        (scm_lfwrite_str): new function

        * test-suite/tests/asm-to-bytecode.test ("compiler"): add string
        width byte to string-like asm tests
2009-08-08 02:35:00 -07:00
Michael Gran
77332b21a0 Replace global charnames variables with accessors
The global variables scm_charnames and scm_charnums are replaced with
the accessor functions scm_i_charname and scm_i_charname_to_num.
Also, the incomplete and broken EBCDIC support is removed.

        * libguile/print.c (iprin1): use new func scm_i_charname

        * libguile/read.c (scm_read_character): use new func
        scm_i_charname_to_num

        * libguile/chars.c (scm_i_charname): new function
        (scm_i_charname_to_char): new function
        (scm_charnames, scm_charnums): removed

        * libguile/chars.h: new declarations
2009-07-27 21:02:23 -07:00
Andy Wingo
2fa901a51f rename unif.[ch] to arrays.[ch]
* libguile/Makefile.am:
* libguile/unif.c:
* libguile/unif.h:
* libguile/arrays.c:
* libguile/arrays.h: Rename unif.[ch] to arrays.[ch].

* libguile.h:
* libguile/array-handle.c:
* libguile/array-map.c:
* libguile/bitvectors.c:
* libguile/bytevectors.c:
* libguile/eq.c:
* libguile/gc-card.c:
* libguile/gc-malloc.c:
* libguile/gc-mark.c:
* libguile/gc.c:
* libguile/init.c:
* libguile/inline.h:
* libguile/print.c:
* libguile/random.c:
* libguile/read.c:
* libguile/socket.c:
* libguile/sort.c:
* libguile/srfi-4.c:
* libguile/srfi-4.h:
* libguile/strports.c:
* libguile/vectors.c:
* libguile/vectors.h: Update includers.
2009-07-19 14:53:03 +02:00
Andy Wingo
cf39614240 bitvector exodus from unif.[ch]
* libguile/Makefile.am:
* libguile/unif.c:
* libguile/unif.h:
* libguile/bitvectors.c:
* libguile/bitvectors.h: Move bitvector functionality out of unif.[ch].

* libguile/array-handle.c:
* libguile/array-map.c:
* libguile/init.c:
* libguile/read.c:
* libguile/srfi-4.c:
* libguile/vectors.c: Oh, what a tangled web we weave...
2009-07-19 14:53:03 +02:00
Ludovic Courtès
0ba0b38489 Implement R6RS bytevector read syntax.
* libguile/read.c (scm_read_bytevector): New function.
  (scm_read_sharp): Add `v' case for bytevectors.

* test-suite/lib.scm (exception:read-error): New variable.

* test-suite/tests/bytevectors.test ("Datum Syntax"): New test set.
2009-06-19 00:47:11 +02:00
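The R6RS datum syntax handled by this commit looks like #vu8(1 2 255). Below is a standalone sketch of parsing it, not Guile's scm_read_bytevector. It assumes decimal octet literals separated by whitespace, and parse_vu8 is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Parse "#vu8(b ...)" where each b is a decimal integer in 0..255.
   Stores up to cap bytes in out and returns the count, or -1 on a
   syntax error, an out-of-range value, or too many elements. */
static long parse_vu8(const char *s, unsigned char *out, size_t cap)
{
  size_t n = 0;

  if (strncmp(s, "#vu8(", 5) != 0)
    return -1;
  s += 5;
  for (;;) {
    unsigned value = 0;
    int digits = 0;

    while (isspace((unsigned char) *s))
      s++;
    if (*s == ')')
      return (long) n;
    while (isdigit((unsigned char) *s)) {
      value = value * 10 + (unsigned) (*s - '0');
      if (value > 255)
        return -1;                 /* not an octet */
      digits++;
      s++;
    }
    if (digits == 0 || n == cap)
      return -1;                   /* junk, unterminated, or too many */
    if (*s != ')' && !isspace((unsigned char) *s))
      return -1;                   /* e.g. "12x" */
    out[n++] = (unsigned char) value;
  }
}
```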
Neil Jerram
53befeb700 Change Guile license to LGPLv3+
(Not quite finished; the following will be done tomorrow:
   module/srfi/*.scm
   module/rnrs/*.scm
   module/scripts/*.scm
   testsuite/*.scm
   guile-readline/*
)
2009-06-17 00:22:09 +01:00
Andy Wingo
938d46a35d Merge branch 'syncase-in-boot-9'
Conflicts:
	module/Makefile.am
2009-05-29 16:01:43 +02:00
Andy Wingo
34f3d47df9 add reader support for #; #` #' #, and #,@. fix bug in compile-and-load.
* libguile/read.c (flush_ws, scm_read_commented_expression)
  (scm_read_sharp): Add support for commenting out expressions with #;.
  (scm_read_syntax, scm_read_sharp): Add support for #', #`, #, and #,@.

* module/ice-9/boot-9.scm: Remove #' read-hash extension, which actually
  didn't do anything at all. It's been there since 1997, but no Guile
  code I've ever seen uses it, and it conflicts with #'x => (syntax x)
  from modern Scheme.

* module/system/base/compile.scm (compile-and-load): Whoops, fix a number
  of bugs here.
2009-05-28 14:49:33 +02:00
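In R6RS, #'x, #`x, #,x and #,@x abbreviate (syntax x), (quasisyntax x), (unsyntax x) and (unsyntax-splicing x). The hypothetical C helper below shows the dispatch a reader makes after seeing '#'. Since #,@ and #, both start with a comma, the '@' case must be checked first.

```c
#include <stddef.h>

/* s points just past '#'.  Return the name of the form the abbreviation
   stands for and set *consumed to the characters used, or return NULL
   if s does not start one of these abbreviations. */
static const char *sharp_syntax_abbrev(const char *s, size_t *consumed)
{
  switch (s[0]) {
  case '\'':
    *consumed = 1;
    return "syntax";
  case '`':
    *consumed = 1;
    return "quasisyntax";
  case ',':
    if (s[1] == '@') {
      *consumed = 2;
      return "unsyntax-splicing";
    }
    *consumed = 1;
    return "unsyntax";
  default:
    return NULL;
  }
}
```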
Michael Gran
5d66005209 Symbols longer than 128 chars can cause an exception. Also, the terminating colon of long postfix keywords is not handled correctly.
* test-suite/tests/reader.test ("read-options"): Add test
	for long postfix keywords.

	* libguile/read.c (scm_read_mixed_case_symbol): Fix
	exception on symbols greater than 128 chars.  Also,
	colons are not stripped from long postfix keywords.
2009-05-21 00:17:02 +02:00
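A sketch of the colon handling for SRFI-88-style postfix keywords, where a token such as foo: names the keyword foo. The result is sized from the token itself, so there is no fixed 128-character limit. postfix_keyword_name is a hypothetical helper, not the function changed in read.c.

```c
#include <stdlib.h>
#include <string.h>

/* If token ends in ':' and is longer than ":" alone, return a newly
   allocated copy of its name without the colon; otherwise NULL.
   The caller frees the result. */
static char *postfix_keyword_name(const char *token)
{
  size_t len = strlen(token);
  char *name;

  if (len < 2 || token[len - 1] != ':')
    return NULL;
  name = malloc(len);              /* len - 1 chars plus the NUL */
  if (name == NULL)
    return NULL;
  memcpy(name, token, len - 1);
  name[len - 1] = '\0';
  return name;
}
```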
Ludovic Courtès
45a9f43049 Revert "Make literal strings (i.e., returned by `read') read-only."
This reverts commit fb2f8886c4.

The rationale is that `read' must return mutable strings, as reported
by szgyg <szgyg@ludens.elte.hu>.
2008-10-09 22:21:33 +02:00
Ludovic Courtès
fb2f8886c4 Make literal strings (i.e., returned by `read') read-only.
* libguile/read.c (scm_read_string): Use `scm_i_make_read_only_string ()' to
  return a read-only string, as mandated by R5RS.  Reported by Bill
  Schottstaedt <bil@ccrma.Stanford.EDU>.

* libguile/strings.c (scm_i_make_read_only_string): New function.
  (scm_i_shared_substring_read_only): Special-case the empty string
  so that the read-only and read-write empty strings are `eq?'.  This
  optimization is relied on by the `substring/shared' `empty string'
  test case in `srfi-13.test'.

* libguile/strings.h (scm_i_make_read_only_string): New declaration.

* test-suite/tests/strings.test ("string-set!")["literal string"]: New test.

* NEWS: Update.
2008-09-23 18:45:27 +02:00
Ludovic Courtès
bd22f1c768 Remove extraneous semi-colon in `read.c'. 2008-04-26 21:56:00 +02:00
Ludovic Courtès
904fabb602 Revert "Fix typo in `read.c'."
This reverts commit 6ddb3ca825.
2008-04-15 20:14:44 +02:00
Ludovic Courtès
6ddb3ca825 Fix typo in `read.c'. 2008-04-15 20:01:40 +02:00
Ludovic Courtès
ef4cbc08c8 Add support for SRFI-88-like postfix keyword read syntax. 2008-04-15 19:52:43 +02:00
Ludovic Courtès
7f74cf9a67 More compilation fixes with Sun CC (bug #21378). 2008-02-07 09:54:47 +00:00
Ludovic Courtès
d41668faec Changes from arch/CVS synchronization 2007-10-17 21:56:10 +00:00
Ludovic Courtès
454866e052 Changes from arch/CVS synchronization 2007-09-03 16:58:20 +00:00
Ludovic Courtès
492faee1e5 Changes from arch/CVS synchronization 2007-08-23 21:17:24 +00:00
Ludovic Courtès
f743909974 Changes from arch/CVS synchronization 2007-07-29 15:16:46 +00:00
Ludovic Courtès
7337d56d57 Changes from arch/CVS synchronization 2007-07-22 16:30:13 +00:00
Kevin Ryde
b3aa4626cd merge from 1.8 branch 2007-03-07 23:35:55 +00:00
Han-Wen Nienhuys
22fc179acd * backtrace.c, debug.c, debug.h, deprecation.c, eq.c, eval.c
eval.h, gsubr.c, init.c, macros.c, print.c, print.h, read.c,
read.h, stacks.c, symbols.c, throw.c: use private-options.h

* private-options.h: new file: contain hardcoded option
definitions.
2007-01-22 15:14:40 +00:00
Han-Wen Nienhuys
6256065013 * readline.c: terminate option list with NULL.
* read.c: idem.

* print.c: idem.

* eval.c: terminate option lists with 0.

* options.c: remove n (for length) from scm_option_X
functions. Detect option list length by looking for NULL name.
2007-01-19 19:26:36 +00:00
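The sentinel convention described in this commit can be illustrated in C as follows. The struct layout and option names are illustrative assumptions, not Guile's real option table. Instead of passing a length, the list ends with an entry whose name is NULL, and the length is found by scanning for it.

```c
#include <stddef.h>

/* Hypothetical option entry; Guile's real option struct differs. */
struct option_entry {
  const char *name;
  int value;
};

static const struct option_entry read_options_example[] = {
  { "copy", 0 },
  { "positions", 1 },
  { "case-insensitive", 0 },
  { NULL, 0 }                /* sentinel: marks the end of the list */
};

/* Count entries by scanning for the NULL name. */
static size_t option_count(const struct option_entry *opts)
{
  size_t n = 0;

  while (opts[n].name != NULL)
    n++;
  return n;
}
```

Adding or removing an option then only touches the table, with no separate length to keep in sync.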
Han-Wen Nienhuys
391f57e6ad (s_scm_read_hash_extend): document #f argument to
read-hash-extend.
2007-01-06 18:20:35 +00:00
Kevin Ryde
23f2b9a3de merge from 1.8 branch 2006-06-17 23:15:59 +00:00
Kevin Ryde
2b829bbb3d merge from 1.8 branch 2006-04-17 00:05:42 +00:00
Marius Vollmer
92205699d0 The FSF has a new address. 2005-05-23 19:57:22 +00:00
Marius Vollmer
9de87eea47 See ChangeLog from 2005-03-02. 2005-03-02 20:42:01 +00:00
Marius Vollmer
4057a3e05a Use new vector elements API or simple vector API, as appropriate.
Removed SCM_HAVE_ARRAYS ifdefery.  Replaced all uses of
SCM_HASHTABLE_BUCKETS with SCM_HASHTABLE_BUCKET.
2005-01-02 20:49:04 +00:00
Marius Vollmer
c35092e670 (scm_lreadr): Bugfix: include the last bit in the bit vector. 2004-11-02 15:53:53 +00:00
Marius Vollmer
eb42ff2564 (scm_lreadr): Call scm_i_read_array for all characters following '#'
that can start an array.  Explicitly disambiguate 'i' and 'e' between
introducing numbers and uniform vectors.  Do not call
scm_i_read_homogenous_vector, since that is also handled by
scm_i_read_array now.
2004-10-29 14:45:19 +00:00
Marius Vollmer
a4022e691e * read.c (scm_lreadr): Call scm_i_read_homogenous_vector for '#f',
'#u', and '#s'.

* read.h, read.c (scm_i_input_error): Renamed from scm_input_error
and made non-static.  Changed all uses.
2004-10-26 17:00:13 +00:00
Marius Vollmer
f13b4400d3 (scm_lreadr): Simply do (symbol->keyword (read)) after
reading '#:' or ':'.  See NEWS for consequences.
2004-10-04 18:03:18 +00:00
Marius Vollmer
8631637894 (scm_lreadr): Revert change from 2004-09-22: string literals are now
read-write again (until SCM_STRING_CHARS is removed).
2004-09-29 18:01:36 +00:00
Marius Vollmer
d2e53ed6f8 *** empty log message *** 2004-09-22 17:41:37 +00:00
Marius Vollmer
ec82b7c251 (scm_lreadr): use scm_c_substring_read_only for string
literals, thus making them read-only as specified by R5RS.
2004-09-22 13:55:15 +00:00
Marius Vollmer
ef80ed5ebc (skip_scsh_block_comment): Recognize "!#" everywhere, not just on a
line of its own.
2004-09-20 23:55:38 +00:00
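A minimal sketch of the behavior described in this commit: after an opening "#!", the closing "!#" ends the block comment wherever it appears, not only on a line of its own. skip_block_comment is a hypothetical name, and the input is assumed to be a NUL-terminated buffer rather than a port.

```c
#include <stddef.h>

/* p points just past the opening "#!".  Return a pointer just past the
   first "!#", or NULL if the input ends first (the reader would then
   report an input error). */
static const char *skip_block_comment(const char *p)
{
  for (; *p != '\0'; p++)
    if (p[0] == '!' && p[1] == '#')
      return p + 2;
  return NULL;
}
```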
Marius Vollmer
0520c32088 (scm_flush_ws): Detect "#!"-style comments here.
(scm_lreadr): Abort on seeing "#!", which should no longer happen.
(skip_scsh_block_comment): Use scm_input_error instead of
scm_misc_error in case of EOF.
2004-09-07 09:18:59 +00:00
Marius Vollmer
272632a67c (scm_i_casei_streq): New, for counted strings.
* strings.h, strings.c: (scm_i_string_chars, scm_i_string_length,
scm_i_string_writable_chars, scm_i_string_stop_writing): New, to
replace SCM_I_STRING_CHARS and SCM_I_STRING_LENGTH.  Updated all
uses.
(scm_i_make_string, scm_c_make_string): New, to replace
scm_allocate_string.  Updated all uses.
(SCM_STRINGP, SCM_STRING_CHARS, SCM_STRING_UCHARS,
SCM_STRING_LENGTH): Deprecated.
(scm_allocate_string, scm_take_str, scm_take0str, scm_mem2string,
scm_str2string, scm_makfrom0str, scm_makfrom0str_opt):
Discouraged.  Replaced all uses with scm_from_locale_string or
similar, as appropriate.
(scm_c_string_length, scm_c_string_ref, scm_c_string_set_x,
scm_c_substring, scm_c_substring_shared, scm_c_substring_copy,
scm_substring_shared, scm_substring_copy): New.

* symbols.c, symbols.h (SCM_SYMBOLP, SCM_SYMBOL_FUNC,
SCM_SET_SYMBOL_FUNC, SCM_SYMBOL_PROPS, SCM_SET_SYMBOL_PROPS,
SCM_SYMBOL_HASH, SCM_SYMBOL_INTERNED_P, scm_mem2symbol,
scm_str2symbol, scm_mem2uninterned_symbol): Discouraged.
(SCM_SYMBOL_LENGTH, SCM_SYMBOL_CHARS, scm_c_symbol2str):
Deprecated.
(SCM_MAKE_SYMBOL_TAG, SCM_SET_SYMBOL_LENGTH, SCM_SET_SYMBOL_CHARS,
SCM_PROP_SLOTS, SCM_SET_PROP_SLOTS): Removed.
(scm_is_symbol, scm_from_locale_symbol, scm_from_locale_symboln):
New, to replace scm_str2symbol and scm_mem2symbol, respectively.
Updated all uses.
(scm_gensym): Generate only the number suffix in the buffer, just
string-append the prefix.
2004-08-19 17:17:43 +00:00
Marius Vollmer
8824ac88f0 * socket.c, rw.c, deprecated.h, validate.h
(SCM_VALIDATE_STRING_COPY): Deprecated.  Replaced all uses with
SCM_VALIDATE_STRING plus SCM_I_STRING_CHARS or
scm_to_locale_string, etc.
(SCM_VALIDATE_SUBSTRING_SPEC_COPY): Deprecated.  Replaced as
above, plus scm_i_get_substring_spec.

* regex-posix.c, read.c, random.c, ramap.c, print.c, numbers.c,
hash.c, gc.c, gc-card.c, convert.i.c, backtrace.c, strop.c,
strorder.c, strports.c, struct.c, symbols.c, unif.c, ports.c: Use
SCM_I_STRING_CHARS, SCM_I_STRING_UCHARS, and SCM_I_STRING_LENGTH
instead of SCM_STRING_CHARS, SCM_STRING_UCHARS, and
SCM_STRING_LENGTH, respectively.  Also, replaced scm_return_first
with more explicit scm_remember_upto_here_1, etc, or introduced
them in the first place.
2004-08-12 17:45:03 +00:00
Marius Vollmer
29a837fd27 (scm_input_error): Use a SCM value for 'fn', not a C string. This
avoids a conversion round-trip.
2004-08-10 13:54:01 +00:00
Marius Vollmer
b9bd8526f0 * numbers.h, numbers.c, discouraged.h, discouraged.c (scm_short2num,
scm_ushort2num, scm_int2num, scm_uint2num, scm_long2num,
scm_ulong2num, scm_size2num, scm_ptrdiff2num, scm_num2short,
scm_num2ushort, scm_num2int, scm_num2uint, scm_num2long,
scm_num2ulong, scm_num2size, scm_num2ptrdiff, scm_long_long2num,
scm_ulong_long2num, scm_num2long_long, scm_num2ulong_long):
Discouraged by moving to discouraged.h and discouraged.c and
reimplementing in terms of scm_from_* and scm_to_*.  Changed all uses
to the new scm_from_* and scm_to_* functions.
2004-08-02 16:14:04 +00:00
Marius Vollmer
bc36d0502b * tags.h, deprecated.h (SCM_EQ_P): Deprecated by moving it into
deprecated.h.  Replaced all uses with scm_is_eq.
2004-07-27 15:41:49 +00:00