1
Fork 0
mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-05-04 22:40:25 +02:00
Commit graph

7681 commits

Author SHA1 Message Date
Michael Gran
7519234547 Fix broken interaction between readline and Unicode
This requires separate small fixes.

Readline has internal logic to deal with multi-byte characters, so
it wants bytes, not characters.

scm_c_read gets called by the vm when readline is activated, and it was
truncating multi-byte characters because soft ports didn't have the
UCS-4 capability.

Soft ports need the capability to read UCS-4 characters.  Since soft ports
may have a single byte buffer, full characters need to be stored into the
pushback buffer.

This broke the optimizations in scm_c_read for using an alternate buffer
for single-byte-buffered ports, because the opimization wasn't expecting
anything in the pushback buffer.

* libguile/vports.c (sf_fill_input): store complete chars, not single bytes

* libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1.
  Add debug prints.

* libguile/string.h: make scm_i_from_stringn and scm_i_string_ref public
  so that readline can use them

* guile-readline/readline.c: read bytes, not complete chars, from the
  input port.  Convert output to the output port's locale
2009-09-07 19:12:34 -07:00
Michael Gran
060e305adc Avoid string buffer overrun in scm_scan_for_encoding
* libguile/read.c (scm_scan_for_encoding): possible overrun if
  coding declaration is at end of file
2009-09-05 11:10:07 -07:00
Michael Gran
18d8fcd43c Remove locale u8vector functions
Locale u8vector functions deemed harmful.

* libguile/strports.c (scm_strport_to_locale_u8vector)
  (scm_call_with_output_locale_u8vector, scm_open_input_locale_u8vector)
  (scm_get_output_locale_u8vector): removed

* libguile/strports.h: removed declarations for
  scm_strport_to_locale_u8vector,
  scm_call_with_output_u8vector,
  scm_input_locale_u8vector,
  scm_get_output_locale_u8vector

* test-suite/tests/encoding-iso88591.test: display tests removed

* test-suite/tests/encoding-iso88597.test: display tests removed
2009-09-04 07:34:35 -07:00
Michael Gran
25ebc0340d Initialize string ports with UTF-8 encoding
String ports should be able to accept any string characters, regardless
of the current locale.  Setting it to UTF-8 achieves that.

* libguile/strports.c (scm_i_mkstrport): set port's locale to UTF-8
  (scm_mkstrport): convert input string to UTF-8
2009-09-04 07:30:13 -07:00
Michael Gran
3d03f9395e write-char should handle UCS-4 characters
* libguile/print.c (scm_write_char): call UCS-4 printing routine, instead
  of 8-bit primitive
2009-09-04 07:27:14 -07:00
Ken Raeburn
5f5e7a2cd6 Make test-case compilation with -DSCM_DEBUG=1 work.
* gc.h (scm_i_expensive_validation_check): Declare SCM_API.
2009-09-03 16:59:11 -04:00
Michael Gran
bb15a36c25 Update docs and docstrings for Unicode characters
* doc/ref/api-data.texi: more info about characters and codepoints

* libguile/chars.c: replace 'code point' with 'Unicode code point' in
  docstrings
2009-09-03 08:48:23 -07:00
Michael Gran
ba8477ecce Add char-set debugging function
* libguile/srfi-14.c (scm_sys_char_set_dump): new function

* libguile/srfi-14.h: declaration of scm_sys_char_set_dump
2009-09-03 08:29:45 -07:00
Michael Gran
719bb8cd5d Distinguish between all codepoints and designated codepoints in char-sets
* libguile/unidata_to_charset.pl (designated): renamed from full

* libguile/srfi-14.c (scm_char_set_designated): new char-set

* libguile/srfi-14.i.c (cs_designated): renamed from cs_full
2009-09-03 08:23:24 -07:00
Michael Gran
0dcd7e6153 Modify read and print of combining characters
Since combining characters, such as accents, modify the appearance of the
previous letter, it looks awkward in its character literal form (#\name)
since it modified the backslash.  This instead prints the combining
character on a small circle.

* libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define

* libguile/print.c (iprint1): print combining characters on dotted circles

* libguile/read.c (scm_read_character): parse the combination of combining
  characters and dotted circles
2009-09-03 07:47:26 -07:00
Michael Gran
aa2cba9c88 Remove always-true range checks in scm_i_ucs_range_to_char_set
* libguile/srfi-14.c (scm_i_ucs_range_to_char_set): limits are always
  non-negative due to the type of the variable
2009-09-02 06:45:05 -07:00
Michael Gran
08ed805879 Unreachable code in charset set operator
* libguile/srfi-14.c (scm_i_charset_set): remove unreachable code
  in scm_i_charset_set
2009-09-02 06:28:55 -07:00
Michael Gran
aff31b0f99 Optimize charset union operator
* libguile/srfi-14.c (charsets_union): call scm_i_charset_set_range
  instead of setting characters one-by-one.
2009-09-02 06:28:47 -07:00
Michael Gran
f4cdfe6140 The charset complement operator should not include surrogates
* libguile/srfi-14.c (charsets_complement): skip over surrogates
  when making a charset complement
2009-09-02 06:28:42 -07:00
Michael Gran
bde543e88b char-set-filter! does not properly iterate over the charset
* libguile/srfi-14.c (scm_char_set_filter_x): iterate over
  codepoints
2009-09-02 06:28:35 -07:00
Michael Gran
91772d8f8a ucs-range->char-set should not store surrogates and has off-by-one error
* libguile/srfi-14.c (scm_i_ucs_range_to_char_set): new function that
  contains the functionality of ucs_range_to_char_set, fixes
  off-by-one, and doesn't store surroges
  (scm_ucs_range_to_char_set, scm_ucs_range_to_char_set_x): call
  scm_i_ucs_range_to_char_set
  (scm_i_charset_set_range): new helper function
2009-09-02 06:28:29 -07:00
Michael Gran
693e72891f char-set-any improperly unpacks charset data
* libguile/srfi-14.c (scm_char_set_any): unpack the charset correctly
2009-09-02 06:28:20 -07:00
Michael Gran
7165abeba8 char-set-xor! should modify the input parameter
char-set-xor! was not modifying its input parameter.  It isn't
technically required to do so by the spec, but, the other similar
functions do it.

* libguile/srfi-14.c (scm_char_set_xor_x): modify the input parameter
2009-09-02 06:28:11 -07:00
Ludovic Courtès
5f236208d0 Merge branch 'boehm-demers-weiser-gc' into bdw-gc-static-alloc
Conflicts:
	acinclude.m4
	libguile/strings.c
2009-09-02 01:37:37 +02:00
Ludovic Courtès
d7e7a02a62 Fix leaky behavior of `scm_take_TAGvector ()'.
* libguile/srfi-4.c (free_user_data): New function.

* libguile/srfi-4.i.c (scm_take_TAGvector): Register `free_user_data ()'
  as a finalizer for DATA.

* libguile/objcodes.c (scm_objcode_to_bytecode): Allocate with
  `scm_malloc ()' since the memory taken by `scm_take_u8vector ()' will
  eventually be free(3)d.

* libguile/vm.c (really_make_boot_program): Likewise.
2009-09-01 23:53:58 +02:00
Ludovic Courtès
ba54a2026b Remove the distinction between inline/outline storage for stringbufs.
* libguile/strings.c (STRINGBUF_HEADER_SIZE, STRINGBUF_HEADER_BYTES):
  New macros.
  (STRINGBUF_F_INLINE, STRINGBUF_INLINE, STRINGBUF_OUTLINE_CHARS,
  STRINGBUF_OUTLINE_LENGTH, STRINGBUF_INLINE_CHARS,
  STRINGBUF_INLINE_LENGTH, STRINGBUF_MAX_INLINE_LEN): Remove.
  (STRINGBUF_CHARS, STRINGBUF_WIDE_CHARS): Adjust to return a fixed
  location.
  (STRINGBUF_LENGTH): Get the length from word 1.
  (make_stringbuf, make_wide_stringbuf): Adjust to use a contiguous
  memory region.
  (wide_stringbuf): Renamed from `widen_stringbuf'.  Adjust similarly.
  Return the new stringbuf.  Callers updated.
  (narrow_stringbuf): Likewise.
  (scm_sys_string_dump, scm_sys_symbol_dump): Remove `stringbuf-inline'
  pair.

* test-suite/tests/strings.test ("string internals")["null strings are
  inlined", "short Latin-1 encoded strings are inlined", "long Latin-1
  encoded strings are not inlined", "short UCS-4 encoded strings are not
  inlined", "long UCS-4 encoded strings are not inlined"]: Remove.

* test-suite/tests/symbols.test ("symbol internals")["null symbols are
  inlined", "short Latin-1 encoded symbols are inlined", "long Latin-1
  encoded symbols are not inlined", "short UCS-4 encoded symbols are not
  inlined", "long UCS-4 encoded symbols are not inlined"]: Remove.
2009-09-01 02:02:43 +02:00
Ludovic Courtès
13a9455669 Fix leaky handling of `scm_take_locale_{symbol,string} ()'.
* libguile/strings.c (scm_i_take_stringbufn, scm_i_c_take_symbol):
  Remove.
  (scm_take_locale_stringn): Rewrite in terms of `scm_from_locale_stringn ()'.

* libguile/strings.h (scm_i_c_take_symbol, scm_i_take_stringbufn):
  Remove declarations.
2009-09-01 00:38:40 +02:00
Michael Gran
3f12aedb50 Update docs for Unicode characters
* NEWS: add note about Unicode characters

* doc/ref/api-data.texi: update Characters subsection

* libguile/chars.c: update docstrings to match manual
2009-08-30 16:55:52 -07:00
Michael Gran
5f5920e012 Fix escape sequence normalization for wide strings
* libguile/strings.c (scm_to_stringn): convert unistring escapes to
  guile escapes for both wide and narrow strings
2009-08-30 16:55:17 -07:00
Michael Gran
fac32b518e Fix encoding errors with strings returned by string ports
String ports, being 8-bit, store strings using the character encoding
of the port.  This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.

* libguile/strports.c: extra comments
  (scm_strport_to_string):  use port's encoding when converting port data
  to a string

* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
  and made internal.  All callers changed.
  (scm_from_stringn): renamed to scm_i_from_stringn.

* libguile/strings.h: declaration for scm_i_from_stringn
2009-08-30 16:54:49 -07:00
Ludovic Courtès
0665b3ffcb Remove the distinction between inline/outline storage for bytevectors.
* libguile/bytevectors.c (SCM_BYTEVECTOR_INLINE_THRESHOLD,
  SCM_BYTEVECTOR_INLINEABLE_SIZE_P, SCM_BYTEVECTOR_SET_CONTENTS,
  SCM_BYTEVECTOR_SET_INLINE): Remove.
  (SCM_BYTEVECTOR_HEADER_BYTES): New macro.
  (SCM_BYTEVECTOR_SET_ELEMENT_TYPE): Adjust to new flag layout.
  (make_bytevector): Remove content inlining machinery; use
  `scm_gc_malloc_pointerless ()' in all cases; special-case zero-sized
  vu8 buffers.
  (make_bytevector_from_buffer): Simplified.
  (scm_c_shrink_bytevector): New, formerly `scm_i_shrink_bytevector ()'.
  Remove buffer inlining machinery.
  (scm_bootstrap_bytevectors): Use `make_bytevector ()' for
  SCM_NULL_BYTEVECTOR.

* libguile/bytevectors.h (SCM_BYTEVECTOR_HEADER_SIZE): New macro.
  (SCM_BYTEVECTOR_CONTENTS): Adjust to new layout.
  (SCM_SET_BYTEVECTOR_FLAGS): Properly cast F.
  (SCM_F_BYTEVECTOR_INLINE, SCM_BYTEVECTOR_INLINE_P): Remove.
  (SCM_BYTEVECTOR_ELEMENT_TYPE): Adjust.
  (scm_c_shrink_bytevector): Remove macro, make a C function
  declaration.
2009-08-31 01:07:30 +02:00
Ludovic Courtès
807e5a6641 Use a TC7 tag instead of a SMOB for bytevectors.
* libguile/bytevectors.c (scm_tc16_bytevector): Remove.
  (SCM_BYTEVECTOR_SET_LENGTH, SCM_BYTEVECTOR_SET_CONTENTS,
  SCM_BYTEVECTOR_SET_INLINE, SCM_BYTEVECTOR_SET_ELEMENT_TYPE,
  make_bytevector_from_buffer, scm_is_bytevector,
  scm_bootstrap_bytevectors): Adjust to the SMOB->tc7 change.
  (scm_i_print_bytevector): New, formerly `print_bytevector ()'.
  (bytevector_equal_p): Remove.

* libguile/bytevectors.h (SCM_BYTEVECTOR_LENGTH,
  SCM_BYTEVECTOR_CONTENTS, SCM_BYTEVECTOR_P): Adjust to SMOB->tc7
  change.
  (SCM_BYTEVECTOR_FLAGS, SCM_SET_BYTEVECTOR_FLAGS): New macros.
  (scm_tc16_bytevector): Remove declaration.
  (scm_i_print_bytevector): New declaration.

* libguile/eq.c (scm_equal_p): Handle `scm_tc7_bytevector'.

* libguile/evalext.c (scm_self_evaluating_p): Likewise.

* libguile/print.c (iprin1): Likewise.

* libguile/tags.h (scm_tc7_bytevector): New.
  (scm_tc7_unused_8): Remove.

* libguile/validate.h (SCM_VALIDATE_BYTEVECTOR): Adjust.

* test-suite/tests/bytevectors.test ("Datum
  Syntax")["self-evaluating?"]: New test.
2009-08-30 20:12:09 +02:00
Michael Gran
0ffc78e384 Range check octal-escaped characters
* libguile/read.c (scm_read_character): range check octal escapes
2009-08-29 07:14:49 -07:00
Michael Gran
24d23822ee Surrogate characters shouldn't be in charsets
* libguile/srfi-14.c (charsets_complement): use surrogate #defines instead
  of hardcoded numbers

* libguile/srfi-14.i.c (cs_full_ranges): remove surrogates from full
  charset

* libguile/unidata_to_charset.pl (full): test for surrogates
2009-08-29 00:01:06 -07:00
Michael Gran
526ee76ac3 Better range check for codepoints
* libguile/chars.h (SCM_IS_UNICODE_CHAR): check for negative codepoints
2009-08-29 00:00:58 -07:00
Michael Gran
6d736fdba2 Cast the input to isalpha et al to integer
* libguile/gc_os_dep.c (GC_linux_stack_base) [LINUX_STACKBOTTOM]: cast
  input of ctype functions to int

* libguile/inet_aton.c (inet_aton): cast input of ctype functions to int

* libguile/read.c (scm_scan_for_encoding): cast input of isalnum to int

* libguile/win32-socket.c (scm_i_socket_uncomment): cast input of isspace
  to int
2009-08-28 21:19:05 -07:00
Ludovic Courtès
760fb97d1f Remove deprecated variables/macros from the GC headers.
* libguile/deprecated.c (scm_mtrigger, scm_mallocated,
  scm_max_segment_size): New global variables, from gc.c.
  (scm_map_free_list,
  scm_gc_set_debug_check_freelist_x)[GUILE_DEBUG_FREELIST]: New stubs.

* libguile/deprecated.h (scm_mallocated, scm_mtrigger,
  scm_max_segment_size): New declarations.
  (scm_map_free_list,
  scm_gc_set_debug_check_freelist_x)[GUILE_DEBUG_FREELIST]: New
  declarations.

* libguile/gc-malloc.c (scm_i_minyield_malloc): Remove.
  (scm_gc_init_malloc): Remove references to `scm_i_minyield_malloc' and
  `scm_mtrigger'.

* libguile/gc.c (scm_mtrigger, scm_mallocated): Remove.
  (scm_init_storage): Remove reference to `SCM_HEAP_SEG_SIZE'.

* libguile/gc.h (scm_max_segment_size, SCM_SET_FREELIST_LOC,
  SCM_FREELIST_LOC, scm_i_master_freelist, scm_i_master_freelist2,
  scm_mallocated, scm_mtrigger): Remove.
  (scm_map_free_list,
  scm_gc_set_debug_check_freelist_x)[SCM_ENABLE_DEPRECATED &&
  GUILE_DEBUG_FREELIST]: Remove.

* libguile/private-gc.h (SCM_DEFAULT_INIT_HEAP_SIZE_1,
  SCM_DEFAULT_MIN_YIELD_1, SCM_DEFAULT_MIN_YIELD_2,
  DEFAULT_SWEEP_AMOUNT, SCM_DEFAULT_MAX_SEGMENT_SIZE,
  SCM_MIN_HEAP_SEG_SIZE, SCM_HEAP_SEG_SIZE,
  SCM_GC_CARD_BVEC_SIZE_IN_LONGS, SCM_GC_IN_CARD_HEADERP): Remove.
  (scm_getenv_int): Made internal.
  (scm_i_marking, scm_mark_all, scm_i_deprecated_memory_return,
  scm_i_find_heap_calls, scm_gc_init_malloc, scm_gc_init_freelist,
  scm_gc_init_segments, scm_gc_init_mark): Remove declarations.

* libguile/gc-segment-table.c: Remove, finally.
2009-08-28 21:02:42 +02:00
Ludovic Courtès
7af531508c Merge branch 'master' into boehm-demers-weiser-gc
Conflicts:
	libguile/Makefile.am
	libguile/bytevectors.c
	libguile/gc-card.c
	libguile/gc-mark.c
	libguile/programs.c
	libguile/srcprop.c
	libguile/srfi-14.c
	libguile/symbols.c
	libguile/threads.c
	libguile/unif.c
	libguile/vm.c
2009-08-28 19:16:46 +02:00
Andy Wingo
5950cc3fcc fix case in which compiled path had stale .go, but fallback had fresh .go
* libguile/load.c (scm_primitive_load_path): If the compiled path was
  out of date, but the fallback path was current, we correctly detected
  that case, but loaded the wrong file. So here fix the typo.
2009-08-28 17:13:09 +02:00
Michael Gran
8736ef70ac scm_getc improperly handles Latin-1 characters
Upper-plane Latin-1 characters should be converted to codepoints.

* libguile/ports.c (scm_getc): improper conversion of char to scm_t_wchar
2009-08-27 20:42:36 -07:00
Michael Gran
d5ecf5797d Fix FUNC_NAME definitions and #endif in srfi-14.[ch]
* libguile/srfi-14.c: whitespace and FUNC_NAME fixes

* libguile/srfi-14.h: #endif comment
2009-08-27 18:52:53 -07:00
Michael Gran
d0434ddf25 Script to generate srfi-14 charsets from UnicodeData.txt
This script was used to generate srfi-14.i.c from the UnicodeData.txt
file supplied by ftp://www.unicode.org/Public/UNIDATA/

* libguile/unidata_to_charset.pl
2009-08-27 18:23:46 -07:00
Ludovic Courtès
1505848425 Add missing `FUNC_NAME' definition.
* libguile/load.c (scm_sys_warn_autocompilation_enabled): Define
  `FUNC_NAME'.
2009-08-28 01:16:49 +02:00
Daniel Kraft
ff81007918 Merge branch 'master' of git://git.savannah.gnu.org/guile into elisp 2009-08-27 19:26:04 +02:00
Michael Gran
fa316af70f Default srfi-14 character set information
* libguile/srfi-14.i.c: structures containing the default srfi-14
  sets
2009-08-27 09:13:22 -07:00
Michael Gran
026ed23911 Always cast input to toupper as int
* libguile/read.c (scm_scan_for_encoding): add cast to int
2009-08-27 07:44:18 -07:00
Michael Gran
930ddd34c3 Segfault when writing non-Latin-1 characters under Latin-1 locale
* libguile/print.c (iprin1): handle write of non-Latin-1 characters
  under the Latin-1 locale
2009-08-27 07:44:01 -07:00
Michael Gran
f49dbcadf3 Unicode-capable srfi-14 charsets
* libguile/Makefile.am: distribute new files srfi-14.i.c and
  unidata_to_charset.pl

* chars.c (scm_c_upcase, scm_c_downcase): use unicode-enable toupper
  and tolower

* libguile/srfi-14.h (scm_t_char_range, scm_t_char_set): new structures
  to describe char-sets
  (scm_t_char_set_cursor): new structure to describe char-set-cursors
  (SCM_BITS_PER_LONG): removed
  (SCM_CHARSET_GET): calls function
  New declarations for scm_i_charset_get, scm_i_charset_set,
  scm_i_charset_unset, and scm_debug_char_set.

* test-suite/tests/srfi-14.test: new tests

* libguile/srfi-14.c (SCM_CHARSET_DATA): new macro
  (SCM_CHARSET_SET, SCM_CHARSET_UNSET): call function
  (BYTES_PER_CHARSET, LONGS_PER_CHARSET): removed
  (scm_i_charset_get, scm_i_charset_set, scm_i_charset_unset)
  (charsets_equal, charsets_leq, charsets_union)
  (charsets_intersection, charsets_complement, charsets_xor): new
  functions that are low-level charset operators
  (charset_print, charset_free): modified for new charset struct
  (charset_cursor_print, charset_cursor_free): new function
  (make_char_set, scm_char_set_p, scm_char_set_eq, scm_car_set_leq)
  (scm_char_set_hash, scm_char_set_cursor, scm_char_set_ref)
  (scm_char_set_cursor_next, scm_end_of_char_set_p, scm_char_set_fold)
  (scm_char_set_unfold, scm_char_set_unfold_x, scm_char_set_for_each)
  (scm_char_set_map, scm_char_set_copy, scm_char_set, scm_list_to_char_set)
  (scm_list_to_char_set_x, scm_string_to_char_set, scm_string_to_char_set_x)
  (scm_char_set_filter, scm_char_set_filter_x, scm_ucs_range_to_char_set)
  (scm_ucs_range_to_char_set_x, scm_to_char_set, scm_char_set_size)
  (scm_char_set_count, scm_char_set_to_list, scm_char_set_to_string)
  (scm_char_set_contains_p, scm_char_set_every, scm_char_set_any)
  (scm_char_set_adjoin, scm_char_set_delete, scm_char_set_adjoin_x)
  (scm_char_set_delete_x, scm_char_set_complement, scm_char_set_union)
  (scm_char_set_intersection, scm_char_set_difference, scm_char_set_xor)
  (scm_char_set_diff_plus_intersection, scm_char_set_complement_x)
  (scm_char_set_union_x, scm_char_set_intersection_x, scm_char_set_difference_x)
  (scm_char_set_xor_x, scm_char_set_diff_plus_intersection_x): modified
  to use new charset and charset-cursor data structures
  (CSET_BLANK_PRED, CSET_SYMBOL_PRED, CSET_PUNCT_PRED, CSET_LOWER_PRED)
  (CSET_UPPER_PRED, CSET_LETTER_PRED, CSET_DIGIT_PRED, CSET_WHITESPACE_PRED)
  (CSET_CONTROL_PRED, CSET_HEX_DIGIT_PRED, CSET_ASCII_PRED, CSET_LETTER_PRED)
  (CSET_LETTER_AND_DIGIT_PRED, CSET_PRINTING_PRED, CSET_TRUE_PRED)
  (CSET_FALSE_PRED): removed
  (scm_srfi_14_compute_char_sets): removed - too slow to iterate
  over all of unicode at startup
  (scm_debug_char_set) [SCM_CHARSET_DEBUG]: new function
2009-08-27 07:43:33 -07:00
Ken Raeburn
71a5964c11 Don't leave and reenter guile mode if mutex is available
On Aug 5, 2009, at 10:06, Ken Raeburn wrote:
> (1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so
> that we don't block the thread while in guile mode.  But we could
> use pthread_mutex_trylock first, and avoid the costs scm_leave_guile
> seems to incur on the Mac.  If we can't acquire the lock, it should
> return immediately, and then we can do the expensive, blocking
> version.  A quick, hack version of this changed my run time for
> A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and
> sigprocmask are still in the picture, because they're called from
> scm_catch_with_pre_unwind_handler.  I'll work up a nicer patch
> later.

Ah, we already had scm_i_pthread_mutex_trylock lying around; that made
things easy.
A second timing test with A(3,9) and this version of the patch (based
on 1.9.1) shows the same improvement.

* libguile/threads.c (scm_pthread_mutex_lock): Try the mutex before
  leaving and reentering guile mode.
2009-08-26 23:36:19 +01:00
Andy Wingo
4769c9db2c fix uninitialized variable in scm_read_character
* libguile/read.c (scm_read_character): Fix uninitialized variable.
2009-08-26 13:15:07 +02:00
Ludovic Courtès
f86f3b5b11 Remove the `scm_tc_free_cell' SMOB type.
* libguile/deprecated.h (SCM_FREEP, SCM_NFREEP): Changed to constants.

* libguile/gc.c (scm_i_tag_name): Remove reference to
  `scm_tc_free_cell'.

* libguile/gc.h (SCM_FREE_CELL_CDR, SCM_SET_FREE_CELL_CDR): Remove.

* libguile/smob.c (free_print): Remove.
  (scm_smob_prehistory): Don't create the "free" SMOB type.

* libguile/struct.c (struct_finalizer_trampoline): Use a bare
  `scm_tc3_struct' tag for finalized structs instead of
  `scm_tc_free_cell'.

* libguile/tags.h (scm_tc_free_cell): Remove.
2009-08-25 23:57:49 +02:00
Andy Wingo
c6a1380bde Merge commit 'origin/master'
Conflicts:
	libguile/unif.c
2009-08-25 21:43:00 +02:00
Andy Wingo
108e18b18a Merge wip-array refactor, up to cd43fdc5b7
Conflicts:
	NEWS
	libguile/print.c
2009-08-25 18:04:02 +02:00
Michael Gran
889975e51a Add full Unicode capability to ports and the default reader
Ports are given two additional properties: a character encoding and
a conversion failure strategy.  These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.

If unspecified, ports use a default value. The default value of these
properties is held in a fluid.  The default character encoding can be
modified by calling setlocale.

ISO-8859-1 is treated specially.  Since it is a native encoding of
strings, it can be processed more quickly.  Source code is assumed to be
ISO-8859-1 unless otherwise specified.  The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.

The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.

* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding

* module/system/base/compile.scm (compile-file): use source-code
  file's self-declared encoding when compiling files

* libguile/strports.c: store string ports in locale encoding
  (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
  (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
  new functions

* libguile/strings.h: new declaration for scm_i_string_contains_char

* libguile/strings.c (scm_i_string_contains_char): new function
  (scm_from_stringn, scm_to_stringn):  use NULL for Latin-1
  (scm_from_locale_stringn, scm_to_locale_stringn): respect character
  encoding of input and output ports

* libguile/read.h: declaration for scm_scan_for_encoding

* libguile/read.c:
  (read_token): now takes scheme string instead of C string/length
  (read_complete_token): new function
  (scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
  (scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
  (scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
  (scm_read_scsh_block_comment, scm_read_commented_expression)
  (scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
  (scm_read_expression): use scm_t_wchar for char type, use read_complete_token
  (scm_scan_for_encoding): new function to find a file's character encoding
  (scm_file_encoding): new function to find a port's character encoding

* libguile/rdelim.c: don't unpack strings

* libguile/print.h: declaration for modified function
  scm_i_charprint

* libguile/print.c: use locale when printing characters and
  strings
  (scm_i_charprint): input parameter is now scm_t_wchar
  (scm_simple_format): don't unpack strings

* libguile/posix.h: new declaration for scm_setbinary.

* libguile/posix.c (scm_setlocale): set default and stdio port
  encodings based on the locale's character encoding
  (scm_setbinary): new function

* libguile/ports.h (scm_t_port): add encoding and failed
  conversion handler to port type.  Declarations for new or modified
  functions scm_getc, scm_unget_byte, scm_ungetc,
  scm_i_get_port_encoding, scm_i_set_port_encoding_x,
  scm_port_encoding, scm_set_port_encoding_x,
  scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
  scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.

* libguile/ports.c: assign the current ports to zero on startup so
  we can see if they've been set.
  (scm_current_input_port, scm_current_output_port,
  scm_current_error_port): return #f if the port is not yet
  initialized
  (scm_new_port_table_entry): set up a new port's encoding and
  illegal sequence handler based on the thread's current defaults
  (scm_i_remove_port): free port encoding name when port is removed
  (scm_i_mode_bits_n): now takes a scheme string instead of a c
  string and length.  All callers changed.
  (SCM_MBCHAR_BUF_SIZE): new const
  (scm_getc): new function, since the scm_getc in inline.h is now
  scm_get_byte_or_eof.  This pulls one codepoint from a port.
  (scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
  (scm_unget_byte): new function, incorportaing the low-level functionality
  of scm_ungetc
  (scm_ungetc): uses scm_unget_byte

* libguile/numbers.h (scm_t_wchar): compilation order problem with
  scm_t_wchar being use in functions in multiple headers.  Forward
  declare scm_t_wchar.

* libguile/load.c (scm_primitive_load): scan for file encoding at
  top of file and use it to set the load port's encoding

* libguile/inline.h (scm_get_byte_or_eof): new function
  incorporating most of the functionality of scm_getc.

* libguile/fports.c (fport_fill_input): now returns scm_t_wchar

* libguile/chars.h (scm_t_wchar): avoid compilation order problem
  with declaration of scm_t_wchar
2009-08-25 07:54:37 -07:00
Michael Gran
9db8cf1634 Avoid unpacking symbols in GOOPS
* libguile/goops.c (scm_make_extended_class_from_symbol): new function
  (scm_class_of): don't unpack symbol chars
  (wrap_init): don't unpack symbol chars
  (make_class_from_symbol): new function
  (make_struct_class): don't unpack symbol chars
2009-08-23 10:40:44 -07:00