guile

mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-07-13 20:50:25 +02:00

Author	SHA1	Message	Date
Michael Gran	7519234547	Fix broken interaction between readline and Unicode This requires separate small fixes. Readline has internal logic to deal with multi-byte characters, so it wants bytes, not characters. scm_c_read gets called by the vm when readline is activated, and it was truncating multi-byte characters because soft ports didn't have the UCS-4 capability. Soft ports need the capability to read UCS-4 characters. Since soft ports may have a single byte buffer, full characters need to be stored into the pushback buffer. This broke the optimizations in scm_c_read for using an alternate buffer for single-byte-buffered ports, because the optimization wasn't expecting anything in the pushback buffer. * libguile/vports.c (sf_fill_input): store complete chars, not single bytes * libguile/ports.c (scm_c_read): don't use optimized path for non Latin-1. Add debug prints. * libguile/string.h: make scm_i_from_stringn and scm_i_string_ref public so that readline can use them * guile-readline/readline.c: read bytes, not complete chars, from the input port. Convert output to the output port's locale	2009-09-07 19:12:34 -07:00
Michael Gran	060e305adc	Avoid string buffer overrun in scm_scan_for_encoding * libguile/read.c (scm_scan_for_encoding): possible overrun if coding declaration is at end of file	2009-09-05 11:10:07 -07:00
Michael Gran	18d8fcd43c	Remove locale u8vector functions Locale u8vector functions deemed harmful. * libguile/strports.c (scm_strport_to_locale_u8vector) (scm_call_with_output_locale_u8vector, scm_open_input_locale_u8vector) (scm_get_output_locale_u8vector): removed * libguile/strports.h: removed declarations for scm_strport_to_locale_u8vector, scm_call_with_output_u8vector, scm_input_locale_u8vector, scm_get_output_locale_u8vector * test-suite/tests/encoding-iso88591.test: display tests removed * test-suite/tests/encoding-iso88597.test: display tests removed	2009-09-04 07:34:35 -07:00
Michael Gran	25ebc0340d	Initialize string ports with UTF-8 encoding String ports should be able to accept any string characters, regardless of the current locale. Setting it to UTF-8 achieves that. * libguile/strports.c (scm_i_mkstrport): set port's locale to UTF-8 (scm_mkstrport): convert input string to UTF-8	2009-09-04 07:30:13 -07:00
Michael Gran	3d03f9395e	write-char should handle UCS-4 characters * libguile/print.c (scm_write_char): call UCS-4 printing routine, instead of 8-bit primitive	2009-09-04 07:27:14 -07:00
Ken Raeburn	5f5e7a2cd6	Make test-case compilation with -DSCM_DEBUG=1 work. * gc.h (scm_i_expensive_validation_check): Declare SCM_API.	2009-09-03 16:59:11 -04:00
Michael Gran	bb15a36c25	Update docs and docstrings for Unicode characters * doc/ref/api-data.texi: more info about characters and codepoints * libguile/chars.c: replace 'code point' with 'Unicode code point' in docstrings	2009-09-03 08:48:23 -07:00
Michael Gran	ba8477ecce	Add char-set debugging function * libguile/srfi-14.c (scm_sys_char_set_dump): new function * libguile/srfi-14.h: declaration of scm_sys_char_set_dump	2009-09-03 08:29:45 -07:00
Michael Gran	719bb8cd5d	Distinguish between all codepoints and designated codepoints in char-sets * libguile/unidata_to_charset.pl (designated): renamed from full * libguile/srfi-14.c (scm_char_set_designated): new char-set * libguile/srfi-14.i.c (cs_designated): renamed from cs_full	2009-09-03 08:23:24 -07:00
Michael Gran	0dcd7e6153	Modify read and print of combining characters Since combining characters, such as accents, modify the appearance of the previous letter, it looks awkward in its character literal form (#\name) since it modified the backslash. This instead prints the combining character on a small circle. * libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define * libguile/print.c (iprint1): print combining characters on dotted circles * libguile/read.c (scm_read_character): parse the combination of combining characters and dotted circles	2009-09-03 07:47:26 -07:00
Michael Gran	aa2cba9c88	Remove always-true range checks in scm_i_ucs_range_to_char_set * libguile/srfi-14.c (scm_i_ucs_range_to_char_set): limits are always non-negative due to the type of the variable	2009-09-02 06:45:05 -07:00
Michael Gran	08ed805879	Unreachable code in charset set operator * libguile/srfi-14.c (scm_i_charset_set): remove unreachable code in scm_i_charset_set	2009-09-02 06:28:55 -07:00
Michael Gran	aff31b0f99	Optimize charset union operator * libguile/srfi-14.c (charsets_union): call scm_i_charset_set_range instead of setting characters one-by-one.	2009-09-02 06:28:47 -07:00
Michael Gran	f4cdfe6140	The charset complement operator should not include surrogates * libguile/srfi-14.c (charsets_complement): skip over surrogates when making a charset complement	2009-09-02 06:28:42 -07:00
Michael Gran	bde543e88b	char-set-filter! does not properly iterate over the charset * libguile/srfi-14.c (scm_char_set_filter_x): iterate over codepoints	2009-09-02 06:28:35 -07:00
Michael Gran	91772d8f8a	ucs-range->char-set should not store surrogates and has off-by-one error * libguile/srfi-14.c (scm_i_ucs_range_to_char_set): new function that contains the functionality of ucs_range_to_char_set, fixes off-by-one, and doesn't store surrogates (scm_ucs_range_to_char_set, scm_ucs_range_to_char_set_x): call scm_i_ucs_range_to_char_set (scm_i_charset_set_range): new helper function	2009-09-02 06:28:29 -07:00
Michael Gran	693e72891f	char-set-any improperly unpacks charset data * libguile/srfi-14.c (scm_char_set_any): unpack the charset correctly	2009-09-02 06:28:20 -07:00
Michael Gran	7165abeba8	char-set-xor! should modify the input parameter char-set-xor! was not modifying its input parameter. It isn't technically required to do so by the spec, but, the other similar functions do it. * libguile/srfi-14.c (scm_char_set_xor_x): modify the input parameter	2009-09-02 06:28:11 -07:00
Ludovic Courtès	5f236208d0	Merge branch 'boehm-demers-weiser-gc' into bdw-gc-static-alloc Conflicts: acinclude.m4 libguile/strings.c	2009-09-02 01:37:37 +02:00
Ludovic Courtès	d7e7a02a62	Fix leaky behavior of `scm_take_TAGvector ()'. * libguile/srfi-4.c (free_user_data): New function. * libguile/srfi-4.i.c (scm_take_TAGvector): Register `free_user_data ()' as a finalizer for DATA. * libguile/objcodes.c (scm_objcode_to_bytecode): Allocate with `scm_malloc ()' since the memory taken by `scm_take_u8vector ()' will eventually be free(3)d. * libguile/vm.c (really_make_boot_program): Likewise.	2009-09-01 23:53:58 +02:00
Ludovic Courtès	ba54a2026b	Remove the distinction between inline/outline storage for stringbufs. * libguile/strings.c (STRINGBUF_HEADER_SIZE, STRINGBUF_HEADER_BYTES): New macros. (STRINGBUF_F_INLINE, STRINGBUF_INLINE, STRINGBUF_OUTLINE_CHARS, STRINGBUF_OUTLINE_LENGTH, STRINGBUF_INLINE_CHARS, STRINGBUF_INLINE_LENGTH, STRINGBUF_MAX_INLINE_LEN): Remove. (STRINGBUF_CHARS, STRINGBUF_WIDE_CHARS): Adjust to return a fixed location. (STRINGBUF_LENGTH): Get the length from word 1. (make_stringbuf, make_wide_stringbuf): Adjust to use a contiguous memory region. (wide_stringbuf): Renamed from `widen_stringbuf'. Adjust similarly. Return the new stringbuf. Callers updated. (narrow_stringbuf): Likewise. (scm_sys_string_dump, scm_sys_symbol_dump): Remove `stringbuf-inline' pair. * test-suite/tests/strings.test ("string internals")["null strings are inlined", "short Latin-1 encoded strings are inlined", "long Latin-1 encoded strings are not inlined", "short UCS-4 encoded strings are not inlined", "long UCS-4 encoded strings are not inlined"]: Remove. * test-suite/tests/symbols.test ("symbol internals")["null symbols are inlined", "short Latin-1 encoded symbols are inlined", "long Latin-1 encoded symbols are not inlined", "short UCS-4 encoded symbols are not inlined", "long UCS-4 encoded symbols are not inlined"]: Remove.	2009-09-01 02:02:43 +02:00
Ludovic Courtès	13a9455669	Fix leaky handling of `scm_take_locale_{symbol,string} ()'. * libguile/strings.c (scm_i_take_stringbufn, scm_i_c_take_symbol): Remove. (scm_take_locale_stringn): Rewrite in terms of `scm_from_locale_stringn ()'. * libguile/strings.h (scm_i_c_take_symbol, scm_i_take_stringbufn): Remove declarations.	2009-09-01 00:38:40 +02:00
Michael Gran	3f12aedb50	Update docs for Unicode characters * NEWS: add note about Unicode characters * doc/ref/api-data.texi: update Characters subsection * libguile/chars.c: update docstrings to match manual	2009-08-30 16:55:52 -07:00
Michael Gran	5f5920e012	Fix escape sequence normalization for wide strings * libguile/strings.c (scm_to_stringn): convert unistring escapes to guile escapes for both wide and narrow strings	2009-08-30 16:55:17 -07:00
Michael Gran	fac32b518e	Fix encoding errors with strings returned by string ports String ports, being 8-bit, store strings using the character encoding of the port. This fixes a bug where the default character encoding, and not the port's encoding, was being used to convert the string port data back to a string. * libguile/strports.c: extra comments (scm_strport_to_string): use port's encoding when converting port data to a string * libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn and made internal. All callers changed. (scm_from_stringn): renamed to scm_i_from_stringn. * libguile/strings.h: declaration for scm_i_from_stringn	2009-08-30 16:54:49 -07:00
Ludovic Courtès	0665b3ffcb	Remove the distinction between inline/outline storage for bytevectors. * libguile/bytevectors.c (SCM_BYTEVECTOR_INLINE_THRESHOLD, SCM_BYTEVECTOR_INLINEABLE_SIZE_P, SCM_BYTEVECTOR_SET_CONTENTS, SCM_BYTEVECTOR_SET_INLINE): Remove. (SCM_BYTEVECTOR_HEADER_BYTES): New macro. (SCM_BYTEVECTOR_SET_ELEMENT_TYPE): Adjust to new flag layout. (make_bytevector): Remove content inlining machinery; use `scm_gc_malloc_pointerless ()' in all cases; special-case zero-sized vu8 buffers. (make_bytevector_from_buffer): Simplified. (scm_c_shrink_bytevector): New, formerly `scm_i_shrink_bytevector ()'. Remove buffer inlining machinery. (scm_bootstrap_bytevectors): Use `make_bytevector ()' for SCM_NULL_BYTEVECTOR. * libguile/bytevectors.h (SCM_BYTEVECTOR_HEADER_SIZE): New macro. (SCM_BYTEVECTOR_CONTENTS): Adjust to new layout. (SCM_SET_BYTEVECTOR_FLAGS): Properly cast F. (SCM_F_BYTEVECTOR_INLINE, SCM_BYTEVECTOR_INLINE_P): Remove. (SCM_BYTEVECTOR_ELEMENT_TYPE): Adjust. (scm_c_shrink_bytevector): Remove macro, make a C function declaration.	2009-08-31 01:07:30 +02:00
Ludovic Courtès	807e5a6641	Use a TC7 tag instead of a SMOB for bytevectors. * libguile/bytevectors.c (scm_tc16_bytevector): Remove. (SCM_BYTEVECTOR_SET_LENGTH, SCM_BYTEVECTOR_SET_CONTENTS, SCM_BYTEVECTOR_SET_INLINE, SCM_BYTEVECTOR_SET_ELEMENT_TYPE, make_bytevector_from_buffer, scm_is_bytevector, scm_bootstrap_bytevectors): Adjust to the SMOB->tc7 change. (scm_i_print_bytevector): New, formerly `print_bytevector ()'. (bytevector_equal_p): Remove. * libguile/bytevectors.h (SCM_BYTEVECTOR_LENGTH, SCM_BYTEVECTOR_CONTENTS, SCM_BYTEVECTOR_P): Adjust to SMOB->tc7 change. (SCM_BYTEVECTOR_FLAGS, SCM_SET_BYTEVECTOR_FLAGS): New macros. (scm_tc16_bytevector): Remove declaration. (scm_i_print_bytevector): New declaration. * libguile/eq.c (scm_equal_p): Handle `scm_tc7_bytevector'. * libguile/evalext.c (scm_self_evaluating_p): Likewise. * libguile/print.c (iprin1): Likewise. * libguile/tags.h (scm_tc7_bytevector): New. (scm_tc7_unused_8): Remove. * libguile/validate.h (SCM_VALIDATE_BYTEVECTOR): Adjust. * test-suite/tests/bytevectors.test ("Datum Syntax")["self-evaluating?"]: New test.	2009-08-30 20:12:09 +02:00
Michael Gran	0ffc78e384	Range check octal-escaped characters * libguile/read.c (scm_read_character): range check octal escapes	2009-08-29 07:14:49 -07:00
Michael Gran	24d23822ee	Surrogate characters shouldn't be in charsets * libguile/srfi-14.c (charsets_complement): use surrogate #defines instead of hardcoded numbers * libguile/srfi-14.i.c (cs_full_ranges): remove surrogates from full charset * libguile/unidata_to_charset.pl (full): test for surrogates	2009-08-29 00:01:06 -07:00
Michael Gran	526ee76ac3	Better range check for codepoints * libguile/chars.h (SCM_IS_UNICODE_CHAR): check for negative codepoints	2009-08-29 00:00:58 -07:00
Michael Gran	6d736fdba2	Cast the input to isalpha et al to integer * libguile/gc_os_dep.c (GC_linux_stack_base) [LINUX_STACKBOTTOM]: cast input of ctype functions to int * libguile/inet_aton.c (inet_aton): cast input of ctype functions to int * libguile/read.c (scm_scan_for_encoding): cast input of isalnum to int * libguile/win32-socket.c (scm_i_socket_uncomment): cast input of isspace to int	2009-08-28 21:19:05 -07:00
Ludovic Courtès	760fb97d1f	Remove deprecated variables/macros from the GC headers. * libguile/deprecated.c (scm_mtrigger, scm_mallocated, scm_max_segment_size): New global variables, from gc.c. (scm_map_free_list, scm_gc_set_debug_check_freelist_x)[GUILE_DEBUG_FREELIST]: New stubs. * libguile/deprecated.h (scm_mallocated, scm_mtrigger, scm_max_segment_size): New declarations. (scm_map_free_list, scm_gc_set_debug_check_freelist_x)[GUILE_DEBUG_FREELIST]: New declarations. * libguile/gc-malloc.c (scm_i_minyield_malloc): Remove. (scm_gc_init_malloc): Remove references to `scm_i_minyield_malloc' and `scm_mtrigger'. * libguile/gc.c (scm_mtrigger, scm_mallocated): Remove. (scm_init_storage): Remove reference to `SCM_HEAP_SEG_SIZE'. * libguile/gc.h (scm_max_segment_size, SCM_SET_FREELIST_LOC, SCM_FREELIST_LOC, scm_i_master_freelist, scm_i_master_freelist2, scm_mallocated, scm_mtrigger): Remove. (scm_map_free_list, scm_gc_set_debug_check_freelist_x)[SCM_ENABLE_DEPRECATED && GUILE_DEBUG_FREELIST]: Remove. * libguile/private-gc.h (SCM_DEFAULT_INIT_HEAP_SIZE_1, SCM_DEFAULT_MIN_YIELD_1, SCM_DEFAULT_MIN_YIELD_2, DEFAULT_SWEEP_AMOUNT, SCM_DEFAULT_MAX_SEGMENT_SIZE, SCM_MIN_HEAP_SEG_SIZE, SCM_HEAP_SEG_SIZE, SCM_GC_CARD_BVEC_SIZE_IN_LONGS, SCM_GC_IN_CARD_HEADERP): Remove. (scm_getenv_int): Made internal. (scm_i_marking, scm_mark_all, scm_i_deprecated_memory_return, scm_i_find_heap_calls, scm_gc_init_malloc, scm_gc_init_freelist, scm_gc_init_segments, scm_gc_init_mark): Remove declarations. * libguile/gc-segment-table.c: Remove, finally.	2009-08-28 21:02:42 +02:00
Ludovic Courtès	7af531508c	Merge branch 'master' into boehm-demers-weiser-gc Conflicts: libguile/Makefile.am libguile/bytevectors.c libguile/gc-card.c libguile/gc-mark.c libguile/programs.c libguile/srcprop.c libguile/srfi-14.c libguile/symbols.c libguile/threads.c libguile/unif.c libguile/vm.c	2009-08-28 19:16:46 +02:00
Andy Wingo	5950cc3fcc	fix case in which compiled path had stale .go, but fallback had fresh .go * libguile/load.c (scm_primitive_load_path): If the compiled path was out of date, but the fallback path was current, we correctly detected that case, but loaded the wrong file. So here fix the typo.	2009-08-28 17:13:09 +02:00
Michael Gran	8736ef70ac	scm_getc improperly handles Latin-1 characters Upper-plane Latin-1 characters should be converted to codepoints. * libguile/ports.c (scm_getc): improper conversion of char to scm_t_wchar	2009-08-27 20:42:36 -07:00
Michael Gran	d5ecf5797d	Fix FUNC_NAME definitions and #endif in srfi-14.[ch] * libguile/srfi-14.c: whitespace and FUNC_NAME fixes * libguile/srfi-14.h: #endif comment	2009-08-27 18:52:53 -07:00
Michael Gran	d0434ddf25	Script to generate srfi-14 charsets from UnicodeData.txt This script was used to generate srfi-14.i.c from the UnicodeData.txt file supplied by ftp://www.unicode.org/Public/UNIDATA/ * libguile/unidata_to_charset.pl	2009-08-27 18:23:46 -07:00
Ludovic Courtès	1505848425	Add missing `FUNC_NAME' definition. * libguile/load.c (scm_sys_warn_autocompilation_enabled): Define `FUNC_NAME'.	2009-08-28 01:16:49 +02:00
Daniel Kraft	ff81007918	Merge branch 'master' of git://git.savannah.gnu.org/guile into elisp	2009-08-27 19:26:04 +02:00
Michael Gran	fa316af70f	Default srfi-14 character set information * libguile/srfi-14.i.c: structures containing the default srfi-14 sets	2009-08-27 09:13:22 -07:00
Michael Gran	026ed23911	Always cast input to toupper as int * libguile/read.c (scm_scan_for_encoding): add cast to int	2009-08-27 07:44:18 -07:00
Michael Gran	930ddd34c3	Segfault when writing non-Latin-1 characters under Latin-1 locale * libguile/print.c (iprin1): handle write of non-Latin-1 characters under the Latin-1 locale	2009-08-27 07:44:01 -07:00
Michael Gran	f49dbcadf3	Unicode-capable srfi-14 charsets * libguile/Makefile.am: distribute new files srfi-14.i.c and unidata_to_charset.pl * chars.c (scm_c_upcase, scm_c_downcase): use unicode-enable toupper and tolower * libguile/srfi-14.h (scm_t_char_range, scm_t_char_set): new structures to describe char-sets (scm_t_char_set_cursor): new structure to describe char-set-cursors (SCM_BITS_PER_LONG): removed (SCM_CHARSET_GET): calls function New declarations for scm_i_charset_get, scm_i_charset_set, scm_i_charset_unset, and scm_debug_char_set. * test-suite/tests/srfi-14.test: new tests * libguile/srfi-14.c (SCM_CHARSET_DATA): new macro (SCM_CHARSET_SET, SCM_CHARSET_UNSET): call function (BYTES_PER_CHARSET, LONGS_PER_CHARSET): removed (scm_i_charset_get, scm_i_charset_set, scm_i_charset_unset) (charsets_equal, charsets_leq, charsets_union) (charsets_intersection, charsets_complement, charsets_xor): new functions that are low-level charset operators (charset_print, charset_free): modified for new charset struct (charset_cursor_print, charset_cursor_free): new function (make_char_set, scm_char_set_p, scm_char_set_eq, scm_car_set_leq) (scm_char_set_hash, scm_char_set_cursor, scm_char_set_ref) (scm_char_set_cursor_next, scm_end_of_char_set_p, scm_char_set_fold) (scm_char_set_unfold, scm_char_set_unfold_x, scm_char_set_for_each) (scm_char_set_map, scm_char_set_copy, scm_char_set, scm_list_to_char_set) (scm_list_to_char_set_x, scm_string_to_char_set, scm_string_to_char_set_x) (scm_char_set_filter, scm_char_set_filter_x, scm_ucs_range_to_char_set) (scm_ucs_range_to_char_set_x, scm_to_char_set, scm_char_set_size) (scm_char_set_count, scm_char_set_to_list, scm_char_set_to_string) (scm_char_set_contains_p, scm_char_set_every, scm_char_set_any) (scm_char_set_adjoin, scm_char_set_delete, scm_char_set_adjoin_x) (scm_char_set_delete_x, scm_char_set_complement, scm_char_set_union) (scm_char_set_intersection, scm_char_set_difference, scm_char_set_xor) 
(scm_char_set_diff_plus_intersection, scm_char_set_complement_x) (scm_char_set_union_x, scm_char_set_intersection_x, scm_char_set_difference_x) (scm_char_set_xor_x, scm_char_set_diff_plus_intersection_x): modified to use new charset and charset-cursor data structures (CSET_BLANK_PRED, CSET_SYMBOL_PRED, CSET_PUNCT_PRED, CSET_LOWER_PRED) (CSET_UPPER_PRED, CSET_LETTER_PRED, CSET_DIGIT_PRED, CSET_WHITESPACE_PRED) (CSET_CONTROL_PRED, CSET_HEX_DIGIT_PRED, CSET_ASCII_PRED, CSET_LETTER_PRED) (CSET_LETTER_AND_DIGIT_PRED, CSET_PRINTING_PRED, CSET_TRUE_PRED) (CSET_FALSE_PRED): removed (scm_srfi_14_compute_char_sets): removed - too slow to iterate over all of unicode at startup (scm_debug_char_set) [SCM_CHARSET_DEBUG]: new function	2009-08-27 07:43:33 -07:00
Ken Raeburn	71a5964c11	Don't leave and reenter guile mode if mutex is available On Aug 5, 2009, at 10:06, Ken Raeburn wrote: > (1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so > that we don't block the thread while in guile mode. But we could > use pthread_mutex_trylock first, and avoid the costs scm_leave_guile > seems to incur on the Mac. If we can't acquire the lock, it should > return immediately, and then we can do the expensive, blocking > version. A quick, hack version of this changed my run time for > A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and > sigprocmask are still in the picture, because they're called from > scm_catch_with_pre_unwind_handler. I'll work up a nicer patch > later. Ah, we already had scm_i_pthread_mutex_trylock lying around; that made things easy. A second timing test with A(3,9) and this version of the patch (based on 1.9.1) shows the same improvement. * libguile/threads.c (scm_pthread_mutex_lock): Try the mutex before leaving and reentering guile mode.	2009-08-26 23:36:19 +01:00
Andy Wingo	4769c9db2c	fix uninitialized variable in scm_read_character * libguile/read.c (scm_read_character): Fix uninitialized variable.	2009-08-26 13:15:07 +02:00
Ludovic Courtès	f86f3b5b11	Remove the `scm_tc_free_cell' SMOB type. * libguile/deprecated.h (SCM_FREEP, SCM_NFREEP): Changed to constants. * libguile/gc.c (scm_i_tag_name): Remove reference to `scm_tc_free_cell'. * libguile/gc.h (SCM_FREE_CELL_CDR, SCM_SET_FREE_CELL_CDR): Remove. * libguile/smob.c (free_print): Remove. (scm_smob_prehistory): Don't create the "free" SMOB type. * libguile/struct.c (struct_finalizer_trampoline): Use a bare `scm_tc3_struct' tag for finalized structs instead of `scm_tc_free_cell'. * libguile/tags.h (scm_tc_free_cell): Remove.	2009-08-25 23:57:49 +02:00
Andy Wingo	c6a1380bde	Merge commit 'origin/master' Conflicts: libguile/unif.c	2009-08-25 21:43:00 +02:00
Andy Wingo	108e18b18a	Merge wip-array refactor, up to `cd43fdc5b7` Conflicts: NEWS libguile/print.c	2009-08-25 18:04:02 +02:00
Michael Gran	889975e51a	Add full Unicode capability to ports and the default reader Ports are given two additional properties: a character encoding and a conversion failure strategy. These properties have getters and setters. The new properties are used to convert any locale text to/from the internal representation of strings. If unspecified, ports use a default value. The default value of these properties is held in a fluid. The default character encoding can be modified by calling setlocale. ISO-8859-1 is treated specially. Since it is a native encoding of strings, it can be processed more quickly. Source code is assumed to be ISO-8859-1 unless otherwise specified. The encoding of a source code file can be given as 'coding: XXXXX' in a magic comment at the top of a file. The C functions that deal with encoding often use a null pointer as shorthand for the native Latin-1 encoding, for efficiency's sake. * test-suite/tests/encoding-iso88591.test: new tests * test-suite/tests/encoding-iso88597.test: new tests * test-suite/tests/encoding-utf8.test: new tests * test-suite/tests/encoding-escapes.test: new tests * test-suite/tests/numbers.test: declare 'binary' encoding * test-suite/tests/ports.test: declare 'binary' encoding * test-suite/tests/r6rs-ports.test: declare 'binary' encoding * module/system/base/compile.scm (compile-file): use source-code file's self-declared encoding when compiling files * libguile/strports.c: store string ports in locale encoding (scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector) (scm_open_input_locale_u8vector, scm_get_output_locale_u8vector): new functions * libguile/strings.h: new declaration for scm_i_string_contains_char * libguile/strings.c (scm_i_string_contains_char): new function (scm_from_stringn, scm_to_stringn): use NULL for Latin-1 (scm_from_locale_stringn, scm_to_locale_stringn): respect character encoding of input and output ports * libguile/read.h: declaration for scm_scan_for_encoding * libguile/read.c: 
(read_token): now takes scheme string instead of C string/length (read_complete_token): new function (scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol) (scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment) (scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector) (scm_read_scsh_block_comment, scm_read_commented_expression) (scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart) (scm_read_expression): use scm_t_wchar for char type, use read_complete_token (scm_scan_for_encoding): new function to find a file's character encoding (scm_file_encoding): new function to find a port's character encoding * libguile/rdelim.c: don't unpack strings * libguile/print.h: declaration for modified function scm_i_charprint * libguile/print.c: use locale when printing characters and strings (scm_i_charprint): input parameter is now scm_t_wchar (scm_simple_format): don't unpack strings * libguile/posix.h: new declaration for scm_setbinary. * libguile/posix.c (scm_setlocale): set default and stdio port encodings based on the locale's character encoding (scm_setbinary): new function * libguile/ports.h (scm_t_port): add encoding and failed conversion handler to port type. Declarations for new or modified functions scm_getc, scm_unget_byte, scm_ungetc, scm_i_get_port_encoding, scm_i_set_port_encoding_x, scm_port_encoding, scm_set_port_encoding_x, scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x, scm_port_conversion_strategy, scm_set_port_conversion_strategy_x. * libguile/ports.c: assign the current ports to zero on startup so we can see if they've been set. 
(scm_current_input_port, scm_current_output_port, scm_current_error_port): return #f if the port is not yet initialized (scm_new_port_table_entry): set up a new port's encoding and illegal sequence handler based on the thread's current defaults (scm_i_remove_port): free port encoding name when port is removed (scm_i_mode_bits_n): now takes a scheme string instead of a c string and length. All callers changed. (SCM_MBCHAR_BUF_SIZE): new const (scm_getc): new function, since the scm_getc in inline.h is now scm_get_byte_or_eof. This pulls one codepoint from a port. (scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding (scm_unget_byte): new function, incorporating the low-level functionality of scm_ungetc (scm_ungetc): uses scm_unget_byte * libguile/numbers.h (scm_t_wchar): compilation order problem with scm_t_wchar being used in functions in multiple headers. Forward declare scm_t_wchar. * libguile/load.c (scm_primitive_load): scan for file encoding at top of file and use it to set the load port's encoding * libguile/inline.h (scm_get_byte_or_eof): new function incorporating most of the functionality of scm_getc. * libguile/fports.c (fport_fill_input): now returns scm_t_wchar * libguile/chars.h (scm_t_wchar): avoid compilation order problem with declaration of scm_t_wchar	2009-08-25 07:54:37 -07:00
Michael Gran	9db8cf1634	Avoid unpacking symbols in GOOPS * libguile/goops.c (scm_make_extended_class_from_symbol): new function (scm_class_of): don't unpack symbol chars (wrap_init): don't unpack symbol chars (make_class_from_symbol): new function (make_struct_class): don't unpack symbol chars	2009-08-23 10:40:44 -07:00
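
Several of the srfi-14 commits listed above (526ee76ac3, 24d23822ee, f4cdfe6140, 91772d8f8a) turn on the same rule: a character must be a non-negative codepoint no larger than 0x10FFFF, and the surrogate block 0xD800..0xDFFF must never be stored in a char-set. The following is a minimal sketch of that rule in C. The `ex_` names and constants are hypothetical and are not the libguile macros or functions, and treating the range as half-open is an assumption made only for this illustration.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical constants for illustration; not the libguile #defines. */
#define EX_SURROGATE_FIRST 0xD800
#define EX_SURROGATE_LAST  0xDFFF
#define EX_CODEPOINT_MAX   0x10FFFF

/* Return true if cp is a Unicode scalar value: non-negative, at most
   0x10FFFF, and outside the surrogate block.  The signed type makes the
   negative check meaningful. */
bool ex_is_unicode_char(int32_t cp)
{
    if (cp < 0 || cp > EX_CODEPOINT_MAX)
        return false;
    return cp < EX_SURROGATE_FIRST || cp > EX_SURROGATE_LAST;
}

/* Count the characters a range would contribute to a char-set when
   surrogates are skipped.  The range is taken as half-open, [lo, hi),
   which is an assumption of this sketch. */
long ex_count_chars_in_range(int32_t lo, int32_t hi)
{
    long n = 0;
    for (int32_t cp = lo; cp < hi; cp++)
        if (ex_is_unicode_char(cp))
            n++;
    return n;
}
```

A caller building a set from a range that straddles the surrogate block, for example `ex_count_chars_in_range(0xD7FF, 0xE001)`, would see only the two endpoints 0xD7FF and 0xE000 counted.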

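
Commit 889975e51a describes a 'coding: XXXXX' magic comment at the top of a source file, and commit 060e305adc fixes an overrun when that declaration sits at the very end of the scanned text. A bounds-checked scan might look like the sketch below. `ex_scan_for_coding` is a hypothetical helper, not libguile's `scm_scan_for_encoding`; the set of characters accepted in an encoding name and the upper-casing of the result are assumptions of this sketch.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: search the first len bytes of buf for "coding:"
   and copy the encoding name that follows into out (capacity out_size),
   upper-cased and NUL-terminated.  Returns 1 if a name was found, else 0.
   Every read is checked against len, so a declaration that ends exactly
   at the end of the buffer cannot cause an overrun. */
int ex_scan_for_coding(const char *buf, size_t len, char *out, size_t out_size)
{
    static const char key[] = "coding:";
    const size_t key_len = sizeof key - 1;
    size_t i, n = 0;

    if (out_size == 0)
        return 0;
    out[0] = '\0';

    /* Find the key without comparing past buf + len. */
    for (i = 0; i + key_len <= len; i++)
        if (memcmp(buf + i, key, key_len) == 0)
            break;
    if (i + key_len > len)
        return 0;               /* key not present */
    i += key_len;

    /* Skip blanks between the key and the name. */
    while (i < len && (buf[i] == ' ' || buf[i] == '\t'))
        i++;

    /* Copy name characters; ctype functions receive unsigned char values
       so that bytes above 0x7F are never passed as negative ints. */
    while (i < len && n + 1 < out_size
           && (isalnum((unsigned char) buf[i])
               || buf[i] == '-' || buf[i] == '_' || buf[i] == '.'))
        out[n++] = (char) toupper((unsigned char) buf[i++]);
    out[n] = '\0';

    return n > 0;
}
```

Called on a header such as `;; -*- coding: utf-8 -*-`, this sketch fills `out` with `UTF-8`; called on a buffer that ends right after `coding:`, it returns 0 without reading beyond the buffer.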

7681 commits