* libguile/unidata_to_charset.pl (designated): renamed from full
* libguile/srfi-14.c (scm_char_set_designated): new char-set
* libguile/srfi-14.i.c (cs_designated): renamed from cs_full
Since combining characters, such as accents, modify the appearance of the
previous letter, it looks awkward in its character literal form (#\name)
since it modified the backslash. This instead prints the combining
character on a small circle.
* libguile/chars.h (SCM_CODEPOINT_DOTTED_CIRCLE): new #define
* libguile/print.c (iprint1): print combining characters on dotted circles
* libguile/read.c (scm_read_character): parse the combination of combining
characters and dotted circles
* libguile/srfi-14.c (scm_i_ucs_range_to_char_set): new function that
contains the functionality of ucs_range_to_char_set, fixes
off-by-one, and doesn't store surroges
(scm_ucs_range_to_char_set, scm_ucs_range_to_char_set_x): call
scm_i_ucs_range_to_char_set
(scm_i_charset_set_range): new helper function
char-set-xor! was not modifying its input parameter. It isn't
technically required to do so by the spec, but, the other similar
functions do it.
* libguile/srfi-14.c (scm_char_set_xor_x): modify the input parameter
String ports, being 8-bit, store strings using the character encoding
of the port. This fixes a bug where the default character encoding, and
not the port's encoding, was being used to convert the string port data
back to a string.
* libguile/strports.c: extra comments
(scm_strport_to_string): use port's encoding when converting port data
to a string
* libguile/strings.c (scm_i_from_stringn): renamed from scm_from_stringn
and made internal. All callers changed.
(scm_from_stringn): renamed to scm_i_from_stringn.
* libguile/strings.h: declaration for scm_i_from_stringn
* libguile/srfi-14.c (charsets_complement): use surrogate #defines instead
of hardcoded numbers
* libguile/srfi-14.i.c (cs_full_ranges): remove surrogates from full
charset
* libguile/unidata_to_charset.pl (full): test for surrogates
* libguile/gc_os_dep.c (GC_linux_stack_base) [LINUX_STACKBOTTOM]: cast
input of ctype functions to int
* libguile/inet_aton.c (inet_aton): cast input of ctype functions to int
* libguile/read.c (scm_scan_for_encoding): cast input of isalnum to int
* libguile/win32-socket.c (scm_i_socket_uncomment): cast input of isspace
to int
* libguile/load.c (scm_primitive_load_path): If the compiled path was
out of date, but the fallback path was current, we correctly detected
that case, but loaded the wrong file. So here fix the typo.
This script was used to generate srfi-14.i.c from the UnicodeData.txt
file supplied by ftp://www.unicode.org/Public/UNIDATA/
* libguile/unidata_to_charset.pl
On Aug 5, 2009, at 10:06, Ken Raeburn wrote:
> (1) In scm_pthread_mutex_lock, we leave and re-enter guile mode so
> that we don't block the thread while in guile mode. But we could
> use pthread_mutex_trylock first, and avoid the costs scm_leave_guile
> seems to incur on the Mac. If we can't acquire the lock, it should
> return immediately, and then we can do the expensive, blocking
> version. A quick, hack version of this changed my run time for
> A(3,8) from 17.5s to 14.5s, saving about 17%; sigaltstack and
> sigprocmask are still in the picture, because they're called from
> scm_catch_with_pre_unwind_handler. I'll work up a nicer patch
> later.
Ah, we already had scm_i_pthread_mutex_trylock lying around; that made
things easy.
A second timing test with A(3,9) and this version of the patch (based
on 1.9.1) shows the same improvement.
* libguile/threads.c (scm_pthread_mutex_lock): Try the mutex before
leaving and reentering guile mode.
Ports are given two additional properties: a character encoding and
a conversion failure strategy. These properties have getters and setters.
The new properties are used to convert any locale text to/from the
internal representation of strings.
If unspecified, ports use a default value. The default value of these
properties is held in a fluid. The default character encoding can be
modified by calling setlocale.
ISO-8859-1 is treated specially. Since it is a native encoding of
strings, it can be processed more quickly. Source code is assumed to be
ISO-8859-1 unless otherwise specified. The encoding of a source code
file can be given as 'coding: XXXXX' in a magic comment at the top of a
file.
The C functions that deal with encoding often use a null pointer
as shorthand for the native Latin-1 encoding, for efficiency's sake.
* test-suite/tests/encoding-iso88591.test: new tests
* test-suite/tests/encoding-iso88597.test: new tests
* test-suite/tests/encoding-utf8.test: new tests
* test-suite/tests/encoding-escapes.test: new tests
* test-suite/tests/numbers.test: declare 'binary' encoding
* test-suite/tests/ports.test: declare 'binary' encoding
* test-suite/tests/r6rs-ports.test: declare 'binary' encoding
* module/system/base/compile.scm (compile-file): use source-code
file's self-declared encoding when compiling files
* libguile/strports.c: store string ports in locale encoding
(scm_strport_to_locale_u8vector, scm_call_with_output_locale_u8vector)
(scm_open_input_locale_u8vector, scm_get_output_locale_u8vector):
new functions
* libguile/strings.h: new declaration for scm_i_string_contains_char
* libguile/strings.c (scm_i_string_contains_char): new function
(scm_from_stringn, scm_to_stringn): use NULL for Latin-1
(scm_from_locale_stringn, scm_to_locale_stringn): respect character
encoding of input and output ports
* libguile/read.h: declaration for scm_scan_for_encoding
* libguile/read.c:
(read_token): now takes scheme string instead of C string/length
(read_complete_token): new function
(scm_read_sexp, scm_read_number, scm_read_mixed_case_symbol)
(scm_read_number_and_radix, scm_read_quote, scm_read_semicolon_comment)
(scm_read_srfi4_vector, scm_read_bytevector, scm_read_guile_bit_vector)
(scm_read_scsh_block_comment, scm_read_commented_expression)
(scm_read_extended_symbol, scm_read_sharp_extension, scm_read_shart)
(scm_read_expression): use scm_t_wchar for char type, use read_complete_token
(scm_scan_for_encoding): new function to find a file's character encoding
(scm_file_encoding): new function to find a port's character encoding
* libguile/rdelim.c: don't unpack strings
* libguile/print.h: declaration for modified function
scm_i_charprint
* libguile/print.c: use locale when printing characters and
strings
(scm_i_charprint): input parameter is now scm_t_wchar
(scm_simple_format): don't unpack strings
* libguile/posix.h: new declaration for scm_setbinary.
* libguile/posix.c (scm_setlocale): set default and stdio port
encodings based on the locale's character encoding
(scm_setbinary): new function
* libguile/ports.h (scm_t_port): add encoding and failed
conversion handler to port type. Declarations for new or modified
functions scm_getc, scm_unget_byte, scm_ungetc,
scm_i_get_port_encoding, scm_i_set_port_encoding_x,
scm_port_encoding, scm_set_port_encoding_x,
scm_i_get_conversion_strategy, scm_i_set_conversion_strategy_x,
scm_port_conversion_strategy, scm_set_port_conversion_strategy_x.
* libguile/ports.c: assign the current ports to zero on startup so
we can see if they've been set.
(scm_current_input_port, scm_current_output_port,
scm_current_error_port): return #f if the port is not yet
initialized
(scm_new_port_table_entry): set up a new port's encoding and
illegal sequence handler based on the thread's current defaults
(scm_i_remove_port): free port encoding name when port is removed
(scm_i_mode_bits_n): now takes a scheme string instead of a c
string and length. All callers changed.
(SCM_MBCHAR_BUF_SIZE): new const
(scm_getc): new function, since the scm_getc in inline.h is now
scm_get_byte_or_eof. This pulls one codepoint from a port.
(scm_lfwrite_substr, scm_lfwrite_str): now uses port's encoding
(scm_unget_byte): new function, incorportaing the low-level functionality
of scm_ungetc
(scm_ungetc): uses scm_unget_byte
* libguile/numbers.h (scm_t_wchar): compilation order problem with
scm_t_wchar being use in functions in multiple headers. Forward
declare scm_t_wchar.
* libguile/load.c (scm_primitive_load): scan for file encoding at
top of file and use it to set the load port's encoding
* libguile/inline.h (scm_get_byte_or_eof): new function
incorporating most of the functionality of scm_getc.
* libguile/fports.c (fport_fill_input): now returns scm_t_wchar
* libguile/chars.h (scm_t_wchar): avoid compilation order problem
with declaration of scm_t_wchar
* libguile/goops.c (scm_make_extended_class_from_symbol): new function
(scm_class_of): don't unpack symbol chars
(wrap_init): don't unpack symbol chars
(make_class_from_symbol): new function
(make_struct_class): don't unpack symbol chars
* libguile/socket.c (scm_recv): receive the message without holding the
stringbuf writing lock
(scm_send): try to narrow a string before using it
* libguile/stime.c (strftime): convert string to UTF-8 so that it can
be safely passed to strftime
(strptime): convert input string to UTF-8 so that it can be safely
passed through strptime
* libguile/strings.c (narrow_stringbuf): new function
(scm_i_try_narrow_string): new function
* libguile/strings.h: new declaration for scm_i_try_narrow_string
* libguile/struct.c (scm_make_struct_layout, scm_struct_init)
(scm_struct_vtable_p, scm_struct_ref, scm_struct_set_x): use string
and symbol accessors and avoid unpacking strings and symbols
* libguile/throw.c (scm_ithrow): allow wide symbols in the error message
* libguile/unif.c (scm_enclose_array, scm_istr2bve): use string
accessors and avoid unpacking strings
Problem was that if an application includes both libguile.h and the
system's setjmp.h, and is compiled on IA64, it gets compile errors
because of jmp_buf, setjmp and longjmp being multiply defined.
* libguile/__scm.h (__ia64__): Define scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP instead of jmp_buf, setjmp and longjmp.
(all other platforms): Map scm_i_jmp_buf, SCM_I_SETJMP and
SCM_I_LONGJMP to jmp_buf, setjmp and longjmp.
* libguile/continuations.c (scm_make_continuation): Use `SCM_I_SETJMP'
instead of `setjmp'.
(copy_stack_and_call): Use `SCM_I_LONJMP' instead of `longjmp'.
(scm_ia64_longjmp): Use type `scm_i_jmp_buf' instead of `jmp_buf'.
* libguile/continuations.h (scm_t_contregs): Use type `scm_i_jmp_buf'
instead of `jmp_buf'.
* libguile/threads.c (suspend): Use `SCM_I_SETJMP' instead of
`setjmp'.
* libguile/threads.h (scm_i_thread): Use type `scm_i_jmp_buf' instead
of `jmp_buf'.
* libguile/throw.c (JBJMPBUF, make_jmpbuf, jmp_buf_and_retval): Use
type `scm_i_jmp_buf' instead of `jmp_buf'.
(scm_c_catch): Use `SCM_I_SETJMP' instead of `setjmp'.
(scm_ithrow): Use `SCM_I_LONGJMP' instead of `longjmp'.
* libguile/srcprop.c (scm_set_source_properties_x): Look for the special
source properties, save them off, and then construct a srcprops object
using them.
As with the previous commit, this is to avoid any suggestion that
the source properties API uses the property list format, i.e.
(key1 value1 key2 value2 ...).
Also remove scm_srcprops_to_plist () from the API. It doesn't have any
external usefulness and has never documented.
* libguile/numbers.c (scm_i_print_fraction): use string accessors
(XDIGIT2UINT): use libunistring function
(mem2uinteger, mem2integer, mem2decimal_from_point, mem2ureal)
(mem2complex): take scheme string instead of c string; use accessors
(scm_i_string_to_number): new function
(scm_c_locale_string_to_number): use scm_i_string_to_number
* libguile/numbers.h: declaration for scm_i_string_to_number
* libguile/strings.c (scm_i_string_strcmp): new function
* libguile/strings.h: declaration for scm_i_string_strcmp
* libguile/hash.c (scm_i_string_hash): new function
(scm_hasher): don't unpack string: use scm_i_string_hash
* libguile/hash.h: new declaration for scm_i_string_hash
* libguile/print.c (quote_keywordish_symbol): use symbol accessors
(scm_i_print_symbol_name): new function
(scm_print_symbol_name): call scm_i_print_symbol_name
(iprin1): use scm_i_print_symbol_name to print symbols
* libguile/print.h: new declaration for scm_i_print_symbol_name
* libguile/symbols.c (lookup_interned_symbol): now takes scheme string
instead of c string; callers changed
(lookup_interned_symbol): add wide symbol support
(scm_i_c_mem2symbol): removed
(scm_i_mem2symbol): removed and replaced with scm_i_str2symbol
(scm_i_str2symbol): new function
(scm_i_mem2uninterned_symbol): removed and replaced with
scm_i_str2uninterned_symbol
(scm_i_str2uninterned_symbol): new function
(scm_make_symbol, scm_string_to_symbol, scm_from_locale_symbol)
(scm_from_locale_symboln): use scm_i_str2symbol
* test-suite/tests/symbols.test: new tests