mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-24 12:20:20 +02:00

Improve handling of Unicode byte-order marks (BOMs).

* libguile/ports-internal.h (struct scm_port_internal): Add new members
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (SCM_UNICODE_BOM): New macro.
  (scm_i_port_iconv_descriptors): Add 'mode' parameter to prototype.

* libguile/ports.c (scm_new_port_table_entry): Initialize
  'at_stream_start_for_bom_read' and 'at_stream_start_for_bom_write'.
  (get_iconv_codepoint): Pass new 'mode' parameter to
  'scm_i_port_iconv_descriptors'.
  (get_codepoint): After reading a codepoint at stream start, record
  that we're no longer at stream start, and consume a BOM where
  appropriate.
  (scm_seek): Set the stream start flags according to the new position.
  (looking_at_bytes): New static function.
  (scm_utf8_bom, scm_utf16be_bom, scm_utf16le_bom, scm_utf32be_bom,
  scm_utf32le_bom): New static const arrays.
  (decide_utf16_encoding, decide_utf32_encoding): New static functions.
  (scm_i_port_iconv_descriptors): Add new 'mode' parameter.  If the
  specified encoding is UTF-16 or UTF-32, make that precise by deciding
  what byte order to use, and construct iconv descriptors based on the
  precise encoding.
  (scm_i_set_port_encoding_x): Record that we are now at stream start.
  Do not open the new iconv descriptors immediately; let them be
  initialized lazily.

* libguile/print.c (display_string_using_iconv): Record that we're no
  longer at stream start.  Write a BOM if appropriate.

* doc/ref/api-io.texi (BOM Handling): New node.

* test-suite/tests/ports.test ("set-port-encoding!, wrong encoding"):
  Adapt test to cope with the fact that 'set-port-encoding!' does not
  immediately open the iconv descriptors.
  (bv-read-test): New procedure.
  ("unicode byte-order marks (BOMs)"): New test prefix.
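
The read-side decision described above for ports whose encoding is plain "UTF-16" can be illustrated with a small standalone sketch. This is not the code from libguile/ports.c: every `sketch_` name is hypothetical, the byte arrays only mirror the BOM arrays named in the log, and the big-endian fallback when no BOM is found is an assumption based on the usual Unicode convention, not something this log states.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Byte sequences of U+FEFF in the two UTF-16 byte orders.  */
static const unsigned char sketch_utf16be_bom[2] = { 0xFE, 0xFF };
static const unsigned char sketch_utf16le_bom[2] = { 0xFF, 0xFE };

/* Return 1 if the LEN bytes at BUF begin with the PREFIX_LEN bytes
   at PREFIX, and 0 otherwise.  (Hypothetical helper.)  */
static int
sketch_starts_with (const unsigned char *buf, size_t len,
                    const unsigned char *prefix, size_t prefix_len)
{
  return len >= prefix_len && memcmp (buf, prefix, prefix_len) == 0;
}

/* Choose a precise encoding name for a stream declared as "UTF-16"
   by inspecting its first bytes.  Store in *BOM_LEN the number of
   leading bytes the caller should skip (2 if a BOM was found, else 0).  */
static const char *
sketch_decide_utf16 (const unsigned char *buf, size_t len, size_t *bom_len)
{
  assert (bom_len != NULL);
  assert (buf != NULL || len == 0);

  *bom_len = 2;
  if (sketch_starts_with (buf, len, sketch_utf16be_bom, 2))
    return "UTF-16BE";
  if (sketch_starts_with (buf, len, sketch_utf16le_bom, 2))
    return "UTF-16LE";

  /* No BOM: fall back to big-endian (an assumption of this sketch).  */
  *bom_len = 0;
  return "UTF-16BE";
}
```

The precise name returned here is the kind of value that could then be handed to iconv_open, so the converter never has to guess the byte order itself.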
Mark H Weaver 2013-04-03 04:22:04 -04:00
parent 45c0878b86
commit cdd3d6c9f4
5 changed files with 515 additions and 32 deletions


@@ -881,8 +881,24 @@ display_string_using_iconv (const void *str, int narrow_p, size_t len,
 {
   size_t printed;
   scm_t_iconv_descriptors *id;
+  scm_t_port_internal *pti = SCM_PORT_GET_INTERNAL (port);
-  id = scm_i_port_iconv_descriptors (port);
+  id = scm_i_port_iconv_descriptors (port, SCM_PORT_WRITE);
+  if (SCM_UNLIKELY (pti->at_stream_start_for_bom_write && len > 0))
+    {
+      scm_t_port *pt = SCM_PTAB_ENTRY (port);
+      /* Record that we're no longer at stream start. */
+      pti->at_stream_start_for_bom_write = 0;
+      if (pt->rw_random)
+        pti->at_stream_start_for_bom_read = 0;
+      /* Write a BOM if appropriate. */
+      if (SCM_UNLIKELY (strcasecmp(pt->encoding, "UTF-16") == 0
+                        || strcasecmp(pt->encoding, "UTF-32") == 0))
+        display_character (SCM_UNICODE_BOM, port, iconveh_error);
+    }
   printed = 0;