mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-10 14:00:21 +02:00

Change iconv procedures to take optional instead of keyword arg

* module/ice-9/iconv.scm (call-with-encoded-output-string):
  (string->bytevector, bytevector->string): Take an optional instead of
  a keyword argument.

* doc/ref/api-data.texi (Representing Strings as Bytes): Adapt docs to
  change, and fix a number of errors.  Thanks to Ludovic Courtès for the
  pointers.

* test-suite/tests/iconv.test ("wide non-ascii string"): Add a test for
  the 'substitute path.
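
For illustration, here is a minimal sketch of how a call site changes with this commit, assuming a Guile build whose `(ice-9 iconv)` module is at or after this revision. The sample string and the name `bv` are made up for the example; the `'substitute` strategy and the ASCII round trip mirror the test added in this commit.

```scheme
(use-modules (ice-9 iconv))

;; Hypothetical sample input: a one-character string holding U+03BB
;; (Greek small lambda), which ASCII cannot represent.
(define s (string (integer->char #x3bb)))

;; Before this commit the strategy was a keyword argument:
;;   (string->bytevector s "ascii" #:conversion-strategy 'substitute)

;; After this commit it is an optional positional argument:
(define bv (string->bytevector s "ascii" 'substitute))

;; Decode the bytes again; bytevector->string takes the same optional
;; argument and defaults to 'error when it is omitted.
(display (bytevector->string bv "ascii"))
(newline)
```

Per the test added in this commit, each character that ASCII cannot represent is expected to come back as `?` after the round trip. Callers that still pass `#:conversion-strategy` need to be updated to the positional form.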
Andy Wingo 2013-01-11 21:15:28 +01:00
parent 990b11c53f
commit 5ed4ea90a9
3 changed files with 29 additions and 15 deletions

doc/ref/api-data.texi

@@ -4190,6 +4190,11 @@ sequences of bytes. @xref{Bytevectors}, for more on how Guile
represents raw byte sequences. This module gets its name from the
common @sc{unix} command of the same name.
+Note that often it is sufficient to just read and write strings from
+ports instead of using these functions. To do this, specify the port
+encoding using @code{set-port-encoding!}. @xref{Ports}, for more on
+ports and character encodings.
Unlike the rest of the procedures in this section, you have to load the
@code{iconv} module before having access to these procedures:
@@ -4197,31 +4202,32 @@ Unlike the rest of the procedures in this section, you have to load the
(use-modules (ice-9 iconv))
@end example
-@deffn string->bytevector string encoding [#:conversion-strategy='error]
+@deffn string->bytevector string encoding [conversion-strategy]
Encode @var{string} as a sequence of bytes.
The string will be encoded in the character set specified by the
@var{encoding} string. If the string has characters that cannot be
represented in the encoding, by default this procedure raises an
-@code{encoding-error}, though the @code{#:conversion-strategy} keyword
-can specify other behaviors.
+@code{encoding-error}. Pass a @var{conversion-strategy} argument to
+specify other behaviors.
The return value is a bytevector. @xref{Bytevectors}, for more on
bytevectors. @xref{Ports}, for more on character encodings and
conversion strategies.
@end deffn
-@deffn bytevector->string bytevector encoding
+@deffn bytevector->string bytevector encoding [conversion-strategy]
Decode @var{bytevector} into a string.
The bytes will be decoded from the character set by the @var{encoding}
string. If the bytes do not form a valid encoding, by default this
-procedure raises an @code{decoding-error}, though that may be overridden
-with the @code{#:conversion-strategy} keyword. @xref{Ports}, for more
-on character encodings and conversion strategies.
+procedure raises an @code{decoding-error}. As with
+@code{string->bytevector}, pass the optional @var{conversion-strategy}
+argument to modify this behavior. @xref{Ports}, for more on character
+encodings and conversion strategies.
@end deffn
-@deffn call-with-output-encoded-string encoding proc [#:conversion-strategy='error]
+@deffn call-with-output-encoded-string encoding proc [conversion-strategy]
Like @code{call-with-output-string}, but instead of returning a string,
returns a encoding of the string according to @var{encoding}, as a
bytevector. This procedure can be more efficient than collecting a
@@ -4371,7 +4377,7 @@ If @var{lenp} is @code{NULL}, this function will return a null-terminated C
string. It will throw an error if the string contains a null
character.
-The Scheme interface to this function is @code{encode-string}, from the
+The Scheme interface to this function is @code{string->bytevector}, from the
@code{ice-9 iconv} module. @xref{Representing Strings as Bytes}.
@end deftypefn
@@ -4382,7 +4388,7 @@ string is passed as the ASCII, null-terminated C string @code{encoding}.
The @var{handler} parameters suggests a strategy for dealing with
unconvertable characters.
-The Scheme interface to this function is @code{decode-string}.
+The Scheme interface to this function is @code{bytevector->string}.
@xref{Representing Strings as Bytes}.
@end deftypefn

module/ice-9/iconv.scm

@@ -43,7 +43,8 @@
bv))))
(define* (call-with-encoded-output-string encoding proc
-#:key (conversion-strategy 'error))
+#:optional
+(conversion-strategy 'error))
(if (string-ci=? encoding "utf-8")
;; I don't know why, but this appears to be faster; at least for
;; serving examples/debug-sxml.scm (1464 reqs/s versus 850
@@ -59,16 +60,18 @@
;; TODO: Provide C implementations that call scm_from_stringn and
;; friends?
-(define* (string->bytevector str encoding #:key (conversion-strategy 'error))
+(define* (string->bytevector str encoding
+#:optional (conversion-strategy 'error))
(if (string-ci=? encoding "utf-8")
(string->utf8 str)
(call-with-encoded-output-string
encoding
(lambda (port)
(display str port))
-#:conversion-strategy conversion-strategy)))
+conversion-strategy)))
-(define* (bytevector->string bv encoding #:key (conversion-strategy 'error))
+(define* (bytevector->string bv encoding
+#:optional (conversion-strategy 'error))
(if (string-ci=? encoding "utf-8")
(utf8->string bv)
(let ((p (open-bytevector-input-port bv)))

test-suite/tests/iconv.test

@@ -112,4 +112,9 @@
(string->bytevector s "ascii"))
(pass-if-exception "encode as latin1" exception:encoding-error
-(string->bytevector s "latin1"))))
+(string->bytevector s "latin1"))
+(pass-if "encode as ascii with substitutions"
+(equal? (make-string (string-length s) #\?)
+(bytevector->string (string->bytevector s "ascii" 'substitute)
+"ascii")))))