mirror of https://git.savannah.gnu.org/git/guile.git synced 2025-06-11 14:21:10 +02:00

Change iconv procedures to take optional instead of keyword arg

* module/ice-9/iconv.scm (call-with-encoded-output-string):
  (string->bytevector, bytevector->string): Take an optional instead of
  a keyword argument.

* doc/ref/api-data.texi (Representing Strings as Bytes): Adapt docs to
  change, and fix a number of errors.  Thanks to Ludovic Courtès for the
  pointers.

* test-suite/tests/iconv.test ("wide non-ascii string"): Add a test for
  the 'substitute path.
Andy Wingo 2013-01-11 21:15:28 +01:00
parent 990b11c53f
commit 5ed4ea90a9
3 changed files with 29 additions and 15 deletions

doc/ref/api-data.texi

@@ -4190,6 +4190,11 @@ sequences of bytes. @xref{Bytevectors}, for more on how Guile
 represents raw byte sequences. This module gets its name from the
 common @sc{unix} command of the same name.
 
+Note that often it is sufficient to just read and write strings from
+ports instead of using these functions. To do this, specify the port
+encoding using @code{set-port-encoding!}. @xref{Ports}, for more on
+ports and character encodings.
+
 Unlike the rest of the procedures in this section, you have to load the
 @code{iconv} module before having access to these procedures:
 
@@ -4197,31 +4202,32 @@ Unlike the rest of the procedures in this section, you have to load the
 (use-modules (ice-9 iconv))
 @end example
 
-@deffn string->bytevector string encoding [#:conversion-strategy='error]
+@deffn string->bytevector string encoding [conversion-strategy]
 Encode @var{string} as a sequence of bytes.
 
 The string will be encoded in the character set specified by the
 @var{encoding} string. If the string has characters that cannot be
 represented in the encoding, by default this procedure raises an
-@code{encoding-error}, though the @code{#:conversion-strategy} keyword
-can specify other behaviors.
+@code{encoding-error}. Pass a @var{conversion-strategy} argument to
+specify other behaviors.
 
 The return value is a bytevector. @xref{Bytevectors}, for more on
 bytevectors. @xref{Ports}, for more on character encodings and
 conversion strategies.
 @end deffn
 
-@deffn bytevector->string bytevector encoding
+@deffn bytevector->string bytevector encoding [conversion-strategy]
 Decode @var{bytevector} into a string.
 
 The bytes will be decoded from the character set by the @var{encoding}
 string. If the bytes do not form a valid encoding, by default this
-procedure raises an @code{decoding-error}, though that may be overridden
-with the @code{#:conversion-strategy} keyword. @xref{Ports}, for more
-on character encodings and conversion strategies.
+procedure raises an @code{decoding-error}. As with
+@code{string->bytevector}, pass the optional @var{conversion-strategy}
+argument to modify this behavior. @xref{Ports}, for more on character
+encodings and conversion strategies.
 @end deffn
 
-@deffn call-with-output-encoded-string encoding proc [#:conversion-strategy='error]
+@deffn call-with-output-encoded-string encoding proc [conversion-strategy]
 Like @code{call-with-output-string}, but instead of returning a string,
 returns a encoding of the string according to @var{encoding}, as a
 bytevector. This procedure can be more efficient than collecting a
@@ -4371,7 +4377,7 @@ If @var{lenp} is @code{NULL}, this function will return a null-terminated C
 string. It will throw an error if the string contains a null
 character.
 
-The Scheme interface to this function is @code{encode-string}, from the
+The Scheme interface to this function is @code{string->bytevector}, from the
 @code{ice-9 iconv} module. @xref{Representing Strings as Bytes}.
 @end deftypefn
 
@@ -4382,7 +4388,7 @@ string is passed as the ASCII, null-terminated C string @code{encoding}.
 The @var{handler} parameters suggests a strategy for dealing with
 unconvertable characters.
 
-The Scheme interface to this function is @code{decode-string}.
+The Scheme interface to this function is @code{bytevector->string}.
 @xref{Representing Strings as Bytes}.
 @end deftypefn
 

module/ice-9/iconv.scm

@@ -43,7 +43,8 @@
         bv))))
 
 (define* (call-with-encoded-output-string encoding proc
-                                          #:key (conversion-strategy 'error))
+                                          #:optional
+                                          (conversion-strategy 'error))
   (if (string-ci=? encoding "utf-8")
       ;; I don't know why, but this appears to be faster; at least for
       ;; serving examples/debug-sxml.scm (1464 reqs/s versus 850
@@ -59,16 +60,18 @@
 ;; TODO: Provide C implementations that call scm_from_stringn and
 ;; friends?
 
-(define* (string->bytevector str encoding #:key (conversion-strategy 'error))
+(define* (string->bytevector str encoding
+                             #:optional (conversion-strategy 'error))
   (if (string-ci=? encoding "utf-8")
       (string->utf8 str)
       (call-with-encoded-output-string
        encoding
        (lambda (port)
          (display str port))
-       #:conversion-strategy conversion-strategy)))
+       conversion-strategy)))
 
-(define* (bytevector->string bv encoding #:key (conversion-strategy 'error))
+(define* (bytevector->string bv encoding
+                             #:optional (conversion-strategy 'error))
   (if (string-ci=? encoding "utf-8")
       (utf8->string bv)
       (let ((p (open-bytevector-input-port bv)))

test-suite/tests/iconv.test

@@ -112,4 +112,9 @@
       (string->bytevector s "ascii"))
 
     (pass-if-exception "encode as latin1" exception:encoding-error
-      (string->bytevector s "latin1"))))
+      (string->bytevector s "latin1"))
+
+    (pass-if "encode as ascii with substitutions"
+      (equal? (make-string (string-length s) #\?)
+              (bytevector->string (string->bytevector s "ascii" 'substitute)
+                                  "ascii")))))
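
For callers, the practical effect of this change is that the conversion strategy moves from a keyword argument to a third positional argument. The following is a minimal call-site sketch, not part of the commit. The sample string and the names `sample`, `bv`, `round-trip`, and `default-result` are illustrative assumptions, and the `'encoding-error` catch key is assumed from the `encoding-error` named in the documentation above.

```scheme
(use-modules (ice-9 iconv))

;; Illustrative input (an assumption, not taken from the commit): one
;; character outside ASCII (U+03BB) followed by an ASCII character.
(define sample (string (integer->char #x3bb) #\x))

;; Before this commit the strategy was passed by keyword:
;;   (string->bytevector sample "ascii" #:conversion-strategy 'substitute)
;; After it, the strategy is the optional third positional argument.
(define bv (string->bytevector sample "ascii" 'substitute))

;; Decoding the substituted bytes back to a string; per the test added in
;; this commit, each unencodable character comes back as #\?.
(define round-trip (bytevector->string bv "ascii"))

;; Omitting the argument keeps the default 'error strategy, which raises
;; an encoding error for the same input. The catch key is an assumption.
(define default-result
  (catch 'encoding-error
    (lambda () (string->bytevector sample "ascii"))
    (lambda args 'encoding-failed)))
```

Code that still passes `#:conversion-strategy` would need updating, since with `#:optional` the keyword object itself would be taken as the strategy value.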