guile/doc/ref/api-i18n.texi

@c -*-texinfo-*-
@c This is part of the GNU Guile Reference Manual.
@c Copyright (C)  1996, 1997, 2000, 2001, 2002, 2003, 2004, 2006
@c   Free Software Foundation, Inc.
@c See the file guile.texi for copying conditions.

@page
@node Internationalization
@section Support for Internationalization

@cindex internationalization
@cindex i18n

Guile provides internationalization support for Scheme programs in two
ways.  First, procedures to manipulate text and data in a way that
conforms to particular cultural conventions (i.e., in a
``locale-dependent'' way) are provided by the @code{(ice-9 i18n)} module.
Second, Guile allows the use of GNU @code{gettext} to translate
program message strings.

@menu
* The ice-9 i18n Module::       Honoring cultural conventions.
* Gettext Support::             Translating message strings.
@end menu


@node The ice-9 i18n Module
@subsection The @code{(ice-9 i18n)} Module

In order to make use of the following functions, one must import the
@code{(ice-9 i18n)} module in the usual way:

@example
(use-modules (ice-9 i18n))
@end example

@cindex libguile-i18n-v-@value{LIBGUILE_I18N_MAJOR}

C programs can use the C functions corresponding to the procedures of
this module by including @code{<libguile/i18n.h>} and by linking
against @code{libguile-i18n-v-@value{LIBGUILE_I18N_MAJOR}}.

@cindex cultural conventions

The @code{(ice-9 i18n)} module provides procedures to manipulate text
and other data in a way that conforms to the cultural conventions
chosen by the user.  Each region of the world or language has its own
customs to, for instance, represent real numbers, classify characters,
collate text, etc.  All these aspects comprise the so-called
``cultural conventions'' of that region or language.

@cindex locale
@cindex locale category

Computer systems typically refer to a set of cultural conventions as a
@dfn{locale}.  For each particular aspect that makes up those cultural
conventions, a @dfn{locale category} is defined.  For instance, the
way characters are classified is defined by the @code{LC_CTYPE}
category, while the language in which program messages are issued to
the user is defined by the @code{LC_MESSAGES} category
(@pxref{Locales, General Locale Information} for details).

@cindex locale object

The procedures provided by this module allow the development of
programs that adapt automatically to any locale setting.  As we will
see later, many of the locale-dependent procedures provided by this
module can optionally take a @dfn{locale object} argument.  This
additional argument defines the locale settings that must be followed
by the invoked procedure.  When it is omitted, then the current locale
settings of the process are followed (@pxref{Locales,
@code{setlocale}}).

The following procedures allow the manipulation of such locale
objects.

@deffn {Scheme Procedure} make-locale category-mask locale-name [base-locale]
@deffnx {C Function} scm_make_locale (category_mask, locale_name, base_locale)
Return a reference to a data structure representing a set of locale
datasets.  @var{locale-name} should be a string denoting a particular
locale, e.g., @code{"aa_DJ"}.  Unlike the @var{category} parameter of
@code{setlocale}, the @var{category-mask} parameter here uses a single
bit for each category and is made by OR'ing together @code{LC_*_MASK}
bits.  The optional @var{base-locale} argument can be used to specify
a locale object whose settings are to be used as a basis for the
locale object being returned.

The available locale category masks are the following:

@defvar LC_COLLATE_MASK
Represents the collation locale category.
@end defvar
@defvar LC_CTYPE_MASK
Represents the character classification locale category.
@end defvar
@defvar LC_MESSAGES_MASK
Represents the messages locale category.
@end defvar
@defvar LC_MONETARY_MASK
Represents the monetary locale category.
@end defvar
@defvar LC_NUMERIC_MASK
Represents the way numbers are displayed.
@end defvar
@defvar LC_TIME_MASK
Represents the way date and time are displayed.
@end defvar

The following category masks are also available but will not have any
effect on systems that do not support them:

@defvar LC_PAPER_MASK
@defvarx LC_NAME_MASK
@defvarx LC_ADDRESS_MASK
@defvarx LC_TELEPHONE_MASK
@defvarx LC_MEASUREMENT_MASK
@defvarx LC_IDENTIFICATION_MASK
@end defvar

Finally, there is also:

@defvar LC_ALL_MASK
This represents all the locale categories supported by the system.
@end defvar

The @code{LC_*_MASK} variables are bound to integers which may be OR'd
together using @code{logior} (@pxref{Primitive Numerics,
@code{logior}}).  For instance, the following invocation creates a
locale object that combines the use of Esperanto for messages and
character classification with the default settings for the other
categories (i.e., the settings of the default @code{C} locale which
usually represents conventions in use in the USA):

@example
(make-locale (logior LC_MESSAGES_MASK LC_CTYPE_MASK) "eo_EO")
@end example

The following example combines the use of Swedish conventions with
monetary conventions from Croatia:

@example
(make-locale LC_MONETARY_MASK "hr_HR"
             (make-locale LC_ALL_MASK "sv_SE"))
@end example

A @code{system-error} exception (@pxref{Handling Errors}) is raised by
@code{make-locale} when @var{locale-name} does not match any of the
locales compiled on the system.  Note that on non-GNU systems, this
error may be raised later, when the locale object is actually used.

@end deffn

@deffn {Scheme Procedure} locale? obj
@deffnx {C Function} scm_locale_p (obj)
Return true if @var{obj} is a locale object.
@end deffn

The following procedures provide support for text collation.

@deffn {Scheme Procedure} string-locale<? s1 s2 [locale]
@deffnx {C Function} scm_string_locale_lt (s1, s2, locale)
Return true if string @var{s1} is lower than @var{s2} according to
locale-dependent collation rules.  If @var{locale} is provided, it
should be a locale object (as returned by @code{make-locale}) and will
be used to perform the comparison; otherwise, the current system
locale is used.
@end deffn

@deffn {Scheme Procedure} string-locale>? s1 s2 [locale]
@deffnx {C Function} scm_string_locale_gt (s1, s2, locale)
Return true if string @var{s1} is greater than @var{s2} according to
locale-dependent collation rules.  If @var{locale} is provided, it
should be a locale object (as returned by @code{make-locale}) and will
be used to perform the comparison; otherwise, the current system
locale is used.
@end deffn

@deffn {Scheme Procedure} string-locale-ci<? s1 s2 [locale]
@deffnx {C Function} scm_string_locale_ci_lt (s1, s2, locale)
Return true if string @var{s1} is lower than @var{s2} in a
case-insensitive and locale-dependent way.  If @var{locale} is
provided, it should be a locale object (as returned by
@code{make-locale}) and will be used to perform the comparison;
otherwise, the current system locale is used.
@end deffn

@deffn {Scheme Procedure} string-locale-ci>? s1 s2 [locale]
@deffnx {C Function} scm_string_locale_ci_gt (s1, s2, locale)
Return true if string @var{s1} is greater than @var{s2} in a
case-insensitive and locale-dependent way.  If @var{locale} is
provided, it should be a locale object (as returned by
@code{make-locale}) and will be used to perform the comparison;
otherwise, the current system locale is used.
@end deffn

@deffn {Scheme Procedure} string-locale-ci=? s1 s2 [locale]
@deffnx {C Function} scm_string_locale_ci_eq (s1, s2, locale)
Return true if string @var{s1} is equal to @var{s2} in a
case-insensitive and locale-dependent way.  If @var{locale} is
provided, it should be a locale object (as returned by
@code{make-locale}) and will be used to perform the comparison;
otherwise, the current system locale is used.
@end deffn

@deffn {Scheme Procedure} char-locale<? c1 c2 [locale]
@deffnx {C Function} scm_char_locale_lt (c1, c2, locale)
Return true if character @var{c1} is lower than @var{c2} according to
@var{locale} or to the current locale.
@end deffn

@deffn {Scheme Procedure} char-locale>? c1 c2 [locale]
@deffnx {C Function} scm_char_locale_gt (c1, c2, locale)
Return true if character @var{c1} is greater than @var{c2} according
to @var{locale} or to the current locale.
@end deffn

@deffn {Scheme Procedure} char-locale-ci<? c1 c2 [locale]
@deffnx {C Function} scm_char_locale_ci_lt (c1, c2, locale)
Return true if character @var{c1} is lower than @var{c2}, in a
case-insensitive way, according to @var{locale} or to the current
locale.
@end deffn

@deffn {Scheme Procedure} char-locale-ci>? c1 c2 [locale]
@deffnx {C Function} scm_char_locale_ci_gt (c1, c2, locale)
Return true if character @var{c1} is greater than @var{c2}, in a
case-insensitive way, according to @var{locale} or to the current
locale.
@end deffn

@deffn {Scheme Procedure} char-locale-ci=? c1 c2 [locale]
@deffnx {C Function} scm_char_locale_ci_eq (c1, c2, locale)
Return true if character @var{c1} is equal to @var{c2}, in a
case-insensitive way, according to @var{locale} or to the current
locale.
@end deffn
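
For instance, the string predicates above can be passed to @code{sort}
to order a list.  The following is only a sketch: the list contents are
made up, and the explicit locale object uses the @code{C} locale because
it is always available.  Any other installed locale name, such as
@code{"sv_SE"}, could be substituted.

@example
(use-modules (ice-9 i18n))

(define names (list "mango" "Zebra" "apple"))

;; Order NAMES following the current locale of the process; the
;; result depends on the user's settings.
(sort names string-locale<?)

;; Order NAMES following an explicit locale object instead.  The
;; "C" locale compares character codes, so upper case sorts first.
(define c-locale (make-locale LC_COLLATE_MASK "C"))
(sort names
      (lambda (s1 s2)
        (string-locale<? s1 s2 c-locale)))
@result{} ("Zebra" "apple" "mango")
@end example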

The procedures below provide support for ``character case mapping'',
i.e., to convert characters or strings to their upper-case or
lower-case equivalent.  Note that SRFI-13 provides procedures that
look similar (@pxref{Alphabetic Case Mapping}).  However, the SRFI-13
procedures are locale-independent.  Therefore, they do not take into
account specificities of the customs in use in a particular language
or region of the world.  For instance, while most languages using the
Latin alphabet map lower-case letter ``i'' to upper-case letter ``I'',
Turkish maps lower-case ``i'' to ``Latin capital letter I with dot
above''.  The following procedures provide idiomatic, locale-aware
character mapping.

@deffn {Scheme Procedure} char-locale-downcase chr [locale]
@deffnx {C Function} scm_char_locale_downcase (chr, locale)
Return the lowercase character that corresponds to @var{chr} according
to either @var{locale} or the current locale.
@end deffn

@deffn {Scheme Procedure} char-locale-upcase chr [locale]
@deffnx {C Function} scm_char_locale_upcase (chr, locale)
Return the uppercase character that corresponds to @var{chr} according
to either @var{locale} or the current locale.
@end deffn

@deffn {Scheme Procedure} string-locale-upcase str [locale]
@deffnx {C Function} scm_string_locale_upcase (str, locale)
Return a new string that is the uppercase version of @var{str}
according to either @var{locale} or the current locale.
@end deffn

@deffn {Scheme Procedure} string-locale-downcase str [locale]
@deffnx {C Function} scm_string_locale_downcase (str, locale)
Return a new string that is the lowercase version of @var{str}
according to either @var{locale} or the current locale.
@end deffn
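
To illustrate the difference between the two families of procedures,
the sketch below upcases the same string with the locale-independent
@code{string-upcase} and with @code{string-locale-upcase} under a
Turkish locale.  It assumes that a @code{tr_TR} locale is installed.
The result of the second call depends on that locale's data and
encoding, so it is not shown.

@example
(use-modules (ice-9 i18n))

;; Locale-independent mapping (SRFI-13).
(string-upcase "istanbul")
@result{} "ISTANBUL"

;; Locale-dependent mapping following Turkish conventions.
;; Raises `system-error' if "tr_TR" is not installed.
(define turkish (make-locale LC_CTYPE_MASK "tr_TR"))
(string-locale-upcase "istanbul" turkish)
@end example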

Finally, the following procedures allow programs to read numbers
written according to a particular locale.  As an example, in English,
``ten thousand and a half'' is usually written @code{10,000.5} while
in French it is written @code{10000,5}.  These procedures account for
such differences.

@deffn {Scheme Procedure} locale-string->integer str [base [locale]]
@deffnx {C Function} scm_locale_string_to_integer (str, base, locale)
Convert string @var{str} into an integer according to either
@var{locale} (a locale object as returned by @code{make-locale}) or
the current process locale.  If @var{base} is specified, then it
determines the base of the integer being read (e.g., @code{16} for a
hexadecimal number, @code{10} for a decimal number); by default,
decimal numbers are read.  Return two values: an integer (on success)
or @code{#f}, and the number of characters read from @var{str}
(@code{0} on failure).
@end deffn

@deffn {Scheme Procedure} locale-string->inexact str [locale]
@deffnx {C Function} scm_locale_string_to_inexact (str, locale)
Convert string @var{str} into an inexact number according to either
@var{locale} (a locale object as returned by @code{make-locale}) or
the current process locale.  Return two values: an inexact number (on
success) or @code{#f}, and the number of characters read from
@var{str} (@code{0} on failure).
@end deffn
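
Since both procedures return two values, callers typically receive
them with @code{call-with-values}.  The following sketch wraps
@code{locale-string->integer} in a hypothetical helper,
@code{parse-integer}, which is not part of Guile.  It returns the
integer only when the whole string was consumed.  The results shown
assume that the process still uses the default @code{C} locale.

@example
(use-modules (ice-9 i18n))

;; Hypothetical helper (not part of Guile): return the integer
;; denoted by STR in BASE, or #f unless all of STR was consumed.
(define (parse-integer str base)
  (call-with-values
      (lambda () (locale-string->integer str base))
    (lambda (number count)
      (and number
           (= count (string-length str))
           number))))

(parse-integer "ff" 16)     @result{} 255
(parse-integer "12abc" 10)  @result{} #f
(parse-integer "zz" 10)     @result{} #f
@end example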


@node Gettext Support
@subsection Gettext Support

Guile provides an interface to GNU @code{gettext} for translating
message strings (@pxref{Introduction,,, gettext, GNU @code{gettext}
utilities}).

Messages are collected in domains, so different libraries and programs
maintain different message catalogs.  The @var{domain} parameter in
the functions below is a string (it becomes part of the message
catalog filename).

When @code{gettext} is not available, or if Guile was configured with
@samp{--without-nls}, dummy functions doing no translation are
provided.  When @code{gettext} support is available in Guile, the
@code{i18n} feature is provided (@pxref{Feature Tracking}).

@deffn {Scheme Procedure} gettext msg [domain [category]]
@deffnx {C Function} scm_gettext (msg, domain, category)
Return the translation of @var{msg} in @var{domain}.  @var{domain} is
optional and defaults to the domain set through @code{textdomain}
below.  @var{category} is optional and defaults to @code{LC_MESSAGES}
(@pxref{Locales}).

Normal usage is for @var{msg} to be a literal string.
@command{xgettext} can extract those from the source to form a message
catalogue ready for translators (@pxref{xgettext Invocation,, Invoking
the @command{xgettext} Program, gettext, GNU @code{gettext}
utilities}).

@example
(display (gettext "You are in a maze of twisty passages."))
@end example

@code{_} is a commonly used shorthand; an application can make it an
alias for @code{gettext}.  Alternatively, a library can define it to
use its own @var{domain}, so that an application can change the
default domain without affecting the library.

@example
(define (_ msg) (gettext msg "mylibrary"))
(display (_ "File not found."))
@end example

@code{_} is also a good place to perhaps strip disambiguating extra
text from the message string, as for instance in @ref{GUI program
problems,, How to use @code{gettext} in GUI programs, gettext, GNU
@code{gettext} utilities}.
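
As a sketch of that idea, the definition below assumes a convention in
which ambiguous messages carry a context prefix ending in @samp{|}, as
in @code{"Menu|File|Open"}.  When no translation is found,
@code{gettext} returns @var{msg} unchanged, so the prefix is stripped
before the string is shown.  The domain name and the prefix convention
are illustrative choices, not something Guile prescribes.

@example
;; Translate MSG in the "mylibrary" domain.  If MSG comes back
;; untranslated, drop any context prefix up to the last `|'.
(define (_ msg)
  (let ((translated (gettext msg "mylibrary")))
    (if (string=? translated msg)
        (let ((bar (string-rindex msg #\|)))
          (if bar
              (substring msg (+ bar 1))
              msg))
        translated)))

(_ "Menu|File|Open")
@result{} "Open"        ; when no translation is installed
@end example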
@end deffn

@deffn {Scheme Procedure} ngettext msg msgplural n [domain [category]]
@deffnx {C Function} scm_ngettext (msg, msgplural, n, domain, category)
Return the translation of @var{msg}/@var{msgplural} in @var{domain},
with a plural form chosen appropriately for the number @var{n}.
@var{domain} is optional and defaults to the domain set through
@code{textdomain} below.  @var{category} is optional and defaults to
@code{LC_MESSAGES} (@pxref{Locales}).

@var{msg} is the singular form, and @var{msgplural} the plural.  When
no translation is available, @var{msg} is used if @math{@var{n} = 1},
or @var{msgplural} otherwise.  When translated, the message catalogue
can have a different rule, and can have more than two possible forms.

As per @code{gettext} above, normal usage is for @var{msg} and
@var{msgplural} to be literal strings, since @command{xgettext} can
extract them from the source to build a message catalogue.  For
example,

@example
(define (done n)
  (format #t (ngettext "~a file processed\n"
                       "~a files processed\n" n)
             n))

(done 1) @print{} 1 file processed
(done 3) @print{} 3 files processed
@end example

It's important to use @code{ngettext} rather than plain @code{gettext}
for plurals, since the rules for singular and plural forms in English
are not the same in other languages.  Only @code{ngettext} will allow
translators to give correct forms (@pxref{Plural forms,, Additional
functions for plural forms, gettext, GNU @code{gettext} utilities}).
@end deffn

@deffn {Scheme Procedure} textdomain [domain]
@deffnx {C Function} scm_textdomain (domain)
Get or set the default gettext domain.  When called with no parameter,
the current domain is returned.  When called with a parameter,
@var{domain} is set as the current domain, and that new value is
returned.  For example,

@example
(textdomain "myprog")
@result{} "myprog"
@end example
@end deffn

@deffn {Scheme Procedure} bindtextdomain domain [directory]
@deffnx {C Function} scm_bindtextdomain (domain, directory)
Get or set the directory under which to find message files for
@var{domain}.  When called without a @var{directory}, the current
setting is returned.  When called with a @var{directory},
@var{directory} is set for @var{domain} and that new setting is
returned.  For example,

@example
(bindtextdomain "myprog" "/my/tree/share/locale")
@result{} "/my/tree/share/locale"
@end example

When using Autoconf/Automake, an application should arrange for the
configured @code{localedir} to get into the program (by substituting,
or by generating a config file) and set that for its domain.  This
ensures the catalogue can be found even when installed in a
non-standard location.
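
A typical start-up sequence can be sketched as follows.  The directory
is the hypothetical path from the example above; in a real program it
would be the configured @code{localedir}.  @code{setlocale} is wrapped
in @code{false-if-exception} on the assumption that the program should
keep running with untranslated messages when the user's environment
names a locale that is not installed.

@example
;; Adopt the locale settings from the user's environment, if valid.
(false-if-exception (setlocale LC_ALL ""))

;; Tell gettext where the catalogs of "myprog" live, and make
;; "myprog" the default domain for subsequent `gettext' calls.
(bindtextdomain "myprog" "/my/tree/share/locale")
(textdomain "myprog")

(display (gettext "File not found."))
(newline)
@end example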
@end deffn

@deffn {Scheme Procedure} bind-textdomain-codeset domain [encoding]
@deffnx {C Function} scm_bind_textdomain_codeset (domain, encoding)
Get or set the text encoding to be used by @code{gettext} for messages
from @var{domain}.  @var{encoding} is a string, the name of a coding
system, for instance @nicode{"8859_1"}.  (On a Unix/POSIX system the
@command{iconv} program can list all available encodings.)

When called without an @var{encoding}, the current setting is returned,
or @code{#f} if none has been set yet.  When called with an
@var{encoding}, it is set for @var{domain} and that new setting is
returned.  For example,

@example
(bind-textdomain-codeset "myprog")
@result{} #f
(bind-textdomain-codeset "myprog" "latin-9")
@result{} "latin-9"
@end example

The requested encoding can differ from that of the translated data
file; messages will be recoded as necessary.  But note that when there
is no translation, @code{gettext} returns its @var{msg} unchanged,
i.e., without any recoding.  For that reason, source message strings
are best kept as plain ASCII.

Currently Guile has no understanding of multi-byte characters, and
string functions won't recognise character boundaries in multi-byte
strings.  An application will at least be able to pass such strings
through to some output though.  Perhaps this will change in the
future.
@end deffn

@c Local Variables:
@c TeX-master: "guile.texi"
@c ispell-local-dictionary: "american"
@c End: