Level: Introductory
Jean T. Anderson, developerWorks Contributing Author, IBM
20 Dec 2001 Introduces Informix Global Language Support (GLS) and explains internationalization issues for UDRs. Includes C source code.
© 2001 International Business Machines Corporation.
All rights reserved.
Introduction
An internationalized user-defined routine (UDR) supports different
language and cultural conventions without requiring the
source code to be modified and
recompiled.
The Informix® GLS Library is an API
that lets a UDR process
single and multibyte characters, and manage different
data formats for date, time and numeric values.
The GLS functions access locale-specific information
at runtime, so the client gets the correct behavior
without the UDR having to know the client locale.
This technical note highlights issues to consider when
implementing internationalized UDRs,
and points out the support you may need to additionally
provide for a specific language.
GLS Overview & Definitions
Global Language Support (GLS)
is the Informix feature that allows an Informix product,
a single application,
or UDR to support many languages.
This section summarizes the important terms.
GLS locale
Corresponds to a language, territory, and code set:
The language
specifies the kinds of characters that
can be used, how characters get sorted
and compared, alphabetic case conversion,
and messages.
The territory identifies
date, time, money, and numeric
formats.
The code set includes
single-byte and multibyte string
processing and code set conversions between
different representations of the same character
on client and server.
The default Informix locales are below:
| en_us.8859-1 | UNIX | 8859-1 is the ISO8859-1 code set. |
| en_us.1252 | Windows | 1252 is the Microsoft code set. |
The ISO8859-1 code set has alternate names:
0333 is the condensed form, and
IBM additionally registered ISO8859-1 as
CCSID 819. The following three locales therefore
refer to the same code set:
en_us.8859-1 en_us.0333 en_us.819
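As a rough illustration of how a locale name breaks down, the sketch below splits a name of the form language_territory.codeset into its parts and treats 8859-1, 0333, and 819 as the same code set. The names `locale_parts`, `parse_locale`, and `same_codeset` are hypothetical helpers written for this note; they are not part of the GLS API, and a real UDR would rely on the GLS library instead.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper type; not part of the Informix GLS API. */
struct locale_parts {
    char language[16];
    char territory[16];
    char codeset[32];
};

/* Split "language_territory.codeset" into its three parts.
 * Returns 0 on success, -1 if the name does not have that shape. */
int parse_locale(const char *name, struct locale_parts *out)
{
    const char *us  = strchr(name, '_');
    const char *dot = strchr(name, '.');
    size_t llen, tlen, clen;

    if (us == NULL || dot == NULL || dot < us)
        return -1;
    llen = (size_t)(us - name);
    tlen = (size_t)(dot - us - 1);
    clen = strlen(dot + 1);
    if (llen == 0 || tlen == 0 || clen == 0 ||
        llen >= sizeof out->language ||
        tlen >= sizeof out->territory ||
        clen >= sizeof out->codeset)
        return -1;

    memcpy(out->language, name, llen);
    out->language[llen] = '\0';
    memcpy(out->territory, us + 1, tlen);
    out->territory[tlen] = '\0';
    memcpy(out->codeset, dot + 1, clen + 1);   /* includes the terminator */
    return 0;
}

/* The three names this note lists for the ISO8859-1 code set. */
static int is_iso8859_1(const char *cs)
{
    return strcmp(cs, "8859-1") == 0 ||
           strcmp(cs, "0333") == 0 ||
           strcmp(cs, "819") == 0;
}

/* Nonzero if two code set names refer to the same code set. */
int same_codeset(const char *a, const char *b)
{
    return strcmp(a, b) == 0 || (is_iso8859_1(a) && is_iso8859_1(b));
}
```

With these helpers, parsing en_us.0333 and en_us.819 yields code set names that `same_codeset()` reports as equal, while 8859-1 and 1252 compare as different.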
Informix uses the following
environment variables to specify the locale:
| CLIENT_LOCALE | Locale that the client application uses. |
| DB_LOCALE | Locale of the data in the database. |
| SERVER_LOCALE | Locale that the database server uses for its server-specific files. |
Internationalization
Means implementing products, such as DataBlade™ modules,
so that they are language-independent.
An internationalized UDR is one
that can support different languages, territories, and
code sets without changing or recompiling the source code.
The Informix server is internationalized.
The Informix GLS Library
provides functions so that DataBlade modules and
client applications can use information in GLS
locales to:
process single-byte and multibyte characters
format date, time, and numeric data
Localization
Means resolving information for a specific locale at
runtime so that the internationalized product executes
properly for that locale; for example,
a localized application allows users to see error
messages in their own language and dates in the format
they expect.
Support for a specific locale is consolidated into
a set of files.
(Locale-specific DataBlade messages get stored in the
system catalogs.)
The UNIX glfiles command
outputs a list of locales that are supported on your
UNIX system.
For locales not included in the Informix distribution,
an International Language Supplement
provides locales, code set conversions, translated
user interfaces and documentation.
While the focus of this technical note is on UDR implementation,
DataBlade developers need to also consider internationalization
for any client component that is included as part of a DataBlade module.
GLS Products
GLS Documentation
DataBlade developers need the following documentation:
Informix Guide to GLS Functionality
This is the core document for implementing GLS-enabled
products.
Informix GLS Programmer's Manual
This manual describes how to use the Informix GLS API calls.
GLS Documentation and Release Notes
Always check the current distribution for new information:
$INFORMIXDIR/release
Informix DataBlade API Programmer's Manual
Discusses GLS features for raising errors and outputting
trace messages.
DataBlade Developers Kit User's Guide
DBDK generates source code that uses GLS calls.
All documents mentioned above
are available for download from the
Informix Online Documentation web site.
What Informix products does a DataBlade developer need?
You need the following Informix products to implement a DataBlade module
that supports multiple languages:
Informix Dynamic Server
DataBlade Developers Kit
BladeSmith generates code that uses many GLS library functions,
including for exception handling and tracing.
Client Software Development Kit
The International Language Supplement
(ILS) for any locales you plan to support
that are not included in the default distribution
(check with the glfiles command),
so that you can test the behavior
of your blade in each locale.
You also need the documentation listed above in the
GLS Documentation section.
What Informix products do customers need?
Your customers need the following products so they can run your
DataBlade module in their locale:
Customers must also correctly set up their GLS environment by
setting the CLIENT_LOCALE and DB_LOCALE environment variables
appropriately.
What needs to be internationalized/localized?
The DataBlade module developer must address both
internationalization issues
and localization issues.
First, you need to implement a UDR so that it is
language-independent.
That is, the DataBlade module does not make any assumptions
about the language, territory, or code set that is used at
runtime on the data that the module handles.
Then you need to port any language-dependent pieces,
such as error messages, to each locale that
you want to support.
This section summarizes some of the issues.
Text I/O
The Informix GLS Library provides functions that scan and format
multibyte character strings, dates, numbers, and currency,
using information in the current processing locale.
Use these functions to parse input strings or
format output strings.
They are described in:
Both manuals
are available for download from the
Informix Online Documentation web site.
An example of where this might apply to a UDR is in
processing the external (character-based) format of an opaque type.
DBDK generates code for opaque type
input/output support functions that
converts between GLS text and the internal
binary representation of the opaque type.
The mi_get_string()
and mi_put_string() DataBlade API
functions automatically perform code set conversions
on string/text data.
(For binary transmission of numeric data,
the mi_put_* and
mi_get_* functions,
which are used for opaque type send/receive
support functions,
also automatically perform code set conversions.)
Error Messages
The syserrors system catalog table stores
DataBlade exception messages that can be raised by passing
the MI_SQL message type to mi_db_error_raise().
Each error message entry has a locale associated with it.
When an error is raised, the server tries to match
the CLIENT_LOCALE with an error in syserrors.
It starts with an explicit match,
and if it doesn't find one, tries partial matches, searching
in the following order:
Exact client locale
For example: es_es.8859-1
Language + territory of client locale
For example: es_es
Just the language of the client locale
For example: es
Default (en_us) locale
en_us.8859-1 on UNIX, and
en_us.1252 on NT.
The way that the final default
en_us locale gets resolved
depends on whether the user set CLIENT_LOCALE or not:
If CLIENT_LOCALE is not set, the default locale
gets resolved based on a partial match of the
locale.
If en_us.1252 (NT)
errors are in the
syserrors table on a UNIX server,
the message will be found based on a partial
match of en_us.
If CLIENT_LOCALE is set, the default locale gets resolved
based on a complete match for that platform.
If en_us.1252 (NT)
errors are in the
syserrors table on a UNIX server,
the locale will be looked for based on a complete
match of en_us.8859-1, resulting
in a "Message cannot be found"
error.
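The search order described above can be modeled in a few lines of plain C. This is a simplified sketch of the behavior as this note describes it, not the server's implementation: `msg_entry` is a hypothetical in-memory stand-in for rows of syserrors, `sample_msgs` is made-up data, and edge cases in the real server may differ.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-in for the locale and message columns of syserrors. */
struct msg_entry {
    const char *locale;
    const char *message;
};

/* Sample rows: English registered only for the NT locale, plus Spanish. */
const struct msg_entry sample_msgs[] = {
    { "en_us.1252",   "R value is not within the valid range." },
    { "es_es.8859-1", "el valor R no calza entre los limites." }
};

/* Number of bytes in s before the first occurrence of stop (or all of s). */
static size_t span_to(const char *s, char stop)
{
    const char *p = strchr(s, stop);
    return p ? (size_t)(p - s) : strlen(s);
}

/* Find an entry whose locale begins with the first len bytes of key and
 * ends there at a component boundary ('_', '.', or end of string). */
static const char *match_prefix(const struct msg_entry *t, size_t n,
                                const char *key, size_t len)
{
    size_t i;
    for (i = 0; i < n; i++) {
        char next;
        if (strncmp(t[i].locale, key, len) != 0)
            continue;
        next = t[i].locale[len];
        if (next == '\0' || next == '_' || next == '.')
            return t[i].message;
    }
    return NULL;
}

/* client_locale is NULL when CLIENT_LOCALE is not set.
 * Returns NULL for the "Message cannot be found" case. */
const char *lookup_message(const struct msg_entry *t, size_t n,
                           const char *client_locale,
                           const char *platform_default)
{
    const char *m;

    if (client_locale == NULL)   /* unset: partial match on the default */
        return match_prefix(t, n, platform_default,
                            span_to(platform_default, '.'));

    /* 1. Exact client locale */
    m = match_prefix(t, n, client_locale, strlen(client_locale));
    if (m != NULL) return m;
    /* 2. Language + territory */
    m = match_prefix(t, n, client_locale, span_to(client_locale, '.'));
    if (m != NULL) return m;
    /* 3. Language only */
    m = match_prefix(t, n, client_locale, span_to(client_locale, '_'));
    if (m != NULL) return m;
    /* 4. Default locale: complete match for the platform */
    return match_prefix(t, n, platform_default, strlen(platform_default));
}
```

On a UNIX default of en_us.8859-1, this model returns the English message when the client locale is unset, the Spanish message for any es_es client locale, and NULL for a German client locale, because only en_us.1252 is registered.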
The 9.2 release of the server will perform a partial
match on the en_us locale whether CLIENT_LOCALE is set or not.
In the meantime, realize that BladeSmith by default generates
messages with a locale of en_us.1252.
Change the locale to en_us.8859-1 if your target
is a UNIX system. If you plan to support NT and UNIX,
register the same error messages for both locales.
You can also translate DataBlade exception messages to
each specific language you intend to support, and store those
messages in the syserrors system catalog table.
(Incidentally, if you want to insert localized messages using a UDR,
see the section below on
"Inserting Localized Error
Messages into syserrors from a UDR".)
See the section on "Exception Raising" in the
Informix DataBlade API Programmer's Manual
and the example below.
Trace Messages
The systracemsgs system catalog
stores DataBlade trace messages that can be output
with gl_tprintf().
Be sure to provide en_us
default messages in addition
to translating any trace messages intended for end-users.
See the section on "DataBlade API Support for an Internationalized
UDR" in the
Informix DataBlade API Programmer's Manual.
Query Processing
Informix supports multibyte characters for the names of
database objects, such as tables and columns.
And, of course, tables can store values that contain multibyte
characters in NCHAR and NVARCHAR columns.
Review any UDR that executes SQL queries
with mi_exec() or
mi_exec_prepared_statement().
If the query could access user-defined
database objects or fetch user-supplied data,
the UDR should be prepared to handle non-ASCII characters
when it creates the SQL statement
and when it fetches query results.
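One concrete hazard when building SQL text or copying fetched values into fixed-size buffers is cutting a multibyte character in half. The sketch below truncates a string at a character boundary. It assumes UTF-8 purely for illustration, because its byte layout is easy to check in plain C; a UDR would instead use the multibyte functions of the GLS library, which work for whatever code set the locale specifies. The function name `utf8_truncate` is hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* Copy at most max_bytes bytes of the UTF-8 string src into dst, which must
 * have room for max_bytes + 1 bytes, without splitting a multibyte
 * character. Returns the number of bytes copied. */
size_t utf8_truncate(char *dst, const char *src, size_t max_bytes)
{
    size_t len = strlen(src);

    if (len > max_bytes) {
        len = max_bytes;
        /* src[len] is the first byte that will NOT be copied. If it is a
         * continuation byte (10xxxxxx), the cut falls inside a character,
         * so back up to the start of that character. */
        while (len > 0 && ((unsigned char)src[len] & 0xC0) == 0x80)
            len--;
    }
    memcpy(dst, src, len);
    dst[len] = '\0';
    return len;
}
```

Truncating by byte count alone would leave a partial character at the end of the buffer, which is exactly the kind of value that later fails code set conversion or comparison.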
Examples
Error Messages
The RGBA DataBlade Module
implements an opaque type that manages computer color.
The input function for an RGBA converts the external
(textual) representation into the internal C data structure.
After the C structure has been populated, the DataBlade
code checks each element to see if it falls within 0-255.
If the user enters 256, an error gets raised that
looks like the following, depending on the user's locale:
| Locale | Error Message |
| en_us.1252 | (URGB1) - RGBAInput: R value 256 is not within the valid range of 0-255. |
| es_es.8859-1 | (URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255. |
This section describes what the source code needs to do,
what messages get stored in the syserrors
table, and how the database server decides which error to output.
The code below shows how the input function checks the red
component of the RGBA and raises an error.
if(Gen_RetVal->red < 0 || Gen_RetVal->red > 255)
    mi_db_error_raise
    (
        Gen_Con,                      /* Connection handle */
        MI_SQL,                       /* Message type */
        "URGB1",                      /* SQLSTATE in syserrors */
        "FUNCNAME%s",                 /* token takes string */
        "RGBAInput",                  /* value for FUNCNAME */
        "VALUE%d",                    /* token takes an integer */
        (mi_integer) Gen_RetVal->red, /* value for VALUE token */
        NULL                          /* No more tokens */
    );
The MI_SQL message type in the
mi_db_error_raise() call indicates
that the error should be looked up in the syserrors
system catalog table
based on the SQLSTATE value in the third argument.
If we select from the syserrors system catalog,
we see that there are two different errors for "URGB1":
> select * from syserrors where sqlstate='URGB1';
sqlstate URGB1
locale en_us.1252
level 0
seqno 1
message %FUNCNAME%: R value %VALUE% is not within the valid range of 0-255.
sqlstate URGB1
locale es_es.8859-1
level 0
seqno 1
message %FUNCNAME%: el valor R %VALUE% no calza entre los limites de 0-255.
These exception messages were inserted
into the syserrors system catalog table
with the following
SQL insert statements:
insert into informix.syserrors (level, seqno, sqlstate, locale, message)
values
(
0,
1,
"URGB1",
"en_us.1252",
"%FUNCNAME%: R value %VALUE% is not within the valid range of 0-255."
);
insert into informix.syserrors (level, seqno, sqlstate, locale, message)
values
(
0,
1,
"URGB1",
"es_es.8859-1",
"%FUNCNAME%: el valor R %VALUE% no calza entre los limites de 0-255."
);
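To make the relationship between the mi_db_error_raise() arguments and the stored message text concrete, the following plain-C sketch imitates the substitution of the %FUNCNAME% and %VALUE% tokens. It is an illustration only: the server performs this step internally, and `struct token` and `expand_tokens` are hypothetical names, not DataBlade API calls.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical name/value pair standing in for one token argument. */
struct token {
    const char *name;
    const char *value;
};

/* Expand %NAME% markers in tmpl using toks. Markers with no matching token
 * are copied unchanged. Output is truncated to fit outsz bytes, including
 * the terminating null. */
void expand_tokens(char *out, size_t outsz, const char *tmpl,
                   const struct token *toks, size_t ntoks)
{
    size_t used = 0;

    if (outsz == 0)
        return;
    while (*tmpl != '\0' && used + 1 < outsz) {
        const char *end = (*tmpl == '%') ? strchr(tmpl + 1, '%') : NULL;
        const char *val = NULL;

        if (end != NULL) {           /* candidate marker: look up its name */
            size_t nlen = (size_t)(end - tmpl - 1);
            size_t i;
            for (i = 0; i < ntoks; i++)
                if (strlen(toks[i].name) == nlen &&
                    strncmp(toks[i].name, tmpl + 1, nlen) == 0)
                    val = toks[i].value;
        }
        if (val != NULL) {           /* known token: copy its value */
            size_t vlen = strlen(val);
            if (vlen > outsz - 1 - used)
                vlen = outsz - 1 - used;
            memcpy(out + used, val, vlen);
            used += vlen;
            tmpl = end + 1;
        } else {                     /* ordinary character */
            out[used++] = *tmpl++;
        }
    }
    out[used] = '\0';
}
```

Given the English template stored above and the values RGBAInput and 256, the sketch produces the text that follows the (URGB1) prefix in the error shown to the user.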
The RGBA project was generated with the default locale used by
BladeSmith: en_us.1252. We'll see in
a moment what effect that has when an error is encountered on
a UNIX machine.
The following example assumes that it is run on a Solaris machine,
which has a default locale of en_us.8859-1.
If CLIENT_LOCALE is not set, the server looks for a partial
match on the default
U.S. English locale and outputs that message:
bladerunner% echo $CLIENT_LOCALE
CLIENT_LOCALE: Undefined variable
bladerunner% dbaccess colors -
> create table colors (id serial, color rgba);
> insert into colors values (0, '255 0 255 0');
> insert into colors values (0, '256 0 255 0');
(URGB1) - RGBAInput: R value 256 is not within the valid range of 0-255.
If CLIENT_LOCALE is set, the server outputs the message for that locale,
if it exists:
bladerunner% setenv CLIENT_LOCALE es_es.8859-1
bladerunner% dbaccess colors -
> insert into colors values (0, '256 0 255 0');
(URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255.
First the server tries to match the entire locale.
If it does not find a match, it
tries to match language and territory. If it can't find that,
it tries to match just the language.
For example,
the Spanish message was entered into syserrors
with a locale of es_es.8859-1.
The message still gets resolved if the client has
a different code set:
bladerunner% setenv CLIENT_LOCALE es_es.CP1252
bladerunner% dbaccess colors -
> insert into colors values (0, '256 0 255 0');
(URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255.
Next we set CLIENT_LOCALE to a German locale, which does not
have an entry in syserrors.
If the server cannot find a message for the locale based on a partial
match, it looks for a complete match on the
en_us locale.
In this case, our default UNIX locale is
en_us.8859-1, but
the en_us error messages are for
en_us.1252,
so it outputs a "message cannot be found" error:
bladerunner% setenv CLIENT_LOCALE de_de.8859-1
bladerunner% dbaccess colors -
> insert into colors values (0, '256 0 255 0');
(URGB1) - Message cannot be found.
So, be sure to insert English messages in the default locale
for each platform you plan to run on:
en_us.8859-1 (UNIX) and en_us.1252 (NT).
Starting in 9.2, the server will complete a partial
match on the en_us locale, and it will no longer
be necessary to insert duplicate en_us errors.
Floating Point Values
ifx_gl_format_number() and
ifx_gl_convert_number()
convert between a text string and a floating point variable.
However, both functions store the float value as a decimal.
Since the range of a double precision value is greater than
that of a decimal,
these functions fail to convert very large float values.
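The range mismatch can be seen with a few lines of plain C. The limits below assume that an Informix floating-point DECIMAL spans roughly 10^-130 to 10^124 in magnitude; treat those constants as an assumption to verify against your server documentation. `fits_decimal_range` is a hypothetical helper, not a GLS function.

```c
/* Assumed magnitude limits of a floating-point DECIMAL (see lead-in). */
#define ASSUMED_DEC_MAX 1.0e124
#define ASSUMED_DEC_MIN 1.0e-130

/* Nonzero if d lies within the assumed DECIMAL range.
 * Zero is always representable. */
int fits_decimal_range(double d)
{
    double mag = (d < 0.0) ? -d : d;

    if (mag == 0.0)
        return 1;
    return mag >= ASSUMED_DEC_MIN && mag <= ASSUMED_DEC_MAX;
}
```

Under that assumption, 1234.5 and 9.875e-43 fit in a decimal, while 1.0e+150 is a valid double that lies outside the decimal range.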
Two new functions in 9.2,
ifx_gl_format_double() and
ifx_gl_convert_double(),
use a double
instead of a decimal.
Unfortunately, function prototypes for the new functions
were inadvertently omitted from
$INFORMIXDIR/incl/public/ifxgls.h,
so they are included below:
MI_EXT_DECL int
ifx_gl_convert_double (double *d,
                       char   *dstr,
                       char   *format);

MI_EXT_DECL int
ifx_gl_format_double (char   *dstr,
                      int     len,
                      double  d,
                      char   *format);
The
GetDouble()
UDR below shows how to convert an mi_lvarchar UDR argument into
a double precision value.
If the integer flag passed to it is 0, it performs the conversion
using
ifx_gl_convert_number().
Otherwise, it performs the conversion using
ifx_gl_convert_double().
#include <ifxgls.h>
#include <mi.h>

mi_double_precision *
GetDouble (mi_lvarchar *Gen_param1, mi_integer flag, MI_FPARAM *fp)
{
    mi_double_precision *retval=NULL;
    gl_mchar_t          *Gen_InData;
    int                  status, glerror;   /* ifx_gl_* return code */
    mi_decimal           dec_number;        /* for ifx_gl_convert_number */
    double               d=0;               /* double result */
    mi_string            errbuf[80];

    /* Allocate the return value. */
    retval =
        (mi_double_precision *)mi_alloc(sizeof(mi_double_precision));
    if(retval == NULL)
    {
        mi_fp_setreturnisnull(fp, 0, MI_TRUE);
        mi_db_error_raise(NULL, MI_EXCEPTION, "mi_alloc failed!");
        return (mi_double_precision *)NULL;
    }

    /* Convert mi_lvarchar argument into a NULL-terminated string. */
    Gen_InData = (gl_mchar_t *)mi_lvarchar_to_string(Gen_param1);

    /* Convert the null-terminated string to a double.
    ** If the return value is not 0, the conversion failed and
    ** ifx_gl_lc_errno() retrieves a more specific error code.
    */
    if(flag == 0) /* use ifx_gl_convert_number() */
    {
        /* ifx_gl_convert_number() stores the result in a decimal
        ** variable, so it handles a smaller range than a double.
        */
        status=ifx_gl_convert_number(&dec_number, Gen_InData, "%e");
        if(status != 0)
            glerror=ifx_gl_lc_errno();
        else
            dectodbl(&dec_number, (double *)&d); /* convert to double */
    }
    else /* use ifx_gl_convert_double() */
    {
        /* ifx_gl_convert_double() stores the result in a double
        ** variable.
        */
        status=ifx_gl_convert_double(&d,
            (char *)Gen_InData, (char *)"%e");
        if(status != 0)
            glerror=ifx_gl_lc_errno();
    }

    if(status != 0)
    {
        switch (glerror)
        {
            case IFX_GL_INVALIDFMT:
                sprintf(errbuf,
                    "GetDouble: conversion failed [%d:IFX_GL_INVALIDFMT]",
                    status);
                break;
            case IFX_GL_PARAMERR:
                sprintf(errbuf,
                    "GetDouble: conversion failed [%d:IFX_GL_PARAMERR]",
                    status);
                break;
            default:
                sprintf(errbuf,
                    "GetDouble: conversion failed [%d:%d]!",
                    status,glerror);
                break;
        }
        mi_fp_setreturnisnull(fp, 0, MI_TRUE);
        mi_db_error_raise(NULL, MI_EXCEPTION, errbuf);
        return (mi_double_precision *)NULL;
    }

    mi_free(Gen_InData); /* mi_lvarchar_to_string() allocated val */
    *retval = (mi_double_precision) d;
    return retval;
}
If the integer argument passed to
GetDouble()
is 0,
then the underlying code uses
ifx_gl_convert_number().
The numbers in the two queries that follow
are small enough to be stored in a decimal:
execute function GetDouble("1234.5",0);
(expression)
1234.500000000
1 row(s) retrieved.
execute function GetDouble("9.875e-43",0);
(expression)
9.875e-43
1 row(s) retrieved.
However, the next number is too big for a decimal, so the
query fails:
execute function GetDouble("1.000000e+150",0);
(expression)
(U0001) - GetDouble: conversion failed [-1:IFX_GL_PARAMERR]
Error in line 24
Near character position 45
If the integer argument passed to
GetDouble()
is 1,
then the underlying code uses
ifx_gl_convert_double(),
which stores the result directly in a double.
All the queries succeed.
execute function GetDouble("1234.5",1);
(expression)
1234.500000000
1 row(s) retrieved.
execute function GetDouble("9.875e-43",1);
(expression)
9.875e-43
1 row(s) retrieved.
execute function GetDouble("1.000000e+150",1);
(expression)
1.000000e+150
1 row(s) retrieved.
Inserting Localized Error Messages into syserrors from a UDR
DataBlade developers may need to insert into the syserrors table localized
error messages whose locale differs from that of the current session
(for example, you may want to install SJIS messages while DB_LOCALE for the
current session is set to EUC). The normal process of loading the messages via
an SQL script or from a UDR using
mi_exec()
is not reliable when the locale
of the messages is different from the session locale, because characters
not recognized by the SQL parser trigger an error condition.
It is possible to bypass this SQL parser problem by creating a UDR that
loads the error messages using a prepared statement
(mi_prepare())
containing
placeholders for the sqlstate and message data. The data is later supplied
in the call to execute the prepared statement
(mi_exec_prepared_statement()).
The following UDR code demonstrates how the sqlstate and message column
strings for a locale can be placed in a message array and then passed
as data to insert messages into syserrors. The locale demonstrated
is en_us.8859-1 for readability, but the method can be used for any locale.
#include <stdio.h>
#include <string.h>
#include "mi.h"

#define MAX_MSG 3

/* Each row holds an SQLSTATE value and its message text. */
char *enus_msg[MAX_MSG][2] = {
    { "XT010", "First error message for insertion"  },
    { "XT020", "Second error message for insertion" },
    { "XT030", "Third error message for insertion"  }
};

/*
 * Title:   gls_insert_enus
 * Purpose: Add localized messages to the system error table 'syserrors'
 *          for given locale independent of locale setting of session.
 */
mi_integer
gls_insert_enus()
{
    MI_DATUM       args[2];   /* pointers to column values */
    mi_integer     lens[2];   /* lengths of column values */
    mi_integer     nulls[2];  /* MI_TRUE if a column value is NULL */
    mi_string     *types[2];  /* types of columns */
    mi_integer     i;
    MI_STATEMENT  *stmt;
    MI_CONNECTION *conn = mi_open(NULL, NULL, NULL);

    if (conn == NULL)
        mi_db_error_raise(NULL, MI_EXCEPTION, "gls_insert_enus: mi_open failed");

    /*
     * Prepare statement using placeholder values for sqlstate and message
     * columns while providing fixed values for locale, level, seqno columns.
     */
    stmt = mi_prepare(conn,
        "insert into syserrors values(?,'en_us.8859-1',0,1,?)", NULL);
    if (stmt == NULL)
        mi_db_error_raise(conn, MI_EXCEPTION, "gls_insert_enus: mi_prepare failed");

    for (i=0; i<MAX_MSG; i++)   /* Loop through the message array */
    {
        args[0]  = (MI_DATUM)enus_msg[i][0];            /* Set pointer to sqlstate string */
        lens[0]  = (mi_integer)strlen(enus_msg[i][0]);  /* Set length of sqlstate string */
        nulls[0] = MI_FALSE;                            /* Value is not NULL */
        types[0] = "char(5)";                           /* Set sqlstate column type */
        args[1]  = (MI_DATUM)enus_msg[i][1];            /* Set pointer to message string */
        lens[1]  = (mi_integer)strlen(enus_msg[i][1]);  /* Set length of message string */
        nulls[1] = MI_FALSE;                            /* Value is not NULL */
        types[1] = "varchar(255)";                      /* Set message column type */

        if (mi_exec_prepared_statement(stmt,0,0,2,args,lens,nulls,types,0,NULL) != MI_OK)
            mi_db_error_raise(conn, MI_EXCEPTION, "gls_insert_enus: insert failed");
    }

    mi_drop_prepared_statement(stmt);   /* Release the prepared statement */
    mi_close(conn);
    return 0;
}
Known Problems
94450: ifx_gl_format_number() outputs %g incorrectly if value is very large
If a string in exponential notation is very large, a blank gets inserted
into the output after the "e".
For example, this value:
.123456789012e80
Becomes this:
1.234567e 79
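Until the problem is fixed, a UDR could patch the formatted string after the call. The sketch below is one possible workaround, not an Informix-supplied fix: it assumes the stray blank sits where a '+' sign would normally appear, and `patch_exponent_blank` is a hypothetical name.

```c
#include <ctype.h>
#include <stddef.h>

/* Replace a blank that directly follows the exponent marker of a number
 * with '+', so "1.234567e 79" becomes "1.234567e+79".
 * Returns 1 if the string was changed, 0 otherwise. */
int patch_exponent_blank(char *s)
{
    size_t i;

    /* Start at 1 so that s[i - 1] is always a valid index. */
    for (i = 1; s[0] != '\0' && s[i] != '\0'; i++) {
        if ((s[i] == 'e' || s[i] == 'E') &&
            isdigit((unsigned char)s[i - 1]) &&
            s[i + 1] == ' ' &&
            isdigit((unsigned char)s[i + 2])) {
            s[i + 1] = '+';
            return 1;
        }
    }
    return 0;
}
```

The check for a digit on both sides of the marker keeps the function from touching ordinary text in which a word ending in "e" happens to precede a number.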
For more information about this topic
The RGBA DataBlade module implements
the Spanish error messages used by this tech note.
DBDK generates trace messages that use the GL_DPRINTF GLS tracing macro.
The
Debugging Using the DBDK-Generated Tracing Features
IDN tech note describes how to use it.
The Gen_sscanf() C function generated by BladeSmith
shows how to use ifx_gl_convert_number() to convert
input into various numeric types.
The examples distributed with the DataBlade Developers Kit
include a "Strings" DataBlade module that shows how
to manipulate character strings.
Glossary
Terms and acronyms used by this tech note include:
GLS
Global Language Support.
Locale
Language, territory and code set
Localize
Make software work for a specific locale
Internationalize
Make software work for any locale
UDR
User-Defined Routine
About the author
Jean Anderson is an IBM contributing author.