
Internationalizing UDRs



Level: Introductory

Jean T. Anderson, developerWorks Contributing Author, IBM

20 Dec 2001

Introduces Informix Global Language Support (GLS) and explains internationalization issues for UDRs. Includes C source code.

© 2001 International Business Machines Corporation. All rights reserved.

Introduction

An internationalized user-defined routine (UDR) supports different language and cultural conventions without requiring the source code to be modified and recompiled.

The Informix® GLS Library is an API that lets a UDR process single-byte and multibyte characters, and manage different data formats for date, time, and numeric values. The GLS functions access locale-specific information at runtime, so the client gets the correct behavior without the UDR having to know the client locale.

This technical note highlights issues to consider when implementing internationalized UDRs, and points out the additional support you may need to provide for a specific language.





GLS Overview & Definitions

Global Language Support (GLS) is the Informix feature that allows an Informix product, a single application, or a UDR to support many languages. This section summarizes the important terms.

GLS locale

Corresponds to a language, territory, and code set:

  • The language specifies the kinds of characters that can be used, how characters get sorted and compared, alphabetic case conversion, and messages.

  • The territory identifies date, time, money, and numeric formats.

  • The code set determines how single-byte and multibyte strings are processed, and which code set conversions are needed between different representations of the same character on client and server.

The default Informix locales are:

Locale         Platform   Code set
en_us.8859-1   UNIX       8859-1 is the ISO8859-1 code set.
en_us.1252     Windows    1252 is the Microsoft code set.

The ISO8859-1 code set has alternate names: 0333 is the condensed form, and IBM additionally registered ISO8859-1 as CCSID 819. So the following three locales refer to the same code set:

  • en_us.8859-1

  • en_us.0333

  • en_us.819
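The structure of a locale name can be illustrated with a small, standalone C sketch. It does not use the GLS Library; the type `locale_parts` and the functions `split_locale()`, `canonical_codeset()`, and `same_locale()` are hypothetical names introduced here only for illustration. The sketch assumes names of the form language_territory.codeset and treats the three code set spellings listed above as equivalent.

```c
#include <string.h>

/* Hypothetical helper type: the three parts of a GLS locale name. */
typedef struct {
    char language[8];
    char territory[8];
    char codeset[16];
} locale_parts;

/* Split "language_territory.codeset" into its parts.
 * Returns 0 on success, -1 if the name does not have that shape. */
int split_locale(const char *name, locale_parts *out)
{
    const char *us = strchr(name, '_');
    const char *dot = strchr(name, '.');
    size_t lang_len, terr_len, cs_len;

    if (us == NULL || dot == NULL || us > dot)
        return -1;
    lang_len = (size_t)(us - name);
    terr_len = (size_t)(dot - us - 1);
    cs_len = strlen(dot + 1);
    if (lang_len == 0 || terr_len == 0 || cs_len == 0)
        return -1;
    if (lang_len >= sizeof out->language ||
        terr_len >= sizeof out->territory ||
        cs_len >= sizeof out->codeset)
        return -1;

    memcpy(out->language, name, lang_len);
    out->language[lang_len] = '\0';
    memcpy(out->territory, us + 1, terr_len);
    out->territory[terr_len] = '\0';
    memcpy(out->codeset, dot + 1, cs_len + 1);   /* includes the NUL */
    return 0;
}

/* Map the alternate ISO8859-1 names (0333, 819) to one spelling. */
const char *canonical_codeset(const char *codeset)
{
    if (strcmp(codeset, "0333") == 0 || strcmp(codeset, "819") == 0)
        return "8859-1";
    return codeset;
}

/* Return 1 if both names denote the same language, territory,
 * and code set after alias mapping; otherwise return 0. */
int same_locale(const char *a, const char *b)
{
    locale_parts pa, pb;

    if (split_locale(a, &pa) != 0 || split_locale(b, &pb) != 0)
        return 0;
    return strcmp(pa.language, pb.language) == 0 &&
           strcmp(pa.territory, pb.territory) == 0 &&
           strcmp(canonical_codeset(pa.codeset),
                  canonical_codeset(pb.codeset)) == 0;
}
```

With this sketch, same_locale("en_us.8859-1", "en_us.0333") reports a match, while en_us.8859-1 and en_us.1252 compare as different locales.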

Informix uses the following environment variables to specify the locale:

Variable        Description
CLIENT_LOCALE   Locale that the client application uses.
DB_LOCALE       Locale of the data in the database.
SERVER_LOCALE   Locale that the database server uses for its server-specific files.

Internationalization

Means implementing products, such as DataBlade modules, so that they are language-independent. An internationalized UDR is one that can support different languages, territories, and code sets without changing or recompiling the source code.

The Informix server is internationalized. The Informix GLS Library provides functions so that DataBlade modules and client applications can use information in GLS locales to:

  • process single-byte and multibyte characters

  • format date, time, and numeric data

Localization

Means resolving information for a specific locale at runtime so that the internationalized product executes properly for that locale; for example, a localized application allows users to see error messages in their own language and dates in the format they expect.

Support for a specific locale is consolidated into a set of files. (Locale-specific DataBlade messages get stored in the system catalogs.) The UNIX glfiles command outputs a list of locales that are supported on your UNIX system. For locales not included in the Informix distribution, an International Language Supplement provides locales, code set conversions, translated user interfaces and documentation.

While the focus of this technical note is on UDR implementation, DataBlade developers need to also consider internationalization for any client component that is included as part of a DataBlade module.



GLS Products

GLS Documentation

DataBlade developers need the following documentation:

  • Informix Guide to GLS Functionality
    This is the core document for implementing GLS-enabled products.

  • Informix GLS Programmer's Manual
    This manual describes how to use the Informix GLS API calls.

  • GLS Documentation and Release Notes
    Always check the current distribution for new information:

    $INFORMIXDIR/release

  • Informix DataBlade API Programmer's Manual
    Discusses GLS features for raising errors and outputting trace messages.

  • DataBlade Developers Kit User's Guide
    DBDK generates source code that uses GLS calls.

All documents mentioned above are available for download from the Informix Online Documentation web site.

What Informix products does a DataBlade developer need?

You need the following Informix products to implement a DataBlade module that supports multiple languages:

  • Informix Dynamic Server

  • DataBlade Developers Kit
    BladeSmith generates code that uses many GLS library functions, including those for exception handling and tracing.

  • Client Software Development Kit

  • The International Language Supplement (ILS) for any locales you plan to support that are not included in the default distribution (check with the glfiles command), so that you can test the behavior of your blade in each locale.

You also need the documentation listed above in the GLS Documentation section.

What Informix products do customers need?

Your customers need the following products so they can run your DataBlade module in their locale:

  • Informix Dynamic Server

  • your DataBlade module

  • The International Language Supplement (ILS) for the customer's locale if it is not already included in the distribution.

Customers must also correctly set up their GLS environment by setting the CLIENT_LOCALE and DB_LOCALE environment variables appropriately.



What needs to be internationalized/localized?

The DataBlade module developer must address both internationalization issues and localization issues. First, you need to implement a UDR so that it is language-independent. That is, the DataBlade module does not make any assumptions about the language, territory, or code set that is used at runtime on the data that the module handles. Then you need to port any language-dependent pieces, such as error messages, to each locale that you want to support. This section summarizes some of the issues.

Text I/O

The Informix GLS Library provides functions that scan and format multibyte character strings, dates, numbers, and currency, using information in the current processing locale. Use these functions to parse input strings or format output strings. They are described in:

  • Informix Guide to GLS Functionality

  • Informix GLS Programmer's Manual

Both manuals are available for download from the Informix Online Documentation web site.

An example of where this might apply to a UDR is in processing the external (character-based) format of an opaque type.

  • DBDK generates code for opaque type input/output support functions that converts between GLS text and the internal binary representation of the opaque type.

  • The mi_get_string() and mi_put_string() DataBlade API functions automatically perform code set conversions on string/text data.

    (For binary transmission of numeric data, the mi_put_* and mi_get_* functions, which are used for opaque type send/receive support functions, also automatically perform code set conversions.)

Error Messages

The syserrors system catalog table stores DataBlade exception messages that can be raised by passing the MI_SQL message type to mi_db_error_raise(). Each error message entry has a locale associated with it. When an error is raised, the server tries to match the CLIENT_LOCALE with a message in syserrors. It starts with an exact match, and if it doesn't find one, tries partial matches, searching in the following order:

  1. Exact client locale
    For example: es_es.8859-1

  2. Language + territory of client locale
    For example: es_es

  3. Just the language of the client locale
    For example: es

  4. Default (en_us) locale
    en_us.8859-1 on UNIX, and en_us.1252 on NT.

The way that the final default en_us locale gets resolved depends on whether the user set CLIENT_LOCALE or not:

  • If CLIENT_LOCALE is not set, the default locale gets resolved based on a partial match of the locale.
    If en_us.1252 (NT) errors are in the syserrors table on a UNIX server, the message will be found based on a partial match of en_us.

  • If CLIENT_LOCALE is set, the default locale gets resolved based on a complete match for that platform.
    If en_us.1252 (NT) errors are in the syserrors table on a UNIX server, the locale will be looked for based on a complete match of en_us.8859-1, resulting in a "Message cannot be found" error.
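The search order can be sketched in standalone C. This is only an illustration of the behavior described above for servers before 9.2, not the server's actual implementation; `pick_message_locale()` and its helpers are hypothetical names, and the list of locales stands in for the rows registered in syserrors for one SQLSTATE.

```c
#include <string.h>

/* Return the first entry equal to want, or NULL. */
static const char *find_exact(const char *const *entries, size_t count,
                              const char *want)
{
    size_t i;

    for (i = 0; i < count; i++)
        if (strcmp(entries[i], want) == 0)
            return entries[i];
    return NULL;
}

/* Return the first entry whose leading n bytes equal those of want and
 * end at a part boundary ('_', '.', or end of string), or NULL. */
static const char *find_prefix(const char *const *entries, size_t count,
                               const char *want, size_t n)
{
    size_t i;

    for (i = 0; i < count; i++) {
        const char *e = entries[i];

        if (strlen(e) >= n && strncmp(e, want, n) == 0 &&
            (e[n] == '\0' || e[n] == '_' || e[n] == '.'))
            return e;
    }
    return NULL;
}

/* client_locale is NULL when CLIENT_LOCALE is not set.
 * platform_default is en_us.8859-1 on UNIX or en_us.1252 on NT.
 * Returns the locale of the message to use, or NULL when the
 * message cannot be found. */
const char *pick_message_locale(const char *client_locale,
                                const char *platform_default,
                                const char *const *entries, size_t count)
{
    const char *hit, *sep;

    /* CLIENT_LOCALE not set: partial match on the en_us default. */
    if (client_locale == NULL)
        return find_prefix(entries, count, "en_us", 5);

    /* 1. Exact client locale. */
    hit = find_exact(entries, count, client_locale);
    if (hit != NULL)
        return hit;

    /* 2. Language + territory. */
    sep = strchr(client_locale, '.');
    if (sep != NULL) {
        hit = find_prefix(entries, count, client_locale,
                          (size_t)(sep - client_locale));
        if (hit != NULL)
            return hit;
    }

    /* 3. Just the language. */
    sep = strchr(client_locale, '_');
    if (sep != NULL) {
        hit = find_prefix(entries, count, client_locale,
                          (size_t)(sep - client_locale));
        if (hit != NULL)
            return hit;
    }

    /* 4. Default locale: complete match for the platform. */
    return find_exact(entries, count, platform_default);
}
```

Given entries for en_us.1252 and es_es.8859-1 and a UNIX platform default of en_us.8859-1, the sketch returns NULL for a client locale of de_de.8859-1, which corresponds to the "Message cannot be found" case.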

The 9.2 release of the server will perform a partial match on the en_us locale whether CLIENT_LOCALE is set or not. In the meantime, realize that BladeSmith by default generates messages with a locale of en_us.1252. Change the locale to en_us.8859-1 if your target is a UNIX system. If you plan to support NT and UNIX, register the same error messages for both locales.

You can also translate DataBlade exception messages to each specific language you intend to support, and store those messages in the syserrors system catalog table. (Incidentally, if you want to insert localized messages using a UDR, see the section below on "Inserting Localized Error Messages into syserrors from a UDR".)

See the section on "Exception Raising" in the Informix DataBlade API Programmer's Manual and the example below.

Trace Messages

The systracemsgs system catalog stores DataBlade trace messages that can be output with gl_tprintf(). Be sure to provide en_us default messages in addition to translating any trace messages intended for end-users.

See the section on "DataBlade API Support for an Internationalized UDR" in the Informix DataBlade API Programmer's Manual.

Query Processing

Informix supports multibyte characters for the names of database objects, such as tables and columns. And, of course, tables can store values that contain multibyte characters in NCHAR and NVARCHAR columns.

Review any UDR that executes SQL queries with mi_exec() or mi_exec_prepared_statement(). If the query could access user-defined database objects or fetch user-supplied data, the UDR should be prepared to handle non-ASCII characters when it creates the SQL statement and when it fetches query results.



Examples

Error Messages

The RGBA DataBlade Module implements an opaque type that manages computer color.

The input function for an RGBA converts the external (textual) representation into the internal C data structure. After the C structure has been populated, the DataBlade code checks each element to see if it falls within 0-255. If the user enters 256, an error gets raised that looks like the following, depending on the user's locale:

Locale         Error Message
en_us.1252     (URGB1) - RGBAInput: R value 256 is not within the valid range of 0-255.
es_es.8859-1   (URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255.

This section describes what the source code needs to do, what messages get stored in the syserrors table, and how the database server decides which error to output.

The code below shows how the input function checks the red component of the RGBA and raises an error.

 
if(Gen_RetVal->red < 0 || Gen_RetVal->red > 255) 
   mi_db_error_raise 
   ( 
      Gen_Con,                      /* Connection handle      */
      MI_SQL,                       /* Message type           */
      "URGB1",                      /* SQLSTATE in syserrors  */
      "FUNCNAME%s",                 /* token takes string     */
      "RGBAInput",                  /* value for FUNCNAME     */
      "VALUE%d",                    /* token takes an integer */
      (mi_integer) Gen_RetVal->red, /* value for VALUE token  */
      NULL                          /* No more tokens         */
   ); 

The MI_SQL message type in the mi_db_error_raise() call indicates that the error should be looked up in the syserrors system catalog table based on the SQLSTATE value in the third argument. If we select from the syserrors system catalog, we see that there are two different errors for "URGB1":

 
> select * from syserrors where sqlstate='URGB1'; 
 
sqlstate URGB1 
locale   en_us.1252 
level    0 
seqno    1 
message  %FUNCNAME%: R value %VALUE% is not within the valid range of 0-255. 
 
sqlstate URGB1 
locale   es_es.8859-1 
level    0 
seqno    1 
message  %FUNCNAME%: el valor R %VALUE% no calza entre los limites de 0-255. 

These exception messages were inserted into the syserrors system catalog table with the following SQL insert statements:

 
insert into informix.syserrors (level, seqno, sqlstate, locale, message) 
values 
( 
  0, 
  1, 
  "URGB1", 
  "en_us.1252", 
  "%FUNCNAME%: R value %VALUE% is not within the valid range of 0-255." 
); 
 
insert into informix.syserrors (level, seqno, sqlstate, locale, message) 
values 
( 
  0, 
  1, 
  "URGB1", 
  "es_es.8859-1", 
  "%FUNCNAME%: el valor R %VALUE% no calza entre los limites de 0-255." 
); 
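To make the relationship between the %FUNCNAME% and %VALUE% markers in the stored message and the token values passed to mi_db_error_raise() concrete, here is a minimal standalone C sketch of marker substitution. It is not how the server does it; `msg_token` and `expand_message()` are hypothetical names, token values are assumed to be preformatted strings, and the scan is byte-oriented, so it ignores the multibyte handling that real GLS-aware code would need.

```c
#include <string.h>

/* Hypothetical token/value pair, e.g. { "FUNCNAME", "RGBAInput" }. */
typedef struct {
    const char *name;
    const char *value;
} msg_token;

/* Expand %NAME% markers in tmpl into out (outsz bytes, always
 * NUL-terminated). Markers with no matching token are copied unchanged.
 * Returns 0 on success, -1 if the result does not fit. */
int expand_message(const char *tmpl, const msg_token *tokens,
                   size_t ntokens, char *out, size_t outsz)
{
    size_t used = 0;

    if (outsz == 0)
        return -1;

    while (*tmpl != '\0') {
        const char *piece = tmpl;   /* default: copy one literal byte */
        size_t piece_len = 1;
        size_t advance = 1;

        if (*tmpl == '%') {
            const char *end = strchr(tmpl + 1, '%');

            if (end != NULL) {
                size_t name_len = (size_t)(end - tmpl - 1);
                size_t i;

                for (i = 0; i < ntokens; i++) {
                    if (strlen(tokens[i].name) == name_len &&
                        strncmp(tokens[i].name, tmpl + 1, name_len) == 0) {
                        piece = tokens[i].value;
                        piece_len = strlen(piece);
                        advance = name_len + 2;   /* skip both '%' */
                        break;
                    }
                }
            }
        }

        if (used + piece_len >= outsz) {
            out[used] = '\0';
            return -1;
        }
        memcpy(out + used, piece, piece_len);
        used += piece_len;
        tmpl += advance;
    }

    out[used] = '\0';
    return 0;
}
```

Expanding the English message with FUNCNAME set to "RGBAInput" and VALUE set to "256" yields the text "RGBAInput: R value 256 is not within the valid range of 0-255."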

The RGBA project was generated with the default locale used by BladeSmith: en_us.1252. We'll see in a moment what effect that has when an error is encountered on a UNIX machine.

The following example assumes that it is run on a Solaris machine, which has a default locale of en_us.8859-1. If CLIENT_LOCALE is not set, the server looks for a partial match on the default U.S. English locale and outputs that message:

 
bladerunner% echo $CLIENT_LOCALE 
CLIENT_LOCALE: Undefined variable 
 
bladerunner% dbaccess colors - 
 
> create table colors (id serial, color rgba); 
 
> insert into colors values (0, '255 0 255 0'); 
 
> insert into colors values (0, '256 0 255 0'); 
 
(URGB1) - RGBAInput: R value 256 is not within the valid range of 0-255. 

If CLIENT_LOCALE is set, the server outputs the message for that locale, if it exists:

 
bladerunner% setenv CLIENT_LOCALE es_es.8859-1 
 
bladerunner% dbaccess colors - 
 
> insert into colors values (0, '256 0 255 0'); 
 
(URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255. 

First the server tries to match the entire locale. If it does not find a match, it tries to match language and territory. If it can't find that, it tries to match just the language. For example, the Spanish message was entered into syserrors with a locale of es_es.8859-1. The message still gets resolved if the client has a different code set:

 
bladerunner% setenv CLIENT_LOCALE es_es.CP1252 
bladerunner% dbaccess colors - 
 
> insert into colors values (0, '256 0 255 0'); 
 
(URGB1) - RGBAInput: el valor R 256 no calza entre los limites de 0-255. 

Next we set CLIENT_LOCALE to a German locale, which does not have an entry in syserrors. If the server cannot find a message for the locale based on a partial match, it looks for a complete match on the en_us locale. In this case, our default UNIX locale is en_us.8859-1, but the en_us error messages are for en_us.1252, so it outputs a "message cannot be found" error:

 
bladerunner% setenv CLIENT_LOCALE de_de.8859-1 
 
bladerunner% dbaccess colors - 
 
> insert into colors values (0, '256 0 255 0'); 
 
(URGB1) - Message cannot be found. 

So, be sure to insert English messages in the default locale for each platform you plan to run on: en_us.8859-1 (UNIX) and en_us.1252 (NT).

Starting in 9.2, the server will complete a partial match on the en_us locale, and it will no longer be necessary to insert duplicate en_us errors.

Floating Point Values

ifx_gl_format_number() and ifx_gl_convert_number() convert between a text string and a floating point variable. However, both functions store the float value as a decimal. Since the range of a double precision value is greater than that of a decimal, these functions fail to convert very large float values.
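The range difference can be illustrated with a small standalone C check. The limits used here are an assumption: they are the approximate magnitude range commonly documented for a floating-point Informix DECIMAL (about 1.0E-130 to 9.99E+124) and are not taken from this technical note. The function name `fits_in_decimal()` is hypothetical.

```c
/* Assumed approximate magnitude limits of a floating-point DECIMAL. */
#define DEC_ABS_MIN 1.0e-130
#define DEC_ABS_MAX 9.99e124

/* Return 1 if d is zero or its magnitude lies within the assumed
 * DECIMAL range, 0 otherwise (including NaN and infinity). */
int fits_in_decimal(double d)
{
    double mag = (d < 0.0) ? -d : d;

    if (mag == 0.0)
        return 1;
    return mag >= DEC_ABS_MIN && mag <= DEC_ABS_MAX;
}
```

Under these assumed limits, 1234.5 and 9.875e-43 pass the check, while 1.0e+150 does not, even though all three are valid double precision values.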

Two new functions in 9.2, ifx_gl_format_double() and ifx_gl_convert_double(), use a double instead of a decimal. Unfortunately, function prototypes for the new functions were inadvertently omitted from $INFORMIXDIR/incl/public/ifxgls.h, so they are included below:

 
MI_EXT_DECL int 
ifx_gl_convert_double (double *d, 
                       char   *dstr, 
                       char   *format); 
 
MI_EXT_DECL int 
ifx_gl_format_double ( char   *dstr, 
                       int    len, 
                       double d, 
                       char   *format); 

The GetDouble() UDR below shows how to convert an mi_lvarchar UDR argument into a double precision value. If the integer flag passed to it is 0, it performs the conversion using ifx_gl_convert_number(). Otherwise, it performs the conversion using ifx_gl_convert_double().

 
#include <stdio.h>   /* sprintf() */
#include <ifxgls.h> 
#include <mi.h> 
 
mi_double_precision * 
GetDouble (mi_lvarchar *Gen_param1, mi_integer flag, MI_FPARAM *fp) 
{ 
   mi_double_precision *retval=NULL; 
   gl_mchar_t          *Gen_InData; 
   int                 status, glerror; /* ifx_gl_* return code */
   mi_decimal          dec_number;      /* for ifx_gl_convert_number */
   double              d=0;             /* double result */
   mi_string           errbuf[80]; 
 
   /* Allocate the return value. */
   retval = 
      (mi_double_precision *)mi_alloc(sizeof(mi_double_precision)); 
   if(retval == NULL) 
   { 
      mi_fp_setreturnisnull(fp, 0, MI_TRUE); 
      mi_db_error_raise(NULL, MI_EXCEPTION, "mi_alloc failed!"); 
      return (mi_double_precision *)NULL; 
   } 
 
   /* Convert mi_lvarchar argument into a NULL-terminated string. */
   Gen_InData = (gl_mchar_t *)mi_lvarchar_to_string(Gen_param1); 
 
   /* Convert the null-terminated string to a double. 
   ** If the return value is not 0, the conversion failed and 
   ** ifx_gl_lc_errno() retrieves a more specific error code. 
   */
 
   if(flag == 0) /* use ifx_gl_convert_number() */
   { 
      /* ifx_gl_convert_number() stores the result in a decimal 
      ** variable, so it handles a smaller range than a double. 
      */
      status=ifx_gl_convert_number(&dec_number, Gen_InData, "%e"); 
      if(status != 0) 
         glerror=ifx_gl_lc_errno(); 
      else 
         dectodbl(&dec_number, (double *)&d); /* convert to double */
   } 
   else /* use ifx_gl_convert_double() */
   { 
      /* ifx_gl_convert_double() stores the result in a double 
      ** variable. 
      */
      status=ifx_gl_convert_double(&d, 
         (char *)Gen_InData, (char *)"%e"); 
      if(status != 0) 
         glerror=ifx_gl_lc_errno(); 
   } 
 
   if(status != 0) 
   { 
      switch (glerror) 
      { 
         case IFX_GL_INVALIDFMT: 
            sprintf(errbuf, 
               "GetDouble: conversion failed [%d:IFX_GL_INVALIDFMT]", 
               status); 
            break; 
         case IFX_GL_PARAMERR: 
            sprintf(errbuf, 
               "GetDouble: conversion failed [%d:IFX_GL_PARAMERR]", 
               status); 
            break; 
         default: 
            sprintf(errbuf, 
               "GetDouble: conversion failed [%d:%d]!", 
               status,glerror); 
            break; 
      } 
 
      mi_fp_setreturnisnull(fp, 0, MI_TRUE); 
      mi_db_error_raise(NULL, MI_EXCEPTION, errbuf); 
      return (mi_double_precision *)NULL; 
   } 
 
   mi_free(Gen_InData); /* mi_lvarchar_to_string() allocated this string */
 
   *retval = (mi_double_precision) d; 
   return retval; 
} 

If the integer argument passed to GetDouble() is 0, then the underlying code uses ifx_gl_convert_number(). The numbers in the two queries that follow are small enough to be stored in a decimal:

 
execute function GetDouble("1234.5",0); 
 
  (expression) 
 
1234.500000000 
 
1 row(s) retrieved. 
 
execute function GetDouble("9.875e-43",0); 
 
  (expression) 
 
     9.875e-43 
 
1 row(s) retrieved. 

However, the next number is too big for a decimal, so the query fails:

 
execute function GetDouble("1.000000e+150",0); 
 
  (expression) 
 
(U0001) - GetDouble: conversion failed [-1:IFX_GL_PARAMERR] 
Error in line 24 
Near character position 45 

If the integer argument passed to GetDouble() is 1, then the underlying code uses ifx_gl_convert_double(), which stores the result directly in a double. All the queries succeed.

 
execute function GetDouble("1234.5",1); 
 
  (expression) 
 
1234.500000000 
 
1 row(s) retrieved. 
 
execute function GetDouble("9.875e-43",1); 
 
  (expression) 
 
     9.875e-43 
 
1 row(s) retrieved. 
 
execute function GetDouble("1.000000e+150",1); 
 
  (expression) 
 
 1.000000e+150 
 
1 row(s) retrieved. 

Inserting Localized Error Messages into syserrors from a UDR

DataBlade developers may need to insert localized error messages into the syserrors table whose locale differs from that of the current session (for example, to install SJIS messages while DB_LOCALE for the current session is set to EUC). The normal process of loading the messages via an SQL script, or from a UDR using mi_exec(), is not reliable when the locale of the messages is different from the session locale, because characters not recognized by the SQL parser will trigger an error condition.

It is possible to bypass this SQL parser problem by creating a UDR that loads the error messages using a prepared statement (mi_prepare()) containing placeholders for the sqlstate and message data. The data is later supplied in the call to execute the prepared statement (mi_exec_prepared_statement()).

The following UDR code demonstrates how the sqlstate and message column strings for a locale may be placed in a message array and then passed as data to insert messages into syserrors. The locale used in the code is en_us.8859-1 for readability reasons, but the method can be used for any locale.

 
#include <stdio.h> 
#include <string.h> 
#include "mi.h" 
 
#define MAX_MSG 3 
char *enus_msg[MAX_MSG][2] =  { 
    { "XT010", "First error message for insertion" }, 
    { "XT020", "Second error message for insertion" }, 
    { "XT030", "Third error message for insertion" } 
    }; 
 
/* 
 * Title: gls_insert_enus 
 * Purpose: Add localized messages to the system error table 'syserrors' 
 *          for given locale independent of locale setting of session. 
 */
mi_integer 
gls_insert_enus() 
{ 
   MI_DATUM       args[2];                 /* pointers to column values */
   mi_integer     lens[2];                 /* lengths of column values */
   mi_integer     nulls[2];                /* null capability of columns */
   mi_string     *types[2];                /* types of columns */
   mi_integer     i; 
   MI_STATEMENT  *stmt; 
   MI_CONNECTION *conn = mi_open(NULL, NULL, NULL); 
 
   if (conn == NULL)                       /* No connection available */
      return -1; 
 
   /* 
    * Prepare statement using placeholder values for sqlstate and message 
    * columns while providing fixed values for locale, level, seqno columns. 
    */
   stmt = mi_prepare(conn, 
         "insert into syserrors values(?,'en_us.8859-1',0,1,?)", NULL); 
   if (stmt == NULL)                       /* Statement could not be prepared */
   { 
      mi_close(conn); 
      return -1; 
   } 
 
   for (i=0; i<MAX_MSG; i++)               /* Loop through the message array */
   { 
       args[0] = (MI_DATUM)enus_msg[i][0]; /* Set pointer to sqlstate string */
       lens[0] = strlen(args[0]);          /* Set length of sqlstate string */
       nulls[0] = MI_FALSE;                /* Set null handling capability */
       types[0] = "char(5)";               /* Set sqlstate column type */
 
       args[1] = (MI_DATUM)enus_msg[i][1]; /* Set pointer to message string */
       lens[1] = strlen(args[1]);          /* Set length of message string */
       nulls[1] = MI_FALSE;                /* Set null handling capability */
       types[1] = "varchar(255)";          /* Set message column type */
 
       mi_exec_prepared_statement(stmt,0,0,2,args,lens,nulls,types,NULL,NULL); 
   } 
 
   mi_close(conn); 
   return 0; 
} 



Known Problems

94450: ifx_gl_format_number() outputs %g incorrectly if value is very large

If a value formatted in exponential notation is very large, a blank gets inserted into the output after the "e". For example, this value:

.123456789012e80

Becomes this:

1.234567e 79
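Until the problem is fixed, one possible workaround is to post-process the formatted string. The standalone C sketch below is hypothetical and rests on an assumption that this note does not confirm: that the blank sits where the exponent sign belongs, so it can be replaced with "+". The function name `patch_exponent_blank()` is introduced only for illustration, and the scan is byte-oriented.

```c
#include <string.h>

/* Replace a blank that directly follows an exponent marker and
 * precedes a digit with '+'. Assumption: the blank occupies the
 * position of the exponent sign. The string is modified in place. */
void patch_exponent_blank(char *s)
{
    char *e = strpbrk(s, "eE");

    while (e != NULL) {
        if (e[1] == ' ' && e[2] >= '0' && e[2] <= '9')
            e[1] = '+';
        e = strpbrk(e + 1, "eE");
    }
}
```

Applied to the output shown above, the sketch turns "1.234567e 79" into "1.234567e+79" and leaves strings such as "1.5e-10" unchanged.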



For more information about this topic

  • The RGBA DataBlade module implements the Spanish error messages used by this tech note.

  • DBDK generates trace messages that use the GL_DPRINTF GLS tracing macro. The Debugging Using the DBDK-Generated Tracing Features IDN tech note describes how to use it.

  • The Gen_sscanf() C function generated by BladeSmith shows how to use ifx_gl_convert_number() to convert input into various numeric types.

  • The examples distributed with the DataBlade Developers Kit include a "Strings" DataBlade module that shows how to manipulate character strings.



Glossary

Terms and acronyms used by this tech note include:

GLS

Global Language Support.

Locale

Language, territory and code set

Localize

Make software work for a specific locale

Internationalize

Make software work for any locale

UDR

User-Defined Routine




About the author

Jean Anderson is an IBM contributing author.



