Follow Techotopia on Twitter

On-line Guides
All Guides
eBook Store
iOS / Android
Linux for Beginners
Office Productivity
Linux Installation
Linux Security
Linux Utilities
Linux Virtualization
Linux Kernel
System/Network Admin
Programming
Scripting Languages
Development Tools
Web Development
GUI Toolkits/Desktop
Databases
Mail Systems
openSolaris
Eclipse Documentation
Techotopia.com
Virtuatopia.com
Answertopia.com

How To Guides
Virtualization
General System Admin
Linux Security
Linux Filesystems
Web Servers
Graphics & Desktop
PC Hardware
Windows
Problem Solutions
Privacy Policy

  




 

 

6.5 Generic Charset Conversion

The conversion functions mentioned so far in this chapter all had in common that they operate on character sets that are not directly specified by the functions. The multibyte encoding used is specified by the currently selected locale for the LC_CTYPE category. The wide character set is fixed by the implementation (in the case of GNU C library it is always UCS-4 encoded ISO 10646.

This has of course several problems when it comes to general character conversion:

  • For every conversion where neither the source nor the destination character set is the character set of the locale for the LC_CTYPE category, one has to change the LC_CTYPE locale using setlocale.

    Changing the LC_TYPE locale introduces major problems for the rest of the programs since several more functions (e.g., the character classification functions, see Classification of Characters) use the LC_CTYPE category.

  • Parallel conversions to and from different character sets are not possible since the LC_CTYPE selection is global and shared by all threads.
  • If neither the source nor the destination character set is the character set used for wchar_t representation, there is at least a two-step process necessary to convert a text using the functions above. One would have to select the source character set as the multibyte encoding, convert the text into a wchar_t text, select the destination character set as the multibyte encoding, and convert the wide character text to the multibyte (= destination) character set.

    Even if this is possible (which is not guaranteed) it is a very tiring work. Plus it suffers from the other two raised points even more due to the steady changing of the locale.

The XPG2 standard defines a completely new set of functions, which has none of these limitations. They are not at all coupled to the selected locales, and they have no constraints on the character sets selected for source and destination. Only the set of available conversions limits them. The standard does not specify that any conversion at all must be available. Such availability is a measure of the quality of the implementation.

In the following text first the interface to iconv and then the conversion function, will be described. Comparisons with other implementations will show what obstacles stand in the way of portable applications. Finally, the implementation is described in so far as might interest the advanced user who wants to extend conversion capabilities.


 
 
  Published under the terms of the GNU General Public License Design by Interspire