In some locales, the conventions for lexicographic ordering differ from
the strict numeric ordering of character codes. For example, in Spanish
most glyphs with diacritical marks such as accents are not considered
distinct letters for the purposes of collation. On the other hand, the
two-character sequence `ll' is treated as a single letter that is
collated immediately after `l'.
You can use the functions strcoll and strxfrm (declared in
the headers file string.h) and wcscoll and wcsxfrm
(declared in the headers file wchar) to compare strings using a
collation ordering appropriate for the current locale. The locale used
by these functions in particular can be specified by setting the locale
for the LC_COLLATE category; see Locales.
In the standard C locale, the collation sequence for strcoll is
the same as that for strcmp. Similarly, wcscoll and
wcscmp are the same in this situation.
Effectively, the way these functions work is by applying a mapping to
transform the characters in a string to a byte sequence that represents
the string's position in the collating sequence of the current locale.
Comparing two such byte sequences in a simple fashion is equivalent to
comparing the strings with the locale's collating sequence.
The functions strcoll and wcscoll perform this translation
implicitly, in order to do one comparison. By contrast, strxfrm
and wcsxfrm perform the mapping explicitly. If you are making
multiple comparisons using the same string or set of strings, it is
likely to be more efficient to use strxfrm or wcsxfrm to
transform all the strings just once, and subsequently compare the
transformed strings with strcmp or wcscmp.
— Function: int strcoll (const char *s1, const char *s2)
The strcoll function is similar to strcmp but uses the
collating sequence of the current locale for collation (the
LC_COLLATE locale).
— Function: int wcscoll (const wchar_t *ws1, const wchar_t *ws2)
The wcscoll function is similar to wcscmp but uses the
collating sequence of the current locale for collation (the
LC_COLLATE locale).
Here is an example of sorting an array of strings, using strcoll
to compare them. The actual sort algorithm is not written here; it
comes from qsort (see Array Sort Function). The job of the
code shown here is to say how to compare the strings while sorting them.
(Later on in this section, we will show a way to do this more
efficiently using strxfrm.)
/* This is the comparison function used with qsort. */
int
compare_elements (char **p1, char **p2)
{
return strcoll (*p1, *p2);
}
/* This is the entry point---the function to sortstrings using the locale's collating sequence. */
void
sort_strings (char **array, int nstrings)
{
/* Sort temp_array by comparing the strings. */
qsort (array, nstrings,
sizeof (char *), compare_elements);
}
The function strxfrm transforms the string from using the
collation transformation determined by the locale currently selected for
collation, and stores the transformed string in the array to. Up
to size characters (including a terminating null character) are
stored.
The return value is the length of the entire transformed string. This
value is not affected by the value of size, but if it is greater
or equal than size, it means that the transformed string did not
entirely fit in the array to. In this case, only as much of the
string as actually fits was stored. To get the whole transformed
string, call strxfrm again with a bigger output array.
The transformed string may be longer than the original string, and it
may also be shorter.
If size is zero, no characters are stored in to. In this
case, strxfrm simply returns the number of characters that would
be the length of the transformed string. This is useful for determining
what size the allocated array should be. It does not matter what
to is if size is zero; to may even be a null pointer.
The function wcsxfrm transforms wide character string wfrom
using the collation transformation determined by the locale currently
selected for collation, and stores the transformed string in the array
wto. Up to size wide characters (including a terminating null
character) are stored.
The return value is the length of the entire transformed wide character
string. This value is not affected by the value of size, but if
it is greater or equal than size, it means that the transformed
wide character string did not entirely fit in the array wto. In
this case, only as much of the wide character string as actually fits
was stored. To get the whole transformed wide character string, call
wcsxfrm again with a bigger output array.
The transformed wide character string may be longer than the original
wide character string, and it may also be shorter.
If size is zero, no characters are stored in to. In this
case, wcsxfrm simply returns the number of wide characters that
would be the length of the transformed wide character string. This is
useful for determining what size the allocated array should be (remember
to multiply with sizeof (wchar_t)). It does not matter what
wto is if size is zero; wto may even be a null pointer.
Here is an example of how you can use strxfrm when
you plan to do many comparisons. It does the same thing as the previous
example, but much faster, because it has to transform each string only
once, no matter how many times it is compared with other strings. Even
the time needed to allocate and free storage is much less than the time
we save, when there are many strings.
struct sorter { char *input; char *transformed; };
/* This is the comparison function used with qsortto sort an array of struct sorter. */
int
compare_elements (struct sorter *p1, struct sorter *p2)
{
return strcmp (p1->transformed, p2->transformed);
}
/* This is the entry point---the function to sortstrings using the locale's collating sequence. */
void
sort_strings_fast (char **array, int nstrings)
{
struct sorter temp_array[nstrings];
int i;
/* Set up temp_array. Each element containsone input string and its transformed string. */
for (i = 0; i < nstrings; i++)
{
size_t length = strlen (array[i]) * 2;
char *transformed;
size_t transformed_length;
temp_array[i].input = array[i];
/* First try a buffer perhaps big enough. */
transformed = (char *) xmalloc (length);
/* Transform array[i]. */
transformed_length = strxfrm (transformed, array[i], length);
/* If the buffer was not large enough, resize itand try again. */
if (transformed_length >= length)
{
/* Allocate the needed space. +1 for terminatingNUL character. */
transformed = (char *) xrealloc (transformed,
transformed_length + 1);
/* The return value is not interesting because we knowhow long the transformed string is. */
(void) strxfrm (transformed, array[i],
transformed_length + 1);
}
temp_array[i].transformed = transformed;
}
/* Sort temp_array by comparing transformed strings. */
qsort (temp_array, sizeof (struct sorter),
nstrings, compare_elements);
/* Put the elements back in the permanent arrayin their sorted order. */
for (i = 0; i < nstrings; i++)
array[i] = temp_array[i].input;
/* Free the strings we allocated. */
for (i = 0; i < nstrings; i++)
free (temp_array[i].transformed);
}
The interesting part of this code for the wide character version would
look like this:
void
sort_strings_fast (wchar_t **array, int nstrings)
{
...
/* Transform array[i]. */
transformed_length = wcsxfrm (transformed, array[i], length);
/* If the buffer was not large enough, resize itand try again. */
if (transformed_length >= length)
{
/* Allocate the needed space. +1 for terminatingNUL character. */
transformed = (wchar_t *) xrealloc (transformed,
(transformed_length + 1)
* sizeof (wchar_t));
/* The return value is not interesting because we knowhow long the transformed string is. */
(void) wcsxfrm (transformed, array[i],
transformed_length + 1);
}
...
Note the additional multiplication with sizeof (wchar_t) in the
realloc call.
Compatibility Note: The string collation functions are a new
feature of ISO C90. Older C dialects have no equivalent feature.
The wide character versions were introduced in Amendment 1 to ISO C90.
Published under the terms of the GNU General Public License