C.3 Unicode Normalization

Because Unicode contains a large number of precomposed characters, there are multiple ways a character can be represented. For example, the letter ö can be represented either as

     U+00F6 LATIN SMALL LETTER O WITH DIAERESIS

or
     U+006F LATIN SMALL LETTER O + U+0308 COMBINING DIAERESIS
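
The equivalence of the two representations can be checked with a short Python sketch. It uses the standard library's `unicodedata` module and the standard Unicode forms NFC and NFD purely for illustration; it is not Aspell code, and the variable names are made up for this example.

```python
import unicodedata

# Two encodings of the same letter "ö".
precomposed = "\u00F6"        # LATIN SMALL LETTER O WITH DIAERESIS
decomposed = "\u006F\u0308"   # LATIN SMALL LETTER O + COMBINING DIAERESIS

# Compared code point by code point, the two strings differ.
differs_raw = precomposed != decomposed

# NFC composes to the precomposed form; NFD decomposes to base + combining.
composed_form = unicodedata.normalize("NFC", decomposed)
decomposed_form = unicodedata.normalize("NFD", precomposed)

for label, text in (("NFC", composed_form), ("NFD", decomposed_form)):
    names = " + ".join(unicodedata.name(c) for c in text)
    print(f"{label}: {names}")
```

Without normalization, a word list containing one form would not match input typed in the other form.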

By performing normalization first, Aspell will only see one of these representations. The exact form of normalization depends on the language. Each character is reduced to one of the following forms:

  1. Precomposed character
  2. Base letter + combining character(s)
  3. Base letter only

If the precomposed character is in the target character set, form (1) is used; otherwise, if the base letter and all of its combining characters are present, form (2) is used; otherwise form (3) is used.
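
The selection rule above can be sketched in Python as follows. This is a minimal illustration, not Aspell's actual implementation: the function name `normalize_cluster` and the modelling of the target character set as a plain Python set of single-character strings are assumptions made for this example. The sketch also assumes that, in case (3), dropping every combining mark leaves a usable base letter.

```python
import unicodedata

def normalize_cluster(cluster, charset):
    """Pick one representation for a letter plus its combining marks.

    `cluster` is a single letter in either form; `charset` is a
    hypothetical target character set, given as a set of characters.
    """
    # (1) Precomposed character, if the target set contains it.
    composed = unicodedata.normalize("NFC", cluster)
    if len(composed) == 1 and composed in charset:
        return composed

    # (2) Base letter + combining character(s), if all are present.
    decomposed = unicodedata.normalize("NFD", cluster)
    if all(c in charset for c in decomposed):
        return decomposed

    # (3) Base letter only: drop every combining mark.
    return "".join(c for c in decomposed if not unicodedata.combining(c))

latin1 = {chr(i) for i in range(256)}    # contains U+00F6
combining_only = {"o", "\u0308"}         # no precomposed form
ascii_only = {chr(i) for i in range(128)}

for name, cs in (("latin1", latin1), ("combining", combining_only), ("ascii", ascii_only)):
    result = normalize_cluster("o\u0308", cs)
    print(name, [f"U+{ord(c):04X}" for c in result])
```

Whichever form the input arrives in, each target character set yields exactly one output form, which is the property the normalization step relies on.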

Unicode normalization is implemented as of Aspell 0.60.