Unsupported - GNU Aspell 0.61-cvs

B.2 Unsupported

These languages, when written in the given script, are currently unsupported by Aspell for one reason or another.

Code  Language Name  Script
ja    Japanese       Japanese
km    Khmer          Khmer
ko    Korean         Han, Hangul
lo    Lao            Lao
th    Thai           Thai
zh    Chinese        Han

B.2.1 The Thai, Khmer, and Lao Scripts

The Thai, Khmer, and Lao scripts present a different problem for Aspell. The problem is not that there are more than 210 unique symbols, but that there are no spaces between words, so there is no easy way to split a sentence into individual words. It is still possible to spell check these scripts; it is just a lot more difficult. I will be happy to work with someone who is interested in adding Thai, Khmer, or Lao support to Aspell, but it is not likely something I will do on my own in the foreseeable future.
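To illustrate why splitting unspaced text is harder than splitting on whitespace, here is a minimal Python sketch of dictionary-based greedy longest-match segmentation. It is not Aspell code and not necessarily the approach Aspell would take; the function name `segment` and the two-word toy dictionary are assumptions made for this example only.

```python
def segment(text, dictionary):
    """Split unspaced text into words by greedy longest match (illustrative only)."""
    max_len = max((len(w) for w in dictionary), default=1)
    words = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, shrinking until a dictionary word matches.
        for j in range(min(len(text), i + max_len), i, -1):
            if text[i:j] in dictionary:
                words.append(text[i:j])
                i = j
                break
        else:
            # No dictionary word starts here: emit one code point as an unknown token.
            words.append(text[i])
            i += 1
    return words


# Hypothetical toy dictionary containing two Thai words.
print(segment("ภาษาไทย", {"ภาษา", "ไทย"}))  # ['ภาษา', 'ไทย']
```

A greedy match like this depends entirely on the dictionary, so a misspelled word tends to break apart into unknown single-character tokens, and ambiguous splits are resolved arbitrarily. That is one reason segmentation and spell checking are hard to separate for these scripts.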

B.2.2 Languages which use Hànzi Characters

Hànzi characters are used to write Chinese, Japanese, and Korean, and were once used to write Vietnamese. Each hànzi character represents a syllable of a spoken word and also has a meaning. Since there are around 3,000 of them in common usage, it is unlikely that Aspell will ever be able to support spell checking languages written using hànzi until full Unicode support is implemented. However, I am not even sure whether these languages need spell checking, since hànzi characters are generally not entered directly. Furthermore, even if Aspell could spell check hànzi, the existing suggestion strategy would not work well at all, and thus a completely new strategy would need to be developed. However, if hànzi does need to be spell checked and you know something about the issues involved, please feel free to contact me.

B.2.3 Japanese

Modern Japanese is written in a mixture of hiragana, katakana, kanji, and sometimes romaji. Hiragana and katakana are both syllabaries unique to Japan, kanji is a modified form of hànzi, and romaji uses the Latin alphabet. With some work, Aspell should be able to check the non-kanji part of Japanese text. However, based on my limited understanding of Japanese, hiragana is often used at the end of kanji. Thus, if Aspell were to simply separate out the hiragana from the kanji, it would end up with a lot of word endings which are not proper words and would thus be flagged as misspellings. However, this can be fairly easily rectified, as text is tokenized into words before it is converted into Aspell's internal encoding. In fact, some Japanese text is written entirely in one script. For example, books for children and foreigners are sometimes written entirely in hiragana. Thus Aspell, in its current state, could prove at least somewhat useful for spell checking Japanese.
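The word-ending problem described above can be seen by splitting text into runs of a single script. The following Python sketch is only an illustration, not part of Aspell; the names `script_of` and `script_runs` are made up for this example, and the ranges used are the Unicode Hiragana, Katakana, and CJK Unified Ideographs blocks (rarer kanji outside that last block are not covered).

```python
from itertools import groupby


def script_of(ch):
    """Classify one character by Unicode block (simplified)."""
    cp = ord(ch)
    if 0x3040 <= cp <= 0x309F:
        return "hiragana"
    if 0x30A0 <= cp <= 0x30FF:
        return "katakana"
    if 0x4E00 <= cp <= 0x9FFF:
        return "kanji"
    return "other"


def script_runs(text):
    """Group consecutive characters that belong to the same script."""
    return [(script, "".join(chars)) for script, chars in groupby(text, key=script_of)]


print(script_runs("食べる"))  # [('kanji', '食'), ('hiragana', 'べる')]
```

Here the verb 食べる is split into the kanji 食 and the hiragana ending べる. The ending is not a word by itself, so a checker that looked at hiragana runs in isolation would flag it.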

B.2.4 Hangul

Korean is generally written in Hangul or a mixture of Han and Hangul. In Hangul, individual letters, known as jamo, are grouped together in syllable blocks. Unicode allows Hangul to be stored in one of three ways: (A) individual jamo letters (Hangul Compatibility Jamo, U+3130 - U+318F), (D) decomposed jamo (Hangul Jamo, U+1100 - U+11FF), and (C) precomposed syllable blocks (Hangul Syllables, U+AC00 - U+D7AF). In order for Aspell to work with Hangul, the text needs to be in form A. Unfortunately, the existing normalization code in Aspell will not be able to adequately deal with converting Hangul from forms D and C to form A and back again. However, once this code is written, Aspell should be able to spell check Hangul without any problem.
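As a rough illustration of the conversion from forms C and D to form A, the Python sketch below uses the standard `unicodedata` module. It is not Aspell's normalization code; the function name `to_compat_jamo` is made up for this example, and the mapping relies on the assumption that a conjoining jamo and its compatibility counterpart share the last word of their Unicode character names, which holds for the modern jamo.

```python
import unicodedata

POSITIONS = ("CHOSEONG", "JUNGSEONG", "JONGSEONG")


def to_compat_jamo(text):
    """Convert Hangul in form C or D to compatibility jamo (form A)."""
    out = []
    # NFD splits each precomposed syllable block (form C) into conjoining jamo (form D).
    for ch in unicodedata.normalize("NFD", text):
        parts = unicodedata.name(ch, "").split()
        if len(parts) == 3 and parts[0] == "HANGUL" and parts[1] in POSITIONS:
            try:
                # e.g. "HANGUL CHOSEONG KIYEOK" -> "HANGUL LETTER KIYEOK"
                ch = unicodedata.lookup("HANGUL LETTER " + parts[2])
            except KeyError:
                pass  # no compatibility counterpart under that name; keep as-is
        out.append(ch)
    return "".join(out)


print(to_compat_jamo("한글"))  # ㅎㅏㄴㄱㅡㄹ
```

Note that NFD also decomposes non-Hangul characters such as accented Latin letters, so a real implementation would need to restrict the conversion to Hangul. Going back from form A is the harder direction: compatibility jamo do not distinguish initial from final consonants, so the syllable boundaries have to be worked out again.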
