

B.1 Supported

Aspell 0.60 should be able to support the following languages:

Code  Language Name        Script                   Dictionary Available  Gettext Translation
aa    Afar                 Latin                    -                     -
af    Afrikaans            Latin                    0.50                  -
ak    Akan                 Latin                    Maybe                 -
am    Amharic              Ethiopic                 0.60                  -
ar    Arabic               Arabic                   0.60                  -
as    Assamese             Bengali                  -                     -
av    Avar                 Cyrillic                 -                     -
ay    Aymara               Latin                    -                     -
az    Azerbaijani          Cyrillic, Latin          0.60                  -
ba    Bashkir              Cyrillic                 -                     -
be    Belarusian           Cyrillic                 0.50                  Incomplete
bg    Bulgarian            Cyrillic                 0.50                  -
bh    Bihari               Devanagari               -                     -
bm    Bambara              Latin                    -                     -
bn    Bengali              Bengali                  0.60                  -
bo    Tibetan              Tibetan                  -                     -
br    Breton               Latin                    0.50                  -
bs    Bosnian              Latin                    Maybe                 -
ca    Catalan / Valencian  Latin                    0.50                  Yes
ce    Chechen              Cyrillic                 -                     -
co    Corsican             Latin                    Maybe                 -
cop   Coptic               Greek                    Maybe                 -
cs    Czech                Latin                    0.50                  Yes
csb   Kashubian            Latin                    0.60                  -
cv    Chuvash              Cyrillic                 -                     -
cy    Welsh                Latin                    0.50                  -
da    Danish               Latin                    0.50                  Incomplete
de    German               Latin                    0.50                  Yes
dyu   Dyula                -                        Maybe                 -
ee    Ewe                  Latin                    -                     -
el    Greek                Greek                    0.50                  -
en    English              Latin                    0.50                  Yes
eo    Esperanto            Latin                    0.50                  -
es    Spanish              Latin                    0.50                  Incomplete
et    Estonian             Latin                    0.60                  -
eu    Basque               Latin                    Maybe                 -
fa    Persian              Arabic                   0.60                  -
ff    Fulah                Latin                    Maybe                 -
fi    Finnish              Latin                    0.60                  -
fj    Fijian               Latin                    Maybe                 -
fo    Faroese              Latin                    0.50                  -
fr    French               Latin                    0.50                  Yes
fur   Friulian             Latin                    Maybe                 -
fy    Frisian              Latin                    0.60                  -
ga    Irish                Latin                    0.50                  Yes
gd    Scottish Gaelic      Latin                    0.50                  -
gl    Gallegan             Latin                    0.50                  -
gn    Guarani              Latin                    Maybe                 -
gu    Gujarati             Gujarati                 0.60                  -
gv    Manx Gaelic          Latin                    0.50                  -
ha    Hausa                Latin                    Maybe                 -
he    Hebrew               Hebrew                   0.60                  -
hi    Hindi                Devanagari               0.60                  -
hil   Hiligaynon           Latin                    0.50                  -
ho    Hiri Motu            Latin                    -                     -
hr    Croatian             Latin                    0.50                  -
hsb   Upper Sorbian        Latin                    0.60                  -
ht    Haitian Creole       Latin                    Maybe                 -
hu    Hungarian            Latin                    0.60                  -
hy    Armenian             Armenian                 0.60                  -
hz    Herero               Latin                    -                     -
ia    Interlingua (IALA)   Latin                    0.50                  -
id    Indonesian           Arabic, Latin            0.50                  -
ig    Igbo                 Latin                    Maybe                 -
ii    Sichuan Yi           Yi                       -                     -
io    Ido                  Latin                    -                     -
is    Icelandic            Latin                    0.50                  -
it    Italian              Latin                    0.50                  Yes
jv    Javanese             Javanese, Latin          Maybe                 -
ka    Georgian             Georgian                 -                     -
kg    Kongo                Latin                    Maybe                 -
ki    Kikuyu / Gikuyu      Latin                    -                     -
kj    Kwanyama             Latin                    -                     -
kk    Kazakh               Cyrillic                 -                     -
km    Khmer                Khmer                    Maybe                 -
kn    Kannada              Kannada                  Planned               -
kr    Kanuri               Latin                    -                     -
ks    Kashmiri             Arabic, Devanagari       -                     -
ku    Kurdish              Arabic, Cyrillic, Latin  0.50                  -
kv    Komi                 Cyrillic                 -                     -
ky    Kirghiz              Arabic, Cyrillic, Latin  Maybe                 -
la    Latin                Latin                    0.60                  -
lb    Luxembourgish        Latin                    Maybe                 -
lg    Ganda                Latin                    Maybe                 -
li    Limburgian           Latin                    Maybe                 -
ln    Lingala              Latin                    Maybe                 -
lt    Lithuanian           Latin                    0.60                  -
lu    Luba-Katanga         Latin                    -                     -
lv    Latvian              Latin                    0.60                  -
mg    Malagasy             Latin                    0.50                  -
mi    Maori                Latin                    0.50                  -
mk    Macedonian           Cyrillic                 0.50                  -
ml    Malayalam            Latin, Malayalam         0.60                  -
mn    Mongolian            Cyrillic, Mongolian      0.60                  Incomplete
mo    Moldavian            Cyrillic                 -                     -
mos   Mossi                -                        Maybe                 -
mr    Marathi              Devanagari               0.60                  -
ms    Malay                Arabic, Latin            0.50                  -
mt    Maltese              Latin                    0.50                  -
my    Burmese              Myanmar                  -                     -
nb    Norwegian Bokmal     Latin                    0.50                  -
nd    North Ndebele        Latin                    Maybe                 -
nds   Low Saxon            Latin                    0.60                  -
ne    Nepali               Devanagari               Maybe                 -
ng    Ndonga               Latin                    Maybe                 -
nl    Dutch                Latin                    0.50                  Yes
nn    Norwegian Nynorsk    Latin                    0.50                  -
nr    South Ndebele        Latin                    Maybe                 -
nso   Northern Sotho       Latin                    Maybe                 -
nv    Navajo               Latin                    Maybe                 -
ny    Nyanja               Latin                    0.50                  -
oc    Occitan / Provencal  Latin                    Maybe                 -
om    Oromo                Ethiopic, Latin          -                     -
or    Oriya                Oriya                    0.60                  -
os    Ossetic              Cyrillic                 -                     -
pa    Punjabi              Gurmukhi                 0.60                  -
pl    Polish               Latin                    0.50                  -
ps    Pushto               Arabic                   -                     -
pt    Portuguese           Latin                    0.50                  Incomplete
qu    Quechua              Latin                    0.60                  -
rn    Rundi                Latin                    Maybe                 -
ro    Romanian             Latin                    0.50                  Incomplete
ru    Russian              Cyrillic                 0.50                  Yes
rw    Kinyarwanda          Latin                    0.50                  -
sc    Sardinian            Latin                    0.50                  -
sd    Sindhi               Arabic                   -                     -
sg    Sango                Latin                    Maybe                 -
si    Sinhalese            Sinhala                  -                     -
sk    Slovak               Latin                    0.50                  Yes
sl    Slovenian            Latin                    0.50                  Yes
sm    Samoan               Latin                    Maybe                 -
sn    Shona                Latin                    Maybe                 -
so    Somali               Latin                    Maybe                 -
sq    Albanian             Latin                    Maybe                 -
sr    Serbian              Cyrillic, Latin          0.60                  Incomplete
ss    Swati                Latin                    Maybe                 -
st    Southern Sotho       Latin                    Maybe                 -
su    Sundanese            Latin                    Maybe                 -
sv    Swedish              Latin                    0.50                  Incomplete
sw    Swahili              Latin                    0.50                  -
ta    Tamil                Tamil                    0.60                  -
te    Telugu               Telugu                   0.60                  -
tet   Tetum                Latin                    0.50                  -
tg    Tajik                Arabic, Cyrillic, Latin  Maybe                 Incomplete
ti    Tigrinya             Ethiopic                 Maybe                 -
tk    Turkmen              Arabic, Cyrillic, Latin  0.50                  -
tl    Tagalog              Latin, Tagalog           0.50                  -
tn    Tswana               Latin                    0.50                  -
to    Tonga                Latin                    Maybe                 -
tr    Turkish              Arabic, Latin            0.50                  -
ts    Tsonga               Latin                    Maybe                 -
tt    Tatar                Cyrillic                 -                     -
tw    Twi                  Latin                    -                     -
ty    Tahitian             Latin                    Maybe                 -
ug    Uighur               Arabic, Cyrillic, Latin  -                     -
uk    Ukrainian            Cyrillic                 0.50                  Yes
ur    Urdu                 Arabic                   Maybe                 -
uz    Uzbek                Cyrillic, Latin          0.60                  -
ve    Venda                Latin                    Maybe                 -
vi    Vietnamese           Latin                    0.60                  Yes
wa    Walloon              Latin                    0.50                  Incomplete
wo    Wolof                Latin                    Maybe                 -
xh    Xhosa                Latin                    Maybe                 -
yi    Yiddish              Hebrew                   0.60                  -
yo    Yoruba               Latin                    Maybe                 -
za    Zhuang               Latin                    -                     -
zu    Zulu                 Latin                    0.50                  -

Dictionaries marked as 0.50 are available for Aspell 0.50. Ones marked as 0.60 are available for Aspell 0.60 only. Ones marked as Planned should eventually be available. Ones marked as Maybe might be available in the future. See Planned Dictionaries, for more info.

B.1.1 Notes on Latin Languages

Any word that can be written using one of the Latin ISO-8859 character sets (ISO-8859-1,2,3,4,9,10,13,14,15,16) can be written, in decomposed form, using only the ASCII characters, the following 23 additional letters:

U+00C6 LATIN CAPITAL LETTER AE
U+00D0 LATIN CAPITAL LETTER ETH
U+00D8 LATIN CAPITAL LETTER O WITH STROKE
U+00DE LATIN CAPITAL LETTER THORN
U+00FE LATIN SMALL LETTER THORN
U+00DF LATIN SMALL LETTER SHARP S
U+00E6 LATIN SMALL LETTER AE
U+00F0 LATIN SMALL LETTER ETH
U+00F8 LATIN SMALL LETTER O WITH STROKE
U+0110 LATIN CAPITAL LETTER D WITH STROKE
U+0111 LATIN SMALL LETTER D WITH STROKE
U+0126 LATIN CAPITAL LETTER H WITH STROKE
U+0127 LATIN SMALL LETTER H WITH STROKE
U+0131 LATIN SMALL LETTER DOTLESS I
U+0138 LATIN SMALL LETTER KRA
U+0141 LATIN CAPITAL LETTER L WITH STROKE
U+0142 LATIN SMALL LETTER L WITH STROKE
U+014A LATIN CAPITAL LETTER ENG
U+014B LATIN SMALL LETTER ENG
U+0152 LATIN CAPITAL LIGATURE OE
U+0153 LATIN SMALL LIGATURE OE
U+0166 LATIN CAPITAL LETTER T WITH STROKE
U+0167 LATIN SMALL LETTER T WITH STROKE

and the 14 modifiers:

U+0300 COMBINING GRAVE ACCENT
U+0301 COMBINING ACUTE ACCENT
U+0302 COMBINING CIRCUMFLEX ACCENT
U+0303 COMBINING TILDE
U+0304 COMBINING MACRON
U+0306 COMBINING BREVE
U+0307 COMBINING DOT ABOVE
U+0308 COMBINING DIAERESIS
U+030A COMBINING RING ABOVE
U+030B COMBINING DOUBLE ACUTE ACCENT
U+030C COMBINING CARON
U+0326 COMBINING COMMA BELOW
U+0327 COMBINING CEDILLA
U+0328 COMBINING OGONEK

This gives a total of 37 additional Unicode code points.
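As a rough illustration of the decomposition described above, the following Python sketch checks whether a word can be written with ASCII plus the listed letters and modifiers. It uses the standard `unicodedata` module and canonical decomposition (NFD); the names `EXTRA_LETTERS`, `MODIFIERS`, `decompose`, and `is_representable` are invented for this example and are not part of Aspell.

```python
import unicodedata

# The 23 additional letters listed above (U+00FE is small thorn).
EXTRA_LETTERS = {
    0x00C6, 0x00D0, 0x00D8, 0x00DE, 0x00FE, 0x00DF, 0x00E6, 0x00F0,
    0x00F8, 0x0110, 0x0111, 0x0126, 0x0127, 0x0131, 0x0138, 0x0141,
    0x0142, 0x014A, 0x014B, 0x0152, 0x0153, 0x0166, 0x0167,
}

# The 14 combining modifiers listed above.
MODIFIERS = {
    0x0300, 0x0301, 0x0302, 0x0303, 0x0304, 0x0306, 0x0307,
    0x0308, 0x030A, 0x030B, 0x030C, 0x0326, 0x0327, 0x0328,
}


def decompose(word: str) -> str:
    """Return the canonical decomposition (NFD) of a word."""
    return unicodedata.normalize("NFD", word)


def is_representable(word: str) -> bool:
    """True if every decomposed code point is ASCII, an extra letter,
    or one of the combining modifiers."""
    for ch in decompose(word):
        cp = ord(ch)
        if cp < 0x80 or cp in EXTRA_LETTERS or cp in MODIFIERS:
            continue
        return False
    return True


if __name__ == "__main__":
    for word in ("Łódź", "Ångström", "αβγ"):
        print(word, is_representable(word))
```

Here "Łódź" passes because L WITH STROKE is one of the extra letters and the two accented letters decompose into a base letter plus COMBINING ACUTE ACCENT, while a Greek word is rejected.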

All ISO-8859 character sets leave the characters 0x00 - 0x1F and 0x80 - 0x9F unmapped, as they are generally used as control characters. Of those, 0x01 - 0x0F, 0x11 - 0x1F, and 0x80 - 0x9F may be mapped to anything in Aspell. This is a total of 62 characters (15 + 15 + 32) which can be remapped in any ISO-8859 character set. Thus, by remapping 37 of the 62 characters to the previously specified Unicode code points, a modified version of any ISO-8859 character set can be used for any Latin language covered by ISO-8859. Of course, decomposing every single accented character wastes a lot of space, so only characters that cannot be represented in precomposed form should be broken up. By using this trick it is possible to store foreign words in their correctly accented form in the dictionary even if the precomposed character is not in the current character set.
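The slot arithmetic can be checked with a short Python sketch. It only enumerates the byte values named in the paragraph above and pairs them with whatever extra code points are supplied; `free_slots` and `assign_slots` are hypothetical helpers, and the assignment order is arbitrary rather than anything Aspell prescribes.

```python
def free_slots() -> list[int]:
    """Byte values that may be remapped: 0x01-0x0F, 0x11-0x1F, 0x80-0x9F."""
    return [*range(0x01, 0x10), *range(0x11, 0x20), *range(0x80, 0xA0)]


def assign_slots(extra_code_points: list[int]) -> dict[int, int]:
    """Map free byte values to the given Unicode code points, in order.

    The pairing order is an assumption of this sketch; a real character
    set definition may place the code points anywhere among the free slots.
    """
    slots = free_slots()
    if len(extra_code_points) > len(slots):
        raise ValueError("more code points than remappable byte values")
    return dict(zip(slots, extra_code_points))


if __name__ == "__main__":
    print(len(free_slots()))  # number of remappable byte values
    # Example: L WITH STROKE (capital and small) and COMBINING ACUTE ACCENT.
    table = assign_slots([0x0141, 0x0142, 0x0301])
    for byte, cp in table.items():
        print(f"0x{byte:02X} -> U+{cp:04X}")
```

With 62 free byte values, the 37 additional code points fit with 25 slots to spare.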

Any letter in the Unicode range U+0000 - U+0249, U+1E00 - U+1EFF (Basic Latin, Latin-1 Supplement, Latin Extended-A, Latin Extended-B, and Latin Extended Additional) can be represented using around 175 basic letters and 25 modifiers, which together come to fewer than 210 characters and can thus fit in an Aspell 8-bit character set. Since this Unicode range covers any possible Latin language, this special character set can be used to represent any word written using the Latin script if so desired.

B.1.2 Syllabic

Syllabic languages use a separate symbol for each syllable of the language. Even though most of them have more than 210 distinct symbols, Aspell can still support them by breaking each symbol up.

B.1.2.1 The Ethiopic Syllabary

Even though the Ethiopic script has more than 210 distinct characters, Aspell can still handle it. The idea is to split each character into two parts, based on its consonant and vowel parts. This encoding of the syllabary is far more useful to Aspell than storing the characters in UTF-8 or UTF-16. In fact, the existing suggestion strategy of Aspell will work well with this encoding without any additional modifications. However, additional improvements may be possible by taking advantage of the consonant-vowel structure of this encoding.

In fact, the split consonant-vowel representation may prove to be so useful that it may be beneficial to encode other syllabaries in this fashion, even if they have fewer than 210 symbols.

The code to break up a syllabary into its consonant and vowel parts is part of the Unicode normalization process.
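The consonant-vowel split can be sketched in Python as follows. This is only an illustration, not Aspell's normalization code: it assumes the layout of the Unicode Ethiopic block, where syllables start at U+1200 and each row of eight consecutive code points shares a consonant, so the row index stands in for the consonant and the column for the vowel order. The names `split_syllable`, `join_syllable`, and `split_word` are invented for the example, and rows for labialized forms do not follow the plain pattern exactly.

```python
ETHIOPIC_FIRST = 0x1200  # first syllable in the Unicode Ethiopic block
ETHIOPIC_LAST = 0x135A   # last syllable before the marks and punctuation
FORMS_PER_ROW = 8        # code points per consonant row (assumed layout)


def split_syllable(ch: str) -> tuple[int, int]:
    """Split one Ethiopic syllable into (consonant row, vowel column)."""
    cp = ord(ch)
    if not ETHIOPIC_FIRST <= cp <= ETHIOPIC_LAST:
        raise ValueError(f"not an Ethiopic syllable: U+{cp:04X}")
    return divmod(cp - ETHIOPIC_FIRST, FORMS_PER_ROW)


def join_syllable(row: int, column: int) -> str:
    """Inverse of split_syllable."""
    return chr(ETHIOPIC_FIRST + row * FORMS_PER_ROW + column)


def split_word(word: str) -> list[tuple[int, int]]:
    """Represent a word as a sequence of (consonant, vowel) pairs."""
    return [split_syllable(ch) for ch in word]


if __name__ == "__main__":
    # U+1208 and U+1209 share a consonant row and differ only in the vowel.
    print(split_syllable("\u1208"), split_syllable("\u1209"))
    print(split_word("\u1230\u120b\u121d"))
```

In this representation two syllables with the same consonant differ in only one of their two parts, which is why an edit-distance-based suggestion strategy can treat them as near matches.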

B.1.2.2 The Yi Syllabary

The Yi syllabary is very large, with 819 distinct symbols. However, like Ethiopic, it should be possible to support this script by breaking it up.

B.1.2.3 The Ojibwe Syllabary

With only 120 distinct symbols, Aspell can actually support this one as is. However, as previously mentioned, it may be beneficial to break it up into the consonant-vowel representation anyway.

