draft Languages and character sets Aug 95

Characters and character sets for various languages

Tue Aug 22 15:39:23 MET DST 1995

Harald Tveit Alvestrand
UNINETT
Harald.Alvestrand@uninett.no

draft-alvestrand-lang-char-03.txt

Alvestrand Expires Feb 95

Abstract

There is a need to have a source of information about the characters that are used in various languages. No such information is currently readily available on the net. This document attempts to fill that void.

Status of this Memo

This draft document is being circulated for comment. It does not yet cover anything but Latin-based scripts; volunteers to collect material for other scripts are sought.

Promises made have not been kept, so the Cyrillic information is still not present; this draft has only minor updates and corrections compared to the June 93 version.

Please send comments to the author, or to the RARE WG-CHAR list.

The following text is required by the Internet-draft rules:

This document is an Internet Draft. Internet Drafts are working documents of the Internet Engineering Task Force (IETF), its Areas, and its Working Groups. Note that other groups may also distribute working documents as Internet Drafts.

Internet Drafts are draft documents valid for a maximum of six months. Internet Drafts may be updated, replaced, or obsoleted by other documents at any time. It is not appropriate to use Internet Drafts as reference material or to cite them other than as a "working draft" or "work in progress."

To learn the current status of any Internet-Draft, please check the 1id-abstracts.txt listing contained in the Internet-Drafts Shadow Directories on ds.internic.net, nic.nordu.net, venera.isi.edu, or munnari.oz.au.

1. Introduction

There are a lot of languages in the world. Estimates vary between 500 and 6000, with some eternal conflicts about the difference between a language and a dialect guaranteeing that any list claiming to be authoritative will be the source of endless debate.

Many of these languages have a writing system. Some have several. These are also likely to have changed over time, with the meaning of character symbols changing, the shape of the characters changing, or completely new characters being added, or old ones removed from the set.

This means that even within a single language, a list of characters is likely to be controversial.

These problems have made several experts in the field of languages and characters refuse to even consider the idea of working out such a list.

Nevertheless, it is clear that an easily available source of this kind of information is needed, in order to:

(1) Identify the problems encountered when trying to use equipment with limited character support for a language

(2) Identify what support for additional characters will be "enough" for that language

(3) Identify what internationally standardized character sets are able to fulfill the requirements for that language

The tables given below are an attempt at providing such an identification.

The rest of the document is in 3 parts: The language tables a

2. Introduction to language tables

2.1. Table structure

Each language is listed in 5 parts:

(1) The language name with its ISO 639 code if applicable

(2) The characters required for that language. For brevity, the characters of ASCII (A-Z) are not listed. Note that some languages do NOT require all the ASCII characters.

(3) Characters that are in normal use, but have replacements that mostly do not change the meaning of the word in context. These may be called "optional" characters. This should _not_ be taken as liberty to remove those characters from the language, but as a reminder that if it is great trouble to use the charsets that cover the complete language, a smaller character set may be used without causing grievous harm to the expressive power of the writer.

(4) Internationally registered character sets that cover the required and/or optional characters for that language.

(5) Comments

The division between "required" and "optional" characters is likely to produce much discussion. As a rough guide, I have taken the registered ISO 646 variants of a number of countries, and classified as "optional" all characters which did _not_ appear in that ISO 646 variant. As a result, an ISO 646 variant should appear under the "required characters only" for all languages that have an ISO 646 variant.

Note that for brevity, only the lower case version of the character is listed. If no note is made, one should assume that the upper case version is equally required. Note, however, that a lot of languages permit the dropping of accents on upper case characters where it would be considered improper to drop them on lower case characters.

2.2. Sources utilized

The table of Latin-script languages is based on work by Johan van Wingen. The others are best guesses by the author.

The tables of character sets prepared by Keld Jorn Simonsen (RFC-KELD) were invaluable in matching the data on languages to the data on character sets.

The language codes (for those languages that have codes) come from ISO 639.

NOTE: ISO 639 is a very incomplete list of the world's languages (perhaps 10 or 20% according to some experts), and is undergoing revision. The only reason for using it is that it is the only ISO-standardized shorthand notation for languages available at the moment.

Languages for which no such exact information is known are listed at the end of the tables.

2.3.
What accents mean

For those who feel unfamiliar with the names of accents:

Grave slants upwards to the left, like the Unix "backtick".
Acute slants upwards to the right.
Circumflex looks like a little pointed hat.
Tilde looks like a wavy line.
Macron looks like a bar placed on top of the character.
Breve looks like the lower quarter of a circle, placed on top of the character.
Dot above should be self-explanatory.
Diaeresis looks like 2 dots above the character.
Ring above should be self-explanatory.
Cedilla looks like a little squiggle on the bottom of the letter, down and then left.
Ogonek looks like a squiggle too, but goes down and to the right.
Caron looks like a little "v" on top of the character.

3. Language tables

3.1. la Latin

Required characters

a   0061  LATIN SMALL LETTER A
b   0062  LATIN SMALL LETTER B
c   0063  LATIN SMALL LETTER C
d   0064  LATIN SMALL LETTER D
e   0065  LATIN SMALL LETTER E
f   0066  LATIN SMALL LETTER F
g   0067  LATIN SMALL LETTER G
h   0068  LATIN SMALL LETTER H
i   0069  LATIN SMALL LETTER I
j   006a  LATIN SMALL LETTER J
k   006b  LATIN SMALL LETTER K
l   006c  LATIN SMALL LETTER L
m   006d  LATIN SMALL LETTER M
n   006e  LATIN SMALL LETTER N
o   006f  LATIN SMALL LETTER O
p   0070  LATIN SMALL LETTER P
q   0071  LATIN SMALL LETTER Q
r   0072  LATIN SMALL LETTER R
s   0073  LATIN SMALL LETTER S
t   0074  LATIN SMALL LETTER T
u   0075  LATIN SMALL LETTER U
v   0076  LATIN SMALL LETTER V
w   0077  LATIN SMALL LETTER W
x   0078  LATIN SMALL LETTER X
y   0079  LATIN SMALL LETTER Y
z   007a  LATIN SMALL LETTER Z
A   0041  LATIN CAPITAL LETTER A
B   0042  LATIN CAPITAL LETTER B
C   0043  LATIN CAPITAL LETTER C
D   0044  LATIN CAPITAL LETTER D
E   0045  LATIN CAPITAL LETTER E
F   0046  LATIN CAPITAL LETTER F
G   0047  LATIN CAPITAL LETTER G
H   0048  LATIN CAPITAL LETTER H
I   0049  LATIN CAPITAL LETTER I
J   004a  LATIN CAPITAL LETTER J
K   004b  LATIN CAPITAL LETTER K
L   004c  LATIN CAPITAL LETTER L
M   004d  LATIN CAPITAL LETTER M
N   004e  LATIN CAPITAL LETTER N
O   004f  LATIN CAPITAL LETTER O
P   0050  LATIN CAPITAL LETTER P
Q   0051  LATIN CAPITAL LETTER Q
R   0052  LATIN CAPITAL LETTER R
S   0053  LATIN CAPITAL LETTER S
T   0054  LATIN CAPITAL LETTER T
U   0055  LATIN CAPITAL LETTER U
V   0056  LATIN CAPITAL LETTER V
W   0057  LATIN CAPITAL LETTER W
X   0058  LATIN CAPITAL LETTER X
Y   0059  LATIN CAPITAL LETTER Y
Z   005a  LATIN CAPITAL LETTER Z

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ANSI_X3.4-1968 (iso 6)
IT (iso 15)
ISO_10367-box (iso 155)
DIN_66003 (iso 21)
NS_4551-2 (iso 61)
ISO_8859-2:1987 (iso 101)
MSZ_7795.3 (iso 86)
GB_2312-80 (iso 58)
ISO_8859-8:1988 (iso 138)
ECMA-cyrillic (iso 111)
ANSI_X3.110-1983 (iso 99)
ISO_8859-5:1988 (iso 144)
ISO_6937-2-25 (iso 152)
ES (iso 17)
JIS_C6226-1978 (iso 42)
latin-lap (iso 158)
BS_4730 (iso 4)
iso-ir-90 (iso 90)
Latin-greek-1 (iso 27)
videotex-suppl (iso 70)
NATS-DANO (iso 9)
T.61-8bit (iso 103)
KS_C_5601-1987 (iso 149)
JUS_I.B1.002 (iso 141)
CSA_Z243.4-1985-2 (iso 122)
JIS_C6220-1969-ro (iso 14)
ISO_8859-supp (iso 154)
INIS (iso 49)
NS_4551-1 (iso 60)
ISO_646.irv:1983 (iso 2)
GB_1988-80 (iso 57)
ES2 (iso 85)
JIS_C6229-1984-b (iso 92)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
NC_NC00-10:81 (iso 151)
ISO_8859-7:1987 (iso 126)
IEC_P27-1 (iso 143)
SEN_850200_C (iso 11)
PT (iso 16)
latin6 (iso 157)
NF_Z_62-010_(1973) (iso 25)
NF_Z_62-010 (iso 69)
JIS_C6226-1983 (iso 87)
T.61-7bit (iso 102)
CSN_369103 (iso 139)
CSA_Z243.4-1985-1 (iso 121)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
NATS-SEFI (iso 8)
GOST_19768-74 (iso 153)
SEN_850200_B (iso 10)
BS_viewdata (iso 47)
PT2 (iso 84)
ISO_8859-6:1987 (iso 127)
ISO_8859-3:1988 (iso 109)

3.2. ??
Cyrillic

Required characters

a=  0430  CYRILLIC SMALL LETTER A
A=  0410  CYRILLIC CAPITAL LETTER A
b=  0431  CYRILLIC SMALL LETTER BE
B=  0411  CYRILLIC CAPITAL LETTER BE
v=  0432  CYRILLIC SMALL LETTER VE
V=  0412  CYRILLIC CAPITAL LETTER VE
g=  0433  CYRILLIC SMALL LETTER GHE
G=  0413  CYRILLIC CAPITAL LETTER GHE
d=  0434  CYRILLIC SMALL LETTER DE
D=  0414  CYRILLIC CAPITAL LETTER DE
e=  0435  CYRILLIC SMALL LETTER IE
E=  0415  CYRILLIC CAPITAL LETTER IE
z%  0436  CYRILLIC SMALL LETTER ZHE
Z%  0416  CYRILLIC CAPITAL LETTER ZHE
z=  0437  CYRILLIC SMALL LETTER ZE
Z=  0417  CYRILLIC CAPITAL LETTER ZE
i=  0438  CYRILLIC SMALL LETTER I
I=  0418  CYRILLIC CAPITAL LETTER I
k=  043a  CYRILLIC SMALL LETTER KA
K=  041a  CYRILLIC CAPITAL LETTER KA
l=  043b  CYRILLIC SMALL LETTER EL
L=  041b  CYRILLIC CAPITAL LETTER EL
m=  043c  CYRILLIC SMALL LETTER EM
M=  041c  CYRILLIC CAPITAL LETTER EM
n=  043d  CYRILLIC SMALL LETTER EN
N=  041d  CYRILLIC CAPITAL LETTER EN
o=  043e  CYRILLIC SMALL LETTER O
O=  041e  CYRILLIC CAPITAL LETTER O
p=  043f  CYRILLIC SMALL LETTER PE
P=  041f  CYRILLIC CAPITAL LETTER PE
r=  0440  CYRILLIC SMALL LETTER ER
R=  0420  CYRILLIC CAPITAL LETTER ER
s=  0441  CYRILLIC SMALL LETTER ES
S=  0421  CYRILLIC CAPITAL LETTER ES
t=  0442  CYRILLIC SMALL LETTER TE
T=  0422  CYRILLIC CAPITAL LETTER TE
u=  0443  CYRILLIC SMALL LETTER U
U=  0423  CYRILLIC CAPITAL LETTER U
f=  0444  CYRILLIC SMALL LETTER EF
F=  0424  CYRILLIC CAPITAL LETTER EF
h=  0445  CYRILLIC SMALL LETTER HA
H=  0425  CYRILLIC CAPITAL LETTER HA
c=  0446  CYRILLIC SMALL LETTER TSE
C=  0426  CYRILLIC CAPITAL LETTER TSE
c%  0447  CYRILLIC SMALL LETTER CHE
C%  0427  CYRILLIC CAPITAL LETTER CHE
s%  0448  CYRILLIC SMALL LETTER SHA
S%  0428  CYRILLIC CAPITAL LETTER SHA

Character sets covering the whole

NO SET (iso )
JIS_C6226-1983 (iso 87)
JIS_C6226-1978 (iso 42)
JUS_I.B1.003-serb (iso 146)
GOST_19768-74 (iso 153)
GB_2312-80 (iso 58)
KS_C_5601-1987 (iso 149)
ISO_5427 (iso 37)
ECMA-cyrillic (iso 111)
INIS-cyrillic (iso 51)
JUS_I.B1.003-mac (iso 147)
ISO_8859-5:1988 (iso 144)

3.3. en English

Based on script listed as Latin

This language needs no additional characters

This language has no known character set

3.4. lt Lithuanian

Based on script listed as Latin

Required characters

a;  0105  LATIN SMALL LETTER A WITH OGONEK
e;  0119  LATIN SMALL LETTER E WITH OGONEK
i;  012f  LATIN SMALL LETTER I WITH OGONEK
u;  0173  LATIN SMALL LETTER U WITH OGONEK
e.  0117  LATIN SMALL LETTER E WITH DOT ABOVE
u-  016b  LATIN SMALL LETTER U WITH MACRON
c<  010d  LATIN SMALL LETTER C WITH CARON
s<  0161  LATIN SMALL LETTER S WITH CARON
z<  017e  LATIN SMALL LETTER Z WITH CARON
A;  0104  LATIN CAPITAL LETTER A WITH OGONEK
E;  0118  LATIN CAPITAL LETTER E WITH OGONEK
I;  012e  LATIN CAPITAL LETTER I WITH OGONEK
U;  0172  LATIN CAPITAL LETTER U WITH OGONEK
E.  0116  LATIN CAPITAL LETTER E WITH DOT ABOVE
U-  016a  LATIN CAPITAL LETTER U WITH MACRON
C<  010c  LATIN CAPITAL LETTER C WITH CARON
S<  0160  LATIN CAPITAL LETTER S WITH CARON
Z<  017d  LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

3.5.
lv Latvian

Based on script listed as Latin

Required characters

a-  0101  LATIN SMALL LETTER A WITH MACRON
e-  0113  LATIN SMALL LETTER E WITH MACRON
i-  012b  LATIN SMALL LETTER I WITH MACRON
o-  014d  LATIN SMALL LETTER O WITH MACRON
u-  016b  LATIN SMALL LETTER U WITH MACRON
g,  0123  LATIN SMALL LETTER G WITH CEDILLA
k,  0137  LATIN SMALL LETTER K WITH CEDILLA
l,  013c  LATIN SMALL LETTER L WITH CEDILLA
n,  0146  LATIN SMALL LETTER N WITH CEDILLA
r,  0157  LATIN SMALL LETTER R WITH CEDILLA
c<  010d  LATIN SMALL LETTER C WITH CARON
s<  0161  LATIN SMALL LETTER S WITH CARON
z<  017e  LATIN SMALL LETTER Z WITH CARON
A-  0100  LATIN CAPITAL LETTER A WITH MACRON
E-  0112  LATIN CAPITAL LETTER E WITH MACRON
I-  012a  LATIN CAPITAL LETTER I WITH MACRON
O-  014c  LATIN CAPITAL LETTER O WITH MACRON
U-  016a  LATIN CAPITAL LETTER U WITH MACRON
G,  0122  LATIN CAPITAL LETTER G WITH CEDILLA
K,  0136  LATIN CAPITAL LETTER K WITH CEDILLA
L,  013b  LATIN CAPITAL LETTER L WITH CEDILLA
N,  0145  LATIN CAPITAL LETTER N WITH CEDILLA
R,  0156  LATIN CAPITAL LETTER R WITH CEDILLA
C<  010c  LATIN CAPITAL LETTER C WITH CARON
S<  0160  LATIN CAPITAL LETTER S WITH CARON
Z<  017d  LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

3.6. et Estonian

Based on script listed as Latin

Required characters

o?  00f5  LATIN SMALL LETTER O WITH TILDE
a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
s<  0161  LATIN SMALL LETTER S WITH CARON
z<  017e  LATIN SMALL LETTER Z WITH CARON
O?  00d5  LATIN CAPITAL LETTER O WITH TILDE
A:  00c4  LATIN CAPITAL LETTER A WITH DIAERESIS
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS
S<  0160  LATIN CAPITAL LETTER S WITH CARON
Z<  017d  LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

3.7. fi Finnish

Based on script listed as Latin

Required characters

a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
A:  00c4  LATIN CAPITAL LETTER A WITH DIAERESIS
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
CSN_369103 (iso 139)
ISO_8859-9:1989 (iso 148)
NATS-SEFI (iso 8)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
DIN_66003 (iso 21)
NATS-DANO-ADD (iso 9)
T.61-8bit (iso 103)
SEN_850200_B (iso 10)
JIS_X0212-1990 (iso 159)
SEN_850200_C (iso 11)
ISO_8859-2:1987 (iso 101)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.8. ??
Sami

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
d/  0111  LATIN SMALL LETTER D WITH STROKE
ng  014b  LATIN SMALL LETTER ENG
t/  0167  LATIN SMALL LETTER T WITH STROKE
c<  010d  LATIN SMALL LETTER C WITH CARON
s<  0161  LATIN SMALL LETTER S WITH CARON
z<  017e  LATIN SMALL LETTER Z WITH CARON
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
D/  0110  LATIN CAPITAL LETTER D WITH STROKE
NG  014a  LATIN CAPITAL LETTER ENG (Lappish)
T/  0166  LATIN CAPITAL LETTER T WITH STROKE
C<  010c  LATIN CAPITAL LETTER C WITH CARON
S<  0160  LATIN CAPITAL LETTER S WITH CARON
Z<  017d  LATIN CAPITAL LETTER Z WITH CARON

Important characters

e'  00e9  LATIN SMALL LETTER E WITH ACUTE
a>  00e2  LATIN SMALL LETTER A WITH CIRCUMFLEX
a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
e:  00eb  LATIN SMALL LETTER E WITH DIAERESIS
i:  00ef  LATIN SMALL LETTER I WITH DIAERESIS
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
ae  00e6  LATIN SMALL LETTER AE
aa  00e5  LATIN SMALL LETTER A WITH RING ABOVE
o/  00f8  LATIN SMALL LETTER O WITH STROKE
n'  0144  LATIN SMALL LETTER N WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
A>  00c2  LATIN CAPITAL LETTER A WITH CIRCUMFLEX
A:  00c4  LATIN CAPITAL LETTER A WITH DIAERESIS
E:  00cb  LATIN CAPITAL LETTER E WITH DIAERESIS
I:  00cf  LATIN CAPITAL LETTER I WITH DIAERESIS
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS
AE  00c6  LATIN CAPITAL LETTER AE
AA  00c5  LATIN CAPITAL LETTER A WITH RING ABOVE
O/  00d8  LATIN CAPITAL LETTER O WITH STROKE
N'  0143  LATIN CAPITAL LETTER N WITH ACUTE

Comments

Information from Otto Prytz: This information is for the current Norwegian North Sami orthography of 1979. The letters aa, ae and o/ are in use for Norwegian/Swedish names, but not for Sami proper. a> and n' are no longer used. There is some doubt about whether e: and i: were ever used, but they are listed by van Wingen.

Information from regnorj@powertech.no (Regnor Jernsletten): a', c< and s< only occur at the beginning of words. d/, ng, t/ and z< occur only within words. a' and n' are used in Lule Sami, together with ae (in Norway) or a: (in Sweden). In South Sami, i: is used, together with ae and o/ (in Norway) or a: and o: (in Sweden). a> is used in Skolte Sami, together with g<, k< and o~. Skolte Sami also uses z and z<, but the Z is written much like the number 3. Also, the letter "stungen g" (ISO code unknown) is used.

Character sets covering the whole

JIS_X0212-1990 (iso 159)
NO SET (iso )
latin6 (iso 157)

Character sets covering the required characters only

ISO_8859-4:1988 (iso 110)

3.9. sv Swedish

Based on script listed as Latin

Required characters

a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
aa  00e5  LATIN SMALL LETTER A WITH RING ABOVE
A:  00c4  LATIN CAPITAL LETTER A WITH DIAERESIS
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS
AA  00c5  LATIN CAPITAL LETTER A WITH RING ABOVE

Important characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
e:  00eb  LATIN SMALL LETTER E WITH DIAERESIS
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
E:  00cb  LATIN CAPITAL LETTER E WITH DIAERESIS
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

Character sets covering the required characters only

NATS-SEFI (iso 8)
SEN_850200_B (iso 10)
SEN_850200_C (iso 11)

3.10. no Norwegian

Based on script listed as Latin

Required characters

ae  00e6  LATIN SMALL LETTER AE
aa  00e5  LATIN SMALL LETTER A WITH RING ABOVE
o/  00f8  LATIN SMALL LETTER O WITH STROKE
AE  00c6  LATIN CAPITAL LETTER AE
AA  00c5  LATIN CAPITAL LETTER A WITH RING ABOVE
O/  00d8  LATIN CAPITAL LETTER O WITH STROKE

Important characters

e'  00e9  LATIN SMALL LETTER E WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
o>  00f4  LATIN SMALL LETTER O WITH CIRCUMFLEX
a!  00e0  LATIN SMALL LETTER A WITH GRAVE
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
a<  01ce  LATIN SMALL LETTER A WITH CARON
e`  - name not known
o`  - name not known
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
O>  00d4  LATIN CAPITAL LETTER O WITH CIRCUMFLEX
A!  00c0  LATIN CAPITAL LETTER A WITH GRAVE
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS
A<  01cd  LATIN CAPITAL LETTER A WITH CARON
E`  - name not known
O`  - name not known

Comments

Information from Johan van Wingen and Otto Prytz. The characters e` and o` are used in the "Nynorsk" sublanguage (information from Knut S. Vikør).

Character sets covering the required characters only

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
NATS-DANO (iso 9)
NS_4551-2 (iso 61)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)
NS_4551-1 (iso 60)

3.11. da Danish

Based on script listed as Latin

Required characters

ae  00e6  LATIN SMALL LETTER AE
aa  00e5  LATIN SMALL LETTER A WITH RING ABOVE
o/  00f8  LATIN SMALL LETTER O WITH STROKE
AE  00c6  LATIN CAPITAL LETTER AE
AA  00c5  LATIN CAPITAL LETTER A WITH RING ABOVE
O/  00d8  LATIN CAPITAL LETTER O WITH STROKE

Important characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE
Y'  00dd  LATIN CAPITAL LETTER Y WITH ACUTE

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

Character sets covering the required characters only

ISO_8859-9:1989 (iso 148)
ISO_8859-4:1988 (iso 110)
NATS-DANO (iso 9)
NS_4551-2 (iso 61)
NS_4551-1 (iso 60)

3.12.
fo Faeroese

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
ae  00e6  LATIN SMALL LETTER AE
o/  00f8  LATIN SMALL LETTER O WITH STROKE
d-  00f0  LATIN SMALL LETTER ETH (Icelandic)
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE
Y'  00dd  LATIN CAPITAL LETTER Y WITH ACUTE
AE  00c6  LATIN CAPITAL LETTER AE
O/  00d8  LATIN CAPITAL LETTER O WITH STROKE
D-  00d0  LATIN CAPITAL LETTER ETH (Icelandic)

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)

3.13. is Icelandic

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
ae  00e6  LATIN SMALL LETTER AE
d-  00f0  LATIN SMALL LETTER ETH (Icelandic)
th  00fe  LATIN SMALL LETTER THORN (Icelandic)
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE
Y'  00dd  LATIN CAPITAL LETTER Y WITH ACUTE
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS
AE  00c6  LATIN CAPITAL LETTER AE
D-  00d0  LATIN CAPITAL LETTER ETH (Icelandic)
TH  00de  LATIN CAPITAL LETTER THORN (Icelandic)

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)

3.14. kl Greenlandic

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
a>  00e2  LATIN SMALL LETTER A WITH CIRCUMFLEX
e>  00ea  LATIN SMALL LETTER E WITH CIRCUMFLEX
i>  00ee  LATIN SMALL LETTER I WITH CIRCUMFLEX
o>  00f4  LATIN SMALL LETTER O WITH CIRCUMFLEX
u>  00fb  LATIN SMALL LETTER U WITH CIRCUMFLEX
ae  00e6  LATIN SMALL LETTER AE
aa  00e5  LATIN SMALL LETTER A WITH RING ABOVE
o/  00f8  LATIN SMALL LETTER O WITH STROKE
a?  00e3  LATIN SMALL LETTER A WITH TILDE
i?  0129  LATIN SMALL LETTER I WITH TILDE
u?  0169  LATIN SMALL LETTER U WITH TILDE
kk  0138  LATIN SMALL LETTER KRA (Greenlandic)
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE
A>  00c2  LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>  00ca  LATIN CAPITAL LETTER E WITH CIRCUMFLEX
I>  00ce  LATIN CAPITAL LETTER I WITH CIRCUMFLEX
O>  00d4  LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U>  00db  LATIN CAPITAL LETTER U WITH CIRCUMFLEX
AE  00c6  LATIN CAPITAL LETTER AE
AA  00c5  LATIN CAPITAL LETTER A WITH RING ABOVE
O/  00d8  LATIN CAPITAL LETTER O WITH STROKE
A?  00c3  LATIN CAPITAL LETTER A WITH TILDE
I?  0128  LATIN CAPITAL LETTER I WITH TILDE
U?  0168  LATIN CAPITAL LETTER U WITH TILDE
KK  - name not known

This language has no known character set

3.15. ?? Gaelic

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
a!  00e0  LATIN SMALL LETTER A WITH GRAVE
e!  00e8  LATIN SMALL LETTER E WITH GRAVE
i!  00ec  LATIN SMALL LETTER I WITH GRAVE
o!  00f2  LATIN SMALL LETTER O WITH GRAVE
u!  00f9  LATIN SMALL LETTER U WITH GRAVE
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
A!  00c0  LATIN CAPITAL LETTER A WITH GRAVE
E!  00c8  LATIN CAPITAL LETTER E WITH GRAVE
I!  00cc  LATIN CAPITAL LETTER I WITH GRAVE
O!  00d2  LATIN CAPITAL LETTER O WITH GRAVE
U!  00d9  LATIN CAPITAL LETTER U WITH GRAVE

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.16. ga Irish

Based on script listed as Latin

Required characters

a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
CSN_369103 (iso 139)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.17. cy Welsh

Based on script listed as Latin

Required characters

w'  1e83  LATIN SMALL LETTER W WITH ACUTE
y'  00fd  LATIN SMALL LETTER Y WITH ACUTE
a'  00e1  LATIN SMALL LETTER A WITH ACUTE
e'  00e9  LATIN SMALL LETTER E WITH ACUTE
i'  00ed  LATIN SMALL LETTER I WITH ACUTE
o'  00f3  LATIN SMALL LETTER O WITH ACUTE
u'  00fa  LATIN SMALL LETTER U WITH ACUTE
a!  00e0  LATIN SMALL LETTER A WITH GRAVE
e!  00e8  LATIN SMALL LETTER E WITH GRAVE
i!  00ec  LATIN SMALL LETTER I WITH GRAVE
o!  00f2  LATIN SMALL LETTER O WITH GRAVE
u!  00f9  LATIN SMALL LETTER U WITH GRAVE
w!  1e81  LATIN SMALL LETTER W WITH GRAVE
y!  1ef3  LATIN SMALL LETTER Y WITH GRAVE
a>  00e2  LATIN SMALL LETTER A WITH CIRCUMFLEX
e>  00ea  LATIN SMALL LETTER E WITH CIRCUMFLEX
i>  00ee  LATIN SMALL LETTER I WITH CIRCUMFLEX
o>  00f4  LATIN SMALL LETTER O WITH CIRCUMFLEX
u>  00fb  LATIN SMALL LETTER U WITH CIRCUMFLEX
w>  0175  LATIN SMALL LETTER W WITH CIRCUMFLEX
y>  0177  LATIN SMALL LETTER Y WITH CIRCUMFLEX
a:  00e4  LATIN SMALL LETTER A WITH DIAERESIS
e:  00eb  LATIN SMALL LETTER E WITH DIAERESIS
i:  00ef  LATIN SMALL LETTER I WITH DIAERESIS
o:  00f6  LATIN SMALL LETTER O WITH DIAERESIS
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
w:  1e85  LATIN SMALL LETTER W WITH DIAERESIS
y:  00ff  LATIN SMALL LETTER Y WITH DIAERESIS
W'  1e82  LATIN CAPITAL LETTER W WITH ACUTE
Y'  00dd  LATIN CAPITAL LETTER Y WITH ACUTE
A'  00c1  LATIN CAPITAL LETTER A WITH ACUTE
E'  00c9  LATIN CAPITAL LETTER E WITH ACUTE
I'  00cd  LATIN CAPITAL LETTER I WITH ACUTE
O'  00d3  LATIN CAPITAL LETTER O WITH ACUTE
U'  00da  LATIN CAPITAL LETTER U WITH ACUTE
A!  00c0  LATIN CAPITAL LETTER A WITH GRAVE
E!  00c8  LATIN CAPITAL LETTER E WITH GRAVE
I!  00cc  LATIN CAPITAL LETTER I WITH GRAVE
O!  00d2  LATIN CAPITAL LETTER O WITH GRAVE
U!  00d9  LATIN CAPITAL LETTER U WITH GRAVE
W!  1e80  LATIN CAPITAL LETTER W WITH GRAVE
Y!  1ef2  LATIN CAPITAL LETTER Y WITH GRAVE
A>  00c2  LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>  00ca  LATIN CAPITAL LETTER E WITH CIRCUMFLEX
I>  00ce  LATIN CAPITAL LETTER I WITH CIRCUMFLEX
O>  00d4  LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U>  00db  LATIN CAPITAL LETTER U WITH CIRCUMFLEX
W>  0174  LATIN CAPITAL LETTER W WITH CIRCUMFLEX
Y>  0176  LATIN CAPITAL LETTER Y WITH CIRCUMFLEX
A:  00c4  LATIN CAPITAL LETTER A WITH DIAERESIS
E:  00cb  LATIN CAPITAL LETTER E WITH DIAERESIS
I:  00cf  LATIN CAPITAL LETTER I WITH DIAERESIS
O:  00d6  LATIN CAPITAL LETTER O WITH DIAERESIS
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS
W:  1e84  LATIN CAPITAL LETTER W WITH DIAERESIS
Y:  0178  LATIN CAPITAL LETTER Y WITH DIAERESIS

This language has no known character set

3.18. br Breton

Based on script listed as Latin

Required characters

e>  00ea  LATIN SMALL LETTER E WITH CIRCUMFLEX
u!  00f9  LATIN SMALL LETTER U WITH GRAVE
u:  00fc  LATIN SMALL LETTER U WITH DIAERESIS
n?  00f1  LATIN SMALL LETTER N WITH TILDE
E>  00ca  LATIN CAPITAL LETTER E WITH CIRCUMFLEX
U!  00d9  LATIN CAPITAL LETTER U WITH GRAVE
U:  00dc  LATIN CAPITAL LETTER U WITH DIAERESIS
N?  00d1  LATIN CAPITAL LETTER N WITH TILDE

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.19.
fy Frisian

Based on script listed as Latin

Required characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
e>   00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
u>   00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
a:   00e4 LATIN SMALL LETTER A WITH DIAERESIS
e:   00eb LATIN SMALL LETTER E WITH DIAERESIS
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>   00ca LATIN CAPITAL LETTER E WITH CIRCUMFLEX
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U>   00db LATIN CAPITAL LETTER U WITH CIRCUMFLEX
A:   00c4 LATIN CAPITAL LETTER A WITH DIAERESIS
E:   00cb LATIN CAPITAL LETTER E WITH DIAERESIS
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.20.
nl Dutch

Based on script listed as Latin

Required characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
a:   00e4 LATIN SMALL LETTER A WITH DIAERESIS
e:   00eb LATIN SMALL LETTER E WITH DIAERESIS
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
ij   0133 LATIN SMALL LIGATURE IJ
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
A:   00c4 LATIN CAPITAL LETTER A WITH DIAERESIS
E:   00cb LATIN CAPITAL LETTER E WITH DIAERESIS
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
IJ   0132 LATIN CAPITAL LIGATURE IJ

Character sets covering the whole

JIS_X0212-1990 (iso 159)
NO SET (iso )
ISO_6937-2-add (iso 142)
ANSI_X3.110-1983 (iso 99)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)

3.21. af Afrikaans

Based on script listed as Latin

Required characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
e>   00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
u>   00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
e:   00eb LATIN SMALL LETTER E WITH DIAERESIS
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>   00ca LATIN CAPITAL LETTER E WITH CIRCUMFLEX
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U>   00db LATIN CAPITAL LETTER U WITH CIRCUMFLEX
E:   00cb LATIN CAPITAL LETTER E WITH DIAERESIS
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS

This language has no known character set

3.22. de German

Based on script listed as Latin

Required characters

a:   00e4 LATIN SMALL LETTER A WITH DIAERESIS
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
ss   00df LATIN SMALL LETTER SHARP S (German)
A:   00c4 LATIN CAPITAL LETTER A WITH DIAERESIS
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
SS   - no single character (see Comments)

Important characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE

Comments

The "ss" character exists only in lower case; the upper case
equivalent is "SS" (2 letters).

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

Character sets covering the required characters only

CSN_369103 (iso 139)
ISO_8859-4:1988 (iso 110)
ISO_8859-2:1987 (iso 101)
latin6 (iso 157)

3.23. fr French

Based on script listed as Latin

Required characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
u!   00f9 LATIN SMALL LETTER U WITH GRAVE
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
U!   00d9 LATIN CAPITAL LETTER U WITH GRAVE
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE

Important characters

a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
e>   00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
u>   00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
ae   00e6 LATIN SMALL LETTER AE
oe   0153 LATIN SMALL LIGATURE OE
e:   00eb LATIN SMALL LETTER E WITH DIAERESIS
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
y:   00ff LATIN SMALL LETTER Y WITH DIAERESIS
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>   00ca LATIN CAPITAL LETTER E WITH CIRCUMFLEX
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U>   00db LATIN CAPITAL LETTER U WITH CIRCUMFLEX
AE   00c6 LATIN CAPITAL LETTER AE
OE   0152 LATIN CAPITAL LIGATURE OE
E:   00cb LATIN CAPITAL LETTER E WITH DIAERESIS
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
Y:   0178 LATIN CAPITAL LETTER Y WITH DIAERESIS

Comments

ae and y: are very uncommon in current French; there have been
arguments that all of the others should be "required".

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)

Character sets covering the required characters only

CSA_Z243.4-1985-gr (iso 123)
ISO_8859-9:1989 (iso 148)
ISO_8859-1:1987 (iso 100)
JIS_X0212-1990 (iso 159)
ISO_8859-3:1988 (iso 109)

3.24.
ca Catalan

Based on script listed as Latin

Required characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
o!   00f2 LATIN SMALL LETTER O WITH GRAVE
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
l.   0140 LATIN SMALL LETTER L WITH MIDDLE DOT
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
O!   00d2 LATIN CAPITAL LETTER O WITH GRAVE
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
L.   013f LATIN CAPITAL LETTER L WITH MIDDLE DOT
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA

Important characters

n?   00f1 LATIN SMALL LETTER N WITH TILDE
N?   00d1 LATIN CAPITAL LETTER N WITH TILDE

Comments

Information from van Wingen and Otto Prytz.

Character sets covering the whole

JIS_X0212-1990 (iso 159)
NO SET (iso )
ISO_6937-2-add (iso 142)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
ANSI_X3.110-1983 (iso 99)
T.101-G2 (iso 128)
T.61-8bit (iso 103)

3.25. es Spanish

Based on script listed as Latin

Required characters

n?   00f1 LATIN SMALL LETTER N WITH TILDE
!I   00a1 INVERTED EXCLAMATION MARK
?I   00bf INVERTED QUESTION MARK
N?   00d1 LATIN CAPITAL LETTER N WITH TILDE
!I   00a1 INVERTED EXCLAMATION MARK
?I   00bf INVERTED QUESTION MARK

Important characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
-a   00aa FEMININE ORDINAL INDICATOR
-o   00ba MASCULINE ORDINAL INDICATOR
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA
-A   - name not known
-O   - name not known

Comments

Note that this language also uses special punctuation marks.
The accented vowels may be mandatory; Spanish speakers who think
they should be are encouraged to speak up.
Information from Otto Prytz.

Character sets covering the required characters only

NO SET (iso )
ES (iso 17)
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ES2 (iso 85)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
NC_NC00-10:81 (iso 151)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)

3.26. gl Galician

Based on script listed as Latin

Required characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
n?   00f1 LATIN SMALL LETTER N WITH TILDE
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
N?   00d1 LATIN CAPITAL LETTER N WITH TILDE

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.27. pt Portuguese

Based on script listed as Latin

Required characters

a?   00e3 LATIN SMALL LETTER A WITH TILDE
o?   00f5 LATIN SMALL LETTER O WITH TILDE
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
A?   00c3 LATIN CAPITAL LETTER A WITH TILDE
O?   00d5 LATIN CAPITAL LETTER O WITH TILDE
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA

Important characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
e>   00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
o!   00f2 LATIN SMALL LETTER O WITH GRAVE
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>   00ca LATIN CAPITAL LETTER E WITH CIRCUMFLEX
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
O!   00d2 LATIN CAPITAL LETTER O WITH GRAVE

Comments

Information from van Wingen and Otto Prytz.

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)

Character sets covering the required characters only

ISO_8859-9:1989 (iso 148)
PT (iso 16)
PT2 (iso 84)

3.28. eu Basque

Based on script listed as Latin

Required characters

n?   00f1 LATIN SMALL LETTER N WITH TILDE
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
N?   00d1 LATIN CAPITAL LETTER N WITH TILDE
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ES2 (iso 85)
ISO_8859-1:1987 (iso 100)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.29. mt Maltese

Based on script listed as Latin

Required characters

a!   00e0 LATIN SMALL LETTER A WITH GRAVE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
i!   00ec LATIN SMALL LETTER I WITH GRAVE
o!   00f2 LATIN SMALL LETTER O WITH GRAVE
u!   00f9 LATIN SMALL LETTER U WITH GRAVE
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
c.   010b LATIN SMALL LETTER C WITH DOT ABOVE
g.   0121 LATIN SMALL LETTER G WITH DOT ABOVE
h/   0127 LATIN SMALL LETTER H WITH STROKE
z.   017c LATIN SMALL LETTER Z WITH DOT ABOVE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
I!   00cc LATIN CAPITAL LETTER I WITH GRAVE
O!   00d2 LATIN CAPITAL LETTER O WITH GRAVE
U!   00d9 LATIN CAPITAL LETTER U WITH GRAVE
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
C.   010a LATIN CAPITAL LETTER C WITH DOT ABOVE
G.   0120 LATIN CAPITAL LETTER G WITH DOT ABOVE
H/   0126 LATIN CAPITAL LETTER H WITH STROKE
Z.   017b LATIN CAPITAL LETTER Z WITH DOT ABOVE

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.30. it Italian

Based on script listed as Latin

Required characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
i!   00ec LATIN SMALL LETTER I WITH GRAVE
o!   00f2 LATIN SMALL LETTER O WITH GRAVE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
I!   00cc LATIN CAPITAL LETTER I WITH GRAVE
O!   00d2 LATIN CAPITAL LETTER O WITH GRAVE

Important characters

i'   00ed LATIN SMALL LETTER I WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
u!   00f9 LATIN SMALL LETTER U WITH GRAVE
i:   00ef LATIN SMALL LETTER I WITH DIAERESIS
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
U!   00d9 LATIN CAPITAL LETTER U WITH GRAVE
I:   00cf LATIN CAPITAL LETTER I WITH DIAERESIS

Comments

The accented characters appear only in the lower case variant in
the Italian version of ISO 646 (ISO-IR-15).

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.31. rm Rhaeto-Romance

Based on script listed as Latin

Required characters

e'   00e9 LATIN SMALL LETTER E WITH ACUTE
a!   00e0 LATIN SMALL LETTER A WITH GRAVE
e!   00e8 LATIN SMALL LETTER E WITH GRAVE
o!   00f2 LATIN SMALL LETTER O WITH GRAVE
a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
e>   00ea LATIN SMALL LETTER E WITH CIRCUMFLEX
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
A!   00c0 LATIN CAPITAL LETTER A WITH GRAVE
E!   00c8 LATIN CAPITAL LETTER E WITH GRAVE
O!   00d2 LATIN CAPITAL LETTER O WITH GRAVE
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
E>   00ca LATIN CAPITAL LETTER E WITH CIRCUMFLEX
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS

Comments

In van Wingen's table, this appeared as "Rhaetian".

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.32. ro Romanian

Based on script listed as Latin

Required characters

a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
a(   0103 LATIN SMALL LETTER A WITH BREVE
s,   015f LATIN SMALL LETTER S WITH CEDILLA
t,   0163 LATIN SMALL LETTER T WITH CEDILLA
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
A(   0102 LATIN CAPITAL LETTER A WITH BREVE
S,   015e LATIN CAPITAL LETTER S WITH CEDILLA
T,   0162 LATIN CAPITAL LETTER T WITH CEDILLA

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.33.
hu Hungarian

Based on script listed as Latin

Required characters

a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
o"   0151 LATIN SMALL LETTER O WITH DOUBLE ACUTE
u"   0171 LATIN SMALL LETTER U WITH DOUBLE ACUTE
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
O"   0150 LATIN CAPITAL LETTER O WITH DOUBLE ACUTE
U"   0170 LATIN CAPITAL LETTER U WITH DOUBLE ACUTE

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.34. sq Albanian

Based on script listed as Latin

Required characters

e:   00eb LATIN SMALL LETTER E WITH DIAERESIS
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
E:   00cb LATIN CAPITAL LETTER E WITH DIAERESIS
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA

Character sets covering the whole

NO SET (iso )
CSA_Z243.4-1985-gr (iso 123)
ISO_6937-2-add (iso 142)
ISO_8859-1:1987 (iso 100)
CSN_369103 (iso 139)
ISO_8859-9:1989 (iso 148)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.35.
tr Turkish

Based on script listed as Latin

Required characters

a>   00e2 LATIN SMALL LETTER A WITH CIRCUMFLEX
i>   00ee LATIN SMALL LETTER I WITH CIRCUMFLEX
u>   00fb LATIN SMALL LETTER U WITH CIRCUMFLEX
o:   00f6 LATIN SMALL LETTER O WITH DIAERESIS
u:   00fc LATIN SMALL LETTER U WITH DIAERESIS
i.   0131 LATIN SMALL LETTER I WITH NO DOT
c,   00e7 LATIN SMALL LETTER C WITH CEDILLA
s,   015f LATIN SMALL LETTER S WITH CEDILLA
g(   011f LATIN SMALL LETTER G WITH BREVE
A>   00c2 LATIN CAPITAL LETTER A WITH CIRCUMFLEX
I>   00ce LATIN CAPITAL LETTER I WITH CIRCUMFLEX
U>   00db LATIN CAPITAL LETTER U WITH CIRCUMFLEX
O:   00d6 LATIN CAPITAL LETTER O WITH DIAERESIS
U:   00dc LATIN CAPITAL LETTER U WITH DIAERESIS
I.   0130 LATIN CAPITAL LETTER I WITH DOT ABOVE
C,   00c7 LATIN CAPITAL LETTER C WITH CEDILLA
S,   015e LATIN CAPITAL LETTER S WITH CEDILLA
G(   011e LATIN CAPITAL LETTER G WITH BREVE

Comments

The dotless i is converted to a normal (dotless) I in uppercase; the
dotted (normal) lowercase i is converted into a dotted uppercase I.

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
ISO_8859-9:1989 (iso 148)
iso-ir-90 (iso 90)
videotex-suppl (iso 70)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ANSI_X3.110-1983 (iso 99)
ISO_8859-3:1988 (iso 109)

3.36. hr Croatian

Based on script listed as Latin

Required characters

c'   0107 LATIN SMALL LETTER C WITH ACUTE
d/   0111 LATIN SMALL LETTER D WITH STROKE
c<   010d LATIN SMALL LETTER C WITH CARON
s<   0161 LATIN SMALL LETTER S WITH CARON
z<   017e LATIN SMALL LETTER Z WITH CARON
C'   0106 LATIN CAPITAL LETTER C WITH ACUTE
D/   0110 LATIN CAPITAL LETTER D WITH STROKE
C<   010c LATIN CAPITAL LETTER C WITH CARON
S<   0160 LATIN CAPITAL LETTER S WITH CARON
Z<   017d LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

JIS_X0212-1990 (iso 159)
NO SET (iso )
ISO_8859-2:1987 (iso 101)
JUS_I.B1.002 (iso 141)
CSN_369103 (iso 139)

3.37.
sl Slovenian

Based on script listed as Latin

Required characters

c<   010d LATIN SMALL LETTER C WITH CARON
s<   0161 LATIN SMALL LETTER S WITH CARON
z<   017e LATIN SMALL LETTER Z WITH CARON
C<   010c LATIN CAPITAL LETTER C WITH CARON
S<   0160 LATIN CAPITAL LETTER S WITH CARON
Z<   017d LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
ISO_8859-4:1988 (iso 110)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
JUS_I.B1.002 (iso 141)
latin6 (iso 157)
ANSI_X3.110-1983 (iso 99)

3.38. sk Slovak

Based on script listed as Latin

Required characters

y'   00fd LATIN SMALL LETTER Y WITH ACUTE
a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
a:   00e4 LATIN SMALL LETTER A WITH DIAERESIS
o>   00f4 LATIN SMALL LETTER O WITH CIRCUMFLEX
l'   013a LATIN SMALL LETTER L WITH ACUTE
r'   0155 LATIN SMALL LETTER R WITH ACUTE
c<   010d LATIN SMALL LETTER C WITH CARON
d<   010f LATIN SMALL LETTER D WITH CARON
l<   013e LATIN SMALL LETTER L WITH CARON
n<   0148 LATIN SMALL LETTER N WITH CARON
s<   0161 LATIN SMALL LETTER S WITH CARON
t<   0165 LATIN SMALL LETTER T WITH CARON
z<   017e LATIN SMALL LETTER Z WITH CARON
Y'   00dd LATIN CAPITAL LETTER Y WITH ACUTE
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
A:   00c4 LATIN CAPITAL LETTER A WITH DIAERESIS
O>   00d4 LATIN CAPITAL LETTER O WITH CIRCUMFLEX
L'   0139 LATIN CAPITAL LETTER L WITH ACUTE
R'   0154 LATIN CAPITAL LETTER R WITH ACUTE
C<   010c LATIN CAPITAL LETTER C WITH CARON
D<   010e LATIN CAPITAL LETTER D WITH CARON
L<   013d LATIN CAPITAL LETTER L WITH CARON
N<   0147 LATIN CAPITAL LETTER N WITH CARON
S<   0160 LATIN CAPITAL LETTER S WITH CARON
T<   0164 LATIN CAPITAL LETTER T WITH CARON
Z<   017d LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.39. cs Czech

Based on script listed as Latin

Required characters

y'   00fd LATIN SMALL LETTER Y WITH ACUTE
a'   00e1 LATIN SMALL LETTER A WITH ACUTE
e'   00e9 LATIN SMALL LETTER E WITH ACUTE
i'   00ed LATIN SMALL LETTER I WITH ACUTE
o'   00f3 LATIN SMALL LETTER O WITH ACUTE
u'   00fa LATIN SMALL LETTER U WITH ACUTE
e<   011b LATIN SMALL LETTER E WITH CARON
u0   016f LATIN SMALL LETTER U WITH RING ABOVE
c<   010d LATIN SMALL LETTER C WITH CARON
d<   010f LATIN SMALL LETTER D WITH CARON
n<   0148 LATIN SMALL LETTER N WITH CARON
r<   0159 LATIN SMALL LETTER R WITH CARON
s<   0161 LATIN SMALL LETTER S WITH CARON
t<   0165 LATIN SMALL LETTER T WITH CARON
z<   017e LATIN SMALL LETTER Z WITH CARON
Y'   00dd LATIN CAPITAL LETTER Y WITH ACUTE
A'   00c1 LATIN CAPITAL LETTER A WITH ACUTE
E'   00c9 LATIN CAPITAL LETTER E WITH ACUTE
I'   00cd LATIN CAPITAL LETTER I WITH ACUTE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
U'   00da LATIN CAPITAL LETTER U WITH ACUTE
E<   011a LATIN CAPITAL LETTER E WITH CARON
U0   016e LATIN CAPITAL LETTER U WITH RING ABOVE
C<   010c LATIN CAPITAL LETTER C WITH CARON
D<   010e LATIN CAPITAL LETTER D WITH CARON
N<   0147 LATIN CAPITAL LETTER N WITH CARON
R<   0158 LATIN CAPITAL LETTER R WITH CARON
S<   0160 LATIN CAPITAL LETTER S WITH CARON
T<   0164 LATIN CAPITAL LETTER T WITH CARON
Z<   017d LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.40. pl Polish

Based on script listed as Latin

Required characters

o'   00f3 LATIN SMALL LETTER O WITH ACUTE
a;   0105 LATIN SMALL LETTER A WITH OGONEK
e;   0119 LATIN SMALL LETTER E WITH OGONEK
c'   0107 LATIN SMALL LETTER C WITH ACUTE
n'   0144 LATIN SMALL LETTER N WITH ACUTE
s'   015b LATIN SMALL LETTER S WITH ACUTE
z'   017a LATIN SMALL LETTER Z WITH ACUTE
l/   0142 LATIN SMALL LETTER L WITH STROKE
z.   017c LATIN SMALL LETTER Z WITH DOT ABOVE
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
A;   0104 LATIN CAPITAL LETTER A WITH OGONEK
E;   0118 LATIN CAPITAL LETTER E WITH OGONEK
C'   0106 LATIN CAPITAL LETTER C WITH ACUTE
N'   0143 LATIN CAPITAL LETTER N WITH ACUTE
S'   015a LATIN CAPITAL LETTER S WITH ACUTE
Z'   0179 LATIN CAPITAL LETTER Z WITH ACUTE
L/   0141 LATIN CAPITAL LETTER L WITH STROKE
Z.   017b LATIN CAPITAL LETTER Z WITH DOT ABOVE

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.41. ??
Sorbian

Based on script listed as Latin

Required characters

o'   00f3 LATIN SMALL LETTER O WITH ACUTE
e<   011b LATIN SMALL LETTER E WITH CARON
c'   0107 LATIN SMALL LETTER C WITH ACUTE
n'   0144 LATIN SMALL LETTER N WITH ACUTE
s'   015b LATIN SMALL LETTER S WITH ACUTE
z'   017a LATIN SMALL LETTER Z WITH ACUTE
l/   0142 LATIN SMALL LETTER L WITH STROKE
c<   010d LATIN SMALL LETTER C WITH CARON
r<   0159 LATIN SMALL LETTER R WITH CARON
s<   0161 LATIN SMALL LETTER S WITH CARON
z<   017e LATIN SMALL LETTER Z WITH CARON
O'   00d3 LATIN CAPITAL LETTER O WITH ACUTE
E<   011a LATIN CAPITAL LETTER E WITH CARON
C'   0106 LATIN CAPITAL LETTER C WITH ACUTE
N'   0143 LATIN CAPITAL LETTER N WITH ACUTE
S'   015a LATIN CAPITAL LETTER S WITH ACUTE
Z'   0179 LATIN CAPITAL LETTER Z WITH ACUTE
L/   0141 LATIN CAPITAL LETTER L WITH STROKE
C<   010c LATIN CAPITAL LETTER C WITH CARON
R<   0158 LATIN CAPITAL LETTER R WITH CARON
S<   0160 LATIN CAPITAL LETTER S WITH CARON
Z<   017d LATIN CAPITAL LETTER Z WITH CARON

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
CSN_369103 (iso 139)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
ISO_8859-2:1987 (iso 101)
ANSI_X3.110-1983 (iso 99)

3.42.
eo Esperanto

Based on script listed as Latin

Required characters

u(   016d LATIN SMALL LETTER U WITH BREVE
c>   0109 LATIN SMALL LETTER C WITH CIRCUMFLEX
g>   011d LATIN SMALL LETTER G WITH CIRCUMFLEX
h>   0125 LATIN SMALL LETTER H WITH CIRCUMFLEX
j>   0135 LATIN SMALL LETTER J WITH CIRCUMFLEX
s>   015d LATIN SMALL LETTER S WITH CIRCUMFLEX
U(   016c LATIN CAPITAL LETTER U WITH BREVE
C>   0108 LATIN CAPITAL LETTER C WITH CIRCUMFLEX
G>   011c LATIN CAPITAL LETTER G WITH CIRCUMFLEX
H>   0124 LATIN CAPITAL LETTER H WITH CIRCUMFLEX
J>   0134 LATIN CAPITAL LETTER J WITH CIRCUMFLEX
S>   015c LATIN CAPITAL LETTER S WITH CIRCUMFLEX

Character sets covering the whole

NO SET (iso )
ISO_6937-2-add (iso 142)
videotex-suppl (iso 70)
iso-ir-90 (iso 90)
T.101-G2 (iso 128)
T.61-8bit (iso 103)
JIS_X0212-1990 (iso 159)
ANSI_X3.110-1983 (iso 99)
ISO_8859-supp (iso 154)
ISO_8859-3:1988 (iso 109)

3.43. ?? Serbian

Based on script listed as Cyrillic

Required characters

j%   0458 CYRILLIC SMALL LETTER JE
J%   0408 CYRILLIC CAPITAL LETTER JE
nj   045a CYRILLIC SMALL LETTER NJE
NJ   040a CYRILLIC CAPITAL LETTER NJE
dz   045f CYRILLIC SMALL LETTER DZHE
DZ   040f CYRILLIC CAPITAL LETTER DZHE
lj   0459 CYRILLIC SMALL LETTER LJE
LJ   0409 CYRILLIC CAPITAL LETTER LJE
ts   045b CYRILLIC SMALL LETTER TSHE (Serbocroatian)
Ts   040b CYRILLIC CAPITAL LETTER TSHE (Serbocroatian)
d%   0452 CYRILLIC SMALL LETTER DJE (Serbocroatian)
D%   0402 CYRILLIC CAPITAL LETTER DJE (Serbocroatian)

Character sets covering the whole

NO SET (iso )
JIS_X0212-1990 (iso 159)
ISO_5427:1981 (iso 54)
ECMA-cyrillic (iso 111)
JUS_I.B1.003-serb (iso 146)
ISO_8859-5:1988 (iso 144)

3.44.
mk Macedonian

Based on script listed as Cyrillic

Required characters

j%   0458 CYRILLIC SMALL LETTER JE
J%   0408 CYRILLIC CAPITAL LETTER JE
nj   045a CYRILLIC SMALL LETTER NJE
NJ   040a CYRILLIC CAPITAL LETTER NJE
dz   045f CYRILLIC SMALL LETTER DZHE
DZ   040f CYRILLIC CAPITAL LETTER DZHE
lj   0459 CYRILLIC SMALL LETTER LJE
LJ   0409 CYRILLIC CAPITAL LETTER LJE
g%   0453 CYRILLIC SMALL LETTER GJE (Macedonian)
G%   0403 CYRILLIC CAPITAL LETTER GJE (Macedonian)
kj   045c CYRILLIC SMALL LETTER KJE (Macedonian)
KJ   040c CYRILLIC CAPITAL LETTER KJE (Macedonian)
ds   0455 CYRILLIC SMALL LETTER DZE (Macedonian)
DS   0405 CYRILLIC CAPITAL LETTER DZE (Macedonian)

Character sets covering the whole

NO SET (iso )
JIS_X0212-1990 (iso 159)
ISO_5427:1981 (iso 54)
ECMA-cyrillic (iso 111)
JUS_I.B1.003-mac (iso 147)
ISO_8859-5:1988 (iso 144)

3.45. bg Bulgarian

Based on script listed as Cyrillic

Required characters

j=   0439 CYRILLIC SMALL LETTER SHORT I
J=   0419 CYRILLIC CAPITAL LETTER SHORT I
%'   044c CYRILLIC SMALL SOFT SIGN
%"   042c CYRILLIC CAPITAL SOFT SIGN
sc   0449 CYRILLIC SMALL LETTER SHCHA
Sc   0429 CYRILLIC CAPITAL LETTER SHCHA
ju   044e CYRILLIC SMALL LETTER YU
JU   042e CYRILLIC CAPITAL LETTER YU
ja   044f CYRILLIC SMALL LETTER YA
JA   042f CYRILLIC CAPITAL LETTER YA
='   044a CYRILLIC SMALL HARD SIGN
="   042a CYRILLIC CAPITAL HARD SIGN

Character sets covering the whole

NO SET (iso )
JIS_C6226-1983 (iso 87)
JIS_C6226-1978 (iso 42)
GOST_19768-74 (iso 153)
GB_2312-80 (iso 58)
KS_C_5601-1987 (iso 149)
ECMA-cyrillic (iso 111)
INIS-cyrillic (iso 51)
ISO_8859-5:1988 (iso 144)

3.46.
ru Russian Based on script listed as Cyrillic Required characters y= 044b CYRILLIC SMALL LETTER YERU Y= 042b CYRILLIC CAPITAL LETTER YERU je 044d CYRILLIC SMALL LETTER E JE 042d CYRILLIC CAPITAL LETTER E io 0451 CYRILLIC SMALL LETTER IO IO 0401 CYRILLIC CAPITAL LETTER IO Character sets covering the whole NO SET (iso ) JIS_C6226-1983 (iso 87) JIS_C6226-1978 (iso 42) GOST_19768-74 (iso 153) GB_2312-80 (iso 58) Alvestrand Expires Feb 95 [Page 51] draft Languages and character sets Aug 95 KS_C_5601-1987 (iso 149) ECMA-cyrillic (iso 111) ISO_8859-5:1988 (iso 144) 3.47. be Byelorussian Based on script listed as Cyrillic Required characters ii 0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I II 0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I v% 045e CYRILLIC SMALL LETTER SHORT U (Byelorussian) V% 040e CYRILLIC CAPITAL LETTER SHORT U (Byelorussian) Character sets covering the whole NO SET (iso ) JIS_X0212-1990 (iso 159) ISO_5427:1981 (iso 54) ECMA-cyrillic (iso 111) ISO_8859-5:1988 (iso 144) 3.48. uk Ukrainian Based on script listed as Cyrillic Required characters ii 0456 CYRILLIC SMALL LETTER BYELORUSSIAN-UKRAINIAN I II 0406 CYRILLIC CAPITAL LETTER BYELORUSSIAN-UKRAINIAN I ie 0454 CYRILLIC SMALL LETTER UKRAINIAN IE IE 0404 CYRILLIC CAPITAL LETTER UKRAINIAN IE yi 0457 CYRILLIC SMALL LETTER YI (Ukrainian) YI 0407 CYRILLIC CAPITAL LETTER YI (Ukrainian) g3 0491 CYRILLIC SMALL LETTER GE WITH UPTURN G3 0490 CYRILLIC CAPITAL LETTER GE WITH UPTURN Character sets covering the whole NO SET (iso ) ISO_5427:1981 (iso 54) Alvestrand Expires Feb 95 [Page 52] draft Languages and character sets Aug 95 4. Other languages with appropriate character sets Other languages for which appropriate character sets are known are listed in the table below. 

   Language              Character set
   ar Arabic             ISO-8859-6
   be Byelorussian       ISO-8859-5
   bg Bulgarian          ISO-8859-5
   el Greek              ISO-8859-7
   en English            USASCII
   fa Persian            ISO-8859-6
   iw Hebrew             ISO-8859-8
   ja Japanese           ISO-IR-87 (Japanese JIS C6226-1983)
   ko Korean             ISO-IR-149 (Korean KS C 5601-1989)
   la Latin              USASCII
   lo Laotian            ISO-IR-166
   ru Russian            ISO-8859-5
   sw Swahili            USASCII
   th Thai               ISO-IR-166
   uk Ukrainian          ISO-8859-5
   ur Urdu               ISO-8859-6
   vo Volapuk            ISO-8859-1
   zh Chinese            ISO-IR-58 (Chinese GB 2312-80)

Additional entries in this table are welcome!

4.1. ISO 10646 only languages

The following languages can (to the author's limited knowledge) be
written with the current ISO 10646 standard, but with no other
registered character sets:

   Language              Country(ies)                     Script(s)
   aa Afar               Somalia, Ethiopia, Djibouti      Latin
   ab Abkhazian          Georgia                          Cyrillic
   am Amharic            Ethiopia                         Ethiopic
   as Assamese           India, Nepal                     Bengali
   ay Aymara             Bolivia, Peru, Chile             Latin
   az Azerbaijani        SNC, Iran, Iraq, Turkey          Cyrillic, Arabic
   ba Bashkir            SNC                              Cyrillic
   bh Bihari             India                            Gujarati (or Kaithi)
   bi Bislama            Vanuatu, New Caledonia           Latin
   bn Bengali            India                            Bengali
   co Corsican           France                           Latin
   fj Fiji               Fiji                             Latin
   gd Scots              UK                               Latin
   gn Guarani            Paraguay                         Latin
   gu Gujarati           India                            Gujarati
   ha Hausa              Nigeria, Niger, Chad, Sudan,...  Latin
   hi Hindi              India                            Devanagari
   hy Armenian           Armenia                          Armenian
   ia Interlingua        None (Artificial Language)       Latin
   ie Interlingue        None (Artificial Language)       Latin
   ik Inupiak            USA, Canada                      Latin, Cree
   in Indonesian         Indonesia                        Latin
   ji Yiddish            Germany, USA, SNC, Israel        Hebrew
   jw Javanese           Indonesia, Malaysia              Latin, Javanese
   ka Georgian           Georgia                          Georgian
   kk Kazakh             SNC, Afghanistan                 Cyrillic, Arabic
   km Cambodian          Cambodia                         Khmer
   kn Kannada            India                            Kannada
   ks Kashmiri           India, Pakistan                  Arabic
   ku Kurdish            SNC, Turkey, Iraq, Iran          Cyrillic, Arabic
   ky Kirghiz            SNC, China, Afghanistan          Cyrillic, Arabic
   ln Lingala            CAR, Congo, Zaire                Latin
   mg Malagasy           Madagascar, Comoro Islands       Latin, Arabic
   mi Maori              New Zealand                      Latin
   mk Macedonian         Greece, Yugoslavia               Greek, Cyrillic
   ml Malayalam          India                            Malayalam
   mn Mongolian          Mongolia                         Cyrillic, Mongolian
   mo Moldavian          Romania                          Latin
   mr Marathi            India                            Devanagari
   ms Malay              Malaysia, Thailand               Latin
   my Burmese            Myanmar                          Burmese
   na Nauru              Nauru                            Latin
   ne Nepali             Nepal                            Devanagari
   oc Occitan            France                           Latin
   or Oriya              India                            Oriya
   pa Punjabi            India                            Gurmukhi
   ps Pashto (Western)   Afghanistan, Iran                Arabic
   qu Quechua            Peru                             Latin
   rm Rhaeto             Switzerland                      Latin
   rn Kirundi            Burundi, Uganda                  Latin
   rw Kinyarwanda        Rwanda, Uganda, Zaire            Latin
   sa Sanskrit           India                            Devanagari
   sd Sindhi             Pakistan, India, Afghanistan     Arabic, Gurmukhi
   sg Sangho             Central African Republic         Latin
   si Singhalese         Sri Lanka                        Sinhalese
   sm Samoan             Samoa, USA, New Zealand          Latin
   sn Shona              Zimbabwe, Zambia, Mozambique     Latin
   so Somali             Somalia, Ethiopia, Djibouti      Latin
   sr Serbian            former Yugoslavia                Cyrillic
   ss Siswati            S. Africa, Swaziland             Latin
   st Sesotho            S. Africa, Lesotho               Latin
   su Sudanese           Sudan                            Latin
   ta Tamil              India, Malaysia                  Tamil
   te Telugu             India                            Telugu
   tg Tajik              Tajikistan                       Arabic
   ti Tigrinya           Ethiopia                         Latin, Ethiopic
   tk Turkmen            SNC, Iran, Afghanistan           Cyrillic, Arabic
   tl Tagalog            Philippines                      Latin
   tn Setswana           S. Africa, Botswana, Namibia     Latin
   to Tonga (3)          Mozambique                       Latin
   ts Tsonga             Mozambique, Swaziland            Latin
   tt Tatar              SNC                              Cyrillic
   tw Twi (Ewe)          Ghana                            Latin
   uz Uzbek (Southern)   Afghanistan, Turkey              Arabic
   vi Vietnamese         Vietnam, Cambodia, China         Latin
   wo Wolof              Senegal, Mauritania              Latin
   xh Xhosa              S. Africa                        Latin
   yo Yoruba             Nigeria, Togo, Benin             Latin
   zu Zulu               S. Africa, Lesotho, Malawi       Latin

The information about languages in ISO 10646 was kindly supplied by
Glenn Adams.

Languages for which the author does NOT know any proper character set
include:

   bo Tibetan
   br Breton
   cs Czech
   da Danish
   de German
   dz Bhutani
   eo Esperanto
   es Spanish
   et Estonian
   eu Basque
   fi Finnish
   fo Faeroese
   fy Frisian
   ga Irish
   gl Galician
   hr Croatian
   hu Hungarian
   is Icelandic
   it Italian
   lt Lithuanian
   lv Latvian, Lettish
   mt Maltese
   no Norwegian
   pl Polish
   pt Portuguese
   ro Romanian
   sh Serbo-Croatian
   sk Slovak
   sl Slovenian
   sq Albanian
   sv Swedish
   tr Turkish

5. Encoded format of charset data

This section contains, in a very compact format, all the information
used to make the technical content of this RFC, apart from the content
of ISO 639 and RFC 1345. It would be helpful if new information were
also supplied in this format.

# A list of languages and their required/optional characters.
# Format:
# language: Name
# Base: family
# Required: characters
# Important: characters
# Comments: (several lines)
# A blank line separates descriptions
#
# LANGUAGES USED AS BASIS FOR OTHER LANGUAGES
#============================================

Language: Latin
Required: a b c d e f g h i j k l m n o p q r s t u v w x y z
          A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

Language: Cyrillic
Required: a= A= b= B= v= V= g= G= d= D= e= E= z% Z% z= Z= i= I=
          k= K= l= L= m= M= n= N= o= O= p= P= r= R= s= S= t= T=
          u= U= f= F= h= H= c= C= c% C% s% S%

#=========================================================
# LANGUAGES AND THEIR SPECIAL REQUIREMENTS
#=========================================================

Language: English
Base: Latin

Language: Lithuanian
Base: Latin
Required: a; e; i; u; e. u- c< s< z<
          A; E; I; U; E. U- C< S< Z<

Language: Latvian
Base: Latin
Required: a- e- i- o- u- g, k, l, n, r, c< s< z<
          A- E- I- O- U- G, K, L, N, R, C< S< Z<

Language: Estonian
Base: Latin
Required: o? a: o: u: s< z<
          O? A: O: U: S< Z<

Language: Finnish
Base: Latin
Required: a: o: A: O:

Language: Sami
Base: Latin
Required: a' d/ ng t/ c< s< z<
          A' D/ NG T/ C< S< Z<
Important: e' a> a: e: i: o: u: ae aa o/ n'
           E' A> A: E: I: O: U: AE AA O/ N'
Comments: Information from Otto Prytz
          This information is for the current Norwegian North Sami
          orthography of 1979.
          The letters aa, ae and o/ are in use for Norwegian/Swedish
          names, but not for Sami proper.
          a> and n' are no longer used.
          There is some doubt about whether e: and i: were ever used,
          but they are listed by van Wingen.
          Information from regnorj@powertech.no (Regnor Jernsletten):
          a', c< and s< only occur at the beginning of words.
          d/, ng, t/ and z< occur only within words.
          a' and n' are used in Lule Sami, together with ae (in Norway)
          or a: (in Sweden).
          In South Sami, i: is used, together with ae and o/ (in Norway)
          or a: and o: (in Sweden).
          a> is used in Skolte Sami, together with g<, k< and o~.
          Skolte Sami also uses z and z<, but the Z is written much like
          the number 3. Also, the letter "stungen g" (ISO code unknown)
          is used.

Language: Swedish
Base: Latin
Required: a: o: aa A: O: AA
Important: a' e' e: u: A' E' E: U:

Language: Norwegian
Base: Latin
Required: ae aa o/ AE AA O/
Important: e' o' o> a! u: a< e` o`
           E' O' O> A! U: A< E` O`
Comments: Information from Johan van Wingen and Otto Prytz.
          The characters e` and o` are used in the "Nynorsk" sublanguage
          (information from Knut S. Vikør)

Language: Danish
Base: Latin
Required: ae aa o/ AE AA O/
Important: a' e' i' o' u' y' A' E' I' O' U' Y'

Language: Faeroese
Base: Latin
Required: a' i' o' u' y' ae o/ d-
          A' I' O' U' Y' AE O/ D-

Language: Icelandic
Base: Latin
Required: a' e' i' o' u' y' o: ae d- th
          A' E' I' O' U' Y' O: AE D- TH

Language: Greenlandic
Base: Latin
Required: a' e' i' u' a> e> i> o> u> ae aa o/ a? i? u? kk
          A' E' I' U' A> E> I> O> U> AE AA O/ A? I? U? KK

Language: Gaelic
Base: Latin
Required: a' e' o' a! e! i! o! u!
          A' E' O' A! E! I! O! U!

Language: Irish
Base: Latin
Required: a' e' i' o' u' A' E' I' O' U'

Language: Welsh
Base: Latin
Required: w' y' a' e' i' o' u' a! e! i! o! u! w! y!
          a> e> i> o> u> w> y> a: e: i: o: u: w: y:
          W' Y' A' E' I' O' U' A! E! I! O! U! W! Y!
          A> E> I> O> U> W> Y> A: E: I: O: U: W: Y:

Language: Breton
Base: Latin
Required: e> u! u: n? E> U! U: N?

Language: Frisian
Base: Latin
Required: e' u' a> e> o> u> a: e: i: o: u:
          E' U' A> E> O> U> A: E: I: O: U:

Language: Dutch
Base: Latin
Required: a' e' i' o' u' a: e: i: o: u: ij
          A' E' I' O' U' A: E: I: O: U: IJ

Language: Afrikaans
Base: Latin
Required: a' e' e! a> e> i> o> u> e: i: o: 'n
          A' E' E! A> E> I> O> U> E: I: O: 'N

Language: German
Base: Latin
Required: a: o: u: ss A: O: U: SS
Important: e' a! E' A!
Comments: The "ss" character exists only in lower case; the upper
          case equivalent is "SS" (2 letters).

Language: French
Base: Latin
Required: e' e! u! c, a! E' E! U! C, A!
Important: a> e> i> o> u> ae oe e: i: u: y:
           A> E> I> O> U> AE OE E: I: U: Y:
Comments: ae and y: are very uncommon in current French; there have
          been arguments that all of the others should be "required".

Language: Catalan
Base: Latin
Required: e' i' o' u' a! e! o! i: u: l. c,
          E' I' O' U' A! E! O! I: U: L. C,
Important: n? N?
Comments: Information from van Wingen and Otto Prytz.

Language: Spanish
Base: Latin
Required: n? !I ?I N? !I ?I
Important: a' e' i' o' u' u: c, -a -o
           A' E' I' O' U' U: C, -A -O
Comments: Note that this language also uses special punctuation marks.
          The accented vowels may be mandatory; Spanish speakers who
          think they should be are encouraged to speak up.
          Information from Otto Prytz

Language: Galician
Base: Latin
Required: a' e' i' o' u' u: n?
          A' E' I' O' U' U: N?

Language: Portuguese
Base: Latin
Required: a? o? c, A? O? C,
Important: a' e' i' o' u' a! a> e> o> u: o!
           A' E' I' O' U' A! A> E> O> U: O!
Comments: Information from van Wingen and Otto Prytz

Language: Basque
Base: Latin
Required: n? c, N? C,

Language: Maltese
Base: Latin
Required: a! e! i! o! u! i> c. g. h/ z.
          A! E! I! O! U! I> C. G. H/ Z.

Language: Italian
Base: Latin
Required: e' o' a! e! i! o! E' O' A! E! I! O!
Important: i' u' u! i: I' U' U! I:
Comments: The accented characters appear only in the lower case
          variant in the Italian version of ISO 646 (ISO-IR-15).

Language: Rhaeto-Romance
Base: Latin
Required: e' a! e! o! a> e> i> o> o: u:
          E' A! E! O! A> E> I> O> O: U:
Comments: In van Wingen's table, this appeared as "Rhaetian".

Language: Romanian
Base: Latin
Required: a> i> a( s, t, A> I> A( S, T,

Language: Hungarian
Base: Latin
Required: a' e' i' o' u' o: u: o" u"
          A' E' I' O' U' O: U: O" U"

Language: Albanian
Base: Latin
Required: e: c, E: C,

Language: Turkish
Base: Latin
Required: a> i> u> o: u: i. c, s, g(
          A> I> U> O: U: I. C, S, G(
Comments: The dotless i is converted to a normal (dotless) I in
          uppercase; the dotted (normal) lowercase i is converted
          into a dotted uppercase I.

Language: Croatian
Base: Latin
Required: c' d/ c< s< z< C' D/ C< S< Z<

Language: Slovenian
Base: Latin
Required: c< s< z< C< S< Z<

Language: Slovak
Base: Latin
Required: y' a' e' i' o' u' a: o> l' r' c< d< l< n< s< t< z<
          Y' A' E' I' O' U' A: O> L' R' C< D< L< N< S< T< Z<

Language: Czech
Base: Latin
Required: y' a' e' i' o' u' e< u0 c< d< n< r< s< t< z<
          Y' A' E' I' O' U' E< U0 C< D< N< R< S< T< Z<

Language: Polish
Base: Latin
Required: o' a; e; c' n' s' z' l/ z.
          O' A; E; C' N' S' Z' L/ Z.

Language: Sorbian
Base: Latin
Required: o' e< c' n' s' z' l/ c< r< s< z<
          O' E< C' N' S' Z' L/ C< R< S< Z<

Language: Esperanto
Base: Latin
Required: u( c> g> h> j> s> U( C> G> H> J> S>

Language: Serbian
Base: Cyrillic
Required: j% J% nj NJ dz DZ lj LJ ts Ts d% D%

Language: Macedonian
Base: Cyrillic
Required: j% J% nj NJ dz DZ lj LJ g% G% kj KJ ds DS

Language: Bulgarian
Base: Cyrillic
Required: j= J= %' %" sc Sc ju JU ja JA =' ="

Language: Russian
Base: Cyrillic
Required: j= J= %' %" sc Sc ju JU ja JA =' ="
Required: y= Y= je JE io IO

Language: Byelorussian
Base: Cyrillic
Required: ii II v% V%

Language: Ukrainian
Base: Cyrillic
Required: ii II ie IE yi YI g3 G3

6. REFERENCES

   [ISO 8859]   Information technology - 8-bit single-byte coded
                graphic character sets

   [ISO 6937]   Information processing - Coded graphic character set
                for text communication

   [ISO 639]    Codes for identifying languages (1988 version)

   [ISO 10646]  Information technology - Universal Multiple-Octet
                Coded Character Set

   [RFC-KELD]   Keld Simonsen: Character Mnemonics & Character Sets,
                RFC 1345, June 1992

Table of Contents

   Abstract
   Status of this Memo
   1 Introduction
   2 Introduction to language tables
   2.1 Table structure
   2.2 Sources utilized
   2.3 What accents mean
   3 Language tables
   3.1 la Latin
   3.2 ?? Cyrillic
   3.3 en English
   3.4 lt Lithuanian
   3.5 lv Latvian
   3.6 et Estonian
   3.7 fi Finnish
   3.8 ?? Sami
   3.9 sv Swedish
   3.10 no Norwegian
   3.11 da Danish
   3.12 fo Faeroese
   3.13 is Icelandic
   3.14 kl Greenlandic
   3.15 ?? Gaelic
   3.16 ga Irish
   3.17 cy Welsh
   3.18 br Breton
   3.19 fy Frisian
   3.20 nl Dutch
   3.21 af Afrikaans
   3.22 de German
   3.23 fr French
   3.24 ca Catalan
   3.25 es Spanish
   3.26 gl Galician
   3.27 pt Portuguese
   3.28 eu Basque
   3.29 mt Maltese
   3.30 it Italian
   3.31 rm Rhaeto-Romance
   3.32 ro Romanian
   3.33 hu Hungarian
   3.34 sq Albanian
   3.35 tr Turkish
   3.36 hr Croatian
   3.37 sl Slovenian
   3.38 sk Slovak
   3.39 cs Czech
   3.40 pl Polish
   3.41 ?? Sorbian
   3.42 eo Esperanto
   3.43 ?? Serbian
   3.44 mk Macedonian
   3.45 bg Bulgarian
   3.46 ru Russian
   3.47 be Byelorussian
   3.48 uk Ukrainian
   4 Other languages with appropriate character sets
   4.1 ISO 10646 only languages
   5 Encoded format of charset data
   6 REFERENCES
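
Illustrative note on the encoded format of section 5

The record format of section 5 is simple enough to read mechanically.
The following is only a minimal sketch in Python, not part of the
author's data or tooling. The function name parse_language_data, the
dictionary layout and the line breaks in the sample are assumptions
made for illustration. It assumes that a line not starting with a
known field name continues the most recent Required, Important or
Comments field, and that a repeated Required line (as in the Russian
entry) extends the list rather than replacing it.

```python
FIELDS = ("language", "base", "required", "important", "comments")


def parse_language_data(text):
    """Parse 'Language:/Base:/Required:/Important:/Comments:' records.

    Returns a list of dicts; mnemonics are kept as opaque tokens
    (they are RFC 1345 style mnemonics, not decoded here).
    """
    records = []
    current = None    # record being built, or None between records
    last_key = None   # field that a continuation line would extend
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            continue                      # comment line
        if not line:                      # blank line ends a record
            if current is not None:
                records.append(current)
            current, last_key = None, None
            continue
        head, sep, rest = line.partition(":")
        key = head.strip().lower()
        if sep and key in FIELDS:
            value, last_key = rest.strip(), key
        elif last_key in ("required", "important", "comments"):
            key, value = last_key, line   # continuation line
        else:
            continue                      # stray text, ignored
        if key == "language":
            if current is not None:       # tolerate a missing blank line
                records.append(current)
            current = {"language": value, "base": None,
                       "required": [], "important": [], "comments": []}
        elif current is None:
            continue                      # field outside any record
        elif key == "base":
            current["base"] = value
        elif key == "comments":
            current["comments"].append(value)
        else:                             # required / important
            current[key].extend(value.split())
    if current is not None:
        records.append(current)
    return records


# Sample taken from the Latin and Swedish entries; the line breaks
# and the shortened Latin list are assumptions for this sketch.
SAMPLE = """\
# LANGUAGES USED AS BASIS FOR OTHER LANGUAGES
Language: Latin
Required: a b c

Language: Swedish
Base: Latin
Required: a: o: aa
          A: O: AA
Important: a' e' e: u: A' E' E: U:
"""

records = parse_language_data(SAMPLE)
for rec in records:
    print(rec["language"], rec["base"], rec["required"])
```

Keeping the mnemonics as plain tokens avoids guessing at a mapping to
coded characters; a real tool would look each token up in the RFC 1345
tables.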