Apache OpenOffice (AOO) Bugzilla – Issue 6659
Add 'Thai (TIS-620)' item to the encoding menu
Last modified: 2002-12-09 16:23:28 UTC
To be able to open and save plain 8-bit TIS-620 text files.

diff -urdP -X srcdiff.txt ../oo_1.0_src.orig/./svx/source/dialog/txenctab.src ./svx/source/dialog/txenctab.src
--- ../oo_1.0_src.orig/./svx/source/dialog/txenctab.src Tue Feb 26 17:23:56 2002
+++ ./svx/source/dialog/txenctab.src Sun Jul 28 20:33:44 2002
@@ -102,7 +102,7 @@
 // < "Arabisch (DOS/OS2-864)" ; RTL_TEXTENCODING_IBM_864 ; > ;
  < "Kyrillisch (DOS/OS2-866/Russisch)" ; RTL_TEXTENCODING_IBM_866 ; > ;
  < "Griechisch (DOS/OS2-869/Modern)" ; RTL_TEXTENCODING_IBM_869 ; > ;
-// < "Thai (Dos/Windows-874)" ; RTL_TEXTENCODING_MS_874 ; > ;
+ < "Thai (TIS-620)" ; RTL_TEXTENCODING_MS_874 ; > ;
  < "Osteuropa (Windows-1250/WinLatin 2)" ; RTL_TEXTENCODING_MS_1250 ; > ;
  < "Kyrillisch (Windows-1251)" ; RTL_TEXTENCODING_MS_1251 ; > ;
  < "Griechisch (Windows-1253)" ; RTL_TEXTENCODING_MS_1253 ; > ;
@@ -190,7 +190,7 @@
 // < "Arabic (DOS/OS2-864)" ; RTL_TEXTENCODING_IBM_864 ; > ;
  < "Cyrillic (DOS/OS2-866/Russian)" ; RTL_TEXTENCODING_IBM_866 ; > ;
  < "Greek (DOS/OS2-869/Modern)" ; RTL_TEXTENCODING_IBM_869 ; > ;
-// < "Thai (Dos/Windows-874)" ; RTL_TEXTENCODING_MS_874 ; > ;
+ < "Thai (TIS-620)" ; RTL_TEXTENCODING_MS_874 ; > ;
  < "Central European (Windows-1250/WinLatin 2)" ; RTL_TEXTENCODING_MS_1250 ; > ;
  < "Cyrillic (Windows-1251)" ; RTL_TEXTENCODING_MS_1251 ; > ;
  < "Greek (Windows-1253)" ; RTL_TEXTENCODING_MS_1253 ; > ;
@@ -274,6 +274,7 @@
  < "Türkisch (DOS/OS2-857)" ; RTL_TEXTENCODING_IBM_857 ; > ;
  < "Kyrillisch (DOS/OS2-866/Russisch)" ; RTL_TEXTENCODING_IBM_866 ; > ;
  < "Griechisch (DOS/OS2-869/Modern)" ; RTL_TEXTENCODING_IBM_869 ; > ;
+ < "Thai (TIS-620)" ; RTL_TEXTENCODING_MS_874 ; > ;
  < "Osteuropa (Windows-1250/WinLatin 2)" ; RTL_TEXTENCODING_MS_1250 ; > ;
  < "Kyrillisch (Windows-1251)" ; RTL_TEXTENCODING_MS_1251 ; > ;
  < "Griechisch (Windows-1253)" ; RTL_TEXTENCODING_MS_1253 ; > ;
@@ -344,6 +345,7 @@
  < "Turkish (DOS/OS2-857)" ; RTL_TEXTENCODING_IBM_857 ; > ;
  < "Cyrillic (DOS/OS2-866/Russian)" ; RTL_TEXTENCODING_IBM_866 ; > ;
  < "Greek (DOS/OS2-869/Modern)" ; RTL_TEXTENCODING_IBM_869 ; > ;
+ < "Thai (TIS-620)" ; RTL_TEXTENCODING_MS_874 ; > ;
  < "Eastern Europe (Windows-1250/WinLatin 2)" ; RTL_TEXTENCODING_MS_1250 ; > ;
  < "Cyrillic (Windows-1251)" ; RTL_TEXTENCODING_MS_1251 ; > ;
  < "Greek (Windows-1253)" ; RTL_TEXTENCODING_MS_1253 ; > ;
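Every line this patch touches is an ItemList entry of the form < "label" ; RTL_TEXTENCODING_... ; > ;, where a leading // disables the entry. To audit such a file quickly, here is a minimal Python sketch. The helper names parse_entries and mislabeled_thai are hypothetical, and the regex assumes one entry per line written exactly as in the diff above. It lists the label/encoding pairs and flags active entries whose label says TIS-620 while the value is still RTL_TEXTENCODING_MS_874, which is the mismatch objected to in the reply that follows.

```python
import re

# One ItemList entry per line, as in the diff above, e.g.
#   < "Thai (TIS-620)" ; RTL_TEXTENCODING_MS_874 ; > ;
# An optional leading "//" marks the entry as commented out.
ENTRY_RE = re.compile(
    r'^\s*(?P<comment>//)?\s*<\s*"(?P<label>[^"]*)"\s*;'
    r'\s*(?P<enc>RTL_TEXTENCODING_\w+)\s*;\s*>\s*;'
)

def parse_entries(text):
    """Return (label, encoding, is_active) for every entry line in `text`."""
    entries = []
    for line in text.splitlines():
        m = ENTRY_RE.match(line)
        if m:
            entries.append((m.group("label"), m.group("enc"), m.group("comment") is None))
    return entries

def mislabeled_thai(entries):
    """Active entries labeled TIS-620 that still select the Windows-874 converter."""
    return [e for e in entries
            if e[2] and "TIS-620" in e[0] and e[1] == "RTL_TEXTENCODING_MS_874"]
```

For example, running parse_entries over the "+" and "-" lines of the first hunk yields one inactive "Thai (Dos/Windows-874)" entry and one active "Thai (TIS-620)" entry, and mislabeled_thai reports the latter.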
Eicke, am I right that you are the actual owner of this code?
@Oliver: Not really. I was just the only one who dared to touch that file manually, and as long as the localization tools can't handle StringArray ItemList resources properly, I refuse to be its owner, though I'll make all the necessary changes for now.

Regarding the suggested patch:
1. It's quite pointless to paste a diff into this comment field, as line wraps may be added or removed at any time, so the changes aren't really usable without reformatting them manually. Diffs should be attached to an issue instead.
2. The patch is wrong. AFAIK Windows-874 is not the same as TIS-620. See the current SRX643 revision of the file, which introduces an additional RTL_TEXTENCODING_TIS_620. Of course, there are more places than just this resource file where the new encoding would have to be added.

I therefore close this issue as invalid.

@Oliver: see also SO internal tracker IDs #98208# and #99587#
@Eike: Sorry for misspelling your name :-( I wonder whether it is possible, and desirable, to backport the changes in 643 to OOO_STABLE_1. I am sure there will be a 1.02 at some point, in which these changes might be useful. What do you think?
I'm not sure it would be useful. We don't have any Thai (CTL) specific features on the OOO_STABLE_1 branch, so I wonder whether it would help at all to be able to read/write a TIS-encoded document without being able to handle its content. We'll definitely not backport the SRX643 CTL features to the OOO_STABLE_1 branch. @Stephan: do you see any benefit in backporting only the encoding? @Oliver: don't worry about the (mis)spelling of my name, I'm used to people not getting it right ;-)
Thanks for adding RTL_TEXTENCODING_TIS_620 :-). I hate using WINDOWS-874 in place of TIS-620 because they're not actually the same. TIS-620 is the standard encoding for information interchange in Thai. Windows-874 is a font encoding (another is Mac-Thai) which extends TIS-620 in TIS-620's private-use area to include the same set of characters that Windows-1252 adds to ISO-8859-1. Windows-874 is meant to be used internally on Windows machines, but it should not be used as the encoding for plain-text files (which carry no out-of-band charset information) or for Internet documents; otherwise people on other OSes (e.g. Mac/Linux/Solaris) will see something different from what the authors intended. Also, Windows-874 (unlike Windows-1252 and TIS-620) is not registered as a valid MIME charset. So I suggest removing the 'Thai (Windows-874)' entries from the menu.
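To make the difference concrete, here is a minimal, table-free Python sketch of both decoders. It relies on two facts: TIS-620's Thai letters map linearly onto U+0E01-U+0E5B (byte + 0x0D60), and Windows-874 adds a handful of characters in 0x80-0x9F (the range TIS-620 leaves unassigned) plus NO-BREAK SPACE at 0xA0. The function names are hypothetical, and mapping unassigned bytes to U+FFFD is a simplifying choice; a real converter might reject them instead.

```python
def decode_tis620(data: bytes) -> str:
    """Decode TIS-620 bytes; bytes with no assigned character become U+FFFD."""
    out = []
    for b in data:
        if b < 0x80:
            out.append(chr(b))             # ASCII half is unchanged
        elif 0xA1 <= b <= 0xDA or 0xDF <= b <= 0xFB:
            out.append(chr(b + 0x0D60))    # Thai block: 0xA1 -> U+0E01 ... 0xFB -> U+0E5B
        else:
            out.append("\ufffd")           # 0x80-0xA0, 0xDB-0xDE, 0xFC-0xFF are unassigned
    return "".join(out)

# Bytes that Windows-874 defines on top of TIS-620.
WIN874_EXTRA = {
    0x80: "\u20ac",                  # EURO SIGN
    0x85: "\u2026",                  # HORIZONTAL ELLIPSIS
    0x91: "\u2018", 0x92: "\u2019",  # LEFT/RIGHT SINGLE QUOTATION MARK
    0x93: "\u201c", 0x94: "\u201d",  # LEFT/RIGHT DOUBLE QUOTATION MARK
    0x95: "\u2022",                  # BULLET
    0x96: "\u2013", 0x97: "\u2014",  # EN DASH, EM DASH
    0xA0: "\u00a0",                  # NO-BREAK SPACE
}

def decode_win874(data: bytes) -> str:
    """Decode Windows-874 bytes: TIS-620 rules plus WIN874_EXTRA."""
    return "".join(
        WIN874_EXTRA[b] if b in WIN874_EXTRA else decode_tis620(bytes([b]))
        for b in data
    )

# LEFT DOUBLE QUOTE, KO KAI, KHO KHAI, RIGHT DOUBLE QUOTE as saved by a Windows-874 program
smart_quoted = b"\x93\xa1\xa2\x94"
print(ascii(decode_win874(smart_quoted)))  # the quotation marks survive
print(ascii(decode_tis620(smart_quoted)))  # the quotation marks have no TIS-620 meaning
```

The Thai letters come out identical either way; only the bytes in 0x80-0xA0 differ, which is exactly where a reader who assumes the wrong one of the two encodings sees something the author did not intend.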
> Windows-874 is a font-encoding (another is Mac-Thai) which extend
> TIS-620 in the TIS-620's private-use area to include the same
> set of characters that Windows-1252 extend ISO-8859-1.

The sets are not actually the same. The set added by Windows-874 is just a small subset of what Windows-1252 added to ISO-8859-1. It includes only EURO SIGN, ELLIPSIS, LEFT/RIGHT SINGLE/DOUBLE QUOTATION MARK, BULLET and EN/EM DASH. This is because Windows-874 needs to leave room for a number of positional variants of vowel and tone marks used when shaping/rendering Thai clusters. So I don't think Windows-874 should be used outside the software at all.
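The subset claim can be checked against Python's standard "cp874" and "cp1252" codecs, which are Python's implementations of Windows-874 and Windows-1252. The sketch below (the helper name high_range_chars is hypothetical) collects every byte in 0x80-0x9F that each code page defines and tests whether the Windows-874 additions are a subset of the Windows-1252 ones, byte for byte.

```python
import unicodedata

def high_range_chars(codec):
    """Map each byte in 0x80-0x9F that `codec` defines to its character."""
    found = {}
    for b in range(0x80, 0xA0):
        try:
            found[b] = bytes([b]).decode(codec)
        except UnicodeDecodeError:
            pass  # byte is undefined in this code page
    return found

win874 = high_range_chars("cp874")
win1252 = high_range_chars("cp1252")

for b, ch in win874.items():
    print(f"0x{b:02X} U+{ord(ch):04X} {unicodedata.name(ch)}")

# True if every Windows-874 addition also appears in Windows-1252 at the same byte.
print(win874.items() <= win1252.items(), len(win874), len(win1252))
```

The nine characters it lists match the ones named above, and each sits at the same byte position as in Windows-1252, which defines 27 characters in that range.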
> there are more places than just this resource file
> where the new encoding would have to be added.

If you mean SRX643, I think so. But if you mean OOo 1.0.x, for which this patch and those in i6658 (which add a lot of entries to a lot of tables) are intended, I'm quite sure this is all that is needed to add TIS-620 support to OOo, because the patches were made by going through every file listed by "egrep -rIl '8859.?2'" and adding the missing parts.

> We don't have any Thai (CTL) specific features on the OOO_STABLE_1
> branch, so I wonder if it would help anything if we're able to
> read/write a TIS encoded document without being able to handle it's
> content.

We were already using OOo for Thai documents long before there were any Thai-enabled versions, because you don't need Thai shaping for text to be readable. The one thing missing from OOo 1.0.x is the menu entry for selecting the Thai encoding when opening or saving plain text. BTW, whether adding this entry to the menu is appropriate depends entirely on what you consider a minor fix and on your priority for supporting Thai in OOo 1.0.x. I'm sure you'll support Thai in SRX643 (but please remove the Windows-874 menu entry there if you agree with my previous comment :-).
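The search used to find the places to patch is a plain recursive grep for anything that names ISO-8859-2. For readers without egrep at hand, here is a rough Python equivalent. The function name grep_rIl is hypothetical, and the "binary file" test (any NUL byte) is only an approximation of what grep -I does.

```python
import os
import re

def grep_rIl(root: str, pattern: str) -> list:
    """Roughly mimic `egrep -rIl PATTERN ROOT`: search files recursively (-r),
    skip files that look binary (-I), and return only matching paths (-l)."""
    regex = re.compile(pattern.encode("ascii"))
    hits = []
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()                  # deterministic traversal order
        for name in sorted(filenames):
            path = os.path.join(dirpath, name)
            try:
                with open(path, "rb") as f:
                    data = f.read()
            except OSError:
                continue                 # unreadable file: skip it
            if b"\0" in data:
                continue                 # NUL byte: treat as binary, similar to grep -I
            if regex.search(data):
                hits.append(path)
    return hits
```

For example, grep_rIl("sal", r"8859.?2") would list files mentioning 8859-2, 8859_2 or 88592, such as uses of RTL_TEXTENCODING_ISO_8859_2.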
I don't agree with removing MS-874, as it is different from the TIS-620/ISO-8859-11 encoding. As already mentioned in issue 6658, your patches only cure a symptom: they pretend to support TIS-620 but actually support only MS-874 under a different name. If you want to see what would have to be done to support TIS-620, please check out the SRX643 revisions of sal/rtl/* and sal/textenc/* and grep for RTL_TEXTENCODING_TIS_620 and RTL_TEXTENCODING_MS_874 (or just look at the diffs against -rOOO_STABLE_1). The only thing that could be done for OOo 1.0 without backporting everything would be to simply remove the comment markers from the RTL_TEXTENCODING_MS_874 entries in svx/source/dialog/txenctab.src and use them as they are (Windows-874), so that documents in that encoding can be loaded and stored; but not to change the entries to read "TIS-620", nor to change anything in sal to fake TIS-620.
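The OOo 1.0 workaround described here is a mechanical edit: drop the leading // from the existing Windows-874 entries and keep their "Thai (Dos/Windows-874)" label. A minimal Python sketch of that edit follows. The helper name enable_ms874_entries is hypothetical, and the regex assumes the commented entries look exactly like the "-//" lines in the diff at the top of this issue; only lines mentioning RTL_TEXTENCODING_MS_874 are touched.

```python
import re

# A commented-out Windows-874 entry, e.g.
#   // < "Thai (Dos/Windows-874)" ; RTL_TEXTENCODING_MS_874 ; > ;
COMMENTED_MS874 = re.compile(r'^(\s*)//\s*(<.*\bRTL_TEXTENCODING_MS_874\b.*)$')

def enable_ms874_entries(src_text: str) -> str:
    """Uncomment Windows-874 ItemList entries, leaving their labels unchanged."""
    out = []
    for line in src_text.splitlines(keepends=True):
        body = line.rstrip("\r\n")
        ending = line[len(body):]        # preserve the original line ending
        m = COMMENTED_MS874.match(body)
        out.append(m.group(1) + m.group(2) + ending if m else line)
    return "".join(out)
```

Other commented-out entries, such as the Arabic (DOS/OS2-864) ones, stay disabled because they do not reference RTL_TEXTENCODING_MS_874.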
Thanks for pointing me to the sources. I don't care much about Thai support in OOo 1.0; what I care about most is SRX642/643. Can we talk about the SRXs here? These are my thoughts on SRX643 vs TIS-620:

- Presenting both 'Thai (TIS-620)' and 'Thai (Dos/Windows-874)' in the encoding menu will confuse users:
  1) Very few people know the differences between the two. Average users won't know which to choose.
  2) Windows-874 should not be used outside the software:
     2.1) In plain-text files, you can't tell the difference when looking at a file that seems to be 8-bit Thai text, because there's no charset information and the two encodings treat code points 0x80-0x9F differently.
     2.2) Though Internet documents have MIME charset information, Windows-874 is not registered as a MIME charset.
- Having MS_874 in addition to TIS_620 has a benefit on MS Windows machines: an additional set of characters (EURO, ELLIPSIS, LEFT/RIGHT SINGLE/DOUBLE QUOTATION MARK, BULLET, EN/EM DASH) can be converted to/from Unicode. But on Linux/Solaris/Mac it seems to be useless?

So I think RTL_TEXTENCODING_MS_874 should be in the code, but not presented to users.
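Point 2.1 can be illustrated with a small heuristic. For unlabeled 8-bit Thai text, the only bytes that can tell the two encodings apart are those in 0x80-0x9F (0xA0 aside), so a file without them is simply ambiguous. The sketch below is a hypothetical helper, not part of sal, and it is only a guess: "unknown" just means the file uses bytes that neither encoding defines, which may point to some other encoding altogether.

```python
# Bytes that Windows-874 assigns in 0x80-0x9F, a range TIS-620 leaves unassigned.
WIN874_ONLY = frozenset({0x80, 0x85, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97})

def guess_thai_8bit(data: bytes) -> str:
    """Heuristic guess for unlabeled 8-bit Thai text (hypothetical helper).

    "ambiguous":   no bytes in 0x80-0x9F, so TIS-620 and Windows-874 agree
                   (apart from 0xA0, which only Windows-874 defines).
    "windows-874": every 0x80-0x9F byte is a Windows-874-only character.
    "unknown":     uses 0x80-0x9F bytes that neither encoding defines.
    """
    c1_bytes = {b for b in data if 0x80 <= b <= 0x9F}
    if not c1_bytes:
        return "ambiguous"
    if c1_bytes <= WIN874_ONLY:
        return "windows-874"
    return "unknown"
```

Ordinary Thai prose without smart quotes or a euro sign lands in "ambiguous", which is why a plain-text file cannot by itself say which of the two menu entries it was saved with.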
I strongly agree with Samphan's comment. MS-874/Windows-874 should not be visible to users (it's OK to have it in the code). We have only TWO standard encodings for Thai characters: 1) ISO-8859-11 2) TIS-620. And as the IANA-registered MIME charset for text/plain, we have only ONE: TIS-620. Mozilla/Netscape/Solaris also have only TIS-620/ISO-8859-11. Java has Windows-874, but its default for Thai is TIS-620.
close