Issue 6659 - Add 'Thai (TIS-620)' item to the encoding menu
Summary: Add 'Thai (TIS-620)' item to the encoding menu
Status: CLOSED NOT_AN_OOO_ISSUE
Alias: None
Product: General
Classification: Code
Component: ui
Version: OOo 1.0.0
Hardware: PC All
Importance: P3 Trivial
Target Milestone: ---
Assignee: ooo
QA Contact: issues@framework
URL:
Keywords:
Depends on: 6658
Blocks:
Reported: 2002-07-31 08:31 UTC by samphan
Modified: 2002-12-09 16:23 UTC

See Also:
Issue Type: PATCH
Latest Confirmation in: ---
Developer Difficulty: ---


Description samphan 2002-07-31 08:31:33 UTC
To be able to open/save plain 8-bit TIS-620 text files.

****
diff -urdP -X srcdiff.txt ../oo_1.0_src.orig/./svx/source/dialog/txenctab.src ./svx/source/dialog/txenctab.src
--- ../oo_1.0_src.orig/./svx/source/dialog/txenctab.src	Tue Feb 26 17:23:56 2002
+++ ./svx/source/dialog/txenctab.src	Sun Jul 28 20:33:44 2002
@@ -102,7 +102,7 @@
 //	< "Arabisch (DOS/OS2-864)"							; RTL_TEXTENCODING_IBM_864			; > ;
 	< "Kyrillisch (DOS/OS2-866/Russisch)"				; RTL_TEXTENCODING_IBM_866			; > ;
 	< "Griechisch (DOS/OS2-869/Modern)"					; RTL_TEXTENCODING_IBM_869			; > ;
-//	< "Thai (Dos/Windows-874)"							; RTL_TEXTENCODING_MS_874			; > ;
+	< "Thai (TIS-620)"								; RTL_TEXTENCODING_MS_874			; > ;
 	< "Osteuropa (Windows-1250/WinLatin 2)"				; RTL_TEXTENCODING_MS_1250			; > ;
 	< "Kyrillisch (Windows-1251)"						; RTL_TEXTENCODING_MS_1251			; > ;
 	< "Griechisch (Windows-1253)"						; RTL_TEXTENCODING_MS_1253			; > ;
@@ -190,7 +190,7 @@
 //	< "Arabic (DOS/OS2-864)"							; RTL_TEXTENCODING_IBM_864			; > ;
 	< "Cyrillic (DOS/OS2-866/Russian)"                  ; RTL_TEXTENCODING_IBM_866			; > ;
 	< "Greek (DOS/OS2-869/Modern)"                      ; RTL_TEXTENCODING_IBM_869			; > ;
-//	< "Thai (Dos/Windows-874)"							; RTL_TEXTENCODING_MS_874			; > ;
+	< "Thai (TIS-620)"								; RTL_TEXTENCODING_MS_874			; > ;
 	< "Central European (Windows-1250/WinLatin 2)"      ; RTL_TEXTENCODING_MS_1250			; > ;
 	< "Cyrillic (Windows-1251)"                         ; RTL_TEXTENCODING_MS_1251			; > ;
 	< "Greek (Windows-1253)"                            ; RTL_TEXTENCODING_MS_1253			; > ;
@@ -274,6 +274,7 @@
 	< "Türkisch (DOS/OS2-857)"							; RTL_TEXTENCODING_IBM_857			; > ;
 	< "Kyrillisch (DOS/OS2-866/Russisch)"				; RTL_TEXTENCODING_IBM_866			; > ;
 	< "Griechisch (DOS/OS2-869/Modern)"					; RTL_TEXTENCODING_IBM_869			; > ;
+	< "Thai (TIS-620)"								; RTL_TEXTENCODING_MS_874			; > ;
 	< "Osteuropa (Windows-1250/WinLatin 2)"				; RTL_TEXTENCODING_MS_1250			; > ;
 	< "Kyrillisch (Windows-1251)"						; RTL_TEXTENCODING_MS_1251			; > ;
 	< "Griechisch (Windows-1253)"						; RTL_TEXTENCODING_MS_1253			; > ;
@@ -344,6 +345,7 @@
 	< "Turkish (DOS/OS2-857)"							; RTL_TEXTENCODING_IBM_857			; > ;
 	< "Cyrillic (DOS/OS2-866/Russian)"				; RTL_TEXTENCODING_IBM_866			; > ;
 	< "Greek (DOS/OS2-869/Modern)"					; RTL_TEXTENCODING_IBM_869			; > ;
+	< "Thai (TIS-620)"								; RTL_TEXTENCODING_MS_874			; > ;
 	< "Eastern Europe (Windows-1250/WinLatin 2)"				; RTL_TEXTENCODING_MS_1250			; > ;
 	< "Cyrillic (Windows-1251)"						; RTL_TEXTENCODING_MS_1251			; > ;
 	< "Greek (Windows-1253)"						; RTL_TEXTENCODING_MS_1253			; > ;
Comment 1 nospam4obr 2002-07-31 13:16:30 UTC
Eicke, am I right that you are the actual owner of this code?
Comment 2 ooo 2002-07-31 22:29:41 UTC
@Oliver: not really, I just was the only one who dared to touch that
file manually.. and as long as the localization tools can't handle
StringArray ItemList resources right, I refuse to be the owner of it,
though I'll do all necessary changes so far..

Regarding the suggested patch:

1. It's quite senseless to paste a diff into this comment field, as
line wraps may get added or deleted on any occasion, so the changes
aren't really useful without manually reformatting it. Diffs should be
attached to an issue instead.

2. The patch is wrong. AFAIK Windows-874 is not equal to TIS-620. See
the current SRX643 revision of the file, which introduces an
additional RTL_TEXTENCODING_TIS_620. Of course there are more places
than just this resource file where the new encoding would have to be
added. I therefore close this issue as invalid.

@Oliver: see also SO internal tracker IDs #98208# and #99587#
Comment 3 nospam4obr 2002-08-01 06:11:55 UTC
@Eike: sorry for misspelling your name :-(

I wonder if it is possible and desirable to backport the changes in 643
to OOO_STABLE_1. I am sure there will be a 1.02 at some point in which
these changes might be useful. What do you think?
Comment 4 ooo 2002-08-01 11:05:08 UTC
I'm not sure if it would be useful. We don't have any Thai (CTL)
specific features on the OOO_STABLE_1 branch, so I wonder if it would
help anything if we're able to read/write a TIS encoded document
without being able to handle its content. We'll definitely not
backport the SRX643 CTL features to the OOO_STABLE_1 branch.

@Stephan: do you see any benefits in backporting the encoding only?

@Oliver: don't worry about the (mis)spelling of my name, I'm used to
people not getting it right ;-)
Comment 5 samphan 2002-08-01 14:45:39 UTC
Thanks for adding RTL_TEXTENCODING_TIS_620 :-). I hate using
WINDOWS-874 in place of TIS-620 because they're not actually the same.
TIS-620 is the standard encoding for information exchange for Thai.
Windows-874 is a font encoding (another is Mac-Thai) which extends
TIS-620 in TIS-620's private-use area to include the same set of
characters that Windows-1252 adds to ISO-8859-1. Windows-874 is meant
to be used internally on Windows machines but should not be used as the
encoding for plain text files (which lack out-of-band info) or internet
documents, otherwise people with other OSes (e.g. Mac/Linux/Solaris)
will see things differently from what the authors intended. Also,
windows-874 (unlike windows-1252 and tis-620) is not registered as a
valid MIME charset.
So I suggest removing the 'Thai (Windows-874)' entries from the menu.
Comment 6 samphan 2002-08-02 05:20:00 UTC
> Windows-874 is a font encoding (another is Mac-Thai) which extends
> TIS-620 in TIS-620's private-use area to include the same
> set of characters that Windows-1252 adds to ISO-8859-1.

The sets are not actually the same. The set added by
Windows-874 is just a small subset of what Windows-1252 added to
ISO-8859-1. It includes only the EURO, ELLIPSIS, LEFT/RIGHT
SINGLE/DOUBLE QUOTATION MARK, BULLET, and EN/EM DASH. This is because
Windows-874 needs to leave room for a bunch of positional variants of
vowel/tone marks for shaping/rendering of Thai clusters. So I don't
think Windows-874 should be used outside the software at all.
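
The handful of extra characters listed above can be enumerated with a short Python sketch. This is only an illustration: it assumes Python's built-in `cp874` codec is a fair stand-in for Windows-874, it has nothing to do with the OOo conversion tables in sal/textenc, and the helper name `extra_cp874_chars` is made up for this example.

```python
import unicodedata


def extra_cp874_chars():
    """Return {byte: character} for bytes 0x80-0x9F that cp874 assigns.

    Hypothetical helper for illustration; it relies only on Python's
    built-in "cp874" codec, not on any OOo code.
    """
    extras = {}
    for b in range(0x80, 0xA0):
        try:
            ch = bytes([b]).decode("cp874")
        except UnicodeDecodeError:
            continue  # byte is unassigned in Windows-874
        extras[b] = ch
    return extras


# List every assigned byte in the range with its Unicode name.
for b, ch in sorted(extra_cp874_chars().items()):
    print(f"0x{b:02X} -> U+{ord(ch):04X} {unicodedata.name(ch)}")
```

Only these few positions are assigned in that range; everything from 0xA1 upward is the Thai block shared with TIS-620.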

Comment 7 samphan 2002-08-03 18:38:25 UTC
> there are more places than just this resource file 
> where the new encoding would have to be added.

If you mean SRx643, I think so. But if you mean OOo 1.0.x, for
which this patch and those in i6658 (which add a lot of
entries to a lot of tables) are intended, I'm quite sure that
this is all you need to add support for the TIS-620 encoding to OOo,
because it was done by looking at every file listed by
"egrep -rIl '8859.?2'" and adding the missing parts.

> We don't have any Thai (CTL) specific features on the OOO_STABLE_1
> branch, so I wonder if it would help anything if we're able to
> read/write a TIS encoded document without being able to handle its
> content.

We were already using OOo for Thai documents long before we had any
Thai-enabled versions, because you don't need Thai shaping for text to
be readable. The one thing missing from OOo 1.0.x is the entry in the
menu to select a Thai encoding to open/save plain text.
BTW, whether adding this entry to the menu is appropriate or not depends
entirely on what you consider a minor fix and on your priority for
supporting Thai in OOo 1.0.x. I'm sure you'll support Thai on SRX643
(but please remove the Windows-874 menu entry there if you agree with
the previous comment :-).

Comment 8 ooo 2002-08-04 16:41:48 UTC
I don't agree to remove MS-874 as it is different to the
TIS-620/iso8859-11 encoding. As already mentioned in issue 6658 your
patches only cure a symptom by pretending to support TIS-620 but
instead only support MS-874 under a different name. If you want to see
what would have to be done to support TIS-620, please check out the
SRX643 revisions of sal/rtl/* and sal/textenc/* and grep for
RTL_TEXTENCODING_TIS_620 and RTL_TEXTENCODING_MS_874 (or just look at
the diffs to -rOOO_STABLE_1).

The only thing that could be done for OOo1.0 without backporting
everything would be to simply remove the comments from the
svx/source/dialog/txenctab.src RTL_TEXTENCODING_MS_874 entries and use
them as such (Windows-874) to be able to load/store documents using
that encoding, but not to change the entries to read "TIS-620" or to
change anything in sal to fake TIS-620.
Comment 9 samphan 2002-08-04 18:15:28 UTC
Thanks for pointing me to the sources. I don't care much about Thai
support in OOo 1.0. What I care about most is SRX642/643. Can we talk
about SRXs here?

This is my idea about SRX643 vs TIS-620 :-

- presenting both 'Thai (TIS-620)' and 'Thai (Dos/Windows-874)' on the
encoding menu will confuse the users :-
1) Very few people know the differences between the two. The average
users won't know which to choose.
2) Windows-874 should not be used outside the software
2.1) In plain-text files. You can't tell the difference when looking
at a file that seems to be 8-bit Thai text because there's no charset
information and the two encodings treat codepoints 0x80-0x9f differently.
2.2) Though internet documents have MIME charset information,
Windows-874 is not registered as a MIME charset.
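
Point 2.1 can be made concrete with a rough Python sketch. It assumes Python's built-in `tis_620` and `cp874` codecs as stand-ins for the two encodings, and `uses_c1_range` is a hypothetical helper name. A byte in 0x80-0x9f only hints that a file was written as Windows-874; it proves nothing, and a file with no such bytes reads the same either way.

```python
def uses_c1_range(data: bytes) -> bool:
    """True if any byte falls in 0x80-0x9F, where the two encodings differ.

    Hypothetical helper for illustration only.
    """
    return any(0x80 <= b <= 0x9F for b in data)


# Three Thai letters; these byte values mean the same in both encodings.
thai_only = bytes([0xA1, 0xD2, 0xC3])
# Appending 0x85, which is an ellipsis only under Windows-874.
with_ellipsis = thai_only + bytes([0x85])

print(uses_c1_range(thai_only))
print(uses_c1_range(with_ellipsis))
# Without bytes in the ambiguous range both codecs give the same text.
print(thai_only.decode("tis_620") == thai_only.decode("cp874"))
```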

- having MS_874 in addition to TIS_620 has a benefit on MS Windows
machines. That is an additional set of characters (EURO, ELLIPSIS,
LEFT/RIGHT SINGLE/DOUBLE QUOTATION MARK, BULLET, EN/EM DASH) can be
converted to/from Unicode. But having it on Linux/Solaris/Mac seems to
be useless?
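
The conversion benefit mentioned here can be sketched the same way, again assuming Python's built-in `cp874` and `tis_620` codecs rather than the RTL_TEXTENCODING converters themselves; `can_encode` is a hypothetical helper name.

```python
def can_encode(text: str, codec: str) -> bool:
    """Report whether every character of text has a mapping in codec.

    Hypothetical helper for illustration only.
    """
    try:
        text.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# THAI CHARACTER KO KAI followed by EURO SIGN.
sample = "\u0e01\u20ac"

print(can_encode(sample, "cp874"))    # the euro sign has a slot in Windows-874
print(can_encode(sample, "tis_620"))  # TIS-620 has no euro sign
```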

So I think RTL_TEXTENCODING_MS_874 should be in the code, but not
presented to the users.
Comment 10 Unknown 2002-08-06 18:42:14 UTC
I strongly agree with Samphan on his comment.

MS-874/Windows-874 should not be seen by user.
(it's ok to have it in the code)

We have only TWO standard encodings for Thai chars
1) ISO-8859-11
2) TIS-620

And for the IANA registered MIME type for text/plain,
we have only ONE --> it is TIS-620.

Mozilla/Netscape/Solaris also have only TIS-620/ISO-8859-11.
Java has Windows-874, but the default one for Thai is TIS-620.
Comment 11 ooo 2002-12-09 16:23:28 UTC
close