Apache OpenOffice (AOO) Bugzilla – Issue 88376
X11: Wrong character mapping for some fonts
Last modified: 2009-04-28 13:29:15 UTC
A few days ago I opened an existing document in Presentation that I have not edited for 1 year. Back then, the document looked fine in OOo, and its PDF export from last year supports this. Now, though, some special characters (e.g., ö = &ouml; and ä = &auml;) are displayed incorrectly; they are replaced by other characters, e.g. a circumflex accent (^) and a per mille sign (0/00). The characters render OK when you make them italic! Also, the character rendering seems to depend on some kind of context that is not visible to the user. Please find attached a testcase that demonstrates this. The special characters on page 3 render incorrectly. Editing in this region is also sometimes impossible. Please make the special (non-7bit) characters italic: they will change to their actual correct content. When you remove slide 1 from the presentation (without editing anything else), then save and re-load the file, the exact same characters on the (now) page 2 do render correctly! I have confirmed this in releases 2.4.0.3.5-1.1 and 2.3.1.2-3.1, both Linux, and on both i586 and x86-64.
Created attachment 52957 [details] testcase as described in the bug report.
The document looks ok here when exported. Did you use the same pdf viewer in both cases and could you try another? Please attach one of the bad pdf files. Thanks!
Created attachment 52976 [details] To make things clear: screenshots. No. 1: Problem on page 3, see Fufl (should be Fuß = Fu&szlig;) etc.
Created attachment 52977 [details] To make things clear: screenshots. No. 2: Changed page 3: Italics for some (not all) special characters (marked and clicked [I] above), they are correct now!
Created attachment 52978 [details] To make things clear: screenshots. No. 3: Opened file, removed slides 1+2, saved, closed OOo completely, reopened file, result: No problem on page 3! Nothing edited on this slide!
Sorry to be unclear. The PDF file was only quoted to show that the exact same file rendered OK previously. It is not necessary to export to PDF to see the bug. Please see the 3 new attachments -- screenshots this time. It may be something to do with rendering, maybe also character coding, or something else ??? The fact that removing the first 2 slides (then save, close OOo, re-open saved file) changes the rendering/coding on the last slide (which remains unchanged by the user) may suggest that some invisible character/attribute/mode is present/initiated somewhere in the first two slides? Hope this helps to track the problem. Once again, I used Linux, i586 and x86_64, OpenSuSE 10.3, OOo from OpenSuSE's repo, today's release 2.4.0.3.5-2.1 (same problem on 2.4.0.3.5-1.1, 2.3.1.2-3.1). Please let me know if I can give you more info.
The problem does not exist on OpenOffice 2.3.1 DE for Windows. I do not know whether the reason lies in the different OS/compilation or in the fact that a (probably) different version of the Andale Sans font is present on that system. For the moment the bug should be tracked under Linux, where I use the standard Andale Sans font /usr/share/fonts/truetype/ans_____.ttf from the agfa-fonts-2003.03.19-92 package (MD5 sum: e2518c39b4eecd3eb72dc81c956172c5 /usr/share/fonts/truetype/ans_____.ttf). However, this CANNOT simply be a problem in the font file, as (a) such a problem would have been detected and fixed a very long time ago, (b) the characters are rendered OK with the same attributes (non-bold, non-italic, same point size) when slides 1+2 are removed (see screenshots), and (c) the characters in question look OK in kfontview. Hope this helps.
Two colleagues of mine have the same problem on their computers, which rules out personal settings as a source of the problem. Their machines are x86 (32-bit) platforms on OpenSUSE 10.3. Anybody else see the same thing when opening the attached file? Maybe we can track down on which systems it looks OK and on which it fails.
And you are all using the OOo from OpenSuSE's repo? In this case I would recommend trying the original version from OOo. Thanks.
Thanks for the hint. They did all use OpenSuSE's version. I have therefore downloaded and installed (as a user) OOo_2.4.0_LinuxIntel_install_wJRE_en-US.tar.gz from a mirror and ran it. The problem is still there on slide 3, with characters similar to fl, 0/00, ^ and "," where ß, ä, ü and ö should be. The degree sign is displayed as an integral sign. All of the characters on this page look OK when made italic (select, press [I] button above).
Ah, now I can reproduce this. Reassigned.
Changed target and owner.
I have checked again and can confirm that the issue is still present in build 2.4.1.6.
I'm sure I've already seen issues with the same root cause in the tracker but I can't find them. Anyway, this seems to be a case of ImplFontData.meFamily aliasing => fixed in CWS vcl93
@wg: please check in CWS vcl93
forgot to reassign
Now I found the real root cause: when fontconfig's FcFreeTypeCharIndex() is called, it tries some charmaps on the FT_Face and doesn't reset the face to its original charmap. In this case the charmap was changed from FT_ENCODING_UNICODE to FT_ENCODING_APPLE_ROMAN, so a U+00E4 (a with diaeresis) became an APPLE_ROMAN_0xE4 (per mille sign), etc. Disabling the patch from issue 72129 fixes the bad regression.
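The mix-up described above can be reproduced without any font at all. The following is a minimal Python sketch, assuming that Python's built-in `mac_roman` codec agrees with the Apple Roman charmap for the code points involved; the helper name `misread_as_apple_roman` is made up for illustration and is not part of OOo, fontconfig or FreeType.

```python
def misread_as_apple_roman(text: str) -> str:
    """Reinterpret each 8-bit Unicode code point as an Apple Roman character code.

    This mimics looking up e.g. U+00E4 in a charmap that expects
    Apple Roman codes instead of Unicode code points.
    """
    out = []
    for ch in text:
        code = ord(ch)
        if code < 0x80:
            out.append(ch)  # ASCII is identical in both encodings
        elif code <= 0xFF:
            out.append(bytes([code]).decode("mac_roman"))
        else:
            out.append("\ufffd")  # no 8-bit Apple Roman slot for this code point
    return "".join(out)


# a-diaeresis, o-diaeresis, u-diaeresis, sharp s
for ch in "\u00e4\u00f6\u00fc\u00df":
    wrong = misread_as_apple_roman(ch)
    print(f"U+{ord(ch):04X} is shown as U+{ord(wrong):04X}")
```

Under that assumption the four characters come out as a per mille sign, a circumflex, a cedilla and an fl ligature, which is consistent with the "0/00", "^", "," and "fl" shapes reported earlier in this issue.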
Created attachment 55933 [details] quite minimal bugdoc
Only certain document fonts were affected by the problem, because FcFreeTypeCharIndex() was only called when glyph fallback got involved. ASCII chars did not hit the problem because the problematic call usually just set another latin encoding.
Created attachment 55940 [details] a thought
It would have been really good if the FcFreeTypeCharIndex API had any mention that it did that :-( The alternative patch there might also work (?), but I can understand "once bitten, twice shy", so maybe something like that for a future version.
Verified in CWS.
@cmc: yes, your patch solves the problem too. I sent a similar patch to the fontconfig list to fix the unexpected side effect in the library itself. But the CWS with the current fix is already closed for development and is really urgently needed for the OOo3RC. If the CWS doesn't come back and gets integrated as it is, a follow-up task for the issue 72129-like problem is due. Self reminder: since we'll probably link against unfixed versions of libfontconfig for a while, it would be even better to have psprint's FcFreeTypeCharIndex wrapper fixed, but that would add a new dependency to that. So that remains a TODO until we merge psprint into vcl.
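The idea of fixing the wrapper can be sketched as "remember the selected charmap, make the call, put the charmap back". The Python below is only a model of that pattern, not psprint or fontconfig code: `FakeFace`, `third_party_lookup`, `preserved_charmap` and `safe_char_index` are all hypothetical names, and the glyph numbers are invented. With the real FreeType C API the equivalent would presumably save `face->charmap` before the call and restore it with `FT_Set_Charmap()` afterwards.

```python
from contextlib import contextmanager


class FakeFace:
    """Stand-in for an FT_Face: several charmaps, one of them selected."""

    def __init__(self, charmaps, selected):
        self.charmaps = charmaps  # encoding name -> {char code: glyph index}
        self.selected = selected

    def glyph_index(self, code):
        return self.charmaps[self.selected].get(code, 0)  # 0 = missing glyph


def third_party_lookup(face, code):
    """Hypothetical library call that switches charmaps and never switches back."""
    for name in face.charmaps:
        face.selected = name
        glyph = face.glyph_index(code)
        if glyph:
            return glyph
    return 0


@contextmanager
def preserved_charmap(face):
    """Restore the face's selected charmap when the block exits."""
    saved = face.selected
    try:
        yield face
    finally:
        face.selected = saved


def safe_char_index(face, code):
    """Wrapper: same result as the library call, but without the side effect."""
    with preserved_charmap(face):
        return third_party_lookup(face, code)
```

The wrapper keeps the caller's view of the face unchanged even when the lookup misses in every charmap, which is the situation that triggered this issue.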
To help identify duplicates of this issue (e.g. issue 92843 is a candidate), I'd like to point out the subtle details that explain why CJK, Indic and non-ASCII Latin text were impacted differently by this same root cause. Fonts nowadays usually have a Unicode mapping, many still have an old Mac OS compatibility mapping like apple_roman, and some important CJK fonts still contain non-Unicode legacy charmaps. FcFreeTypeCharIndex() changed the FT_Face's charmap by iterating through the ones available in the font until it either hit once or until all available charmaps missed. The result of that unexpected side effect was that
- non-ASCII Latin misses resulted in the FT_Face being changed to apple_roman encoding
- CJK fonts often got the regular Unicode mapping back, but sometimes a legacy CJK mapping hit first
- Indic, Thai, Hebrew, etc. almost certainly switched back to the Unicode mapping at the first glyph hit
Now if the scenario resulted in the FT_Face being changed back to a Unicode mapping, everything was fine again. If it resulted in e.g. apple_roman, then the problem seen in this issue happened. In the case of just one legacy CJK mapping being available there usually was no problem either, unless the mapping from Unicode to the legacy mapping differed between OOo and FC. In the case of both Unicode maps and legacy maps being available in the font, the scenario of many glyph misses caused occasional switches between these charmaps, especially since these big mappings often have slightly different coverage. The root cause is easily understood, but many of the resulting bug scenarios are so complex as to be mind-bending. And though the mapping trouble outlined above is complex enough, it is further complicated by an LRU-like caching of the usually expensive mapping results.
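The side effect described in this comment can be modelled in a few lines. This is a deliberately simplified Python simulation, not the fontconfig implementation: `FakeFace` and `leaky_char_index` are hypothetical names, the glyph numbers are invented, and the charmap order is simply the dictionary's insertion order rather than whatever order the real library uses.

```python
class FakeFace:
    """Stand-in for an FT_Face: several charmaps, one of them selected."""

    def __init__(self, charmaps, selected):
        self.charmaps = charmaps  # encoding name -> {char code: glyph index}
        self.selected = selected

    def glyph_index(self, code):
        return self.charmaps[self.selected].get(code, 0)  # 0 = missing glyph


def leaky_char_index(face, code):
    """Try each charmap until one hits; the face stays on the last one tried."""
    for name in face.charmaps:
        face.selected = name  # side effect: never restored
        glyph = face.glyph_index(code)
        if glyph:
            return glyph
    return 0


# A Latin font with a Unicode charmap and an Apple Roman compatibility charmap.
face = FakeFace(
    {
        "unicode": {0xE4: 10},      # U+00E4 -> glyph 10 (a with diaeresis)
        "apple_roman": {0xE4: 20},  # code 0xE4 -> glyph 20 (per mille sign)
    },
    selected="unicode",
)

before = face.glyph_index(0xE4)      # correct glyph while on the Unicode charmap
leaky_char_index(face, 0x4E2D)       # glyph-fallback probe that misses everywhere
after = face.glyph_index(0xE4)       # same code, now read through apple_roman
print(before, face.selected, after)
```

In this model a single probe for a character the font lacks is enough to leave the face on apple_roman, after which every later lookup of a non-ASCII Latin code returns the wrong glyph. A probe that hits in the Unicode charmap would switch the face back, which corresponds to the "everything was fine again" scenario.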
Correction to the above: in the official fontconfig library not all charmaps of a font are tried, e.g. the legacy non-Unicode CJK encodings are ignored. Maybe Asian distributions have patched their libfontconfig to enable them, though. This would allow them to use important fonts that only have legacy encodings. Can anyone confirm this? If no Unicode charmaps are available, OOo uses legacy CJK encodings for the same reason. The side effect that the FT_Face's charmap got silently switched is still causing bad problems, but unless the library is patched, the mind-bending scenarios of the previous comment are much less likely to occur in real life.
FWIW, I see no custom patches at all in fedora fontconfig (2.1.4) for F10/F9 except a single custom fontconfig configuration rule to set some asian fonts to embeddedbitmap=false (http://cvs.fedora.redhat.com/viewvc/devel/fontconfig/)
FYI, I found that issue so interesting that I blogged about it: http://blogs.sun.com/GullFOSS/entry/what_could_possibly_go_wrong
*** Issue 87161 has been marked as a duplicate of this issue. ***
*** Issue 83370 has been marked as a duplicate of this issue. ***
*** Issue 93437 has been marked as a duplicate of this issue. ***
*** Issue 83884 has been marked as a duplicate of this issue. ***
*** Issue 90564 has been marked as a duplicate of this issue. ***
*** Issue 86309 has been marked as a duplicate of this issue. ***
*** Issue 89982 has been marked as a duplicate of this issue. ***
*** Issue 82150 has been marked as a duplicate of this issue. ***
*** Issue 84335 has been marked as a duplicate of this issue. ***
*** Issue 86114 has been marked as a duplicate of this issue. ***
*** Issue 89157 has been marked as a duplicate of this issue. ***
*** Issue 80190 has been marked as a duplicate of this issue. ***
Also fixed in CWS chart33 for target 2.4.2 @wg: please verify in CWS chart33
SBA: I put ES and myself on c/c.
Tested in Final. Closed.
*** Issue 86781 has been marked as a duplicate of this issue. ***
*** Issue 81020 has been marked as a duplicate of this issue. ***