Apache OpenOffice (AOO) Bugzilla – Issue 79498
No support for ligatures in spellcheck
Last modified: 2017-05-20 11:29:25 UTC
OpenOffice.org spell checking does not understand ligature characters. For example, ‘ﬁnd’ (written with the ﬁ ligature, U+FB01) and ‘find’ mean the same thing. However, OpenOffice.org flags the first as invalid because it does not understand that ‘ﬁ’ stands for ‘f’ followed by ‘i’. For information about ligature characters, see Wikipedia: http://en.wikipedia.org/wiki/Typographical_ligature
This is a speller enhancement; I am setting the component accordingly and adjusting the priority. @nemeth -> there could be some Hunspell setting in the affix file, as adding all ligature words to the dictionary seems to be only a hack. As of now, I can see no such setting. For en_US.aff, it could be:

    EQUAL 2 # treat characters as equal
    EQUAL ﬁ fi
    EQUAL ﬂ fl
Good news: handling of the ﬁ, ﬂ, etc. ligatures can be solved with the new ICONV feature of Hunspell 1.2.8 in the near future. But we need a Unicode en_US dictionary and extended word breaking (ligatures are not word characters in OOo yet) to handle ligatures as word characters.

Milek: Thanks. The syntax is quite similar to your suggestion:

    # input conversion
    ICONV 2
    ICONV ﬁ fi
    ICONV ﬂ fl

Moreover, you can optionally add the following lines to the affix file to get suggestions with ligatures:

    # output conversion
    OCONV 2
    OCONV fi ﬁ
    OCONV fl ﬂ

It would be nice to make an OOo typography extension with:
- automatic ligature conversion by autocorrection (bug: autocorrection decapitalizes capitalized words that do not start a sentence)
- automatic/manual ligature conversion by a macro (after file loading, before file saving or printing)
- an extended hyphenation dictionary with non-standard hyphenation patterns to hyphenate words with ligatures (later we need to correct the hyphenator to calculate precise character counts with ligatures, too)
- a spelling dictionary with default ICONV and OCONV

With this extension, OpenOffice.org might look like a semi-professional DTP program.
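To make the input/output conversion idea above concrete, here is a minimal Python sketch. It is not Hunspell code and does not reproduce Hunspell's actual matching rules; the names `ICONV`, `OCONV`, `WORDS`, `convert`, `is_known` and `suggest_with_ligatures` are hypothetical, and the word list is a toy stand-in for a real dictionary.

```python
# Hypothetical illustration of the ICONV/OCONV idea; this is not Hunspell code.
ICONV = {"\ufb01": "fi", "\ufb02": "fl"}  # ligature -> plain letters (input)
OCONV = {"fi": "\ufb01", "fl": "\ufb02"}  # plain letters -> ligature (output)

WORDS = {"find", "flag", "file"}  # toy word list standing in for a dictionary


def convert(text, table):
    """Apply every replacement in the table, longest source string first."""
    for src in sorted(table, key=len, reverse=True):
        text = text.replace(src, table[src])
    return text


def is_known(word):
    """Look the word up after input conversion, as a spell checker would."""
    return convert(word, ICONV).lower() in WORDS


def suggest_with_ligatures(word):
    """Return the word with plain letter pairs replaced by ligatures."""
    return convert(word, OCONV)
```

With these tables, a word typed with the U+FB01 ligature is looked up as plain "find", and a suggestion for "find" can be offered in its ligature form.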
Isn't this a duplicate of #4638?
I have solved the en/em dash problem of OpenOffice.org 3.2 with an improved English dictionary extension that also sets ICONV for the f ligatures: http://extensions.services.openoffice.org/en/project/dict-en-fixed With these English dictionaries, the spell checker recognizes words with Unicode f ligatures. Moreover, the improved hyphenation patterns correctly hyphenate words with f ligatures, too. I plan to add an option to the English module of the Lightproof grammar checker extension to suggest ligatures (semiautomatic ligature handling). This is not an automatic OpenType solution, but it is a big help when using OpenOffice.org for more advanced tasks.
But what is the real benefit for OOo? I think nobody will enter ligatures manually into a text processor. Maybe a handful of people will do that, but probably only for headlines, not for every word in running text. So wouldn't it make more sense to make ligatures a display-only thing (#4638) in OOo and internally keep the text literally letter by letter, which would also not affect spell checking?
Yes, this is for a handful of people. I am writing a book about OpenOffice.org and DTP, and I have had positive experience with semiautomatic ligature handling. F ligatures are not too frequent (for example, 1.3% of the English words in Orwell's 1984 contain fi, fl, ff, ffi or ffl characters). Lightproof underlines these words, and opening the context menu with Shift-F10 and choosing the alternative word form with the Unicode f ligature is not annoying (at least for Hungarian texts, perhaps with fewer ligatures). But for languages with minimal morphology (like English), fully automatic ligature replacement can be provided by an Autocorrect extension. OpenOffice.org already supports Unicode f ligatures in searching and capitalization, so recognizing them in the spell checker is a natural extension. Finally, we need a temporary alternative to the upcoming Microsoft Office 2010 (with ligature handling). (I have no information about OpenOffice.org development in this direction.)

By the way, the automatic OpenType solution for ligature handling also has potential problems: some languages, for example German, do not use ligatures at word part boundaries in compound words. The HYPHENMIN values also depend on the usage of ligatures. In Hungarian, fi- can stand at the end of a line, but this hyphenation is deprecated with ligatures.

Related issues:
- Issue 109543 (Update Hyphen hyphenation library (improved hyphenation) and en_US hyphenation patterns)
- Issue 71608 (Bad non-standard hyphenation of diaeresis and Unicode f ligatures)
- Issue 56348 (Special letter characters in first letter position is not handled by spell checking in Writer)
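A figure like the share of words with f ligature sequences quoted above can be estimated with a short script. The sketch below is only one plausible way to measure it, assuming plain English text and ASCII-only word splitting; it is not the method used for the number in this comment, and the function name `ligature_word_share` is hypothetical.

```python
import re

# Letter sequences that have a Unicode f ligature (U+FB00..U+FB04):
# ff, fi, fl, ffi, ffl.
F_SEQUENCE = re.compile(r"ff[il]?|f[il]")
# Assumption: words are runs of ASCII letters (good enough for English).
WORD = re.compile(r"[A-Za-z]+")


def ligature_word_share(text):
    """Return the fraction of words containing ff, fi, fl, ffi or ffl."""
    words = WORD.findall(text.lower())
    if not words:
        return 0.0
    hits = sum(1 for w in words if F_SEQUENCE.search(w))
    return hits / len(words)
```

Calling `ligature_word_share(open("book.txt", encoding="utf-8").read())` on a text file (the file name is a placeholder) returns a value between 0 and 1.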
Adding a reference for ligatures: there are Latin and Armenian ligatures in Unicode: http://www.unicode.org/charts/PDF/UFB00.pdf. They occupy the ranges 0xFB00-0xFB06 and 0xFB13-0xFB17, respectively.
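The ligatures in these two ranges can be listed together with their plain-letter equivalents using Python's standard `unicodedata` module. This is a small sketch, assuming that NFKD compatibility decomposition is an acceptable source for the mapping; `RANGES` and `ligature_table` are names made up for this example.

```python
import unicodedata

# Code point ranges quoted in the comment above (inclusive).
RANGES = [(0xFB00, 0xFB06), (0xFB13, 0xFB17)]


def ligature_table():
    """Map each ligature character to its NFKD compatibility decomposition."""
    table = {}
    for start, end in RANGES:
        for cp in range(start, end + 1):
            ch = chr(cp)
            table[ch] = unicodedata.normalize("NFKD", ch)
    return table


# Print one line per ligature: code point, Unicode name, plain letters.
for ch, plain in ligature_table().items():
    print(f"U+{ord(ch):04X} {unicodedata.name(ch)} -> {plain}")
```

A table built this way could serve as a starting point for ICONV entries in an affix file, although whether every decomposition is linguistically appropriate would need checking per language.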
Reset assignee to the default "issues@openoffice.apache.org".