Apache OpenOffice (AOO) Bugzilla – Issue 113587
Change case - Sentence case does not work correctly with mix languages text
Last modified: 2017-05-20 10:20:23 UTC
- new document
- type "first exemple"
- Set English to "first" and French to "exemple"
- Select all
- Format - Change case - Sentence case

Expected: "First exemple"
Result: "First Exemple"

The "Sentence case" formatting seems to see a new sentence when a new language is set.

Other examples with "Capitalize Every Word" which might have the same root cause:
Sentence: "the rapide brown fox-like creature" (where "rapide" is in French)
Result: "The Rapide BroWn Fox-Like Creature"
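The suspected cause can be illustrated with a minimal Python sketch. It is not the AOO implementation: the `runs` list of (text, language) pairs and the `sentence_case` helper are hypothetical, and the helper treats its whole input as a single sentence. Applying the helper to each language run separately reproduces the reported result, while applying it to the joined text gives the expected one.

```python
def sentence_case(text: str) -> str:
    """Lowercase the text, then uppercase its first letter.

    Assumption: the whole input is one sentence.
    """
    lowered = text.lower()
    for i, ch in enumerate(lowered):
        if ch.isalpha():
            return lowered[:i] + ch.upper() + lowered[i + 1:]
    return lowered


# Hypothetical model of the test document: one run per language attribute.
runs = [("first ", "en"), ("exemple", "fr")]

# Case conversion restarted at every language change (the reported behaviour).
per_run = "".join(sentence_case(text) for text, _lang in runs)

# Case conversion applied once across all runs (the expected behaviour).
whole = sentence_case("".join(text for text, _lang in runs))

print(per_run)
print(whole)
```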
Created attachment 71026 [details] Attaching bugdoc from original issue 113558
Adding andba to cc list in order to list the requirements for 'sentence case' before fixing this. Adding jurf because he/she was the submitter of the original issue.

tl->jurf: Could you explain why you think sentence case should behave as you listed in the document (only the first character of the first word should become uppercase and everything else lowercase)? After all, there are other possible choices, like
- keep words with all uppercase characters unchanged (e.g. WWW, W.H.O.)
- keep words with correct title case unchanged (e.g. Australia, Peter)
Usually, if such words exist, the user intended them to be that way, and thus it could be argued that they should not be modified in any way by sentence case.

Also, what would you expect if only a few words, or even a single word within a sentence, got selected and sentence case is applied? (At least what happens in other reference applications seems to be completely odd in this case...)

Thanks in advance!
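For comparison, the two alternative choices listed above could look roughly like this in Python. This is only a sketch of one possible policy: the function names are hypothetical, words are split on whitespace, and the input is assumed to be a single sentence.

```python
import re


def keep_as_is(word: str) -> bool:
    # All-uppercase words (WWW, W.H.O.) or words already in title case
    # (Australia, Peter) are assumed to be intentional and left alone.
    return word.isupper() or word.istitle()


def sentence_case_preserving(sentence: str) -> str:
    parts = re.split(r"(\s+)", sentence)  # keep the whitespace separators
    out = []
    seen_first_word = False
    for part in parts:
        if not part or part.isspace():
            out.append(part)
            continue
        if not keep_as_is(part):
            part = part.lower()
            if not seen_first_word:
                part = part[:1].upper() + part[1:]
        seen_first_word = True
        out.append(part)
    return "".join(out)


print(sentence_case_preserving("the W.H.O. office in Australia iS oPEN"))
print(sentence_case_preserving("THIS IS SHOUTED"))
```

Under this policy a block of all-caps text is left untouched, because every word counts as an intentional all-uppercase word.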
tl, thanks for your post. You ask two questions:

BEHAVIOUR OF SENTENCE CASE

After writing a first draft for this reply, it occurred to me that, logically, there's actually no reason to offer anything more complicated than capitalizing the first word of the sentence, with everything else lowercase. Two reasons:

1. Assuming that the main usage of Sentence case is to quickly correct blocks of text in all-caps or in all lower-case, guessing the casing of proper nouns, acronyms etc would be a cpu-heavy waste of resources, and pointless if the words aren't in a dictionary. In other words, the user would likely have to correct stuff manually anyway.

2. If the original text is mixed case (where some words are and should remain capitalized, like proper nouns and acronyms), it's unlikely a user would want or need to apply Sentence case, as the casing should, in theory, already be correct! In other words, and conversely, you're unlikely to come across a block of text whose sentences start with lowercase but which contain words that have and should have caps.

Conclusion: keep it simple by just capitalizing the first letter of each sentence.

**********************************************

SELECTION SCOPE

I'd expect the selection(s) to (temporarily) expand to cover the entire sentence. This is what I've implemented in a case toggling macro that applies my own casing code: when it gets around to sentence case, it temporarily expands the selections to entire sentences, before restoring the original selections. As I'm not up to speed with textcursors and the like, I use temporary bookmarks to mark the original selections (RegExp would have been easier, but kills character formatting).

The advantage of selection expansion is apparent when applying a casing routine to multiple selections. Instead of having to carefully select each sentence in its entirety (and risk accidentally clearing all selections by misclicking and having to start again), I can scoot through a document CTRL-clicking any bits of those sentences, before firing off the macro.

Thanks again
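The selection expansion described above can be sketched in Python as follows. This is not the macro itself, which is not shown in this issue: the function names are hypothetical, sentence boundaries are found with a deliberately naive regular expression (a sentence ends at '.', '!' or '?'), and a selection is modelled as a half-open character range.

```python
import re


def sentence_spans(text: str):
    """Return (start, end) ranges of sentences, using a naive boundary rule."""
    return [(m.start(), m.end())
            for m in re.finditer(r"[^.!?]+[.!?]*\s*", text)]


def sentence_case(sentence: str) -> str:
    """Lowercase everything, then uppercase the first letter."""
    lowered = sentence.lower()
    for i, ch in enumerate(lowered):
        if ch.isalpha():
            return lowered[:i] + ch.upper() + lowered[i + 1:]
    return lowered


def apply_to_selection(text: str, sel_start: int, sel_end: int) -> str:
    """Apply sentence case to every sentence the selection touches."""
    out = []
    pos = 0
    for start, end in sentence_spans(text):
        out.append(text[pos:start])  # anything the naive rule skipped
        chunk = text[start:end]
        if start < sel_end and end > sel_start:  # selection overlaps sentence
            chunk = sentence_case(chunk)
        out.append(chunk)
        pos = end
    out.append(text[pos:])
    return "".join(out)


text = "hello WORLD. SECOND one HERE. third one."
i = text.index("one")  # select only the word "one" in the second sentence
print(apply_to_selection(text, i, i + 3))
```

Only the sentence containing the selected word is changed; the other sentences and the original selection offsets are left as they were, since the replacement never changes the text length in this example.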
TL->QA: For a list of all ligatures see http://www.unicode.org/charts/PDF/UFB00.pdf. They range from 0xFB00-0xFB06 and 0xFB13-0xFB17.
The list of ligs reminds me: another problem with Sentence case is that all the letters in a ligature at the start of a sentence are converted to all caps, not just the first letter. eg:

ﬁnd -> FInd
ﬂuke -> FLuke
ﬅop -> STop
ﬆop -> STop

(same with ﬀ, ﬃ and ﬄ, but those sequences don't appear at the start of any words in English)

I've copied this comment to issue 113584 (Capitalize Every Word), as it may be related.
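The ligature symptom is consistent with the Unicode uppercase mapping being applied to the first character where the titlecase mapping would be wanted. Whether that is what the AOO code did is an assumption; the Python sketch below only demonstrates the two mappings, using Python's own `str.upper` and `str.title`, and the helper names are hypothetical.

```python
# Words starting with the ligatures U+FB01 (fi), U+FB02 (fl),
# U+FB05 (long s + t) and U+FB06 (st).
LIGATURE_WORDS = ["\ufb01nd", "\ufb02uke", "\ufb05op", "\ufb06op"]


def upper_first(word: str) -> str:
    # Uppercase mapping of the first character: the ligature expands
    # to two capital letters.
    return word[:1].upper() + word[1:]


def title_first(word: str) -> str:
    # Titlecase mapping of the first character: the ligature expands
    # to one capital letter followed by one lowercase letter.
    return word[:1].title() + word[1:]


for word in LIGATURE_WORDS:
    print(upper_first(word), title_first(word))
```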
Fixed in CWS sw33bf08
Thanks, tl, for the fix and also the one for issue 113584. I look forward to pawing my grubby mitts over whichever release gets the CWS first. :-)
tl->jurf: You need to thank ES for finding a crash. ^_- Otherwise it would hardly have been considered a show stopper and thus would have been addressed in 3.4 at best. Anyway, I hope everything is fine now and will work without problems, although the breakiterator is now involved, and that one sometimes behaves unexpectedly.
Verified in CWS sw33bf08.