113587 – Change case - Sentence case does not work correctly with mix languages text

Summary: Change case - Sentence case does not work correctly with mix languages text

Status:	CLOSED FIXED

Alias:	None

Product:	Writer
Classification:	Application
Component:	formatting
Version:	OOO330m1
Hardware:	All All

Importance:	P3 Trivial
Target Milestone:	---
Assignee:	stefan.baltzer
QA Contact:	issues@sw

URL:	http://specs.openoffice.org/writer/ch...
Keywords:

Depends on:
Blocks:	111112

Reported:	2010-08-02 14:16 UTC by eric.savary
Modified:	2017-05-20 10:20 UTC
CC List:	4 users

See Also:
Issue Type:	DEFECT
Latest Confirmation in:	---
Developer Difficulty:	---

Attachments
Attaching bugdoc from original issue 113558 (21.26 KB, application/octet-stream) 2010-08-10 13:09 UTC, thomas.lange	no flags

Description eric.savary 2010-08-02 14:16:42 UTC

- new document
- type "first exemple"
- Set English to "first" and French to "exemple"
- Select all
- Format - Change case - Sentence case

Expected: "First exemple"
Result: "First Exemple"

The "Sentence case" formatting seems to treat a change of language as the
start of a new sentence.
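
A minimal Python sketch of the suspected cause, not the actual Writer code. It
assumes a hypothetical model in which the text is a list of (text, language)
runs; the function names are made up for illustration. Casing each run on its
own reproduces the reported result, while casing the joined text gives the
expected one.

```python
# Hypothetical model: a paragraph is a list of (text, language) runs.

def sentence_case(text: str) -> str:
    """Uppercase the first character and lowercase the rest (one sentence)."""
    return text[:1].upper() + text[1:].lower()


def sentence_case_per_run(runs):
    """Suspected buggy behaviour: every language run is cased separately."""
    return "".join(sentence_case(text) for text, _lang in runs)


def sentence_case_across_runs(runs):
    """Expected behaviour: a language change does not start a new sentence."""
    return sentence_case("".join(text for text, _lang in runs))


runs = [("first ", "en"), ("exemple", "fr")]
print(sentence_case_per_run(runs))      # First Exemple
print(sentence_case_across_runs(runs))  # First exemple
```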

Comment 1 eric.savary 2010-08-02 14:50:11 UTC

Other examples with "Capitalize Every Word" which might have the same root cause:

Sentence: "the rapide brown fox-like creature" (where "rapide" is in French)
Result: "The Rapide BroWn Fox-Like Creature"

Comment 2 thomas.lange 2010-08-10 13:10:00 UTC

Created attachment 71026
Attaching bugdoc from original issue 113558

Comment 3 thomas.lange 2010-08-10 13:21:19 UTC

Adding andba to cc list in order to list the requirements for 'sentence case'
before fixing this. 
Adding jurf because he/she was the submitter of the original issue.

tl->jurf: Could you explain why you think sentence case should behave as you
listed in the document (only the first character of the first word becomes
uppercase and everything else lowercase)? After all, there are other possible
choices, such as:
- keep words with all uppercase characters unchanged (e.g. WWW, W.H.O.)
- keep words with correct title case unchanged (e.g. Australia, Peter)
If such words exist, the user usually intended them to be that way, so it could
be argued that sentence case should not modify them at all.

Also what would you expect if only a few words or even a single word within a
sentence got selected and sentence case is applied? (At least what happens in
other reference applications seems to be completely odd in this case...)

Thanks in advance!

Comment 4 thomas.lange 2010-08-10 13:26:12 UTC

Comment 5 jurf 2010-08-10 18:13:17 UTC

tl, thanks for your post. You ask two questions:

BEHAVIOUR OF SENTENCE CASE

After writing a first draft for this reply, it occurred to me that, logically,
there's actually no reason to offer anything more complicated than capitalizing
the first word of the sentence, with everything else lowercase.

Two reasons:

1. Assuming that the main usage of Sentence case is to quickly correct blocks of
text in all caps or in all lowercase, guessing the casing of proper nouns,
acronyms, etc. would be a CPU-heavy waste of resources, and pointless if the words
aren't in a dictionary. In other words, the user would likely have to correct
stuff manually anyway.

2. If the original text is mixed case (where some words are and should remain
capitalized, like proper nouns and acronyms), it's unlikely a user would want or
need to apply Sentence case, as the casing should, in theory, already be
correct! In other words, and conversely, you're unlikely to come across a block
of text whose sentences start with lowercase but which contain words that have
and should have caps.

Conclusion: keep it simple by just capitalizing the first letter of each sentence.
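
The simple rule can be sketched in Python as follows. This is only an
illustration under a stated assumption: a sentence is taken to end at '.', '!'
or '?' followed by whitespace, which is far cruder than a real sentence break
iterator. The function name is hypothetical.

```python
import re

# Assumption: a sentence starts at the beginning of the text or after
# '.', '!' or '?' followed by whitespace.
_SENTENCE_START = re.compile(r"(^|[.!?]\s+)(\w)")


def simple_sentence_case(text: str) -> str:
    """Lowercase everything, then uppercase the first letter of each sentence."""
    lowered = text.lower()
    return _SENTENCE_START.sub(
        lambda m: m.group(1) + m.group(2).upper(), lowered
    )


print(simple_sentence_case("HELLO WORLD. THIS IS A TEST!"))
# Hello world. This is a test!
print(simple_sentence_case("visit the WWW today"))
# Visit the www today
```

The second call shows the trade-off discussed above: acronyms and proper nouns
are lowercased too and would have to be corrected by hand.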

**********************************************

SELECTION SCOPE

I'd expect the selection(s) to (temporarily) expand to cover the entire sentence.

This is what I've implemented in a case toggling macro that applies my own
casing code: when it gets around to sentence case, it temporarily expands the
selections to entire sentences, before restoring the original selections. As I'm
not up to speed with textcursors and the like, I use temporary bookmarks to mark
the original selections (RegExp would have been easier, but kills character
formatting).

The advantage of selection expansion is apparent when applying a casing routine
to multiple selections. Instead of having to carefully select each sentence in
its entirety (and risk accidentally clearing all selections by misclicking and
having to start again), I can scoot through a document CTRL-clicking any bits of
those sentences, before firing off the macro.
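
The expansion idea can be sketched in Python on a plain string. This is not
the macro described above (which works with bookmarks in a document); it
assumes selections are half-open (start, end) index ranges and uses a crude,
hypothetical sentence splitter.

```python
import re

# Assumption: a sentence runs up to '.', '!' or '?' plus trailing whitespace.
_SENTENCE = re.compile(r"[^.!?]*[.!?]*\s*")


def sentence_spans(text):
    """Return (start, end) index pairs for each sentence in text."""
    return [
        (m.start(), m.end())
        for m in _SENTENCE.finditer(text)
        if m.end() > m.start()  # skip the empty match at the end
    ]


def expand_to_sentences(text, start, end):
    """Grow the half-open range [start, end) to cover whole sentences."""
    new_start, new_end = start, end
    for s, e in sentence_spans(text):
        if s <= start < e:
            new_start = s
        if s < end <= e:
            new_end = e
    return new_start, new_end


text = "one two. three four. five six."
i = text.index("four")
s, e = expand_to_sentences(text, i, i + len("four"))
print(repr(text[s:e]))  # 'three four. '
```

With several selections, the same function would be applied to each range
before the casing routine runs, and the original ranges restored afterwards.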

Thanks again

Comment 6 thomas.lange 2010-08-11 06:13:41 UTC

TL->QA: For a list of all ligatures see
http://www.unicode.org/charts/PDF/UFB00.pdf. They occupy the ranges
0xFB00-0xFB06 and 0xFB13-0xFB17.
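
For building test data, the two ranges can be turned into a small Python
helper. The names below are hypothetical and only the ranges quoted in this
comment are covered.

```python
# Code point ranges quoted above (inclusive).
LIGATURE_RANGES = ((0xFB00, 0xFB06), (0xFB13, 0xFB17))


def is_ligature(ch: str) -> bool:
    """True if the single character ch lies in one of the listed ranges."""
    cp = ord(ch)
    return any(lo <= cp <= hi for lo, hi in LIGATURE_RANGES)


def all_ligatures():
    """Every character in the listed ranges, in code point order."""
    return [chr(cp) for lo, hi in LIGATURE_RANGES for cp in range(lo, hi + 1)]


print(len(all_ligatures()))   # 12
print(is_ligature("\uFB01"))  # True
print(is_ligature("f"))       # False
```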

Comment 7 jurf 2010-08-11 13:07:39 UTC

The list of ligatures reminds me: another problem with Sentence case is that all
the letters in a ligature at the start of a sentence are converted to caps, not
just the first letter, e.g.:

ﬁnd -> FInd
ﬂuke -> FLuke
ﬅop -> STop
ﬆop -> STop

(same with ﬀ, ﬃ and ﬄ, but those sequences don’t appear at the start of any
words in English)

I’ve copied this comment to issue 113584 (Capitalize Every Word), as it may be
related.
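
The difference can be illustrated with Python's built-in Unicode case
mappings, used here only as a stand-in (how Writer maps case internally is not
stated in this issue). Uppercasing the first character expands the whole
ligature, whereas the titlecase mapping capitalizes only its first letter. The
function names are hypothetical.

```python
def capitalize_with_upper(word: str) -> str:
    """Uppercase mapping on the first character: expands 'fi' to 'FI'."""
    return word[:1].upper() + word[1:]


def capitalize_with_title(word: str) -> str:
    """Titlecase mapping on the first character: expands 'fi' to 'Fi'."""
    return word[:1].title() + word[1:]


for word in ("\uFB01nd", "\uFB02uke", "\uFB05op", "\uFB06op"):
    print(capitalize_with_upper(word), capitalize_with_title(word))
# FInd Find
# FLuke Fluke
# STop Stop
# STop Stop
```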

Comment 8 thomas.lange 2010-08-18 07:51:59 UTC

Fixed in CWS sw33bf08

Comment 9 thomas.lange 2010-08-19 10:01:03 UTC

Comment 10 jurf 2010-08-20 08:28:35 UTC

Thanks, tl, for the fix and also the one for issue 113584.
I look forward to pawing my grubby mitts over whichever release gets the CWS
first. :-)

Comment 11 thomas.lange 2010-08-20 08:45:00 UTC

tl->jurf: You need to thank ES for finding a crash. ^_- Otherwise it would
hardly have been considered a show stopper and thus would have been addressed
in 3.4 at best. Anyway, I hope everything is fine now and will work without
problems, although now the breakiterator is involved and that one sometimes
behaves unexpectedly.

Comment 12 stefan.baltzer 2010-08-23 14:47:53 UTC

Verified in CWS sw33bf08.