Apache OpenOffice (AOO) Bugzilla – Issue 128019
Replace OpenOffice string implementation with Standard Library string implementation
Last modified: 2019-03-26 08:20:42 UTC
The goal is to use the standard library's string template (std::basic_string) instead of our own implementation.
Which one? We have 6 string implementations: https://wiki.openoffice.org/wiki/Hacking#Can_I_get_a_char_.2A.2C_please.3F
Do we need 6? Shouldn't one be enough?
We have C string structs and C++ string wrapper classes around them in main/sal, in ASCII and "Unicode" (UTF-16) versions, with a 32-bit length field. Another 2 are in main/tools, with a 16-bit length field (so at most 2^16 characters), used by Calc, StarBasic, and possibly more. Keeping the string length in a 16-bit instead of a 32-bit length field probably saves a lot of space in spreadsheets with many cells; Excel also does this. Apart from being based on sal_Char / sal_Unicode instead of native C++ types, these classes contain many functions not found in C++ standard library strings, e.g. conversion to/from integer and double, string tokenization, interning, comparison of Unicode strings against ASCII, etc. Given the recent move to UTF-8-only languages (Go, Rust) and the UTF-8 Everywhere manifesto (https://utf8everywhere.org), we could consider eliminating the UTF-16 strings and using the ASCII strings as UTF-8. That would, however, require fixing all code to traverse code points instead of code units, something it probably already does wrong.
I added issue 67649 as a reference, because a string overhaul has a good chance of fixing that bug as well (IMHO).
I like the UTF-8 approach as described on https://utf8everywhere.org/, but I don't have much insight into the alternatives. I think we should decouple the string implementation from OpenOffice; this would make this part easier to change and maintain. We would also need correct converters for the other UTF encodings. Maybe it makes sense to base the string implementation on the STL and then have our own string class that adds the features we need, hidden behind an interface. We also have to check the API. Based on the FOSDEM presentation, I think this is the most hideous part: https://ftp.fau.de/fosdem/2018/AW1.120/ode_uri.mp4 (the UTF-related part starts at around 10 minutes). Very interesting talk, thanks to Stephan Bergmann.