Apache OpenOffice (AOO) Bugzilla – Issue 38719
Want CVS-friendly file-format
Last modified: 2013-02-07 22:33:08 UTC
I thought I might add an RFE related to a recent thread of mine on the mailing lists: In some cases I want to check OOo documents into a CVS archive. I know the document format has built-in version handling, but sometimes I wish to integrate revision control of OOo docs with other data. For instance, I may want to retrieve matching versions of source code and the OOo docs that document it. Also, I've grown so used to CVS that I'm pretty nervous about files that aren't in the archive... Now, OOo docs may indeed be checked into CVS, but you have to use the "binary" format, so a full copy of the file will probably be appended on every check-in, and you can forget about getting useful diff listings, using "Tags", etc. In other words, a "more textual" format would be nice. There is "Flat XML", of course, but unfortunately, using that doesn't make a lot of difference. The problem is that the XML code is encoded as a single line.
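To illustrate the single-line problem, here is a minimal sketch using only the Python standard library. The toy document and the helper names (`line_diff`, `pretty`) are made up for illustration and are not part of OOo. It compares two versions of the same XML, first as a single line and then with one element per line. Note that re-indenting a real document can alter whitespace in mixed content, so this only demonstrates the diff behaviour, not a safe conversion.

```python
import difflib
from xml.dom import minidom


def line_diff(old: str, new: str) -> list[str]:
    """Return only the added/removed lines of a line-based unified diff."""
    diff = difflib.unified_diff(old.splitlines(), new.splitlines(), lineterm="", n=0)
    return [
        line
        for line in diff
        if line.startswith(("+", "-")) and not line.startswith(("+++", "---"))
    ]


def pretty(xml_text: str) -> str:
    """Re-serialize XML with one element per line (toy documents only)."""
    return minidom.parseString(xml_text).toprettyxml(indent="  ")


# Hypothetical stand-in for a flat XML document stored on a single line.
OLD = "<doc><p>First paragraph</p><p>Second paragraph</p><p>Third paragraph</p></doc>"
NEW = OLD.replace("Second", "2nd")

# Single-line form: the whole document shows up as one removed and one added line.
flat_changes = line_diff(OLD, NEW)

# One element per line: only the paragraph that was edited shows up.
pretty_changes = line_diff(pretty(OLD), pretty(NEW))

if __name__ == "__main__":
    print(len(flat_changes), "changed lines in single-line form")
    print("\n".join(pretty_changes))
```

With the single-line form, a line-based tool such as CVS has to treat every check-in as a change to the entire file, which is why the diff listings are of little use.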
RFE
OpenOffice.org Issue Tracker - Feedback Request.
The Issue you raised is currently 'Unconfirmed' pending review, but has not been updated within the last 3 years. Please consider re-testing with one of the latest versions of OOo, as the problem(s) may have already been addressed.
Either use the recent stable version: http://download.openoffice.org/index.html
or consider trying the new OOo 3 BETA (still in testing): http://download.openoffice.org/3.0beta/
Please report back the outcome so this Issue may be Closed or Progressed as necessary - otherwise it may be Resolved as Invalid in the future.
You may also wish to search for (and note) any duplicates of this Issue that may have advanced further by checking the Issue Tracker: http://www.openoffice.org/issues/query.cgi
Many thanks, Andrew
Cleaning-up and Closing old Issues as part of: ~ The Grand Bug Squash, pre v3 ~ http://marketing.openoffice.org/3.0/announcementbeta.html
As a workaround, I can suggest implementing an "as is" XSLT filter. Used as both the import and the export filter, it just copies the source XML (with indentation on output), so you can easily save and open files "as is":
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xs="http://www.w3.org/2001/XMLSchema" exclude-result-prefixes="xs" version="2.0">
  <xsl:output indent="yes" method="xml"/>
  <!-- Copy every node, together with its attributes, unchanged -->
  <xsl:template match="node()">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
Though the use of plain XML is worth trying, it doesn't really solve the problem. I'm currently using .fodt files and it kind of works, except that useless information gets into the .fodt and defeats the purpose of the RCS. I'll try giving some examples taken from a blank document, produced like this: creating a new Writer document and saving it as a .fodt file, loading it again without "Load user settings" and "Load printer settings", resaving it and filtering it through "tidy -utf8 -q -xml -i -w 0" for easier analysis.
Compatibility-related options are included by default when they should not be: "Options > Writer > Compatibility > Consider wrapping style when positioning Objects" is included in the document as "ConfigTextWrapOnObjPos". It is a compatibility-only option that really should not appear in the file unless it is explicitly set to true or false. Why? Because if the file is loaded in an old enough version of OOo, the option will not be understood at all by the old Writer, and if it is loaded into a future version of OOo, that version should detect the OOo version from the Generator label and only then set the option. This applies to the other compatibility-related options as well.
CurrentDatabaseDataSource: it is a useless empty-string value set at creation time and included in the file.
AllowPrintJobCancel: I must confess I don't know for sure what this does, but it sounds like a setting that belongs in the user settings on the PC instead of in the document. The reason is that it is printer-dependent, not document-dependent.
initial-creator: isn't this information that only certain users will be interested in? Besides, by default the document should NOT include any personal information unless explicitly requested. This also works against the RCS, in the sense that this information should be stored in the RCS.
Now, for the following format settings there might be a good reason; I just fail to see how it is useful to include them before they are even used or applied.
style:font-face-style: is it maybe used by the subsequent default outline styles? Or is it the alias setting for fonts?
text:outline-level-style: why are all levels, though unused, included by default in an absolutely empty file?
Now, most of the above I can filter out automatically with some RCS (like Git). However, what really breaks RCS in general is the following: I imported a Word document and saved it as .fodt, filtered it through tidy and saved the result as "version a". I then loaded the .fodt and saved it again without making any change, not even to the view, filtered it through tidy and saved the result as "version b". A diff between both versions shows the following:
1. All xml:id values were rewritten.
2. For some reason, a soft-page-break was included before some paragraphs.
3. Some style:name values were rewritten.
4. The new save included some PrinterSetup info that the first .fodt export didn't.
I repeated the .fodt load-and-save procedure. I still found some differences (besides the ViewArea/Visible settings):
1. style:paragraph-properties was changed in one paragraph from style:writing-mode="page" to style:writing-mode="lr-tb".
2. More style:style style:name="P8" renaming.
This last part, rather than describing an RFE, describes a bug that needs fixing. Thank you for your attention to this long comment.
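The kind of automatic filtering mentioned above could look roughly like the following sketch, which normalizes a .fodt before diffing (for example as a helper that a revision control system runs before comparing files). It assumes that the volatile settings are stored as config:config-item elements in the ODF config namespace and that their names start with "ViewArea", "Visible" or "PrinterSetup"; the function name `normalize` and the sample documents are hypothetical. Dropping xml:id is lossy, so this is only suitable for producing a diff view, not for the stored file, and it does nothing about the renamed styles or the inserted soft-page-breaks.

```python
import xml.etree.ElementTree as ET

# ODF namespace for settings entries (config:config-item).
CONFIG_NS = "urn:oasis:names:tc:opendocument:xmlns:config:1.0"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"

# Assumption: names of settings that change between otherwise identical saves.
VOLATILE_PREFIXES = ("ViewArea", "Visible", "PrinterSetup")

ET.register_namespace("config", CONFIG_NS)


def normalize(xml_text: str) -> str:
    """Return the document with volatile settings and xml:id attributes removed."""
    root = ET.fromstring(xml_text)
    item_tag = f"{{{CONFIG_NS}}}config-item"
    name_attr = f"{{{CONFIG_NS}}}name"

    # Snapshot the tree first, because children are removed while walking it.
    for parent in list(root.iter()):
        for child in list(parent):
            name = child.get(name_attr, "")
            if child.tag == item_tag and name.startswith(VOLATILE_PREFIXES):
                parent.remove(child)

    # xml:id values were observed to be rewritten on every save.
    for element in root.iter():
        element.attrib.pop(XML_ID, None)

    return ET.tostring(root, encoding="unicode")


# Two hypothetical saves of the same content that differ only in volatile data.
SAVE_A = (
    f'<doc xmlns:config="{CONFIG_NS}">'
    '<config:config-item config:name="ViewAreaTop">0</config:config-item>'
    '<p xml:id="id100">Hello</p>'
    "</doc>"
)
SAVE_B = (
    f'<doc xmlns:config="{CONFIG_NS}">'
    '<config:config-item config:name="ViewAreaTop">512</config:config-item>'
    '<config:config-item config:name="PrinterSetup">AAAA</config:config-item>'
    '<p xml:id="id200">Hello</p>'
    "</doc>"
)

if __name__ == "__main__":
    print(normalize(SAVE_A) == normalize(SAVE_B))
```

A filter like this only hides the noise on the consumer side; the spurious rewrites on an unchanged load-and-save would still need to be fixed in the application itself.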
To make the issues easier to grep via "requirements", I am reassigning the issues currently owned by me to the owner "requirements".