Apache OpenOffice (AOO) Bugzilla – Issue 128492
Buggy Index generator
Last modified: 2021-11-14 10:53:10 UTC
Generated Index is inconsistent and some of it makes no sense. There seem to be two underlying problems:

1. OO Writer seems to mark index terms inside the document to be indexed. Even after terms are removed from the concordance file past index terms appear in the Index. In addition the new version of OO does not even show places where a document has been marked.

SUGGESTIONS:
(A) if marking is necessary it should be done in a temporary document with a known name to be created each time the Index is rebuilt.
(B) there should be an option in the Index definition panel to clear any markings in a document should it have been marked in the past.

2. automatically inserted hyphenation marks should be ignored for purpose of indexing. Thus COMPUTER COM-PUTER COM-PU-TER should all be considered as COMPUTER. There should also be an option to remove all optional hyphenation markers.

The above issues were noted in the past.
PS. After posting the above, I saw that the gray background on marked terms can be made visible via View > Field Shadings.
What kind of index did you try?

About: "OO Writer seems to mark index terms inside the document to be indexed. Even after terms are removed from the concordance file past index terms appear in the Index."
If you look into the Navigator in the side pane, there should be an option Index. If you open that entry, you get a list of all indexes. You can click on the index in question and select Index -> Update in the menu. At that moment OpenOffice will update all displayed indexes. You can also click the outdated index.

About: "In addition the new version of OO does not even show places where a document has been marked."
As you noted yourself, it depends on your options.

About: "(B) there should be an option in the Index definition panel to clear any markings in a document should it have been marked in the past"
You mean remove the index? You can do that with right click, Index -> Remove.

About: "2. automatically inserted hyphenation marks should be ignored for purpose of indexing. Thus COMPUTER COM-PUTER COM-PU-TER should all be considered as COMPUTER. There should also be an option to remove all optional hyphenation markers."
Indeed, this is not recognized. A starting point for a developer would be the break iterator, I guess. See [1].

[1] https://wiki.openoffice.org/wiki/Writer/Text_Formatting
The hyphen identification is covered in 23541. One could argue that 40930 is similar in kind, but it is not the same, since it is about word boundaries. So if we add hyphens, we could also offer an option to define other non-boundary-breaking characters.
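The hyphen handling requested in the report (COMPUTER, COM-PUTER and COM-PU-TER treated as one term) can be illustrated with a minimal Python sketch. It is not how Writer or its break iterator works internally. It assumes that optional hyphens are stored in the text as U+00AD (soft hyphen) characters, and the function names `normalize_term` and `group_terms` are hypothetical.

```python
SOFT_HYPHEN = "\u00ad"  # assumption: optional hyphens are stored as U+00AD


def normalize_term(word, strip_visible_hyphens=False):
    """Return an index key for `word` that ignores optional hyphenation.

    Soft hyphens are always removed. Visible hyphens ("-") are removed only
    on request, because they can be a real part of a word.
    """
    key = word.replace(SOFT_HYPHEN, "")
    if strip_visible_hyphens:
        key = key.replace("-", "")
    return key


def group_terms(words):
    """Group word variants that share the same hyphen-free index key."""
    groups = {}
    for word in words:
        key = normalize_term(word, strip_visible_hyphens=True)
        groups.setdefault(key, []).append(word)
    return groups


if __name__ == "__main__":
    print(group_terms(["COMPUTER", "COM-PUTER", "COM-PU-TER"]))
```

Stripping visible hyphens is kept optional here because it would also merge distinct words such as "re-sign" and "resign".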
“If you look into the Navigator in the side pane, there should be an option Index. If you open that entry, you get a list of all indexes. You can click on the index in question and select Index -> Update in the menu. At that moment OpenOffice will update all displayed indexes. You can also click the outdated index.”

MY RESPONSE: Updating the Index, even if the Index parameters are edited, retains all the old “marked” terms (gray background etc.), and the errors recur. The indexer “marks” up the document based on the concordance file. Even if some terms in the file are later dropped and the Index is updated, the markings are retained in the document and the obsolete terms still appear in the updated Index. The best and maybe only solution seems to be an option to clear all marked text somehow, which should be simple to implement.

***

"(B) there should be an option in the Index definition panel to clear any markings in a document should it have been marked in the past"
You mean remove the index? You can do that with right click, Index -> Remove.

MY RESPONSE: Removing the Index leaves the document marked up. That is the source of the problem. An option is needed to remove all markings.
To save us time, please provide:
- a sample document
- two screenshots showing the actual and the expected result
- step-by-step instructions for reproducing the problem
Created attachment 87073: Shows the issues
Sent ZIP file containing:
- Sample .odf file
- Concordance file
- Screenshots

ISSUES:
1. How can the gray shading be removed from ALL terms marked by the indexer with a single command or macro?
2. "corporation" appears twice in the Index, once hyphenated.
3. "hyphenated" appears only once in the Index, although one occurrence has a hyphen.
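On issue 1, one possible workaround outside of Writer is sketched below in Python. It is not an OpenOffice feature or macro. It assumes the document is saved as an OpenDocument text file and that the marks are stored in content.xml as the ODF elements text:alphabetical-index-mark, text:alphabetical-index-mark-start and text:alphabetical-index-mark-end. The function names `strip_index_marks` and `clean_odt` are hypothetical, and the script should only be run on a copy of the document.

```python
import re
import zipfile

# Matches the three ODF alphabetical index mark elements, which are written
# as empty elements. Assumption: attribute values contain no literal ">".
INDEX_MARK_RE = re.compile(
    r"<text:alphabetical-index-mark(?:-start|-end)?\s[^>]*?/>"
)


def strip_index_marks(content_xml):
    """Remove all alphabetical index marks, keeping the marked text itself."""
    return INDEX_MARK_RE.sub("", content_xml)


def clean_odt(src_path, dst_path):
    """Copy an OpenDocument file, stripping index marks from content.xml.

    Entry order and per-entry compression are preserved, so the leading
    uncompressed "mimetype" entry stays as it was.
    """
    with zipfile.ZipFile(src_path) as src, zipfile.ZipFile(dst_path, "w") as dst:
        for item in src.infolist():
            data = src.read(item.filename)
            if item.filename == "content.xml":
                text = data.decode("utf-8")
                data = strip_index_marks(text).encode("utf-8")
            dst.writestr(item, data)
```

A call such as `clean_odt("sample.odt", "sample-clean.odt")` (file names are placeholders) writes a cleaned copy. The index in that copy still has to be updated in Writer afterwards, and manually entered index entries are removed along with those created from the concordance file.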