Apache OpenOffice (AOO) Bugzilla – Issue 83545
helplinker dies while building helpcontent2/util/sbasic
Last modified: 2009-03-25 16:52:40 UTC
Output e.g. looks like the following:
HelpLinker @/tmp/mkWmaqIU
Making /tmp/kr/SRC680/helpcontent2/unxsols4.pro/bin/sbasic_en-US.zip from 1180 input files
........................................ done 0 time taken was 32.501 seconds
dmake: Error code 139, while making '../../unxsols4.pro/bin/sbasic_en-US.zip'
dmake: '../../unxsols4.pro/bin/sbasic_en-US.zip' removed.
---* tg_merge.mk *---
The "HelpLinker" seems to die during termination ...
*** Issue 84145 has been marked as a duplicate of this issue. ***
Same issue on Mac OS 10.5.1 / PowerPC
In fact this is not exactly the same issue (got error 255 instead of 139), but very similar
added ab as cc
raising prio. This breaks the build process on Solaris and as reported on Mac, whenever helpcontent2 is included. I already lost a whole CWS due to this issue; the CWS had to be shifted from OOo 2.4 to 3.0, and no new help CWSs will be accepted by QA without a Solaris build. More info in Bugtracker issue 152168
ericb->ufi Is it possible to get more info than a non-public link? http://www.openoffice.org/issues/show_bug.cgi?id=152168 is not reachable for me ...
As this hinders the help to build on Solaris in a CWS, this is P1! Reassigning to AB.
STARTED I doubt that this is really a P1. The reason for setting this issue to P1 is the build problems in cws hcshared14. But hcshared14 has m238 as its current version. cws ab38 was integrated into m238 and brought an accidental change that made the already known helplinker problem worse. This problem was fixed in #i84529 for SRC680 m242 + OOH680 m1. So a resync of hcshared14 would probably solve the problem. To check this I will apply Ause's fix to xmlhelp in hcshared14. If this works, this issue is no P1.
ericb@ab Thanks for the info. From my side, I'll try to build m242, and I'll let you know.
Currently this issue does not have a target, so for now I am setting the target to 2.4. If the discussion concludes that this should be left as it is for 2.x and worked on for 3.x only, that is OK with me. And if it doesn't occur any more, then it can be closed.
As discussed with ufi I set this back to P2 for now. Anyway this task will be addressed as soon as possible.
If someone can reproduce the dying-during-termination issue, does it make any difference if the contents of the Tokenizer::~Tokenizer() dtor are removed?
Issue occurs again on Mac OSX PowerPC, building m243. Build breaker -> P1
ericb@cmc No change when removing the content of the Tokenizer dtor: on Mac OS X 10.4, PowerPC (no problem on Intel), I got a crash, and I'll attach the log.
To add information: building xmlhelp with symbols enabled AND defining CMCDEBUG 1 seems to work, meaning helpcontent2 now builds in sbasic. Thanks to Caolan McNamara, I now have leads to investigate. It seems to be a Mac OS X specific issue or something like that. I'll search the HelpLinker.cxx changes (between m233 and now); the cause is probably inside.
ab->ericb: I don't know if your Mac OS build problem is exactly the same as the Solaris build problem, but in this case unfortunately we cannot be sure that any tricks like building xmlhelp in a special way really help, because this problem has never been strictly reproducible. I've already made small changes and the build worked. I changed it back and it broke. Sounds good, doesn't it? Then I did the same change again and it didn't build anyway. :-( Currently I have a stable situation breaking when building schart_en-US.zip. Did your build succeed completely after you got over the sbasic problem?
ericb->ab I'll try to answer you more completely: I spent some energy resyncing some cws for m243. Now this is done, and I'll build more on PowerPC, so I'll have more information.
The starting point: my current Mac OS X is 10.4, on PowerPC arch, and I'm building m243.
- Building xmlhelp (i.e. HelpLinker) without symbols -> DICTIONNARY creation (empty file) causes a crash in HelpLinker.
- Building xmlhelp after removing everything in the Tokenizer dtor (in HelpLinker.cxx) leads to a HelpLinker crash too.
- Building it using symbols (meaning build debug="non_empty_string") does not, and seems to work. -> helpcontent2 builds but takes hours (already 2 hours, and still building, to be precise). I'll confirm nothing bad occurred asap.
As a summary: everything above is fully reproducible for me. Can you please tell me more about the changes you tried? Maybe I can give them a try too?
ericb->ab To isolate the change causing the breakage, I'll cross the tests:
1) build helpcontent2 from m233, xmlhelp from m243
2) build helpcontent2 from m243 using xmlhelp from m233
I remember it was perfectly building with m233. If you have a better idea ..
ab->ericb: I have no ideas concerning your xmlhelp-without-symbols problem; as far as I know there's nothing similar on Solaris, where the build always dies silently. The most important changes to xmlhelp came with my cws ab38 into m238, introducing the extensible help feature. But all these changes shouldn't affect the helplinker tool, and the problem was detected long before on Solaris. ab38 brought a compiler switch bug in xmlhelp/source/com/sun/star/help/helplinker.pmk, but this problem was fixed in m242. The changes I meant were only for debugging, playing around, skipping some help files to see if it's a special file etc. So unfortunately nothing really useful. I only mentioned it to point out how fragile the whole scenario is.
Created attachment 51137 [details] a thought
Here's a thought wrt. the specific original comment #0 death-on-exit problem. If that is reproducible after removing any exit 0 hacks, does this explicit linking to icudata make it stop?
ab->cmc: Thanks, but I haven't managed to find a machine so far, that shows the original error scenario at all. This is an intelligent bug, it vanishes when the haunters come. :-) I'll keep trying...
I did see something with the inbuilt 3.6 icu after it exited. A valgrind warning about a pointer being freed. But no bt details. Using 3.8 icu and there was no such warning. Playing with the icu examples, I saw that the same warning appeared in the simple "break" example if I manually hacked out explicitly linking to libicudata and it disappeared on restoring it. i.e.
valgrind ./break
==16516== Invalid free() / delete / delete[]
==16516==    at 0x4A0560B: free (vg_replace_malloc.c:233)
==16516==    by 0x36CC31043A: free_mem (dl-libc.c:235)
==16516==    by 0x36CC30FFC9: __libc_freeres (set-freeres.c:47)
==16516==    by 0x4802344: _vgnU_freeres (vg_preloaded.c:60)
==16516==    by 0x36CC23513A: exit (exit.c:90)
==16516==    by 0x36CC21E2BA: (below main) (libc-start.c:252)
==16516== Address 0x4F50870 is not stack'd, malloc'd or (recently) free'd
As libicuuc links to libicudata anyway it would suggest a link ordering problem if libicuuc is not followed immediately by libicudata in a command line app.
ab->ericb: I'm not very happy with the fact that we're mixing up the Solaris build problem with your Mac OS 10.5.1 / PowerPC problem, and after having a look at i84145 I'm sure it's something else. Obviously you agree, as you reopened it yesterday. The "totally screwed" message doesn't appear at all in the Solaris scenario. So please let us keep these two issues and systems separate. If the reason for the Solaris problem is found, we can still check if this also applies to the Mac PowerPC problem. But I see no reason to keep this issue on P1 because of a problem that's not really the same. Besides that, you've managed to build helpcontent2 in the meantime, if I understood correctly. -> Setting this back to P2
ericb->ab Ok, you're right. From my side, I have a lot of leads and directions to investigate, and I'll continue on issue 84145
Just to keep anybody interested posted, here is a stack of the crash (SIGSEGV) Uwe is facing:
(dbx) where
current thread: t@1
=>[1] rtl_allocateMemory(n = ???) (optimized), at 0x7f3abd98 (line ~228) in "alloc_global.c"
  [2] rtl_uString_ImplAlloc(nLen = ???) (optimized), at 0x7f3b9ec4 (line ~957) in "strtmpl.c"
  [3] rtl_uString_new_WithLength(ppThis = ???, nLen = ???) (optimized), at 0x7f3ba060 (line ~1053) in "strtmpl.c"
  [4] rtl_uStringbuffer_ensureCapacity(This = ???, capacity = ???, minimumCapacity = ???) (optimized), at 0x7f3bbdac (line ~111) in "ustrbuf.c"
  [5] rtl_uStringbuffer_insert(This = ???, capacity = ???, offset = ???, str = ???, len = ???) (optimized), at 0x7f3bbe18 (line ~135) in "ustrbuf.c"
  [6] __unnamed_GFEEKKBjjH0nJ::writeUcs4(pBuffer = ???, pCapacity = ???, nUtf32 = ???) (optimized), at 0x7f3bcf94 (line ~265) in "uri.cxx"
  [7] rtl_uriDecode(pText = ???, eMechanism = ???, eCharset = ???, pResult = ???) (optimized), at 0x7f3bdac4 (line ~683) in "uri.cxx"
  [8] osl_getSystemPathFromFileURL(ustrFileURL = ???, pustrSystemPath = ???) (optimized), at 0x7f3a8e68 (line ~255) in "file_url.cxx"
  [9] fs::path::native_file_string(this = ???) (optimized), at 0x433bc (line ~106) in "HelpCompiler.hxx"
  [10] JarOutputStream::commit(this = ???) (optimized), at 0x34610 (line ~4777) in "HelpLinker.cxx"
  [11] HelpLinker::link(this = ???) (optimized), at 0x38170 (line ~5310) in "HelpLinker.cxx"
  [12] HelpLinker::main(args = CLASS, pExtensionPath = ???) (optimized), at 0x3b368 (line ~5536) in "HelpLinker.cxx"
  [13] main(argc = ???, argv = ???) (optimized), at 0x3b9bc (line ~5547) in "HelpLinker.cxx"
Adjusted target: because of the simple workaround, this is not a stopper.
just a record... As ericb suggested, on PowerPC G5 Mac, Tiger, DEV300_m5, rebuilding xmlhelp with
% cd /Volumes/ooo-dev/Tiger/work.DEV300_m5.AQUA/DEV300_m5/xmlhelp
% rm -rf unxmacxp.pro
% build debug="non_empty_string"
...
% deliver -force
seems to solve this problem.
This problem is currently breaking our builds on the Solaris BuildBots (http://buildbot.go-oo.org/buildbot/Solaris-Intel and http://buildbot.go-oo.org/buildbot/Solaris-Sparc). Thus again raising the priority to P1.
Cannot build hcshared18 on either Solaris or Mac; both fail at the same point with the same error number.
Resuming evaluation... Unfortunately no quick solution can be expected, as we (kr, ab) stopped the evaluation last time because we were running out of ideas.
ab->jj: Have the BuildBot machines / configuration been changed in any way? If yes, it could be worth a try to change them back to the old state, as the problem seems to be very sensitive to even minor changes in the environment. If I remember correctly, Ause's experience was that the problem at first did not occur on some machines at all and then, after the first time it occurred, it occurred again and again. So maybe we should also consider completely resetting the BuildBot environments or even rebooting the BuildBot machines, if this is possible. I know that all this is no solution, but we currently have no real one either. There's also an ugly but sometimes helpful patch that Ause applies to the HelpLinker code. Maybe we could also give this a try for now, as the common strategy for this problem, "repeat again and again on different machines until you're through", could be difficult to adapt for BuildBots... :-)
If this is reproducible for someone, can they try http://www.openoffice.org/nonav/issues/showattachment.cgi/51137/trythis.patch to rule it in or out.
hjs->ab: the "exit early" patch (hack) has already been applied on those machines for ages. Otherwise we would have been unable to build months ago.
I cannot build Solaris Sparc helpcontent2. While in the past this error occurred only sometimes, it now always breaks the build. I applied the "trythis.patch", but sadly it didn't fix the problem.
I installed solaris 10 intel under qemu with SunStudio 12 and was unable to reproduce the crasher :-( Probably just too different from the buildbot compiler/environment to be able to reproduce that way.
Some general comments:
- In the scope of using Lucene as indexer, the HelpLinker has been changed (dev300 m22). A lot of code has been deleted. Unfortunately this problem still occurs.
- To reduce the chance of breaking, the block
    #ifdef SOLARIS
    if( !bExtensionMode )
        _exit( 0 );
    #endif
  has been added to HelpLinker.cxx. To reproduce the problem and to debug, this block should be commented out.
- This problem does not occur always. Sometimes I changed something and the problem didn't appear any more, I changed it back and the problem came back, I did the first change again and the problem was still there.
- The problem does occur both on Solaris Sparc and Solaris Intel.
- Stacks do not show the problem but only the consequences. E.g. a stack found by gh showed a crash in addBookmark, but commenting out all calls to addBookmark did not solve the problem. The stack posted by kr on Jan 30 probably also is only one of many possible stacks.
- kr once patched sal to use system allocation and the problem seemed to be gone. I've tried this again and found that also with system allocation the problem occurs. So this is also a dead end.
- It did not help to block optimization for HelpLinker.cxx and HelpCompiler.cxx
- It did not help to only use a small number of xhp files.
- It did not help to avoid output to stdout completely (Ause told me about problems he saw concerning using the out streams).
- It did not help to avoid all xslt operations.
Created attachment 55079 [details] patch for test
I've never been able to reproduce this, though I did try under qemu and solaris intel 10. Given that the _exit works around, I sort of suspect a global object dtor call on shutting down, where there is either something wrong with the order of destruction or something of that nature. Following that theory, does it make a difference to remove the icu lib now that it doesn't seem to be used by HelpLinker anymore on the theory that it is a global object belonging to icu that is the root of this ?
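To make the theory above concrete, here is a minimal sketch (not code from HelpLinker.cxx) of why `_exit` can mask a crash in a global destructor: `std::exit` runs the destructors of static objects, while POSIX `_exit` terminates the process without running them. The names `GlobalResource`, `g_report_fd` and `destructor_runs_in_child` are hypothetical and exist only for this illustration; the sketch assumes a POSIX system with `fork` and `pipe`.

```cpp
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

#include <cassert>
#include <cstdlib>

namespace {

// Write end of a pipe; only set in the forked child (hypothetical helper state).
int g_report_fd = -1;

// Stand-in for a library-owned global object whose destructor runs at exit.
struct GlobalResource {
    ~GlobalResource() {
        if (g_report_fd >= 0) {
            const char mark = 'D';
            const ssize_t ignored = write(g_report_fd, &mark, 1);
            (void)ignored;
        }
    }
};

GlobalResource g_resource;

}  // namespace

// Forks a child, ends it with _exit() or std::exit(), and reports whether
// the global destructor ran in that child.
bool destructor_runs_in_child(bool use_underscore_exit) {
    int fds[2];
    if (pipe(fds) != 0) {
        return false;
    }
    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        // Child: let the destructor report through the pipe, then terminate.
        close(fds[0]);
        g_report_fd = fds[1];
        if (use_underscore_exit) {
            _exit(0);  // skips static destructors and atexit handlers
        }
        std::exit(0);  // runs static destructors, including ~GlobalResource
    }
    // Parent: read at most one byte; read() returns 0 if the child wrote nothing.
    close(fds[1]);
    char buf = 0;
    const ssize_t n = read(fds[0], &buf, 1);
    close(fds[0]);
    int status = 0;
    const pid_t waited = waitpid(pid, &status, 0);
    assert(waited == pid && WIFEXITED(status));  // both paths are normal exits
    (void)waited;
    return n == 1 && buf == 'D';
}
```

Calling `destructor_runs_in_child(false)` exercises the normal shutdown path and `destructor_runs_in_child(true)` the `_exit` path. If a faulty destructor were the culprit, only the first path would reach it, which would be consistent with the workaround hiding the crash rather than fixing it.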
I've just checked the patch on a Solaris Intel system and unfortunately it had no positive effect. By the way: If a lib isn't needed any more, does it have any effect at all to still have it in the makefile? If no function is called in the lib it shouldn't be used anyway, should it?
"If a lib isn't needed any more does it have any effect at all to still have it in the makefile? If no function is called in the lib it shouldn't be used anyway, should it?" It would still get added to the link line and be a dependency (see http://udrepper.livejournal.com/19395.html), and it would increase ever so slightly the size and startup time of the lib/application. All I have left are voodoo programming suggestions such as reordering the libraries a bit, e.g. moving $(SALLIB) to the end of the line and stuff like that.
taking over (though it will take time before I can have a look)
On a Solaris 10 x86 machine (x42-so28) I got reproducible crashes in HelpLinker (after commenting out the #ifdef SOLARIS block at xmlhelp/source/com/sun/star/help/HelpLinker.cxx:1.14 l. 652--655) when calling dmake in helpcontent2/util/sbasic on unxsoli4.pro DEV300m26: main calls IndexerPreProcessor::~IndexerPreProcessor (via inline HelpLinker::~HelpLinker), which calls xsltFreeStylesheet (at HelpLinker.cxx l. 97), which SEGVs.
Experimenting with the relative order of the calls to xsltFreeStylesheet and the destruction of std::ifstream fileReader (created at HelpLinker.cxx l. 401), it appears that the std::ifstream destructor has an error that causes memory corruption. Solaris is virtually the only platform that still uses STLport-4.0 (all others use STLport-4.5, see stlport/makefile.mk:1.44). If cmc did his builds without the default OOo STLport, that would also explain why cmc could not reproduce the crashes.
I filed issue 92066 to upgrade Solaris to STLport-4.5, but in the meantime the simple fix of calling std::ifstream::close in xmlhelp/source/com/sun/star/help/HelpLinker.cxx:1.14.10.1 also seems to work around the problems. Builds of <http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fsb92> (with the fix included) on both the Solaris-Intel (<http://buildbot.go-oo.org/buildbot/Solaris-Intel/builds/205>) and Solaris-Sparc (<http://buildbot.go-oo.org/buildbot/Solaris-Sparc/builds/193>) build bots did not fail in helpcontent2. The Solaris-Sparc build failed further down due to unrelated issue 90172. Note, however, that a Solaris-Sparc build of unfixed DEV300_m26, <http://buildbot.go-oo.org/buildbot/Solaris-Sparc/builds/192>, also did not fail in helpcontent2.
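For readers unfamiliar with the workaround, the following is a minimal sketch of its general shape, not the actual change in HelpLinker.cxx revision 1.14.10.1: the stream is closed explicitly once reading is finished, so the destructor no longer has an open file to release. The helper `count_lines` is hypothetical. On a conforming standard library the explicit close does not change behaviour; it only matters if the destructor path is faulty, as suspected for STLport-4.0 above.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <string>

// Counts the lines of a file and closes the stream explicitly before
// returning. Hypothetical helper, not a function from HelpLinker.cxx.
std::size_t count_lines(const std::string& path) {
    // c_str() keeps the constructor call valid for pre-C++11 libraries too.
    std::ifstream fileReader(path.c_str());
    std::size_t lines = 0;
    std::string line;
    while (std::getline(fileReader, line)) {
        ++lines;
    }
    // Release the file now instead of leaving it to the destructor.
    fileReader.close();
    assert(!fileReader.is_open());
    return lines;
}
```

A file that cannot be opened simply yields a count of 0 here; real code would probably want to report that case separately.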
@gh: please verify
verified
.