Issue 83545 - helplinker dies while building helpcontent2/util/sbasic
Summary: helplinker dies while building helpcontent2/util/sbasic
Status: CLOSED FIXED
Alias: None
Product: utilities
Classification: Unclassified
Component: code
Version: 680m231
Hardware: All Solaris
Importance: P1 (highest) Trivial
Target Milestone: OOo 3.0
Assignee: gregor.hartmann
QA Contact: Unknown
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-11-12 08:44 UTC by kay.ramme
Modified: 2009-03-25 16:52 UTC
CC List: 12 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
a thought (1.61 KB, patch)
2008-01-24 14:52 UTC, caolanm
patch for test (1.76 KB, patch)
2008-07-11 15:43 UTC, caolanm

Description kay.ramme 2007-11-12 08:44:27 UTC
Output e.g. looks like the following:

HelpLinker @/tmp/mkWmaqIU
Making /tmp/kr/SRC680/helpcontent2/unxsols4.pro/bin/sbasic_en-US.zip from 1180
input files

done
0
time taken was 32.501 seconds
dmake:  Error code 139, while making '../../unxsols4.pro/bin/sbasic_en-US.zip'
dmake:  '../../unxsols4.pro/bin/sbasic_en-US.zip' removed.
---* tg_merge.mk *---

The "HelpLinker" seems to die during termination ...
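
A note on reading the dmake output above: an error code above 128 conventionally means the process was killed by a signal, the signal number being the code minus 128. Under that convention 139 corresponds to signal 11, SIGSEGV, which fits a crash during termination. The sketch below only illustrates the arithmetic; the helper names are hypothetical, and it is an assumption that dmake passes the shell-style status through unchanged.

```cpp
#include <csignal>
#include <string>

// Hypothetical helper: extract the signal number from a shell-style
// exit status. Assumption: statuses above 128 mean "killed by signal
// (status - 128)"; anything else is treated as a normal exit.
int signalFromExitStatus(int status) {
    return status > 128 ? status - 128 : 0;
}

// Hypothetical helper: turn the status into a human-readable string.
std::string describeExitStatus(int status) {
    const int sig = signalFromExitStatus(status);
    if (sig == 0)
        return "exited normally with code " + std::to_string(status);
    if (sig == SIGSEGV)
        return "killed by SIGSEGV (segmentation fault)";
    return "killed by signal " + std::to_string(sig);
}
```

Calling describeExitStatus(139) yields the SIGSEGV description on platforms where SIGSEGV is signal 11, which includes Solaris, Linux and Mac OS X.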
Comment 1 eric.bachard 2007-12-01 00:00:45 UTC
*** Issue 84145 has been marked as a duplicate of this issue. ***
Comment 2 eric.bachard 2007-12-01 00:01:15 UTC
Same issue on Mac OS  10.5.1 / PowerPC
Comment 3 eric.bachard 2007-12-01 14:46:19 UTC
In fact this is not exactly the same issue (I got error 255 instead of 139), but
it is very similar.
Comment 4 frank.thomas.peters 2007-12-18 10:59:25 UTC
added ab as cc
Comment 5 Uwe Fischer 2007-12-18 11:55:17 UTC
raising prio. This breaks the build process on Solaris and as reported on Mac,
whenever helpcontent2 is included. I already lost a whole CWS due to this issue;
the CWS had to be shifted from OOo 2.4 to 3.0, and no new help CWSs will be
accepted by QA without a Solaris build.
More info in Bugtracker issue 152168
Comment 6 eric.bachard 2008-01-05 14:04:42 UTC
ericb->ufi

Is it possible to get more information than a non-public link?

http://www.openoffice.org/issues/show_bug.cgi?id=152168 is not reachable for me
...

Comment 7 kay.ramme 2008-01-14 16:20:57 UTC
As this prevents the help from building on Solaris in a CWS, this is P1!

Reassigning to AB.
Comment 8 ab 2008-01-14 17:11:15 UTC
STARTED

I doubt that this is really a P1. The background for setting this issue to P1 is
build problems in cws hcshared14. But hcshared14 has m238 as current version.
cws ab38 was integrated into m238 and brought an accidental change that made
the already known helplinker problem worse. This problem was fixed in #i84529
for SRC680 m242 + OOH680 m1.

So a resync of hcshared14 would probably solve the problem. To check this I will
apply Ause's fix to xmlhelp in hcshared14. If this works, this issue is no P1.
Comment 9 eric.bachard 2008-01-14 17:21:51 UTC
ericb@ab

Thanks for the info. From my side, I'll try to build m242, and I'll let you know.

Comment 10 thorsten.ziehm 2008-01-15 14:23:33 UTC
Currently this issue does not have any target. First of all I set target 2.4. If
it is decided to leave this as it is for 2.x and work on it for 3.x only,
then that's OK for me. Or if it doesn't occur anymore, then it can be closed.
Comment 11 ab 2008-01-18 08:03:34 UTC
As discussed with ufi I set this back to P2 for now. Anyway this task
will be addressed as soon as possible.
Comment 12 caolanm 2008-01-24 09:46:19 UTC
If someone can reproduce the dying-during-termination issue, does it make
any difference if the contents of the dtor Tokenizer::~Tokenizer() are removed?
Comment 13 eric.bachard 2008-01-24 09:49:10 UTC
Issue occurs again on Mac OSX PowerPC , building m243

Build breaker -> P1


ericb@cmc 

Removing the content of the Tokenizer dtor changes nothing: on Mac OS X 10.4, PowerPC (no problem on Intel), I
still get a crash, and I'll attach the log.

Comment 14 eric.bachard 2008-01-24 10:20:11 UTC
To add information: building xmlhelp with symbols enabled AND defining CMCDEBUG 1 seems to work,
meaning helpcontent2 now builds in sbasic.

Thanks to Caolan McNamara, I now have leads to investigate. This seems to be a Mac OS X specific issue or
something like that. I'll search the HelpLinker.cxx changes (between m233 and now); the cause is probably
in there.
Comment 15 ab 2008-01-24 11:20:51 UTC
ab->ericb: I don't know if your Mac OS build problem is exactly the same as
the Solaris build problem, but in this case unfortunately we cannot be sure
that any tricks like building xmlhelp in a special way really help, because
this problem has never been strictly reproducible. I've already made small
changes and the build worked. I changed it back and it broke. Sounds good,
doesn't it? Then I did the same change again and it didn't build anyway. :-(

Currently I have a stable situation breaking when building schart_en-US.zip.
Did your build succeed completely after you got over the sbasic problem?
Comment 16 eric.bachard 2008-01-24 12:57:56 UTC
ericb->ab

I'll try to answer you more completely: I spent some energy to resync some cws
for m243. Now this is done, and I'll build more on PowerPC, thus I'll have more
information.

The starting point: my current Mac OS X is 10.4, on PowerPC arch, and I'm
building m243.

Building xmlhelp (i.e. HelpLinker) without symbols -> DICTIONNARY creation
(empty file) causes a crash in HelpLinker.

Building xmlhelp after removing everything in the Tokenizer dtor (in
HelpLinker.cxx) leads to a HelpLinker crash too.

Building it with symbols (i.e. build debug="non_empty_string") does not crash,
and seems to work.
-> helpcontent2 builds but takes hours (already 2 hours and still building, to
be precise).
I'll confirm that nothing bad occurred asap.

In summary: everything above is fully reproducible for me.

Can you please tell me more about the changes you tried ? Maybe I can give it a
try too ? 


Comment 17 eric.bachard 2008-01-24 13:40:18 UTC
ericb->ab

To isolate the change causing the breakage, I'll cross-test:

1) build helpcontent2 from m233 , xmlhelp from m243

2) build helpcontent2 from m243 using xmlhelp from m233

I remember it built perfectly with m233. If you have a better idea ...
Comment 18 ab 2008-01-24 14:05:43 UTC
ab->ericb: I have no ideas concerning your xmlhelp-without-symbols problem;
as far as I know there's nothing similar on Solaris, where the build always dies silently.
The most important changes to xmlhelp came with my cws ab38 into m238,
introducing the extensible help feature. But all these changes shouldn't affect
the helplinker tool, and the problem was detected long before on Solaris.
ab38 brought a compiler switch bug in xmlhelp/source/com/sun/star/help/
helplinker.pmk but this problem was fixed in m242.

The changes I meant were only for debugging, playing around, skipping some
help files to see if it's a special file etc. So unfortunately nothing really
useful.
I only mentioned it to point out how fragile the whole scenario is.
Comment 19 caolanm 2008-01-24 14:52:54 UTC
Created attachment 51137 [details]
a thought
Comment 20 caolanm 2008-01-24 14:54:47 UTC
Here's a thought wrt. the specific original comment #0 death-on-exit problem. If
that is reproducible after removing any exit 0 hacks, does this explicit linking
to icudata make it stop?
Comment 21 ab 2008-01-25 10:59:41 UTC
ab->cmc: Thanks, but I haven't managed to find a machine so far, that shows
the original error scenario at all. This is an intelligent bug, it vanishes when
the haunters come. :-) I'll keep trying...
Comment 22 caolanm 2008-01-25 11:16:34 UTC
I did see something with the inbuilt 3.6 icu after it exited. A valgrind warning
about a pointer being freed. But no bt details. Using 3.8 icu and there was no
such warning. Playing with the icu examples, I saw that the same warning
appeared in the simple "break" example if I manually hacked out explicitly
linking to libicudata and it disappeared on restoring it. 

i.e.

valgrind ./break
==16516== Invalid free() / delete / delete[]
==16516==    at 0x4A0560B: free (vg_replace_malloc.c:233)
==16516==    by 0x36CC31043A: free_mem (dl-libc.c:235)
==16516==    by 0x36CC30FFC9: __libc_freeres (set-freeres.c:47)
==16516==    by 0x4802344: _vgnU_freeres (vg_preloaded.c:60)
==16516==    by 0x36CC23513A: exit (exit.c:90)
==16516==    by 0x36CC21E2BA: (below main) (libc-start.c:252)
==16516==  Address 0x4F50870 is not stack'd, malloc'd or (recently) free'd

As libicuuc links to libicudata anyway it would suggest a link ordering problem
if libicuuc is not followed immediately by libicudata in a command line app.
Comment 23 ab 2008-01-25 12:09:57 UTC
ab->ericb: I'm not very happy with the fact that we're mixing up the Solaris
build problem with your Mac OS  10.5.1 / PowerPC problem and after having a
look at i84145 I'm sure, it's something else. Obviously you agree as you've
reopened it yesterday. The "totally screwed" message doesn't appear at all
in the Solaris scenario.

So please let us keep these two issues and systems separate. If the reason
for the Solaris problem is found we can still check if this also applies
to the Mac PowerPC problem. But I see no reason to keep this issue on P1
because of a problem that's not really the same. Besides that, you've
managed to build helpcontent2 in the meantime, if I understood correctly.
-> Setting this back to P2
Comment 24 eric.bachard 2008-01-25 16:23:36 UTC
ericb->ab

Ok, you're right.

From my side, I have a lot of leads and directions to investigate, and I'll
continue on issue 84145.

Comment 25 kay.ramme 2008-01-30 08:53:46 UTC
Just to keep anybody interested posted, here is a stack of the crash (SIGSEGV)
Uwe is facing:

(dbx) where     
current thread: t@1
=>[1] rtl_allocateMemory(n = ???) (optimized), at 0x7f3abd98 (line ~228) in
"alloc_global.c"
  [2] rtl_uString_ImplAlloc(nLen = ???) (optimized), at 0x7f3b9ec4 (line ~957)
in "strtmpl.c"
  [3] rtl_uString_new_WithLength(ppThis = ???, nLen = ???) (optimized), at
0x7f3ba060 (line ~1053) in "strtmpl.c"
  [4] rtl_uStringbuffer_ensureCapacity(This = ???, capacity = ???,
minimumCapacity = ???) (optimized), at 0x7f3bbdac (line ~111) in "ustrbuf.c"
  [5] rtl_uStringbuffer_insert(This = ???, capacity = ???, offset = ???, str =
???, len = ???) (optimized), at 0x7f3bbe18 (line ~135) in "ustrbuf.c"
  [6] __unnamed_GFEEKKBjjH0nJ::writeUcs4(pBuffer = ???, pCapacity = ???, nUtf32
= ???) (optimized), at 0x7f3bcf94 (line ~265) in "uri.cxx"
  [7] rtl_uriDecode(pText = ???, eMechanism = ???, eCharset = ???, pResult =
???) (optimized), at 0x7f3bdac4 (line ~683) in "uri.cxx"
  [8] osl_getSystemPathFromFileURL(ustrFileURL = ???, pustrSystemPath = ???)
(optimized), at 0x7f3a8e68 (line ~255) in "file_url.cxx"
  [9] fs::path::native_file_string(this = ???) (optimized), at 0x433bc (line
~106) in "HelpCompiler.hxx"
  [10] JarOutputStream::commit(this = ???) (optimized), at 0x34610 (line ~4777)
in "HelpLinker.cxx"
  [11] HelpLinker::link(this = ???) (optimized), at 0x38170 (line ~5310) in
"HelpLinker.cxx"
  [12] HelpLinker::main(args = CLASS, pExtensionPath = ???) (optimized), at
0x3b368 (line ~5536) in "HelpLinker.cxx"
  [13] main(argc = ???, argv = ???) (optimized), at 0x3b9bc (line ~5547) in
"HelpLinker.cxx"
Comment 26 uwe.luebbers 2008-02-08 14:19:18 UTC
adjusted target, because of simple workaround not a stopper
Comment 27 maho.nakata 2008-03-30 02:41:11 UTC
just a record...
As ericb suggested,
on PowerPC G5 Mac, Tiger, DEV300_m5,

Rebuilding xmlhelp with
% cd /Volumes/ooo-dev/Tiger/work.DEV300_m5.AQUA/DEV300_m5/xmlhelp
% rm -rf unxmacxp.pro
% build debug="non_empty_string"
...
% deliver -force

seems to solve this problem.
Comment 28 epost 2008-05-28 14:45:01 UTC
This problem is currently breaking our builds on the Solaris BuildBots
(http://buildbot.go-oo.org/buildbot/Solaris-Intel and
http://buildbot.go-oo.org/buildbot/Solaris-Sparc). Thus again raising the
priority to P1.
Comment 29 frank.thomas.peters 2008-05-28 21:50:40 UTC
Cannot build hcshared18 on either Solaris or Mac; it fails at the same point
with the same error number.
Comment 30 ab 2008-05-29 10:04:48 UTC
Resuming evaluation...

Unfortunately no quick solution can be expected, as we (kr, ab)
stopped the evaluation last time because we were running out of ideas.

ab->jj: Have the BuildBot machines / configuration been changed
in any way? If yes, it could be worth a try to change them back to the
old state, as the problem seems to be very sensitive to even minor
changes in the environment.

If I remember correctly, Ause found that the problem
first did not occur on some machines at all and then, after the first
time it occurred, it occurred again and again. So maybe we should also
consider completely resetting the BuildBot environments or even rebooting
the BuildBot machines if this is possible. I know that all this is no
solution but we currently have no real one either.

There's also an ugly but sometimes helpful patch that Ause applies to
the HelpLinker code. Maybe we could also give this a try for now, as
the common strategy for this problem, "repeat again and again on
different machines until you're through", could be difficult to adapt
for BuildBots... :-)
Comment 31 caolanm 2008-05-29 10:15:10 UTC
If this is reproducible for someone, can they try 
http://www.openoffice.org/nonav/issues/showattachment.cgi/51137/trythis.patch
to rule it in or out.
Comment 32 hjs 2008-06-03 17:48:02 UTC
hjs->ab: the "exit early" patch (hack) has already been applied on those machines
for ages. Otherwise we would have been unable to build months ago.
Comment 33 ivo.hinkelmann 2008-06-09 14:59:24 UTC
I cannot build helpcontent2 on Solaris Sparc. In the past this error occurred
only sometimes; now it always breaks the build.

I applied this "trythis.patch" but sadly it didn't fix the problem.
Comment 34 caolanm 2008-06-12 11:36:23 UTC
I installed solaris 10 intel under qemu with SunStudio 12 and was unable to
reproduce the crasher :-( Probably just too different from the buildbot
compiler/environment to be able to reproduce that way.
Comment 35 ab 2008-07-11 15:25:36 UTC
Some general comments:

- In the scope of using Lucene as indexer, the HelpLinker has been
changed (dev300 m22). A lot of code has been deleted. Unfortunately 
this problem still occurs.

- To reduce the chance of breaking, the block

#ifdef SOLARIS
	if( !bExtensionMode )
		_exit( 0 );
#endif

has been added to HelpLinker.cxx. To reproduce the problem and to
debug, this block should be commented out.

- This problem does not occur always. Sometimes I changed something
and the problem didn't appear any more, I changed it back and the
problem came back, I did the first change again and the problem was
still there.

- The problem does occur both on Solaris Sparc and Solaris Intel.

- Stacks do not show the problem but only the consequences. E.g.
a stack found by gh showed a crash in addBookmark, but commenting
out all calls to addBookmark did not solve the problem. The stack 
posted by kr on Jan 30 probably also is only one of many possible 
stacks.

- kr once patched sal to use system allocation and the problem
seemed to be gone. I've tried this again and found that also with
system allocation the problem occurs. So this is also a dead end.

- It did not help to block optimization for HelpLinker.cxx and
HelpCompiler.cxx

- It did not help to only use a small number of xhp files.

- It did not help to avoid output to stdout completely (Ause
told me about problems he saw concerning using the out streams).

- It did not help to avoid all xslt operations.
Comment 36 caolanm 2008-07-11 15:43:42 UTC
Created attachment 55079 [details]
patch for test
Comment 37 caolanm 2008-07-11 15:47:09 UTC
I've never been able to reproduce this, though I did try under qemu and solaris
intel 10. Given that the _exit works around it, I sort of suspect a global object
dtor call on shutdown, where there is something wrong with the order
of destruction or something of that nature. Following that theory, does it make
a difference to remove the icu lib, now that it doesn't seem to be used by
HelpLinker anymore, on the theory that a global object belonging to icu
is the root of this?
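
To illustrate why the _exit workaround would hide a faulty destructor at shutdown: exit() runs the destructors of objects with static storage duration, while _exit() terminates the process immediately and skips them. The following is a minimal POSIX sketch, not code from HelpLinker; GlobalResource and destructorRunsOnTermination are hypothetical names, and the pipe is only there so the parent can observe what happened in the child.

```cpp
#include <cstdlib>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

namespace {
int g_reportFd = -1;  // write end of a pipe; only set in the child

// Hypothetical stand-in for a global object owned by some library.
struct GlobalResource {
    ~GlobalResource() {
        if (g_reportFd >= 0) {
            const char mark = 'D';
            // Report "destructor ran" to the parent process.
            ssize_t ignored = write(g_reportFd, &mark, 1);
            (void)ignored;
        }
    }
};
GlobalResource g_resource;
}  // namespace

// Forks a child that terminates via exit() or _exit() and returns
// true if the global destructor ran in the child before it died.
bool destructorRunsOnTermination(bool useUnderscoreExit) {
    int fds[2];
    if (pipe(fds) != 0)
        return false;
    const pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return false;
    }
    if (pid == 0) {
        close(fds[0]);
        g_reportFd = fds[1];
        if (useUnderscoreExit)
            _exit(0);   // immediate termination: static destructors are skipped
        std::exit(0);   // normal termination: static destructors run
    }
    close(fds[1]);
    char buf = 0;
    const ssize_t n = read(fds[0], &buf, 1);  // EOF (0) if nothing was written
    close(fds[0]);
    int status = 0;
    waitpid(pid, &status, 0);
    return n == 1 && buf == 'D';
}
```

With exit() the parent sees the marker byte; with _exit() it does not. So any crash that lives in a static destructor disappears once _exit is used, without the underlying defect being fixed.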
Comment 38 ab 2008-07-11 16:02:50 UTC
I've just checked the patch on a Solaris intel system and unfortunately
it had no positive effect. By the way: If a lib isn't needed any more does
it have any effect at all to still have it in the makefile? If no function is
called in the lib it shouldn't be used anyway, should it?
Comment 39 caolanm 2008-07-11 16:38:15 UTC
"If a lib isn't needed any more does it have any effect at all to still have it
in the makefile? If no function is called in the lib it shouldn't be used
anyway, should it?"

It would still get added to the link line
and be a dependency (http://udrepper.livejournal.com/19395.html), and increase
ever so slightly the size and startup time of the lib/application.

All I've got left are voodoo programming suggestions of reordering the libraries
a bit, e.g. moving $(SALLIB) to the end of the line and stuff like that.

Comment 40 Stephan Bergmann 2008-07-15 15:29:32 UTC
taking over (though it will take time before I can have a look)
Comment 41 Stephan Bergmann 2008-07-24 08:35:54 UTC
On a Solaris 10 x86 machine (x42-so28) I got reproducible crashes in HelpLinker
(after commenting out the #ifdef SOLARIS block at
xmlhelp/source/com/sun/star/help/HelpLinker.cxx:1.14 l. 652--655) when calling
dmake in helpcontent2/util/sbasic on unxsoli4.pro DEV300m26: main calls
IndexerPreProcessor::~IndexerPreProcessor (via inline HelpLinker::~HelpLinker)
calls xsltFreeStylesheet (at HelpLinker.cxx l. 97) which SEGVs.  Experimenting
with the relative orders of the calls to xsltFreeStylesheet and the destruction
of std::ifstream fileReader (created at HelpLinker.cxx l. 401), it appears that
the std::ifstream destructor has an error that causes memory corruption.

Solaris is virtually the only platform that still uses STLport-4.0 (all others
use STLport-4.5, see stlport/makefile.mk:1.44).  If cmc did his builds not using
the default OOo STLport, that would also explain why cmc could not reproduce the
crashes.

I filed issue 92066 to upgrade Solaris to STLport-4.5, but for the meantime the
simple fix of calling std::ifstream::close in
xmlhelp/source/com/sun/star/help/HelpLinker.cxx:1.14.10.1 also seems to work
around the problems.  Builds of
<http://eis.services.openoffice.org/EIS2/cws.ShowCWS?Path=DEV300%2Fsb92> (with
the fix included) on both Solaris-Intel
(<http://buildbot.go-oo.org/buildbot/Solaris-Intel/builds/205>) and
Solaris-Sparc (<http://buildbot.go-oo.org/buildbot/Solaris-Sparc/builds/193>)
build bots did not fail in helpcontent2 (the Solaris-Sparc build failed further
down due to unrelated issue 90172; also note that a Solaris-Sparc build of
unfixed DEV300_m26,
<http://buildbot.go-oo.org/buildbot/Solaris-Sparc/builds/192>, did also not fail
in helpcontent2, however).
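
For readers unfamiliar with the workaround pattern: the idea is to call close() on the std::ifstream explicitly once reading is finished, instead of leaving all cleanup to the stream's destructor. The actual change in HelpLinker.cxx 1.14.10.1 is not quoted in this issue, so the following is only a minimal sketch of the pattern with a hypothetical function name, not the real code.

```cpp
#include <fstream>
#include <string>

// Hypothetical example: reads the first line of `path` into `firstLine`.
// Returns false if the file could not be opened.
bool readFirstLineAndClose(const std::string& path, std::string& firstLine) {
    std::ifstream fileReader(path.c_str());
    if (!fileReader.is_open())
        return false;

    std::getline(fileReader, firstLine);

    // Explicit close: the underlying file is released here, at a
    // well-defined point, rather than implicitly in the destructor.
    fileReader.close();
    return !fileReader.is_open();
}
```

With a conforming standard library the explicit close() is redundant, since the destructor closes the file anyway; it only matters here as a way to sidestep the suspected STLport-4.0 destructor defect.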
Comment 42 Stephan Bergmann 2008-07-24 08:48:26 UTC
@gh: please verify
Comment 43 gregor.hartmann 2008-07-29 20:30:47 UTC
verified
Comment 44 gregor.hartmann 2009-03-25 16:52:40 UTC
.