Issue 81911 - Another process created this directory in exactly this moment :-)
Summary: Another process created this directory in exactly this moment :-)
Status: CLOSED FIXED
Alias: None
Product: Build Tools
Classification: Code
Component: code
Version: 680m230
Hardware: All All
Importance: P1 (highest) Trivial
Target Milestone: OOo 2.4
Assignee: hjs
QA Contact: issues@tools
URL:
Keywords:
Depends on:
Blocks:
 
Reported: 2007-09-24 19:55 UTC by pavel
Modified: 2007-11-27 09:44 UTC
CC List: 8 users

See Also:
Issue Type: DEFECT
Latest Confirmation in: ---
Developer Difficulty: ---


Attachments
patch to systemactions.pm to squeeze out some more information (1.54 KB, patch)
2007-11-08 16:39 UTC, hjs
previous patch didn't help for windows. maybe this one is better - still some debug output (2.20 KB, patch)
2007-11-13 11:52 UTC, hjs

Description pavel 2007-09-24 19:55:49 UTC
Hi,

my parallel build failed with:

/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/make_installer.pl -f ../util/openoffice.lst -l en-US -p OpenOffice -u ../unxlngx6.pro -buildid 9224 -msitemplate ../unxlngx6.pro/misc/openoffice/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles -format rpm
/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/gen_update_info.pl --buildid 9224 --arch "X86_64" --os "Linux" --lstfile ../util/openoffice.lst --product OpenOffice --languages en-US ../util/update.xml > ../unxlngx6.pro/misc/openoffice_en-US_Linux_X86_64.update.xml
/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/make_installer.pl -f ../util/openoffice.lst -l en-US -p OpenOffice -u ../unxlngx6.pro -buildid 9224 -msitemplate ../unxlngx6.pro/misc/openoffice/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles -format deb
... checking environment variables ...
Creating Debian packages
... checking environment variables ...
... checking environment variables ...

**************************************************
ERROR: ERROR: Could not create directory: /data/oo/BuildDir/tmp/ooopackaging/i_92191190561116
in function: create_directory
**************************************************

**************************************************
ERROR: Saved logfile: logfile.log
**************************************************
... cleaning the output tree ...
Sun Sep 23 17:25:16 2007 (00:00 min.)
dmake:  Error code 255, while making 'ure_en-US.deb'
---* tg_merge.mk *---

Multiprocessing build is finished

logfile.log contains:

oo@amd64:~/BuildDir/ooo_SRC680_m230_src> cat instsetoo_native/util/logfile.log 

Another process created this directory in exactly this moment :-) : /data/oo/BuildDir/tmp/ooopackaging

Sun Sep 23 17:25:16 2007 (00:00 min.)
################################################################
Command line arguments:
################################################################
-f
../util/openoffice.lst
-l
en-US
-p
URE
-u
../unxlngx6.pro
-buildid
9224
-format
deb
-msitemplate
../unxlngx6.pro/misc/ure/msi_templates
-msilanguage
../unxlngx6.pro/misc/win_ulffiles
Separator: /
Creating Debian packages
***************************************************************
ERROR: Could not create directory: /data/oo/BuildDir/tmp/ooopackaging/i_92191190561116
in function: create_directory
***************************************************************

Can we get rid of this?
Comment 1 ingo.schmidt-rosbiegal 2007-09-25 10:09:41 UTC
Packaging problem 
Comment 2 ingo.schmidt-rosbiegal 2007-10-10 09:14:54 UTC
ccing ause
Comment 3 ingo.schmidt-rosbiegal 2007-10-12 10:32:46 UTC
Accepting
Comment 4 ingo.schmidt-rosbiegal 2007-10-24 14:40:45 UTC
So the directory /data/oo/BuildDir/tmp/ooopackaging is created by process A with
privileges 775 and process B cannot create a subdirectory. Strange. I have many
old and empty directories with name "i_<number>" below the toplevel directory
"ooopackaging". So the normal case is, that all processes can create
subdirectories. Why is this a problem, when the toplevel directory is created
just a moment before by another process? Are the privileges not clear at that
time? Is it possible, that process B has to wait for example one second and
should then try to create the subdirectory?
Comment 5 pavel 2007-10-24 14:44:19 UTC
is: interesting questions.

Ivo: do you think this could be connected with our "tool's mkdir is not safe?" issue? #i82531#.
Comment 6 ivo.hinkelmann 2007-10-24 15:14:43 UTC
The installer is written in Perl, so I don't think it uses the old and ugly
tools C code.
Comment 7 pavel 2007-10-25 19:38:10 UTC
Well, I can now reproduce it using the following:

oo@octopus:~/BuildDir/ooo_SRC680_m234_src/instsetoo_native> rm -rf ../../tmp/ooopackaging ../unxlngx6.pro/; build -P8 -- -P2

It happens in approximately 2 out of 10 tries.

The problem happens when someone (who?) REMOVEs the ooopackaging directory. Then someone else can't create that subdirectory.

Does this shed some light on it?
Comment 8 ingo.schmidt-rosbiegal 2007-10-26 09:31:48 UTC
This clarifies the problem, if this is really the scenario that occurs. I will
check which process tries to delete the directory tmp/ooopackaging. Then the
solution is simple: the packaging process must never remove the directory
tmp/ooopackaging itself, only the subdirectories tmp/ooopackaging/i_<uniquenumber>.
Only the latter are specific to one packaging process.
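
The rule proposed in this comment (share the top-level directory, remove only your own "i_<uniquenumber>" subdirectory) might look roughly like the sketch below. It assumes a POSIX shell with mktemp available; make_private_dir, remove_private_dir and the demo path are hypothetical names for illustration and are not taken from the installer.

```shell
# Hypothetical helpers, not taken from the installer sources.

# Create the shared top-level directory if needed, then a unique
# per-process subdirectory below it; print the subdirectory's path.
make_private_dir() {
    base=$1
    mkdir -p "$base" && mktemp -d "$base/i_XXXXXX"
}

# Remove only the per-process subdirectory, never the shared parent.
remove_private_dir() {
    rm -rf "$1"
}

shared="${TMPDIR:-/tmp}/ooopackaging_demo.$$"
mine=$(make_private_dir "$shared")
other=$(make_private_dir "$shared")   # stands in for a parallel process
remove_private_dir "$mine"
[ -d "$other" ] && echo "other process's directory is untouched"
rm -rf "$shared"                      # demo cleanup only
```

Because the shared parent is never deleted by a packaging process, a parallel process can still create its own subdirectory at any time.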
Comment 9 ingo.schmidt-rosbiegal 2007-10-26 11:36:14 UTC
Pavel, I just checked: the packaging process never removes the directory
"ooopackaging". It is created if required, but it is not added to the list of
directories that have to be removed when the packaging process has finished. Of
course the packaging process cannot succeed if the ooopackaging directory
is removed by an external process during packaging (this seems to be your
scenario). But that is not the parallel-build problem described in this
task. In parallel builds, no packaging process removes "ooopackaging".
Comment 10 pavel 2007-10-26 11:44:06 UTC
Hmm, then it is the build tool. Maybe it removes the *complete* tmp dir?

I see at least these cases in build.pl:

sub do_exit {
#    close_server_socket();
    my $exit_code = shift;
    $build_finished++;
    generate_html_file(1);
    rmtree(CorrectPath($tmp_dir), 0, 1) if ($tmp_dir);
    exit($exit_code);
};

and

sub print_error {
    my $message = shift;
    my $force = shift;
    rmtree(CorrectPath($tmp_dir), 0, 1) if ($tmp_dir);

Comment 11 pavel 2007-10-26 12:49:46 UTC
The build command prints this:

Multiprocessing build is finished
Maximal number of processes run: 4

PJ: removing tmp_dir: 2 (/home/oo/BuildDir/tmp/)

after I instrumented the above-mentioned rmtree calls with

    print STDERR "\nPJ: removing tmp_dir: 2 ($tmp_dir)\n";

So do_exit() removes the *complete* temporary directory.
Comment 12 ingo.schmidt-rosbiegal 2007-10-26 13:22:10 UTC
Pavel, thank you for this investigation. So this seems to be a problem in
build.pl, which must not remove the tmp directory if it is still required by
other processes.
IS -> Ause: Please do not remove the tmp dir if it is still required.
Comment 13 ingo.schmidt-rosbiegal 2007-10-26 13:32:43 UTC
IS -> pjanik: Pavel, I just committed version 1.33.76.1 of
solenv/bin/modules/installer/systemactions.pm in cws native112. This version
tries twice to create the directory, checks the existence of the parent
directory and logs more information. I don't think it can fix this task if the
complete tmp directory is removed during the packaging process, but if you test
it, we can perhaps get more information.
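
The retry logic described in this comment can be sketched roughly as follows. This is an illustration in POSIX shell, not the Perl code committed to systemactions.pm: the function name create_directory is borrowed from the log output, while its arguments, the default of two attempts and the messages are assumptions.

```shell
# Hypothetical sketch: create a directory, tolerating a parallel creator.
create_directory() {
    dir=$1
    attempts=${2:-2}          # assumed default: try twice
    while [ "$attempts" -gt 0 ]; do
        # Success if we created it, or if another process already did.
        if mkdir "$dir" 2>/dev/null || [ -d "$dir" ]; then
            return 0
        fi
        # Log the state of the parent to help diagnose the failure.
        parent=$(dirname "$dir")
        [ -d "$parent" ] || echo "Parent directory missing: $parent" >&2
        attempts=$((attempts - 1))
    done
    echo "ERROR: Could not create directory: $dir" >&2
    return 1
}

demo="${TMPDIR:-/tmp}/create_directory_demo.$$"
create_directory "$demo" && create_directory "$demo" && echo "both calls succeeded"
rm -rf "$demo"                # demo cleanup only
```

Treating "already exists" as success removes the race between two creators, but it cannot help when the parent directory is unusable.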
Comment 14 vg 2007-10-29 11:33:09 UTC
@all: sorry, but IMHO the build tool's behavior is correct. First, it
removes its temp directory when an error has occurred (that is, when the tool
finishes its job), and second, it removes ITS directory, the one that is
created by the build tool itself. Just consult the get_tmp_dir routine.
Comment 15 pavel 2007-10-31 18:46:48 UTC
is: my build with your change failed with the logfile.log (in the source tree, BTW! - in instsetoo_native/util):

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging". Further attempts will follow.

Another process created this directory in exactly this moment :-) : /home/oo/BuildDir/tmp/ooopackaging

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging/i_216031193819133". Further attempts will follow.

Wed Oct 31 09:25:33 2007 (00:00 min.)
################################################################
Command line arguments:
################################################################
-f
../util/openoffice.lst
-l
en-US
-p
URE
-u
../unxlngx6.pro
-buildid
9235
-format
deb
-msitemplate
../unxlngx6.pro/misc/ure/msi_templates
-msilanguage
../unxlngx6.pro/misc/win_ulffiles
Separator: /
Creating Debian packages
***************************************************************
ERROR: Failed to create the directory: /home/oo/BuildDir/tmp/ooopackaging/i_216031193819133
in function: create_directory
***************************************************************
Comment 16 pavel 2007-11-03 11:31:15 UTC
OK, I had to do 7 full builds because of this issue. Raising prio to 1.

So what should we do with it?
Comment 17 ingo.schmidt-rosbiegal 2007-11-05 09:08:41 UTC
is -> vg: Does the build tool remove the directory $TMP? In the packaging
process I use a global temp directory $TMP/ooopackaging and, inside it, a specific
directory "i_<uniqueid>". Only this unique directory "i_<uniqueid>" is removed
at the end of the installation process. Pavel has set $TMP to
"/home/oo/BuildDir/tmp". If this directory is removed by the build tool, no
parallel packaging process can succeed.
Comment 18 vg 2007-11-05 10:02:29 UTC
vg->is: no, the build tool creates ITS own unique temp directory, which has
nothing in common with $ENV{TMP}/ooopackaging...
Comment 19 pavel 2007-11-05 10:08:19 UTC
vg: how does this fit with #desc12? Is it a bug in build?
Comment 20 pavel 2007-11-07 04:33:08 UTC
OK, it happened with unxlngi6.pro. logfile.log contains:

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging". Further attempts will follow.

Another process created this directory in exactly this moment :-) : /home/oo/BuildDir/tmp/ooopackaging

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging/i_253631194386729". Further attempts will follow.

Comment 21 pavel 2007-11-07 04:46:44 UTC
... which means that my TMP *really* exists at that time.

When I manually removed it now, build.pl correctly prints:

ERROR: the $TMP directory /home/oo/BuildDir/tmp does not exist! Please, create it first

So this gets even more interesting now ;-)
Comment 22 hjs 2007-11-08 16:39:03 UTC
Created attachment 49530
patch to systemactions.pm to squeeze out some more information
Comment 23 pavel 2007-11-09 05:07:30 UTC
With ause's patch:

oo@octopus:~> grep ooopackaging log-GNU_Linux |head -2
dr----x--t  2 oo users 4096 Nov  8 21:37 /home/oo/BuildDir/tmp/ooopackaging
drwxrwxrwx  3 oo users 4096 Nov  8 21:37 /home/oo/BuildDir/tmp/ooopackaging
oo@octopus:~> 

The rest was OK. The build succeeded this time, though. I have started another one.
Comment 24 hjs 2007-11-09 15:36:29 UTC
The next try would be the chmod lines from the patch without all the ls verbosity,
just to make sure it is not a timing issue hidden by the extra commands being executed.
Comment 25 pavel 2007-11-10 16:33:53 UTC
All credit for the following belongs to ause ;-)

This is the problem:

./bin/modules/installer/parameter.pm:408:  installer::systemactions::create_directory_with_privileges($installer::globals::temppath, "777");

Compare:

pavel@oo:/tmp> perl -e 'mkdir('ooopackaging', "0777")'; ls -ld ooopackaging; chmod 755 ooopackaging; rm -rf ooopackaging
dr----x--t    2 pavel    users        4096 Nov 10 18:49 ooopackaging
pavel@oo:/tmp> perl -e 'mkdir('ooopackaging', 0777)'; ls -ld ooopackaging; chmod 755 ooopackaging; rm -rf ooopackaging
drwxr-xr-x    2 pavel    users        4096 Nov 10 18:49 ooopackaging
pavel@oo:/tmp> 

Ingo: this comes from native97. chmod should use octal numbers, not string - compare with

perldoc -f chmod
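
The odd "dr----x--t" listing above follows from plain arithmetic: Perl treats the string "0777" as the decimal number 777 rather than as octal, and decimal 777 is octal 1411 (sticky bit, then r--, --x, --x). With no write or search permission for the owner, creating a subdirectory fails. The shell sketch below only checks that arithmetic; describe_mode and triplet are hypothetical helper names introduced for this illustration.

```shell
# Render one octal permission digit (0-7) as an rwx triplet.
triplet() {
    case $1 in
        0) echo '---' ;; 1) echo '--x' ;; 2) echo '-w-' ;; 3) echo '-wx' ;;
        4) echo 'r--' ;; 5) echo 'r-x' ;; 6) echo 'rw-' ;; 7) echo 'rwx' ;;
    esac
}

# Print a decimal mode value as four octal digits plus its rwx string.
# The first octal digit holds the special bits (1 = sticky) and is not
# rendered in the rwx string.
describe_mode() {
    octal=$(printf '%04o' "$1")
    rest=${octal#?}               # drop the special-bits digit
    u=${rest%??}                  # owner digit
    g=${rest#?}; g=${g%?}         # group digit
    o=${rest#??}                  # other digit
    echo "$octal $(triplet "$u")$(triplet "$g")$(triplet "$o")"
}

describe_mode 777   # what the string "0777" became: 1411 r----x--x
describe_mode 511   # decimal 511 is octal 0777:     0777 rwxrwxrwx
```

The leading 1 in 1411 is the sticky bit, which ls shows as "t" in place of the last "x", giving exactly "r----x--t".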
Comment 26 ingo.schmidt-rosbiegal 2007-11-12 11:02:21 UTC
IS -> pjanik: You are right, using octal values works better (for mkdir, not chmod).

So the fix is a one-liner in systemactions.pm:

diff -r1.33 systemactions.pm
109c109
< 		my $localprivileges = "0" . $privileges;
---
> 		my $localprivileges = oct($privileges); # changes "777" to 0777

Fixed in cws ause087.
Comment 27 hjs 2007-11-13 11:52:27 UTC
Created attachment 49621
previous patch didn't help for windows. maybe this one is better - still some debug output
Comment 28 hjs 2007-11-13 11:53:10 UTC
@is: please have a look at the new patch
Comment 29 ingo.schmidt-rosbiegal 2007-11-13 12:26:26 UTC
is -> ause: Looks good. It contains more support for cygwin. Please feel free to
integrate your patch, so I am reassigning this task to you.
Comment 30 hjs 2007-11-13 14:10:35 UTC
maybe i got fooled by outdated anoncvs and handmade patches on the buildbot. i
can't find any reason why
<               my $localprivileges = oct($privileges); # changes "777" to 0777
---
>               my $localprivileges = oct("0".$privileges); # changes "777" to 0777
could give different results...
Comment 31 hjs 2007-11-14 10:21:04 UTC
probably already fixed with the oneliner from is. enabled some more chmod lines
for windows/cygwin in the same file.
Comment 32 rt 2007-11-15 11:14:36 UTC
Verified on CWS ause087
Comment 33 pavel 2007-11-27 09:44:45 UTC
on master. Closing.

Thanks to everyone involved. This was a tough one ;-)

ause: yet another virtual non-alcoholic beer for you now :-)