Apache OpenOffice (AOO) Bugzilla – Issue 81911
Another process created this directory in exactly this moment :-)
Last modified: 2007-11-27 09:44:46 UTC
Hi, my parallel build failed with:

/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/make_installer.pl -f ../util/openoffice.lst -l en-US -p OpenOffice -u ../unxlngx6.pro -buildid 9224 -msitemplate ../unxlngx6.pro/misc/openoffice/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles -format rpm
/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/gen_update_info.pl --buildid 9224 --arch "X86_64" --os "Linux" --lstfile ../util/openoffice.lst --product OpenOffice --languages en-US ../util/update.xml > ../unxlngx6.pro/misc/openoffice_en-US_Linux_X86_64.update.xml
/usr//bin//perl -w /data/oo/BuildDir/ooo_SRC680_m230_src/solenv/bin/make_installer.pl -f ../util/openoffice.lst -l en-US -p OpenOffice -u ../unxlngx6.pro -buildid 9224 -msitemplate ../unxlngx6.pro/misc/openoffice/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles -format deb
... checking environment variables ...
Creating Debian packages
... checking environment variables ...
... checking environment variables ...
**************************************************
ERROR: ERROR: Could not create directory: /data/oo/BuildDir/tmp/ooopackaging/i_92191190561116 in function: create_directory
**************************************************
**************************************************
ERROR: Saved logfile: logfile.log
**************************************************
... cleaning the output tree ...
Sun Sep 23 17:25:16 2007 (00:00 min.)
dmake: Error code 255, while making 'ure_en-US.deb'
---* tg_merge.mk *---
Multiprocessing build is finished

logfile.log contains:

oo@amd64:~/BuildDir/ooo_SRC680_m230_src> cat instsetoo_native/util/logfile.log
Another process created this directory in exactly this moment :-) : /data/oo/BuildDir/tmp/ooopackaging
Sun Sep 23 17:25:16 2007 (00:00 min.)
################################################################
Command line arguments:
################################################################
-f ../util/openoffice.lst -l en-US -p URE -u ../unxlngx6.pro -buildid 9224 -format deb -msitemplate ../unxlngx6.pro/misc/ure/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles
Separator: /
Creating Debian packages
***************************************************************
ERROR: Could not create directory: /data/oo/BuildDir/tmp/ooopackaging/i_92191190561116 in function: create_directory
***************************************************************

Can we get rid of this?
Packaging problem
ccing ause
Accepting
So the directory /data/oo/BuildDir/tmp/ooopackaging is created by process A with privileges 775 and process B cannot create a subdirectory. Strange. I have many old and empty directories with name "i_<number>" below the toplevel directory "ooopackaging". So the normal case is, that all processes can create subdirectories. Why is this a problem, when the toplevel directory is created just a moment before by another process? Are the privileges not clear at that time? Is it possible, that process B has to wait for example one second and should then try to create the subdirectory?
is: interesting questions. Ivo: do you think this could be connected with our "tool's mkdir is not safe?" issue? #i82531#.
the installer is written in perl thus I don't think it uses old and ugly tools c code
Well, I can now reproduce it using the following:

oo@octopus:~/BuildDir/ooo_SRC680_m234_src/instsetoo_native> rm -rf ../../tmp/ooopackaging ../unxlngx6.pro/; build -P8 -- -P2

It happens in approx. 2 out of 10 tries. The problem happens when someone (who?) REMOVEs the ooopackaging directory. Then someone else can't create that subdirectory. Does this shed some light on it?
This sheds light on the problem, if this is really the scenario that occurs. I will check which process tries to delete the directory tmp/ooopackaging. Then the solution is simple: the packaging process must never remove the directory tmp/ooopackaging itself, only the subdirectories tmp/ooopackaging/i_<uniquenumber>. Only the latter are specific to one packaging process.
Pavel, I just checked, that the packaging process never removes the directory "ooopackaging". It is created, if required, but not added to the list of directories, that have to be removed, when the package process has finished. Of course the packaging process cannot be successful, if the ooopackaging directory is removed from an external process during the packaging (this seems to be your scenario). But this is not the problem of parallel builds, described in this task. From all parallel builds, no process removes "ooopackaging".
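The directory layout described in the two comments above (a shared parent that is created on demand and never removed, plus one unique subdirectory per packaging process that is removed at the end) can be sketched roughly as follows. This is an illustration in Python, not the installer's Perl code; the function names `make_private_workdir` and `cleanup_private_workdir` are hypothetical, and the "i_" prefix is only borrowed from the directory names seen in the logs.

```python
import os
import shutil
import tempfile


def make_private_workdir(shared_parent: str) -> str:
    """Create the shared parent if needed, then a unique subdirectory in it."""
    # exist_ok=True tolerates another process creating the parent
    # at exactly the same moment.
    os.makedirs(shared_parent, exist_ok=True)
    # mkdtemp picks a name that does not exist yet and creates it atomically.
    return tempfile.mkdtemp(prefix="i_", dir=shared_parent)


def cleanup_private_workdir(workdir: str) -> None:
    """Remove only this process's own subdirectory, never the shared parent."""
    shutil.rmtree(workdir, ignore_errors=True)
```

With this split, one process finishing and cleaning up cannot pull the parent directory out from under another process that is still packaging.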
Hmm, then it is a build tool. Maybe it removes the *complete* tmp dir? I see at least these cases in build.pl:

sub do_exit {
    # close_server_socket();
    my $exit_code = shift;
    $build_finished++;
    generate_html_file(1);
    rmtree(CorrectPath($tmp_dir), 0, 1) if ($tmp_dir);
    exit($exit_code);
};

and

sub print_error {
    my $message = shift;
    my $force = shift;
    rmtree(CorrectPath($tmp_dir), 0, 1) if ($tmp_dir);
The build command prints this:

Multiprocessing build is finished
Maximal number of processes run: 4

PJ: removing tmp_dir: 2 (/home/oo/BuildDir/tmp/)

after I instrumented the above mentioned rmtree calls with

print STDERR "\nPJ: removing tmp_dir: 2 ($tmp_dir)\n";

So do_exit() removes the *complete* temporary directory.
Pavel, thank you for these investigations. Then this seems to be a problem of build.pl, which must not remove the tmp directory if it is still required by other processes. IS -> Ause: Please do not remove the tmp dir if it is still required.
IS -> pjanik: Pavel, I just committed version 1.33.76.1 of solenv/bin/modules/installer/systemactions.pm in cws native112. This version tries twice to create the directory, checks the existence of the parent directory and logs more information. I think it cannot fix this task if the complete tmp directory is removed during the packaging process, but if you test this, we can perhaps get more information.
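As a rough illustration of the behaviour described above (try the creation more than once, check whether the parent exists, log what happened), here is a minimal sketch in Python. It is not the code from systemactions.pm; the function name `create_directory_with_retry`, its parameters and the log wording are all assumptions made for this example.

```python
import os
import time


def create_directory_with_retry(path, attempts=2, delay=0.1, log=print):
    """Try to create `path` up to `attempts` times; return True on success."""
    for attempt in range(1, attempts + 1):
        try:
            os.mkdir(path)
            return True
        except FileExistsError:
            if os.path.isdir(path):
                # Another process won the race; the directory is usable.
                log(f"attempt {attempt}: {path} was created by another process")
                return True
            log(f"attempt {attempt}: {path} exists but is not a directory")
        except OSError as err:
            parent = os.path.dirname(path) or "."
            log(
                f"attempt {attempt}: could not create {path}: {err}; "
                f"parent exists: {os.path.isdir(parent)}"
            )
        if attempt < attempts:
            time.sleep(delay)
    return False
```

A retry of this kind only helps with short races; as noted in the comment, it cannot help if something removes the whole parent directory, or if the parent's permissions forbid creating entries in it.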
@all: sorry, but IMHO, the build tool's behavior is correct. First of all, it removes its temp directory when an error occurs (that is, when the tool finishes its job), and second, it removes ITS directory, the one that is created by the build tool itself. Just consult the get_tmp_dir routine.
is: my build with your change failed with the logfile.log (in the source tree, BTW! - in instsetoo_native/util):

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging". Further attempts will follow.
Another process created this directory in exactly this moment :-) : /home/oo/BuildDir/tmp/ooopackaging
Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging/i_216031193819133". Further attempts will follow.
Wed Oct 31 09:25:33 2007 (00:00 min.)
################################################################
Command line arguments:
################################################################
-f ../util/openoffice.lst -l en-US -p URE -u ../unxlngx6.pro -buildid 9235 -format deb -msitemplate ../unxlngx6.pro/misc/ure/msi_templates -msilanguage ../unxlngx6.pro/misc/win_ulffiles
Separator: /
Creating Debian packages
***************************************************************
ERROR: Failed to create the directory: /home/oo/BuildDir/tmp/ooopackaging/i_216031193819133 in function: create_directory
***************************************************************
OK, i had to do 7 full builds because of this issue. Raising prio to 1. So what should we do with it?
is -> vg: Does the build tool remove the directory $TMP? In the packaging process I use a global temp directory $TMP/ooopackaging and inside it a specific directory "i_<uniqueid>". Only this unique directory "i_<uniqueid>" is removed at the end of the installation process. Pavel has set $TMP to "/home/oo/BuildDir/tmp". If this directory is removed by the build tool, no parallel packaging process can be successful.
vg->is: no, the build tool creates ITS own unique temp directory, which has nothing in common with $ENV{TMP}/ooopackaging...
vg: how this plays with #desc12? Is it a bug in build?
OK, it happened with unxlngi6.pro. logfile.log contains:

Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging". Further attempts will follow.
Another process created this directory in exactly this moment :-) : /home/oo/BuildDir/tmp/ooopackaging
Did not succeed in creating directory: "/home/oo/BuildDir/tmp/ooopackaging/i_253631194386729". Further attempts will follow.
... which means that my TMP *really* exists at that time. When I manually removed it now, build.pl correctly prints: ERROR: the $TMP directory /home/oo/BuildDir/tmp does not exist! Please, create it first So this gets even more interesting now ;-)
Created attachment 49530 [details] patch to systemactions.pm to squeeze out some more information
With ause's patch:

oo@octopus:~> grep ooopackaging log-GNU_Linux |head -2
dr----x--t 2 oo users 4096 Nov 8 21:37 /home/oo/BuildDir/tmp/ooopackaging
drwxrwxrwx 3 oo users 4096 Nov 8 21:37 /home/oo/BuildDir/tmp/ooopackaging
oo@octopus:~>

The rest was OK. The build was OK this time though. I have started another one.
next try would be the chmod lines from the patch without all those ls verbosity - just to make sure it's no timing issue hidden by some more commands executed
All credits to the following belong to ause ;-) This is a problem:

./bin/modules/installer/parameter.pm:408: installer::systemactions::create_directory_with_privileges ($installer::globals::temppath, "777");

Compare:

pavel@oo:/tmp> perl -e 'mkdir('ooopackaging', "0777")'; ls -ld ooopackaging; chmod 755 ooopackaging; rm -rf ooopackaging
dr----x--t 2 pavel users 4096 Nov 10 18:49 ooopackaging
pavel@oo:/tmp> perl -e 'mkdir('ooopackaging', 0777)'; ls -ld ooopackaging; chmod 755 ooopackaging; rm -rf ooopackaging
drwxr-xr-x 2 pavel users 4096 Nov 10 18:49 ooopackaging
pavel@oo:/tmp>

Ingo: this comes from native97. chmod should use octal numbers, not string - compare with perldoc -f chmod
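The odd "dr----x--t" permissions are what one gets when the string "0777" is read as the decimal number 777 instead of the octal number 0777: decimal 777 is octal 1411, i.e. the sticky bit plus r-- --x --x. The arithmetic can be checked with a small Python sketch (Python rather than Perl, purely as an illustration; `directory_mode_string` is a hypothetical helper name):

```python
import stat


def directory_mode_string(mode_bits: int) -> str:
    """Render permission bits the way `ls -ld` shows them for a directory."""
    return stat.filemode(stat.S_IFDIR | mode_bits)


# The string "0777" read as a plain decimal number gives 777 ...
string_as_decimal = int("0777")
# ... whereas the octal literal 0777 is 511 in decimal.
octal_literal = 0o777

print(oct(string_as_decimal), directory_mode_string(string_as_decimal))
# prints: 0o1411 dr----x--t
print(oct(octal_literal), directory_mode_string(octal_literal))
# prints: 0o777 drwxrwxrwx
```

The second directory in the shell session shows drwxr-xr-x rather than drwxrwxrwx, presumably because the process umask (likely 022) is applied on top of the requested mode. With the first mode the owner has no write or search permission on the directory, which would explain why creating the i_<number> subdirectory inside it fails.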
IS -> pjanik: You are right, using octal values works better (for mkdir, not chmod). So the fix is a one-liner in systemactions.pm:

diff -r1.33 systemactions.pm
109c109
< my $localprivileges = "0" . $privileges;
---
> my $localprivileges = oct($privileges); # changes "777" to 0777

Fixed in cws ause087.
Created attachment 49621 [details] previous patch didn't help for windows. maybe this one is better - still some debug output
@is: please have a look at the new patch
is -> ause: looks good. it contains more support for cygwin. please feel free to integrate your patch. so I reassign this task to you.
maybe i got fooled by outdated anoncvs and handmade patches on the buildbot. i can't find any reason why

< my $localprivileges = oct($privileges); # changes "777" to 0777
---
> my $localprivileges = oct("0".$privileges); # changes "777" to 0777

could give different results...
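Indeed, a leading zero adds nothing once the string is explicitly parsed as octal digits, so both variants should yield the same mode. A Python analogue of that reasoning, using int(s, 8) as a stand-in for Perl's oct (`parse_octal_mode` is a hypothetical name used only here):

```python
def parse_octal_mode(privileges: str) -> int:
    """Interpret a permission string such as "777" as octal digits."""
    return int(privileges, 8)


with_prefix = parse_octal_mode("0" + "777")
without_prefix = parse_octal_mode("777")

# Both spellings give the same numeric mode, 0o777 (511 decimal).
print(with_prefix == without_prefix == 0o777)
# prints: True
```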
probably already fixed with the oneliner from is. enabled some more chmod lines for windows/cygwin in the same file.
Verified on CWS ause087
on master. Closing. Thanks to everyone involved. This was a tough one ;-) ause: yet another virtual non-alcoholic beer for you now :-)