Open MPI User's Mailing List Archives

From: Michael Kluskens (mklus_at_[hidden])
Date: 2006-10-05 14:42:32

On Oct 4, 2006, at 7:51 PM, George Bosilca wrote:
> This is the correct patch (same as previous minus the debugging
> statements).
On Oct 4, 2006, at 7:42 PM, George Bosilca wrote:
> The problem was found and fixed. Until the patch gets applied to the
> 1.1 and 1.2 branches please use the attached patch.

System: BLACS 1.1p3 on Debian Linux 3.1r3, dual Opteron, gcc 3.3.5,
Intel ifort 9.0.32. All tests were run with 4 processors (comments below).

OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
   C & F tests: no errors with the default data set. The F test slowed
down in the middle of the tests.

OpenMPI 1.3a1r11962 patched: much better. It completes all tests with the
default data set, but the tester crashes on exit (a different problem?):
The final auxiliary test is for BLACS_ABORT.
Immediately after this message, all processes should be killed.
If processes survive the call, your BLACS_ABORT is incorrect.
{0,2}, pnum=2, Contxt=0, killed other procs, exiting with error #-1.

[cluster:32133] [0,0,0] ORTE_ERROR_LOG: Communication failure in file
base/errmgr_base_receive.c at line 143
Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
Failing at addr:0x100000030
[0] func:/opt/intel9.1/openmpi/1.3/lib/
(opal_backtrace_print+0x1f) [0x2a957e4c1f]
*** End of error message ***
Segmentation fault (core dumped)

Results of testing the patch on my system:
1) I am not certain which branches this patch can be applied to, so I may
have tried to do too much.
2) I don't have 11970 on my system, so I tried to apply the patch to
1.1.1, 1.1.2rc1, and 1.3a1r11962
  (no nightly tarball for 1.3a1r11970 this morning)
  (side note: where is 1.2? Only via cvs?)
3) patch complained about all three versions I tried to apply it to, but
it seemed to apply most of the patch. I hand-checked all three patched
routines in the three branches I tried and hand-fixed anything that
got missed because of differences in line numbers.
4) The patch applied best against 1.3a1r11962 and second best against
1.1.1. My lack of experience with patch likely confused the issue.
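
A dry run can show which hunks would fail against a given tree before any files are changed. The following is a minimal sketch, assuming GNU patch is installed (for its --dry-run option). The patch file name, the source directory, and both function names are hypothetical, and the right strip level depends on how the patch was generated.

```python
import subprocess


def dry_run_command(patch_file, strip=0):
    """Build a GNU patch command that checks a patch without modifying files."""
    # -pN strips N leading path components from file names in the patch.
    # --dry-run (GNU patch) reports what would happen but writes nothing.
    # -i names the patch file instead of reading it from stdin.
    return ["patch", f"-p{strip}", "--dry-run", "-i", patch_file]


def check_patch(patch_file, source_dir, strip=0):
    """Return (applies_cleanly, output) for patch_file against source_dir."""
    result = subprocess.run(
        dry_run_command(patch_file, strip),
        cwd=source_dir,          # run from the top of the source tree
        capture_output=True,
        text=True,
    )
    # patch exits with 0 only when every hunk applied; 1 means some failed.
    return result.returncode == 0, result.stdout + result.stderr


if __name__ == "__main__":
    # Hypothetical names: substitute the real patch file and source trees.
    # An absolute patch path is used because patch runs inside each tree.
    for tree in ["openmpi-1.1.1", "openmpi-1.1.2rc1", "openmpi-1.3a1r11962"]:
        ok, output = check_patch("/tmp/fix.patch", tree)
        print(tree, "applies cleanly" if ok else "needs hand fixes")
        print(output)
```

Hunks that apply at an offset are still reported in the output, which gives a list of routines worth hand-checking in each branch.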