Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2006-10-06 00:04:51

On 10/5/06 2:42 PM, "Michael Kluskens" <mklus_at_[hidden]> wrote:

> System: BLACS 1.1p3 on Debian Linux 3.1r3 on dual-opteron, gcc 3.3.5,
> Intel ifort 9.0.32 all tests with 4 processors (comments below)
> OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
> C & F tests: no errors with default data set. F test slowed down
> in the middle of the tests.

Good. Can you expand on what you mean by "slowed down"?

> OpenMPI 1.3a1r11962 patched: much better, completes all tests with
> default data set but the tester crashes on exit (different problem?)
> ------------------------------------------------------------

Quite possibly so. 1.3 is the active development trunk and is not always
stable; we're working on some ORTE issues right now, so it's possible that
mpirun may not be rock solid at the moment. :-)

> The final auxiliary test is for BLACS_ABORT.
> Immediately after this message, all processes should be killed.
> If processes survive the call, your BLACS_ABORT is incorrect.
> {0,2}, pnum=2, Contxt=0, killed other procs, exiting with error #-1.
> [cluster:32133] [0,0,0] ORTE_ERROR_LOG: Communication failure in file
> base/errmgr_base_receive.c at line 143
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x100000030
> [0] func:/opt/intel9.1/openmpi/1.3/lib/
> (opal_backtrace_print+0x1f) [0x2a957e4c1f]
> *** End of error message ***
> Segmentation fault (core dumped)
> ------------------------------------------------------------

Ya; don't worry about this on the trunk at the moment. :-)

> Results of testing the patch on my system:
> 1) Not certain which branches this patch can be applied to so I may
> have tried to do too much.
> 2) I don't have 11970 on my system so I tried to apply the patch to
> 1.1.1, 1.1.2rc1, 1.3a1r11962

Good. I literally just posted 1.1.2rc3 with this DDT fix (among others).
It looks like we're getting darn close to releasing 1.1.2.

> (no nightly tarball for 1.3a1r11970 this morning)

We had a failure in the trunk tarball creation last night.

> (side note where is 1.2?, only via cvs?)

We haven't opened up nightly tarballs for v1.2 yet because we're not quite
happy with the level of stability there. That is, we expect the 1.1
series nightly tarballs to be more-or-less stable, and 1.2 is not at that
level yet. And we've never provided guarantees about trunk stability ;-).
We'll probably open up the 1.2 nightly tarballs in the not-too-distant
future.

> 3) patch complained about all three I tried to apply it to but seemed
> to apply the patch most of the time. I hand-checked all three patched
> routines in the three branches I tried and hand-fixed anything that
> got missed because of differences in line numbers.
> 4) The patch applied best against 1.3a1r11962 and second best against
> 1.1.1 -- my lack of experience with patch likely confused the issue.

No worries - we definitely appreciate all your testing!

Jeff Squyres
Server Virtualization Business Unit
Cisco Systems