Open MPI User's Mailing List Archives

From: George Bosilca (bosilca_at_[hidden])
Date: 2006-10-05 15:01:39


Thanks, Michael.

The seg-fault is related to an orterun problem. I noticed it
yesterday, and we are trying to find a fix. As for the rest, I'm quite
happy that the BLACS problem was solved.
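
To confirm that the crash really is on the orterun side and not in
BLACS, a plain MPI_Abort reproducer should be enough, since it
exercises the same kill/cleanup path. A minimal sketch (the file name,
error code and process count are arbitrary, not anything from the
tester):

/* abort_test.c -- illustrative reproducer, not part of the BLACS tester.
 * Rank 0 aborts; mpirun/orterun must then kill the remaining ranks, which
 * is the same cleanup path that crashes in the output quoted below. */
#include <stdio.h>
#include <unistd.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    if (rank == 0) {
        printf("rank 0 of %d aborting; all other ranks should be killed\n",
               size);
        fflush(stdout);
        MPI_Abort(MPI_COMM_WORLD, -1);
    }

    sleep(10);                  /* give the abort time to propagate */
    printf("rank %d survived the abort\n", rank);
    MPI_Finalize();
    return 0;
}

Compile it with mpicc and run it under mpirun with a few processes; if
orterun's cleanup is at fault, the same segfault should show up without
any BLACS involvement.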

   Thanks for your help,
     george.

On Oct 5, 2006, at 2:42 PM, Michael Kluskens wrote:

>
> On Oct 4, 2006, at 7:51 PM, George Bosilca wrote:
>> This is the correct patch (same as previous minus the debugging
>> statements).
> On Oct 4, 2006, at 7:42 PM, George Bosilca wrote:
>> The problem was found and fixed. Until the patch gets applied to the
>> 1.1 and 1.2 branches, please use the attached patch.
>
> System: BLACS 1.1p3 on Debian Linux 3.1r3 on a dual Opteron, gcc 3.3.5,
> Intel ifort 9.0.32; all tests run with 4 processors (comments below).
>
> OpenMPI 1.1.1 patched and OpenMPI 1.1.2 patched:
> C & F tests: no errors with the default data set. The F test slowed
> down in the middle of the tests.
>
> OpenMPI 1.3a1r11962 patched: much better; it completes all tests with
> the default data set, but the tester crashes on exit (a different
> problem?)
> ------------------------------------------------------------
> The final auxiliary test is for BLACS_ABORT.
> Immediately after this message, all processes should be killed.
> If processes survive the call, your BLACS_ABORT is incorrect.
> {0,2}, pnum=2, Contxt=0, killed other procs, exiting with error #-1.
>
> [cluster:32133] [0,0,0] ORTE_ERROR_LOG: Communication failure in file
> base/errmgr_base_receive.c at line 143
> Signal:11 info.si_errno:0(Success) si_code:1(SEGV_MAPERR)
> Failing at addr:0x100000030
> [0] func:/opt/intel9.1/openmpi/1.3/lib/libopal.so.0
> (opal_backtrace_print+0x1f) [0x2a957e4c1f]
> *** End of error message ***
> Segmentation fault (core dumped)
> ------------------------------------------------------------
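
(For reference, that last auxiliary test boils down to something like
the sketch below. This is written against the C BLACS interface from
memory -- the hand-declared prototypes, the 1 x nprocs grid and the
choice of process number are mine, not the tester's actual code. One
process calls BLACS_ABORT and every process is expected to die, which
is exactly where orterun has to tear the job down and where the
segfault appears.)

/* blacs_abort_sketch.c -- illustrative only.  The C BLACS interface has
 * no standard header, so the prototypes are declared by hand.  Assumes at
 * least 3 processes, as in the 4-process runs above. */
#include <stdio.h>
#include <unistd.h>

void Cblacs_pinfo(int *mypnum, int *nprocs);
void Cblacs_get(int ConTxt, int what, int *val);
void Cblacs_gridinit(int *ConTxt, char *order, int nprow, int npcol);
void Cblacs_abort(int ConTxt, int ErrNo);

int main(void)
{
    int mypnum, nprocs, ctxt;

    Cblacs_pinfo(&mypnum, &nprocs);
    Cblacs_get(0, 0, &ctxt);                /* default system context */
    Cblacs_gridinit(&ctxt, "Row", 1, nprocs);

    if (mypnum == 2) {                      /* pnum=2, as in the output above */
        printf("pnum=%d calling BLACS_ABORT; all processes should die\n",
               mypnum);
        fflush(stdout);
        Cblacs_abort(ctxt, -1);
    }

    sleep(10);                              /* wait for the abort to arrive */
    printf("pnum=%d survived BLACS_ABORT (the tester would flag this)\n",
           mypnum);
    return 0;
}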
>
> Results of testing the patch on my system:
> 1) I'm not certain which branches this patch can be applied to, so I
> may have tried to do too much.
> 2) I don't have 11970 on my system, so I tried to apply the patch to
> 1.1.1, 1.1.2rc1, and 1.3a1r11962
> (no nightly tarball for 1.3a1r11970 this morning).
> (Side note: where is 1.2? Only via cvs?)
> 3) patch complained about all three trees I tried to apply it to, but
> it seemed to apply most of the patch each time. I hand-checked all
> three patched routines in the three branches I tried and hand-fixed
> anything that was missed because of differences in line numbers.
> 4) The patch applied best against 1.3a1r11962 and second best against
> 1.1.1 -- my lack of experience with patch likely confused the issue.
>
> Michael
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users