Open MPI User's Mailing List Archives

From: Troy Telford (ttelford_at_[hidden])
Date: 2005-11-14 20:03:00


On Mon, 14 Nov 2005 17:28:15 -0700, Troy Telford
<ttelford_at_[hidden]> wrote:

> I've just finished a build of RC7, so I'll go give that a whirl and
> report.

RC7:

With *both* mvapi and openib, I receive the following when using IMB-MPI1:

***mvapi***
[0,1,3][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress] error
in posting pending send
[0,1,3][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress] error
in posting pending send
[0,1,2][btl_mvapi_component.c:637:mca_btl_mvapi_component_progress] error
in posting pending send
**openib***
[0,1,3][btl_openib_endpoint.c:134:mca_btl_openib_endpoint_post_send] error
posting send request errno says Resource temporarily unavailable
[0,1,3][btl_openib_component.c:655:mca_btl_openib_component_progress]
error in posting pending send
[0,1,2][btl_openib_endpoint.c:134:mca_btl_openib_endpoint_post_send] error
posting send request errno says Resource temporarily unavailable
[0,1,2][btl_openib_component.c:655:mca_btl_openib_component_progress]
error in posting pending send
[0,1,3][btl_openib_endpoint.c:134:mca_btl_openib_endpoint_post_send] error
posting send request errno says Resource temporarily unavailable
[0,1,3][btl_openib_component.c:655:mca_btl_openib_component_progress]
error in posting pending send
***********
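For context, the two transports would presumably have been selected per run with the `btl` MCA parameter, along these lines (a sketch only; the benchmark path, process count, and hostfile are assumptions, not taken from this report):

```shell
# Hypothetical invocations; IMB-MPI1 path, -np value, and hostfile are assumptions.
# Run the Intel MPI Benchmarks over the mvapi BTL:
mpirun -np 8 --hostfile hosts --mca btl mvapi,self ./IMB-MPI1

# Same run over the openib BTL:
mpirun -np 8 --hostfile hosts --mca btl openib,self ./IMB-MPI1
```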

Notably, both fail in essentially the same place, every time:
#----------------------------------------------------------------
# Benchmarking Reduce_scatter
# #processes = 4
# ( 4 additional processes waiting in MPI_Barrier)
#----------------------------------------------------------------
        #bytes #repetitions  t_min[usec]  t_max[usec]  t_avg[usec]
             0         1000         0.04         0.04         0.04
<insert error here>

(Sometimes it finishes one more item -- i.e., byte size of 4 -- before
failing.)

HPL will run on mvapi, but on openib, it segfaults before completing the
first problem size with:
mpirun noticed that job rank 0 with PID 25662 on node "n57" exited on
signal 11.

HPCC also segfaults with OpenIB when it reaches its HPL section (with no
additional output).
HPCC is still running on mvapi... so far so good...

The Presta tests still error out (similarly to IMB) as previously
reported, though less frequently: I've been able to complete a given test
successfully, then had it fail when run again -- something like a 50%
success rate. This is with the 'com' and 'allred' tests; 'globalop' has
refused to run since RC5, and that has not changed with RC7.