
Open MPI User's Mailing List Archives


From: Marcin Skoczylas (Marcin.Skoczylas_at_[hidden])
Date: 2007-05-29 07:14:14


Hello,

Recently my administrator made some changes on our cluster, and now I
get a crash during MPI_Barrier:

[our-host:12566] *** Process received signal ***
[our-host:12566] Signal: Segmentation fault (11)
[our-host:12566] Signal code: Address not mapped (1)
[our-host:12566] Failing at address: 0x4
[our-host:12566] [ 0] /lib/tls/libpthread.so.0 [0xa22f80]
[our-host:12566] [ 1] /usr/lib/openmpi/mca_btl_sm.so(mca_btl_sm_component_progress+0x68f) [0xcd86d7]
[our-host:12566] [ 2] /usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x32) [0xcb7e3a]
[our-host:12566] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0xed) [0xc2b221]
[our-host:12566] [ 4] /usr/lib/libmpi.so.0 [0x3aecc5]
[our-host:12566] [ 5] /usr/lib/libmpi.so.0(ompi_request_wait_all+0xec) [0x3ae784]
[our-host:12566] [ 6] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_sendrecv_actual+0x77) [0xd025bb]
[our-host:12566] [ 7] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_recursivedoubling+0xde) [0xd05e3a]
[our-host:12566] [ 8] /usr/lib/openmpi/mca_coll_tuned.so(ompi_coll_tuned_barrier_intra_dec_fixed+0x44) [0xd027d8]
[our-host:12566] [ 9] /usr/lib/libmpi.so.0(PMPI_Barrier+0x176) [0x3c0cea]

I did a small investigation and found that:

[user@our-host]$ ssh our-host
ssh(12704) ssh: connect to host our-host port 22: No route to host

That could be the cause. I'm going to talk with my admin soon about this
routing change. However, if this really is the problem, shouldn't it be
detected during startup, e.g. in MPI_Init? Actually, I'm not sure...
Your comments?
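
To illustrate the kind of startup check I have in mind, here is a minimal
Python sketch. It is only an assumption about what such a check could look
like, not something Open MPI does as far as I know, and `probe_tcp` is a
hypothetical helper name. It tries a TCP connection to a host and port and
reports why the attempt failed, so a "No route to host" condition would show
up before the job starts.

```python
import errno
import socket


def probe_tcp(host, port=22, timeout=3.0):
    """Try a TCP connect to (host, port) and return (ok, reason).

    Hypothetical pre-flight helper: it says nothing about MPI itself,
    only whether a plain TCP connection can be opened.
    """
    try:
        # create_connection resolves the name and connects in one step.
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # The host name could not be resolved at all.
        return False, f"name resolution failed: {exc}"
    except socket.timeout:
        return False, "timed out"
    except OSError as exc:
        # EHOSTUNREACH is the errno behind "No route to host".
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name}: {exc.strerror}"
```

Calling `probe_tcp(socket.gethostname())` on each node before launching the
job would reproduce the ssh test above without needing ssh itself. Whether a
failure here is really related to the segfault in MPI_Barrier is a separate
question.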

             greetings, Marcin