What version of Open MPI did you use?
Is it still occurring?
It is also possible that the connection went down during execution,
although even in that case a segfault really should not occur.
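
To rule out a routing problem before launching the job, a quick reachability
check along the following lines may help. This is only a minimal sketch: the
function names (can_connect, unreachable_hosts) are made up for illustration,
port 22 is taken from the ssh error quoted below, and this is not how Open MPI
itself verifies connectivity.

```python
import socket


def can_connect(host, port=22, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "No route to host", connection refused, timeouts,
        # and name-resolution failures.
        return False


def unreachable_hosts(hosts, port=22, timeout=3.0):
    """Return the hosts from the given list that could not be reached."""
    return [h for h in hosts if not can_connect(h, port, timeout)]
```

Calling unreachable_hosts() with the host names from your hostfile on each
node returns the ones that cannot be reached from that node; an empty list
means every host accepted a connection on that port.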
On Tue, 29 May 2007, Marcin Skoczylas wrote:
> Recently my administrator made some changes on our cluster and now I
> get a crash during MPI_Barrier:
> [our-host:12566] *** Process received signal ***
> [our-host:12566] Signal: Segmentation fault (11)
> [our-host:12566] Signal code: Address not mapped (1)
> [our-host:12566] Failing at address: 0x4
> [our-host:12566] [ 0] /lib/tls/libpthread.so.0 [0xa22f80]
> [our-host:12566] [ 1]
> [our-host:12566] [ 2]
> /usr/lib/openmpi/mca_bml_r2.so(mca_bml_r2_progress+0x32) [0xcb7e3a]
> [our-host:12566] [ 3] /usr/lib/libopen-pal.so.0(opal_progress+0xed)
> [our-host:12566] [ 4] /usr/lib/libmpi.so.0 [0x3aecc5]
> [our-host:12566] [ 5] /usr/lib/libmpi.so.0(ompi_request_wait_all+0xec)
> [our-host:12566] [ 6]
> [our-host:12566] [ 7]
> [our-host:12566] [ 8]
> [our-host:12566] [ 9] /usr/lib/libmpi.so.0(PMPI_Barrier+0x176) [0x3c0cea]
> Actually, I did a small investigation and realised that:
> [user_at_our-host]$ ssh our-host
> ssh(12704) ssh: connect to host our-host port 22: No route to host
> That could be the cause. I'm going to talk with my admin soon about this
> routing change. However, if this really is the problem, shouldn't it be
> recognised during startup, e.g. in MPI_Init? Actually, I'm not sure...
> Your comments?
> Greetings, Marcin
Jelena Pjesivac-Grbovic, Pjesa
Graduate Research Assistant
Innovative Computing Laboratory
Computer Science Department, UTK
Claxton Complex 350
(865) 974 - 6722
(865) 974 - 6321
Murphy's Law of Research:
Enough research will tend to support your theory.