Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] job fails with "Signal: Bus error (7)"
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2009-10-03 07:12:42


Bus error usually means that there was an invalid address passed as a
pointer somewhere in the code -- it's not usually a communications
error.
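
For reference, the signal numbers and codes in the quoted log can be decoded. This is only a minimal sketch: it assumes Linux on x86-64 (where SIGBUS is 7 and SIGSEGV is 11) and uses the si_code values from the Linux siginfo definitions. The tables and the `describe` helper are illustrative names, not part of Open MPI.

```python
# si_code values as defined for Linux siginfo (assumption: Linux on x86-64).
# The table and function names below are illustrative only.
BUS_CODES = {
    1: "BUS_ADRALN (invalid address alignment)",
    2: "BUS_ADRERR (nonexistent physical address)",
    3: "BUS_OBJERR (object-specific hardware error)",
}
SEGV_CODES = {
    1: "SEGV_MAPERR (address not mapped to object)",
    2: "SEGV_ACCERR (invalid permissions for mapped object)",
}
# Codes that are not specific to one signal.
GENERIC_CODES = {
    0: "SI_USER (sent by kill or raise)",
    128: "SI_KERNEL (sent by the kernel)",
}
SIGNAL_TABLES = {7: ("SIGBUS", BUS_CODES), 11: ("SIGSEGV", SEGV_CODES)}


def describe(signum, code):
    """Return a readable description of a (signal number, si_code) pair."""
    name, specific = SIGNAL_TABLES.get(signum, (f"signal {signum}", {}))
    meaning = (specific.get(code)
               or GENERIC_CODES.get(code)
               or f"unknown code {code}")
    return f"{name}: {meaning}"


print(describe(7, 2))     # the "Bus error (7)", "Signal code: (2)" lines
print(describe(11, 128))  # the "Segmentation fault (11)", "Signal code: (128)" lines
```

On Linux, BUS_ADRERR is commonly seen when a page of a memory-mapped file cannot be supplied (for instance, because the backing file was truncated). That is only one possibility to rule out, not a diagnosis.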

Without more information, it's rather difficult to speculate on what
happened here. Did you get corefiles? If so, are there useful
backtraces available?
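
If no corefiles were written, the core-file size limit may be set to zero. A minimal sketch, assuming a Linux system with Python available; the `enable_core_dumps` helper is a hypothetical name. It raises the soft core-file limit to the hard limit for the current process and for anything launched from it:

```python
import resource


def enable_core_dumps():
    """Raise the soft core-file size limit to the hard limit.

    Returns the (soft, hard) limits in effect afterwards. Child processes
    started from this process inherit the new soft limit.
    """
    soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


soft, hard = enable_core_dumps()
print("core file size limit (soft, hard):", soft, hard)
```

The limit is per process and per node, so it has to be in effect wherever the MPI processes actually start. Once a corefile exists, `gdb <executable> <corefile>` followed by `bt` prints the backtrace; it is most useful if the application was compiled with `-g`.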

On Oct 1, 2009, at 6:01 AM, Sangamesh B wrote:

> Hi,
>
> A Fortran application compiled with ifort-10.1 and Open MPI 1.3.1
> on CentOS 5.2 fails after running for 4 days with the following
> error message:
>
> [compute-0-7:25430] *** Process received signal ***
>
> [compute-0-7:25433] *** Process received signal ***
> [compute-0-7:25433] Signal: Bus error (7)
> [compute-0-7:25433] Signal code: (2)
> [compute-0-7:25433] Failing at address: 0x4217b8
> [compute-0-7:25431] *** Process received signal ***
>
> [compute-0-7:25431] Signal: Bus error (7)
> [compute-0-7:25431] Signal code: (2)
> [compute-0-7:25431] Failing at address: 0x4217b8
> [compute-0-7:25432] *** Process received signal ***
> [compute-0-7:25432] Signal: Bus error (7)
>
> [compute-0-7:25432] Signal code: (2)
> [compute-0-7:25432] Failing at address: 0x4217b8
> [compute-0-7:25430] Signal: Bus error (7)
> [compute-0-7:25430] Signal code: (2)
> [compute-0-7:25430] Failing at address: 0x4217b8
>
> [compute-0-7:25431] *** Process received signal ***
> [compute-0-7:25431] Signal: Segmentation fault (11)
> [compute-0-7:25431] Signal code: (128)
> [compute-0-7:25431] Failing at address: (nil)
> [compute-0-7:25430] *** Process received signal ***
>
> [compute-0-7:25433] *** Process received signal ***
> [compute-0-7:25433] Signal: Segmentation fault (11)
> [compute-0-7:25433] Signal code: (128)
> [compute-0-7:25433] Failing at address: (nil)
> [compute-0-7:25432] *** Process received signal ***
>
> [compute-0-7:25432] Signal: Segmentation fault (11)
> [compute-0-7:25432] Signal code: (128)
> [compute-0-7:25432] Failing at address: (nil)
> [compute-0-7:25430] Signal: Segmentation fault (11)
> [compute-0-7:25430] Signal code: (128)
>
> [compute-0-7:25430] Failing at address: (nil)
> --------------------------------------------------------------------------
> mpirun noticed that process rank 3 with PID 25433 on node
> compute-0-7.local exited on signal 11 (Segmentation fault).
>
>
>
> --------------------------------------------------------------------------
> This job runs with 4 Open MPI processes on nodes that are
> interconnected with Gigabit Ethernet.
> The same job runs well on nodes with InfiniBand connectivity.
>
> What could be the reason for this? Could it be due to loose physical
> connections, since it is reporting a bus error?

-- 
Jeff Squyres
jsquyres_at_[hidden]