Open MPI User's Mailing List Archives

From: Brian W. Barrett (bbarrett_at_[hidden])
Date: 2007-01-16 09:47:30


On Jan 15, 2007, at 10:13 AM, Marcelo Maia Garcia wrote:

> I am trying to set up SGE to run DLPOLY compiled with mpif90
> (Open MPI 1.2b2, PathScale Fortran compilers and gcc C/C++). In
> general I have much more luck running DLPOLY interactively than
> through SGE. The error that I get is: Signal:7 info.si_errno:0
> (Success) si_code:2()[1]. A previous message on the list[2] says
> that this is more likely to be a configuration problem. But what
> kind of configuration? Is it at run time?

Could you include the entire stack trace next time? That can help
localize where the error is occurring. The message says that a
process died from signal 7, which on Linux is a bus error (SIGBUS).
This usually points to a memory error, either in Open MPI or in the user
application. Without seeing the stack trace, it's difficult to pin
down where the error is occurring.
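
As a rough illustration of how the numbers in that message can be
decoded, here is a minimal Python sketch. It assumes it is run on the
same kind of platform as the failing job (signal numbers differ between
architectures; on typical x86 Linux, 7 resolves to SIGBUS). The si_code
table is copied by hand from the Linux headers and is an assumption of
this sketch, not something Open MPI provides; the function names are
hypothetical.

```python
import signal

# si_code values for SIGBUS as listed in the Linux headers
# (asm-generic/siginfo.h). Hard-coded here as an assumption of this
# sketch, since Python's signal module does not expose them.
SIGBUS_CODES = {
    1: "BUS_ADRALN (invalid address alignment)",
    2: "BUS_ADRERR (nonexistent physical address)",
    3: "BUS_OBJERR (object-specific hardware error)",
}


def describe_signal(signum):
    """Return the symbolic name of a signal number on this platform."""
    try:
        return signal.Signals(signum).name
    except ValueError:
        # Numbers with no named constant on this platform end up here.
        return f"unknown signal {signum}"


def describe_bus_code(si_code):
    """Return a readable description of a SIGBUS si_code (Linux values)."""
    return SIGBUS_CODES.get(si_code, f"unknown si_code {si_code}")


if __name__ == "__main__":
    # Values taken from "Signal:7 info.si_errno:0(Success) si_code:2()".
    print(describe_signal(7))
    print(describe_bus_code(2))
```

This only names the signal; it does not say which code touched the bad
address, which is why the full stack trace is still needed.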

> Another error that I sometimes get is related to "writev"[3],
> but this is pretty rare.

Usually these point to some process in the job dying and the other
processes having issues completing outstanding sends to the dead
process. I would guess that the problem originates with the bus
error you are seeing. Cleaning that up will likely make these errors
go away.
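
On Linux, errno 104 is ECONNRESET ("Connection reset by peer"), which
fits a peer process having gone away. The following Python sketch
shows one way to summarize such lines from captured mpirun output. The
regular expression is an assumption based only on the lines quoted in
[3], and the function name is hypothetical; errno names are looked up
on the machine running the script, so it should be run on the same
kind of system that produced the log.

```python
import errno
import re
from collections import Counter

# Pattern inferred from the quoted log lines, e.g.
# "[node007:05003] mca_btl_tcp_frag_send: writev failed with errno=104"
LINE_RE = re.compile(
    r"\[(?P<host>[^:\]]+):(?P<pid>\d+)\]\s+"
    r"mca_btl_tcp_frag_send: writev failed with errno=(?P<errno>\d+)"
)


def summarize_writev_failures(log_text):
    """Count writev failures per (host, errno name) in captured output."""
    counts = Counter()
    for match in LINE_RE.finditer(log_text):
        code = int(match.group("errno"))
        # errno.errorcode maps numbers to names for the local platform.
        name = errno.errorcode.get(code, f"errno {code}")
        counts[(match.group("host"), name)] += 1
    return counts


if __name__ == "__main__":
    sample = (
        "[node007:05003] mca_btl_tcp_frag_send: writev failed with errno=104\n"
        "[node006:05170] mca_btl_tcp_frag_send: writev failed with errno=104\n"
        "[node006:05170] mca_btl_tcp_frag_send: writev failed with errno=104\n"
    )
    for (host, name), n in sorted(summarize_writev_failures(sample).items()):
        print(host, name, n)
```

If every surviving node reports the same errno at about the same time,
that is consistent with one process dying first and the rest failing
while sending to it.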

Brian

> [1]
> [ocf_at_master TEST2]$ mpirun -np 16 --hostfile /home/ocf/SRIFBENCH/
> DLPOLY3/data/nodes_16_slots4.txt /home/ocf/SRIFBENCH/DLPOLY3/
> execute/DLPOLY.Y
> Signal:7 info.si_errno:0(Success) si_code:2()
> Failing at addr:0x5107b0
> (...)
>
> [2] http://www.open-mpi.org/community/lists/users/2007/01/2423.php
>
>
> [3]
> [node007:05003] mca_btl_tcp_frag_send: writev failed with errno=104
> [node007:05004] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05170] mca_btl_tcp_frag_send: writev failed with errno=104
> [node007:05005] mca_btl_tcp_frag_send: writev failed with errno=104
> [node007:05006] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05170] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05171] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05171] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05172] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05172] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05173] mca_btl_tcp_frag_send: writev failed with errno=104
> [node006:05173] mca_btl_tcp_frag_send: writev failed with errno=104
> mpirun noticed that job rank 0 with PID 0 on node node003 exited on
> signal 48.
> 15 additional processes aborted (not shown)

-- 
   Brian Barrett
   Open MPI Team, CCS-1
   Los Alamos National Laboratory