Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] process kill signal 59
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-10-30 14:50:28


Yeah, you're using too much memory for the shared memory system. Run with -mca btl ^sm on your cmd line - it'll run slower, but you probably don't have a choice.
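
As a rough check on the memory diagnosis above, the footprint of the matrices can be estimated with simple arithmetic. This is only a sketch under assumptions the thread does not state: 8-byte double-precision elements, and three dense matrices (A, B and the result C) held in full by one process. The 2026 MB figure is the "total" value from the quoted free -m output. The errno=12 in the quoted log is ENOMEM on Linux, which is consistent with running out of memory.

```python
# Rough memory estimate for a dense n x n matrix multiplication C = A * B.
# Assumptions (not stated in the thread): 8-byte double elements, and
# all three matrices held in full by a single process.

BYTES_PER_ELEMENT = 8  # assumed element size (C double)
MIB = 1024 * 1024


def matrix_mib(n, bytes_per_element=BYTES_PER_ELEMENT):
    """Size in MiB of one dense n x n matrix."""
    return n * n * bytes_per_element / MIB


def footprint_mib(n, matrices=3):
    """Size in MiB of `matrices` dense n x n matrices (A, B and C)."""
    return matrices * matrix_mib(n)


if __name__ == "__main__":
    total_ram_mib = 2026  # "total" column of the quoted free -m output
    for n in (1000, 10000):
        need = footprint_mib(n)
        verdict = "fits in" if need < total_ram_mib else "exceeds"
        print(f"n={n}: about {need:.0f} MiB, {verdict} {total_ram_mib} MiB of RAM")
```

Under these assumptions the 1000x1000 case needs only a few tens of MiB, while the 10,000x10,000 case needs more than the node's physical RAM before the shared memory pool is even mapped. Applied to the invocation quoted below, the suggested workaround would read: mpirun -np 4 --hostfile nodes --bynode -mca btl ^sm magic10000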

On Oct 30, 2012, at 11:38 AM, Sandra Guija <sguija_at_[hidden]> wrote:

> Yes, I think it is related to my program too. When I run a 1000x1000 matrix multiplication, the program works.
> When I run the 10,000x10,000 matrix on only one machine, I get this:
> mca_common_sm_mmap_init: mmap failed with errno=12
> mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-mpiu_at_tango_0/default-universe-1529/1/shared_mem_pool.tango)
> mca_common_sm_mmap_init: /tmp/openmpi-sessions-mpiu_at_tango_0/default-universe-1529/1/shared_mem_pool.tango failed with errno=2
> mca_mpool_sm_init: unable to create shared memory mapping (/tmp/openmpi-sessions-mpiu_at_tango_0/default-universe-1529/1/shared_mem_pool.tango)
> PML add procs failed
> -->Returned "Out of resource" (-2) instead of "Success" (0)
>
> This is the result when I run free -m:
>                      total    used    free  shared  buffers  cached
> Mem:                  2026      54    1972       0        6      25
> -/+ buffers/cache:              22     511
> Swap:                  511       0     511
>
> Sandra Guija
>
> From: rhc_at_[hidden]
> Date: Tue, 30 Oct 2012 10:33:02 -0700
> To: devel_at_[hidden]
> Subject: Re: [OMPI devel] process kill signal 59
>
> Ummm...not sure what I can say about that with so little info. It looks like your process died for some reason that has nothing to do with us - a bug in your "magic10000" program?
>
>
> On Oct 30, 2012, at 10:24 AM, Sandra Guija <sguija_at_[hidden]> wrote:
>
> Hello,
> I am running a 10,000x10,000 matrix multiplication on 4 processors (1 core each) and I get the following error:
> mpirun -np 4 --hostfile nodes --bynode magic10000
>
> mpirun noticed that job rank 1 with PID 635 on node slave1 exited on signal 59 (Real-time signal 25).
> 2 additional processes aborted (not shown)
> 1 process killed (possibly by Open MPI)
>
> The nodes file contains:
> master
> slave1
> slave2
> slave3
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel