Yeah, you're using too much memory for the shared memory system. Run with -mca btl ^sm on your cmd line - it'll run slower, but you probably don't have a choice.

Yes, I think it is related to my program too: when I run a 1000x1000 matrix multiplication, the program works.
When I run the 10,000x10,000 matrix on only one machine, I get this:
mca_common_sm_mmap_init: mmap failed with errno=12
mca_mpool_sm_init: unable to shared memory mapping (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
mca_common_sm_mmap_init: /tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango failed with errno=2
mca_mpool_sm_init: unable to shared memory mapping (/tmp/openmpi-sessions-mpiu@tango_0/default-universe-1529/1/shared_mem_pool.tango)
PML add procs failed
-->Returned "Out of resource" (-2) instead of "Success" (0)

This is the output of free -m:
                        total     used     free   shared  buffers   cached
Mem:                     2026       54     1972        0        6       25
-/+ buffers/cache:                  22      511
Swap:                     511        0      511

Ummm... not sure what I can say about that with so little info. It looks like your process died for some reason that has nothing to do with us. Could it be a bug in your "magic10000" program?

I am running a 10,000x10,000 matrix multiplication on 4 processors (1 core each) and I get the following error:
mpirun -np 4 --hostfile nodes --bynode magic10000

mpirun noticed that job rank 1 with PID 635 on node slave1 exited on signal 509 (Real-time signal 25).
2 additional processes aborted (not shown)
1 process killed (possibly by Open MPI)

The nodes file contains:

