Open MPI User's Mailing List Archives

Subject: [OMPI users] Conflicts between jobs running on the same node
From: Alfonso Sanchez (alfonso.sanchez_at_[hidden])
Date: 2014-04-17 13:37:56

Hi all,

I've compiled OMPI 1.8 on an x64 Linux cluster using the PGI compilers v14.1 (I also tried PGI v11.10 and got the same result). I'm able to compile codes with the resulting mpicc/mpifort/etc. When running them, everything seems to work fine as long as only one job is running on a given compute node. However, whenever a second job is assigned to the same compute node, the CPU load of every process is halved. I'm using PBS Torque. As an example:

-Submit jobA using torque to node1 using mpirun -n 4

-All 4 processes of jobA show 100% CPU load.

-Submit jobB using torque to node1 using mpirun -n 4

-All 8 processes ( 4 from jobA & 4 from jobB ) show 50% CPU load.

Moreover, whilst jobA or jobB would finish in 30 mins when run by itself, when both jobs are on the same node they've gone 14 hrs without completing.
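Every process dropping to 50% would be consistent with the processes of the two jobs being restricted to the same cores, although nothing above confirms that. Here is a minimal sketch of one way to check it, assuming Linux and Python 3 on the compute node. The helper names `allowed_cpus` and `shared_cpus` are hypothetical, and the PIDs of the running MPI processes are assumed to be passed on the command line.

```python
import os
import sys
from collections import defaultdict


def allowed_cpus(pid):
    """Return the set of CPU ids the given process may run on (Linux only)."""
    return os.sched_getaffinity(pid)


def shared_cpus(affinity_by_pid):
    """Return {cpu: [pids]} for every CPU that more than one PID is allowed on."""
    pids_by_cpu = defaultdict(list)
    for pid, cpus in affinity_by_pid.items():
        for cpu in cpus:
            pids_by_cpu[cpu].append(pid)
    return {cpu: sorted(pids) for cpu, pids in pids_by_cpu.items() if len(pids) > 1}


if __name__ == "__main__":
    # Hypothetical usage: python check_affinity.py <pid> <pid> ...
    pids = [int(arg) for arg in sys.argv[1:]]
    affinity = {pid: allowed_cpus(pid) for pid in pids}
    for pid, cpus in sorted(affinity.items()):
        print(pid, sorted(cpus))
    for cpu, sharing in sorted(shared_cpus(affinity).items()):
        print(f"CPU {cpu} is shared by PIDs {sharing}")
```

An overlap only matters when each process is limited to a small set of CPUs. If every process is allowed on all CPUs, every CPU is reported as shared and the output says nothing about a conflict.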

I'm attaching config.log & the output of ompi_info --all (bzipped)

Some more info:

$> ompi_info | grep tm

MCA ess: tm (MCA v2.0, API v3.0, Component v1.8)
MCA plm: tm (MCA v2.0, API v2.0, Component v1.8)
MCA ras: tm (MCA v2.0, API v2.0, Component v1.8)

Sorry if this is a common problem; I've searched for posts discussing similar issues but haven't been able to find any.

Thanks for your help,