Open MPI User's Mailing List Archives

Subject: [OMPI users] OpenMPI at scale on Cray XK7
From: Mike Clark (mclark_at_[hidden])
Date: 2013-04-22 18:17:16


Hi,

I am trying to run OpenMPI on the Cray XK7 system at Oak Ridge National Lab (Titan) and am running into an issue where MPI_Init seems to hang indefinitely. The issue only arises at large scale, e.g., when running on 18560 compute nodes (with two MPI processes per node). The application runs successfully on 4600 nodes, and we are currently testing a 9000-node job to see whether it fails or runs.

We are launching our job using something like the following:

# mpirun command
mpicmd="$OMP_DIR/bin/mpirun --prefix $OMP_DIR -np 37120 --npernode 2 --bind-to core --bind-to numa $app $args"
# Print and Run the Command
echo $mpicmd
$mpicmd >& $output
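
For reference, the -np value above is just the node count times the processes per node (18560 x 2 = 37120). A minimal sketch of that arithmetic follows; the function name np_for_nodes and the variables nodes and ppn are illustrative only and are not part of our actual job script:

```shell
# Illustrative helper (hypothetical name): total MPI processes for a job.
#   $1 = number of compute nodes
#   $2 = MPI processes per node (the value passed to --npernode)
np_for_nodes() {
    echo $(( $1 * $2 ))
}

nodes=18560   # compute nodes in the failing run
ppn=2         # two MPI processes per node
np=$(np_for_nodes "$nodes" "$ppn")

# Print the matching mpirun options for this job size.
echo "-np $np --npernode $ppn"
```

The same calculation gives 9200 processes for the 4600-node run that works.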

Are there any issues that I should be aware of when running OpenMPI on 37120 processes or when running on the Cray Gemini Interconnect?

We are using OpenMPI 1.7.1 (1.7.x is required for Cray Gemini support) and gcc 4.7.2.

Thanks,

Mike.