
Subject: [OMPI users] OpenMPI 1.4.2 with Myrinet MX, mpirun seg faults
From: Raymond Muno (muno_at_[hidden])
Date: 2010-10-20 20:41:18


We are doing a test build of a new cluster, reusing our Myrinet 10G gear
from a previous cluster.

I have built OpenMPI 1.4.2 with PGI 10.4. We use this combination regularly
on our InfiniBand-based cluster, so all of the install elements were readily
available.
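
For context, the build follows the usual configure-with-PGI pattern, roughly
along these lines (the install prefix and MX path below are placeholders, not
our exact command):

   # Sketch of an OpenMPI 1.4.2 configure with the PGI 10.4 compilers and
   # MX support; install prefix and MX location are placeholders.
   ./configure CC=pgcc CXX=pgCC F77=pgf77 FC=pgf90 \
       --prefix=/opt/openmpi/1.4.2-pgi \
       --with-mx=/opt/mx
   make all install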

After a few go-arounds with the Myrinet MX stack, we are now running MX
1.2.12, built to allow more than the default maximum of 16 endpoints. Each
node has 24 cores.

The cluster is running Rocks 5.3.

As part of the initial build, I installed the Myrinet_MX Rocks Roll from
Myricom. With its default limit of 16 endpoints, we could not run on all
nodes, so the MX stack was replaced as mentioned above.

Myricom provided a build of OpenMPI 1.4.1, and that build works. However, it
is compiled only with gcc and gfortran, and we want it built with the
compilers we normally use, e.g. PGI, PathScale, and Intel.

We can compile programs with the OpenMPI 1.4.2 / PGI 10.4 build. However, we
cannot launch jobs with its mpirun; it segfaults:

--------------------------------------------------------------------------
mpirun noticed that the job aborted, but has no info as to the process
that caused that situation.
--------------------------------------------------------------------------
[enet1-head2-eth1:29532] *** Process received signal ***
[enet1-head2-eth1:29532] Signal: Segmentation fault (11)
[enet1-head2-eth1:29532] Signal code: Address not mapped (1)
[enet1-head2-eth1:29532] Failing at address: 0x6c
[enet1-head2-eth1:29532] *** End of error message ***
Segmentation fault
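
For reference, the failing launch is an ordinary mpirun invocation, something
like the following (the process count, host file, and executable names are
placeholders):

   # Hypothetical launch of the kind that segfaults; node count, host file,
   # and executable name are placeholders.
   /opt/openmpi/1.4.2-pgi/bin/mpirun -np 48 --hostfile hosts ./mx_test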

However, if we launch the job with the Myricom-supplied mpirun from their
OpenMPI tree, the job runs successfully. This works even for a test program
compiled with the OpenMPI 1.4.2 / PGI 10.4 build.
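
One sanity check we can run against both trees is to confirm that the MX
components are present in each build, e.g. (install paths are placeholders):

   # Confirm the MX BTL/MTL components show up in each OpenMPI install;
   # both install paths are placeholders.
   /opt/openmpi/1.4.2-pgi/bin/ompi_info | grep mx
   /opt/openmpi/1.4.1-myricom/bin/ompi_info | grep mx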