Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Open MPI:Problem with 64-bit openMPI and intel compiler
From: Sims, James S. Dr. (james.sims_at_[hidden])
Date: 2009-08-12 00:17:57


Back to this problem.

The last suggestion was to upgrade to 1.3.3, which has been done. I still cannot get this code to
run in 64-bit mode with torque. What I can do is run the job in 64-bit mode using a hostfile.
Specifically, if I use
qsub -I -l nodes=2:ppn=1
torque allocates two nodes to the job and, since this is an interactive
shell, logs me in to the controlling node. In this example process rank 0 is on n72 and process rank 1 is on n89:
[sims_at_n72 4000]$ mpirun --display-allocation -pernode --display-map hostname

====================== ALLOCATED NODES ======================

 Data for node: Name: n72.clust.nist.gov Num slots: 1 Max slots: 0
 Data for node: Name: n89 Num slots: 1 Max slots: 0

=================================================================

 ======================== JOB MAP ========================

 Data for node: Name: n72.clust.nist.gov Num procs: 1
        Process OMPI jobid: [47657,1] Process rank: 0

 Data for node: Name: n89 Num procs: 1
        Process OMPI jobid: [47657,1] Process rank: 1

 =============================================================
n89
n72.clust.nist.gov

My hostfile is
[sims_at_n72 4000]$ cat hostfile
n72
n89

If, logged in to n72, I use the command
mpirun -np 2 ./MPI_li_64
the job fails with:
mpirun noticed that process rank 1 with PID 10538 on node n89 exited on signal 11 (Segmentation fault).

If I use the command
mpirun -np 2 --hostfile hostfile ./MPI_li_64
the same thing happens.

However, if I ssh to n73, for example, and use the command
mpirun -np 2 --hostfile hostfile ./MPI_li_64
everything works fine. So it appears that the problem is with torque.

Any ideas?