What OMPI version are you using?
On Jul 23, 2009, at 3:00 PM, Sims, James S. Dr. wrote:
> I have an OpenMPI program compiled with a version of OpenMPI built
> using the ifort 10.1 compiler. I can compile and run this code with
> no problem using the 32-bit version of ifort, and I can also submit
> batch jobs using torque with this 32-bit code.
> However, compiling the same code to produce a 64-bit executable
> produces a code that runs correctly only in the simplest cases. It
> does not run correctly under the torque batch queuing system: it runs
> for a while and then gives a segmentation violation in a section of
> code that is fine in the 32-bit version.
> I have to run the MPI multinode jobs using our torque batch queuing
> system, but we do have the capability of running the jobs in an
> interactive batch environment.
> If I do a
> qsub -I -l nodes=1:x4gb
> I get an interactive session on the remote node assigned to my job.
> I can run the job using either
> ./MPI_li_64
> or
> mpirun -np 1 ./MPI_li_64
> and the job runs successfully to completion. I can also start an
> interactive shell using
> qsub -I -l nodes=1:ppn=2:x4gb
> and I will get a single dual-processor (or greater) node. On this
> single node,
> mpirun -np 2 ./MPI_li_64
> works.
> However, if instead I ask for two nodes in my interactive batch session,
> qsub -I -l nodes=2:x4gb
> two nodes will be assigned to me, but when I enter
> mpirun -np 2 ./MPI_li_64
> the job runs for a while, then fails with
> mpirun noticed that process rank 1 with PID 23104 on node n339
> exited on signal 11 (Segmentation fault).
> I can trace this in the Intel debugger and see that the segmentation
> fault is occurring in what should be good code, and in code that
> executes with no problem when everything is compiled 32-bit. I am at
> a loss for what could be preventing this code from running within the
> batch queuing environment in the 64-bit version.