Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] SGE and openmpi
From: Ralph Castain (rhc_at_[hidden])
Date: 2011-04-06 21:18:12


Are you able to run non-MPI programs like "hostname"?

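I ask because that error message indicates that everything started just fine, but there is an error in your application. It is worth checking the call itself: the Fortran binding is MPI_COMM_RANK(comm, rank, ierr), so the first argument must be a valid communicator such as MPI_COMM_WORLD.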
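
The hostname test mentioned above can be sketched as a small shell helper. This is only an illustration, assuming a launcher that accepts -np the way mpirun does in the job script quoted below; launch_check is a hypothetical name, not an Open MPI command.

```shell
#!/bin/bash
# launch_check is a hypothetical helper, not an Open MPI command.
# It starts a non-MPI program (hostname) through the given launcher.
# Usage: launch_check <launcher> <nprocs>
launch_check() {
    local launcher="$1" nprocs="$2"
    # Fail early if the launcher cannot be found.
    if ! command -v "$launcher" >/dev/null 2>&1; then
        echo "launcher not found: $launcher" >&2
        return 127
    fi
    # -np is the same process-count option used in the quoted job script.
    "$launcher" -np "$nprocs" hostname
}
```

For example, `launch_check /home/jason/openmpi-1.4.3-install/bin/mpirun 2` should print one host name per process if launching works. If it does, the problem is more likely in the application than in the SGE setup.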

On Apr 6, 2011, at 6:01 PM, Jason Palmer wrote:

> Btw, I did compile openmpi with the --with-sge flag.
>
> I am able to compile a test program using openf90 with no errors or
> warnings. But when I try to run a test program that just calls
> MPI_INIT(ierr), then MPI_COMM_RANK(ierr), I get the following, whether
> statically or dynamically linked, and whether run with mpirun or directly:
>
> [juggling.ucsd.edu:20218] *** An error occurred in MPI_Comm_rank
> [juggling.ucsd.edu:20218] *** on communicator MPI_COMM_WORLD
> [juggling.ucsd.edu:20218] *** MPI_ERR_COMM: invalid communicator
> [juggling.ucsd.edu:20218] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> Is there something missing in the Linux or parallel environment settings?
> Thanks.
>
> -----Original Message-----
> From: Jason Palmer [mailto:japalmer29_at_[hidden]]
> Sent: Wednesday, April 06, 2011 4:09 PM
> To: 'Open MPI Users'
> Subject: SGE and openmpi
>
> Hi,
> I am having trouble running a batch job in SGE using openmpi. I have read
> the FAQ, which says that openmpi will automatically do the right thing, but
> something seems to be wrong.
>
> Previously I used MPICH1 under SGE without any problems. I'm avoiding MPICH2
> because it doesn't seem to support static compilation, whereas I was able to
> get openmpi to compile with open64 and compile my program statically.
>
> But I am having problems launching. According to the documentation, I should
> be able to have a script file, qsub.sh:
>
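> #!/bin/bash
> # SGE directives: run in the submission directory, merge stderr into
> # stdout, use bash, submit to all.q, request 18 slots in the orte PE.
> #$ -cwd
> #$ -j y
> #$ -S /bin/bash
> #$ -q all.q
> #$ -pe orte 18
> MPI_DIR=/home/jason/openmpi-1.4.3-install/bin
> # NSLOTS is set by SGE to the number of slots granted to the job.
> "$MPI_DIR"/mpirun -np "$NSLOTS" myprog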
>
> Then,
> $ qsub qsub.sh
>
> Previously with MPICH1 I would have
>
> -machinefile $TMP/machines
>
> in the mpirun arguments, and the rest of the script the same except -pe
> mpich 18, and it would work. The -machinefile argument doesn't seem to work
> in orte. The error in qsub.sh.o is:
>
> [jason_at_juggling ~/amica_open64]$ cat qsub.sh.o7514
> [compute-0-0.local:17792] *** An error occurred in MPI_Comm_rank
> [compute-0-0.local:17792] *** on communicator MPI_COMM_WORLD
> [compute-0-0.local:17792] *** MPI_ERR_COMM: invalid communicator
> [compute-0-0.local:17792] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 0 with PID 17792 on node
> compute-0-0.local exiting without calling "finalize". This may have caused
> other processes in the application to be terminated by signals sent by
> mpirun (as reported here).
> --------------------------------------------------------------------------
> [compute-0-0.local:17788] 8 more processes have sent help message help-mpi-errors.txt / mpi_errors_are_fatal
> [compute-0-0.local:17788] Set MCA parameter "orte_base_help_aggregate" to 0 to see all help / error messages
>
>
> I ran qconf, and I get the same output as in the documentation:
>
> [jason_at_juggling ~/amica_open64]$ qconf -sp orte
> pe_name            orte
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /bin/true
> stop_proc_args     /bin/true
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
>
> The qconf mpich output is:
>
> [jason_at_juggling ~/amica_open64]$ qconf -sp mpich
> pe_name            mpich
> slots              9999
> user_lists         NONE
> xuser_lists        NONE
> start_proc_args    /opt/gridengine/mpi/startmpi.sh -catch_rsh $pe_hostfile
> stop_proc_args     /opt/gridengine/mpi/stopmpi.sh
> allocation_rule    $fill_up
> control_slaves     TRUE
> job_is_first_task  FALSE
> urgency_slots      min
> accounting_summary TRUE
>
> with specific scripts for start_proc_args and stop_proc_args ...
>
> Am I missing something necessary to run openmpi under SGE?
>
> Thanks very much,
> Jason
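
On the -machinefile question in the quoted message: under SGE the granted hosts are listed in the file named by $PE_HOSTFILE, and an Open MPI build with SGE support is expected to pick up that allocation itself, so a machine file should not be needed. For debugging, it can still help to see the allocation in the "host slots=N" form that Open MPI host files use. The following is a rough sketch, assuming each line of the PE host file starts with a host name followed by a slot count; pe_to_hostfile is a hypothetical helper name.

```shell
#!/bin/bash
# pe_to_hostfile is a hypothetical helper, not part of SGE or Open MPI.
# It reads an SGE PE host file and prints one "host slots=N" line per entry.
# Usage: pe_to_hostfile <pe_hostfile>
pe_to_hostfile() {
    local pe_file="$1"
    # Assumed columns: host name, slot count, then queue and processor
    # range, which are ignored here. Lines with fewer than two fields
    # are skipped.
    awk 'NF >= 2 { printf "%s slots=%s\n", $1, $2 }' "$pe_file"
}
```

Inside a job script, `pe_to_hostfile "$PE_HOSTFILE"` would print the hosts and slot counts granted to the job, which can be compared with the -np value passed to mpirun.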