Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] -hostfile ignored in 1.6.1 / SGE integration broken
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-09-03 18:07:43


On Sep 3, 2012, at 2:40 PM, Reuti <reuti_at_[hidden]> wrote:

> Hi Ralph,
>
> On 03.09.2012 at 23:34, Ralph Castain wrote:
>
>>
>> On Sep 3, 2012, at 2:12 PM, Reuti <reuti_at_[hidden]> wrote:
>>
>>> Hi all,
>>>
>>> I just compiled Open MPI 1.6.1, and before digging any deeper: has anyone else noticed that the command:
>>>
>>> $ mpiexec -n 4 -machinefile mymachines ./mpihello
>>>
>>> will always ignore the argument "-machinefile mymachines" and use the file "openmpi-default-hostfile" instead?
>>
>> Try setting "-mca orte_default_hostfile mymachines" instead.
>
> Is this a known bug that will be fixed, or is this the new syntax?

I'm leaning towards fixing it - it came about through discussions on how to handle a hostfile when there is an allocation. For now, though, the orte_default_hostfile setting should work.
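
Applied to the command from the original report, the workaround might look like the sketch below. The hostfile name "mymachines" and the binary "./mpihello" are taken from that command; the guard around the launch is my own addition so that the snippet does nothing harmful where mpiexec or the hostfile is missing.

```shell
# Workaround sketch: name the hostfile via the MCA parameter
# instead of -machinefile.
hostfile=mymachines
cmd="mpiexec -n 4 -mca orte_default_hostfile $hostfile ./mpihello"
echo "$cmd"

# Only launch when mpiexec and the hostfile are actually present.
if command -v mpiexec >/dev/null 2>&1 && [ -f "$hostfile" ]; then
    $cmd
fi
```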

>
>
>>> ==
>>>
>>> SGE issue
>>>
>>> I usually don't install new versions right away, so I only noticed just now that in 1.4.5 I get a wrong allocation inside SGE (always one process fewer than requested with `qsub -pe orted N ...`). I only tried this because with 1.6.1 I get:
>>>
>>> --------------------------------------------------------------------------
>>> There are no nodes allocated to this job.
>>> --------------------------------------------------------------------------
>>>
>>> all the time.
>>
>> Weird - I'm not sure I understand what you are saying. Is this happening with 1.6.1 as well? Or just with 1.4.5?
>
> 1.6.1 = no nodes allocated
> 1.4.5 = one process less than requested
> 1.4.1 = works as it should
>

Well, that seems strange! Can you run 1.6.1 with the following on the mpirun command line:

-mca ras_gridengine_debug 1 -mca ras_gridengine_verbose 10 -mca ras_base_verbose 10
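
Spelled out as a full command, that might look like the sketch below. The process count and "./mpihello" are borrowed from the earlier example and are only placeholders, and the log file name "ras-debug.log" is hypothetical; adjust all three to your job.

```shell
# The three debug/verbosity settings requested above.
debug_flags="-mca ras_gridengine_debug 1 -mca ras_gridengine_verbose 10 -mca ras_base_verbose 10"
cmd="mpiexec $debug_flags -n 4 ./mpihello"
echo "$cmd"

# Only launch when mpiexec and the test binary are present;
# keep a copy of the output so it can be posted to the list.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./mpihello ]; then
    $cmd 2>&1 | tee ras-debug.log
fi
```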

My guess is that something in the pe_hostfile syntax may have changed and we didn't pick up on it.
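
It may also help to look at the pe_hostfile itself from inside the job. The sketch below assumes the usual layout of one line per host with the slot count in the second column, and it relies on SGE exporting PE_HOSTFILE and NSLOTS in the job environment; the helper name count_slots is made up for illustration.

```shell
# Sum the slot column (2nd field) of an SGE pe_hostfile.
count_slots() {
    awk '{ total += $2 } END { print total + 0 }' "$1"
}

# Inside an SGE job, PE_HOSTFILE points at the granted allocation.
# Outside a job this block is skipped.
if [ -n "${PE_HOSTFILE:-}" ] && [ -f "$PE_HOSTFILE" ]; then
    cat "$PE_HOSTFILE"
    echo "slots granted: $(count_slots "$PE_HOSTFILE") (NSLOTS=${NSLOTS:-unset})"
fi
```

If the summed slots match NSLOTS but mpirun still starts one process fewer (1.4.5) or reports no nodes (1.6.1), that would point at how the file is parsed rather than at the allocation.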

> -- Reuti
>
>
>>
>>>
>>> ==
>>>
>>> I configured with:
>>>
>>> ./configure --prefix=$HOME/local/... --enable-static --disable-shared --with-sge
>>>
>>> and adjusted my PATHs accordingly (at least: I hope so).
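
A quick way to double-check the PATH and the SGE support of the build is sketched below. It only relies on ompi_info listing the compiled-in MCA components; the helper name mpiexec_dir is made up for illustration, and the install prefix is deliberately not hard-coded since it is elided above.

```shell
# Print the directory of the mpiexec found first on PATH (nothing if none).
mpiexec_dir() {
    p=$(command -v mpiexec 2>/dev/null) || return 0
    dirname "$p"
}
echo "mpiexec found in: $(mpiexec_dir)"

# A "gridengine" line in the ompi_info output indicates that the
# SGE components were built in.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info | grep gridengine || echo "no gridengine component listed"
fi
```
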
>>>
>>> -- Reuti
>>> _______________________________________________
>>> users mailing list
>>> users_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/users