
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Ralph Castain (rhc_at_[hidden])
Date: 2012-02-01 11:31:21


Ah - crud. Looks like the default-hostfile MCA param isn't getting set to its default value. Will resolve - thanks!

On Feb 1, 2012, at 9:28 AM, Reuti wrote:

> Am 01.02.2012 um 17:16 schrieb Ralph Castain:
>
>> Could you add --display-allocation to your cmd line? This will tell us if it found/read the default hostfile, or if the problem is with the mapper.
>
> Sure:
>
> reuti_at_pc15370:~> mpiexec --display-allocation -np 4 ./mpihello
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: Name: pc15370 Num slots: 1 Max slots: 0
>
> =================================================================
> Hello World from Node 0.
> Hello World from Node 1.
> Hello World from Node 2.
> Hello World from Node 3.
>
> (Nothing in `strace` about accessing something with "default")
>
>
> reuti_at_pc15370:~> mpiexec --default-hostfile local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile --display-allocation -np 4 ./mpihello
>
> ====================== ALLOCATED NODES ======================
>
> Data for node: Name: pc15370 Num slots: 2 Max slots: 0
> Data for node: Name: pc15381 Num slots: 2 Max slots: 0
>
> =================================================================
> Hello World from Node 0.
> Hello World from Node 3.
> Hello World from Node 2.
> Hello World from Node 1.
>
> Specifying it explicitly works fine, with the correct distribution visible in `ps`.
>
> -- Reuti
>
>
>> On Feb 1, 2012, at 7:58 AM, Reuti wrote:
>>
>>> Am 01.02.2012 um 15:38 schrieb Ralph Castain:
>>>
>>>> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>>>>
>>>>> Am 31.01.2012 um 21:25 schrieb Ralph Castain:
>>>>>
>>>>>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>>>>>
>>>>> BTW: is there any default hostfile for Open MPI - I mean one in my home directory or in /etc? When I check `man orte_hosts`, and all possible options are unset (like in a singleton run), it will only run locally (the job is co-located with mpirun).
>>>>
>>>> Yep - it is <prefix>/etc/openmpi-default-hostfile
>>>
>>> Thx for replying, Ralph.
>>>
>>> I spotted it too, but it is not working for me - neither for mpiexec from the command line, nor for a singleton. I also tried plain /etc as the location of this file.
>>>
>>> reuti_at_pc15370:~> which mpicc
>>> /home/reuti/local/openmpi-1.4.4-thread/bin/mpicc
>>> reuti_at_pc15370:~> cat /home/reuti/local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile
>>> pc15370 slots=2
>>> pc15381 slots=2
>>> reuti_at_pc15370:~> mpicc -o mpihello mpihello.c
>>> reuti_at_pc15370:~> mpiexec -np 4 ./mpihello
>>> Hello World from Node 0.
>>> Hello World from Node 1.
>>> Hello World from Node 2.
>>> Hello World from Node 3.
>>>
>>> But everything runs locally (no spawn here, just the traditional mpihello):
>>>
>>> 19503 ? Ss 0:00 /usr/sbin/sshd -o PidFile=/var/run/sshd.init.pid
>>> 11583 ? Ss 0:00 \_ sshd: reuti [priv]
>>> 11585 ? S 0:00 | \_ sshd: reuti_at_pts/6
>>> 11587 pts/6 Ss 0:00 | \_ -bash
>>> 13470 pts/6 S+ 0:00 | \_ mpiexec -np 4 ./mpihello
>>> 13471 pts/6 R+ 0:00 | \_ ./mpihello
>>> 13472 pts/6 R+ 0:00 | \_ ./mpihello
>>> 13473 pts/6 R+ 0:00 | \_ ./mpihello
>>> 13474 pts/6 R+ 0:00 | \_ ./mpihello
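>>>
>>> (For completeness: mpihello.c is just the usual MPI hello world, roughly like this - a minimal sketch, not necessarily the exact source:)
>>>
>>> #include <stdio.h>
>>> #include <mpi.h>
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int rank;
>>>
>>>     MPI_Init(&argc, &argv);                 /* start the MPI runtime */
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);   /* rank in MPI_COMM_WORLD */
>>>     printf("Hello World from Node %d.\n", rank);
>>>     MPI_Finalize();
>>>     return 0;
>>> }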
>>>
>>> -- Reuti
>>>
>>>
>>>>>> We probably aren't correctly marking the original singleton on that node, and so the mapper thinks there are still two slots available on the original node.
>>>>>
>>>>> Okay. There is something to discuss/fix. BTW: if started as a singleton I get an error at the end with the program the OP provided (roughly sketched below):
>>>>>
>>>>> [pc15381:25502] [[12435,0],1] routed:binomial: Connection to lifeline [[12435,0],0] lost
>>>>
>>>> Okay, I'll take a look at it - but it may take a while before I can address either issue, as other priorities loom.
>>>>
>>>>>
>>>>> It's not the case if run by mpiexec.
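>>>>>
>>>>> (For reference, the kind of singleton spawn test meant here is roughly the following - a minimal sketch only; the file name, commands, and counts are made up and this is not the OP's exact program:)
>>>>>
>>>>> /* spawn_test.c - hypothetical sketch of a singleton parent that
>>>>>    spawns workers with MPI_Comm_spawn_multiple. */
>>>>> #include <stdio.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> int main(int argc, char **argv)
>>>>> {
>>>>>     MPI_Comm parent, children;
>>>>>     int rank;
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_get_parent(&parent);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>
>>>>>     if (parent == MPI_COMM_NULL) {
>>>>>         /* No parent: this is the original process, possibly started
>>>>>            as a singleton (e.g. ./spawn_test without mpiexec).
>>>>>            Spawn two groups of two copies of this binary; with a
>>>>>            hostfile or an SGE allocation they should be mapped
>>>>>            across the listed nodes. */
>>>>>         char *cmds[2]     = { argv[0], argv[0] };
>>>>>         int   maxprocs[2] = { 2, 2 };
>>>>>         MPI_Info infos[2] = { MPI_INFO_NULL, MPI_INFO_NULL };
>>>>>
>>>>>         MPI_Comm_spawn_multiple(2, cmds, MPI_ARGVS_NULL, maxprocs,
>>>>>                                 infos, 0, MPI_COMM_SELF, &children,
>>>>>                                 MPI_ERRCODES_IGNORE);
>>>>>         printf("Parent: spawned 2 x 2 children.\n");
>>>>>     } else {
>>>>>         printf("Child %d running.\n", rank);
>>>>>     }
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }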
>>>>>
>>>>> -- Reuti