
Open MPI User's Mailing List Archives



This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.


Subject: Re: [OMPI users] Spawn_multiple with tight integration to SGE grid engine
From: Reuti (reuti_at_[hidden])
Date: 2012-02-01 09:58:41

On 01.02.2012 at 15:38, Ralph Castain wrote:

> On Feb 1, 2012, at 3:49 AM, Reuti wrote:
>> On 31.01.2012 at 21:25, Ralph Castain wrote:
>>> On Jan 31, 2012, at 12:58 PM, Reuti wrote:
>> BTW: is there any default hostfile for Open MPI, i.e. one in my home directory or in /etc? When I check `man orte_hosts` and all possible options are unset (as in a singleton run), it only runs locally (the job is co-located with mpirun).
> Yep - it is <prefix>/etc/openmpi-default-hostfile

Thx for replying Ralph.

I spotted it too, but it isn't working for me, neither for mpiexec from the command line nor for a singleton. I also tried a plain /etc as the location of this file.

reuti_at_pc15370:~> which mpicc
reuti_at_pc15370:~> cat /home/reuti/local/openmpi-1.4.4-thread/etc/openmpi-default-hostfile
pc15370 slots=2
pc15381 slots=2
reuti_at_pc15370:~> mpicc -o mpihello mpihello.c
reuti_at_pc15370:~> mpiexec -np 4 ./mpihello
Hello World from Node 0.
Hello World from Node 1.
Hello World from Node 2.
Hello World from Node 3.

But everything runs locally (no spawn here, just the traditional mpihello):

19503 ? Ss 0:00 /usr/sbin/sshd -o PidFile=/var/run/
11583 ? Ss 0:00 \_ sshd: reuti [priv]
11585 ? S 0:00 | \_ sshd: reuti_at_pts/6
11587 pts/6 Ss 0:00 | \_ -bash
13470 pts/6 S+ 0:00 | \_ mpiexec -np 4 ./mpihello
13471 pts/6 R+ 0:00 | \_ ./mpihello
13472 pts/6 R+ 0:00 | \_ ./mpihello
13473 pts/6 R+ 0:00 | \_ ./mpihello
13474 pts/6 R+ 0:00 | \_ ./mpihello

-- Reuti

>>> We probably aren't correctly marking the original singleton on that node, and so the mapper thinks there are still two slots available on the original node.
>> Okay. There is something to discuss/fix. BTW: if started as a singleton, I get an error at the end with the program the OP provided:
>> [pc15381:25502] [[12435,0],1] routed:binomial: Connection to lifeline [[12435,0],0] lost
> Okay, I'll take a look at it, but it may take a while before I can address either issue as other priorities loom.
>> This does not happen when it is run by mpiexec.
>> -- Reuti
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]