
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Newbie: Using hostfile
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-12-01 06:40:36


FWIW, orterun is exactly the same as mpirun (one is a symlink to the
other).
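
A quick way to confirm that on your install (exact paths depend on
your prefix) is something like:

    ls -l $(which orterun) $(which mpirun)

One of the two should show up as a symlink to the other.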

This smacks of having a mismatch of Open MPI versions on different
nodes.

Can you verify that the default version of Open MPI being found on
all of your nodes is the same?
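
Something along these lines (host names taken from your hostfile;
adjust for your environment) would show what each node picks up over
a non-interactive ssh:

    for h in lynx puma tiger ; do
        echo "== $h =="
        ssh $h 'which mpirun ; ompi_info | grep "Open MPI:"'
    done

If the paths or version strings differ between the nodes, that would
line up with the unpack errors you are seeing.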

On Nov 30, 2007, at 12:01 AM, Madireddy Samuel Vijaykumar wrote:

> Our application does not appear to use mpirun at all. But we have
> "orterun", so I just tested it by running
>
> orterun --hostfile <hostfile> hostname
>
> and it prints out this ...
>
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file dss/dss_unpack.c at line 90
> [lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
> in file gpr_replica_cmd_processor.c at line 361
>
> and it just stays/hangs there :(
>
> On Nov 29, 2007 6:07 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>> On Nov 29, 2007, at 2:09 AM, Madireddy Samuel Vijaykumar wrote:
>>
>>> A non-MPI application does run without any issues. Could you
>>> elaborate on what you mean by doing mpirun "hostname"? You mean I
>>> just do an 'mpirun lynx' in my case?
>>
>> No, I mean
>>
>> mpirun --hostfile <your_hostfile> hostname
>>
>> This should run the "hostname" command on each of your nodes. If
>> running "hostname" doesn't work after changing the order, then
>> something is very wrong. If it *does* work, it implies that
>> something is faulty in the MPI startup (which is more complicated
>> than starting up non-MPI applications).
>>
>>
>>>
>>> On Nov 28, 2007 9:57 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
>>>> Well, that's odd.
>>>>
>>>> What happens if you try to mpirun "hostname" (i.e., a non-MPI
>>>> application)? Does it run, or does it hang?
>>>>
>>>>
>>>>
>>>> On Nov 23, 2007, at 6:00 AM, Madireddy Samuel Vijaykumar wrote:
>>>>
>>>>> I have been using clusters for some tests. My localhost is
>>>>> "lynx", and I have "puma" and "tiger", which make up the cluster.
>>>>> All have passwordless ssh enabled. Now, if I have the following in
>>>>> my hostfile (one host per line, in this order)
>>>>>
>>>>> lynx
>>>>> puma
>>>>> tiger
>>>>>
>>>>> my tests (from lynx) run over the cluster without any issues.
>>>>>
>>>>> But if I move or remove lynx, so that the hostfile is either (one
>>>>> host per line, in this order)
>>>>>
>>>>> puma
>>>>> lynx
>>>>> tiger
>>>>>
>>>>> or
>>>>>
>>>>> puma
>>>>> tiger
>>>>>
>>>>> then my test (from lynx) just does not get anywhere. It just
>>>>> hangs and does not proceed at all. Is this an issue with the way
>>>>> my script handles the cluster nodes, or is there a particular
>>>>> ordering that the hostfile requires? Thanks.
>>>>>
>>>>> --
>>>>> Sam aka Vijju
>>>>> :)~
>>>>> Linux: Open, True and Cool
>>>>
>>>>
>>>> --
>>>> Jeff Squyres
>>>> Cisco Systems
>>>>
>>>>
>>>
>>>
>>>
>>> --
>>> Sam aka Vijju
>>> :)~
>>> Linux: Open, True and Cool
>>
>>
>> --
>> Jeff Squyres
>> Cisco Systems
>>
>>
>
>
>
> --
> Sam aka Vijju
> :)~
> Linux: Open, True and Cool

-- 
Jeff Squyres
Cisco Systems