Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Newbie: Using hostfile
From: Madireddy Samuel Vijaykumar (mad.vijay_at_[hidden])
Date: 2007-11-30 00:01:24


Our application does not appear to use mpirun at all, but we do have
"orterun", so I just tested it by running

orterun --hostfile <hostfile> hostname

and it prints out this ...

[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[the two messages above repeat five more times]

and it just hangs there :(
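One thing worth ruling out (an assumption on my part, not a confirmed diagnosis): "Data unpack had inadequate space" errors are a classic symptom of mismatched Open MPI versions across the nodes. A minimal sketch of a consistency check is below; the node names are the ones from this thread, passwordless ssh is assumed, and the ssh collection step is shown only in a comment since it needs the real cluster. The version strings passed in at the bottom are made up for illustration.

```shell
# check_versions: compare "node version" pairs and report any node whose
# version string differs from the first node's.
check_versions() {
    first=""
    status=0
    for pair in "$@"; do
        ver=${pair#* }            # strip the leading "node " part
        if [ -z "$first" ]; then
            first=$ver            # first node sets the expected version
        elif [ "$ver" != "$first" ]; then
            echo "mismatch: $pair (expected $first)"
            status=1
        fi
    done
    return $status
}

# On the real cluster, one could collect the pairs with something like:
#   for n in lynx puma tiger; do
#       echo "$n $(ssh $n 'ompi_info --version | head -1')"
#   done
# (whether ompi_info accepts --version depends on the installed release;
# plain ompi_info also prints the version near the top of its output)
check_versions "lynx 1.2.4" "puma 1.2.4" "tiger 1.2.3" || echo "versions differ"
```

If every node reports the same version and the hang persists, the ordering-dependent behavior points elsewhere in the startup path.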

On Nov 29, 2007 6:07 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Nov 29, 2007, at 2:09 AM, Madireddy Samuel Vijaykumar wrote:
>
> > A non-MPI application does run without any issues. Could you
> > elaborate on what you mean by doing mpirun "hostname"? Do you mean I
> > should just do 'mpirun lynx' in my case?
>
> No, I mean
>
> mpirun --hostfile <your_hostfile> hostname
>
> This should run the "hostname" command on each of your nodes. If
> running "hostname" doesn't work after changing the order, then
> something is very wrong. If it *does* work, it implies that something
> is faulty in the MPI startup (which is more complicated than starting
> up non-MPI applications).
>
>
> >
> > On Nov 28, 2007 9:57 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >> Well, that's odd.
> >>
> >> What happens if you try to mpirun "hostname" (i.e., a non-MPI
> >> application)? Does it run, or does it hang?
> >>
> >>
> >>
> >> On Nov 23, 2007, at 6:00 AM, Madireddy Samuel Vijaykumar wrote:
> >>
> >>> I have been using clusters for some tests. My localhost is "lynx",
> >>> and I have "puma" and "tiger", which make up the cluster. All have
> >>> passwordless ssh enabled. Now, if I have the following in my
> >>> hostfile (one per line, in this order)
> >>>
> >>> lynx
> >>> puma
> >>> tiger
> >>>
> >>> my tests (from lynx) run over the cluster without any issues.
> >>>
> >>> But if I move or remove lynx from there, leaving either (one per
> >>> line, in this order)
> >>>
> >>> puma
> >>> lynx
> >>> tiger
> >>>
> >>> or
> >>>
> >>> puma
> >>> tiger
> >>>
> >>> my test (from lynx) just does not get anywhere. It simply hangs
> >>> and does not proceed at all. Is this an issue with the way my
> >>> script handles the cluster nodes, or is there a required format
> >>> for the hostfile? Thanks.
> >>>
> >>> --
> >>> Sam aka Vijju
> >>> :)~
> >>> Linux: Open, True and Cool
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
> >> _______________________________________________
> >> users mailing list
> >> users_at_[hidden]
> >> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >
> >
> >
> > --
> > Sam aka Vijju
> > :)~
> > Linux: Open, True and Cool
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>
>
> --
> Jeff Squyres
> Cisco Systems
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>

-- 
Sam aka Vijju
:)~
Linux: Open, True and Cool