Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Newbie: Using hostfile
From: Madireddy Samuel Vijaykumar (mad.vijay_at_[hidden])
Date: 2007-11-30 00:01:24


It looks like our application does not use mpirun at all. But we do have
"orterun", so I tested it by running

orterun --hostfile <hostfile> hostname

and it prints out this:

[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file dss/dss_unpack.c at line 90
[lynx:21319] [0,0,0] ORTE_ERROR_LOG: Data unpack had inadequate space
in file gpr_replica_cmd_processor.c at line 361

and then it just hangs there :(
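
A minimal sketch of how the check above could be repeated for each hostfile ordering from this thread, with a timeout so that a hang is reported instead of blocking the terminal. The helper names (write_hostfile, build_command, run_check), the 30-second timeout, and the use of Python are assumptions for illustration; only the "orterun --hostfile <hostfile> hostname" command line comes from the thread.

```python
import subprocess
import tempfile
from pathlib import Path

# Hostfile orderings discussed in this thread.
ORDERINGS = [
    ["lynx", "puma", "tiger"],
    ["puma", "lynx", "tiger"],
    ["puma", "tiger"],
]


def write_hostfile(hosts, directory):
    """Write one host per line and return the path of the new hostfile."""
    path = Path(directory) / ("hostfile_" + "_".join(hosts))
    path.write_text("\n".join(hosts) + "\n")
    return path


def build_command(hostfile, launcher="orterun"):
    """Build the check from the thread: <launcher> --hostfile <file> hostname."""
    return [launcher, "--hostfile", str(hostfile), "hostname"]


def run_check(hosts, directory, launcher="orterun", timeout_s=30):
    """Run the check; return ("hang", "") if it exceeds timeout_s seconds."""
    cmd = build_command(write_hostfile(hosts, directory), launcher)
    try:
        result = subprocess.run(
            cmd, capture_output=True, text=True, timeout=timeout_s
        )
    except subprocess.TimeoutExpired:
        # The launcher process is killed by subprocess.run before this is raised.
        return "hang", ""
    except FileNotFoundError:
        return "launcher not found", ""
    status = "ok" if result.returncode == 0 else "failed"
    return status, result.stdout + result.stderr


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        for hosts in ORDERINGS:
            status, output = run_check(hosts, tmp)
            print(hosts, "->", status)
            print(output, end="")
```

Passing launcher="mpirun" runs the same check through mpirun. A timeout only kills the local launcher process, so anything it already started on the remote nodes may need to be cleaned up by hand.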

On Nov 29, 2007 6:07 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> On Nov 29, 2007, at 2:09 AM, Madireddy Samuel Vijaykumar wrote:
>
> > A non-MPI application does run without any issues. Could you elaborate on
> > what you mean by doing mpirun "hostname"? Do you mean I just do an
> > 'mpirun lynx' in my case?
>
> No, I mean
>
> mpirun --hostfile <your_hostfile> hostname
>
> This should run the "hostname" command on each of your nodes. If
> running "hostname" doesn't work after changing the order, then
> something is very wrong. If it *does* work, it implies that something
> is faulty in the MPI startup (which is more complicated than
> starting up non-MPI applications).
>
>
> >
> > On Nov 28, 2007 9:57 PM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> >> Well, that's odd.
> >>
> >> What happens if you try to mpirun "hostname" (i.e., a non-MPI
> >> application)? Does it run, or does it hang?
> >>
> >>
> >>
> >> On Nov 23, 2007, at 6:00 AM, Madireddy Samuel Vijaykumar wrote:
> >>
> >>> I have been using clusters for some tests. My localhost is "lynx",
> >>> and I have "puma" and "tiger", which make up the cluster. All have
> >>> passwordless ssh enabled. Now if I have the following in my
> >>> hostfile (one per line, in this order)
> >>>
> >>> lynx
> >>> puma
> >>> tiger
> >>>
> >>> My tests (from lynx) run over the cluster without any issues.
> >>>
> >>> But if I move or remove lynx, so that the hostfile is either (one per
> >>> line, in this order)
> >>>
> >>> puma
> >>> lynx
> >>> tiger
> >>>
> >>> or
> >>>
> >>> puma
> >>> tiger
> >>>
> >>> My test (from lynx) just does not get anywhere. It hangs and does not
> >>> proceed at all. Is this an issue with the way my script handles the
> >>> cluster nodes, or is there a particular way the hostfile must be
> >>> ordered? Thanks.
> >>>
> >>> --
> >>> Sam aka Vijju
> >>> :)~
> >>> Linux: Open, True and Cool
> >>> _______________________________________________
> >>> users mailing list
> >>> users_at_[hidden]
> >>> http://www.open-mpi.org/mailman/listinfo.cgi/users
> >>
> >>
> >> --
> >> Jeff Squyres
> >> Cisco Systems
> >>
> >>
> >
> >
> >
> > --
> > Sam aka Vijju
> > :)~
> > Linux: Open, True and Cool
>
>
> --
> Jeff Squyres
> Cisco Systems
>
>

-- 
Sam aka Vijju
:)~
Linux: Open, True and Cool