
Open MPI User's Mailing List Archives


From: Daniel Gruner (dgruner_at_[hidden])
Date: 2007-04-27 10:05:31


Thanks to both you and David Gunter. I disabled pty support and
it now works.

There is still the issue of the mpirun default being "-byslot", which
still fails with the endless "OOB: Connection to HNP lost" messages
described below. Only by using "-bynode" do things work properly.
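
For the archives, what works for me now amounts to roughly the
following (the configure options besides --disable-pty-support are
whatever you normally use; cpip is just my compiled cpi.c test binary):

  ./configure --disable-pty-support [your usual options]
  make all install
  mpirun -bynode -n 2 ./cpip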

Daniel

On Thu, Apr 26, 2007 at 02:28:33PM -0600, gshipman wrote:
> There is a known issue on BProc 4 w.r.t. pty support. Open MPI by
> default will try to use ptys for I/O forwarding but will revert to
> pipes if ptys are not available.
>
> You can "safely" ignore the pty warnings, or you may want to rerun
> configure and add:
> --disable-pty-support
>
> I say "safely" because my understanding is that some I/O data may be
> lost if pipes are used during abnormal termination.
>
> Alternatively, you might try getting pty support working; you would
> need to configure ptys on the backend nodes. You can then try the
> following code to test whether it is working correctly. If this
> fails (it does on our BProc 4 cluster), you shouldn't use ptys on
> BProc.
>
>
> /* Minimal test of openpty(): prints whether the call succeeds. */
> #include <pty.h>      /* openpty() */
> #include <utmp.h>
> #include <stdio.h>
> #include <string.h>   /* strerror() */
> #include <errno.h>
>
> int
> main(int argc, char *argv[])
> {
>     int amaster, aslave;
>
>     if (openpty(&amaster, &aslave, NULL, NULL, NULL) < 0) {
>         printf("openpty() failed with errno = %d, %s\n",
>                errno, strerror(errno));
>     } else {
>         printf("openpty() succeeded\n");
>     }
>
>     return 0;
> }
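>
> To build and run it, something along these lines should do (the file
> name is arbitrary; on Linux, openpty() lives in libutil, hence the
> -lutil):
>
> cc -o ptytest ptytest.c -lutil
> bpsh <nodenumber> ./ptytest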
>
> On Apr 26, 2007, at 2:06 PM, Daniel Gruner wrote:
>
> > Hi
> >
> > I have been testing Open MPI 1.2, and now 1.2.1, on several BProc-
> > based clusters, and I have found some problems/issues. All my
> > clusters have standard Ethernet interconnects, either 100Base-T or
> > Gigabit, on standard switches.
> >
> > The clusters are all running Clustermatic 5 (BProc 4.x), and range
> > from 32-bit Athlon, to 32-bit Xeon, to 64-bit Opteron. In all cases
> > the same problems occur, identically. I attach here the results
> > from "ompi_info --all" and the config.log, for my latest build on
> > an Opteron cluster, using the Pathscale compilers. I had exactly
> > the same problems when using the vanilla GNU compilers.
> >
> > Now for a description of the problem:
> >
> > When running an MPI code (cpi.c, from the standard MPI examples,
> > also attached), using the mpirun defaults (e.g. -byslot), with a
> > single process:
> >
> > sonoma:dgruner{134}> mpirun -n 1 ./cpip
> > [n17:30019] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > pi is approximately 3.1415926544231341, Error is 0.0000000008333410
> > wall clock time = 0.000199
> >
> > However, if one tries to run more than one process, this bombs:
> >
> > sonoma:dgruner{134}> mpirun -n 2 ./cpip
> > .
> > .
> > .
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > [n21:30029] OOB: Connection to HNP lost
> > .
> > . ad infinitum
> >
> > If one uses the option "-bynode", things work:
> >
> > sonoma:dgruner{145}> mpirun -bynode -n 2 ./cpip
> > [n17:30055] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > Process 1 on n21
> > pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> > wall clock time = 0.010375
> >
> >
> > Note that there is always the message about "openpty failed, using
> > pipes instead".
> >
> > If I run more processes (on my 3-node cluster, with 2 cpus per
> > node), the openpty message appears repeatedly for the first node:
> >
> > sonoma:dgruner{146}> mpirun -bynode -n 6 ./cpip
> > [n17:30061] odls_bproc: openpty failed, using pipes instead
> > [n17:30061] odls_bproc: openpty failed, using pipes instead
> > Process 0 on n17
> > Process 2 on n49
> > Process 1 on n21
> > Process 5 on n49
> > Process 3 on n17
> > Process 4 on n21
> > pi is approximately 3.1415926544231239, Error is 0.0000000008333307
> > wall clock time = 0.050332
> >
> >
> > Should I worry about the openpty failure? I suspect that
> > communications may be slower this way. Using the -byslot option
> > always fails, so this is a bug. The same occurs for all the codes
> > that I have tried, both simple and complex.
> >
> > Thanks for your attention to this.
> > Regards,
> > Daniel
> > --
> >
> > Dr. Daniel Gruner dgruner_at_[hidden]
> > Dept. of Chemistry daniel.gruner_at_[hidden]
> > University of Toronto phone: (416)-978-8689
> > 80 St. George Street fax: (416)-978-5325
> > Toronto, ON M5S 3H6, Canada finger for PGP public key
> > <cpi.c.gz>
> > <config.log.gz>
> > <ompiinfo.gz>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Dr. Daniel Gruner                        dgruner_at_[hidden]
Dept. of Chemistry                       daniel.gruner_at_[hidden]
University of Toronto                    phone:  (416)-978-8689
80 St. George Street                     fax:    (416)-978-5325
Toronto, ON  M5S 3H6, Canada             finger for PGP public key