
Subject: Re: [OMPI users] TCP instead of openIB doesn't work
From: Vittorio Giovara (vitto.giova_at_[hidden])
Date: 2009-02-27 17:46:03


Hello, I've corrected the syntax and added the flag you suggested, but
unfortunately the result doesn't change.

randori ~ # mpirun --display-map --mca btl tcp,self -np 2 -host randori,tatami graph
[randori:22322] Map for job: 1 Generated by mapping mode: byslot
     Starting vpid: 0 Vpid range: 2 Num app_contexts: 1
     Data for app_context: index 0 app: graph
         Num procs: 2
         Argv[0]: graph
         Env[0]: OMPI_MCA_btl=tcp,self
         Env[1]: OMPI_MCA_rmaps_base_display_map=1
         Env[2]: OMPI_MCA_orte_precondition_transports=d45d47f6e1ed0e0b-691fd7f24609dec3
         Env[3]: OMPI_MCA_rds=proxy
         Env[4]: OMPI_MCA_ras=proxy
         Env[5]: OMPI_MCA_rmaps=proxy
         Env[6]: OMPI_MCA_pls=proxy
         Env[7]: OMPI_MCA_rmgr=proxy
         Working dir: /root (user: 0)
         Num maps: 1
         Data for app_context_map: Type: 1 Data: randori,tatami
     Num elements in nodes list: 2
     Mapped node:
         Cell: 0 Nodename: randori Launch id: -1 Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME Data Value: NULL
         Oversubscribed: False Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME Data Value: [0,1,0]
             Proc Rank: 0 Proc PID: 0 App_context index: 0

     Mapped node:
         Cell: 0 Nodename: tatami Launch id: -1 Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME Data Value: NULL
         Oversubscribed: False Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME Data Value: [0,1,1]
             Proc Rank: 1 Proc PID: 0 App_context index: 0
Master thread reporting
matrix size 33554432 kB, time is in [us]

(and then it just hangs)
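
In case it helps narrow things down, I can also rerun the job with more
verbose BTL output. A sketch of what I have in mind (assuming the
btl_base_verbose and btl_tcp_if_include parameters are available in this
1.2.8 build, and with eth0 only as a placeholder for the real interface)
would be:

randori ~ # mpirun --mca btl tcp,self --mca btl_base_verbose 30 \
    --mca btl_tcp_if_include eth0 -np 2 -host randori,tatami graph

That should at least show what the TCP BTL is trying to do on each node.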

Vittorio

On Fri, Feb 27, 2009 at 6:00 PM, <users-request_at_[hidden]> wrote:

>
> Date: Fri, 27 Feb 2009 08:22:17 -0700
> From: Ralph Castain <rhc_at_[hidden]>
> Subject: Re: [OMPI users] TCP instead of openIB doesn't work
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <E3C4683C-1F97-4558-AB68-006E39A8334B_at_[hidden]>
> Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes
>
> I'm not entirely sure what is causing the problem here, but one thing
> does stand out. You have specified two -host options for the same
> application - this is not our normal syntax. The usual way of
> specifying this would be:
>
> mpirun --mca btl tcp,self -np 2 -host randori,tatami hostname
>
> I'm not entirely sure what OMPI does when it gets two separate -host
> arguments - could be equivalent to the above syntax, but could also
> cause some unusual behavior.
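>
> For what it's worth, the same thing can also be written with a hostfile.
> A minimal sketch (assuming the default of one slot per host, and with
> "myhosts" as a placeholder name for a file listing the two hostnames,
> one per line) would be:
>
> mpirun --mca btl tcp,self -np 2 --hostfile myhosts hostname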
>
> Could you retry your job with the revised syntax? Also, could you add
> --display-map to your mpirun cmd line? This will tell us where OMPI
> thinks the procs are going, and a little info about how it interpreted
> your cmd line.
>
> Thanks
> Ralph
>
>
> On Feb 27, 2009, at 8:00 AM, Vittorio Giovara wrote:
>
> > Hello, I'm posting here another problem with my installation.
> > I wanted to benchmark the difference between the tcp and openib transports.
> >
> > If I run a simple non-MPI application, I get
> > randori ~ # mpirun --mca btl tcp,self -np 2 -host randori -host
> > tatami hostname
> > randori
> > tatami
> >
> > But as soon as I switch to my benchmark program, I get
> > mpirun --mca btl tcp,self -np 2 -host randori -host tatami graph
> > Master thread reporting
> > matrix size 33554432 kB, time is in [us]
> >
> > Instead of starting the send/receive functions it just hangs
> > there; I also checked the transmitted packets with Wireshark, but
> > after the handshake no more packets are exchanged.
> >
> > I read in the archives that there were some problems in this area,
> > so I tried what was suggested in previous emails:
> >
> > mpirun --mca btl ^openib -np 2 -host randori -host tatami graph
> > mpirun --mca pml ob1 --mca btl tcp,self -np 2 -host randori -host
> > tatami graph
> >
> > These give exactly the same output as before (no MPI send/receive),
> > while the next command gives something more interesting:
> >
> > mpirun --mca pml cm --mca btl tcp,self -np 2 -host randori -host
> > tatami graph
> >
> --------------------------------------------------------------------------
> > No available pml components were found!
> >
> > This means that there are no components of this type installed on your
> > system or all the components reported that they could not be used.
> >
> > This is a fatal error; your MPI process is likely to abort. Check the
> > output of the "ompi_info" command and ensure that components of this
> > type are available on your system. You may also wish to check the
> > value of the "component_path" MCA parameter and ensure that it has at
> > least one directory that contains valid MCA components.
> >
> >
> --------------------------------------------------------------------------
> > [tatami:06619] PML cm cannot be selected
> > mpirun noticed that job rank 0 with PID 6710 on node randori exited
> > on signal 15 (Terminated).
> >
> > which should not be possible, because if I run ompi_info --param all
> > the cm PML component is listed:
> >
> > MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
> > MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
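> >
> > (as a narrower check, assuming the usual ompi_info syntax applies to
> > this 1.2.8 install, the component can also be queried directly with
> > "ompi_info --param pml cm", which prints only the cm parameters)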
> >
> >
> > My test program is quite simple, just a couple of MPI_Send and
> > MPI_Recv calls (pasted just after my signature).
> > Do you have any ideas that might help me?
> > Thanks a lot,
> > Vittorio
> >
> > ========================
> > #include "mpi.h"
> > #include <stdio.h>
> > #include <stdlib.h>
> > #include <string.h>
> > #include <math.h>
> > #include <sys/time.h>   /* for gettimeofday() and struct timeval */
> >
> > #define M_COL 4096
> > #define M_ROW 524288
> > #define NUM_MSG 25
> >
> > unsigned long int gigamatrix[M_ROW][M_COL];
> >
> > int main (int argc, char *argv[]) {
> >     int numtasks, rank, dest, source, rc, tmp, count, tag = 1;
> >     unsigned long int exp, exchanged;
> >     unsigned long int i, j, e;
> >     unsigned long matsize;
> >     MPI_Status Stat;
> >     struct timeval timing_start, timing_end;
> >     double inittime = 0;
> >     long int totaltime = 0;
> >
> >     MPI_Init (&argc, &argv);
> >     MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
> >     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
> >
> >     if (rank == 0) {
> >         /* master: sends blocks of the matrix and times the round trips */
> >         fprintf (stderr, "Master thread reporting\n");
> >         matsize = (long) M_COL * M_ROW / 64;
> >         fprintf (stderr, "matrix size %lu kB, time is in [us]\n", matsize);
> >
> >         source = 1;
> >         dest = 1;
> >
> >         /* warm up phase */
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >
> >         for (e = 0; e < NUM_MSG; e++) {
> >             exp = pow (2, e);
> >             exchanged = 64 * exp;   /* number of unsigned longs per message */
> >
> >             /* timing of ops */
> >             gettimeofday (&timing_start, NULL);
> >             rc = MPI_Send (&gigamatrix[0], exchanged, MPI_UNSIGNED_LONG,
> >                            dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&gigamatrix[0], exchanged, MPI_UNSIGNED_LONG,
> >                            source, tag, MPI_COMM_WORLD, &Stat);
> >             gettimeofday (&timing_end, NULL);
> >
> >             totaltime = (timing_end.tv_sec - timing_start.tv_sec) * 1000000
> >                         + (timing_end.tv_usec - timing_start.tv_usec);
> >             memset (&timing_start, 0, sizeof (struct timeval));
> >             memset (&timing_end, 0, sizeof (struct timeval));
> >             fprintf (stdout, "%lu kB\t%ld\n", exp, totaltime);
> >         }
> >
> >         fprintf (stderr, "task complete\n");
> >
> >     } else {
> >         if (rank >= 1) {
> >             /* worker: mirrors the master's sends and receives */
> >             dest = 0;
> >             source = 0;
> >
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
> >             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag, MPI_COMM_WORLD, &Stat);
> >
> >             for (e = 0; e < NUM_MSG; e++) {
> >                 exp = pow (2, e);
> >                 exchanged = 64 * exp;
> >
> >                 rc = MPI_Recv (&gigamatrix[0], (unsigned) exchanged,
> >                                MPI_UNSIGNED_LONG, source, tag, MPI_COMM_WORLD, &Stat);
> >                 rc = MPI_Send (&gigamatrix[0], (unsigned) exchanged,
> >                                MPI_UNSIGNED_LONG, dest, tag, MPI_COMM_WORLD);
> >             }
> >         }
> >     }
> >
> >     MPI_Finalize ();
> >
> >     return 0;
> > }
> >
> >
> > _______________________________________________
> > users mailing list
> > users_at_[hidden]
> > http://www.open-mpi.org/mailman/listinfo.cgi/users
>