Hello, I've corrected the syntax and added the flag you suggested, but unfortunately the result doesn't change.

randori ~ # mpirun --display-map --mca btl tcp,self  -np 2 -host randori,tatami graph
[randori:22322]  Map for job: 1    Generated by mapping mode: byslot
     Starting vpid: 0    Vpid range: 2    Num app_contexts: 1
     Data for app_context: index 0    app: graph
         Num procs: 2
         Argv[0]: graph
         Env[0]: OMPI_MCA_btl=tcp,self
         Env[1]: OMPI_MCA_rmaps_base_display_map=1
         Env[2]: OMPI_MCA_orte_precondition_transports=d45d47f6e1ed0e0b-691fd7f24609dec3
         Env[3]: OMPI_MCA_rds=proxy
         Env[4]: OMPI_MCA_ras=proxy
         Env[5]: OMPI_MCA_rmaps=proxy
         Env[6]: OMPI_MCA_pls=proxy
         Env[7]: OMPI_MCA_rmgr=proxy
         Working dir: /root (user: 0)
         Num maps: 1
         Data for app_context_map: Type: 1    Data: randori,tatami
     Num elements in nodes list: 2
     Mapped node:
         Cell: 0    Nodename: randori    Launch id: -1    Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME    Data Value: NULL
         Oversubscribed: False    Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME    Data Value: [0,1,0]
             Proc Rank: 0    Proc PID: 0    App_context index: 0

     Mapped node:
         Cell: 0    Nodename: tatami    Launch id: -1    Username: NULL
         Daemon name:
             Data type: ORTE_PROCESS_NAME    Data Value: NULL
         Oversubscribed: False    Num elements in procs list: 1
         Mapped proc:
             Proc Name:
             Data type: ORTE_PROCESS_NAME    Data Value: [0,1,1]
             Proc Rank: 1    Proc PID: 0    App_context index: 0
Master thread reporting
matrix size 33554432 kB, time is in [us]

(and then it just hangs)

Vittorio

On Fri, Feb 27, 2009 at 6:00 PM, <users-request@open-mpi.org> wrote:

Date: Fri, 27 Feb 2009 08:22:17 -0700
From: Ralph Castain <rhc@lanl.gov>
Subject: Re: [OMPI users] TCP instead of openIB doesn't work
To: Open MPI Users <users@open-mpi.org>
Message-ID: <E3C4683C-1F97-4558-AB68-006E39A8334B@lanl.gov>
Content-Type: text/plain; charset=US-ASCII; format=flowed; delsp=yes

I'm not entirely sure what is causing the problem here, but one thing
does stand out. You have specified two -host options for the same
application - this is not our normal syntax. The usual way of
specifying this would be:

mpirun  --mca btl tcp,self  -np 2 -host randori,tatami hostname

I'm not entirely sure what OMPI does when it gets two separate -host
arguments - could be equivalent to the above syntax, but could also
cause some unusual behavior.

Could you retry your job with the revised syntax? Also, could you add
--display-map to your mpirun cmd line? This will tell us where OMPI
thinks the procs are going, and a little info about how it interpreted
your cmd line.

Thanks
Ralph


On Feb 27, 2009, at 8:00 AM, Vittorio Giovara wrote:

> Hello, I'm posting another problem with my installation here.
> I wanted to benchmark the differences between the tcp and openib
> transports.
>
> If I run a simple non-MPI application I get
> randori ~ # mpirun  --mca btl tcp,self  -np 2 -host randori -host
> tatami hostname
> randori
> tatami
>
> but as soon as I switch to my benchmark program I get
> mpirun  --mca btl tcp,self  -np 2 -host randori -host tatami graph
> Master thread reporting
> matrix size 33554432 kB, time is in [us]
>
> and instead of starting the send/receive functions it just hangs
> there; I also checked the transmitted packets with wireshark, but
> after the handshake no more packets are exchanged.
>
> I read in the archives that there were some problems in this area,
> so I tried what was suggested in previous emails:
>
> mpirun --mca btl ^openib  -np 2 -host randori -host tatami graph
> mpirun --mca pml ob1  --mca btl tcp,self  -np 2 -host randori -host
> tatami graph
>
> give exactly the same output as before (no MPI send/receive),
> while the next command gives something more interesting:
>
> mpirun --mca pml cm  --mca btl tcp,self  -np 2 -host randori -host
> tatami graph
> --------------------------------------------------------------------------
> No available pml components were found!
>
> This means that there are no components of this type installed on your
> system or all the components reported that they could not be used.
>
> This is a fatal error; your MPI process is likely to abort.  Check the
> output of the "ompi_info" command and ensure that components of this
> type are available on your system.  You may also wish to check the
> value of the "component_path" MCA parameter and ensure that it has at
> least one directory that contains valid MCA components.
>
> --------------------------------------------------------------------------
> [tatami:06619] PML cm cannot be selected
> mpirun noticed that job rank 0 with PID 6710 on node randori exited
> on signal 15 (Terminated).
>
> which should not be possible, because ompi_info --param all lists the
> cm pml component:
>
>                  MCA pml: cm (MCA v1.0, API v1.0, Component v1.2.8)
>                  MCA pml: ob1 (MCA v1.0, API v1.0, Component v1.2.8)
>
>
> My test program is quite simple, just a couple of MPI_Send and
> MPI_Recv calls (the source follows my signature).
> Do you have any ideas that might help me?
> Thanks a lot
> Vittorio
>
> ========================
> #include "mpi.h"
> #include <stdio.h>
> #include <stdlib.h>
> #include <string.h>
> #include <math.h>
> #include <sys/time.h>   /* gettimeofday, struct timeval */
>
> #define M_COL 4096
> #define M_ROW 524288
> #define NUM_MSG 25
>
> unsigned long int  gigamatrix[M_ROW][M_COL];
>
> int main (int argc, char *argv[]) {
>     int numtasks, rank, dest, source, rc, tmp = 0, count, tag=1;
>     unsigned long int  exp, exchanged;
>     unsigned long int i, j, e;
>     unsigned long matsize;
>     MPI_Status Stat;
>     struct timeval timing_start, timing_end;
>     double inittime = 0;
>     long int totaltime = 0;
>
>     MPI_Init (&argc, &argv);
>     MPI_Comm_size (MPI_COMM_WORLD, &numtasks);
>     MPI_Comm_rank (MPI_COMM_WORLD, &rank);
>
>
>     if (rank == 0) {
>         fprintf (stderr, "Master thread reporting\n");
>         matsize = (long) M_COL * M_ROW / 64;
>         /* matsize is unsigned long, so print it with %lu */
>         fprintf (stderr, "matrix size %lu kB, time is in [us]\n",
>                  matsize);
>
>         source = 1;
>         dest = 1;
>
>         /*warm up phase*/
>         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>         rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>         rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag, MPI_COMM_WORLD);
>
>         for (e = 0; e < NUM_MSG; e++) {
>             exp = pow (2, e);
>             exchanged = 64 * exp;
>
>             /*timing of ops*/
>             gettimeofday (&timing_start, NULL);
>             rc = MPI_Send (&gigamatrix[0], exchanged,
> MPI_UNSIGNED_LONG, dest, tag, MPI_COMM_WORLD);
>             rc = MPI_Recv (&gigamatrix[0], exchanged,
> MPI_UNSIGNED_LONG, source, tag, MPI_COMM_WORLD, &Stat);
>             gettimeofday (&timing_end, NULL);
>
>             totaltime = (timing_end.tv_sec - timing_start.tv_sec) *
> 1000000 + (timing_end.tv_usec - timing_start.tv_usec);
>             memset (&timing_start, 0, sizeof(struct timeval));
>             memset (&timing_end, 0, sizeof(struct timeval));
>             /* exp is unsigned long, totaltime is long */
>             fprintf (stdout, "%lu kB\t%ld\n", exp, totaltime);
>         }
>
>         fprintf(stderr, "task complete\n");
>
>     } else {
>         if (rank >= 1) {
>             dest = 0;
>             source = 0;
>
>             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag,
> MPI_COMM_WORLD);
>             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>             rc = MPI_Send (&tmp, 1, MPI_INT, dest, tag,
> MPI_COMM_WORLD);
>             rc = MPI_Recv (&tmp, 1, MPI_INT, source, tag,
> MPI_COMM_WORLD, &Stat);
>
>             for (e = 0; e < NUM_MSG; e++) {
>                 exp = pow (2, e);
>                 exchanged = 64 * exp;
>
>                 rc = MPI_Recv (&gigamatrix[0], (unsigned)
> exchanged, MPI_UNSIGNED_LONG, source, tag, MPI_COMM_WORLD, &Stat);
>                 rc = MPI_Send (&gigamatrix[0], (unsigned)
> exchanged, MPI_UNSIGNED_LONG, dest, tag, MPI_COMM_WORLD);
>
>             }
>         }
>     }
>
>     MPI_Finalize ();
>
>     return 0;
> }