Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-09-17 11:52:12


> Date: Wed, 17 Sep 2008 16:23:59 +0200
> From: "Sofia Aparicio Secanellas" <saparicio_at_[hidden]>
> Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
> To: "Open MPI Users" <users_at_[hidden]>
> Message-ID: <0625EEFB84E04647A1930A963A8DF7E3_at_aparicio1>
>
> Hello Terry,
>
> Thank you very much for your help.
>
>
>> > Sofia,
>> >
>> > I took your program and actually ran it successfully on my systems using
>> > Open MPI r19400. A couple of questions:
>> >
>> > 1. Have you tried to run the program on a single node?
>> > mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>> >
>>
>
> Yes. In this case, the program works perfectly.
>
>
>> > 2. Can you try running the code the following way and see whether the output
>> > is different?
>> > mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>> >
>>
>
> The program also hangs, but the output is different. On both computers I get
> the following:
>
> Inicio
> Inicio
> totalnodes:2
> mynode:0
> Inicio Recv
>
>
OK, so it looks like rank 1 is not getting out of MPI_Init: only rank 0 gets as far as printing its totalnodes and mynode lines.
>> > 3. When the program hangs, can you attach a debugger to one of the
>> > processes and print a stack trace?
>> >
>>
>
> I do not know how to do that.
>
>
With Solaris, I usually do the following:
% dbx - <pid of process>
dbx> where
<stack prints out>
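Both machines in this thread run Ubuntu Linux rather than Solaris; there, the gdb equivalent of the dbx commands above should be `gdb -p <pid of process>` followed by `bt` at the `(gdb)` prompt. If it is hard to find the right PID on each node, a common trick is to make every process print its host and PID and then wait until a debugger releases it. The sketch below is only an illustration: the names `wait_for_debugger` and `attach_release` are hypothetical, and for a hang inside MPI_Init you would call it just before MPI_Init in a debug build compiled with `-g`.

```c
#define _POSIX_C_SOURCE 200112L /* for gethostname() under strict C modes */
#include <signal.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical flag: from gdb, "set var attach_release = 1" and then
   "continue" lets the process run on. */
static volatile sig_atomic_t attach_release = 0;

/* Hypothetical helper: report where this process lives, then poll until
   a debugger attaches and sets attach_release. Debug builds only. */
void wait_for_debugger(void)
{
    char host[256];

    if (gethostname(host, sizeof host) != 0)
        snprintf(host, sizeof host, "unknown");
    host[sizeof host - 1] = '\0'; /* gethostname may not terminate on truncation */

    printf("PID %ld on %s ready for attach\n", (long)getpid(), host);
    fflush(stdout); /* make sure the line appears before we block */

    while (!attach_release)
        sleep(5);
}
```

Once attached, `bt` shows where the process was; after setting the flag and continuing, the program proceeds into MPI_Init, where you can interrupt it again with Ctrl-C and run `bt` once it hangs.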

>> > 4. What version of Open MPI are you using, on what type of machine, and
>> > with which OS?
>> >
>>
>
> Open MPI 1.2.2 on both computers.
>
> On 10.4.5.123 I have:
> Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC 2008 i686 GNU/Linux
>
> On edu@10.4.5.126 I have:
> Kubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC 2007 i686 GNU/Linux
>
>
Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine
name? Is its IP address really 10.4.5.126? If so, can you try giving --host
the plain IP address instead? My guess is that the TCP BTL is somehow not
recognizing the two nodes as being connected to each other.
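Terry's guess is that the TCP BTL does not see the two nodes as reachable from each other. As a rough, illustrative sanity check (this is not Open MPI's actual matching logic), one can confirm that the two addresses fall in the same IP subnet. The helper name `same_subnet` and the /24 prefix in the usage note are assumptions; the real netmask on each machine is shown by `ifconfig`.

```c
#define _POSIX_C_SOURCE 200112L
#include <arpa/inet.h>
#include <netinet/in.h>
#include <stdint.h>
#include <sys/socket.h>

/* Hypothetical helper: returns 1 if both dotted-quad IPv4 strings lie in
   the same subnet for the given prefix length (e.g. 24 for 255.255.255.0),
   0 if they do not, and -1 if an argument is invalid. */
int same_subnet(const char *a, const char *b, int prefix_len)
{
    struct in_addr ia, ib;

    if (prefix_len < 0 || prefix_len > 32)
        return -1;
    if (inet_pton(AF_INET, a, &ia) != 1 || inet_pton(AF_INET, b, &ib) != 1)
        return -1; /* e.g. "edu@10.4.5.126" is not a bare address */

    /* Netmask in host byte order; avoid shifting a 32-bit value by 32. */
    uint32_t mask = (prefix_len == 0) ? 0u : 0xFFFFFFFFu << (32 - prefix_len);
    uint32_t ha = ntohl(ia.s_addr);
    uint32_t hb = ntohl(ib.s_addr);

    return (ha & mask) == (hb & mask);
}
```

With a /24 mask, `same_subnet("10.4.5.123", "10.4.5.126", 24)` returns 1, while `same_subnet("10.4.5.123", "edu@10.4.5.126", 24)` returns -1 because a user@host string is not a plain address. If the subnets match but the hang persists, machines with more than one network interface are worth checking; restricting Open MPI to a single interface with `--mca btl_tcp_if_include eth0` is a common next step (the interface name eth0 is an assumption).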

--td
> Sofia