
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
From: Sofia Aparicio Secanellas (saparicio_at_[hidden])
Date: 2008-09-18 07:12:46


Hello Terry,

I have finally installed dbx. I enclose a file with the output I get on
computer 10.4.5.123 when I type "dbx - PID of mpirun..." and then "where".

Do you have any idea what could be the problem?

Thanks a lot!!

Sofia

----- Original Message -----
From: "Terry Dontje" <Terry.Dontje_at_[hidden]>
To: <users_at_[hidden]>
Sent: Wednesday, September 17, 2008 5:52 PM
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv

>
>> Date: Wed, 17 Sep 2008 16:23:59 +0200
>> From: "Sofia Aparicio Secanellas" <saparicio_at_[hidden]>
>> Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
>> To: "Open MPI Users" <users_at_[hidden]>
>> Message-ID: <0625EEFB84E04647A1930A963A8DF7E3_at_aparicio1>
>> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>> reply-type=response
>>
>> Hello Terry,
>>
>> Thank you very much for your help.
>>
>>
>>> > Sofia,
>>> >
>>> > I took your program and actually ran it successfully on my systems
>>> > using Open MPI r19400. A couple questions:
>>> >
>>> > 1. Have you tried to run the program on a single node?
>>> > mpirun -np 2 --host 10.4.5.123 --prefix /usr/local
>>> > ./PruebaSumaParalela.out
>>> >
>>>
>>
>> Yes. In this case, the program works perfectly.
>>
>>
>>> > 2. Can you try running the code the following way, and is the
>>> > output different?
>>> > mpirun -np 2 --host 10.4.5.123,edu_at_10.4.5.126 --mca
>>> > mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>>> >
>>>
>>
>> The program also hangs but the output is different. In both computers I
>> get the following:
>>
>> Inicio
>> Inicio
>> totalnodes:2
>> mynode:0
>> Inicio Recv
>>
>>
> OK, so it looks like rank 1 is not getting out of MPI_Init.
>>> > 3. When the program hangs can you attach a debugger to one of the
>>> > processes and print out a stack?
>>> >
>>>
>>
>> I do not know how to do that.
>>
>>
> With Solaris, I usually do the following:
> % dbx - <pid of process>
> dbx> where
> <stack prints out>
>
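Since both machines in this thread run Linux, the same attach-and-print-stack step can probably also be done with gdb instead of dbx. The following is only a sketch under that assumption: `dump_stack` is a hypothetical helper name that does not come from the thread, and gdb must be installed and allowed to attach to the target process.

```shell
# Hypothetical helper (not from the thread): attach gdb to a running
# process, print a backtrace of every thread, then detach and exit.
# Usage: dump_stack <pid>
dump_stack() {
    case "$1" in
        ''|*[!0-9]*)
            # Reject a missing or non-numeric PID before calling gdb.
            echo "usage: dump_stack <pid>" >&2
            return 2
            ;;
    esac
    # -p attaches to the PID, -batch exits after the commands have run,
    # -ex runs one gdb command ("thread apply all bt" = all stacks).
    gdb -p "$1" -batch -ex "thread apply all bt"
}
```

To use it, find the PID of the hung process with `ps` and pass it in, for example `dump_stack 12345` (the PID here is made up). Attaching to each rank of the program, rather than only to mpirun, is likely to show more directly where a rank is blocked.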
>>> > 4. What version of Open MPI are you using, on what type of machine,
>>> > using which OS?
>>> >
>>>
>>
>> Openmpi-1.2.2 in both computers
>>
>> In 10.4.5.123 I have:
>> Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC
>> 2008 i686 GNU/Linux
>>
>> In edu_at_10.4.5.126 I have:
>> K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC
>> 2007 i686 GNU/Linux
>>
>>
> Sorry for the bonehead question but is edu_at_10.4.5.126 the actual machine
> name? Is its IP address really 10.4.5.126? Can you try that instead? I
> would guess the issue is that the tcp btl is somehow not matching the two
> nodes as being connected to each other.
>
> --td
>> Sofia
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users