Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-09-17 11:59:49


Additionally, since you technically have a heterogeneous situation
(different OS versions on each node), you might want to:

- compile and install OMPI separately on each node (preferably in the
same filesystem location, though)
- compile and install your MPI app separately on each node (preferably
in the same filesystem location)

You *could* be seeing differences between libc on each node, etc.
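
One way to check for such differences is to print the kernel, libc, and Open MPI versions on each node and compare them by eye. This is only a minimal sketch: it assumes a glibc-based Linux system and that ompi_info is on the PATH, and node_report is a hypothetical helper name, not part of Open MPI.

```shell
# node_report: hypothetical helper that prints the versions worth
# comparing across nodes.
node_report() {
    echo "kernel:   $(uname -r)"
    # getconf GNU_LIBC_VERSION is glibc-specific; report "unknown" elsewhere.
    echo "libc:     $(getconf GNU_LIBC_VERSION 2>/dev/null || echo unknown)"
    # ompi_info ships with Open MPI and prints a line starting "Open MPI:".
    if command -v ompi_info >/dev/null 2>&1; then
        echo "open mpi: $(ompi_info | grep 'Open MPI:' | head -n 1 | sed 's/^ *//')"
    else
        echo "open mpi: ompi_info not found"
    fi
}

node_report
```

Run it on each node (for example over ssh) and compare the three lines side by side.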

On Sep 17, 2008, at 11:52 AM, Terry Dontje wrote:

>
>> Date: Wed, 17 Sep 2008 16:23:59 +0200
>> From: "Sofia Aparicio Secanellas" <saparicio_at_[hidden]>
>> Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
>> To: "Open MPI Users" <users_at_[hidden]>
>> Message-ID: <0625EEFB84E04647A1930A963A8DF7E3_at_aparicio1>
>> Content-Type: text/plain; format=flowed; charset="iso-8859-1";
>> reply-type=response
>>
>> Hello Terry,
>>
>> Thank you very much for your help.
>>
>>
>>> > Sofia,
>>> >
>>> > I took your program and actually ran it successfully on my
>>> > systems using Open MPI r19400. A couple questions:
>>> >
>>> > 1. Have you tried to run the program on a single node?
>>> > mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>>> >
>>>
>>
>> Yes. In this case, the program works perfectly.
>>
>>
>>> > 2. Can you try and run the code the following way and is the
>>> > output different?
>>> > mpirun -np 2 --host 10.4.5.123,edu_at_10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>>> >
>>>
>>
>> The program also hangs but the output is different. In both
>> computers I get the following:
>>
>> Inicio
>> Inicio
>> totalnodes:2
>> mynode:0
>> Inicio Recv
>>
>>
> OK, so it looks like rank 1 is not getting out of MPI_Init.
>
>>> > 3. When the program hangs can you attach a debugger to one of
>>> > the processes and print out a stack?
>>> >
>>>
>>
>> I do not know how to do that.
>>
>>
> With Solaris, I usually do the following:
> % dbx - <pid of process>
> dbx> where
> <stack prints out>
>
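
On the Linux nodes in this thread, gdb can do the equivalent of the dbx steps above. This is a minimal sketch, assuming gdb is installed and you have permission to attach to the process; dump_stack is a hypothetical helper name.

```shell
# dump_stack: hypothetical helper that attaches gdb to a running process,
# prints a backtrace of every thread, then detaches and exits.
dump_stack() {
    gdb -p "$1" -batch -ex "thread apply all bt"
}

# Usage: find the PID of the hung rank (e.g. with ps), then run:
#   dump_stack <pid of process>
```

Running it against the rank that never prints its "mynode" line should show where inside MPI_Init that process is waiting.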
>>> > 4. What version of Open MPI are you using, on what type of
>>> > machine, using which OS?
>>> >
>>>
>>
>> Openmpi-1.2.2 in both computers
>>
>> In 10.4.5.123 I have:
>> Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34
>> UTC 2008 i686 GNU/Linux
>>
>> In edu_at_10.4.5.126 I have:
>> K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23
>> 19:50:39 UTC 2007 i686 GNU/Linux
>>
>>
> Sorry for the bonehead question, but is edu_at_10.4.5.126 the actual
> machine name? Is its IP address really 10.4.5.126? If so, can you try
> passing just the IP address to --host instead? I would guess the issue
> is that the TCP BTL is somehow not matching the two nodes as being
> connected to each other.
>
> --td
>> Sofia
>

-- 
Jeff Squyres
Cisco Systems