Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
From: Sofia Aparicio Secanellas (saparicio_at_[hidden])
Date: 2008-09-18 04:19:37


Hello Terry,

Yes, "edu" is the username and 10.4.5.126 is the IP address. Because the two
computers have different usernames, I think I have to include the username;
otherwise it does not work. In fact, on computer 10.4.5.123 I run:

mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

and on computer 10.4.5.126 I run:

mpirun -np 2 --host sofia@10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

If I use only the IP addresses and run this on computer 10.4.5.123:

mpirun -np 2 --host 10.4.5.123,10.4.5.126 --prefix /usr/local ./PruebaSumaParalela.out

then the computer asks me for the password of sofia@10.4.5.126. I type the
password, but it does not work because the user on that machine is "edu",
not "sofia".
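
One way to avoid putting usernames on the mpirun command line is a per-host entry in the OpenSSH client configuration. This is only a sketch and assumes that mpirun starts the remote process over ssh, which this thread does not confirm. The file location and the entry below are the standard OpenSSH ones:

```
# ~/.ssh/config on 10.4.5.123 (local account "sofia")
# Assumption: mpirun launches the remote process through ssh.
# Log in to 10.4.5.126 as "edu" unless a user is given explicitly.
Host 10.4.5.126
    User edu
```

With such an entry, "ssh 10.4.5.126" logs in as edu, so the IP-only --host list should no longer prompt for a password for sofia. The other machine would need a matching entry with "User sofia" for 10.4.5.123.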

I am trying to install dbx. If I manage to attach a debugger, I will let you
know.

Thank you very much.

Sofia

>>
>> Hello Terry,
>>
>> Thank you very much for your help.
>>
>>
>>> > Sofia,
>>> >
>>> > I took your program and actually ran it successfully on my systems
>>> > using Open MPI r19400. A couple of questions:
>>> >
>>> > 1. Have you tried to run the program on a single node?
>>> > mpirun -np 2 --host 10.4.5.123 --prefix /usr/local ./PruebaSumaParalela.out
>>> >
>>>
>>
>> Yes. In this case, the program works perfectly.
>>
>>
>>> > 2. Can you try running the code the following way, and is the output
>>> > different?
>>> > mpirun -np 2 --host 10.4.5.123,edu@10.4.5.126 --mca mpi_preconnect_all 1 --prefix /usr/local ./PruebaSumaParalela.out
>>> >
>>>
>>
>> The program also hangs, but the output is different. On both computers I
>> get the following:
>>
>> Inicio
>> Inicio
>> totalnodes:2
>> mynode:0
>> Inicio Recv
>>
>>
> OK, so it looks like rank 1 is not getting out of MPI_Init.
>>> > 3. When the program hangs can you attach a debugger to one of the
>>> > processes and print out a stack?
>>> >
>>>
>>
>> I do not know how to do that.
>>
>>
> With Solaris, I usually do the following:
> % dbx - <pid of process>
> dbx> where
> <stack prints out>
>
>>> > 4. What version of Open MPI are you using, on what type of machine,
>>> > using which OS?
>>> >
>>>
>>
>> Openmpi-1.2.2 on both computers.
>>
>> On 10.4.5.123 I have:
>> Ubuntu Linux pichurra 2.6.22-15-generic #1 SMP Tue Jun 10 09:21:34 UTC
>> 2008 i686 GNU/Linux
>>
>> On edu@10.4.5.126 I have:
>> K-Ubuntu Linux hp1-Linux 2.6.20-16-generic #2 SMP Sun Sep 23 19:50:39 UTC
>> 2007 i686 GNU/Linux
>>
>>
> Sorry for the bonehead question, but is edu@10.4.5.126 the actual machine
> name? Is its IP address really 10.4.5.126? Can you try that instead? I
> would guess the issue is that the tcp btl is somehow not matching the two
> nodes as being connected to each other.
