
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
From: Terry Dontje (Terry.Dontje_at_[hidden])
Date: 2008-09-23 10:23:22


Hello Sofia,

After talking with another OMPI member, could you humor me and run
"/sbin/iptables -L" on both of your machines? You'll need to be root to
do so.

--td
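Alongside the iptables listing, the question behind this whole thread (can one node complete a TCP connect() to a port on the other?) can also be tested directly, outside Open MPI. The following is a minimal sketch, not Open MPI code. The function name probe_tcp_connect is hypothetical, and the sketch assumes a Linux/POSIX system and an IPv4 address such as the 10.1.10.x addresses used here. It issues a non-blocking connect() and waits with poll(). A SYN that is silently dropped, which is what a DROP firewall rule typically causes, comes back as ETIMEDOUT after the chosen timeout. A closed port or a REJECT rule typically fails quickly with ECONNREFUSED or EHOSTUNREACH.

```c
#define _POSIX_C_SOURCE 200112L /* POSIX.1-2001: poll(), inet_pton(), ... */
#include <arpa/inet.h>
#include <errno.h>
#include <fcntl.h>
#include <netinet/in.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical diagnostic helper (not part of Open MPI).
 * Tries to open a TCP connection to ipv4:port, waiting at most timeout_ms.
 * Returns 0 if the connection was established; otherwise a positive errno
 * value such as ECONNREFUSED, EHOSTUNREACH or ETIMEDOUT, or EINVAL when
 * `ipv4` is not a valid dotted-quad address. */
int probe_tcp_connect(const char *ipv4, unsigned short port, int timeout_ms)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return EINVAL;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    /* Non-blocking mode lets us bound the wait with poll() instead of
     * relying on the kernel's much longer SYN retry timeout. */
    int err = 0;
    int flags = fcntl(fd, F_GETFL, 0);
    if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
        err = errno;
        close(fd);
        return err;
    }

    if (connect(fd, (struct sockaddr *)&addr, sizeof addr) < 0) {
        if (errno != EINPROGRESS) {
            err = errno;            /* failed immediately, e.g. refused */
        } else {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                err = ETIMEDOUT;    /* no answer at all: dropped packets? */
            } else if (n < 0) {
                err = errno;
            } else {
                /* The socket is writable or in error: SO_ERROR holds the
                 * final result of the connect attempt (0 on success). */
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
            }
        }
    }

    close(fd);
    return err;
}
```

For instance, on 10.1.10.208 one could compare probe_tcp_connect("10.1.10.240", 22, 3000), assuming sshd listens on its default port 22 (ssh already works between the nodes), with the same call against a port that netstat -lnut on 10.1.10.240 lists as listening while the job is running. strerror() turns a nonzero result into readable text. Success on port 22 combined with ETIMEDOUT on the other port would point at filtering rather than routing.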

Date: Tue, 23 Sep 2008 06:02:30 -0400
From: Terry Dontje <Terry.Dontje_at_[hidden]>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: users_at_[hidden]
Message-ID: <48D8BEB6.8040901_at_[hidden]>
Content-Type: text/plain; format=flowed; charset=ISO-8859-1

Hello Sofia,

Looking at your stack trace, it shows what I thought was happening: one
process is stuck trying to connect to the other. Unfortunately, the stack
does not give enough information as to why. The only suggestion I can
give is to walk through a debuggable build of the code, starting from
ompi_init_do_preconnect, find where the process calls connect, and check
whether that connect call is failing. If you don't have a firewall, I am
not sure what else would be blocking the connection. Either the address
is somehow being mangled, or something else is going on.

--td

Date: Mon, 22 Sep 2008 10:49:41 +0200
From: "Sofia Aparicio Secanellas" <saparicio_at_[hidden]>
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
To: "Open MPI Users" <users_at_[hidden]>
Message-ID: <2F607CC2B43A422B80CEBBD540BFFE8B_at_aparicio1>
Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"

Hello Terry,

I do not have an active firewall. I ran the following on both computers:

    netstat -lnut

The results are attached. I also ran this on both computers:

    mpirun -np 2 --host 10.1.10.208,10.1.10.240 --mca mpi_preconnect_all 1 \
        --prefix /usr/local -mca btl self,tcp -mca btl_tcp_if_include eth1 \
        ./PruebaSumaParalela.out

Those results are attached as well.

Thank you.

Sofia

----- Original Message -----
From: "Terry Dontje" <Terry.Dontje_at_[hidden]>
To: <users_at_[hidden]>
Sent: Friday, September 19, 2008 7:54 PM
Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv

> Hello Sofia,
>
> After further reflection I wonder if you have a firewall that is
> preventing connections to certain ports.
>
> --td
>
> Terry Dontje wrote:
>>
>> Hello Sofia,
>>
>> OK, so what I really wanted was the stack from a run with "-mca
>> mpi_preconnect_all 1". I believe you'll see that one of the processes
>> will be in init. However, the stack still probably will not help me help
>> you. What needs to happen is to step through the code in dbx while the
>> connection is being established. I am hoping you might find that the
>> connect call fails, or that we've been given an interface that somehow
>> cannot reach the other node. However, when you specified "-mca
>> btl_tcp_if_include eth1", that should have forced things to use the
>> interface you need. So it really comes down to this: why are we not
>> connecting to the eth1 address? Are we failing to route to that address,
>> is the connect failing because we are trying to use a port that we are
>> not really allowed to use, or is it something else?
>>
>> I don't think it is a routing problem, since you are able to reach each
>> node via ssh. Is there someone else on the list who might want to lend
>> a hand here? I feel like I am missing something obvious.
>>
>> --td
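Terry's open question is whether the other process is really connecting to the eth1 address, so a first check is which IPv4 address eth1 actually carries on each node. One way to confirm that outside Open MPI is a small getifaddrs() lookup. This is a minimal sketch under stated assumptions (Linux/glibc, IPv4 only). The function name iface_ipv4 is hypothetical and not part of Open MPI.

```c
#define _DEFAULT_SOURCE /* ask glibc for its default (POSIX + BSD) declarations */
#include <arpa/inet.h>
#include <ifaddrs.h>
#include <netinet/in.h>
#include <stddef.h>
#include <string.h>
#include <sys/socket.h>

/* Hypothetical helper: copy the first IPv4 address assigned to interface
 * `name` (e.g. "eth1") into `out` as dotted-quad text.
 * Returns 0 on success, -1 if the interface has no IPv4 address or the
 * lookup fails. */
int iface_ipv4(const char *name, char *out, size_t outlen)
{
    struct ifaddrs *list;
    int rc = -1;

    if (getifaddrs(&list) != 0)
        return -1;

    for (struct ifaddrs *ifa = list; ifa != NULL; ifa = ifa->ifa_next) {
        /* Skip entries without an address and non-IPv4 families
         * (IPv6, link-layer AF_PACKET entries, ...). */
        if (ifa->ifa_addr == NULL || ifa->ifa_addr->sa_family != AF_INET)
            continue;
        if (strcmp(ifa->ifa_name, name) != 0)
            continue;

        const struct sockaddr_in *sin = (const struct sockaddr_in *)ifa->ifa_addr;
        if (inet_ntop(AF_INET, &sin->sin_addr, out, (socklen_t)outlen) != NULL)
            rc = 0;
        break; /* report only the first IPv4 address on this interface */
    }

    freeifaddrs(list);
    return rc;
}
```

On each machine, calling iface_ipv4("eth1", buf, sizeof buf) with char buf[INET_ADDRSTRLEN] should yield 10.1.10.208 or 10.1.10.240 respectively, if eth1 is indeed the interface carrying those addresses. A different address, or a return value of -1, would suggest that btl_tcp_if_include eth1 is steering the TCP BTL away from the addresses passed to --host.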
>>
>>> Date: Fri, 19 Sep 2008 16:09:11 +0200
>>> From: "Sofia Aparicio Secanellas" <saparicio_at_[hidden]>
>>> Subject: Re: [OMPI users] Problem with MPI_Send and MPI_Recv
>>> To: "Open MPI Users" <users_at_[hidden]>
>>> Message-ID: <1BBF50FE29F743B5829CC3785F47CADD_at_aparicio1>
>>> Content-Type: text/plain; charset="iso-8859-1"; Format="flowed"
>>>
>>> Hello Terry,
>>>
>>> I have installed 1.2.7 and I get the same result.
>>>
>>> Here is what I have done:
>>>
>>> 1. On my computer edu@10.1.10.240 I added a new user called sofia.
>>>    This way I have sofia@10.1.10.208 and sofia@10.1.10.240.
>>> 2. I downloaded Open MPI 1.2.7 from the Open MPI website on both
>>>    computers into /home/sofia/Desktop.
>>> 3. I installed everything using "sudo ./configure", "sudo make" and
>>>    "sudo make install".
>>> 4. To make ssh stop asking me for a password, I typed "ssh-keygen -t dsa",
>>>    "cd $HOME/.ssh" and "cp id_dsa.pub authorized_keys" on
>>>    sofia@10.1.10.208. I copied the directory "/home/sofia/.ssh" from
>>>    sofia@10.1.10.208 to /home/sofia/.ssh on sofia@10.1.10.240. The ssh
>>>    command without a password works on computer sofia@10.1.10.208, but
>>>    computer sofia@10.1.10.208 asks me for a passphrase and for the
>>>    password. Is that normal?
>>> 5. I created a directory "/home/sofia/programasparalelos" on both
>>>    computers and gave it permissions with "chmod 777".
>>> 6. On both computers I copied the program "PruebaSumaParalela.c" into
>>>    "/home/sofia/programasparalelos" (I changed the program a little; the
>>>    new version is attached) and compiled it using "mpicc
>>>    PruebaSumaParalela.c -o PruebaSumaParalela.out".
>>> 7. Now I run the program on both computers using the command:
>>>
>>>    mpirun -np2 --host 10.1.10.208,10.1.10.240 --prefix /usr/local
>>>    ./PruebaSumaParalela.out
>>>
>>> When I run the program, I see 3 PIDs running on each computer: 2 of
>>> "./PruebaSumaParalela.out" and 1 of "mpirun -np2 --host
>>> 10.1.10.208,10.1.10.240 --prefix /usr/local ./PruebaSumaParalela.out".
>>> I have attached the results obtained on each computer for each
>>> "./PruebaSumaParalela.out" process.
>>>
>>> Thank you very much.
>>>
>>> Sofia
>>>