Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Problems with Mpi Accept - ORTE_ERROR_LOG
From: Rodrigo Oliveira (rsilva.oliveira_at_[hidden])
Date: 2011-07-04 14:34:54


Thanks for the response, Ralph.

I checked my application, and it does not seem to have a race condition in the
accept stage. The server starts and stores the port name in a file. When a
client starts, it reads this port name and tries to connect. In my tests the
error happens in about 1 out of 10 executions.

It still works, but not reliably.
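
For reference, here is a minimal sketch of how the port-name file handoff described above can be written so that a client never reads a half-written file. It only rules out one possible application-side race; it does not address the race inside Open MPI that Ralph mentions. The helper names publish_port_name and read_port_name and the ".tmp" suffix are hypothetical, not part of my application or of Open MPI, and the sketch assumes a POSIX file system where rename() replaces the destination atomically (a shared file system such as NFS may add caching delays).

```cpp
#include <cstdio>
#include <fstream>
#include <string>

// Hypothetical helper (server side): publish the port name so that a reader
// sees either no file at all or the complete port name, never a partial write.
bool publish_port_name(const std::string& path, const std::string& port_name) {
    const std::string tmp = path + ".tmp";  // assumed temporary file name
    {
        std::ofstream out(tmp, std::ios::trunc);
        if (!out) return false;
        out << port_name << '\n';
        out.flush();
        if (!out) return false;
    }  // the stream is closed here, before the rename
    // On POSIX systems rename() replaces the destination atomically.
    return std::rename(tmp.c_str(), path.c_str()) == 0;
}

// Hypothetical helper (client side): read the port name.
// Returns an empty string if the file has not been published yet.
std::string read_port_name(const std::string& path) {
    std::ifstream in(path);
    std::string port_name;
    if (in) std::getline(in, port_name);
    return port_name;
}
```

The server would call publish_port_name with the string returned by MPI_Open_port before entering MPI_Comm_accept, and the client would poll read_port_name until it returns a non-empty string and then pass that string to MPI_Comm_connect.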

On Tue, Jun 28, 2011 at 11:10 PM, Ralph Castain <rhc_at_[hidden]> wrote:

> Looking deeper, I believe we may have a race condition in the code. Sadly,
> that error message is actually irrelevant, but causes the code to abort.
>
> It can be triggered by race conditions in the app as well, but ultimately
> is something we need to clean up.
>
>
> On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote:
>
> Hi there.
>
> I am developing a server/client application using Open MPI 1.5.3. At one point in the server code I open a port to receive connections from a client. After that, I call MPI_Comm_accept, and on the client side I call MPI_Comm_connect. Sometimes I get an ORTE_ERROR_LOG, as shown below.
>
> before accept in host hydra9 port name = 4108386304.0;tcp://150.164.3.204:48761;tcp://192.168.63.9:48761+4108386305.0tcp://150.164.3.204:49211;tcp://192.168.63.9:49211:300
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_allgather.c at line 220
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 116
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 608
> [hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file dpm_orte.c at line 379
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
> after accept in host hydra9 error code = 17
> MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
>
> The mpi_errno is 17, and I could not find a clear explanation of this error. It occurs sporadically: sometimes the application works, sometimes it does not.
>
>
> Any ideas?
>
> Thanks
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users