If you download a 1.5 tarball tagged at r24853 or above, the problem should be fixed.


On Jul 4, 2011, at 12:34 PM, Rodrigo Oliveira wrote:


Thanks for the response, Ralph.

I checked my application and it does not seem to have a race condition in the accept stage. The server is started and stores the port name in a file. When a client is started, it reads this port name and tries to connect. In my tests the error happens about 1 time in 10 executions.
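
For reference, one way the file-based handoff can race on the application side is if the client reads the file before the server has finished writing it. Below is a minimal sketch of a handoff that avoids this, assuming a POSIX filesystem where rename() replaces the target atomically. The helper names write_port_file_atomic and read_port_file_retry are hypothetical, not part of Open MPI; the string passed in would be the one returned by MPI_Open_port, and the read buffer should hold at least MPI_MAX_PORT_NAME characters. This only rules out the application-side handoff; it does not address a race inside the library itself.

```c
#define _POSIX_C_SOURCE 200809L   /* for nanosleep */
#include <assert.h>
#include <stdio.h>
#include <string.h>
#include <time.h>

/* Hypothetical helper: publish the port name so that a reader never sees a
 * partially written file. Writes to "<path>.tmp", then renames it to <path>. */
int write_port_file_atomic(const char *path, const char *port_name)
{
    char tmp[4096];
    assert(path != NULL && port_name != NULL);
    if (snprintf(tmp, sizeof tmp, "%s.tmp", path) >= (int)sizeof tmp)
        return -1;                          /* path too long */

    FILE *f = fopen(tmp, "w");
    if (f == NULL)
        return -1;
    int ok = (fputs(port_name, f) != EOF) && (fputc('\n', f) != EOF);
    if (fclose(f) != 0)
        ok = 0;
    if (!ok || rename(tmp, path) != 0) {    /* rename publishes the file */
        remove(tmp);
        return -1;
    }
    return 0;
}

/* Hypothetical helper: read the port name, retrying while the file is
 * missing or empty. Returns 0 on success, -1 if all attempts fail. */
int read_port_file_retry(const char *path, char *out, size_t out_len,
                         int attempts, unsigned delay_ms)
{
    assert(path != NULL && out != NULL && out_len > 0);
    for (int i = 0; i < attempts; i++) {
        FILE *f = fopen(path, "r");
        if (f != NULL) {
            char *line = fgets(out, (int)out_len, f);
            fclose(f);
            if (line != NULL) {
                out[strcspn(out, "\n")] = '\0';   /* strip trailing newline */
                if (out[0] != '\0')
                    return 0;
            }
        }
        /* Wait before the next attempt. */
        struct timespec ts = { delay_ms / 1000, (long)(delay_ms % 1000) * 1000000L };
        nanosleep(&ts, NULL);
    }
    return -1;
}
```

In this sketch the server would call write_port_file_atomic only after MPI_Open_port has returned, and the client would pass the string obtained from read_port_file_retry to MPI_Comm_connect.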

It is still not working reliably.

On Tue, Jun 28, 2011 at 11:10 PM, Ralph Castain <rhc@open-mpi.org> wrote:
Looking deeper, I believe we may have a race condition in the code. Sadly, that error message is actually irrelevant, but causes the code to abort.

It can be triggered by race conditions in the app as well, but ultimately is something we need to clean up.


On Jun 27, 2011, at 9:29 AM, Rodrigo Oliveira wrote:

Hi there.
I am developing a server/client application using Open MPI 1.5.3. At one point in the server code I open a port to receive connections from a client. After that, I call MPI_Comm_accept, and on the client side I call MPI_Comm_connect. Sometimes I get an ORTE_ERROR_LOG, as shown below.
before accept in host hydra9 port name = 4108386304.0;tcp://150.164.3.204:48761;tcp://192.168.63.9:48761+4108386305.0tcp://150.164.3.204:49211;tcp://192.168.63.9:49211:300                                             
[hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_allgather.c at line 220              
[hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file base/grpcomm_base_modex.c at line 116                  
[hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file grpcomm_bad_module.c at line 608                       
[hydra9:11199] [[62689,1],0] ORTE_ERROR_LOG: Not found in file dpm_orte.c at line 379                                 
MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code                                           
after accept in host hydra9 error code = 17                                                                           
MPI 2 C++ exception throwing is disabled, MPI::mpi_errno has the error code
The mpi_errno is 17 and I could not find a clear explanation of this error. It occurs sporadically. Sometimes the application works, sometimes it does not.

Any ideas?

Thanks
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

