We are working towards thread safety, but nowhere near ready yet. 

On May 6, 2013, at 3:39 AM, Hugo Daniel Meyer <meyer.hugo@gmail.com> wrote:

Sorry, I sent the previous message without finishing it.

Hello to @ll.

I'm not sure if this is the correct list to post this question to, but I may be dealing with a bug.

I have developed an event logging mechanism where application processes connect to event loggers (using MPI_Lookup_name, MPI_Open_port, MPI_Comm_connect, MPI_Comm_accept, etc.) that are part of another MPI application.

I have developed my own vprotocol where, once a process receives a message, it tries to establish a connection with an event logger, which is a thread belonging to another MPI application.
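In rough outline, the connecting side does something like the following. This is only a sketch of the idea, not the actual code from the attached main-mpi-app.c; the service name is taken from the "Published: radic_eventlog..." output below, and the use of MPI_COMM_SELF on the connect side is an assumption.

/* Sketch of the connecting side inside the vprotocol.  After receiving a
 * message, the process looks up the published event-logger port and
 * connects to it. */
#include <mpi.h>

static MPI_Comm connect_to_event_logger(void)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm logger;

    /* Resolve the port published by the event-logger job (both jobs must
       share a name server, hence mpirun --ompi-server file:URIFILE). */
    MPI_Lookup_name("radic_eventlog", MPI_INFO_NULL, port_name);

    /* The matching MPI_Comm_accept runs in one of the event-logger threads.
       MPI_COMM_SELF here is an assumption. */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &logger);
    return logger;
}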

The event logger application consists of one MPI process per node, with multiple threads waiting for connections from the MPI processes of the main application.
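The logger side is, roughly, one process per node that opens and publishes a port and then starts several threads that each block in MPI_Comm_accept. Again, this is only a sketch, not the attached event_logger.c; the thread count and error handling are illustrative.

/* Sketch of the event-logger process: open/publish a port, then let a few
 * threads accept connections concurrently. */
#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define NUM_THREADS 4

static char port_name[MPI_MAX_PORT_NAME];

static void *accept_loop(void *arg)
{
    MPI_Comm client;
    /* Each thread waits for an application process to connect. */
    MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
    /* ... receive logged events over 'client' ... */
    MPI_Comm_disconnect(&client);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided;
    pthread_t threads[NUM_THREADS];

    /* Threads call MPI concurrently, so MPI_THREAD_MULTIPLE is requested. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE)
        fprintf(stderr, "warning: MPI_THREAD_MULTIPLE not provided (got %d)\n", provided);

    MPI_Open_port(MPI_INFO_NULL, port_name);
    MPI_Publish_name("radic_eventlog", MPI_INFO_NULL, port_name);

    for (int i = 0; i < NUM_THREADS; i++)
        pthread_create(&threads[i], NULL, accept_loop, NULL);
    for (int i = 0; i < NUM_THREADS; i++)
        pthread_join(threads[i], NULL);

    MPI_Unpublish_name("radic_eventlog", MPI_INFO_NULL, port_name);
    MPI_Close_port(port_name);
    MPI_Finalize();
    return 0;
}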

I suspect there is a problem with the critical regions when processes try to connect to the threads of the event logger.

I'm attaching two short examples (event_logger.c and main-mpi-app.c) that I wrote to show the problem. First, I launch the event-logger application:

mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread

Then I launch the example like this:

mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE ./thread_logger_connect

I obtained this output:

Published: radic_eventlog[1,6], ret=0
[clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clus2:16104] *** An error occurred in MPI_Comm_accept
[clus2:16104] *** on communicator MPI_COMM_SELF
[clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
[clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 16104 on
node clus2 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).


If I use a mutex to serialize access to MPI_Comm_accept, the behavior is OK, but shouldn't MPI_Comm_accept be thread safe?
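The workaround is essentially this (sketch only; the names are illustrative, not the exact code from event_logger.c):

/* Serialize MPI_Comm_accept: only one logger thread at a time may be
 * inside the call. */
#include <mpi.h>
#include <pthread.h>

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

static void serialized_accept(char *port, MPI_Comm *client)
{
    pthread_mutex_lock(&accept_lock);
    MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, client);
    pthread_mutex_unlock(&accept_lock);
}

With this, connections are only accepted one at a time, which works but defeats the point of having multiple accepting threads.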

Best regards.

Hugo Meyer

P.S.: This occurs with Open MPI 1.5.1 and also with an old version of the trunk (1.7).

