Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_Comm_accept with multiple threads.
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2013-05-06 06:39:24


Sorry, I sent the previous message before finishing it.

Hello all.

I'm not sure this is the correct list for this question, but I may be
dealing with a bug.

I have developed an event-logging mechanism in which application processes
connect to event loggers (using MPI_Lookup_name, MPI_Open_port,
MPI_Comm_connect, MPI_Comm_accept, etc.) that are part of another MPI
application.

I have also developed my own vprotocol in which, once a process receives a
message, it tries to establish a connection with an event logger, which is
a thread belonging to another MPI application.

The event-logger application consists of one MPI process per node, each
with multiple threads waiting for connections from the MPI processes of the
main application.
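
A minimal sketch of this logger layout, assuming POSIX threads: one process starts a fixed number of connection threads, and each thread owns the service name a client would look up. The thread count, the `logger_thread_t` type and the naming scheme are hypothetical; the `radic_eventlog[%d,%d]` format only mirrors the `radic_eventlog[1,6]` name printed in the output of the test, and reading its two numbers as rank and thread index is an assumption. The MPI calls each thread would make are shown only in comments so the sketch builds without MPI.

```c
#include <pthread.h>
#include <stdio.h>
#include <string.h>

#define NUM_LOGGER_THREADS 4 /* hypothetical: connection threads per logger process */
#define SERVICE_LEN 64

/* State owned by one logger connection thread. */
typedef struct {
    int rank;                  /* rank of this logger process */
    int thread_index;          /* index of the thread inside the process */
    char service[SERVICE_LEN]; /* name a client would pass to MPI_Lookup_name */
} logger_thread_t;

/* Body of one connection thread. In a real logger this is where the thread
 * would call MPI_Open_port, MPI_Publish_name(service, ...) and then block in
 * MPI_Comm_accept(port, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client).
 * Only the (hypothetical) naming step is performed here. */
static void *logger_thread_main(void *arg) {
    logger_thread_t *t = arg;
    snprintf(t->service, sizeof t->service, "radic_eventlog[%d,%d]",
             t->rank, t->thread_index);
    return NULL;
}

/* Start NUM_LOGGER_THREADS connection threads for one logger process and wait
 * for them to finish. Because several threads call MPI concurrently, a real
 * logger would first call MPI_Init_thread requesting MPI_THREAD_MULTIPLE and
 * check the provided level. Returns 0 if every thread was started, -1 otherwise. */
int start_logger_threads(int rank, logger_thread_t threads[NUM_LOGGER_THREADS]) {
    pthread_t tids[NUM_LOGGER_THREADS];
    int created = 0;
    for (; created < NUM_LOGGER_THREADS; created++) {
        threads[created].rank = rank;
        threads[created].thread_index = created;
        memset(threads[created].service, 0, SERVICE_LEN);
        if (pthread_create(&tids[created], NULL, logger_thread_main,
                           &threads[created]) != 0)
            break; /* stop starting threads; join the ones already running */
    }
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL);
    return created == NUM_LOGGER_THREADS ? 0 : -1;
}
```

Giving every thread its own published name lets each client pick one specific logger thread with MPI_Lookup_name, so the threads end up blocked in MPI_Comm_accept at the same time, which is the concurrent situation described here.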

I suspect there is a problem with the critical regions when processes try
to connect to the threads of the event logger.

I'm attaching two short examples that I wrote to show the problem. First, I
launch the event-logger application:

mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread

Then I launch the example like this:

mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE ./thread_logger_connect

I obtained this output:

Published: radic_eventlog[1,6], ret=0
[clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
[clus2:16104] *** An error occurred in MPI_Comm_accept
[clus2:16104] *** on communicator MPI_COMM_SELF
[clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
[clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
--------------------------------------------------------------------------
mpirun has exited due to process rank 1 with PID 16104 on
node clus2 exiting improperly. There are two reasons this could occur:

1. this process did not call "init" before exiting, but others in
the job did. This can cause a job to hang indefinitely while it waits
for all processes to call "init". By rule, if one process calls "init",
then ALL processes must call "init" prior to termination.

2. this process called "init", but exited without calling "finalize".
By rule, all processes that call "init" MUST call "finalize" prior to
exiting or it will be considered an "abnormal termination"

This may have caused other processes in the application to be
terminated by signals sent by mpirun (as reported here).

If I use a mutex to serialize access to MPI_Comm_accept, everything works
correctly, but shouldn't MPI_Comm_accept be thread safe?
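
The mutex workaround can be sketched as follows, assuming POSIX threads. `unsafe_accept` is a hypothetical stand-in for `MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client)`: like the failing call, it updates shared state and must not run concurrently with itself. All threads in the logger process go through one lock, so at most one accept is in progress at any time. The names, thread counts and the counter are illustrative only, not the code from the attached examples.

```c
#include <pthread.h>

#define MAX_ACCEPT_THREADS 64

/* Shared state touched by every accept; updated non-atomically on purpose. */
static long accepted_connections = 0;

/* One lock shared by all logger threads in the process. */
static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

/* Hypothetical stand-in for MPI_Comm_accept: a read-modify-write on shared
 * state that would race if two threads entered it at once. */
static int unsafe_accept(int *client_id) {
    *client_id = (int)accepted_connections;
    accepted_connections = accepted_connections + 1;
    return 0;
}

/* Serialized wrapper: the lock guarantees one accept at a time. */
int serialized_accept(int *client_id) {
    pthread_mutex_lock(&accept_lock);
    int rc = unsafe_accept(client_id);
    pthread_mutex_unlock(&accept_lock);
    return rc;
}

/* Worker: performs *arg serialized accepts. */
static void *accept_worker(void *arg) {
    int n = *(const int *)arg;
    int client_id;
    for (int i = 0; i < n; i++)
        serialized_accept(&client_id);
    return NULL;
}

/* Run nthreads workers doing per_thread accepts each and return the number
 * of accepts completed, or -1 if nthreads is out of range. */
long run_accept_threads(int nthreads, int per_thread) {
    pthread_t tids[MAX_ACCEPT_THREADS];
    int created = 0;
    if (nthreads < 0 || nthreads > MAX_ACCEPT_THREADS)
        return -1;

    pthread_mutex_lock(&accept_lock);
    long before = accepted_connections;
    pthread_mutex_unlock(&accept_lock);

    for (; created < nthreads; created++)
        if (pthread_create(&tids[created], NULL, accept_worker, &per_thread) != 0)
            break;
    for (int i = 0; i < created; i++)
        pthread_join(tids[i], NULL); /* per_thread stays valid until here */

    pthread_mutex_lock(&accept_lock);
    long total = accepted_connections - before;
    pthread_mutex_unlock(&accept_lock);
    return total;
}
```

With the lock, 8 threads doing 1000 accepts each always complete exactly 8000 of them. The cost is that a thread blocked waiting for a client inside the real MPI_Comm_accept holds the lock, so the other logger threads cannot accept until that connection arrives.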

Best regards.

Hugo Meyer

P.S.: This occurs with Open MPI 1.5.1 and also with an older version of the
trunk (1.7).
