
Open MPI Development Mailing List Archives


Subject: Re: [OMPI devel] MPI_Comm_accept with multiple threads.
From: Ralph Castain (rhc_at_[hidden])
Date: 2013-05-06 08:51:10


We are working towards thread safety, but nowhere near ready yet.
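The mutex workaround Hugo describes below (serializing access to MPI_Comm_accept across the logger threads) can be sketched roughly as follows. This is a minimal illustration, not code from the attached examples; the names `logger_thread` and `accept_lock` are illustrative, and it assumes MPI was initialized with MPI_THREAD_MULTIPLE and pthreads are available:

```c
/* Sketch: serialize MPI_Comm_accept across logger threads with a mutex.
 * Illustrative only -- names and structure are not from the attached examples. */
#include <mpi.h>
#include <pthread.h>
#include <stddef.h>

static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

void *logger_thread(void *arg)
{
    char *port_name = (char *)arg;   /* port string from MPI_Open_port() */
    MPI_Comm client;

    for (;;) {
        /* Only one thread at a time may sit in the accept; this is the
         * serialization that avoids the concurrent-accept failure. */
        pthread_mutex_lock(&accept_lock);
        MPI_Comm_accept(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &client);
        pthread_mutex_unlock(&accept_lock);

        /* ... receive and log events on `client`, then disconnect ... */
        MPI_Comm_disconnect(&client);
    }
    return NULL;
}
```

Holding the lock across the (blocking) accept means at most one pending accept at a time, which is exactly the serialization Hugo reports as working.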

On May 6, 2013, at 3:39 AM, Hugo Daniel Meyer <meyer.hugo_at_[hidden]> wrote:

> Sorry, I sent the previous message without finishing it.
>
> Hello to @ll.
>
> I'm not sure if this is the correct list to post this question to, but I may be dealing with a bug.
>
> I have developed an event logging mechanism where application processes connect to event loggers (using MPI_Lookup_name, MPI_Open_port, MPI_Comm_connect, MPI_Comm_accept, etc.) that are part of another MPI application.
>
> I have developed my own vprotocol where, once a process receives a message, it tries to establish a connection with an event logger, which is a thread belonging to another MPI application.
>
> The event logger application consists of one MPI process per node, with multiple threads waiting for connections from MPI processes of the main application.
>
> I suspect there is a problem with the critical regions when processes try to connect to the threads of the event logger.
>
> I'm attaching two short examples that I have written to show the problem. First, I launch the event-logger application:
>
> mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread
>
> Then I launch the example like this:
>
> mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE ./thread_logger_connect
>
> I have obtained this output:
>
> Published: radic_eventlog[1,6], ret=0
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] *** An error occurred in MPI_Comm_accept
> [clus2:16104] *** on communicator MPI_COMM_SELF
> [clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
> [clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 16104 on
> node clus2 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
>
> If I use a mutex to serialize access to MPI_Comm_accept, the behavior is OK, but shouldn't MPI_Comm_accept be thread safe?
>
> Best regards.
>
> Hugo Meyer
>
> P.S.: This occurs with Open MPI 1.5.1 and also with an old version of the trunk (1.7).
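For context, the client side of the pattern described above (look up the published service, connect, send, disconnect) would look roughly like this. This is a hedged sketch, not code from the attachments; the service name "radic_eventlog" comes from the output above, and the function name `log_event` and message layout are illustrative:

```c
/* Sketch: an application process connecting to the event logger.
 * Only the service name "radic_eventlog" is taken from the reported output;
 * everything else is illustrative. */
#include <mpi.h>

void log_event(const void *event, int len)
{
    char port_name[MPI_MAX_PORT_NAME];
    MPI_Comm logger;

    /* Find the port the logger side published with MPI_Publish_name(). */
    MPI_Lookup_name("radic_eventlog", MPI_INFO_NULL, port_name);

    /* Connect to the logger, which is blocked in MPI_Comm_accept(). */
    MPI_Comm_connect(port_name, MPI_INFO_NULL, 0, MPI_COMM_SELF, &logger);

    /* Ship the event, then tear the connection down. */
    MPI_Send(event, len, MPI_BYTE, 0, 0, logger);
    MPI_Comm_disconnect(&logger);
}
```

With many application processes calling a routine like this concurrently against multi-threaded accepts on the logger side, the unserialized-accept problem described above is what gets exercised.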
>
> <event_logger.c><main-mpi-app.c>_______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel