Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] MPI_Comm_accept with multiple threads.
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2013-05-06 09:03:19


Thanks for the reply, Ralph.

I will look for a way to deal with this situation for the moment.

Regards.

Hugo
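
A minimal sketch of the mutex workaround described in the quoted message below: every logger thread goes through one process-wide lock before entering the accept call. The names `accept_fn`, `serialized_accept`, `fake_accept`, `logger_thread` and `max_concurrent_accepts` are hypothetical and not taken from the attached examples. The real call to MPI_Comm_accept is replaced by a stand-in callback so that the sketch builds without an MPI installation.

```c
#include <pthread.h>
#include <sched.h>
#include <stddef.h>

/* Hypothetical callback type standing in for the real accept call.
 * In the event logger it would wrap MPI_Comm_accept on the thread's port. */
typedef int (*accept_fn)(void *arg);

/* One process-wide lock shared by every logger thread. */
static pthread_mutex_t accept_lock = PTHREAD_MUTEX_INITIALIZER;

/* Run the accept call while holding the lock, so that at most one thread
 * is inside it at any time. Returns whatever the callback returns. */
int serialized_accept(accept_fn do_accept, void *arg)
{
    pthread_mutex_lock(&accept_lock);
    int rc = do_accept(arg);
    pthread_mutex_unlock(&accept_lock);
    return rc;
}

/* Demo state, only modified while accept_lock is held. */
static int inside = 0;
static int max_inside = 0;

/* Stand-in for MPI_Comm_accept (assumption): records how many threads
 * are inside the call at the same moment. */
static int fake_accept(void *arg)
{
    (void)arg;
    inside++;
    if (inside > max_inside)
        max_inside = inside;
    sched_yield();          /* give other threads a chance to run */
    inside--;
    return 0;
}

static void *logger_thread(void *arg)
{
    serialized_accept(fake_accept, arg);
    return NULL;
}

/* Start up to 64 logger threads and return the highest number of threads
 * that were observed inside the accept call at once. */
int max_concurrent_accepts(int nthreads)
{
    pthread_t tid[64];
    int started = 0;

    if (nthreads > 64)
        nthreads = 64;
    inside = 0;
    max_inside = 0;

    while (started < nthreads &&
           pthread_create(&tid[started], NULL, logger_thread, NULL) == 0)
        started++;
    for (int i = 0; i < started; i++)
        pthread_join(tid[i], NULL);

    return max_inside;
}
```

In the real logger, each thread would pass a callback that calls MPI_Comm_accept instead of `fake_accept`. This only serializes connection setup; it does not make other concurrent MPI calls safe.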

2013/5/6 Ralph Castain <rhc_at_[hidden]>

> We are working towards thread safety, but we are nowhere near ready yet.
>
> On May 6, 2013, at 3:39 AM, Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
> wrote:
>
> Sorry, I sent the message without finishing it.
>
> Hello to all.
>
> I'm not sure if this is the correct list to post this question, but maybe
> I'm dealing with a bug.
>
> I have developed an event logging mechanism in which application processes
> connect to event loggers (using MPI_Lookup_name, MPI_Open_port,
> MPI_Comm_connect, MPI_Comm_accept, etc.) that are part of another MPI
> application.
>
> I have written my own vprotocol in which a process, once it receives a
> message, tries to establish a connection with an event logger, which is a
> thread that belongs to another MPI application.
>
> The event logger application consists of one MPI process per node, with
> multiple threads waiting for connections from MPI processes of the main
> application.
>
> I suspect that there is a problem with the critical regions when
> processes try to connect to the threads of the event logger.
>
> I'm attaching two short examples that I have written to show the
> problem. First, I launch the event-logger application:
>
> mpirun -n 2 --machinefile machinefile2-th --report-uri URIFILE ./test-thread
>
> Then I launch the example like this:
>
> mpirun -n 16 --machinefile machine16 --ompi-server file:URIFILE ./thread_logger_connect
>
> I obtained this output:
>
> Published: radic_eventlog[1,6], ret=0
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] [[39125,1],1] ORTE_ERROR_LOG: Data unpack would read past end of buffer in file dpm_orte.c at line 315
> [clus2:16104] *** An error occurred in MPI_Comm_accept
> [clus2:16104] *** on communicator MPI_COMM_SELF
> [clus2:16104] *** MPI_ERR_UNKNOWN: unknown error
> [clus2:16104] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> --------------------------------------------------------------------------
> mpirun has exited due to process rank 1 with PID 16104 on
> node clus2 exiting improperly. There are two reasons this could occur:
>
> 1. this process did not call "init" before exiting, but others in
> the job did. This can cause a job to hang indefinitely while it waits
> for all processes to call "init". By rule, if one process calls "init",
> then ALL processes must call "init" prior to termination.
>
> 2. this process called "init", but exited without calling "finalize".
> By rule, all processes that call "init" MUST call "finalize" prior to
> exiting or it will be considered an "abnormal termination"
>
> This may have caused other processes in the application to be
> terminated by signals sent by mpirun (as reported here).
>
>
> If I use a mutex to serialize access to MPI_Comm_accept, the
> behavior is OK, but shouldn't MPI_Comm_accept be thread safe?
>
> Best regards.
>
> Hugo Meyer
>
> P.S.: This occurs with Open MPI 1.5.1 and also with an old version
> of the trunk (1.7).
>
>
> <event_logger.c><main-mpi-app.c>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
>