
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_ERR_TRUNCATE on MPI_Testsome
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-09-30 10:22:18


On Sep 26, 2008, at 1:45 PM, Robert Kubrick wrote:

> I'm not sure how I should interpret this message:
>
> [local:17344] *** An error occurred in MPI_Testsome
> [local:17344] *** on communicator MPI COMMUNICATOR 5 CREATE FROM 0
> [local:17344] *** MPI_ERR_TRUNCATE: message truncated
> [local:17344] *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpiexec noticed that job rank 0 with PID 17338 on node local exited
> on signal 15 (Terminated).
> 3 additional processes aborted (not shown)
>
> I am assuming that the error was triggered because one of the
> buffers I set in the MPI_Recv_init() calls cannot hold the
> incoming message.

Sorry for the delay in replying.

This is likely the cause -- MPI defines receiving a message that is
longer than the posted receive buffer as a run-time error
(MPI_ERR_TRUNCATE).
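
To make the rule concrete, here is a minimal sketch in plain C. It is
a toy model only: it makes no MPI calls, it is not how Open MPI is
implemented, and every name in it (model_recv, model_deliver,
MODEL_ERR_TRUNCATE) is hypothetical. It assumes a receive posted once
with a fixed capacity, as with MPI_Recv_init(), and shows that a
shorter message is accepted while a longer one is flagged.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical status codes for this model. Real MPI reports
 * MPI_SUCCESS or an error of class MPI_ERR_TRUNCATE instead. */
enum model_status { MODEL_SUCCESS = 0, MODEL_ERR_TRUNCATE = 1 };

/* A posted receive: a fixed-capacity buffer that is set up once and
 * then reused for every matching message. */
struct model_recv {
    char  *buf;       /* destination buffer */
    size_t capacity;  /* bytes the buffer can hold */
    size_t received;  /* bytes stored by the last delivery */
    int    tag;       /* tag of the last delivered message */
};

/* Deliver one incoming message into a posted receive.
 * A message no longer than the capacity succeeds; a longer one is
 * reported as truncated. This model keeps the part that fits; do not
 * rely on what a real MPI library leaves in the buffer. */
static enum model_status
model_deliver(struct model_recv *r, const char *msg, size_t len, int tag)
{
    assert(r != NULL && r->buf != NULL && msg != NULL);

    size_t n = len < r->capacity ? len : r->capacity;
    memcpy(r->buf, msg, n);
    r->received = n;
    r->tag = tag;  /* remembered so an error report can name it */

    return len > r->capacity ? MODEL_ERR_TRUNCATE : MODEL_SUCCESS;
}
```

In this model, delivering 3 bytes into a 4-byte receive returns
MODEL_SUCCESS, while delivering 8 bytes returns MODEL_ERR_TRUNCATE.
The practical fix on the MPI side is the same idea: size each
persistent receive for the largest message that can match it.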

> However, I don't understand why job rank 0 terminates first. The
> only process that contains a call to MPI_Testsome actually has rank
> 3, and it's receiving messages from rank 0.

The aborting process sends a message to kill all the other processes
in the job before it dies itself (i.e., to obey the semantics of an
MPI abort). Hence, it's likely that there's a race going on here and
process 0 dies before 3, so mpirun reports that first.

> Also I think it would be a good idea to print the message tag in the
> error log.

Mm. Good point. I'll file this as a feature request -- we have
centralized error reporting for the abort sequence, so it'll take a
little noodling to get that in there. Probably won't happen for
v1.3[.0], but that's good real-world feedback to have. Thanks!

-- 
Jeff Squyres
Cisco Systems