
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] MPI_ERR_TRUNCATE with MPI_Revc without Infinipath
From: Tom Riddle (rarebitusa_at_[hidden])
Date: 2008-08-18 18:36:41


Thanks George, I will update and try the latest repo. However, I'd like to describe our use case a bit more to see whether something in our development approach is not proper. Forgive me if this is repetitious...

We originally configured and built OpenMPI on a machine with Infinipath / PSM installed. Since we want a flexible software development environment across a number of machines (most of them without the Infinipath hardware), we run these same OpenMPI binaries from a shared user area. That means other developers' machines, which do not have Infinipath / PSM installed locally, simulate multi-machine communication by running in shared memory mode. But again, these OpenMPI binaries were configured with Infinipath support.

We see the error when running in shared memory mode on machines that don't have Infinipath. Is there a way to force shared memory mode exclusively at runtime? We are wondering whether specifying MPI_ANY_SOURCE directs OpenMPI to look at every possible communication mode, which would probably cause conflicts if the PSM libs are not present.
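One way this might be tried is sketched below. It assumes the build contains the ob1 PML and the sm and self BTLs, and that PSM is reached through a separate MTL path, so selecting ob1 explicitly should keep PSM out of the picture. The variable names MCA_ARGS and CMD are only illustrative, and none of this has been verified against this particular 1.4a1 build.

```shell
#!/bin/sh
# Sketch only: restrict an Open MPI run to shared memory between local ranks.
# Assumption: the ob1 PML and the sm/self BTLs exist in this build.
#   --mca pml ob1      use the BTL-based point-to-point layer
#   --mca btl sm,self  allow only shared memory and loopback transports
MCA_ARGS="--mca pml ob1 --mca btl sm,self"
CMD="mpirun $MCA_ARGS -np 2 ./osu_latency"

# The same selection can be made through the environment instead:
#   export OMPI_MCA_pml=ob1
#   export OMPI_MCA_btl=sm,self

# Run only when both mpirun and the benchmark are available here;
# otherwise just print the command that would be used.
if command -v mpirun >/dev/null 2>&1 && [ -x ./osu_latency ]; then
    $CMD
else
    echo "would run: $CMD"
fi
```

If the truncation error goes away with these settings on the hosts without Infinipath, that would suggest the PSM-enabled path is involved rather than the wildcard receive itself.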

Hope this makes sense, Tom

Things were working without issue until we switched to the wildcard MPI_ANY_SOURCE on our receives, and the error appears only on machines without Infinipath. I wonder what mechanism is used in wildcard mode.

--- On Sun, 8/17/08, George Bosilca <bosilca_at_[hidden]> wrote:
From: George Bosilca <bosilca_at_[hidden]>
Subject: Re: [OMPI users] MPI_ERR_TRUNCATE with MPI_Revc without Infinipath
To: rarebitusa_at_[hidden], "Open MPI Users" <users_at_[hidden]>
Date: Sunday, August 17, 2008, 2:42 PM

Tom,

I made the same modification as you to osu_latency and the
resulting application runs to completion. I don't get any TRUNCATE
error messages. I'm using the latest version of Open MPI (1.4a1r19313).

There was a bug that might be related to your problem but our commit
log shows it was fixed by commit 18830 on July 9.

   george.

On Aug 13, 2008, at 5:49 PM, Tom Riddle wrote:

> Hi,
>
> A bit more info wrt the question below. I have run other releases of
> OpenMPI and they seem to be fine. The reason I need to run the
> latest is because it supports valgrind fully.
>
> openmpi-1.2.4
> openmpi-1.3ar18303
>
> TIA, Tom
>
> --- On Tue, 8/12/08, Tom Riddle <rarebitusa_at_[hidden]> wrote:
>
> Hi,
>
> I am getting a curious error on a simple communications test. I have
> altered the std mvapich osu_latency test to accept receives from any
> source and I get the following error
>
> [d013.sc.net:15455] *** An error occurred in MPI_Recv
> [d013.sc.net:15455] *** on communicator MPI_COMM_WORLD
> [d013.sc.net:15455] *** MPI_ERR_TRUNCATE: message truncated
> [d013.sc.net:15455] *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> the code change was...
>
> MPI_Recv(r_buf, size, MPI_CHAR, MPI_ANY_SOURCE, 1, MPI_COMM_WORLD,
> &reqstat);
>
> the command line I run was
>
> > mpirun -np 2 ./osu_latency
>
> Now I run this on 2 types of host machine configurations. One that
> has Infinipath HCAs installed and another that doesn't. I run both
> of these in shared memory mode, i.e. dual processes on the same node.
> I have verified that when I am on the host with Infinipath I am
> actually running the OpenMPI mpirun, not the mpi that comes with the
> HCA.
>
> I have built OpenMPI with psm support from a fairly recent svn pull
> and run the same bins on both host machines... The config was as
> follows:
> > $ ../configure --prefix=/opt/wkspace/openmpi-1.3 CC=gcc CXX=g++
> > --disable-mpi-f77 --enable-debug --enable-memchecker
> > --with-psm=/usr/include --with-valgrind=/opt/wkspace/valgrind-3.3.0/
> > mpirun --version
> mpirun (Open MPI) 1.4a1r18908
>
> The error presents itself only on the host that does not have
> Infinipath installed. I have combed through the mca args to see if
> there is a setting I am missing but I cannot see anything obvious.
>
> Any input would be appreciated. Thanks. Tom