Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Question about compiling with fPIC
From: Blosch, Edwin L (edwin.l.blosch_at_[hidden])
Date: 2011-09-21 12:33:57


Follow-up: I misread the code, and I now think mpi_iprobe is probably not being used in this case, so I'll have to pin the blame somewhere else. -fPIC by itself definitely fixes the problem: I removed -mcmodel=medium and it still worked. Our usual communication pattern is mpi_irecv, mpi_isend, mpi_waitall; perhaps there is something unhealthy in the semantics there.

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Blosch, Edwin L
Sent: Wednesday, September 21, 2011 10:44 AM
To: Open MPI Users
Subject: EXTERNAL: [OMPI users] Question about compiling with fPIC

Follow-up to a mislabeled thread: "How could OpenMPI (or MVAPICH) affect floating-point results?"

I have found a solution to my problem, but I would like to understand the underlying issue better.

To recap: an Intel-compiled executable linked with MVAPICH runs fine, but the same code linked with OpenMPI fails. The earliest symptom I could see was a strange difference in the numerical values of quantities that should be unaffected by MPI calls. Tim's advice led me to assume memory corruption. Eugene's advice led me to explore the detailed differences in compilation.

I observed that the MVAPICH mpif90 wrapper adds -fPIC.
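
One way to check this kind of difference on a given system is to ask each wrapper for its underlying compiler command line. The sketch below is only an illustration: the two install paths and the helper names (has_flag, report) are hypothetical and must be adapted to the local cluster. It relies on the --showme option of the Open MPI wrappers and the -show option of MPICH-derived wrappers such as MVAPICH.

```shell
#!/bin/sh
# Hypothetical install locations; override via the environment or edit here.
OMPI_F90=${OMPI_F90:-/opt/openmpi/bin/mpif90}
MVAPICH_F90=${MVAPICH_F90:-/opt/mvapich/bin/mpif90}

# has_flag FLAG WORD...: succeed if FLAG is one of the remaining words.
has_flag() {
    flag=$1
    shift
    for word in "$@"; do
        if [ "$word" = "$flag" ]; then
            return 0
        fi
    done
    return 1
}

# report WRAPPER SHOW_OPTION: print the wrapper's underlying command line
# and say whether -fPIC appears in it. Missing wrappers are skipped.
report() {
    wrapper=$1
    show_opt=$2
    if [ ! -x "$wrapper" ]; then
        echo "$wrapper: not found, skipping"
        return 0
    fi
    cmdline=$("$wrapper" "$show_opt")
    echo "$wrapper $show_opt => $cmdline"
    # $cmdline is left unquoted on purpose so it splits into separate flags.
    if has_flag -fPIC $cmdline; then
        echo "  -fPIC: present"
    else
        echo "  -fPIC: absent"
    fi
}

report "$OMPI_F90" --showme    # Open MPI wrapper option
report "$MVAPICH_F90" -show    # MPICH-derived wrapper option
```

Comparing the two printed command lines side by side also exposes any other flags that differ, not just -fPIC.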

I tried adding -fPIC and -mcmodel=medium to the compilation of the OpenMPI-linked executable. Now it works fine. I haven't tried without -mcmodel=medium, but my guess is -fPIC did the trick.

Does anyone know why compiling with -fPIC has helped? Does it suggest an application problem or an OpenMPI problem?

To note: This is an InfiniBand-based cluster. The application does pretty basic MPI-1 operations: send, recv, bcast, reduce, allreduce, gather, isend, irecv, waitall. There is one task that uses iprobe with MPI_ANY_TAG, but this task is involved only in certain cases (including this one). Cases that do not call iprobe have not yet been observed to crash, so I am deducing that this function is the problem.

Thanks,

Ed

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Blosch, Edwin L
Sent: Tuesday, September 20, 2011 11:46 AM
To: Open MPI Users
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

Thank you for this explanation. I will assume that my problem here is some kind of memory corruption.

-----Original Message-----
From: users-bounces_at_[hidden] [mailto:users-bounces_at_[hidden]] On Behalf Of Tim Prince
Sent: Tuesday, September 20, 2011 10:36 AM
To: users_at_[hidden]
Subject: Re: [OMPI users] EXTERNAL: Re: How could OpenMPI (or MVAPICH) affect floating-point results?

On 9/20/2011 10:50 AM, Blosch, Edwin L wrote:

> It appears to be a side effect of linkage that is able to change a compute-only routine's answers.
>
> I have assumed that max/sqrt/tiny/abs might be replaced, but some other kind of corruption may be going on.
>

Those intrinsics translate directly to machine instructions, which
shouldn't vary from -O1 on up or with linkage options, nor be affected
by MPI or by the insertion of WRITEs.

-- 
Tim Prince
_______________________________________________
users mailing list
users_at_[hidden]
http://www.open-mpi.org/mailman/listinfo.cgi/users