
Open MPI User's Mailing List Archives


From: Troy Telford (ttelford_at_[hidden])
Date: 2006-06-02 16:12:42


On Fri, 02 Jun 2006 13:37:07 -0600, Jeff Squyres (jsquyres)
<jsquyres_at_[hidden]> wrote:

> Troy --
>
> Just to make sure I understand the issues:
>
> - 1.1
>   - presta com works fine
>   - presta allred fails with the MPI_Gather error
> - 1.0.3
>   - presta com fails with the MPI_Gather error
>   - presta allred fails with the MPI_Gather error
>
> And these all *only* fail on the pre-production Linux version you've
> got; they all pass on FC4.
>
> Is that correct?

Quite correct. (Well, with caveats: FC4 has shown some scaling issues,
tracked in tickets #40 and #41, but Open MPI on FC4 works fine with -np 4.)

If I didn't say so already, here's what I would add:
   * If I use -mca btl tcp,sm,self (effectively disabling the openib BTL),
allred works fine. If I use -mca btl openib,sm,self, it breaks.
   * If I use -mca btl tcp,sm,self with com, the error is the same as with
-mca btl openib,sm,self. (com works fine in either case with 1.1,
but breaks with 1.0.3.)
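
For reference, the two transport selections compared above can be laid out
as a small dry-run script. This is only a sketch: build_cmd is a
hypothetical helper name, it prints the command lines rather than running
them, and the other flags and the allred arguments are copied from the
allred invocation quoted later in this message. The arguments used for com
are not shown in this thread, so com is left out.

```shell
#!/bin/sh
# Dry run: print (do not execute) the mpirun command lines being compared.
# build_cmd is a hypothetical helper, not part of Open MPI.
build_cmd() {
    # $1 = comma-separated BTL list; remaining args = test program + its arguments
    btl="$1"
    shift
    # \$MPIHOME is printed literally so the line can be pasted into a shell
    # where MPIHOME points at the Open MPI install under test.
    printf '%s\n' "mpirun -np 4 -machinefile machines -prefix \$MPIHOME -mca btl $btl $*"
}

# TCP + shared memory + self (openib excluded), then openib + shared memory + self
for btl in tcp,sm,self openib,sm,self; do
    build_cmd "$btl" allred 10 10 10
done
```

Running the printed lines one after the other against the same build makes
it easy to see which transport selection triggers the MPI_Gather error.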

A bit of additional info: I am able to run linpack (hpl), HPCC, and IMB
on Open MPI 1.1, 1.0.3, and 1.0.2 on this pre-production distro.

All tests were run on two nodes with two CPUs each. (-np 4)

>> -----Original Message-----
>> From: users-bounces_at_[hidden]
>> [mailto:users-bounces_at_[hidden]] On Behalf Of Troy Telford
>> Sent: Friday, June 02, 2006 12:46 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] openib /compiler issue?
>>
>> On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford
>> <ttelford_at_[hidden]> wrote:
>>
>> > the 'com' test ends with:
>> > [n1:04941] *** An error occurred in MPI_Gather
>> > [n1:04941] *** on communicator MPI_COMM_WORLD
>> > [n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
>> > [n1:04941] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> >
>> > And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm
>> > just not there yet...
>>
>> I've now tried it on 1.0.3 and 1.1 nightly builds:
>> ***presta 'com'***
>> 1.1 works fine (hooray!!!)
>>
>> 1.0.3 doesn't work fine (booo!!!!)
>> [n1:28313] *** An error occurred in MPI_Gather
>> [n1:28313] *** on communicator MPI_COMM_WORLD
>> [n1:28313] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28313] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> ***presta 'allred' (allreduce)***
>> 1.0.3 has the following error:
>> mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
>> [n1:28366] *** An error occurred in MPI_Gather
>> [n1:28366] *** on communicator MPI_COMM_WORLD
>> [n1:28366] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28366] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [n1:28367] *** An error occurred in MPI_Gather
>> [n1:28367] *** on communicator MPI_COMM_WORLD
>> [n1:28367] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28367] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> 1.1 has the following error:
>> mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
>> [n1:28536] *** An error occurred in MPI_Gather
>> [n1:28537] *** An error occurred in MPI_Gather
>> [n1:28537] *** on communicator MPI_COMM_WORLD
>> [n1:28537] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28537] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [n1:28536] *** on communicator MPI_COMM_WORLD
>> [n1:28536] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28536] *** MPI_ERRORS_ARE_FATAL (goodbye)

-- 
Troy Telford