On Fri, 02 Jun 2006 13:37:07 -0600, Jeff Squyres (jsquyres)
<jsquyres_at_[hidden]> wrote:
> Troy --
>
> Just to make sure I understand the issues:
>
> - 1.1
>   - presta com works fine
>   - presta allred fails with the MPI_Gather error
> - 1.0.3
>   - presta com fails with MPI_Gather error
>   - presta allred fails with the MPI_Gather error
>
> And these all *only* fail on the pre-production Linux version you've
> got; they all pass on FC4.
>
> Is that correct?
Quite correct. (well, with caveats -- FC4 has shown some scaling issues
that are in tickets #40 & #41; but Open MPI/FC4 works fine with -np 4)
If I didn't say so already, here's what I would add:
* If I add -mca btl tcp,sm,self (effectively disabling the openib BTL),
allred works fine. If I use -mca btl openib,sm,self, it breaks.
* If I use -mca btl tcp,sm,self with com, the error is the same as with
-mca btl openib,sm,self. (And com works fine in either case with 1.1,
but breaks with 1.0.3.)
A bit of additional info: I am able to run linpack (hpl), HPCC, and IMB
on Open MPI 1.1, 1.0.3, and 1.0.2 on this pre-production distro.
All tests were done on two nodes with two CPUs each (-np 4).
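
For reference, the two BTL selections compared above can be written out as
full command lines. This is only a sketch: the helper name allred_cmd is
hypothetical, the placement of -mca before the program name is an assumption,
and everything else is copied from the allred runs quoted below. It prints
the commands rather than running them.

```shell
# Dry run: print (do not execute) the allred command line for a given BTL list.
# allred_cmd is a hypothetical helper; the flags and arguments come from the
# runs quoted in this thread. $MPIHOME is printed literally, not expanded.
allred_cmd() {
    btl_list="$1"
    printf 'mpirun -np 4 -machinefile machines -prefix $MPIHOME -mca btl %s allred 10 10 10\n' "$btl_list"
}

allred_cmd tcp,sm,self      # TCP + shared memory + self: allred reported working
allred_cmd openib,sm,self   # openib + shared memory + self: allred reported failing
```

Copying a printed line into the shell (with MPIHOME set) should reproduce the
corresponding case.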
>> -----Original Message-----
>> From: users-bounces_at_[hidden]
>> [mailto:users-bounces_at_[hidden]] On Behalf Of Troy Telford
>> Sent: Friday, June 02, 2006 12:46 PM
>> To: Open MPI Users
>> Subject: Re: [OMPI users] openib /compiler issue?
>>
>> On Thu, 01 Jun 2006 17:49:53 -0600, Troy Telford
>> <ttelford_at_[hidden]> wrote:
>>
>> > the 'com' test ends with:
>> > [n1:04941] *** An error occurred in MPI_Gather
>> > [n1:04941] *** on communicator MPI_COMM_WORLD
>> > [n1:04941] *** MPI_ERR_ARG: invalid argument of some other kind
>> > [n1:04941] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> >
>> > And yes, I'm going to try out the dev snapshots of 1.0.3 and 1.1... I'm
>> > just not there yet...
>>
>> I've now tried it on 1.0.3 and 1.1 nightly builds:
>> ***presta 'com'***
>> 1.1 works fine (hooray!!!)
>>
>> 1.0.3 doesn't work fine (booo!!!!)
>> [n1:28313] *** An error occurred in MPI_Gather
>> [n1:28313] *** on communicator MPI_COMM_WORLD
>> [n1:28313] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28313] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> ***presta 'allred' (allreduce)***
>> 1.0.3 has the following error:
>> mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
>> [n1:28366] *** An error occurred in MPI_Gather
>> [n1:28366] *** on communicator MPI_COMM_WORLD
>> [n1:28366] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28366] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [n1:28367] *** An error occurred in MPI_Gather
>> [n1:28367] *** on communicator MPI_COMM_WORLD
>> [n1:28367] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28367] *** MPI_ERRORS_ARE_FATAL (goodbye)
>>
>> 1.1 has the following error:
>> mpirun -np 4 -machinefile machines -prefix $MPIHOME allred 10 10 10
>> [n1:28536] *** An error occurred in MPI_Gather
>> [n1:28537] *** An error occurred in MPI_Gather
>> [n1:28537] *** on communicator MPI_COMM_WORLD
>> [n1:28537] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28537] *** MPI_ERRORS_ARE_FATAL (goodbye)
>> [n1:28536] *** on communicator MPI_COMM_WORLD
>> [n1:28536] *** MPI_ERR_ARG: invalid argument of some other kind
>> [n1:28536] *** MPI_ERRORS_ARE_FATAL (goodbye)
--
Troy Telford