
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Improving error messages
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2008-06-20 20:13:04


I agree so much that I just recently filed a bug about this same issue:

     https://svn.open-mpi.org/trac/ompi/ticket/1338

Thanks for the feedback, though -- this turns it from a hypothetical
issue into an "it has happened to at least one user" issue...

On Jun 20, 2008, at 8:00 PM, Scott Atchley wrote:

> Hi all,
>
> We had a customer using 1.2.6 with MX. We were running his jobs,
> some of which used the MX BTL and some used the MX MTL.
>
> He added a few more nodes to the cluster and installed the same
> OMPI. When we tried to run jobs that spanned the new nodes, the jobs
> failed. I did not keep the error messages, but it seemed to be a
> standard message about a component such as "self" not being found.
>
> The problem, in fact, was that he had installed OMPI, but for some
> reason neither the MX BTL nor the MX MTL was installed. Thus, the
> failure. I do not believe the error message for the BTL runs ever
> specifically mentioned a missing MX component, even though we were
> setting "--mca btl self,sm,mx" (we did not specify MX when using the
> MTL; we only used "--mca pml cm").
>
> When OMPI cannot run _and_ a specifically requested module is not
> available, it would be helpful for the error message to name the
> missing module.
>
> Thanks,
>
> Scott

-- 
Jeff Squyres
Cisco Systems