I agree so much that I just recently filed a bug about this same issue:
Thanks for the feedback, though -- this turns it from a hypothetical
issue into an "it has happened to at least one user" issue...
On Jun 20, 2008, at 8:00 PM, Scott Atchley wrote:
> Hi all,
> We had a customer using 1.2.6 with MX. We were running his jobs,
> some of which used the MX BTL and some used the MX MTL.
> He added a few more nodes to the cluster and installed the same
> OMPI. When we tried to run jobs that spanned the new nodes, the jobs
> failed. I did not keep the error messages, but it seems to be a
> standard message about a component such as "self" not found.
> The problem in fact was that he installed OMPI, but for some reason
> neither the MX BTL nor the MX MTL was installed. Thus, the failure.
> I do not believe the error message for the BTL runs ever
> specifically mentioned a missing MX component even though we were
> setting "--mca btl self,sm,mx" (we did not specify MX when using the
> MTL; we only used "--mca pml cm").
> It would be helpful if, in the case where OMPI cannot run _and_ a
> module was specifically requested but is not available, the missing
> module were named in the error message.
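
To make the request above concrete, here is a minimal Python sketch of the
kind of check being asked for. It is not Open MPI code: the function names
are hypothetical, and it assumes the list of components installed on a node
has already been gathered somehow. It only shows comparing an explicitly
requested list such as "self,sm,mx" against what is available and naming
whatever is missing.

```python
def missing_components(requested_spec, available):
    """Return explicitly requested component names absent from `available`.

    `requested_spec` is a comma-separated list such as "self,sm,mx".
    `available` is a set of component names assumed to be installed.
    """
    if requested_spec.startswith("^"):
        # A leading "^" is treated as an exclusion list, so nothing
        # is explicitly requested in that case.
        return []
    requested = [n.strip() for n in requested_spec.split(",") if n.strip()]
    return [n for n in requested if n not in available]


def format_error(framework, requested_spec, available):
    """Build a message naming the missing components, or None if all exist."""
    missing = missing_components(requested_spec, available)
    if not missing:
        return None
    return (
        f"Requested {framework} component(s) not available on this node: "
        f"{', '.join(missing)} (requested: {requested_spec})"
    )


if __name__ == "__main__":
    # Hypothetical node where the MX BTL was never installed.
    print(format_error("btl", "self,sm,mx", {"self", "sm", "tcp"}))
```

With a message along these lines, the missing MX component on the new nodes
would have been named directly instead of surfacing as a generic failure.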