Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
From: borno_borno_at_[hidden]
Date: 2014-06-17 12:56:38


Well, maybe, but when I use the more verbose output (--mca btl_base_verbose 30 --mca oob_base_verbose 30), I can see that the right IP was found for each hostname, yet the connection fails with:
 
[Ries][[35743,1],2][/openmpi/1.6.5/openmpi-1.6.5/ompi/mca/btl/tcp/btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to <ip of Euler> failed: No route to host (113) 
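
For readers following along: on Linux, errno 113 is EHOSTUNREACH, so the same failure can be looked for outside of MPI with a plain TCP connect from one machine to the other. The following is only a minimal sketch and does not reproduce Open MPI's own connection logic. The function name `probe_tcp` is hypothetical, and the host name and port in the usage note are placeholders, since the thread does not say which port the TCP BTL tried to use.

```python
import errno
import socket


def probe_tcp(host, port, timeout=3.0):
    """Attempt one TCP connect and return (ok, reason)."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True, "connected"
    except socket.gaierror as exc:
        # The host name could not be resolved to an address.
        return False, f"name resolution failed: {exc}"
    except socket.timeout:
        # No answer within the timeout (packets may be silently dropped).
        return False, "timed out"
    except OSError as exc:
        # Map the numeric errno to its symbolic name, e.g. EHOSTUNREACH.
        name = errno.errorcode.get(exc.errno, "unknown")
        return False, f"{name} ({exc.errno}): {exc.strerror}"
```

Calling something like `probe_tcp("euler", 22)` on Ries (assuming an SSH daemon listens on Euler) and the reverse on Euler would show whether the two machines can reach each other in both directions. A reason starting with `EHOSTUNREACH` would match the message above and would point at routing or firewall settings rather than at the application.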
 
Sent: Tuesday, 17 June 2014 at 15:44
From: Reuti <reuti@Staff.Uni-Marburg.DE>
To: "Open MPI Users" <users@open-mpi.org>
Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
On 17.06.2014 at 14:53, borno_borno@gmx.de wrote:

> I should have written that...
>
> mpirun -np n --hostfile host.cfg
>
> mpi@Ries slots=n_1 max_slots=n_1
> mpi@Euler slots=n_2 max_slots=n_2

Although hostnames are defined to be case-insensitive, my experience is that not every call handles them that way. To avoid any confusion because of this, it's best to write them all in lowercase. I don't know whether this is related to your observation.
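
As an illustration of this suggestion, the host part of each hostfile line could be lowercased with a few lines of Python. This is only a sketch: the function name `lowercase_host` is hypothetical, the slot counts in the usage note are made-up stand-ins for the n_1 and n_2 placeholders quoted above, and the code assumes the `user@host key=value ...` layout shown in this thread.

```python
def lowercase_host(line):
    """Lowercase the host name in one hostfile line, keeping the rest as-is."""
    stripped = line.strip()
    # Leave blank lines and comment lines unchanged.
    if not stripped or stripped.startswith("#"):
        return line
    # First token is the host entry; everything after it is slot settings.
    parts = stripped.split(None, 1)
    # Split off an optional "user@" prefix so only the host is lowercased.
    user, at, host = parts[0].rpartition("@")
    parts[0] = f"{user}{at}{host.lower()}"
    return " ".join(parts)
```

For example, `lowercase_host("mpi@Ries slots=4 max_slots=4")` yields `mpi@ries slots=4 max_slots=4`. The lowercase names still have to resolve on both machines, so /etc/hosts or DNS entries may need the same treatment.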

-- Reuti


> The n_i are chosen so that their sum equals n.
>
> Kurt
> Sent: Tuesday, 17 June 2014 at 14:25
> From: Reuti <reuti@staff.uni-marburg.de>
> To: "Open MPI Users" <users@open-mpi.org>
> Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
> Hi,
>
> On 17.06.2014 at 13:00, Borno Knuttelski wrote:
>
> > This is the first time I have contacted this list. I'm using Open MPI 1.6.5 on an inhomogeneous cluster of 2 machines. In short: with a few processes everything works fine, but with more processes my application crashes. (Yes, I can guarantee that in every scenario I start processes on both machines.) I have already posted the problem with all details on Stack Overflow (http://stackoverflow.com/questions/24164825/mpi-connect-fails-inhomogeneous-cluster). Please have a look at it. What exactly is the problem and how can I fix it?
>
> How do you start the program - just with `mpiexec` and a proper hostfile and number of slots?
>
> -- Reuti
>
>
> > Any help or guess is appreciated and will be tested...
> > Thanks in advance,
> >
> > Kurt
> > _______________________________________________
> > users mailing list
> > users@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24662.php

_______________________________________________
users mailing list
users@open-mpi.org
Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24666.php