
Open MPI User's Mailing List Archives


This web mail archive is frozen.


Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
From: borno_borno_at_[hidden]
Date: 2014-06-17 12:56:38


Well, maybe. But with more verbose output (--mca btl_base_verbose 30 --mca oob_base_verbose 30) I can see that the right IP was found for each hostname; the connection still fails with:
 
[Ries][[35743,1],2][/openmpi/1.6.5/openmpi-1.6.5/ompi/mca/btl/tcp/btl_tcp_endpoint.c:655:mca_btl_tcp_endpoint_complete_connect] connect() to <ip of Euler> failed: No route to host (113) 
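For reference: on Linux, error 113 is `EHOSTUNREACH`. One way to check whether this failure is specific to Open MPI is to attempt a plain TCP `connect()` from Ries to Euler outside of MPI. The sketch below is an illustration under stated assumptions, not something from this thread: `tcp_probe` is a hypothetical helper, it handles IPv4 only, and the port is whatever you choose to test, because the ports the TCP BTL actually uses are picked at runtime. A blocking `connect()` may also take a long time to give up on an unresponsive host.

```c
#include <arpa/inet.h>
#include <errno.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: try one blocking TCP connect() to an IPv4 address.
 * Returns 0 on success, -1 if the address string is not valid IPv4,
 * otherwise the errno reported by socket() or connect()
 * (for example EHOSTUNREACH, which is 113 on Linux). */
int tcp_probe(const char *ipv4, unsigned short port)
{
    struct sockaddr_in addr;
    memset(&addr, 0, sizeof addr);
    addr.sin_family = AF_INET;
    addr.sin_port = htons(port);
    if (inet_pton(AF_INET, ipv4, &addr.sin_addr) != 1)
        return -1;

    int fd = socket(AF_INET, SOCK_STREAM, 0);
    if (fd < 0)
        return errno;

    int rc = connect(fd, (struct sockaddr *)&addr, sizeof addr);
    int err = (rc == 0) ? 0 : errno;   /* save errno before close() can change it */
    close(fd);
    return err;
}
```

Run on Ries against Euler's address, a return value equal to `EHOSTUNREACH` would reproduce the error without MPI involved, which points at the network path between the two machines rather than at the application.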
 
Sent: Tuesday, 17 June 2014 at 15:44
From: Reuti <reuti@Staff.Uni-Marburg.DE>
To: "Open MPI Users" <users@open-mpi.org>
Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
On 17.06.2014 at 14:53, borno_borno@gmx.de wrote:

> I should have written that...
>
> mpirun -np n --hostfile host.cfg
>
> mpi@Ries slots=n_1 max_slots=n_1
> mpi@Euler slots=n_2 max_slots=n_2
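As an illustration only (not from the thread), a hostfile in this format can be sanity-checked by summing its `slots=` values and comparing the total with the `-np` argument. This sketch assumes one host per line with a whitespace-separated `slots=N` token; `slots_match_np` and `line_slots` are hypothetical names, and `max_slots=` is deliberately skipped.

```c
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: return N from the first standalone "slots=N" token
 * in a hostfile line, skipping "max_slots=N"; returns 0 if there is none. */
static int line_slots(const char *line)
{
    const char *p = line;
    while ((p = strstr(p, "slots=")) != NULL) {
        int n;
        /* Accept "slots=" only at line start or after whitespace. */
        if ((p == line || p[-1] == ' ' || p[-1] == '\t') &&
            sscanf(p, "slots=%d", &n) == 1)
            return n;
        p += strlen("slots=");
    }
    return 0;
}

/* Hypothetical check: 1 if the slots= values of all lines sum to np, else 0. */
int slots_match_np(const char *const lines[], size_t count, int np)
{
    int total = 0;
    for (size_t i = 0; i < count; i++)
        total += line_slots(lines[i]);
    return total == np;
}
```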

Although hostnames are defined to be case-insensitive, in my experience not all calls map them properly. To avoid any confusion from this, it's best to write them all in lowercase. I don't know whether this is related to what you are observing.
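Following this advice, here is a minimal sketch (an assumption, not Open MPI code) of normalizing a hostname to lowercase before writing it to a hostfile, plus an ASCII case-insensitive comparison. Both helper names are hypothetical.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: lowercase a hostname in place, e.g. "Euler" -> "euler". */
void hostname_to_lower(char *name)
{
    for (size_t i = 0; name[i] != '\0'; i++)
        name[i] = (char)tolower((unsigned char)name[i]);
}

/* Hypothetical helper: compare two hostnames ignoring ASCII case.
 * Returns 1 if they are equal, 0 otherwise. */
int hostnames_equal(const char *a, const char *b)
{
    while (*a && *b) {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;   /* equal only if both strings ended together */
}
```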

-- Reuti


> The n_i are chosen so that their sum equals n.
>
> Kurt
> Sent: Tuesday, 17 June 2014 at 14:25
> From: Reuti <reuti@staff.uni-marburg.de>
> To: "Open MPI Users" <users@open-mpi.org>
> Subject: Re: [OMPI users] connect() fails - inhomogeneous cluster
> Hi,
>
> On 17.06.2014 at 13:00, Borno Knuttelski wrote:
>
> > this is the first time I have contacted this list. I'm using OpenMPI 1.6.5 on an inhomogeneous cluster with 2 machines. In short: with only a few processes everything works fine, but with more processes my application crashes. (Yes, I can guarantee that in every scenario I start processes on both machines.) I have already posted the problem with all details on Stack Overflow (http://stackoverflow.com/questions/24164825/mpi-connect-fails-inhomogeneous-cluster). Please have a look at it. What exactly is the problem, and how can I fix it?
>
> How do you start the program - just with `mpiexec` and a proper hostfile and number of slots?
>
> -- Reuti
>
>
> > Any help or guess is appreciated and will be tested...
> > Thanks in advance,
> >
> > Kurt
> > _______________________________________________
> > users mailing list
> > users@open-mpi.org
> > Subscription: http://www.open-mpi.org/mailman/listinfo.cgi/users
> > Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24662.php
>
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24663.php
> Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24664.php

Link to this post: http://www.open-mpi.org/community/lists/users/2014/06/24666.php