Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] unknown af_family recieved errors...
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2012-01-28 07:42:56


(sorry for the delay in this reply; this mail came while I was at the MPI Forum meeting. Travel always makes my disastrous INBOX even worse...)

As a bit of explanation, I can surmise part of what is happening here.

When you run on only one machine, the TCP communications plugin (i.e., the "BTL") is not used -- only the shared memory (sm) BTL is used. Hence, you don't see the warnings. That being said, you could force the TCP BTL to be used instead of the sm BTL by using:

  mpirun --mca btl tcp,self -np 2 my_test_program

When you run across multiple nodes, the TCP BTL is used by default. And therefore these warnings come up.

These warnings refer to IP interfaces that Open MPI found but does not recognize. What is the output of ifconfig on your machine?

On Jan 16, 2012, at 9:11 PM, Hamilton Fischer wrote:

>
> ----- Forwarded Message -----
> From: Hamilton Fischer <fischerhamilton_at_[hidden]>
> To: "user_at_[hidden]" <user_at_[hidden]>
> Sent: Monday, January 16, 2012 9:09 PM
> Subject: unknown af_family recieved errors...
>
> Hi, I'm having odd issues with my "cluster", I guess. This very simple example works on one machine, but it gives a load of errors and then hangs when I try to parallelize it across the network.
>
> #include <stdio.h>
> #include "mpi.h"
>
> int
> main(int argc, char *argv[])
> {
>     int rank, size;
>     MPI_Init(&argc, &argv);
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     if (rank == 0)
>     {
>         /* Rank 0 sends the value 1 to every other rank. */
>         int i;
>         for (i = 1; i < size; ++i)
>         {
>             int s = 1;
>             MPI_Send(&s, 1, MPI_INT, i, 1, MPI_COMM_WORLD);
>         }
>     }
>     else
>     {
>         /* Every other rank receives from rank 0; the status is not needed. */
>         int r;
>         MPI_Recv(&r, 1, MPI_INT, 0, 1, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>         printf("%d got a %d\n", rank, r);
>     }
>     MPI_Finalize();
>     return 0;
> }
>
> If I do `mpirun -np 3 a.out', where a.out is the executable, I get the expected output:
>
> 1 got a 1
> 2 got a 1
>
> Now, let's say I go on the network. I use `mpirun --hostfile ../combin_host a.out', where my hostfile is simply:
>
> # Hostfile
> angryrock_at_192.168.0.1 slots=4
> # Hostfile
> user_at_192.168.0.102 slots=2
> user_at_192.168.0.103 slots=2
> user_at_192.168.0.104 slots=2
> user_at_192.168.0.105 slots=2
>
> I get this...
>
> [localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04756] unknown address family for tcp: 0
> [localhost:04756] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04756] unknown address family for tcp: 0
> [localhost:04610] mca_btl_tcp_proc: unknown af_family received: 1
> [localhost:04610] unknown address family for tcp: 0
> [localhost:04048] mca_btl_tcp_proc: unknown af_family received: 1
> ...
> [localhost:04123] unknown address family for tcp: 0
> 1 got a 1
> 2 got a 1
> 3 got a 1
> ^Cmpirun: killing job...
>
> The ellipsis covers a few more lines of the same thing, probably one set for each host. The last part is no doubt a.out executing on my machine. At the end I have to kill it because it hangs.
>
> Any help as to what my issue might be? It obviously is an installation issue...
>
> Thanks,
> noobermin
>

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/