Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-09 16:06:56


Our Mac expert (Brian Barrett) just recently left the project for
greener pastures. He's the guy who typically answered Mac/XGrid
questions -- I'm afraid that I have no idea how any of that XGrid
stuff works... :-(

Is there anyone else around who can answer XGrid questions? Warner?
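
In the meantime, one non-XGrid-specific thing you could try, purely as a
guess based on the "unable to reach ... for MPI communication" text quoted
below: force the TCP and self BTLs explicitly and turn up the BTL
verbosity so you can see what each process actually selects, e.g.:

   mpirun --mca btl tcp,self --mca btl_base_verbose 30 -n 3 ~/openMPI_stuff/Hello

(The exact verbosity value isn't important; the interesting part is
whether the tcp BTL is found and usable on every node.) That is only a
diagnostic sketch, not a known fix for the XGrid case.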

On Oct 4, 2007, at 11:29 PM, Jinhui Qin wrote:

> Hi,
> I have set up an Xgrid consisting of one laptop and 7 Mac mini nodes
> (all dual-core machines). I have also installed Open MPI (version
> 1.2.1) on all nodes. The laptop node (hostname: sib) has three
> roles: agent, controller, and client; all the other nodes are
> agents only.
>
> When I ran "mpirun -n 8 /bin/hostname" from a terminal on the laptop
> node, it showed all 8 nodes' hostnames correctly, so Xgrid itself
> seems to work fine.
>
> Then I wanted to run a simple MPI program. The source code "Hello.c"
> was compiled (with mpicc) and the executable "Hello" was copied to
> each node under the same path (I have also verified that it runs
> properly on each node locally). When I asked for 1 or 2 processors
> to run the job, Xgrid worked fine, but when I asked for 3 or more
> processors, all jobs failed. Following are the commands and the
> results/messages that I got.
>
> Can anybody help me out?
>
> *************************************
> running "hostname"; the results look good.
> *************************************
> sib:sharcnet$ mpirun -n 8 /bin/hostname
> node2
> node8
> node4
> node5
> node3
> node7
> sib
> node6
>
> *************************************
> the simple mpi program Hello.c source code
> *************************************
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
>     int numprocs, rank, namelen;
>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numprocs);  /* total number of processes */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);      /* this process's rank */
>     MPI_Get_processor_name(processor_name, &namelen);
>
>     printf("Process %d on %s out of %d\n", rank, processor_name,
>            numprocs);
>
>     MPI_Finalize();
>     return 0;
> }
>
> *************************************
> asking for 1 and 2 processors to run "Hello";
> the results are all good
> *************************************
> sib:sharcnet$ mpirun -n 1 ~/openMPI_stuff/Hello
> Process 0 on sib out of 1
>
> sib:sharcnet$ mpirun -n 2 ~/openMPI_stuff/Hello
> Process 1 on node2 out of 2
> Process 0 on sib out of 2
>
> *************************************
> Here is the problem: when asking for
> 3 processors to run the job, the
> following are all the messages I got
> *************************************
>
> sib:sharcnet$ mpirun -n 3 ~/openMPI_stuff/Hello
>
> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
>
> Process 0.1.2 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
>
> It looks like MPI_INIT failed for some reason; your parallel process
> is likely to abort. There are many reasons that a parallel process
> can fail during MPI_INIT; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> It looks like MPI_INIT failed for some reason; your parallel process
> is likely to abort. There are many reasons that a parallel process
> can fail during MPI_INIT; some of which are due to configuration or
> environment problems. This failure appears to be an internal failure;
> here's some additional information (which may only be relevant to an
> Open MPI developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 817 on node xgrid-node-0
> exited on signal 15 (Terminated).
>
> sib:sharcnet$
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
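
P.S. Another guess (again not XGrid-specific): it might be worth running
"ompi_info | grep btl" on each node to confirm that the same Open MPI
build, with the tcp and self BTL components, is installed everywhere; a
mismatched install on one of the Mac minis could also show up as the
"unreachable" errors above.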

-- 
Jeff Squyres
Cisco Systems