Open MPI User's Mailing List Archives

From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-10-09 16:06:56


Our Mac expert (Brian Barrett) just recently left the project for
greener pastures. He's the guy who typically answered Mac/XGrid
questions -- I'm afraid that I have no idea how any of that XGrid
stuff works... :-(

Is there anyone else around who can answer XGrid questions? Warner?

On Oct 4, 2007, at 11:29 PM, Jinhui Qin wrote:

> Hi,
> I have set up an Xgrid consisting of one laptop and 7 Mac mini nodes
> (all dual-core machines). I have also installed Open MPI (version
> 1.2.1) on all nodes. The laptop node (hostname: sib) has three
> roles: agent, controller, and client; all the other nodes are
> agents only.
>
> When I ran "mpirun -n 8 /bin/hostname" in a terminal on the laptop
> node, it showed all 8 nodes' hostnames correctly, so it seems that
> XGrid itself works fine.
>
> Then I wanted to run a simple MPI program. The source file "Hello.c"
> was compiled (using mpicc) and the executable "Hello" was copied to
> each node under the same path (I have also verified that it runs
> properly on each node locally). When I asked for 1 or 2 processors
> to run the job, XGrid worked fine, but when I asked for 3 or more
> processors, all jobs failed. Following are the commands and the
> results/messages that I got.
>
> Can anybody help me out?
>
> *************************************
> running "hostname"; the results look good
> *************************************
> sib:sharcnet$ mpirun -n 8 /bin/hostname
> node2
> node8
> node4
> node5
> node3
> node7
> sib
> node6
>
> *************************************
> the simple mpi program Hello.c source code
> *************************************
> #include <stdio.h>
> #include <mpi.h>
>
> int main(int argc, char *argv[]) {
> int numprocs, rank, namelen;
> char processor_name[MPI_MAX_PROCESSOR_NAME];
>
> MPI_Init(&argc, &argv);
> MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
> MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> MPI_Get_processor_name(processor_name, &namelen);
>
> printf("Process %d on %s out of %d\n", rank, processor_name,
> numprocs);
>
> MPI_Finalize();
> }
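
The program above would normally be built with Open MPI's mpicc compiler
wrapper, presumably something along these lines (the ~/openMPI_stuff path
is taken from the transcripts below):

  mpicc Hello.c -o ~/openMPI_stuff/Hello

mpicc adds the MPI include and library flags automatically, so no extra
options should be needed for a program this simple.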
>
> *************************************
> asking for 1 and 2 processors to run "Hello";
> the results are all good
> *************************************
> sib:sharcnet$ mpirun -n 1 ~/openMPI_stuff/Hello
> Process 0 on sib out of 1
>
> sib:sharcnet$ mpirun -n 2 ~/openMPI_stuff/Hello
> Process 1 on node2 out of 2
> Process 0 on sib out of 2
>
> *************************************
> Here is the problem: when asking for 3 processors
> to run the job, the following are all the messages
> I got
> *************************************
>
> sib:sharcnet$ mpirun -n 3 ~/openMPI_stuff/Hello
>
> Process 0.1.1 is unable to reach 0.1.2 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
>
> Process 0.1.2 is unable to reach 0.1.1 for MPI communication.
> If you specified the use of a BTL component, you may have
> forgotten a component (such as "self") in the list of
> usable components.
>
> It looks like MPI_INIT failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
>
> It looks like MPI_INIT failed for some reason; your parallel
> process is
> likely to abort. There are many reasons that a parallel process can
> fail during MPI_INIT; some of which are due to configuration or
> environment
> problems. This failure appears to be an internal failure; here's some
> additional information (which may only be relevant to an Open MPI
> developer):
>
> PML add procs failed
> --> Returned "Unreachable" (-12) instead of "Success" (0)
>
> *** An error occurred in MPI_Init
> *** before MPI was initialized
> *** MPI_ERRORS_ARE_FATAL (goodbye)
> mpirun noticed that job rank 0 with PID 817 on node xgrid-node-0
> exited on signal 15 (Terminated).
>
> sib:sharcnet$
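
The error text above suggests that the set of BTLs in use does not
include the "self" (loopback) component, so processes cannot set up MPI
communication with one another. As a hedged guess rather than a known
fix, one thing to try is requesting the self and tcp BTLs explicitly on
the mpirun command line:

  mpirun --mca btl self,tcp -n 3 ~/openMPI_stuff/Hello

or setting the same value once per user in $HOME/.openmpi/mca-params.conf:

  btl = self,tcp

That is the usual workaround for "Unreachable" PML errors; whether it
interacts correctly with the XGrid launcher is a question for someone
who knows that code.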
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users

-- 
Jeff Squyres
Cisco Systems