
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] How to set a process on a host but not bound to any core
From: Gan, Qi PW (Qi.Gan2_at_[hidden])
Date: 2014-04-10 10:25:09


We have OMPI 1.4.0, 1.4.5 and 1.6.5 installed on our system.

>>What version of OMPI are you using? We have a "seq" mapper that does what you want, but the precise command-line option for selecting it depends a bit on the version.
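
[Illustrative sketch, not part of the original message: a sequential mapper of the kind mentioned above is generally driven by a hostfile that lists one host per line, with line i giving the host for rank i. Assuming that convention holds for the installed version, the short Python script below builds such a file for the ABBCAC placement from the original question. The function name `sequential_hostfile` and the `placement` dictionary are hypothetical, and the version-specific mpirun option for selecting the mapper is deliberately left out.]

```python
def sequential_hostfile(rank_to_host):
    """Return hostfile text with one host per line, in rank order.

    Assumption: the mapper assigns rank i to the host on line i.
    """
    ranks = sorted(rank_to_host)
    # Refuse gaps or a non-zero start, since line position encodes the rank.
    if ranks != list(range(len(ranks))):
        raise ValueError("ranks must be contiguous and start at 0")
    return "\n".join(rank_to_host[r] for r in ranks) + "\n"


# Placement from the original question: rank -> host (ABBCAC).
placement = {0: "A", 1: "B", 2: "B", 3: "C", 4: "A", 5: "C"}
hostfile_text = sequential_hostfile(placement)
print(hostfile_text, end="")
```

[The printed text can be saved to a file and passed to mpirun as the hostfile. Because hosts are named without any slot or core setting, this approach does not by itself request core-level binding.]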

>>On Apr 9, 2014, at 9:22 AM, Gan, Qi PW <Qi.Gan2_at_[hidden]> wrote:

> Hi,
>
> I am having trouble placing the processes of a parallel job on hosts in a specified order. Suppose a job with 6 processes (rank0 to rank5) needs to run on 3 hosts (A, B, C) in the following order:
> Rank0 -- A
> Rank1 -- B
> Rank2 -- B
> Rank3 -- C
> Rank4 -- A
> Rank5 -- C
> Specifying this order (ABBCAC) in the hostfile doesn't work because Open MPI only supports the "byslot" (AABBCC) and "bynode" (ABCABC) ranking orders.
>
> However, if I use a rankfile to implement this order in the format
> rank 0=A slot=<slot setting>
> rank 1=B slot=<slot setting>
> rank 2=B slot=<slot setting>
> rank 3=C slot=<slot setting>
> rank 4=A slot=<slot setting>
> rank 5=C slot=<slot setting>
> I run into another problem: how to determine the <slot setting> for each rank. If I bind each rank to all cores/CPUs on a node (e.g. rank 0=A slot=0-n, where n is the highest CPU number), I get the following errors:
>
> *** An error occurred in MPI_comm_size
> *** on a NULL communicator
> *** Unknown error
> *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
> forrtl: severe (174): SIGSEGV, segmentation fault occurred
>
> If I don't select all cores, I need to identify which cores are available to my job in order to avoid oversubscribing CPUs, since the nodes are shared by multiple jobs.
>
> Our system is an Intel-based cluster (12 or 16 cores per node), and jobs are submitted through the LSF batch system.
>
> Here is my question: how can I place processes in a specified order at the node level without binding them at the core/CPU level?
>
> Any help and suggestions would be appreciated.
>
> Thanks,
> Chee