
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables
From: Ralph Castain (rhc_at_[hidden])
Date: 2009-08-30 12:04:02


After talking with the slurm folks and tracking down the history of
how OMPI dealt with this variable, I have made a change to OMPI's use
of it. This should now work correctly in the upcoming release.

Thanks
Ralph

On Aug 24, 2009, at 2:22 PM, matthew.piehl_at_[hidden] wrote:

> Hello again,
>
> As you requested:
>
> node64-test ~>salloc -n7
> salloc: Granted job allocation 827
>
> node64-test ~>srun hostname
> node64-17.xxxx.xxxx.xxxx.xxxx
> node64-17.xxxx.xxxx.xxxx.xxxx
> node64-20.xxxx.xxxx.xxxx.xxxx
> node64-18.xxxx.xxxx.xxxx.xxxx
> node64-19.xxxx.xxxx.xxxx.xxxx
> node64-18.xxxx.xxxx.xxxx.xxxx
> node64-19.xxxx.xxxx.xxxx.xxxx
>
> node64-test ~>printenv | grep SLURM
> SLURM_NODELIST=node64-[17-20]
> SLURM_NNODES=4
> SLURM_JOBID=827
> SLURM_TASKS_PER_NODE=2(x3),1
> SLURM_JOB_ID=827
> SLURM_NPROCS=7
> SLURM_JOB_NODELIST=node64-[17-20]
> SLURM_JOB_CPUS_PER_NODE=2(x4)
> SLURM_JOB_NUM_NODES=4
>
> Thanks again for your time.
> Matt
>
>> Very interesting! I see the problem - we have never encountered
>> SLURM_TASKS_PER_NODE in that format, while SLURM_JOB_CPUS_PER_NODE
>> indicates that we have indeed been allocated two processors on each
>> of the nodes! So when you just run mpirun without specifying the
>> number of processes, we launch 4 processes (2 on each node), since
>> that is what SLURM told us we have been given.
>>
>> Interesting configuration you have there.
>>
>> I can add some logic that tests for internal consistency between the
>> two and compensates for the discrepancy. Can you get a slightly
>> bigger allocation, one that covers several nodes? For example,
>> "salloc -n7"? And then send the output again from
>> "printenv | grep SLURM"?
>>
>> I need to see whether your configuration uses a regex to describe
>> SLURM_TASKS_PER_NODE, and what it looks like.
>>
>> Thanks
>> Ralph
>>
>>
>>
>> On Mon, Aug 24, 2009 at 1:55 PM, <matthew.piehl_at_[hidden]> wrote:
>>
>>> Hello,
>>>
>>> Hopefully the below information will be helpful.
>>>
>>> SLURM Version: 1.3.15
>>>
>>> node64-test ~>salloc -n3
>>> salloc: Granted job allocation 826
>>>
>>> node64-test ~>srun hostname
>>> node64-24.xxxx.xxxx.xxxx.xxxx
>>> node64-25.xxxx.xxxx.xxxx.xxxx
>>> node64-24.xxxx.xxxx.xxxx.xxxx
>>>
>>> node64-test ~>printenv | grep SLURM
>>> SLURM_NODELIST=node64-[24-25]
>>> SLURM_NNODES=2
>>> SLURM_JOBID=826
>>> SLURM_TASKS_PER_NODE=2,1
>>> SLURM_JOB_ID=826
>>> SLURM_NPROCS=3
>>> SLURM_JOB_NODELIST=node64-[24-25]
>>> SLURM_JOB_CPUS_PER_NODE=2(x2)
>>> SLURM_JOB_NUM_NODES=2
>>>
>>> node64-test ~>mpirun --display-allocation hostname
>>>
>>> ====================== ALLOCATED NODES ======================
>>>
>>> Data for node: Name: node64-test.xxxx.xxxx.xxxx.xxxx Num slots: 0 Max slots: 0
>>> Data for node: Name: node64-24 Num slots: 2 Max slots: 0
>>> Data for node: Name: node64-25 Num slots: 2 Max slots: 0
>>>
>>> =================================================================
>>> node64-24.xxxx.xxxx.xxxx.xxxx
>>> node64-24.xxxx.xxxx.xxxx.xxxx
>>> node64-25.xxxx.xxxx.xxxx.xxxx
>>> node64-25.xxxx.xxxx.xxxx.xxxx
>>>
>>>
>>> Thanks,
>>> Matt
>>>
>>>> Haven't seen that before on any of our machines.
>>>>
>>>> Could you do "printenv | grep SLURM" after the salloc and send the
>>>> results?
>>>>
>>>> What version of SLURM is this?
>>>>
>>>> Please run "mpirun --display-allocation hostname" and send the
>>>> results.
>>>>
>>>> Thanks
>>>> Ralph
>>>>
>>>> On Mon, Aug 24, 2009 at 11:30 AM, <matthew.piehl_at_[hidden]> wrote:
>>>>
>>>>> Hello,
>>>>>
>>>>> I seem to have run into an interesting problem with openMPI.
>>>>> After allocating 3 processors and confirming that the 3
>>>>> processors are allocated, mpirun on a simple mpitest program runs
>>>>> on 4 processors. We have 2 processors per node. I can repeat this
>>>>> case with any odd number of nodes; openMPI seems to take any
>>>>> remaining processors on the box. We are running openMPI v1.3.3.
>>>>> Here is an example of what happens:
>>>>>
>>>>> node64-test ~>salloc -n3
>>>>> salloc: Granted job allocation 825
>>>>>
>>>>> node64-test ~>srun hostname
>>>>> node64-28.xxxx.xxxx.xxxx.xxxx
>>>>> node64-28.xxxx.xxxx.xxxx.xxxx
>>>>> node64-29.xxxx.xxxx.xxxx.xxxx
>>>>>
>>>>> node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
>>>>> MPI domain size: 4
>>>>> I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
>>>>> I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
>>>>> I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
>>>>> I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
>>>>>
>>>>>
>>>>>
>>>>> For those who may be curious here is the program:
>>>>>
>>>>> #include <stdio.h>
>>>>> #include <stdlib.h>
>>>>> #include <mpi.h>
>>>>>
>>>>> int main(int argc, char *argv[])
>>>>> {
>>>>>     int rank, size, namelen;
>>>>>     static char processor_name[MPI_MAX_PROCESSOR_NAME];
>>>>>
>>>>>     MPI_Init(&argc, &argv);
>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>
>>>>>     /* every rank reports its host; rank 0 also reports the size */
>>>>>     MPI_Get_processor_name(processor_name, &namelen);
>>>>>     fprintf(stdout, "My name is: %s\n", processor_name);
>>>>>     if (rank == 0)
>>>>>         fprintf(stdout, "Cluster size is: %d\n", size);
>>>>>
>>>>>     MPI_Finalize();
>>>>>     return 0;
>>>>> }
>>>>>
>>>>>
>>>>> I'm curious if this is a bug in the way openMPI interprets SLURM
>>>>> environment variables. If you have any ideas or need any more
>>>>> information
>>>>> let me know.
>>>>>
>>>>>
>>>>> Thanks.
>>>>> Matt
>>>>>
>>>>> _______________________________________________
>>>>> users mailing list
>>>>> users_at_[hidden]
>>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>>>>
>>>
>>>
>
>