Very interesting! I see the problem - we have never encountered the SLURM_TASKS_PER_NODE in that format, while the SLURM_JOB_CPUS_PER_NODE indicates that we have indeed been allocated two processors on each of the nodes! So when you just do mpirun without specifying the number of processes, we will launch 4 processes (2 on each node) since that is what SLURM told us we have been given.
Interesting configuration you have there.
I can add some logic that tests for internal consistency between the two and compensates for the discrepancy. Can you get a slightly bigger allocation, one that covers several nodes? For example, "salloc -n7"? And then send the output again from "printenv | grep SLURM"?
I need to see whether your configuration uses a regex to describe SLURM_TASKS_PER_NODE, and what it looks like.
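To make the idea concrete, here is a rough sketch of the kind of consistency check described above. It is only an illustration, not the actual Open MPI code: the function names expand_slurm_counts and reconcile_slots, the MAX_NODES cap, and the "take the smaller of the two values per node" rule are all assumptions. It also assumes the only compressed form is "N(xM)", as seen in SLURM_JOB_CPUS_PER_NODE=2(x2).

```c
#include <assert.h>
#include <stdlib.h>

#define MAX_NODES 1024  /* arbitrary cap chosen for this sketch */

/*
 * Expand a SLURM-style count list such as "2(x2),1" into one integer
 * per node: {2, 2, 1}. Returns the number of nodes written, or -1 if
 * the string is malformed or `out` is too small.
 */
int expand_slurm_counts(const char *spec, int *out, int max_nodes)
{
    const char *p = spec;
    int n = 0;

    assert(spec != NULL && out != NULL);

    while (*p != '\0') {
        char *end;
        long count = strtol(p, &end, 10);
        long repeat = 1;

        if (end == p || count < 0) {
            return -1;                  /* no usable count here */
        }
        p = end;

        if (p[0] == '(' && p[1] == 'x') {   /* optional "(xM)" suffix */
            repeat = strtol(p + 2, &end, 10);
            if (end == p + 2 || repeat < 1 || *end != ')') {
                return -1;
            }
            p = end + 1;
        }

        while (repeat-- > 0) {
            if (n >= max_nodes) {
                return -1;              /* caller's buffer is too small */
            }
            out[n++] = (int)count;
        }

        if (*p == ',') {
            p++;
            if (*p == '\0') {
                return -1;              /* trailing comma */
            }
        } else if (*p != '\0') {
            return -1;                  /* unexpected character */
        }
    }
    return n;
}

/*
 * Hypothetical reconciliation rule: each node gets
 * min(tasks, cpus) slots. Fills `slots` and returns the total,
 * or -1 if either string is malformed or the node counts differ.
 */
int reconcile_slots(const char *tasks_spec, const char *cpus_spec,
                    int *slots, int max_nodes)
{
    int cpus[MAX_NODES];
    int ntasks, ncpus, total = 0;

    if (max_nodes > MAX_NODES) {
        max_nodes = MAX_NODES;
    }
    ntasks = expand_slurm_counts(tasks_spec, slots, max_nodes);
    ncpus = expand_slurm_counts(cpus_spec, cpus, max_nodes);
    if (ntasks < 0 || ntasks != ncpus) {
        return -1;
    }

    for (int i = 0; i < ntasks; i++) {
        if (cpus[i] < slots[i]) {
            slots[i] = cpus[i];         /* never exceed the allocated cpus */
        }
        total += slots[i];
    }
    return total;
}
```

In real use the two strings would come from getenv("SLURM_TASKS_PER_NODE") and getenv("SLURM_JOB_CPUS_PER_NODE"). With the values from this thread ("2,1" and "2(x2)"), the sketch yields 2 slots on the first node and 1 on the second, which matches the 3 processes requested by "salloc -n3".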
Thanks
Ralph
Hello,
Hopefully the below information will be helpful.
SLURM Version: 1.3.15
node64-test ~>salloc -n3
salloc: Granted job allocation 826
node64-test ~>srun hostname
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx
node64-test ~>printenv | grep SLURM
SLURM_NODELIST=node64-[24-25]
SLURM_NNODES=2
SLURM_JOBID=826
SLURM_TASKS_PER_NODE=2,1
SLURM_JOB_ID=826
SLURM_NPROCS=3
SLURM_JOB_NODELIST=node64-[24-25]
SLURM_JOB_CPUS_PER_NODE=2(x2)
SLURM_JOB_NUM_NODES=2
node64-test ~>mpirun --display-allocation hostname
====================== ALLOCATED NODES ======================
Data for node: Name: node64-test.xxxx.xxxx.xxxx.xxxx Num slots: 0 Max slots: 0
Data for node: Name: node64-24 Num slots: 2 Max slots: 0
Data for node: Name: node64-25 Num slots: 2 Max slots: 0
=================================================================
node64-24.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
Thanks,
Matt
> Haven't seen that before on any of our machines.
>
> Could you do "printenv | grep SLURM" after the salloc and send the
> results?
>
> What version of SLURM is this?
>
> Please run "mpirun --display-allocation hostname" and send the results.
>
> Thanks
> Ralph
>
> On Mon, Aug 24, 2009 at 11:30 AM, <matthew.piehl@ndsu.edu> wrote:
>
>> Hello,
>>
>> I seem to have run into an interesting problem with openMPI. After
>> allocating 3 processors and confirming that the 3 processors are
>> allocated, mpirun on a simple mpitest program seems to run on 4
>> processors. We have 2 processors per node. I can repeat this case with
>> any odd number of nodes; openMPI seems to take any remaining processors
>> on the box. We are running openMPI v1.3.3. Here is an example of what
>> happens:
>>
>> node64-test ~>salloc -n3
>> salloc: Granted job allocation 825
>>
>> node64-test ~>srun hostname
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-29.xxxx.xxxx.xxxx.xxxx
>>
>> node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
>> MPI domain size: 4
>> I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
>> I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
>>
>>
>>
>> For those who may be curious here is the program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int rank, size, namelen;
>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>     /* Every rank reports its host; rank 0 also reports the job size. */
>>     MPI_Get_processor_name(processor_name, &namelen);
>>     fprintf(stdout, "My name is: %s\n", processor_name);
>>     if (rank == 0)
>>     {
>>         fprintf(stdout, "Cluster size is: %d\n", size);
>>     }
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>> I'm curious if this is a bug in the way openMPI interprets SLURM
>> environment variables. If you have any ideas or need any more
>> information, let me know.
>>
>>
>> Thanks.
>> Matt
>>
>> _______________________________________________
>> users mailing list
>> users@open-mpi.org
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>