
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] Bug? openMPI interpretation of SLURM environment variables
From: matthew.piehl_at_[hidden]
Date: 2009-08-24 15:55:34


Hello,

Hopefully the below information will be helpful.

SLURM Version: 1.3.15

node64-test ~>salloc -n3
salloc: Granted job allocation 826

node64-test ~>srun hostname
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx

node64-test ~>printenv | grep SLURM
SLURM_NODELIST=node64-[24-25]
SLURM_NNODES=2
SLURM_JOBID=826
SLURM_TASKS_PER_NODE=2,1
SLURM_JOB_ID=826
SLURM_NPROCS=3
SLURM_JOB_NODELIST=node64-[24-25]
SLURM_JOB_CPUS_PER_NODE=2(x2)
SLURM_JOB_NUM_NODES=2

node64-test ~>mpirun --display-allocation hostname

====================== ALLOCATED NODES ======================

 Data for node: Name: node64-test.xxxx.xxxx.xxxx.xxxx Num slots: 0 Max slots: 0
 Data for node: Name: node64-24 Num slots: 2 Max slots: 0
 Data for node: Name: node64-25 Num slots: 2 Max slots: 0

=================================================================
node64-24.xxxx.xxxx.xxxx.xxxx
node64-24.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx
node64-25.xxxx.xxxx.xxxx.xxxx

Thanks,
Matt

> Haven't seen that before on any of our machines.
>
> Could you do "printenv | grep SLURM" after the salloc and send the
> results?
>
> What version of SLURM is this?
>
> Please run "mpirun --display-allocation hostname" and send the results.
>
> Thanks
> Ralph
>
> On Mon, Aug 24, 2009 at 11:30 AM, <matthew.piehl_at_[hidden]> wrote:
>
>> Hello,
>>
>> I seem to have run into an interesting problem with Open MPI. After
>> allocating 3 processors and confirming that all 3 are allocated,
>> mpirun on a simple MPI test program runs on 4 processors. We have 2
>> processors per node. I can reproduce this with any odd number of
>> processors; Open MPI seems to take any remaining processors on the
>> box. We are running Open MPI v1.3.3. Here is an example of what happens:
>>
>> node64-test ~>salloc -n3
>> salloc: Granted job allocation 825
>>
>> node64-test ~>srun hostname
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-28.xxxx.xxxx.xxxx.xxxx
>> node64-29.xxxx.xxxx.xxxx.xxxx
>>
>> node64-test ~>MX_RCACHE=0 LD_LIBRARY_PATH="/hurd/mpi/openmpi/lib:/usr/local/mx/lib" mpirun mpi_pgms/mpitest
>> MPI domain size: 4
>> I am rank 000 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 003 - node64-29.xxxx.xxxx.xxxx.xxxx
>> I am rank 001 - node64-28.xxxx.xxxx.xxxx.xxxx
>> I am rank 002 - node64-29.xxxx.xxxx.xxxx.xxxx
>>
>>
>>
>> For those who may be curious here is the program:
>>
>> #include <stdio.h>
>> #include <stdlib.h>
>> #include <mpi.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     int rank, size, namelen;
>>     char processor_name[MPI_MAX_PROCESSOR_NAME];
>>
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>
>>     /* Every rank reports its host; rank 0 also reports the world size. */
>>     MPI_Get_processor_name(processor_name, &namelen);
>>     fprintf(stdout, "My name is: %s\n", processor_name);
>>     if (rank == 0)
>>         fprintf(stdout, "Cluster size is: %d\n", size);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
>>
>>
>> I'm curious whether this is a bug in the way Open MPI interprets SLURM
>> environment variables. If you have any ideas or need any more
>> information, let me know.
>>
>>
>> Thanks.
>> Matt
>>
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>>