Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
From: adrian sabou (adrian.sabou_at_[hidden])
Date: 2012-02-02 04:49:56


Hi,

The only example that works is hello_c.c. All the others that use MPI_Send and MPI_Recv (connectivity_c.c and ring_c.c) block after the first MPI_Send / MPI_Recv: the first Send/Receive pair works well for all processes, but subsequent Send/Receive pairs block.

My SLURM version is 2.1.0. It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 <example_app>"). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun <example_app>").

Adrian

________________________________
From: Jeff Squyres <jsquyres_at_[hidden]>
To: adrian sabou <adrian.sabou_at_[hidden]>; Open MPI Users <users_at_[hidden]>
Sent: Wednesday, February 1, 2012 10:32 PM
Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking

On Jan 31, 2012, at 11:16 AM, adrian sabou wrote:

> Like I said, a very simple program.
> When launching this application with SLURM (using "salloc -N2 mpirun ./<my_app>"), it hangs at the barrier.

Are you able to run the MPI example programs in examples/ ?

> However, it passes the barrier if I launch it without SLURM (using "mpirun -np 2 ./<my_app>"). I first noticed this problem when my application hung if I tried to send two successive messages from one process to another. Only the first MPI_Send would work. The second MPI_Send would block indefinitely. I was wondering whether any of you have encountered a similar problem, or may have an idea as to what is causing the Send/Receive pair to block when using SLURM. The exact output in my console is as follows:
>
>        salloc: Granted job allocation 1138
>        Process 0 - Sending...
>        Process 1 - Receiving...
>        Process 1 - Received.
>        Process 1 - Barrier reached.
>        Process 0 - Sent.
>        Process 0 - Barrier reached.
>        (it just hangs here)
>
> I am new to MPI programming and to OpenMPI and would greatly appreciate any help. My OpenMPI version is 1.4.4 (although I have also tried it on 1.5.4), my SLURM version is 0.3.3-1 (slurm-llnl 2.1.0-1),

I'm not sure what SLURM version that is -- my "srun --version" shows 2.2.4.  0.3.3 would be pretty ancient, no?

--
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/
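
The test program itself is not included in this message. For readers who want to reproduce the report, here is a minimal sketch of what such a program might look like. It is a reconstruction, not the original code: the single-int payload, the tag value, and the final "Barrier passed" line are assumptions, and only the other printed messages are taken from the console output quoted in the thread.

```c
#include <mpi.h>
#include <stdio.h>

int main(int argc, char *argv[])
{
    int rank, size;
    int payload = 42;   /* hypothetical message contents */
    const int tag = 0;  /* hypothetical tag */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* The exchange below needs ranks 0 and 1 to exist. */
    if (size < 2) {
        if (rank == 0)
            fprintf(stderr, "Run with at least 2 processes.\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        printf("Process 0 - Sending...\n");
        MPI_Send(&payload, 1, MPI_INT, 1, tag, MPI_COMM_WORLD);
        printf("Process 0 - Sent.\n");
    } else if (rank == 1) {
        printf("Process 1 - Receiving...\n");
        MPI_Recv(&payload, 1, MPI_INT, 0, tag, MPI_COMM_WORLD,
                 MPI_STATUS_IGNORE);
        printf("Process 1 - Received.\n");
    }

    /* Every rank reports in before the barrier; the reported hang is here. */
    printf("Process %d - Barrier reached.\n", rank);
    fflush(stdout);
    MPI_Barrier(MPI_COMM_WORLD);

    /* Not in the original output: added to make a successful run visible. */
    printf("Process %d - Barrier passed.\n", rank);

    MPI_Finalize();
    return 0;
}
```

Built with mpicc, this can be launched with the two commands from the thread ("mpirun -np 2 ./<my_app>" and "salloc -N2 mpirun ./<my_app>") to compare the behavior with and without SLURM.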