Open MPI User's Mailing List Archives

This page is part of a frozen web archive of this mailing list.

You can still navigate around this archive, but know that no new mails have been added to it since July of 2016.

Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking
From: adrian sabou (adrian.sabou_at_[hidden])
Date: 2012-02-02 04:49:56


Hi,

The only example that works is hello_c.c. The others, which use MPI_Send and MPI_Recv (connectivity_c.c and ring_c.c), block after the first MPI_Send / MPI_Recv: the first Send/Receive pair works well for all processes, but subsequent Send/Receive pairs block.

My SLURM version is 2.1.0.

It is also worth mentioning that all examples work when not using SLURM (launching with "mpirun -np 5 <example_app>"). Blocking occurs only when I try to run on multiple hosts with SLURM ("salloc -N5 mpirun <example_app>").

Adrian

________________________________
From: Jeff Squyres <jsquyres_at_[hidden]>
To: adrian sabou <adrian.sabou_at_[hidden]>; Open MPI Users <users_at_[hidden]>
Sent: Wednesday, February 1, 2012 10:32 PM
Subject: Re: [OMPI users] OpenMPI / SLURM -> Send/Recv blocking

On Jan 31, 2012, at 11:16 AM, adrian sabou wrote:

> Like I said, a very simple program.
> When launching this application with SLURM (using "salloc -N2 mpirun ./<my_app>"), it hangs at the barrier.

Are you able to run the MPI example programs in examples/ ?

> However, it passes the barrier if I launch it without SLURM (using "mpirun -np 2 ./<my_app>"). I first noticed this problem when my application hung if I tried to send two successive messages from one process to another. Only the first MPI_Send would work. The second MPI_Send would block indefinitely. I was wondering whether any of you have encountered a similar problem, or may have an idea as to what is causing the Send/Receive pair to block when using SLURM. The exact output in my console is as follows:
>
>        salloc: Granted job allocation 1138
>        Process 0 - Sending...
>        Process 1 - Receiving...
>        Process 1 - Received.
>        Process 1 - Barrier reached.
>        Process 0 - Sent.
>        Process 0 - Barrier reached.
>        (it just hangs here)
>
> I am new to MPI programming and to OpenMPI and would greatly appreciate any help. My OpenMPI version is 1.4.4 (although I have also tried it on 1.5.4), my SLURM version is 0.3.3-1 (slurm-llnl 2.1.0-1),

I'm not sure what SLURM version that is -- my "srun --version" shows 2.2.4.  0.3.3 would be pretty ancient, no?

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to:
http://www.cisco.com/web/about/doing_business/legal/cri/