Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpirun only works when -np <4
From: Matthew MacManes (macmanes_at_[hidden])
Date: 2009-12-09 13:20:11

Thanks, Ashley, I'll try your tool.

I would think that this is an error in the programs I am trying to use, too, but this is a problem with 2 different programs, written by 2 different groups. One of them might be buggy, but both? That seems unlikely.

Interestingly, the connectivity_c test that is included with OMPI works fine with -np <8. For -np >8 it works some of the time; other times it HANGS. I have got to believe that this is a big clue! Also, when it hangs, sometimes I get the message "mpirun was unable to cleanly terminate the daemons on the nodes shown below". Note that NO nodes are shown below. Once, I got -np 250 to pass the connectivity test, but I was not able to replicate this reliably, so I'm not sure if it was a fluke, or what. Here is a link to a screenshot of top when connectivity_c is hung with -np 14. I see that 2 processes are only at 50% CPU usage. Hmmmm

The other tests, ring_c and hello_c, as well as the cxx versions of these guys, work with all values of -np.

Unfortunately, I could not get valgrind to work...
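For what it's worth, a classic way a program can work at small -np but hang at larger -np is an unsafe blocking-send pattern: every rank sends to its peers before it receives anything, so the run only completes while the transport's eager buffers have room. Here is a minimal sketch in plain Python; the bounded queues stand in for eager buffers, and the names, slot counts, and barrier are purely illustrative, not anything OMPI actually does:

```python
# Toy model of "works at small -np, hangs at larger -np".
# Each rank "sends" one message to every other rank before it "receives"
# anything; bounded queues play the role of per-rank eager buffers.
import queue
import threading

def all_to_all_completes(nprocs, eager_slots, timeout=1.0):
    """Return True if every rank finishes, False if the exchange deadlocks."""
    inbox = [queue.Queue(maxsize=eager_slots) for _ in range(nprocs)]
    sent_all = threading.Barrier(nprocs)  # no rank receives until all have sent
    deadlocked = threading.Event()

    def rank(r):
        try:
            for dst in range(nprocs):
                if dst != r:
                    # Blocking "send": succeeds only while buffer space remains.
                    inbox[dst].put(r, timeout=timeout)
            sent_all.wait()
            for _ in range(nprocs - 1):
                inbox[r].get(timeout=timeout)  # "receive" phase
        except (queue.Full, queue.Empty):
            deadlocked.set()
            sent_all.abort()  # release any ranks parked at the barrier
        except threading.BrokenBarrierError:
            pass

    threads = [threading.Thread(target=rank, args=(r,)) for r in range(nprocs)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return not deadlocked.is_set()
```

With eager_slots=2, the exchange completes for nprocs <= 3 but deadlocks for nprocs >= 4, since each inbox must then absorb more messages than it has room for before anyone starts receiving: exactly the works-below-N, hangs-above-N symptom.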

Thanks, Matt

On Dec 9, 2009, at 2:37 AM, Ashley Pittman wrote:

> On Tue, 2009-12-08 at 08:30 -0800, Matthew MacManes wrote:
>> There are 8 physical cores, or 16 with hyperthreading enabled.
> That should be meaty enough.
>> 1st of all, let me say that when I specify that -np is less than 4
>> processors (1, 2, or 3), both programs seem to work as expected. Also,
>> the non-mpi version of each of them works fine.
> Presumably the non-mpi version is serial however? That doesn't mean
> the program is bug-free or that the parallel version isn't broken.
> There are any number of apps that don't work above N processes; in fact
> probably all programs break for some value of N, it's normally a little
> higher than 3 however.
>> Thus, I am pretty sure that this is a problem with MPI rather than
>> with the program code or something else.
>> What happens is simply that the program hangs..
> I presume you mean here the output stops? The program continues to use
> CPU cycles but no longer appears to make any progress?
> I'm of the opinion that this is most likely an error in your program; I
> would start by using either valgrind or padb.
> You can run the app under valgrind using the following mpirun options;
> this will give you four files named v.log.0 to v.log.3 which you can
> check for errors in the normal way. The "--mca btl tcp,self" option
> will disable shared memory, which can create false positives.
> mpirun -n 4 --mca btl tcp,self valgrind --log-file=v.log.%
> Alternatively you can run the application, wait for it to hang and then
> in another window run my tool, padb, which will show you the MPI message
> queues and stack traces which should show you where it's hung,
> instructions and sample output are on this page.
>> There are no error messages, and there is no clue from anything else
>> (system working fine otherwise- no RAM issues, etc). It does not hang
>> at the same place every time: sometimes it hangs in the very beginning,
>> sometimes near the middle.
>> Could this an issue with hyperthreading? A conflict with something?
> Unlikely; if there were a problem in OMPI running more than 3 processes
> it would have been found by now. I regularly run 8-process applications
> on my dual-core netbook alongside all my desktop processes without
> issue; it runs fine, a little slowly, but fine.
> All this talk about binding and affinity won't help either, process
> binding is about squeezing the last 15% of performance out of a system
> and making performance reproducible; it has no bearing on correctness or
> scalability. If you're not running on a dedicated machine (which, with
> firefox running, I guess you aren't), then there would be a good case
> for leaving it off anyway.
> Ashley,
> --
> Ashley Pittman, Bath, UK.
> Padb - A parallel job inspection tool for cluster computing
> _______________________________________________
> users mailing list
> users_at_[hidden]

Matthew MacManes
PhD Candidate
University of California- Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website:
Personal Website: