
Subject: Re: [OMPI users] Dual quad core Opteron hangs on Bcast.
From: Matthew MacManes (macmanes_at_[hidden])
Date: 2010-01-04 14:26:30


Also, you can use -mca btl ^sm, which disables the shared-memory (sm) transport; at least for me, that actually gives better performance than increasing the number of FIFOs.
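
For example, something along these lines (a sketch; "hostfile" and "bcast_example" are from Louis's report below, and btl_sm_num_fifos is, as far as I recall, the shared-memory FIFO parameter that "increasing fifos" refers to):

  mpiexec -machinefile hostfile -np 5 --mca btl ^sm bcast_example
  mpiexec -machinefile hostfile -np 5 --mca btl_sm_num_fifos 7 bcast_example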

Matt

On Jan 3, 2010, at 10:04 PM, Louis Rossi wrote:

> I am having a problem with MPI_Bcast hanging on a dual quad-core Opteron system (2382, 2.6 GHz, 4 x 512 KB L2, 6 MB L3 cache) running FC11 with openmpi-1.4. The LD_LIBRARY_PATH and PATH variables are set correctly. I have tried both the FC11 RPM distribution of Open MPI and a locally built openmpi-1.4, with the same results. The problem was first observed in a larger, reliable CFD code, but I can reproduce it with a simple demo code (attached). The code attempts to execute 2000 pairs of broadcasts.
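>
> (The attachment is not reproduced in this archive. A minimal sketch of that kind of test follows; the buffer length, datatype, and progress printout are assumptions, not necessarily what the attached bcast_example.c does.)
>
>   #include <mpi.h>
>   #include <stdio.h>
>
>   #define NPAIRS 2000   /* 2000 pairs of broadcasts, as described above */
>   #define BUFLEN 1024   /* buffer length is an assumption */
>
>   int main(int argc, char **argv)
>   {
>       int rank, i;
>       double buf[BUFLEN];
>
>       MPI_Init(&argc, &argv);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>       for (i = 0; i < NPAIRS; i++) {
>           if (rank == 0)
>               buf[0] = (double)i;   /* root refreshes the payload each iteration */
>           MPI_Bcast(buf, BUFLEN, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* first broadcast of the pair */
>           MPI_Bcast(buf, BUFLEN, MPI_DOUBLE, 0, MPI_COMM_WORLD);   /* second broadcast of the pair */
>           if (rank == 0 && i % 100 == 0)
>               printf("completed pair %d of %d\n", i, NPAIRS);
>       }
>
>       MPI_Finalize();
>       return 0;
>   }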
>
> The hostfile contains a single line
> <machinename> slots=8
>
> If I run it with 4 cores or fewer, the code will run fine.
>
> If I run it with 5 cores or more, it will hang some of the time, after successfully executing several hundred broadcasts; the number varies from run to run. The code usually finishes with 5 cores, and the probability of hanging seems to increase with the number of processes. The syntax I use is simple.
>
> mpiexec -machinefile hostfile -np 5 bcast_example
>
> There was some discussion of a similar problem on the users list, but I could not find a resolution. I have tried setting processor affinity (--mca mpi_paffinity_alone 1), varying the broadcast algorithm (--mca coll_tuned_bcast_algorithm 1-6), and excluding (-mca oob_tcp_if_exclude) my eth1 interface, which is not connected to anything (see the attached ifconfig.txt). None of these changed the outcome.
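>
> Concretely, those runs look something like the following (a sketch; note that coll_tuned_bcast_algorithm is, as far as I know, only honored when coll_tuned_use_dynamic_rules is also set):
>
>   mpiexec -machinefile hostfile -np 5 --mca mpi_paffinity_alone 1 bcast_example
>   mpiexec -machinefile hostfile -np 5 --mca coll_tuned_use_dynamic_rules 1 --mca coll_tuned_bcast_algorithm 3 bcast_example
>   mpiexec -machinefile hostfile -np 5 --mca oob_tcp_if_exclude eth1 bcast_example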
>
> Any thoughts or suggestions would be appreciated.
>
> --
> "Through nonaction, no action is left undone." --Lao Tzu
>
> Louis F. Rossi rossi_at_[hidden]
> Department of Mathematical Sciences http://www.math.udel.edu/~rossi
> University of Delaware (302) 831-1880 (voice)
> Newark, DE 19716 (302) 831-4511 (fax)
>
> Attachments: bcast_example.c.gz, ompi_info.txt.gz, ifconfig.txt.gz

_________________________________
Matthew MacManes
PhD Candidate
University of California, Berkeley
Museum of Vertebrate Zoology
Phone: 510-495-5833
Lab Website: http://ib.berkeley.edu/labs/lacey
Personal Website: http://macmanes.com/