Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10
From: Seyyed Mohtadin Hashemi (haadah_at_[hidden])
Date: 2012-04-16 04:14:02


I tried with both MaxSessions and MaxStartups set to 200; unfortunately,
it did not help. I still got the same errors as before.
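
Before ruling these settings out, it may be worth confirming that the values are really in the file sshd reads on every compute node. The following is a minimal sketch, not an OpenSSH tool: `sshd_setting` is a made-up helper name, and it assumes the plain "Keyword value" form and ignores Match blocks. Because sshd uses the first value it finds for a keyword, the helper prints only the first match.

```shell
# Print the first value set for keyword $1 in the sshd config file $2
# (default: /etc/ssh/sshd_config). Hypothetical helper for illustration;
# assumes whitespace-separated "Keyword value" lines, no Match blocks.
sshd_setting() {
    awk -v key="$1" '
        /^[[:space:]]*#/ { next }                       # skip comment lines
        tolower($1) == tolower(key) { print $2; exit }  # keywords are case-insensitive
    ' "${2:-/etc/ssh/sshd_config}"
}
```

Running `sshd_setting MaxSessions` and `sshd_setting MaxStartups` on each node should print the configured values. The daemon only picks up changes after it is restarted or reloaded; with root access, `sshd -T` prints the effective configuration.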

> Date: Sat, 14 Apr 2012 12:58:49 -0400
> From: Tim Miller <btamiller_at_[hidden]>
> Subject: Re: [OMPI users] OpenMPI fails to run with -np larger than 10
> To: Open MPI Users <users_at_[hidden]>
> Message-ID: <CAMsSzSBxTv4u1MLE=ZGMc73N2+k6fkj3-KP_PQB2H=P+YvzTkg_at_[hidden]>
> Content-Type: text/plain; charset=windows-1252

> This may or may not be related, but I've had similar issues on RHEL
> 6.x and clones when using the SSH job launcher and running more than
> 10 processes per node. It sounds like you're only distributing 6
> processes per node, so it doesn't sound like your problem, but you
> might want to check your hostfile and make sure you're not
> oversubscribing one of the nodes. The trick I've found to launch > 10
> processes per node via SSH is to set MaxSessions to some number higher
> than 10 in /etc/ssh/sshd_config (I choose 100, somewhat arbitrarily).

> Assuming you're using the SSH launcher on an RHEL 6 derivative, you
> might give this a try. It's an SSH issue, not an OpenMPI one.

> Regards,
> Tim

> On Thu, Apr 12, 2012 at 9:04 AM, Seyyed Mohtadin Hashemi
> <haadah_at_[hidden]> wrote:
> > Hello,
> >
> > I have a very peculiar problem: I have a micro cluster with three nodes
> > (18 cores total); the nodes are clones of each other, connected to a
> > frontend via Ethernet, with Debian squeeze as the OS on all nodes. When
> > I run parallel jobs I can use up to "-np 10"; if I go further the job
> > crashes. I have primarily done tests with GROMACS (because that is what
> > I will be running) but have also used OSU Micro-Benchmarks 3.5.2.
> >
> > For a simple parallel job I use: "path/mpirun --hostfile path/hostfile
> > -np XX -d --display-map path/mdrun_mpi -s path/topol.tpr -o
> > path/output.trr"
> >
> > (path is global) For -np XX less than or equal to 10 it works; however,
> > as soon as I use 11 or more, the whole thing crashes. The terminal
> > dump is attached to this mail: when_working.txt is for "-np 10",
> > when_crash.txt is for "-np 12", and OpenMPI_info.txt is the output from
> > "path/mpirun --bynode --hostfile path/hostfile --tag-output ompi_info -v
> > ompi full --parsable".
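
One way to separate a launcher problem from a GROMACS problem is to repeat the launch with a trivial program such as `hostname` and step the process count across the boundary. The sketch below assumes that approach. `first_failing_np` is a made-up helper, and the launcher is passed as a parameter so it can point at path/mpirun; only the `--hostfile` and `-np` options already used in this thread appear.

```shell
# Print the smallest process count in [lo, hi] for which the launch fails.
# $1 = launcher command (e.g. path/mpirun), $2 = hostfile, $3 = lo, $4 = hi
# Hypothetical helper; runs "hostname" so no application code is involved.
first_failing_np() {
    launcher=$1; hostfile=$2; np=$3; hi=$4
    while [ "$np" -le "$hi" ]; do
        if ! "$launcher" --hostfile "$hostfile" -np "$np" hostname >/dev/null 2>&1; then
            echo "$np"
            return 0
        fi
        np=$((np + 1))
    done
    return 1   # every count in the range launched successfully
}
```

For example, `first_failing_np path/mpirun path/hostfile 8 14` would print the first count that fails. If that failure happens with `hostname` as well, the application itself is unlikely to be the cause.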
> >
> > I have tried OpenMPI v1.4.2 all the way up to beta v1.5.5, and all yield
> > the same result.
> >
> > The output files are from a new install I did today: I formatted all
> > nodes, started from a fresh minimal install of Squeeze, and used
> > "apt-get install gromacs gromacs-openmpi" to install everything,
> > including all dependencies. Then I ran two jobs using the parameters
> > described above. I also ran one with the OSU benchmarks (data is not
> > included); it also crashed with "-np" larger than 10.
> >
> > I hope somebody can help figure out what is wrong and how I can fix it.
> >
> > Best regards,
> > Mohtadin