Dear Paul,

I tried the 'mpirun -np N <cmd>' form you mentioned, but I get the same problem.
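
For reference, the command I ran was of this form (the executable name here is only a placeholder, not the real one):

    mpirun -np 4 my_program.exe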

I guess it may be related to the system I am using, because the same setup worked correctly on another 32-bit Windows XP system.

I look forward to more advice. Thanks.

Zhangping


From: "users-request@open-mpi.org" <users-request@open-mpi.org>
To: users@open-mpi.org
Sent: Thursday, May 19, 2011, 11:00:02 AM
Subject: users Digest, Vol 1910, Issue 2

Send users mailing list submissions to
    users@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
    http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
    users-request@open-mpi.org

You can reach the person managing the list at
    users-owner@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."


Today's Topics:

  1. Re: Error: Entry Point Not Found (Paul van der Walt)
  2. Re: Openib with > 32 cores per node (Robert Horton)
  3. Re: Openib with > 32 cores per node (Samuel K. Gutierrez)


----------------------------------------------------------------------

Message: 1
Date: Thu, 19 May 2011 16:14:02 +0100
From: Paul van der Walt <paul@denknerd.nl>
Subject: Re: [OMPI users] Error: Entry Point Not Found
To: Open MPI Users <users@open-mpi.org>
Message-ID: <BANLkTinjZ0CNtchQJCZYhfGSnR51jPuP7w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,

On 19 May 2011 15:54, Zhangping Wei <zhangping_wei@yahoo.com> wrote:
> 4, I use command window to run it in this way: 'mpirun -n 4 **.exe', then I

Probably not the problem, but shouldn't that be 'mpirun -np N <cmd>' ?
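
For example, to launch four processes, an invocation along these lines should work (the program name is just a placeholder):

    mpirun -np 4 my_program.exe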

Paul

--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org



------------------------------

Message: 2
Date: Thu, 19 May 2011 16:37:56 +0100
From: Robert Horton <r.horton@qmul.ac.uk>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users@open-mpi.org>
Message-ID: <1305819476.9663.148.camel@moelwyn>
Content-Type: text/plain; charset="UTF-8"

On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> Hi,
>
> Try the following QP parameters that only use shared receive queues.
>
> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>

Thanks for that. If I run the job over 2 x 48 cores it now works and the
performance seems reasonable (I need to do some more tuning) but when I
go up to 4 x 48 cores I'm getting the same problem:

[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
[compute-1-7.local:18106] *** An error occurred in MPI_Isend
[compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
[compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
[compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)

Any thoughts?

Thanks,
Rob
--
Robert Horton
System Administrator (Research Support) - School of Mathematical Sciences
Queen Mary, University of London
r.horton@qmul.ac.uk  -  +44 (0) 20 7882 7345



------------------------------

Message: 3
Date: Thu, 19 May 2011 09:59:13 -0600
From: "Samuel K. Gutierrez" <samuel@lanl.gov>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users@open-mpi.org>
Message-ID: <B3E83138-9AF0-48C0-871C-DBBB2E712E12@lanl.gov>
Content-Type: text/plain; charset=us-ascii

Hi,

On May 19, 2011, at 9:37 AM, Robert Horton wrote:

> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
>> Hi,
>>
>> Try the following QP parameters that only use shared receive queues.
>>
>> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>>
>
> Thanks for that. If I run the job over 2 x 48 cores it now works and the
> performance seems reasonable (I need to do some more tuning) but when I
> go up to 4 x 48 cores I'm getting the same problem:
>
> [compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> Any thoughts?

How much memory does each node have?  Does this happen at startup?
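
For example, a quick sanity check on one of the compute nodes (plain Linux commands, nothing Open MPI specific):

    free -g     # total/free memory on the node
    ulimit -l   # max locked memory; registered QP buffers count against this limit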

Try adding:

-mca btl_openib_cpc_include rdmacm

I'm not sure whether your version of OFED supports this feature, but using XRC may help.  I **think** other tweaks are needed to get this going, but I'm not familiar with the details.
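
For what it's worth, a complete invocation combining the receive-queue setting from earlier with rdmacm might look roughly like this (the process count and executable name are placeholders):

    mpirun -np 192 \
        -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32 \
        -mca btl_openib_cpc_include rdmacm \
        ./your_app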

Hope that helps,

Samuel K. Gutierrez
Los Alamos National Laboratory


>
> Thanks,
> Rob
> --
> Robert Horton
> System Administrator (Research Support) - School of Mathematical Sciences
> Queen Mary, University of London
> r.horton@qmul.ac.uk  -  +44 (0) 20 7882 7345
>
> _______________________________________________
> users mailing list
> users@open-mpi.org
> http://www.open-mpi.org/mailman/listinfo.cgi/users






------------------------------

_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 1910, Issue 2
**************************************