From: "users-request@open-mpi.org" <users-request@open-mpi.org>
To: users@open-mpi.org
Sent: 2011/5/19 (Thu) 11:00:02 AM
Subject: users Digest, Vol 1910, Issue 2
Send users mailing list submissions to
	users@open-mpi.org

To subscribe or unsubscribe via the World Wide Web, visit
	http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
	users-request@open-mpi.org

You can reach the person managing the list at
	users-owner@open-mpi.org

When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."
Today's Topics:
1. Re: Error: Entry Point Not Found (Paul van der Walt)
2. Re: Openib with > 32 cores per node (Robert Horton)
3. Re: Openib with > 32 cores per node (Samuel K. Gutierrez)
----------------------------------------------------------------------
Message: 1
Date: Thu, 19 May 2011 16:14:02 +0100
From: Paul van der Walt <paul@denknerd.nl>
Subject: Re: [OMPI users] Error: Entry Point Not Found
To: Open MPI Users <users@open-mpi.org>
Message-ID: <BANLkTinjZ0CNtchQJCZYhfGSnR51jPuP7w@mail.gmail.com>
Content-Type: text/plain; charset=UTF-8

Hi,
On 19 May 2011 15:54, Zhangping Wei <zhangping_wei@yahoo.com> wrote:
> 4, I use command window to run it in this way: "mpirun -n 4 **.exe", then I

Probably not the problem, but shouldn't that be 'mpirun -np N <cmd>'?
Paul
--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org

------------------------------
Message: 2
Date: Thu, 19 May 2011 16:37:56 +0100
From: Robert Horton <r.horton@qmul.ac.uk>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users@open-mpi.org>
Message-ID: <1305819476.9663.148.camel@moelwyn>
Content-Type: text/plain; charset="UTF-8"
On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> Hi,
>
> Try the following QP parameters that only use shared receive queues.
>
> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>
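A small sketch that splits the suggested value into its per-queue fields, to make the colon/comma layout easier to read. The field names are an assumption based on the Open MPI FAQ's description of shared receive queues (S,<size>,<num_buffers>,<low_watermark>,<max_pending_sends>); check `ompi_info --param btl openib` for the exact meaning in your version.

```shell
#!/bin/sh
# Print the fields of each queue in a btl_openib_receive_queues value.
# ASSUMED SRQ field order (verify with ompi_info for your Open MPI version):
#   S,<size>,<num_buffers>,<low_watermark>,<max_pending_sends>
show_queues() {
    # Queues are separated by ':', fields within a queue by ','.
    printf '%s\n' "$1" | tr ':' '\n' |
    while IFS=, read -r type size nbuf low maxsend; do
        echo "type=$type size=${size}B buffers=$nbuf low_watermark=$low max_pending_sends=$maxsend"
    done
}

show_queues "S,12288,128,64,32:S,65536,128,64,32"
```

With this value both queues are shared ("S"), one for messages up to 12288 bytes and one for messages up to 65536 bytes.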
Thanks for that. If I run the job over 2 x 48 cores it now works and the
performance seems reasonable (I need to do some more tuning) but when I
go up to 4 x 48 cores I'm getting the same problem:
[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
[compute-1-7.local:18106] *** An error occurred in MPI_Isend
[compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
[compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
[compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
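A rough estimate of why doubling the node count can exhaust queue-pair (QP) resources. It ASSUMES the openib BTL opens one QP per configured receive queue for every connected off-node peer, and that every process eventually talks to every off-node process; neither assumption is stated in the thread. On-node peers are assumed to use shared memory instead of openib, so they are not counted.

```shell
#!/bin/sh
# Rough per-node QP estimate for the openib BTL, ASSUMING:
#   - one QP per receive queue per connected remote peer, and
#   - all-to-all communication between off-node processes.
# Arguments: number of nodes, processes per node, number of receive queues.
qps_per_node() {
    nodes=$1; ppn=$2; queues=$3
    remote_peers=$(( (nodes - 1) * ppn ))    # peers on other nodes
    echo $(( ppn * remote_peers * queues ))  # summed over local processes
}

echo "2 x 48 cores, 2 queues: $(qps_per_node 2 48 2) QPs per node"
echo "4 x 48 cores, 2 queues: $(qps_per_node 4 48 2) QPs per node"
```

Under these assumptions the count triples, from 4608 to 13824 QPs per node, which fits the pattern of the 2 x 48 run succeeding and the 4 x 48 run failing in qp_create_one.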
Any thoughts?
Thanks,
Rob
--
Robert Horton
System Administrator (Research Support) - School of Mathematical Sciences
Queen Mary, University of London
r.horton@qmul.ac.uk - +44 (0) 20 7882 7345
------------------------------
Message: 3
Date: Thu, 19 May 2011 09:59:13 -0600
From: "Samuel K. Gutierrez" <samuel@lanl.gov>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users@open-mpi.org>
Message-ID: <B3E83138-9AF0-48C0-871C-DBBB2E712E12@lanl.gov>
Content-Type: text/plain; charset=us-ascii
Hi,
On May 19, 2011, at 9:37 AM, Robert Horton wrote:
> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
>> Hi,
>>
>> Try the following QP parameters that only use shared receive queues.
>>
>> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>>
>
> Thanks for that. If I run the job over 2 x 48 cores it now works and the
> performance seems reasonable (I need to do some more tuning) but when I
> go up to 4 x 48 cores I'm getting the same problem:
>
> [compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one] error creating qp errno says Cannot allocate memory
>
> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> Any thoughts?
How much memory does each node have? Does this happen at startup?
Try adding:
-mca btl_openib_cpc_include rdmacm
I'm not sure whether your version of OFED supports it, but using XRC may help. I **think** other tweaks are needed to get it working, but I'm not familiar with the details.
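A minimal wrapper sketch that combines the options suggested in this thread into one mpirun command line, using '-np' as in the earlier thread. NP, APP, DRY_RUN and the program name ./my_app are hypothetical; 192 processes matches the 4 x 48 core run. By default the script only prints the command; set DRY_RUN=0 to execute it.

```shell
#!/bin/sh
# Hypothetical wrapper: NP, APP and DRY_RUN are illustrative names,
# not Open MPI settings.
NP=${NP:-192}
APP=${APP:-./my_app}

build_cmd() {
    # Shared receive queues only (first suggestion) plus the
    # RDMA CM connection manager (second suggestion).
    echo "mpirun -np $NP" \
         "-mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32" \
         "-mca btl_openib_cpc_include rdmacm" \
         "$APP"
}

if [ "${DRY_RUN:-1}" = 1 ]; then
    build_cmd
else
    # Unquoted on purpose: none of the arguments contain spaces.
    $(build_cmd)
fi
```

Printing first makes it easy to compare the exact flags between the 2-node and 4-node runs before launching.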
Hope that helps,
Samuel K. Gutierrez
Los Alamos National Laboratory
------------------------------
_______________________________________________
users mailing list
users@open-mpi.org
http://www.open-mpi.org/mailman/listinfo.cgi/users

End of users Digest, Vol 1910, Issue 2
**************************************