Dear Paul,
I tried the 'mpirun -np N <cmd>' form you mentioned, but I get the same
problem.
I guess it may be related to the system I am using, because it worked correctly
on another 32-bit XP system.
I look forward to more advice. Thanks.
Zhangping
________________________________
From: "users-request_at_[hidden]" <users-request_at_[hidden]>
To: users_at_[hidden]
Sent: 2011/5/19 (Thu) 11:00:02 AM
Subject: users Digest, Vol 1910, Issue 2
Send users mailing list submissions to
users_at_[hidden]
To subscribe or unsubscribe via the World Wide Web, visit
http://www.open-mpi.org/mailman/listinfo.cgi/users
or, via email, send a message with subject or body 'help' to
users-request_at_[hidden]
You can reach the person managing the list at
users-owner_at_[hidden]
When replying, please edit your Subject line so it is more specific
than "Re: Contents of users digest..."
Today's Topics:
1. Re: Error: Entry Point Not Found (Paul van der Walt)
2. Re: Openib with > 32 cores per node (Robert Horton)
3. Re: Openib with > 32 cores per node (Samuel K. Gutierrez)
----------------------------------------------------------------------
Message: 1
Date: Thu, 19 May 2011 16:14:02 +0100
From: Paul van der Walt <paul_at_[hidden]>
Subject: Re: [OMPI users] Error: Entry Point Not Found
To: Open MPI Users <users_at_[hidden]>
Message-ID: <BANLkTinjZ0CNtchQJCZYhfGSnR51jPuP7w_at_[hidden]>
Content-Type: text/plain; charset=UTF-8
Hi,
On 19 May 2011 15:54, Zhangping Wei <zhangping_wei_at_[hidden]> wrote:
> 4, I use command window to run it in this way: "mpirun -n 4 **.exe", then I
Probably not the problem, but shouldn't that be 'mpirun -np N <cmd>' ?
Paul
--
O< ascii ribbon campaign - stop html mail - www.asciiribbon.org
------------------------------
Message: 2
Date: Thu, 19 May 2011 16:37:56 +0100
From: Robert Horton <r.horton_at_[hidden]>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users_at_[hidden]>
Message-ID: <1305819476.9663.148.camel_at_moelwyn>
Content-Type: text/plain; charset="UTF-8"
On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
> Hi,
>
> Try the following QP parameters that only use shared receive queues.
>
> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>
Thanks for that. If I run the job over 2 x 48 cores it now works and the
performance seems reasonable (I need to do some more tuning) but when I
go up to 4 x 48 cores I'm getting the same problem:
[compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
error creating qp errno says Cannot allocate memory
[compute-1-7.local:18106] *** An error occurred in MPI_Isend
[compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
[compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
[compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
Any thoughts?
Thanks,
Rob
--
Robert Horton
System Administrator (Research Support) - School of Mathematical Sciences
Queen Mary, University of London
r.horton_at_[hidden] - +44 (0) 20 7882 7345
------------------------------
Message: 3
Date: Thu, 19 May 2011 09:59:13 -0600
From: "Samuel K. Gutierrez" <samuel_at_[hidden]>
Subject: Re: [OMPI users] Openib with > 32 cores per node
To: Open MPI Users <users_at_[hidden]>
Message-ID: <B3E83138-9AF0-48C0-871C-DBBB2E712E12_at_[hidden]>
Content-Type: text/plain; charset=us-ascii
Hi,
On May 19, 2011, at 9:37 AM, Robert Horton wrote:
> On Thu, 2011-05-19 at 08:27 -0600, Samuel K. Gutierrez wrote:
>> Hi,
>>
>> Try the following QP parameters that only use shared receive queues.
>>
>> -mca btl_openib_receive_queues S,12288,128,64,32:S,65536,128,64,32
>>
>
> Thanks for that. If I run the job over 2 x 48 cores it now works and the
> performance seems reasonable (I need to do some more tuning) but when I
> go up to 4 x 48 cores I'm getting the same problem:
>
> [compute-1-7.local][[14383,1],86][../../../../../ompi/mca/btl/openib/connect/btl_openib_connect_oob.c:464:qp_create_one]
> error creating qp errno says Cannot allocate memory
> [compute-1-7.local:18106] *** An error occurred in MPI_Isend
> [compute-1-7.local:18106] *** on communicator MPI_COMM_WORLD
> [compute-1-7.local:18106] *** MPI_ERR_OTHER: known error not in list
> [compute-1-7.local:18106] *** MPI_ERRORS_ARE_FATAL (your MPI job will now abort)
>
> Any thoughts?
How much memory does each node have? Does this happen at startup?
Try adding:
-mca btl_openib_cpc_include rdmacm
I'm not sure whether your version of OFED supports it, but using XRC may also
help. I **think** other tweaks are needed to get this going, but I'm not
familiar with the details.
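To make the options above easier to read, here is a minimal Python sketch that
splits the receive_queues value into its fields and assembles the full command
line for the 4 x 48 case. The helper parse_receive_queues and the program name
./my_app are made up for illustration, and the field names for "S" queues
reflect my understanding of the Open MPI documentation (buffer size, number of
buffers, low watermark, maximum pending sends), so treat them as an assumption
and check them against your version.

```python
# Hypothetical helper, not part of Open MPI: it only splits the string.
# Field names for "S" (shared receive queue) entries are an assumption
# based on my reading of the Open MPI documentation.
SRQ_FIELDS = ("buffer_size_bytes", "num_buffers",
              "low_watermark", "max_pending_sends")


def parse_receive_queues(spec):
    """Split a btl_openib_receive_queues value into one dict per queue."""
    queues = []
    for entry in spec.split(":"):          # queues are separated by ':'
        kind, *values = entry.split(",")   # first field is the queue type
        numbers = [int(v) for v in values]
        queue = {"type": kind}
        if kind == "S":
            queue.update(zip(SRQ_FIELDS, numbers))
        else:
            queue["values"] = numbers      # other queue types left unnamed
        queues.append(queue)
    return queues


spec = "S,12288,128,64,32:S,65536,128,64,32"
queues = parse_receive_queues(spec)
for q in queues:
    print(q)

# Receive buffer bytes requested per queue: buffer size times buffer count.
buffer_bytes = [q["buffer_size_bytes"] * q["num_buffers"] for q in queues]
print("buffer bytes per queue:", buffer_bytes)

# Full command line for 4 nodes x 48 cores; "./my_app" is a placeholder.
num_procs = 4 * 48
cmd = ["mpirun", "-np", str(num_procs),
       "-mca", "btl_openib_receive_queues", spec,
       "-mca", "btl_openib_cpc_include", "rdmacm",
       "./my_app"]
print(" ".join(cmd))
```

The script only prints the command; it does not launch anything, so it is safe
to run on a login node before pasting the result into a job script.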
Hope that helps,
Samuel K. Gutierrez
Los Alamos National Laboratory
>
> Thanks,
> Rob
> --
> Robert Horton
> System Administrator (Research Support) - School of Mathematical Sciences
> Queen Mary, University of London
> r.horton_at_[hidden] - +44 (0) 20 7882 7345
------------------------------
End of users Digest, Vol 1910, Issue 2
**************************************