Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] MPI_Send doesn't work if the data >= 2GB
From: Gus Correa (gus_at_[hidden])
Date: 2010-12-06 22:55:53


Hi Xianjun

Suggestions/Questions:

1) Did you check whether malloc returns a non-NULL pointer?
Your program assumes it does, but that may not be true,
and in that case the problem is not with MPI.
You can print a message and call MPI_Abort if it fails;
the sketch below, after suggestion 3, shows one way to do this.

2) Have you tried MPI_Isend/MPI_Irecv?
Or perhaps the buffered cousin MPI_Ibsend?

3) Why do you want to send these huge messages?
Wouldn't it be less trouble to send several
smaller messages? (See the sketch below.)
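
For instance, here is a rough, untested sketch combining 1) and 3): check the
malloc result, and move the 2GB in smaller pieces. The 64 MiB chunk size and
the variable names are arbitrary, and MPI_Isend/MPI_Irecv could be used
instead of the blocking calls:

#include <stdio.h>
#include <stdlib.h>
#include <mpi.h>

int main(int argc, char** argv)
{
    int rank;
    size_t total = (size_t)2 * 1024 * 1024 * 1024;   /* 2 GiB, as in your test */
    size_t chunk = 64 * 1024 * 1024;                  /* 64 MiB per message */

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    char* buf = (char*)malloc(total);
    if (buf == NULL)                                  /* suggestion 1: check malloc */
    {
        fprintf(stderr, "rank %d: malloc of %zu bytes failed\n", rank, total);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* suggestion 3: many smaller messages instead of one 2GB message */
    for (size_t off = 0; off < total; off += chunk)
    {
        int count = (int)((total - off < chunk) ? (total - off) : chunk);
        if (rank == 0)
            MPI_Send(buf + off, count, MPI_BYTE, 1, 1, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf + off, count, MPI_BYTE, 0, 1,
                     MPI_COMM_WORLD, MPI_STATUS_IGNORE);
    }

    free(buf);
    MPI_Finalize();
    return 0;
}

Ranks other than 0 and 1 simply skip the transfer, so this also won't
deadlock if you launch more than two processes.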

I hope it helps,
Gus Correa

Xianjun wrote:
>
> Hi
>
> Are you running on two processes (mpiexec -n 2)?
> Yes
>
> Have you tried to print Gsize?
> Yes, I checked my code several times, and I thought the errors came
> from Open MPI. :)
>
> The command line I used:
> "mpirun -hostfile ./Serverlist -np 2 ./test". The "Serverlist" file
> includes several computers in my network.
>
> The command line that I used to build openmpi-1.4.1:
> ./configure --enable-debug --prefix=/usr/work/openmpi ; make all install;
>
> What interconnect do you use?
> It is a normal TCP/IP interconnect with a 1Gb network card. When I debugged
> my code (and the Open MPI code), I found that Open MPI does call the
> "mca_pml_ob1_send_request_start_rdma(...)" function, but I was not quite
> sure which protocol was used when transferring the 2GB of data. Do you
> have any opinions? Thanks
>
> Best Regards
> Xianjun Meng
>
> 2010/12/7 Gus Correa <gus_at_[hidden]>
>
> Hi Xianjun
>
> Are you running on two processes (mpiexec -n 2)?
> I think this code will deadlock for more than two processes.
> The MPI_Recv won't have a matching send for rank>1.
>
> Also, this is C, not MPI,
> but you may be wrapping around into negative numbers.
> Have you tried to print Gsize?
> It is probably -2147483648 on both 32-bit and 64-bit machines.
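>
> Something like this in your test program (just a sketch) would show the
> actual value and also avoid the unmatched receive on ranks above 1:
>
>     printf("Gsize = %zu\n", Gsize);   /* %zu because Gsize is a size_t */
>
>     if (localID == 1)   /* only rank 1 has a matching send from rank 0 */
>     {
>         MPI_Status status;
>         MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
>                  MPI_COMM_WORLD, &status);
>     }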
>
> My two cents.
> Gus Correa
>
> Mike Dubman wrote:
>
> Hi,
> What interconnect and command line do you use? For the InfiniBand
> openib component there is a known issue with large transfers (2GB and up):
>
> https://svn.open-mpi.org/trac/ompi/ticket/2623
>
> try disabling memory pinning:
> http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned
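>
> For example, if the openib BTL is in use, something along these lines
> (mpi_leave_pinned is the MCA parameter that FAQ entry describes; keep the
> rest of your usual command line):
>
>     mpirun --mca mpi_leave_pinned 0 -hostfile ./Serverlist -np 2 ./test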
>
>
> regards
> M
>
>
> 2010/12/6 <xjun.meng_at_[hidden]>
>
>
> hi,
>
> On my computers (x86-64), sizeof(int) = 4, but
> sizeof(long) = sizeof(double) = sizeof(size_t) = 8. When I checked my
> mpi.h file, I found that the definitions related to sizeof(int) are
> correct. Meanwhile, I think the mpi.h file was generated according to my
> compute environment when I compiled Open MPI. So, my code still doesn't
> work. :(
>
> Further, I found that the collective routines (such as
> MPI_Allgatherv(...)), which are implemented on top of point-to-point,
> don't work either when the data > 2GB.
>
> Thanks
> Xianjun
>
> 2010/12/6 Tim Prince <n8tm_at_[hidden]>
>
>
> On 12/5/2010 7:13 PM, Xianjun wrote:
>
> hi,
>
> I ran into a problem recently when I tested the MPI_Send and MPI_Recv
> functions. When I run the following code, the processes hang and I
> found there was no data transmission on my network at all.
>
> BTW: I ran this test on two x86-64 computers with 16GB of memory,
> running Linux.
>
> #include <stdio.h>
> #include <mpi.h>
> #include <stdlib.h>
> #include <unistd.h>
>
> int main(int argc, char** argv)
> {
>     int localID;
>     int numOfPros;
>     size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;  /* 2 GiB */
>
>     char* g = (char*)malloc(Gsize);
>
>     MPI_Init(&argc, &argv);
>     MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
>     MPI_Comm_rank(MPI_COMM_WORLD, &localID);
>
>     /* 2048-byte element type, sent 1024*1024 times = 2 GiB total */
>     MPI_Datatype MPI_Type_lkchar;
>     MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
>     MPI_Type_commit(&MPI_Type_lkchar);
>
>     if (localID == 0)
>     {
>         MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1, MPI_COMM_WORLD);
>     }
>
>     if (localID != 0)
>     {
>         MPI_Status status;
>         MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
>                  MPI_COMM_WORLD, &status);
>     }
>
>     MPI_Finalize();
>
>     return 0;
> }
>
> You supplied all your constants as 32-bit signed data, so even if the
> count for MPI_Send() and MPI_Recv() were a larger data type, you would
> see this limit. Did you look at your <mpi.h>?
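>
> For example, a small standalone check (not part of the test program, just
> to make the 32-bit limit visible):
>
>     #include <limits.h>
>     #include <stdio.h>
>
>     int main(void)
>     {
>         size_t two_gb = (size_t)2 * 1024 * 1024 * 1024;   /* 2147483648 */
>         /* 2 GiB is larger than INT_MAX, and the count argument of
>            MPI_Send()/MPI_Recv() is a plain int. */
>         printf("INT_MAX = %d, 2 GiB = %zu\n", INT_MAX, two_gb);
>         return 0;
>     }
>
> That is also why the program above folds the data into a larger derived
> datatype (MPI_Type_contiguous(2048, MPI_BYTE, ...)): it keeps the count
> passed to MPI_Send()/MPI_Recv() within the range of an int.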
>
> -- Tim Prince
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users