Hi Gus Correa

First of all, thanks for your suggestions.

1) The malloc function does return a non-NULL pointer.

2) I haven't tried MPI_Isend. The function I actually need to use is MPI_Allgatherv(). When I used it, I found that it didn't work when the data was >= 2GB; I then debugged it and found that it ultimately calls MPI_Send.

3) I have a large amount of data to train on, so transferring messages >= 2GB is necessary. I could divide the data into smaller pieces, but I think the efficiency would drop as well.


Regards
Xianjun Meng
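
As a rough illustration of point 2: MPI_Allgatherv takes its receive counts and displacements as int, so byte counts at or above 2GB cannot be expressed directly. One possible workaround, mirroring the MPI_Type_contiguous trick in the test program quoted further down, is to express counts in units of a larger block type. The sketch below only does the arithmetic and makes no MPI calls. BLOCK_BYTES, bytes_to_blocks and fill_counts_displs are hypothetical names, and it assumes each rank's contribution is padded to a whole number of blocks. Whether a given Open MPI version then handles the transfer correctly internally is a separate question that this sketch does not answer.

```c
#include <limits.h>
#include <stddef.h>

/* Hypothetical block size: one element of a contiguous MPI datatype. */
#define BLOCK_BYTES 2048u

/* Number of blocks needed to hold nbytes, rounding up.
 * Returns -1 if the result does not fit in an int. */
int bytes_to_blocks(size_t nbytes)
{
    size_t blocks = nbytes / BLOCK_BYTES + (nbytes % BLOCK_BYTES != 0);
    return blocks > (size_t)INT_MAX ? -1 : (int)blocks;
}

/* Fill recvcounts/displs (both in units of blocks) for nranks contributions.
 * Returns 0 on success, -1 if any count or displacement overflows int. */
int fill_counts_displs(const size_t *bytes_per_rank, int nranks,
                       int *recvcounts, int *displs)
{
    long long offset = 0;
    for (int i = 0; i < nranks; i++) {
        int blocks = bytes_to_blocks(bytes_per_rank[i]);
        if (blocks < 0 || offset > INT_MAX)
            return -1;
        recvcounts[i] = blocks;
        displs[i] = (int)offset;
        offset += blocks;
    }
    return 0;
}
```

The resulting arrays would be passed to MPI_Allgatherv together with a datatype built by MPI_Type_contiguous(BLOCK_BYTES, MPI_BYTE, ...), so that every count stays well below INT_MAX even when the byte total does not.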

2010/12/7 Gus Correa <gus@ldeo.columbia.edu>
Hi Xianjun

Suggestions/Questions:

1) Did you check if malloc returns a non-NULL pointer?
Your program is assuming this, but it may not be true,
and in this case the problem is not with MPI.
You can print a message and call MPI_Abort if it doesn't.

2) Have you tried MPI_Isend/MPI_Irecv?
Or perhaps the buffered cousin MPI_Ibsend?

3) Why do you want to send these huge messages?
Wouldn't it be less of a trouble to send several
smaller messages?

I hope it helps,
Gus Correa
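
Suggestion 3 (several smaller messages) could look roughly like the sketch below. To keep it compilable without an MPI installation, the actual send is abstracted behind a callback. chunk_send_fn, send_in_chunks and count_bytes are hypothetical names introduced only for illustration. In a real program the callback would wrap something like MPI_Send(buf, count, MPI_BYTE, dest, tag, comm), and the receiver would loop over matching MPI_Recv calls with the same chunk size.

```c
#include <assert.h>
#include <stddef.h>

/* Callback that sends `count` bytes starting at `buf`; returns 0 on success.
 * In an MPI program this would wrap MPI_Send(buf, count, MPI_BYTE, ...). */
typedef int (*chunk_send_fn)(const char *buf, int count, void *ctx);

/* Send `total` bytes as a sequence of pieces of at most `max_chunk` bytes.
 * Returns the number of pieces sent, or -1 if the callback reports an error. */
long send_in_chunks(const char *buf, size_t total, int max_chunk,
                    chunk_send_fn send_piece, void *ctx)
{
    assert(max_chunk > 0);
    long pieces = 0;
    size_t offset = 0;
    while (offset < total) {
        size_t left = total - offset;
        int count = left < (size_t)max_chunk ? (int)left : max_chunk;
        if (send_piece(buf + offset, count, ctx) != 0)
            return -1;
        offset += (size_t)count;
        pieces++;
    }
    return pieces;
}

/* Stand-in for a real send: only adds up how many bytes it was handed. */
int count_bytes(const char *buf, int count, void *ctx)
{
    (void)buf;
    *(size_t *)ctx += (size_t)count;
    return 0;
}
```

Because every individual count is an int no larger than max_chunk, the total can exceed 2GB without any single call doing so. The cost is one extra call per piece, which is usually small for chunk sizes in the range of hundreds of megabytes.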

Xianjun wrote:

Hi

Are you running on two processes (mpiexec -n 2)?
Yes

Have you tried to print Gsize?
Yes, I have checked my code several times, and I think the errors come from Open MPI. :)

The command line I used:
"mpirun -hostfile ./Serverlist -np 2 ./test". The "Serverlist" file lists several computers in my network.

The command line I used to build openmpi-1.4.1:
./configure --enable-debug --prefix=/usr/work/openmpi ; make all install;

What interconnect do you use?
It is an ordinary TCP/IP interconnect with a 1GB network card. When I debugged my code (and the Open MPI code), I found that Open MPI does call the "mca_pml_ob1_send_request_start_rdma(...)" function, but I am not sure which protocol is used when transferring 2GB of data. Do you have any opinion? Thanks

Best Regards
Xianjun Meng

2010/12/7 Gus Correa <gus@ldeo.columbia.edu>


   Hi Xianjun

   Are you running on two processes (mpiexec -n 2)?
   I think this code will deadlock for more than two processes.
   The MPI_Recv won't have a matching send for rank>1.

   Also, this is C, not MPI,
   but you may be wrapping into the negative numbers.
   Have you tried to print Gsize?
   It is probably -2147483648 in 32bit and 64bit machines.

   My two cents.
   Gus Correa
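
To make the size arithmetic concrete: in the test program quoted further down, the count passed to MPI_Send is 1024*1024 and each element is 2048 bytes, so the message is 2^31 bytes, one more than INT_MAX on platforms with a 32-bit int. Gsize itself is computed in size_t, so on a 64-bit machine it holds 2147483648; the same bit pattern typically reads as -2147483648 if it is converted to a 32-bit int. The sketch below uses hypothetical helper names (message_bytes, fits_in_int) to do the multiplication in 64 bits and report whether the byte total still fits in an int. It does not claim to show where inside Open MPI the limit is hit.

```c
#include <limits.h>
#include <stdint.h>

/* Total message size in bytes, computed in 64 bits to avoid overflow. */
int64_t message_bytes(int count, int type_size)
{
    return (int64_t)count * (int64_t)type_size;
}

/* Returns 1 if `count` elements of `type_size` bytes fit in a signed int. */
int fits_in_int(int count, int type_size)
{
    return message_bytes(count, type_size) <= (int64_t)INT_MAX;
}
```

A check like fits_in_int(1024 * 1024, 2048) before the send would flag the quoted test case, assuming a 32-bit int, while a 2047-byte element type with the same count would pass.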

   Mike Dubman wrote:

       Hi,
       What interconnect and command line do you use? For the InfiniBand
       openib component there is a known issue with large transfers (2GB):

       https://svn.open-mpi.org/trac/ompi/ticket/2623

       try disabling memory pinning:
       http://www.open-mpi.org/faq/?category=openfabrics#large-message-leave-pinned


       regards
       M


       2010/12/6 <xjun.meng@gmail.com>



          hi,

          On my computers (X86-64), sizeof(int)=4, but
          sizeof(long)=sizeof(double)=sizeof(size_t)=8. When I checked my
          mpi.h file, I found that its definition of sizeof(int) is
          correct. Also, I think the mpi.h file was generated according
          to my compute environment when I compiled Open MPI. So my
          code still doesn't work. :(

          Furthermore, I found that the collective routines (such as
          MPI_Allgatherv(...)), which are implemented on top of
          point-to-point communication, don't work either when the
          data is > 2GB.

          Thanks
          Xianjun

          2010/12/6 Tim Prince <n8tm@aol.com>



              On 12/5/2010 7:13 PM, Xianjun wrote:

                  hi,

                  I ran into a problem recently when I tested the MPI_Send and
                  MPI_Recv functions. When I ran the following code, the
                  processes hung and I found there was no data transmission
                  on my network at all.

                  BTW: I ran this test on two X86-64 computers with 16GB of
                  memory and Linux installed.

                  #include <stdio.h>
                  #include <mpi.h>
                  #include <stdlib.h>
                  #include <unistd.h>

                  int main(int argc, char** argv)
                  {
                      int localID;
                      int numOfPros;
                      size_t Gsize = (size_t)2 * 1024 * 1024 * 1024;

                      char* g = (char*)malloc(Gsize);

                      MPI_Init(&argc, &argv);
                      MPI_Comm_size(MPI_COMM_WORLD, &numOfPros);
                      MPI_Comm_rank(MPI_COMM_WORLD, &localID);

                      MPI_Datatype MPI_Type_lkchar;
                      MPI_Type_contiguous(2048, MPI_BYTE, &MPI_Type_lkchar);
                      MPI_Type_commit(&MPI_Type_lkchar);

                      if (localID == 0)
                      {
                          MPI_Send(g, 1024*1024, MPI_Type_lkchar, 1, 1,
                                   MPI_COMM_WORLD);
                      }

                      if (localID != 0)
                      {
                          MPI_Status status;
                          MPI_Recv(g, 1024*1024, MPI_Type_lkchar, 0, 1,
                                   MPI_COMM_WORLD, &status);
                      }

                      MPI_Finalize();

                      return 0;
                  }

              You supplied all your constants as 32-bit signed data, so, even
              if the count for MPI_Send() and MPI_Recv() were a larger data
              type, you would see this limit. Did you look at your <mpi.h>?

              --         Tim Prince

              _______________________________________________
              users mailing list
              users@open-mpi.org


              http://www.open-mpi.org/mailman/listinfo.cgi/users


