Open MPI User's Mailing List Archives

Subject: [OMPI users] Re: Re: can you help me please? thanks
From: 胡杨 (781578278_at_[hidden])
Date: 2013-12-09 08:05:43


I have a server with 12 cores. When I run an MPI program with 10 processes, only three of them do any work. Here is a picture of the problem.

Why? Is the problem with process scheduling?

 ------------------ Original Message ------------------
 From: "Bruno Coutinho";<coutinho_at_[hidden]>;
 Sent: Friday, December 6, 2013, 11:14 PM
 To: "Open MPI Users"<users_at_[hidden]>;

 Subject: Re: [OMPI users] Re: can you help me please? thanks

 

 Probably it was the change from the eager to the rendezvous protocol, as Jeff said.

 If you don't know what these are, read these:
 https://computing.llnl.gov/tutorials/mpi_performance/#Protocols

 http://blogs.cisco.com/performance/what-is-an-mpi-eager-limit/

 http://blogs.cisco.com/performance/eager-limits-part-2/

 

 You can tune the eager limit by changing the MCA parameters btl_tcp_eager_limit (for TCP), btl_self_eager_limit (communication from one process to itself), btl_sm_eager_limit (shared memory), and btl_udapl_eager_limit or btl_openib_eager_limit (if you use InfiniBand).
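
 As a rough illustration of why the array size matters, the sketch below computes the payload of a message and compares it with an eager limit. The helper names payload_bytes and fits_eager are hypothetical, and the check ignores any header bytes the transport may count against the limit, so treat it as an estimate only. The limit itself is a parameter: the real value depends on the BTL in use and on your Open MPI build, and it can be set at launch with "mpirun --mca btl_tcp_eager_limit <bytes> ...".

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: payload size in bytes of a message carrying
 * `count` elements of `elem_size` bytes each (e.g. sizeof(int)). */
size_t payload_bytes(size_t count, size_t elem_size)
{
    /* Guard against overflow in the multiplication. */
    assert(elem_size == 0 || count <= SIZE_MAX / elem_size);
    return count * elem_size;
}

/* Hypothetical helper: true if the payload alone is no larger than
 * `eager_limit` bytes. Assumption: header overhead is ignored, so a
 * message close to the limit may still be sent with rendezvous. */
bool fits_eager(size_t count, size_t elem_size, size_t eager_limit)
{
    return payload_bytes(count, elem_size) <= eager_limit;
}
```

 With 4-byte ints, 10000 elements are 40000 bytes and 15000 elements are 60000 bytes, so any eager limit that lies between those two sizes would put the two cases on different protocols. Calling fits_eager(number, sizeof(int), limit) with the limit reported for your own installation shows which side a given message falls on.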
 

 2013/12/6 Jeff Squyres (jsquyres) <jsquyres_at_[hidden]>
 I sent you some further questions yesterday:

    http://www.open-mpi.org/community/lists/users/2013/12/23158.php
  

On Dec 6, 2013, at 1:35 AM, 胡杨 <781578278_at_[hidden]> wrote:

> Here is my code:
> int*a=(int*)malloc(sizeof(int)*number);
> MPI_Send(a,number, MPI_INT, 1, 1,MPI_COMM_WORLD);
>
> int*b=(int*)malloc(sizeof(int)*number);
> MPI_Recv(b, number, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>
> Here, number is the size of my array (e.g., a or b).
> I have tried it on my local computer and on my Rocks cluster. On the Rocks cluster, one process on the frontend node uses MPI_Send to send a message, and the other processes on the compute nodes use MPI_Recv to receive it.
> When number is less than 10000, the other processes receive the message quickly,
> but when number is more than 15000, the other processes receive the message slowly.
> Why? Is it because of the Open MPI API, or some other problem?
>
> I have spent a few days on this and would appreciate your help. Thanks to all readers, and good luck to you.
>
>
>
>
> ------------------ Original Message ------------------
> From: "Ralph Castain";<rhc_at_[hidden]>;
> Sent: Thursday, December 5, 2013, 6:52 PM
> To: "Open MPI Users"<users_at_[hidden]>;
> Subject: Re: [OMPI users] can you help me please? thanks
>
> You are running 15000 ranks on two nodes?? My best guess is that you are swapping like crazy because your memory footprint exceeds the available physical memory.
>
>
>
> On Thu, Dec 5, 2013 at 1:04 AM, 胡杨 <781578278_at_[hidden]> wrote:
> My ROCKS cluster includes one frontend and two compute nodes. My program uses Open MPI API calls such as MPI_Send and MPI_Recv. I run the program with 3 processes: one process sends a message and the others receive it. Here is some of the code:
> int*a=(int*)malloc(sizeof(int)*number);
> MPI_Send(a,number, MPI_INT, 1, 1,MPI_COMM_WORLD);
>
> int*b=(int*)malloc(sizeof(int)*number);
> MPI_Recv(b, number, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &status);
>
> When number is less than 10000, it runs fast,
> but when number is more than 15000, it runs slowly.
>
> Why? Is it because of the Open MPI API, or some other problem?
> ------------------ Original Message ------------------
> From: "Ralph Castain";<rhc_at_[hidden]>;
> Sent: Tuesday, December 3, 2013, 1:39 PM
> To: "Open MPI Users"<users_at_[hidden]>;
> Subject: Re: [OMPI users] can you help me please? thanks
>
>
>
>
>
> On Mon, Dec 2, 2013 at 9:23 PM, 胡杨 <781578278_at_[hidden]> wrote:
> A simple program on my 4-node ROCKS cluster runs fine with the command:
> /opt/openmpi/bin/mpirun -np 4 -machinefile machines ./sort_mpi6
>
>
> Another, bigger program runs fine on the head node only, with the command:
>
> cd ./sphere; /opt/openmpi/bin/mpirun -np 4 ../bin/sort_mpi6
>
> But with the command:
>
> cd /sphere; /opt/openmpi/bin/mpirun -np 4 -machinefile ../machines
> ../bin/sort_mpi6
>
> it gives this output:
>
> ../bin/sort_mpi6: error while loading shared libraries: libgdal.so.1: cannot open
> shared object file: No such file or directory
> ../bin/sort_mpi6: error while loading shared libraries: libgdal.so.1: cannot open
> shared object file: No such file or directory
> ../bin/sort_mpi6: error while loading shared libraries: libgdal.so.1: cannot open
> shared object file: No such file or directory
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users
>




 --
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/






40F6DC95_E690AF16.27C0A552.jpg