Point 1: "- it doesn't seem like you are checking the return code of
plpa_sched_setaffinity. You might want to ensure that it's returning
success." -- I modified the test program to check the return value of every
call to plpa_sched_setaffinity(); the return value is 0 for all ranks.
Point 2: "- you might want to call plpa_sched_getaffinity and verify that the
mask you set is the mask that the OS actually uses (it *should* be,
but...)" -- This is the toughest one to check since I don't really understand
the plpa_cpu_set_t type. I've modified my test program (included below) to
have two variables of this type (mycpu, mycpu_check). One is set with the
macros:
PLPA_CPU_ZERO(&mycpu);
PLPA_CPU_SET(myrank,&mycpu);
The other is set with:
error_code=plpa_sched_getaffinity(0, sizeof(plpa_cpu_set_t),
&mycpu_check);
(This is done after the call to plpa_sched_setaffinity; I've also checked
that the return value set in error_code is zero.)
I can use my debugger (kdbg) to compare the values of the two variables and
they appear to be the same.
(I've included a screen shot of the debugger to illustrate exactly what I
mean -- I hope)
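The comparison done by hand in the debugger could also be done in code. The following is only a minimal sketch under stated assumptions: it uses the Linux/glibc calls sched_setaffinity() and sched_getaffinity() with cpu_set_t directly, rather than the PLPA wrappers, so that it compiles without PLPA installed. The function names pin_and_verify and first_allowed_cpu are hypothetical helpers invented for this illustration and are not part of PLPA or glibc.

```cpp
#ifndef _GNU_SOURCE
#define _GNU_SOURCE   // needed for the CPU_* macros (g++ normally defines it already)
#endif
#include <sched.h>    // sched_setaffinity, sched_getaffinity, cpu_set_t (Linux/glibc)

// Hypothetical helper: pin the calling process to one CPU, then read the
// mask back from the kernel. Returns true only if both calls succeed and
// the mask reported by the kernel equals the mask that was requested.
bool pin_and_verify(int cpu)
{
    cpu_set_t wanted, actual;
    CPU_ZERO(&wanted);
    CPU_SET(cpu, &wanted);

    // A pid of 0 means "the calling process", as in the PLPA calls above.
    if (sched_setaffinity(0, sizeof(wanted), &wanted) != 0)
        return false;

    CPU_ZERO(&actual);
    if (sched_getaffinity(0, sizeof(actual), &actual) != 0)
        return false;

    return CPU_EQUAL(&wanted, &actual) != 0;
}

// Hypothetical helper: lowest-numbered CPU the process may currently run on,
// or -1 if the current mask cannot be read.
int first_allowed_cpu()
{
    cpu_set_t current;
    CPU_ZERO(&current);
    if (sched_getaffinity(0, sizeof(current), &current) != 0)
        return -1;
    for (int cpu = 0; cpu < CPU_SETSIZE; ++cpu)
        if (CPU_ISSET(cpu, &current))
            return cpu;
    return -1;
}
```

In the test program, each rank could call something like pin_and_verify(myrank) and print the result, which would replace the manual comparison of mycpu and mycpu_check.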
Point 3: "also make sure that there aren't other processes consuming your
cores -- e.g., try running the same test on 6 cores instead of 8 to let other
OS/daemons have 2 free cores without trashing the execute times of your MPI
processes." -- I modified my execution to use 6 processors; execution time
still varies by over a factor of two.
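One further thing that may be worth ruling out is the measurement itself. clock() reports processor time used by the calling process rather than elapsed wall-clock time, and an optimizing compiler is allowed to drop a loop whose result is never used. The sketch below is only an illustration under those assumptions: it times a sin() loop with both clock() and std::chrono::steady_clock and keeps the result alive through a volatile variable. The names Timing and time_sin_loop are hypothetical and do not appear in the test program.

```cpp
#include <chrono>
#include <cmath>
#include <ctime>

// Hypothetical result type: CPU time, wall-clock time, and a checksum that
// keeps the computed values observable.
struct Timing
{
    double cpu_seconds;
    double wall_seconds;
    double checksum;
};

// Hypothetical helper: run `limit` sin() evaluations and time them two ways.
Timing time_sin_loop(long limit)
{
    volatile double sink = 0.0;  // volatile so the loop cannot be optimized away

    const std::clock_t cpu_start = std::clock();
    const auto wall_start = std::chrono::steady_clock::now();

    for (long i = 0; i < limit; ++i)
        sink = sink + std::sin(3.14159 + 1e-9 * static_cast<double>(i));

    const auto wall_stop = std::chrono::steady_clock::now();
    const std::clock_t cpu_stop = std::clock();

    Timing t;
    t.cpu_seconds  = static_cast<double>(cpu_stop - cpu_start) / CLOCKS_PER_SEC;
    t.wall_seconds = std::chrono::duration<double>(wall_stop - wall_start).count();
    t.checksum     = sink;
    return t;
}
```

If the two numbers differ noticeably for the same run, the process spent part of the interval not running on a CPU. Inside the MPI program itself, MPI_Wtime() would be the usual way to obtain the wall-clock figure.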
Is there anything else I should check?
Thanks for your help.
Regards,
Mike Hughes
New test program.....
#include <string>
#include <iostream>
#include <math.h>
#include <stdio.h>  // printf, getchar
#include <stdlib.h> // atoi
#include <time.h>
#include <mpi.h>
#ifdef __cplusplus
extern "C"
{
#include <plpa.h>
}
#else
#include <plpa.h>
#endif
const int DUMMYTAG=1;
//I usually use my favorite debugger for whatever platform I'm on (I use
//Sun's workshop debugger on Solaris) and put in this type of code, usually
//pretty soon after MPI_Init():
static void wait_for_debugger(void)
{
int error_code;
int rank;
error_code=MPI_Comm_rank(MPI_COMM_WORLD, &rank);
if (rank == 0)
{
printf("Waiting for debugger attachment. Please hit enter.\n");
getchar();
}
error_code=MPI_Barrier(MPI_COMM_WORLD);
}
/*****************************************************************************************/
// compilation: mpic++ plpa_use.cpp -lplpa -o plpa_use
// Invocation: (assuming you want eight processes on eight processors)
// mpirun -np <numberOfProcessors> plpa_use <limit>
// will compute sin(3.14159) <limit> times on <numberOfProcessors> processors
// typical invocation: mpirun -np 8 plpa_use 10000000
/*****************************************************************************************/
int main(int argc, char *argv[]);
int main(int argc, char *argv[])
{
int ntasks,error_code, myrank,rank,limit,i;
double z;
clock_t startTime, stopTime;
MPI_Status status;
plpa_cpu_set_t mycpu,mycpu_check;
MPI_Init(&argc, &argv);
limit = atoi(argv[1]);
error_code=MPI_Errhandler_set(MPI_COMM_WORLD,MPI_ERRORS_RETURN);
error_code=MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
error_code=MPI_Comm_size(MPI_COMM_WORLD,&ntasks);
wait_for_debugger();
//USE PLPA To lock each rank onto different processors--------------------------------------------------
PLPA_CPU_ZERO(&mycpu);
PLPA_CPU_SET(myrank,&mycpu);
error_code=plpa_sched_setaffinity(0, sizeof(plpa_cpu_set_t),
&mycpu);//set first argument to zero to use calling process pid
std::cout << "\tProcess of rank:"<< myrank << " plpa_sched_setaffinity returned: " << error_code <<std::endl;
error_code=plpa_sched_getaffinity(0, sizeof(plpa_cpu_set_t),
&mycpu_check);//set first argument to zero to use calling process pid
std::cout << "\tProcess of rank:"<< myrank << " plpa_sched_getaffinity returned: " << error_code <<std::endl;
if(1) //placeholder: mycpu and mycpu_check were compared by hand in the debugger
std::cout << "\tProcess of rank:"<< myrank << " plpa_sched_setaffinity seems to have worked" <<std::endl;
if(myrank==0)
{
error_code=MPI_Comm_size(MPI_COMM_WORLD,&ntasks);
if (PLPA_PROBE_OK == plpa_api_probe())
{
std::cout << "PLP is working on processor for rank 0"
<< std::endl;
}
startTime = clock ();
for(rank=1;rank<ntasks;rank++)
{
std::cout << "Processor rank=0, Sent limit="<<limit<<" To Processor: "<< rank << std::endl;
//limit is an int, so the matching MPI datatype is MPI_INT (not MPI_LONG)
error_code=MPI_Send(&limit,1,MPI_INT,rank,DUMMYTAG,MPI_COMM_WORLD);
}
for(rank=1;rank<ntasks;rank++)//Get back results in order
{
error_code=MPI_Recv(&rank,1,MPI_INT,rank,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
std::cout << "\tProcess of rank:"<< rank << " Completed execution" << std::endl;
}
stopTime = clock ();
std::cout << "MANAGER::main: Execution Time: " <<
((double)(stopTime-startTime))/((double)CLOCKS_PER_SEC) << " sec" <<
std::endl;
}
else
{
error_code=MPI_Recv(&limit,1,MPI_INT,0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
for(i=0;i<limit;i++)
z=sin(3.14159);
error_code=MPI_Send(&myrank,1,MPI_INT,0,DUMMYTAG,MPI_COMM_WORLD);
}
error_code=MPI_Finalize(); /* cleanup MPI */
return 0;
}
-----Original Message-----
From: plpa-users-bounces_at_[hidden]
[mailto:plpa-users-bounces_at_[hidden]] On Behalf Of Jeff Squyres
Sent: Monday, January 14, 2008 10:13 PM
To: PLPA users list
Subject: Re: [PLPA users] Newbie question
One would hope that OMPI's affinity_alone param should do what it is supposed
to do, but perhaps it's not -- your explicit use of PLPA is a good way to
test it.
There could be a few things occurring here:
- it doesn't seem like you are checking the return code of
plpa_sched_setaffinity. You might want to ensure that it's returning
success.
- you might want to call plpa_sched_getaffinity and verify that the mask you
set is the mask that the OS actually uses (it *should* be,
but...)
- also make sure that there aren't other processes consuming your cores --
e.g., try running the same test on 6 cores instead of 8 to let other
OS/daemons have 2 free cores without trashing the execute times of your MPI
processes.
On Jan 12, 2008, at 10:16 PM, Hughes, Mike wrote:
> I am trying to use the plpa library to lock each rank of an MPI
> program onto different processors. I have eight cores on my computer.
> I've written the test program below, but I am not sure it is actually
> doing what I intended (assigning a process to one and only one core)
> since the execution times of the program vary by up to a factor of two
>
> (I used to have an eight core workstation built using an intel
> motherboard and I could run MPI programs on it with the command:
> mpirun --mca mpi_paffinity_alone 1 -np 8 <ProgramName> and execution
> times varied by only
> 0.01 seconds out of ~48 sec. I'm now using a supermicro-based
> workstation running ubuntu gutsy and the --mca mpi_paffinity_alone 1
> results in execution times varying by over a factor of two. In
> addition, total execution time has more than doubled compared to intel
> based system running the same version of linux. The OpenMPI FAQ seems
> to suggest that this may be due to processor affinity not working.)
>
> The relevant lines in the code below are:
>
> //USE PLPA To lock each rank onto different
> processors---------------------------------
> PLPA_CPU_ZERO(&mycpu);
> PLPA_CPU_SET(myrank,&mycpu);
> error_code=plpa_sched_setaffinity(0, sizeof(plpa_cpu_set_t),
> &mycpu);
> //set first argument to zero to use calling process pid
>
> Is this the correct way to use the library?
>
> BTW the results of plpa_info are:
> [msh_at_hugherNaught] $ plpa_info
> PLPA_PROBE_OK
> [msh_at_hugherNaught] $
>
> Regards,
>
> msh
>
> Test program listing....
> #include <string>
> #include <iostream>
>
> #include <math.h>
> #include <time.h>
> #include <mpi.h>
>
> #ifdef __cplusplus
> extern "C"
> {
> #include <plpa.h>
> }
> #else
> #include <plpa.h>
> #endif
> const int DUMMYTAG=1;
>
> //I usually use my favorite debugger for whatever platform I'm on (I use
> //Sun's workshop debugger on Solaris) and put in this type of code, usually
> //pretty soon after MPI_Init():
> static void wait_for_debugger(void)
> {
> int error_code;
> int rank;
>
> error_code=MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> if (rank == 0)
> {
> printf("Waiting for debugger attachment. Please hit enter.\n");
> getchar();
> }
>
> error_code=MPI_Barrier(MPI_COMM_WORLD);
> }
>
> /*****************************************************************************************/
> // compilation: mpic++ plpa_use.cpp -lplpa -o plpa_use
> // Invocation: (assuming you want eight processes on eight processors)
> // mpirun -np <numberOfProcessors> plpa_use <limit>
> // will compute sin(3.14159) <limit> times on <numberOfProcessors> processors
> // typical invocation: mpirun -np 8 plpa_use 10000000
> /*****************************************************************************************/
> int main(int argc, char *argv[]);
>
> int main(int argc, char *argv[])
> {
> int ntasks,error_code, myrank,rank,limit,i;
> double z;
> clock_t startTime, stopTime;
> MPI_Status status;
> plpa_cpu_set_t mycpu;
>
> MPI_Init(&argc, &argv);
> limit = atoi(argv[1]);
>
> error_code=MPI_Errhandler_set(MPI_COMM_WORLD,MPI_ERRORS_RETURN);
> error_code=MPI_Comm_rank(MPI_COMM_WORLD,&myrank);
> error_code=MPI_Comm_size(MPI_COMM_WORLD,&ntasks);
>
> wait_for_debugger();
>
> //USE PLPA To lock each rank onto different processors---------------------------------
> PLPA_CPU_ZERO(&mycpu);
> PLPA_CPU_SET(myrank,&mycpu);
> error_code=plpa_sched_setaffinity(0, sizeof(plpa_cpu_set_t),
> &mycpu);
> //set first argument to zero to use calling process pid
>
> if(myrank==0)
> {
> error_code=MPI_Comm_size(MPI_COMM_WORLD,&ntasks);
>
> if (PLPA_PROBE_OK == plpa_api_probe())
> {
> std::cout << "PLP is working on processor for rank 0"
> << std::endl;
> }
>
> startTime = clock ();
> for(rank=1;rank<ntasks;rank++)
> {
> std::cout << "Processor rank=0, Sent limit="<<limit<<" To Processor: "<< rank << std::endl;
>
> error_code=MPI_Send(&limit,1,MPI_LONG,rank,DUMMYTAG,MPI_COMM_WORLD);
> }
> for(rank=1;rank<ntasks;rank++)//Get back results in order
> {
> error_code=MPI_Recv(&rank,1,MPI_LONG,rank,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
> std::cout << "\tProcess of rank:"<< rank << " Completed execution" << std::endl;
> }
> stopTime = clock ();
> std::cout << "MANAGER::main: Execution Time: " <<
> ((double)(stopTime-startTime))/((double)CLOCKS_PER_SEC) << " sec" << std::endl;
> }
> else
> {
>
> error_code=MPI_Recv(&limit,1,MPI_LONG,0,MPI_ANY_TAG,MPI_COMM_WORLD,&status);
> for(i=0;i<limit;i++)
> z=sin(3.14159);
>
> error_code=MPI_Send(&myrank,1,MPI_LONG,0,DUMMYTAG,MPI_COMM_WORLD);
> }
> error_code=MPI_Finalize(); /* cleanup MPI */
>
> return 0;
> }
>
> _______________________________________________
> plpa-users mailing list
> plpa-users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/plpa-users
--
Jeff Squyres
Cisco Systems