
Subject: Re: [OMPI users] Error when using OpenMPI with SGE multiple hosts
From: Reuti (reuti_at_[hidden])
Date: 2010-10-12 12:12:57


On 12.10.2010, at 15:49, Dave Love wrote:

> Chris Jewell <chris.jewell_at_[hidden]> writes:
>
>> I've scrapped this system now in favour of the new SGE core binding feature.
>
> How does that work, exactly? I thought the OMPI SGE integration didn't
> support core binding, but good if it does.

With binding_instance set to "set" (the default), the shepherd should already bind the processes to cores. With the other types of binding_instance, the selected cores must be forwarded to the application via an environment variable or in the hostfile.
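To illustrate the non-default case, here's a minimal sketch of a job script that applies the forwarded cores itself, assuming Grid Engine's documented behaviour that `-binding env` only exports the selected OS processor numbers in the SGE_BINDING environment variable instead of binding anything (the script name and application name are made up for the example):

  #!/bin/sh
  # Submitted with: qsub -binding env linear:2 bindtest.sh
  # With binding_instance "env" the shepherd does not bind the job itself;
  # it only exports the granted core numbers, e.g. SGE_BINDING="0 1".
  echo "Cores granted by SGE: $SGE_BINDING"

  # The job has to apply the binding on its own, e.g. with taskset,
  # which expects a comma-separated core list:
  taskset -c "$(echo $SGE_BINDING | tr ' ' ',')" ./my_app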

As this is only a hint to SGE and not a hard request, the user must plan the allocation a little beforehand; in particular, it won't work if you oversubscribe a machine. When I look at /proc/*/status, the binding that was applied is mentioned there, and it's also noted in the "config" file of each job's .../active_jobs/... directory. E.g., top shows:

   PID USER     PR  NI VIRT  RES  SHR S %CPU %MEM   TIME+ COMMAND
  9926 ms04     39  19 3756  292  228 R   25  0.0 0:19.31 ever
  9927 ms04     39  19 3756  292  228 R   25  0.0 0:19.31 ever
  9925 ms04     39  19 3756  288  228 R   25  0.0 0:19.30 ever
  9928 ms04     39  19 3756  292  228 R   25  0.0 0:19.30 ever

for 4 forks of an endless loop in one and the same job script, submitted with `qsub -binding linear:1 demo.sh`. The funny thing is that with this kernel version I still get a load of 4, even though all 4 forks are bound to one core. Should it really be four?
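For reference, a minimal sketch of what such a job script could look like (`demo.sh` and the binary name `ever` are taken from the output above; the script body is a guess at the setup described):

  #!/bin/sh
  # demo.sh - start 4 background copies of an endless-loop program
  # from one job script; submitted with: qsub -binding linear:1 demo.sh
  ./ever &
  ./ever &
  ./ever &
  ./ever &
  wait

That all four copies ended up on the same core can also be checked directly, e.g. with `grep Cpus_allowed_list /proc/9926/status` for each of the PIDs above, since Cpus_allowed_list is a standard field of /proc/<pid>/status on Linux.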

-- Reuti
