I gather you have access to bjs? Could you use bjs to get a node allocation,
and then send me a printout of the environment? All I need to see is what
your environment looks like - how does the system tell you what nodes you
have been allocated?
Then we can make something that will solve your problem.
On 11/2/06 1:10 AM, "hpetit_at_[hidden]" <hpetit_at_[hidden]> wrote:
> Thank you for your support, Ralph; I really appreciate it.
> I now better understand your very first answer, which asked whether I had a
> NODES environment variable.
> It was related to the fact that your platform is configured with LSF.
> I have read some tutorials about LSF, and it seems that LSF provides an "llogin"
> command that creates an environment where the NODES variable is permanently set.
> Then, under this "llogin" environment, all jobs are automatically allocated to
> the nodes defined in NODES.
> This is why, I think, the spawning works fine under these conditions.
> Unfortunately, LSF is commercial, so I am not able to install it on my
> I am afraid I cannot do anything more on my side for now.
> You proposed to concoct something over the next few days. I look forward to
> hearing from you.
> Date: Tue, 31 Oct 2006 06:53:53 -0700
> From: Ralph H Castain <rhc_at_[hidden]>
> Subject: Re: [OMPI users] MPI_Comm_spawn multiple bproc support
> To: "Open MPI Users <users_at_[hidden]>" <users_at_[hidden]>
> Aha! Thanks for your detailed information - that helps identify the problem.
> See some thoughts below.
> On 10/31/06 3:49 AM, "hpetit_at_[hidden]" <hpetit_at_[hidden]> wrote:
>> Thank you for your quick reply, Ralph.
>> As far as I know, the NODES environment variable is created when a job is
>> submitted to the bjs scheduler.
>> The only way I know (but I am a bproc newbie) is to use the bjssub command.
> That is correct. However, Open MPI requires that ALL of the nodes you are
> going to use must be allocated in advance. In other words, you have to get
> an allocation large enough to run your entire job - both the initial
> application and anything you comm_spawn.
> I wish I could help you with the proper bjs commands to get an allocation,
> but I am not familiar with bjs and (even after multiple Google searches)
> cannot find any documentation on that code. Try doing a "bjs --help" and see
> what it says.
>> I then retried my test with the following command: "bjssub -i
>> mpirun -np 1 main_exe".
>> I guess this problem comes from the way I pass parameters to the spawned
>> program. Instead of giving instructions to spawn the program on a specific
>> host, I should set parameters to spawn the program on a specific node.
>> But I do not know how to do it.
> What you did was fine. "host" is the correct field to set. I suspect two
> possible issues:
> 1. The specified host may not be in the allocation. In the case you showed
> here, I would expect it to be since you specified the same host we are
> already on. However, you might try running mpirun with the "--nolocal"
> option - this will force mpirun to launch the processes on a machine other
> than the one you are on (typically the head node; on many bproc machines,
> this node is not included in an allocation because the system admins
> don't want you running MPI jobs on it).
> 2. We may have something wrong in our code for this case. I'm not sure how
> well that has been tested, especially in the 1.1 code branch.
>> Then, I have a bunch of questions:
>> - When MPI is used together with bproc, is it necessary to use bjssub (or bjs
>> in general)?
> You have to use some kind of resource manager to obtain a node allocation
> for your use. At our site, we use LSF - other people use bjs. Anything that
> sets the NODES variable is fine.
>> - Do I have to submit the spawned program to bjs? That is, do I
>> have to add 'bjssub' to the commands parameter of the MPI_Comm_spawn_multiple
>> call?
> You shouldn't have to do so. I suspect, however, that bjssub is not getting
> a large enough allocation for your combined mpirun + spawned job. I'm not
> familiar enough with bjs to know for certain.
>> As you can see, I am still not able to spawn a program and need some more
>> help. Do you have some examples showing how to do it?
> Unfortunately, not in the 1.1 branch, nor do I have one for
> comm_spawn_multiple that uses the "host" field. I can try to concoct
> something over the next few days, though, and verify that our code is
> working correctly.