Open MPI User's Mailing List Archives

From: Ralph H Castain (rhc_at_[hidden])
Date: 2006-11-07 09:37:50
Subject: Re: [OMPI users] MPI_Comm_spawn multiple bproc support


Hi Herve

Sorry you are experiencing these problems. Part of the problem is that I
have no access to a BJS machine. I suspect the issue you are encountering is
that our interface to BJS may not be correct - the person who wrote it, I
believe, may have used the wrong environment variables. At least, that is
what some of the Bproc folks have said.
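
Just to illustrate the failure mode: the BJS support essentially has to turn
an environment variable like NODES=0,1 into a list of allocated nodes -
something like the sketch below (not the actual Open MPI code, just the
idea). If it reads the wrong variable, it sees an empty allocation, which
would produce exactly the "Out of resource" errors in your log.

  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>

  /* Sketch: read the BJS allocation from the environment (e.g. NODES=0,1)
   * and list the node ids it names. */
  int main(void)
  {
      char *nodes = getenv("NODES");
      char *copy, *tok;

      if (NULL == nodes) {
          fprintf(stderr, "no allocation found in NODES\n");
          return 1;
      }
      copy = strdup(nodes);   /* strtok modifies its argument, so copy */
      for (tok = strtok(copy, ","); tok != NULL; tok = strtok(NULL, ",")) {
          printf("allocated node %d\n", atoi(tok));
      }
      free(copy);
      return 0;
  }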

Let me look into this a little more - no point in you continuing to thrash
on this. I'll challenge the Bproc folks to give me access to a BJS machine.

Again, I'm sorry for the trouble.
Ralph

On 11/7/06 7:27 AM, "hpetit_at_[hidden]" <hpetit_at_[hidden]> wrote:

> Hi Ralph, sorry for the delayed answer, but I have had difficulty accessing
> the internet since yesterday.
>
> I have tried all your suggestions but I continue to experience problems.
> Actually, I have a problem with bjs on the one hand, which I may submit to a
> bproc forum, and I still have the spawn problem on the other hand.
>
> Let's focus first on the spawn problem.
>
> Even with a "bjssub -i bash" or "bjssub -n 1 -i bash" command, I continue to
> get the following log:
> mpirun -np 1 main_exe machine10
> main_exe: Begining of main_exe
> main_exe: Call MPI_Init
> main_exe: MPI_Info_set soft result=0
> main_exe: MPI_Info_set node result=0
> main_exe: Call MPI_Comm_spawn_multiple()
> --------------------------------------------------------------------------
> Some of the requested hosts are not included in the current allocation for the
> application:
> ./spawned_exe
> The requested hosts were:
> machine10
>
> Verify that you have mapped the allocated resources properly using the
> --host specification.
> --------------------------------------------------------------------------
> [setics10:07250] [0,0,0] ORTE_ERROR_LOG: Out of resource in file
> base/rmaps_base_node.c at line 210
> [setics10:07250] [0,0,0] ORTE_ERROR_LOG: Out of resource in file rmaps_rr.c at
> line 331
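>
> For reference, the spawn part of main_exe is roughly the following (a
> simplified sketch, not my exact code; the real program has more error
> handling):
>
> #include <mpi.h>
> #include <stdio.h>
>
> int main(int argc, char **argv)
> {
>     char *cmds[1] = { "./spawned_exe" };
>     int maxprocs[1] = { 1 };
>     int errcodes[1];
>     MPI_Info infos[1];
>     MPI_Comm intercomm;
>
>     MPI_Init(&argc, &argv);
>
>     /* argv[1] is the target host, e.g. "machine10" */
>     MPI_Info_create(&infos[0]);
>     MPI_Info_set(infos[0], "soft", "1");
>     MPI_Info_set(infos[0], "host", argv[1]);
>
>     MPI_Comm_spawn_multiple(1, cmds, MPI_ARGVS_NULL, maxprocs, infos,
>                             0, MPI_COMM_SELF, &intercomm, errcodes);
>
>     MPI_Info_free(&infos[0]);
>     MPI_Finalize();
>     return 0;
> }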
>
> This problem is observed whether the slave node is on the same machine as
> the master or not.
>
> On the bjs side of the problem: I ran bjssub under gdb and could observe
> that execution never reached the code path that sets the NODES variable, so
> I stayed with the default value NODES=0.
>
> The question is:
> is the spawn problem a result of the bjs problem, or are they two independent
> problems?
>
> The good thing would be to find some other people with an active Debian
> platform running bproc, bjs and Open MPI, so that we could check whether I
> made a mistake during the installation phase or whether there really is an
> incompatibility problem in Open MPI.
>
> Thank you so much for all your support, even though it has not been
> successful yet.
>
> Regards.
>
> Herve
>
> Date: Fri, 03 Nov 2006 14:10:20 -0700
> From: Ralph H Castain <rhc_at_[hidden]>
> Subject: Re: [OMPI users] MPI_Comm_spawn multiple bproc support
> To: "Open MPI Users <users_at_[hidden]>" <users_at_[hidden]>
> Message-ID: <C170FE4C.59B3%rhc_at_[hidden]>
> Content-Type: text/plain; charset="ISO-8859-1"
>
> Okay, I picked up some further info that may help you.
>
>>> The "bjsub -i /bin/env" only sets up the NODES for the session of
>>> /bin/env. Probably what he wants is "bjssub -i /bin/bash" and start
>>> bpsh/mpirun from the new shell.
>
> I would recommend doing as they suggest. Also, they noted that you did not
> specify the number of nodes you wanted on the bjssub command line. As a
> result, the system gave you only one node (hence the NODES=0 instead of
> NODES=0,1).
>
> If you do a "man bjssub", or a "bjssub --help", you should (hopefully) find
> out how to specify the desired number of nodes.
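>
> For example, judging from your log, something like "bjssub -n 2 -i /bin/bash"
> should request two nodes and leave NODES=0,1 set inside that shell - though
> please check the man page, since I may have the option wrong.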
>
> Hope that helps.
> Ralph
>
>
> On 11/2/06 6:46 AM, "Ralph Castain" <rhc_at_[hidden]> wrote:
>
>> I truly appreciate your patience. Let me talk to some of our Bproc folks and
>> see if they can tell me what is going on. I agree - I would have expected
>> the NODES to be 0,1. The fact that you are getting just 0 explains the
>> behavior you are seeing with Open MPI.
>>
>> I also know (though I don't know the command syntax) that you can get a
>> long-term allocation from bjs (i.e., one that continues until you log out).
>> Let me dig a little and see how that is done.
>>
>> Again, I appreciate your patience.
>> Ralph
>>
>>
>> On 11/2/06 6:32 AM, "hpetit_at_[hidden]" <hpetit_at_[hidden]> wrote:
>>
>>> Hi again Ralph,
>>>
>>>> I gather you have access to bjs? Could you use bjs to get a node
>>>> allocation,
>>>> and then send me a printout of the environment?
>>>
>>> I have slightly changed my cluster configuration to something like this:
>>> the master is running on a machine called machine10
>>> node 0 is running on a machine called machine10 (same as the master)
>>> node 1 is running on a machine called machine14
>>>
>>> node 0 and 1 are up
>>>
>>> My bjs configuration allocates nodes 0 and 1 to the default pool:
>>> <--------------->
>>> pool default
>>> policy simple
>>> nodes 0-1
>>> <----------------->
>>>
>>> By default, when I run "env" in a terminal, the NODES variable is not
>>> present. If I run env under a job submission command like "bjssub -i env",
>>> then I can see the following new environment variables:
>>> NODES=0
>>> JOBID=27 (for instance)
>>> BPROC_RANK=0000000
>>> BPROC_PROGNAME=/usr/bin/env
>>>
>>> When the command is over, NODES is unset again.
>>>
>>> What is strange is that I would have expected NODES=0,1. I do not know
>>> whether other bjs users see the same behaviour.
>>>
>>> Hopefully this is the kind of information you were expecting.
>>>
>>> Regards.
>>>
>>> Herve
>>>
>
>
>
>
>
> ------------------------------
>
> Message: 2
> Date: Sat, 4 Nov 2006 14:04:54 +0100 (CET)
> From: <pgarcia_at_[hidden]>
> Subject: [OMPI users] Technical inquiry
> To: users_at_[hidden]
> Message-ID: <4444924729pgarcia_at_[hidden]>
> Content-Type: text/plain; charset="iso-8859-1"
>
>
> Hi, everybody. Good afternoon.
>
> I've just configured and installed openmpi-1.1.2 on a Kubuntu
> GNU/Linux system, and I'm now trying to compile the hello.c example,
> without success.
>
>> root_at_kubuntu:/home/livestrong/mpi/test# uname -a
>> Linux kubuntu 2.6.15-23-386 #1 PREEMPT Tue May 23 13:49:40 UTC 2006
>> i686 GNU/Linux
>
> Hello.c
> -------
> #include "/usr/lib/mpich-mpd/include/mpi.h"
> #include <stdio.h>
>
> int main (int argc, char** argv)
> {
>     MPI_Init(&argc, &argv);
>     printf("Hello world.\n");
>     MPI_Finalize();
>     return(0);
> }
>
> The error that I'm finding is this:
>
> root_at_kubuntu:/home/livestrong/mpi/prueba# mpirun -np 2 hello
> 0 - MPI_INIT : MPIRUN chose the wrong device ch_p4; program needs
> device ch_p4mpd
> /usr/lib/mpich/bin/mpirun.ch_p4: line 243: 16625 Segmentation
> fault "/home/livestrong/mpi/prueba/hello" -p4pg
> "/home/livestrong/mpi/prueba/PI16545" -p4wd "/home/livestrong/mpi/prueba"
>
> Does anybody know what the problem can be?
>
> Regards and thank you very much in advance.
>
> Pablo.
>
> PS: I am sending the ompi_info output and the config.log to you.
>
> Besides
> -------------- next part --------------
> A non-text attachment was scrubbed...
> Name: question.tar.gz
> Type: application/octet-stream
> Size: 59009 bytes
> Desc:
> Url :
> http://www.open-mpi.org/MailArchives/users/attachments/20061104/dd281cc5/attachment.obj
>
> ------------------------------
>
>
> End of users Digest, Vol 425, Issue 1
> *************************************
>
>
>
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users