Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] bug?
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-10-01 02:32:09


I will take a look.
Originally, it was supposed to bind the processes to CPU #1 and CPU #3.
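
To make the distinction concrete, here is a rough Python sketch (not ORTE code) contrasting the binding described above with the behavior reported further down the thread. The function names are hypothetical, and the rule that rank i takes the i-th entry of the slot list is an assumption for illustration, as is raising an error when there are more processes than slots.

```python
def parse_slot_list(spec):
    """Parse a comma-separated slot list such as "1,3" into [1, 3]."""
    return [int(tok) for tok in spec.split(",") if tok.strip()]


def intended_binding(nprocs, spec):
    """Assumed intent: ranks 0..nprocs-1 are bound to the listed CPUs in order."""
    slots = parse_slot_list(spec)
    if nprocs > len(slots):
        # Assumption: refuse to oversubscribe rather than wrap around.
        raise ValueError("more processes than slots in the slot list")
    return {rank: slots[rank] for rank in range(nprocs)}


def reported_vpids(spec):
    """Behavior reported in the thread: slot IDs end up used as vpids."""
    return parse_slot_list(spec)


if __name__ == "__main__":
    print("intended rank -> CPU:", intended_binding(2, "1,3"))
    print("reported vpids:", reported_vpids("1,3"))
```

With `-n 2 -slot-list 1,3`, the first mapping keeps vpids 0 and 1 and only changes where they run; the second yields vpids 1 and 3, which is the case the launcher does not handle.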

On Fri, Sep 25, 2009 at 4:57 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:

> Thanks, filed as https://svn.open-mpi.org/trac/ompi/ticket/2030
>
> Ralph Castain wrote:
>
> Circling some off-list comments back to the list...while we could and
>> should error-out easier, this really isn't a supportable operation. What
>> the cmd
>>
>> mpirun -n 2 -slot-list 1,3 foo
>>
>> appears to do is cause us to launch a 2-process job consisting of vpid=1
>> and vpid=3, as opposed to the normal vpid=0 and 1.
>>
>> Not only is ORTE not prepared to handle this scenario, I believe it will
>> cause problems in some areas within OMPI.
>>
>> I can try to make it fail nicer - someone with more knowledge of the
>> intended slot-list behavior would have to make it do what they actually
>> intended, or at least explain what is supposed to happen.
>>
>> Ralph
>>
>> On Sep 24, 2009, at 7:03 PM, Eugene Loh wrote:
>>
>> mpirun -V
>>> mpirun (Open MPI) 1.4a1-1
>>>
>>> Ralph Castain wrote:
>>>
>>> Sigh - you really need to remember to tell us what version you're
>>>> talking about.
>>>>
>>>> On Sep 24, 2009, at 5:39 PM, Eugene Loh wrote:
>>>>
>>>> I assume this is a bug?
>>>>>
>>>>> % mpirun -np 2 -slot-list 1,3 hostname
>>>>> [saem9:10337] [[455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 875
>>>>> [saem9:10337] *** Process received signal ***
>>>>> [saem9:10337] Signal: Segmentation fault (11)
>>>>> [saem9:10337] Signal code: Address not mapped (1)
>>>>> [saem9:10337] Failing at address: 0x4c
>>>>> [saem9:10337] [ 0] [0xffffe600]
>>>>> [saem9:10337] [ 1] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x78a) [0xf7f8c206]
>>>>> [saem9:10337] [ 2] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/openmpi/mca_plm_rsh.so [0xf7d13564]
>>>>> [saem9:10337] [ 3] mpirun [0x804b49d]
>>>>> [saem9:10337] [ 4] mpirun [0x804a456]
>>>>> [saem9:10337] [ 5] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7d348ac]
>>>>> [saem9:10337] [ 6] mpirun(orte_daemon_recv+0x201) [0x804a3b1]
>>>>> [saem9:10337] *** End of error message ***
>>>>> Segmentation fault
>>>>>
>>>>
>>>>
>>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>
>>