Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] bug?
From: Lenny Verkhovsky (lenny.verkhovsky_at_[hidden])
Date: 2009-10-01 02:32:09


I will take a look.
Originally it was supposed to bind the processes to CPU #1 and CPU #3.
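
To make the intended semantics concrete, here is a minimal sketch of "rank i binds to the i-th CPU in the slot list". It is only an illustration, not ORTE code: the names parse_slot_list and bind_ranks are hypothetical, it assumes a plain comma-separated list of CPU ids (no ranges or socket:core syntax), and the choice to reject a job with more processes than slots is an assumption as well.

```python
def parse_slot_list(spec):
    """Parse a comma-separated slot list such as "1,3" into CPU ids.

    Assumption: only plain non-negative integers are accepted here.
    """
    cpus = []
    for field in spec.split(","):
        field = field.strip()
        if not field.isdigit():
            raise ValueError("unsupported slot-list entry: %r" % field)
        cpus.append(int(field))
    return cpus


def bind_ranks(nprocs, spec):
    """Map ranks 0..nprocs-1 onto the listed CPUs, in order.

    The ranks stay 0, 1, ...; only the CPU each rank is bound to
    comes from the slot list.
    """
    cpus = parse_slot_list(spec)
    if nprocs > len(cpus):
        # Assumed policy: fail cleanly instead of guessing a mapping.
        raise ValueError("more processes than slots in the list")
    return {rank: cpus[rank] for rank in range(nprocs)}


if __name__ == "__main__":
    # Corresponds to: mpirun -n 2 -slot-list 1,3 foo
    print(bind_ranks(2, "1,3"))
```

Under this reading the job still consists of ranks 0 and 1, and the values 1 and 3 are only used as CPU ids.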

On Fri, Sep 25, 2009 at 4:57 PM, Eugene Loh <Eugene.Loh_at_[hidden]> wrote:

> Thanks, filed as https://svn.open-mpi.org/trac/ompi/ticket/2030
>
> Ralph Castain wrote:
>
>> Circling some off-list comments back to the list...while we could and
>> should error out more gracefully, this really isn't a supportable
>> operation. What the cmd
>>
>> mpirun -n 2 -slot-list 1,3 foo
>>
>> appears to do is cause us to launch a 2-process job consisting of vpid=1
>> and vpid=3, as opposed to the normal vpid=0 and 1.
>>
>> Not only is ORTE not prepared to handle this scenario, I believe it will
>> cause problems in some areas within OMPI.
>>
>> I can try to make it fail nicer - someone with more knowledge of the
>> intended slot-list behavior would have to make it do what they actually
>> intended, or at least explain what is supposed to happen.
>>
>> Ralph
>>
>> On Sep 24, 2009, at 7:03 PM, Eugene Loh wrote:
>>
>>> mpirun -V
>>> mpirun (Open MPI) 1.4a1-1
>>>
>>> Ralph Castain wrote:
>>>
>>>> Sigh - you really need to remember to tell us what version you're
>>>> talking about.
>>>>
>>>> On Sep 24, 2009, at 5:39 PM, Eugene Loh wrote:
>>>>
>>>>> I assume this is a bug?
>>>>>
>>>>> % mpirun -np 2 -slot-list 1,3 hostname
>>>>> [saem9:10337] [[455,0],0] ORTE_ERROR_LOG: Not found in file base/odls_base_default_fns.c at line 875
>>>>> [saem9:10337] *** Process received signal ***
>>>>> [saem9:10337] Signal: Segmentation fault (11)
>>>>> [saem9:10337] Signal code: Address not mapped (1)
>>>>> [saem9:10337] Failing at address: 0x4c
>>>>> [saem9:10337] [ 0] [0xffffe600]
>>>>> [saem9:10337] [ 1] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/libopen-rte.so.0(orte_plm_base_launch_apps+0x78a) [0xf7f8c206]
>>>>> [saem9:10337] [ 2] /home/eugene/CTperf/test-CT821/paff_bug2/src/myopt/lib/openmpi/mca_plm_rsh.so [0xf7d13564]
>>>>> [saem9:10337] [ 3] mpirun [0x804b49d]
>>>>> [saem9:10337] [ 4] mpirun [0x804a456]
>>>>> [saem9:10337] [ 5] /lib/libc.so.6(__libc_start_main+0xdc) [0xf7d348ac]
>>>>> [saem9:10337] [ 6] mpirun(orte_daemon_recv+0x201) [0x804a3b1]
>>>>> [saem9:10337] *** End of error message ***
>>>>> Segmentation fault
>>>>>
>>>>
>>>>
>>> _______________________________________________
>>> devel mailing list
>>> devel_at_[hidden]
>>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>>>