Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] New odls component fails
From: Alex Margolin (alex.margolin_at_[hidden])
Date: 2012-03-17 18:18:57


On 03/17/2012 08:16 PM, Ralph Castain wrote:
> I don't think you need to .ompi_ignore all those components. First, you need to use the --without-hwloc option (you misspelled it below as --disable-hwloc).
I missed it, thank you.
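(For the record, I take it the configure line should look something like

    ./configure --without-hwloc ...

with the rest of my options unchanged, instead of the --disable-hwloc I
had before.)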
> Assuming you removed the relevant code from your clone of the default odls module, I suspect the calls are being made in ompi/runtime/ompi_mpi_init.c. If the process detects it isn't bound, it looks to see if it should bind itself. I thought that code was also turned "off" if we configured without-hwloc, so you might have to check it.
I didn't remove any code from the default module. Should I have? (All I
did was insert "mosrun -w" before the app name in the argv.)
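(To spell it out, the effective per-process launch command becomes
something like

    mosrun -w ./my_app <args>

where ./my_app is just a stand-in for whatever binary mpirun was asked
to start.)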
Could you please explain what you mean by "bound" and how I can bind
processes?
Also, I'm now getting a similar error, but a quick check shows
ess_base_nidmap.c doesn't exist in the trunk:

...
[singularity:01899] OPAL dss:unpack: got type 22 when expecting type 16
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in
file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 57
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in
file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
[singularity:01899] [[46635,1],0] ORTE_ERROR_LOG: Pack data mismatch in
file ../../../orte/runtime/orte_init.c at line 132
--------------------------------------------------------------------------
...
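(By "quick check" I just mean something like

    find . -name ess_base_nidmap.c

run from the top of my trunk checkout, which comes back empty.)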
> Shared memory is a separate issue. If you want/need to avoid it, then run with -mca btl ^sm and this will turn off all shared memory calls.
After my last post I tried to rebuild, and then even the simplest app
wouldn't start. It turns out I had disabled all of the shmem components
(mmap, posix, sysv), and ORTE wouldn't start without any of them (so I
had to turn them back on). Could you tell me if there is a way to run
the application without making any mmap() calls with MAP_SHARED?
Currently, mosrun is run with -w, asking it to fail (return -1) on any
such system call.
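(If I understand your suggestion correctly, that means launching with
something like

    mpirun -mca btl ^sm -np 2 ./my_app

where ./my_app is just a stand-in for my actual test program.)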

Thanks for your help,
Alex

>
>
> On Mar 17, 2012, at 11:51 AM, Alex Margolin wrote:
>
>> [singularity:15041] [[35712,0],0] orted_recv_cmd: received message from [[35712,1],0]
>> [singularity:15041] defining message event: orted/orted_comm.c 172
>> [singularity:15041] [[35712,0],0] orted_recv_cmd: reissued recv
>> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor called by [[35712,1],0] for tag 1
>> [singularity:15041] [[35712,0],0] orted:comm:process_commands() Processing Command: ORTE_DAEMON_SYNC_WANT_NIDMAP
>> [singularity:15041] [[35712,0],0] orte:daemon:cmd:processor: processing commands completed
>> [singularity:15042] OPAL dss:unpack: got type 33 when expecting type 12
>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/util/nidmap.c at line 429
>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../orte/mca/ess/base/ess_base_nidmap.c at line 62
>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../../../../orte/mca/ess/env/ess_env_module.c at line 173
>> [singularity:15042] [[35712,1],0] ORTE_ERROR_LOG: Pack data mismatch in file ../../../orte/runtime/orte_init.c at line 132
>> --------------------------------------------------------------------------
>> It looks like MPI_INIT failed for some reason; your parallel process is
>> likely to abort. There are many reasons that a parallel process can
>> fail during MPI_INIT; some of which are due to configuration or environment
>> problems. This failure appears to be an internal failure; here's some
>> additional information (which may only be relevant to an Open MPI
>> developer):
>>
>> ompi_mpi_init: orte_init failed
>> --> Returned "Pack data mismatch" (-22) instead of "Success" (0)
>> --------------------------------------------------------------------------
>> *** The MPI_Init() function was called before MPI_INIT was invoked.
>> *** This is disallowed by the MPI standard.
>> *** Your MPI job will now abort.
>>