Open MPI Development Mailing List Archives

Subject: [OMPI devel] New MOSIX components draft
From: Alex Margolin (alex.margolin_at_[hidden])
Date: 2012-03-31 12:04:12


Hi,

I think I'm close to finishing an initial version of the MOSIX support
for Open MPI. A preliminary draft is attached.
The support consists of two modules: an ODLS module for launching
processes under MOSIX, and a BTL module for efficient communication
between processes. I'm not quite there yet. I'm sure the BTL module
needs more work: first because it fails (see the error output below),
and second because I'm not sure I got all the function output right.
I've written some documentation inside the code, which is pretty short
at the moment. The ODLS component is working fine.

Could someone take a look at my code to see if I'm heading in the
right direction? I would like to submit my code to the repository
eventually. I know of quite a few Open MPI users interested in MOSIX
support (they know I'm working on it), and I was hoping to publish some
benchmark results for it at the upcoming EuroMPI.

P.S. I get the following error, and I'm pretty sure my BTL is to blame:

alex@singularity:~/huji/benchmarks/simple$ mpirun -mca btl_base_verbose 100 -mca btl self,mosix hello
[singularity:10838] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_mpool_sm: libmca_common_sm.so.0: cannot open
shared object file: No such file or directory (ignored)
[singularity:10838] mca: base: components_open: Looking for btl components
[singularity:10838] mca: base: components_open: opening btl components
[singularity:10838] mca: base: components_open: found loaded component mosix
[singularity:10838] mca: base: components_open: component mosix register
function successful
[singularity:10838] mca: base: components_open: component mosix open
function successful
[singularity:10838] mca: base: components_open: found loaded component self
[singularity:10838] mca: base: components_open: component self has no
register function
[singularity:10838] mca: base: components_open: component self open
function successful
[singularity:10838] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_coll_sm: libmca_common_sm.so.0: cannot open
shared object file: No such file or directory (ignored)
[singularity:10838] select: initializing btl component mosix
[singularity:10838] select: init of component mosix returned success
[singularity:10838] select: initializing btl component self
[singularity:10838] select: init of component self returned success
[singularity:10838] *** Process received signal ***
[singularity:10838] Signal: Segmentation fault (11)
[singularity:10838] Signal code: Address not mapped (1)
[singularity:10838] Failing at address: 0x30
[singularity:10838] [ 0] /lib/x86_64-linux-gnu/libc.so.6(+0x36420)
[0x7fa94a3cd420]
[singularity:10838] [ 1] /lib/x86_64-linux-gnu/libc.so.6(+0x84391)
[0x7fa94a41b391]
[singularity:10838] [ 2] /lib/x86_64-linux-gnu/libc.so.6(__strdup+0x16)
[0x7fa94a41b086]
[singularity:10838] [ 3]
/usr/local/lib/libmpi.so.0(opal_argv_append_nosize+0xf7) [0x7fa94add66a4]
[singularity:10838] [ 4] /usr/local/lib/openmpi/mca_bml_r2.so(+0x1cf5)
[0x7fa946177cf5]
[singularity:10838] [ 5] /usr/local/lib/openmpi/mca_bml_r2.so(+0x1e50)
[0x7fa946177e50]
[singularity:10838] [ 6]
/usr/local/lib/openmpi/mca_pml_ob1.so(mca_pml_ob1_add_procs+0x12f)
[0x7fa946382b6d]
[singularity:10838] [ 7] /usr/local/lib/libmpi.so.0(ompi_mpi_init+0x909)
[0x7fa94acd1549]
[singularity:10838] [ 8] /usr/local/lib/libmpi.so.0(MPI_Init+0x16c)
[0x7fa94ad033ec]
[singularity:10838] [ 9]
/home/alex/huji/benchmarks/simple/hello(_ZN3MPI4InitERiRPPc+0x23) [0x409e2d]
[singularity:10838] [10]
/home/alex/huji/benchmarks/simple/hello(main+0x22) [0x408f66]
[singularity:10838] [11]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed) [0x7fa94a3b830d]
[singularity:10838] [12] /home/alex/huji/benchmarks/simple/hello()
[0x408e89]
[singularity:10838] *** End of error message ***
--------------------------------------------------------------------------
mpirun noticed that process rank 0 with PID 10838 on node singularity
exited on signal 11 (Segmentation fault).
--------------------------------------------------------------------------
alex@singularity:~/huji/benchmarks/simple$ mpirun -mca btl self,tcp hello
[singularity:10841] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_mpool_sm: libmca_common_sm.so.0: cannot open
shared object file: No such file or directory (ignored)
[singularity:10841] mca: base: component_find: unable to open
/usr/local/lib/openmpi/mca_coll_sm: libmca_common_sm.so.0: cannot open
shared object file: No such file or directory (ignored)
Hello world!
alex@singularity:~/huji/benchmarks/simple$
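
A note on reading the trace above: the crash is inside __strdup, called
from opal_argv_append_nosize, at the very small address 0x30. One common
way such an address arises (an assumption here, not something the output
confirms) is taking the address of a character array embedded in a
struct through a NULL struct pointer, so the "string" passed along is
just the field offset. The sketch below illustrates this with made-up
types: demo_component, demo_module, name_address and safe_name are
hypothetical names, and the layout is chosen only so that the offset
comes out as 0x30. It is not the real Open MPI component structure.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical layout, NOT the real Open MPI struct: fixed-width fields
 * sized so that the embedded name array sits at offset 0x30. */
struct demo_component {
    int32_t version[3];          /* bytes  0..11 */
    char    type_name[32];       /* bytes 12..43 */
    int32_t type_version;        /* bytes 44..47 */
    char    component_name[64];  /* starts at byte 48 == 0x30 */
};

/* Hypothetical module that should point back at its component. */
struct demo_module {
    struct demo_component *component;  /* NULL if init forgot to set it */
};

/* Address that the expression m->component->component_name would yield.
 * It is computed arithmetically, so nothing is dereferenced here. */
static uintptr_t name_address(const struct demo_module *m)
{
    return (uintptr_t)m->component
         + offsetof(struct demo_component, component_name);
}

/* Guarded accessor: returns NULL instead of handing a bogus pointer to
 * string functions such as strdup(). */
static const char *safe_name(const struct demo_module *m)
{
    if (m == NULL || m->component == NULL) {
        return NULL;
    }
    return m->component->component_name;
}
```

With this layout, a module whose component pointer is NULL gives
name_address() == 0x30, which strdup() would then try to read as a
string. If something similar is happening here, checking that every
module returned by the BTL's init function has its component pointer
set would be a reasonable first step.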