Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Vprotocol pessimist - Open MPI 1.4.1 and 1.4.2a1r22558
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2010-02-24 04:54:48


Hi,

The instructions you found are now obsolete. I'll update them; thank you for pointing this out.

The new procedure for using uncoordinated checkpointing is now:
mpirun -mca vprotocol pessimist -mca pml ob1,v [regular arguments]
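You may prefer to keep passing a parameter file with -am, as in your command line. In that case the same two settings can presumably go into that file in the name=value form you already use. This is a minimal sketch that assumes the file is parsed like any other MCA parameter file. It presumably replaces the vprotocol_pessimist_priority line from the old instructions, and your verbosity lines can stay as they are:

```
# Same as -mca vprotocol pessimist: select the pessimist message-logging protocol
vprotocol=pessimist
# Same as -mca pml ob1,v: put the v component in the pml include list next to ob1
pml=ob1,v
```

Setting the same parameters through the environment (OMPI_MCA_vprotocol=pessimist and OMPI_MCA_pml=ob1,v) should be equivalent, since Open MPI reads any MCA parameter from a matching OMPI_MCA_ variable.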

The version available in the trunk does not support actual restart, due to a lack of runtime support. It is limited to evaluating the performance cost of fault tolerance (FT) in failure-free runs. There is an ongoing proposal to include such support in the main branch. However, we do have a branched version of Open MPI that includes all the necessary support, which I can provide on request. Please also keep in mind that this is an ongoing research effort that has not yet matured enough to be used in a production environment.

Aurelien Bouteiller

--
Dr. Aurelien Bouteiller
Innovative Computing Laboratory at the University of Tennessee
On Feb 6, 2010, at 10:21, Caciano Machado wrote:
> Hi,
> 
> I'm following the instructions found at
> https://svn.open-mpi.org/trac/ompi/wiki/EventLog_CR to run an
> application with the vprotocol pessimist enabled. I believe that I'm
> doing something wrong, but I can't figure out the problem.
> 
> I have compiled Open MPI 1.4.1 and 1.4.2a1r22558 with the parameters:
> ./configure --prefix=/usr/local/openmpi-v/ --with-ft=cr --with-blcr=/usr/local/blcr/
> 
> Here is my configuration file:
> vprotocol_pessimist_priority=10
> pml_base_verbose=10
> pbl_v_verbose=500
> 
> The command line:
> mpirun -am /etc/v -np 2 -machinefile /etc/machinefile ep.B.8
> 
> And the mpirun output:
> ##############################################################################
> [xiru-10:03440] mca: base: components_open: Looking for pml components
> [xiru-10:03440] mca: base: components_open: opening pml components
> [xiru-10:03440] mca: base: components_open: found loaded component cm
> [xiru-10:03440] mca: base: components_open: component cm has no register function
> [xiru-10:03440] mca: base: component_find: unable to open /usr/local/openmpi-v/lib/openmpi/mca_mtl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
> 
> [xiru-10:03440] mca: base: components_open: component cm open function successful
> [xiru-10:03440] mca: base: components_open: found loaded component crcpw
> [xiru-10:03440] mca: base: components_open: component crcpw has no register function
> [xiru-10:03440] mca: base: components_open: component crcpw open function successful
> [xiru-10:03440] mca: base: components_open: found loaded component csum
> [xiru-10:03440] mca: base: components_open: component csum has no register function
> [xiru-10:03440] mca: base: component_find: unable to open /usr/local/openmpi-v/lib/openmpi/mca_btl_mx: perhaps a missing symbol, or compiled for a different version of Open MPI? (ignored)
> [xiru-10:03440] mca: base: components_open: component csum open function successful
> [xiru-10:03440] mca: base: components_open: found loaded component ob1
> [xiru-10:03440] mca: base: components_open: component ob1 has no register function
> [xiru-10:03440] mca: base: components_open: component ob1 open function successful
> [xiru-10:03440] mca: base: components_open: found loaded component v
> [xiru-10:03440] mca: base: components_open: component v has no register function
> [xiru-10:03440] mca: base: components_open: component v open function successful
> --------------------------------------------------------------------------
> [[65326,1],0]: A high-performance Open MPI point-to-point messaging module
> was unable to find any relevant network interfaces:
> 
> Module: OpenFabrics (openib)
>  Host: xiru-10.portoalegre.grenoble.grid5000.fr
> 
> Another transport will be used instead, although this may result in
> lower performance.
> --------------------------------------------------------------------------
> [xiru-10:03440] select: initializing pml component cm
> [xiru-10:03440] select: init returned failure for component cm
> [xiru-10:03440] select: component crcpw not in the include list
> [xiru-10:03440] select: component csum not in the include list
> [xiru-10:03440] select: initializing pml component ob1
> [xiru-10:03440] select: init returned priority 20
> [xiru-10:03440] select: component v not in the include list
> [xiru-10:03440] selected ob1 best priority 20
> [xiru-10:03440] select: component ob1 selected
> [xiru-10:03440] mca: base: close: component cm closed
> [xiru-10:03440] mca: base: close: unloading component cm
> [xiru-10:03440] mca: base: close: component crcpw closed
> [xiru-10:03440] mca: base: close: unloading component crcpw
> [xiru-10:03440] mca: base: close: component csum closed
> [xiru-10:03440] mca: base: close: unloading component csum
> [xiru-10:03440] mca: base: close: component v closed
> [xiru-10:03440] mca: base: close: unloading component v
> ...
> 
> #########################################################
> 
> It seems that the vprotocol module is not loading properly. Does
> anyone know how to run Open MPI with this module?
> 
> Regards,
> Caciano Machado
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel