
Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] pml_v question
From: Aurélien Bouteiller (bouteill_at_[hidden])
Date: 2008-03-05 14:27:26


Hi,

To enable the pessimist vprotocol, you have to specify -mca vprotocol
pessimist. This parameter takes precedence over the priority. Let me
know if you succeed :]
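
Leonardo's command already passes an aggregate MCA parameter file with -am ../ft-enable-cr. The setting can therefore go either on the command line, as in mpirun -np 2 -hostfile ../hostfile -am ../ft-enable-cr -mca vprotocol pessimist ./ping 10 1, or into that file. The file version below is a sketch that assumes the same one name=value per line format his parameter listing uses:

```
# ../ft-enable-cr (aggregate MCA parameter file passed with -am)
# Select the pessimist message-logging protocol by name; per the reply,
# this takes precedence over vprotocol_pessimist_priority.
vprotocol=pessimist
```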

Aurelien

On 5 March 2008, at 13:55, Leonardo Fialho wrote:

> Hi All,
>
> I'm trying to use pml_v (pessimist) with the FT components, but during
> loading, pml_v closes and also closes vprotocol_pessimist, as the
> following log shows:
>
> (log from only one process)
>
> $ mpirun -np 2 -hostfile ../hostfile -am ../ft-enable-cr -v -d ./ping 10 1
>
> opal_cr: init: Verbose Level: 128
> opal_cr: init: FT Enabled: 1
> opal_cr: init: OPAL CR Allow OPAL Only: 0
> opal_cr: init: Is a tool program: 0
> opal_cr: init: Checkpoint Signal: 10
> opal_cr: init: Temp Directory: /tmp
> proc_info: hnp_uri
> 1251737600.0;tcp://172.20.5.128:46169;tcp://158.109.65.178:46169;tcp://10.8.0.1:46169
> daemon uri 1251737600.1;tcp://172.20.5.1:39991
> App) Named Pipes (/tmp/opal_cr_prog_read.17352)
> (/tmp/opal_cr_prog_write.17352)
> orte_cr: init: orte_cr_init()
> mca: base: components_open: Looking for pml components
> mca: base: components_open: opening pml components
> mca: base: components_open: found loaded component cm
> mca: base: components_open: component cm open function successful
> mca: base: components_open: found loaded component crcpw
> pml:crcpw: open()
> pml:crcpw: open: priority = -128
> pml:crcpw: open: verbosity = 128
> mca: base: components_open: component crcpw open function successful
> mca: base: components_open: found loaded component dr
> mca: base: components_open: component dr open function successful
> mca: base: components_open: found loaded component ob1
> mca: base: components_open: component ob1 open function successful
> mca: base: components_open: found loaded component v
> pml_v: loaded
> pml_v: vprotocol_pessimist: component_open: read priority 120
> mca: base: components_open: component v open function successful
> select: initializing pml component cm
> select: init returned failure for component cm
> select: initializing pml component crcpw
> pml:crcpw: component_init: Priority -128
> select: init returned priority -128
> pml:select: Wrapper Component: Component crcpw was determined to be a Wrapper PML with priority -128
> select: component dr not in the include list
> select: initializing pml component ob1
> select: init returned priority 20
> select: component v not in the include list
> selected ob1 best priority 20
> select: component ob1 selected
> mca: base: close: component cm closed
> mca: base: close: unloading component cm
> mca: base: close: component dr closed
> mca: base: close: unloading component dr
> pml_v: parasite_close: Ok, I accept to die and let ob1 component finish
> pml_v: vprotocol_pessimist: component_close
> pml_v: mca: base: close: component pessimist closed
> pml_v: mca: base: close: unloading component pessimist
> mca: base: close: component v closed
> mca: base: close: unloading component v
> pml:select: Wrapping: Component ob1 [20] is being wrapped by component crcpw [-128]
> pml:crcpw: component_init: Wrap the selected component ob1
> pml:crcpw: component_init: Initalize Wrapper
> ompi_cr: init: ompi_cr_init()
> ompi_cr: finalize: ompi_cr_finalize()
> pml:crcpw: component_finalize: Finalize
> mca: base: close: component ob1 closed
> mca: base: close: unloading component ob1
> orte_cr: finalize: orte_cr_finalize()
>
> The MCA parameters are (except the verbose parameters):
>
> vprotocol_pessimist_priority=120 (very, very high...?)
> snapc_base_global_snapshot_dir=/tmp/checkpoints
> snapc_base_store_in_place=0
> opal_cr_allow_opal_only=0
> mca_base_component_distill_checkpoint_ready=0
> ft_cr_enabled=1
> crs=
> rml_wrapper=ftrm
> snapc=single (similar to full, but checkpoints only one process)
> filem=rsh
> pml_wrapper=crcpw
> crcp=uncoord (similar to coord, but only needs to checkpoint one process)
> btl=tcp,self
>
> Thanks,
> Leonardo Fialho
>
> --
> Leonardo Fialho
> Computer Architecture and Operating Systems Department - CAOS
> Universidad Autonoma de Barcelona - UAB
> ETSE, Edificio Q, QC/3088
> http://www.caos.uab.es
> Phone: +34-93-581-2888
> Fax: +34-93-581-2478
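
The PML selection sequence in the quoted log can be summarized as a toy model. This is purely illustrative and is not Open MPI's selection code. Component names, priorities, and the order in which the components are opened are copied from the log. The include list (cm, crcpw, ob1) is inferred from the "not in the include list" messages. The Component class and select_pml function are hypothetical. The model shows the order of events in this run: pml_v is filtered out before any priorities are compared; cm fails to initialize; crcpw (-128) is kept as a wrapper rather than ranked; and ob1 (20) is selected.

```python
from dataclasses import dataclass
from typing import List, Optional, Set, Tuple


@dataclass
class Component:
    """Hypothetical stand-in for a PML component as it appears in the log."""
    name: str
    # Priority returned by init; None means init failed or was never run.
    init_priority: Optional[int] = None
    # True for a wrapper PML such as crcpw, which wraps the winner.
    is_wrapper: bool = False


def select_pml(
    components: List[Component], include: Set[str]
) -> Tuple[Optional[Component], Optional[Component], List[str]]:
    """Return (selected, wrapper, closed_names), mimicking the log's order."""
    best: Optional[Component] = None
    wrapper: Optional[Component] = None
    for comp in components:
        if comp.name not in include:
            continue  # "select: component X not in the include list"
        if comp.init_priority is None:
            continue  # "select: init returned failure for component X"
        if comp.is_wrapper:
            wrapper = comp  # recorded as the wrapper, not ranked
            continue
        if best is None or comp.init_priority > best.init_priority:
            best = comp  # highest priority seen so far wins
    kept = {c.name for c in (best, wrapper) if c is not None}
    # Every other opened component is closed and unloaded, in open order.
    closed = [c.name for c in components if c.name not in kept]
    return best, wrapper, closed


# Values copied from the quoted log (open order and init results).
LOG_COMPONENTS = [
    Component("cm"),                            # init returned failure
    Component("crcpw", -128, is_wrapper=True),  # wrapper PML
    Component("dr"),                            # excluded, never initialized
    Component("ob1", 20),
    Component("v"),                             # excluded, never initialized
]
# Inferred from the "not in the include list" messages.
LOG_INCLUDE = {"cm", "crcpw", "ob1"}

if __name__ == "__main__":
    best, wrapper, closed = select_pml(LOG_COMPONENTS, LOG_INCLUDE)
    print(f"selected {best.name}, wrapped by {wrapper.name}; closed {closed}")
```

Running it prints selected ob1, wrapped by crcpw; closed ['cm', 'dr', 'v']. This matches the outcome and close order in the log, where ob1 itself is closed only later, at finalize.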

--
* Dr. Aurélien Bouteiller
* Sr. Research Associate at Innovative Computing Laboratory
* University of Tennessee
* 1122 Volunteer Boulevard, suite 350
* Knoxville, TN 37996
* 865 974 6321