Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Pessimist Event Logger
From: Hugo Daniel Meyer (meyer.hugo_at_[hidden])
Date: 2012-02-01 10:04:14


Adding some more context.

When trying to use the event logger (by using MPI_ANY_SOURCE), I get this
error:

[clus9:28158] defining message event: ../../orte/runtime/orte_data_server.c 414
[clus9:28158] [[56904,0],0] data server: lookup on service ompi_ft_event_logger[0]
[clus9:28158] [[56904,0],0] data server: service ompi_ft_event_logger[0] not found
[clus5:7310] *** An error occurred in ../../../../../ompi/mca/vprotocol/pessimist/vprotocol_pessimist_eventlog.h: failed to connect to an Event Logger
[clus5:7310] *** on communicator MPI_COMM_NULL
[clus5:7310] *** MPI_ERR_INTERN: internal error
[clus5:7310] *** MPI_ERRORS_ARE_FATAL: your MPI job will now abort

The event logger is not found, so of course the connection is never made.
Apparently the service ompi_ft_event_logger is not defined.
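
One possible reading of the log above, sketched as a toy model: a lookup can only succeed after some process has published the service name, so if nothing has registered ompi_ft_event_logger[0], the client has no port to connect to. The registry below is a plain C stand-in and not ORTE's data server; toy_publish, toy_lookup, and the name format (inferred from the log line) are assumptions made for illustration.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define TOY_MAX_SERVICES 8
#define TOY_NAME_LEN 64

/* Name format inferred from the log line "ompi_ft_event_logger[0]" (assumption). */
#define TOY_EL_NAME_FMT "ompi_ft_event_logger[%d]"

struct toy_service {
    char name[TOY_NAME_LEN];
    char port[TOY_NAME_LEN];
};

static struct toy_service toy_registry[TOY_MAX_SERVICES];
static int toy_registry_count = 0;

/* What an event logger would have to do first: publish its port under its name. */
int toy_publish(int el_rank, const char *port)
{
    assert(port != NULL);
    if (toy_registry_count >= TOY_MAX_SERVICES)
        return -1;
    struct toy_service *s = &toy_registry[toy_registry_count++];
    snprintf(s->name, sizeof s->name, TOY_EL_NAME_FMT, el_rank);
    snprintf(s->port, sizeof s->port, "%s", port);
    return 0;
}

/* What the client does before connecting: NULL means "service not found". */
const char *toy_lookup(int el_rank)
{
    char name[TOY_NAME_LEN];
    snprintf(name, sizeof name, TOY_EL_NAME_FMT, el_rank);
    for (int i = 0; i < toy_registry_count; i++) {
        if (strcmp(toy_registry[i].name, name) == 0)
            return toy_registry[i].port;
    }
    return NULL;
}
```

In this model toy_lookup(0) returns NULL until toy_publish(0, ...) has run, which mirrors the "service ... not found" line. Whether the real cause here is an event logger that was never started is not confirmed by the log.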

Thanks for the help.

Hugo

2012/1/31 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>

> Hello again.
>
> I've found where the connection with the event logger takes place. I have
> some doubts about the following section of code:
>
>     rc = ompi_dpm.connect_accept(MPI_COMM_SELF, 0, port, true, el_comm);
>     if(OMPI_SUCCESS != rc) {
>         ORTE_ERROR_LOG(rc);
>     }
>
>     /* Send Rank, receive max buffer size and max_clock back */
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     rc = mca_pml_v.host_pml.pml_send(&rank, 1, MPI_INTEGER, 0,
>                                      VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
>                                      MCA_PML_BASE_SEND_STANDARD,
>                                      mca_vprotocol_receiver.el_comm);
>     if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
>         OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
>                                __FILE__ ": failed sending event logger handshake");
>     rc = mca_pml_v.host_pml.pml_recv(&connect_info, 2, MPI_UNSIGNED_LONG_LONG,
>                                      0, VPROTOCOL_PESSIMIST_EVENTLOG_NEW_CLIENT_CMD,
>                                      mca_vprotocol_receiver.el_comm, MPI_STATUS_IGNORE);
>     if(OPAL_UNLIKELY(MPI_SUCCESS != rc))
>         OMPI_ERRHANDLER_INVOKE(mca_vprotocol_receiver.el_comm, rc,
>                                __FILE__ ": failed receiving event logger handshake");
>
>
> I understand that you make a connection using the dpm framework between
> process 0 (the logger) and yourself (MPI_COMM_SELF). But then you send
> and receive messages through the PML. My question is: where is the
> event logger's recv posted? I couldn't find where in the code the event
> logger receives the rank and answers the handshake.
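
A toy model of the handshake order may help when searching for the matching receive: whatever process plays the event logger has to post a receive for one integer (the rank) and then send back two unsigned long long values. The sketch below replaces el_comm with a local socket pair and runs both sides in one process. Every toy_ name and the reply values are hypothetical, and none of this is Open MPI code.

```c
#include <assert.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical reply values; the real ones are not shown in the thread. */
static const unsigned long long toy_reply[2] = { 1024ULL, 0ULL };

/* Logger side: the receive that matches the client's send of its rank,
 * followed by the send that matches the client's receive of two values. */
static int toy_logger_answer(int fd, int *client_rank)
{
    if (read(fd, client_rank, sizeof *client_rank) != (ssize_t)sizeof *client_rank)
        return -1;
    if (write(fd, toy_reply, sizeof toy_reply) != (ssize_t)sizeof toy_reply)
        return -1;
    return 0;
}

/* Runs one complete handshake over a socket pair; returns 0 on success. */
int toy_handshake(int rank, int *seen_rank, unsigned long long info[2])
{
    int fds[2];
    int rc = -1;

    assert(seen_rank != NULL && info != NULL);
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, fds) != 0)
        return -1;

    /* Client: send the rank, let the logger answer, then read the reply. */
    if (write(fds[0], &rank, sizeof rank) == (ssize_t)sizeof rank &&
        toy_logger_answer(fds[1], seen_rank) == 0 &&
        read(fds[0], info, 2 * sizeof info[0]) == (ssize_t)(2 * sizeof info[0]))
        rc = 0;

    close(fds[0]);
    close(fds[1]);
    return rc;
}
```

If the logger side never runs toy_logger_answer, the client's final read has nothing to return, which is the situation the question above is trying to rule out.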
>
> Thanks for your help.
>
> Hugo Meyer
>
> 2012/1/30 Hugo Daniel Meyer <meyer.hugo_at_[hidden]>
>
> Hello Aurelien.
>
> 2012/1/27 Aurélien Bouteiller <bouteill_at_[hidden]>
>
> Hugo,
>
> It seems you want to implement some sort of remote pessimistic logging, à la
> MPICH-V1?
> MPICH-V: Toward a Scalable Fault Tolerant MPI for Volatile Nodes -- George
> Bosilca, Aurélien Bouteiller, Franck Cappello, Samir Djilali, Gilles Fédak,
> Cécile Germain, Thomas Hérault, Pierre Lemarinier, Oleg Lodygensky,
> Frédéric Magniette, Vincent Néri, Anton Selikhov -- In proceedings of The
> IEEE/ACM SC2002 Conference, Baltimore USA, November 2002
>
> We could say that it is similar because I use a distributed logging
> mechanism, but it is a little different because my Memory Channels and
> Checkpoint Servers are the processing nodes; I don't have special nodes to
> take care of the message log and checkpoints.
>
>
> In the PML-V, unlike older designs, the payload of messages and the
> non-deterministic events follow different paths. The payload of messages
> is logged in the sender's volatile memory, while the non-deterministic
> events are sent to a stable event logger before the process is allowed to
> impact the state of others (the code you found in the previous email).
> The best depiction of this distinction can be read in this paper:
> @inproceedings{DBLP:conf/europar/BouteillerHBD11,
>   author    = {Aurelien Bouteiller and
>                Thomas H{\'e}rault and
>                George Bosilca and
>                Jack J. Dongarra},
>   title     = {Correlated Set Coordination in Fault Tolerant Message Logging
>                Protocols},
>   booktitle = {Euro-Par 2011 Parallel Processing - 17th International
>                Conference, Proceedings, Part II},
>   month     = {September},
>   year      = {2011},
>   pages     = {51-64},
>   publisher = {Springer},
>   series    = {Lecture Notes in Computer Science},
>   volume    = {6853},
>   isbn      = {978-3-642-23396-8},
>   doi       = {http://dx.doi.org/10.1007/978-3-642-23397-5_6},
> }
>
> I will take a look at this paper to clarify this distinction.
>
>
>
>
> If you intend to store both payload and message log on a remote node, I
> suggest you look at the "sender-based" hooks, as this is where the message
> payload is managed, and adapt from there. The event loggers can already
> manage only a subset of the processes (if you launch as many ELs as
> processes, you get a 1-1 mapping), but they never handle message payload;
> you'll have to add all this yourself if it so pleases you.
>
> Yes, I need to store every message, not only the non-deterministic
> ones. In my case every node is an event logger. Let's say that I have 16
> processes on four nodes (four per node): the messages received by all
> processes residing on node0 need to be stored on node1, the messages
> received by all processes residing on node1 need to be stored on
> node2, and so on.
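
The node-to-logger assignment described above can be written down as a small mapping. This is a minimal sketch under two assumptions that the message does not state: ranks are placed on nodes in contiguous blocks, and the last node wraps around to node0. The toy_ function names are hypothetical.

```c
#include <assert.h>

/* Node hosting a rank, assuming ranks are placed in contiguous blocks. */
int toy_node_of_rank(int rank, int procs_per_node)
{
    assert(rank >= 0 && procs_per_node > 0);
    return rank / procs_per_node;
}

/* Node that stores the messages received by processes on `node`.
 * The wrap-around for the last node is an assumption. */
int toy_logger_node(int node, int num_nodes)
{
    assert(num_nodes > 0 && node >= 0 && node < num_nodes);
    return (node + 1) % num_nodes;
}

/* Convenience wrapper: logger node for a given rank. */
int toy_logger_node_of_rank(int rank, int procs_per_node, int num_nodes)
{
    return toy_logger_node(toy_node_of_rank(rank, procs_per_node), num_nodes);
}
```

With 16 processes and four per node, rank 5 lives on node1 and would log to node2, while rank 15 on node3 would log to node0 under the wrap-around assumption.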
>
> If I understand correctly, I have to modify the behavior in
> ompi/mca/vprotocol/pessimist to manage the message payload. Another
> question: is there a way to launch ELs on every node, or will I have to
> modify this too?
>
> Thanks a lot for your help Aurélien.
>
> Hugo
>
>
>
> On 27 Jan 2012, at 11:19, Hugo Daniel Meyer wrote:
>
> > Hello Aurélien.
> >
> > Thanks for the clarification. Considering what you've mentioned, I will
> > have to make some adaptations, because in my case every single message has
> > to be logged. So a sender will be sending messages not only to the receiver
> > but also to an event logger. Are there any considerations that I have to
> > take into account when modifying the code? My initial idea is to use the
> > el_comm with a group of event loggers (because every node uses a different
> > event logger in my approach), and then send the messages to them as you do
> > when using MPI_ANY_SOURCE.
> >
> > Thanks for your help.
> >
> > Hugo Meyer
> >
> > 2012/1/27 Aurélien Bouteiller <bouteill_at_[hidden]>
> > Hugo,
> >
> > Your program does not have non-deterministic events. Therefore, there
> > are no events to log. If you add MPI_ANY_SOURCE, you should see this code
> > being called. Please contact me again if you need more help.
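
The point above can be illustrated with a toy model: a receive that names its source is deterministic and produces nothing to log, while a wildcard receive produces one event recording which sender was actually matched. The struct layout, TOY_ANY_SOURCE, and toy_log_recv are hypothetical stand-ins and do not reflect the pessimist component's real data structures; other sources of non-determinism are left out.

```c
#include <assert.h>
#include <stddef.h>

#define TOY_ANY_SOURCE (-1)   /* stand-in for MPI_ANY_SOURCE */
#define TOY_MAX_EVENTS 16

/* One non-deterministic event: which sender a wildcard receive matched. */
struct toy_event {
    int matched_src;
    unsigned long long clock;
};

struct toy_event_log {
    struct toy_event events[TOY_MAX_EVENTS];
    size_t length;
};

/* Records a completed receive. Returns 1 if an event was logged,
 * 0 if the receive was deterministic, -1 if the log is full. */
int toy_log_recv(struct toy_event_log *log, int requested_src,
                 int matched_src, unsigned long long clock)
{
    assert(log != NULL);
    if (requested_src != TOY_ANY_SOURCE)
        return 0;                      /* source was fixed: nothing to log */
    if (log->length >= TOY_MAX_EVENTS)
        return -1;
    log->events[log->length].matched_src = matched_src;
    log->events[log->length].clock = clock;
    log->length++;
    return 1;
}
```

A program that only ever posts receives with explicit sources leaves the log's length at 0 in this model, which is consistent with the logging code never being reached.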
> >
> > Aurelien
> >
> >
> > On 27 Jan 2012, at 10:21, Hugo Daniel Meyer wrote:
> >
> > > Hello all.
> > >
> > > George, I'm using some pieces of the pessimist vprotocol. I've
> > > observed that when you do a send, you call vprotocol_receiver_event_flush,
> > > and there the macro __VPROTOCOL_RECEIVER_SEND_BUFFER is invoked. I've
> > > noticed that here you try to send a copy of the message to process 0 using
> > > the el_comm. This section of code is never executed, at least in my
> > > examples. So the message is never sent to the Event Logger; am I correct?
> > > I think that this is happening because
> > > mca_vprotocol_pessimist.event_buffer_length is always 0.
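
The behavior described above is what a guarded flush would look like: the send to the logger happens only when events are pending. The sketch below is a guess at the control flow based solely on this description; toy_el_state and toy_event_flush are hypothetical and are not the actual macro or function from the pessimist component.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-in for the protocol state discussed above. */
struct toy_el_state {
    size_t event_buffer_length;   /* events waiting to be shipped */
    int sends_to_logger;          /* how many times the logger was contacted */
};

/* Called before an outgoing send. Ships pending events, if any, and
 * returns how many were shipped. With an empty buffer it does nothing. */
size_t toy_event_flush(struct toy_el_state *st)
{
    size_t shipped;

    assert(st != NULL);
    if (st->event_buffer_length == 0)
        return 0;                 /* nothing pending: logger is not contacted */
    shipped = st->event_buffer_length;
    st->sends_to_logger++;        /* stands in for the send over el_comm */
    st->event_buffer_length = 0;
    return shipped;
}
```

Under this reading, a run in which event_buffer_length stays at 0 never reaches the send, regardless of how many application messages go out.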
> > >
> > > Is there something that I have to turn on, or will I have to modify
> > > this behavior manually to connect and send messages to the EL?
> > >
> > > Thanks in advance.
> > >
> > > Hugo Meyer
> > > _______________________________________________
> > > devel mailing list
> > > devel_at_[hidden]
> > > http://www.open-mpi.org/mailman/listinfo.cgi/devel
> >
> > --
> > * Dr. Aurélien Bouteiller
> > * Researcher at Innovative Computing Laboratory
> > * University of Tennessee
> > * 1122 Volunteer Boulevard, suite 350
> > * Knoxville, TN 37996
> > * 865 974 6321
>