Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] Open MPI and CRIU stdout/stderr
From: Jeff Squyres (jsquyres) (jsquyres_at_[hidden])
Date: 2014-03-19 09:25:13


On Mar 19, 2014, at 9:13 AM, Adrian Reber <adrian_at_[hidden]> wrote:

> What does Open MPI do with the file descriptors for stdout/stderr?

We admittedly do funny things with stdin, stdout, and stderr... The short version is that OMPI intercepts all the stdin, stdout, and stderr from each MPI process and relays it back up to mpirun through our IOF subsystem (IOF = I/O forwarding).

Consider: users launch N processes (potentially on multiple different servers) via

   mpirun --hostfile hosts -np N my_mpi_executable

They also expect to be able to use standard shell redirection via the mpirun command. For example:

   mpirun --hostfile hosts -np N my_mpi_executable |& tee out.txt

To explain what happens, we have to explain a little of how OMPI launches processes. Let's take the ssh case, for simplicity (there are other mechanisms it can use to launch on remote servers, but for the purposes of this discussion, they're basically variants of what happens with ssh).

1. mpirun parses the hosts hostfile and extracts the list of servers on which to launch.
2. mpirun fork/execs one ssh command per remote node; each ssh launches the Open MPI helper daemon "orted" on its node.
3. The orted launches on the remote server, does some housekeeping, and eventually receives the launch command from mpirun
4. The launch command contains the executable and argv to fork/exec, and how many of them.
5. For example: mpirun --hostfile hosts -np 4 my_mpi_executable. If the "hosts" file contains serverA and serverB, then mpirun would launch 2 ssh's -- one each to serverA and serverB. After some startup negotiation, mpirun would send a launch command telling the orted on each of serverA and serverB to launch 2 copies of my_mpi_executable.
6. For each child that the orted will create, it:
   - creates (up to) 3 pipes, for: stdin, stdout, stderr
   - forks
   - closes stdin, stdout, stderr
   - dups the pipes into 0, 1, 2
   - (by default, we actually close stdin on all processes except the first one)
   - execs my_mpi_executable
7. In this way, the orted can intercept the stdout/stderr from the process and send it back to mpirun, which can then write it on its own stdout/stderr. And therefore shell redirection from mpirun works as expected.
8. Similarly, the stdin from mpirun can be sent to any process where we kept stdin open (as mentioned above, by default, this is only the first process).
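
To make steps 6 through 8 concrete, here is a minimal sketch of the pipe/fork/dup/exec pattern. It is written in Python for brevity and is not OMPI's actual code (the orted is written in C and does considerably more); spawn_with_pipes and relay are hypothetical names made up for this illustration, and the sketch assumes a POSIX system.

```python
import os
import select


def spawn_with_pipes(argv, keep_stdin=False):
    """Fork/exec argv with stdout/stderr (and optionally stdin) routed through pipes.

    Hypothetical helper, not an OMPI function.
    Returns (pid, stdin_w, stdout_r, stderr_r); stdin_w is None when the
    child's stdin was closed rather than piped.
    """
    out_r, out_w = os.pipe()
    err_r, err_w = os.pipe()
    in_r, in_w = os.pipe() if keep_stdin else (None, None)

    pid = os.fork()
    if pid == 0:
        # Child: put the pipe ends on fds 0, 1, 2, then exec the program.
        try:
            if keep_stdin:
                os.dup2(in_r, 0)
            else:
                try:
                    os.close(0)  # by default only the first process keeps stdin
                except OSError:
                    pass  # fd 0 was already closed
            os.dup2(out_w, 1)
            os.dup2(err_w, 2)
            for fd in (in_r, in_w, out_r, out_w, err_r, err_w):
                if fd is not None and fd > 2:
                    os.close(fd)
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if something above failed

    # Parent (the "orted" role): keep only its own ends of the pipes.
    os.close(out_w)
    os.close(err_w)
    if keep_stdin:
        os.close(in_r)
    return pid, in_w, out_r, err_r


def relay(pid, out_r, err_r):
    """Drain the child's pipes until EOF; return (stdout, stderr, exit_code)."""
    chunks = {out_r: [], err_r: []}
    open_fds = {out_r, err_r}
    while open_fds:
        readable, _, _ = select.select(list(open_fds), [], [])
        for fd in readable:
            data = os.read(fd, 4096)
            if data:
                chunks[fd].append(data)  # a real proxy would forward this upstream
            else:
                os.close(fd)  # EOF: the child closed its end
                open_fds.discard(fd)
    _, status = os.waitpid(pid, 0)
    code = os.WEXITSTATUS(status) if os.WIFEXITED(status) else -1
    return b"".join(chunks[out_r]), b"".join(chunks[err_r]), code
```

For example, relay(*spawn_with_pipes(["sh", "-c", "echo hi"])[::3], ...) is not needed; simply unpack the tuple: pid, in_w, out_r, err_r = spawn_with_pipes(["sh", "-c", "echo hi"]) followed by relay(pid, out_r, err_r). Where this sketch collects the bytes in memory, the orted would instead send them back to mpirun, which writes them to its own stdout/stderr.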

In short: the orted acts as a proxy for the stdout and stderr (and potentially stdin) for all launched processes.

> Would it make sense to close stdout/stderr of each checkpointed process
> before checkpointing it?

Maybe...?

But my gut reaction is that you don't want to because of the "continue" case. I.e., having the orted go through all the IOF setup again could be a bit tricky... We didn't need to do this for other checkpointing systems.

-- 
Jeff Squyres
jsquyres_at_[hidden]
For corporate legal information go to: http://www.cisco.com/web/about/doing_business/legal/cri/