I'm facing a problem in orte/oob/tcp/, more specifically in the file oob_tcp_msg.c. Network interruptions cause my program (a basic hello world) to hang instead of crashing.
I reproduced the problem with gdb by simulating a read error (jumping from line 357 to line 367 in oob_tcp_msg.c). Open MPI then closes the socket, performs the shutdown, and then hangs.
It seems that there is an exception callback function (mca_oob_tcp.oob_exception_callback) "planned" but not implemented yet.
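To make clear what I would expect instead of a hang, here is a minimal sketch in plain POSIX C. It is not Open MPI code: recv_or_report, peer_exception_cb_t and record_error are hypothetical names I made up for illustration, and mapping an end-of-file to ECONNRESET is my own assumption. The idea is simply that a fatal read error closes the socket and then notifies the upper layer through a callback, so the caller can abort rather than wait forever.

```c
#include <errno.h>
#include <stddef.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical callback type (NOT the Open MPI signature): receives the
 * descriptor that failed, an errno-style code, and a user context. */
typedef void (*peer_exception_cb_t)(int sd, int err, void *ctx);

/* Example callback: stores the error code in the int that ctx points to. */
static void record_error(int sd, int err, void *ctx)
{
    (void)sd;
    *(int *)ctx = err;
}

/* Read until `len` bytes are in `buf`, tracking progress in `*done`.
 * Returns  1 when the message is complete,
 *          0 when a non-blocking socket has no data yet (retry later),
 *         -1 when the connection is lost; the socket is then shut down
 *            and closed, and `cb` (if not NULL) is invoked. */
static int recv_or_report(int sd, void *buf, size_t len, size_t *done,
                          peer_exception_cb_t cb, void *ctx)
{
    while (*done < len) {
        ssize_t rc = read(sd, (char *)buf + *done, len - *done);
        if (rc > 0) {
            *done += (size_t)rc;
            continue;
        }
        if (rc < 0 && errno == EINTR) {
            continue;               /* interrupted by a signal: retry */
        }
        if (rc < 0 && (errno == EAGAIN || errno == EWOULDBLOCK)) {
            return 0;               /* nothing to read right now */
        }
        /* rc == 0 means the peer closed the connection; reporting it as
         * ECONNRESET is an assumption made for this sketch. */
        int err = (rc == 0) ? ECONNRESET : errno;
        shutdown(sd, SHUT_RDWR);
        close(sd);
        if (cb != NULL) {
            cb(sd, err, ctx);       /* tell the upper layer instead of hanging */
        }
        return -1;
    }
    return 1;
}
```

With something like this, the caller sees -1 and the callback fires, so the process can terminate or reconnect instead of waiting on a socket that no longer exists.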
Any idea how to solve this problem? Or is this the expected behavior when the connection is lost? Did I miss anything?
Thanks in advance,