I see - then the problem is that at least one node is unable to communicate via TCP back to the node where mpirun is executing. It might be a firewall, or we could be selecting the wrong network if the nodes have multiple NICs. I assume that you use additional nodes when running against the larger dataset?
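One rough way to check the TCP path from a compute node back to the mpirun node is a small reachability probe. The following is only a minimal sketch in Python: `can_connect`, the host name, and the port are placeholders I made up for illustration and are not part of Open MPI. mpirun chooses its listening port dynamically, so this only tests general reachability against a port where you have started a listener yourself on the mpirun node.

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout.

    Hypothetical helper for illustration; not part of Open MPI.
    """
    try:
        # create_connection resolves the name and tries each returned address
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # covers connection refused, unreachable network, timeouts, DNS failure
        return False


if __name__ == "__main__":
    # Usage (run on a compute node): python probe.py <mpirun-host> <port>
    if len(sys.argv) != 3:
        sys.exit("usage: probe.py <host> <port>")
    host, port = sys.argv[1], int(sys.argv[2])
    ok = can_connect(host, port)
    print(f"{host}:{port} reachable" if ok else f"{host}:{port} NOT reachable")
    sys.exit(0 if ok else 1)
```

If the probe fails only from some nodes, or only when the host name resolves to an address on a particular interface, that points toward a firewall or a wrong-network selection. In the latter case, restricting the interfaces with the `oob_tcp_if_include` and `btl_tcp_if_include` MCA parameters may help, assuming your Open MPI version supports them.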
On Jan 22, 2013, at 9:34 AM, Jure Pečar <pegasus_at_[hidden]> wrote:
> On Thu, 17 Jan 2013 11:54:13 -0800
> Ralph Castain <rhc_at_[hidden]> wrote:
>> Or is this happening on startup of the larger job, or during a call to MPI_Comm_spawn?
> This happens on startup. Mpirun spawns the processes, and when they start talking to each other during the setup phase, I get this kind of error. Running time in such a case is less than a minute.
> Jure Pečar