Dear All:
I run a parallel job on 6 nodes of an OpenMPI cluster.
But I got error:
rank 0 in job 82 system.cluster_37948 caused collective abort of all ranks
exit status of rank 0: killed by signal 9
It seems that there is segmentation fault on node 0.
But, if the program is run for a short time, no problem.
Any help is appreciated.
thanks,
Jack
July 22 2010
The New Busy is not the old busy. Search, chat and e-mail from your inbox. Get started.