Hi all,

Can anyone explain the fault-tolerance features in OpenMPI to me? I've read the FAQs and some of the papers on this topic listed on open-mpi.org, but I still can't figure out what happens when one node of our supercomputer system fails during a computation: how does the fault-tolerance mechanism in OpenMPI respond, and what should we, as system administrators, do after the failure (or before it)?

Can anyone help? My boss wants me to deploy OpenMPI on our system because he wants the fault-tolerance feature.

Thanks very much.

---------------
CHEN Song
R&D Department
National Supercomputer Center in Tianjin
Binhai New Area, Tianjin, China