Open MPI User's Mailing List Archives


Subject: Re: [OMPI users] system() call corrupts MPI processes
From: Randolph Pullen (randolph_pullen_at_[hidden])
Date: 2012-01-19 19:51:14


I assume that SIGCHLD was released after starting the daemon, i.e., on return of the system() call.

________________________________
From: Durga Choudhury <dpchoudh_at_[hidden]>
To: Open MPI Users <users_at_[hidden]>
Sent: Friday, 20 January 2012 2:22 AM
Subject: Re: [OMPI users] system() call corrupts MPI processes

This is just a thought: according to the system() man page, SIGCHLD is
blocked during the execution of the command. Since you are executing your
command as a daemon in the background, it will be permanently blocked.
Does the Open MPI daemon depend on SIGCHLD in any way?

That is about the only difference I can think of between running the
command stand-alone (which works) and running it via a system() call
(which does not work).

Best
Durga

On Thu, Jan 19, 2012 at 9:52 AM, Jeff Squyres <jsquyres_at_[hidden]> wrote:
> Which network transport are you using, and what version of Open MPI are
> you using?  Do you have OpenFabrics support compiled into your Open MPI
> installation?
>
> If you're just using TCP and/or shared memory, I can't think of a reason
> immediately as to why this wouldn't work, but there may be a subtle
> interaction in there somewhere that causes badness (e.g., memory
> corruption).
>
> On Jan 19, 2012, at 1:57 AM, Randolph Pullen wrote:
>
>> I have a section in my code running in rank 0 that must start a perl
>> program that it then connects to via a TCP socket.
>> The initialisation section is shown here:
>>
>>     sprintf(buf, "%s/session_server.pl -p %d &", PATH, port);
>>     int i = system(buf);
>>     printf("system returned %d\n", i);
>>
>> Some time after I run this code, while waiting for the data from the
>> perl program, the error below occurs:
>>
>> qplan connection
>> DCsession_fetch: waiting for Mcode data...
>> [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file rml_oob_send.c at line 105
>> [dc1:05387] [[40050,1],0] could not get route to [[INVALID],INVALID]
>> [dc1:05387] [[40050,1],0] ORTE_ERROR_LOG: A message is attempting to be sent to a process whose contact information is unknown in file base/plm_base_proxy.c at line 86
>>
>> It seems that the Linux system() call is breaking Open MPI's internal
>> connections.  Removing the system() call and executing the perl code
>> externally fixes the problem, but I can't go into production like that
>> as it's a security issue.
>>
>> Any ideas?
>>
>> (environment: Open MPI 1.4.1 on kernel Linux dc1
>> 2.6.18-274.3.1.el5.028stab094.3 using TCP and mpirun)
>
> --
> Jeff Squyres
> jsquyres_at_[hidden]
> For corporate legal information go to:
> http://www.cisco.com/web/about/doing_business/legal/cri/
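
For illustration, here is a minimal sketch (not from the thread, and
assuming a plain POSIX environment) of starting the perl helper with
fork()/exec() instead of system("... &"): the shell is bypassed entirely,
the child is detached into its own session, and the parent's signal mask
is never altered the way system() temporarily blocks SIGCHLD. The
spawn_session_server() name is hypothetical; the path and port arguments
follow the snippet quoted above.

    /* Hypothetical sketch: launch session_server.pl as a detached child
     * via fork()/exec() rather than system("... &"), so no shell is
     * involved and the caller's SIGCHLD handling is never blocked.
     * Error handling is abbreviated. */
    #include <signal.h>
    #include <stdio.h>
    #include <sys/types.h>
    #include <unistd.h>

    static pid_t spawn_session_server(const char *path, int port)
    {
        char prog[1024], portarg[16];
        snprintf(prog, sizeof(prog), "%s/session_server.pl", path);
        snprintf(portarg, sizeof(portarg), "%d", port);

        pid_t pid = fork();
        if (pid < 0)
            return -1;                /* fork() failed */
        if (pid == 0) {               /* child */
            setsid();                 /* detach from the parent's session */
            signal(SIGCHLD, SIG_DFL); /* restore default signal handling */
            execl(prog, prog, "-p", portarg, (char *)NULL);
            _exit(127);               /* exec() failed */
        }
        return pid;                   /* parent: child now runs in background */
    }

Whether system()'s temporary SIGCHLD blocking is really what disturbs the
ORTE connections here is unconfirmed, and note that this sketch still
calls fork(), so it does not sidestep the well-known caveat about
fork()-after-MPI_Init on OpenFabrics transports (which is why Jeff's
transport question matters). Over TCP it at least keeps the shell, and
any signal-mask changes, out of the MPI process.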