Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] TotalView Memory debugging and OpenMPI
From: Ralph Castain (rhc.openmpi_at_[hidden])
Date: 2011-05-11 17:34:06


That would be a problem, I fear. We need to push those envars into the environment.

Is there some particular problem causing what you see? We have no other reports of this issue, and orterun has had that code forever.

On May 11, 2011, at 2:05 PM, Peter Thompson <peter.thompson_at_[hidden]> wrote:

> We've gotten a few reports of problems with memory debugging when using OpenMPI under TotalView. Usually, TotalView will attach to the processes started after an MPI_Init. However, in the case where memory debugging is enabled, things seemed to run away or fail. My analysis showed that we had a number of core files left over from the attempt, and all were mpirun (or orterun) cores. It seemed to be a regression on our part, since testing seemed to indicate this worked okay before TotalView 8.9.0-0, so I filed an internal bug and passed it to engineering. After giving our engineer a brief tutorial on how to build a debug version of OpenMPI, he found what appears to be a problem in the code for orterun.c. He's made a slight change that fixes the issue in 1.4.2, 1.4.3, 1.4.4rc2 and 1.5.3, those being the versions he's tested with so far. He doesn't subscribe to this list that I know of, so I offered to pass this by the group. Of course, I'm not sure if this is exactly the right place to submit patches, but I'm sure you'd tell me where to put it if I'm in the wrong here. It's a short patch, so I'll cut and paste it, and attach as well, since cut and paste can do weird things to formatting.
>
> Credit goes to Ariel Burton for this patch. Of course he used TotalView to find this ;-) It shows up if you do 'mpirun -tv -np 4 ./foo' or 'totalview mpirun -a -np 4 ./foo'.
>
> Cheers,
> PeterT
>
>
> more ~/patches/anbs-patch
> *** orte/tools/orterun/orterun.c 2010-04-13 13:30:34.000000000 -0400
> --- /home/anb/packages/openmpi-1.4.2/linux-x8664-iwashi/installation/bin/../../../src/openmpi-1.4.2/orte/tools/orterun/orterun.c 2011-05-09 20:28:16.588183000 -0400
> ***************
> *** 1578,1588 ****
>       }
>
>       if (NULL != env) {
>           size1 = opal_argv_count(env);
>           for (j = 0; j < size1; ++j) {
> !             putenv(env[j]);
>           }
>       }
>
>       /* All done */
>
> --- 1578,1600 ----
>       }
>
>       if (NULL != env) {
>           size1 = opal_argv_count(env);
>           for (j = 0; j < size1; ++j) {
> !             /* Use-after-Free error possible here. putenv does not copy
> !                the string passed to it, and instead stores only the pointer.
> !                env[j] may be freed later, in which case the pointer
> !                in environ will now be left dangling into a deallocated
> !                region.
> !                So we make a copy of the variable.
> !             */
> !             char *s = strdup(env[j]);
> !
> !             if (NULL == s) {
> !                 return OPAL_ERR_OUT_OF_RESOURCE;
> !             }
> !             putenv(s);
>           }
>       }
>
>       /* All done */
>
>
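The patch comment's claim that putenv stores only the pointer can be checked outside Open MPI with a small standalone C sketch. Everything named here is hypothetical and for illustration only: push_env_copy, putenv_aliases_caller_buffer, copy_survives_free, and the DEMO_* variable names do not come from orterun.c. The sketch assumes a POSIX system where putenv behaves as the specification describes.

```c
#define _XOPEN_SOURCE 700  /* expose putenv() and strdup() under strict -std= modes */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper (not Open MPI code): publish a "NAME=value" entry by
 * handing putenv() a private heap copy, the same idea as the patch.
 * The copy is deliberately never freed on success, because the environment
 * keeps pointing at it. Returns 0 on success, -1 on failure. */
int push_env_copy(const char *entry)
{
    assert(NULL != entry);
    char *copy = strdup(entry);
    if (NULL == copy) {
        return -1;
    }
    if (0 != putenv(copy)) {
        free(copy);
        return -1;
    }
    return 0;
}

/* Returns 1 if an in-place edit of the caller's buffer is visible through
 * getenv(), i.e. putenv() kept the pointer instead of copying the string.
 * Returns 0 if the edit is not visible, -1 on failure. */
int putenv_aliases_caller_buffer(void)
{
    char *raw = strdup("DEMO_RAW=before");
    if (NULL == raw || 0 != putenv(raw)) {
        free(raw);
        return -1;
    }
    /* Overwrite the value in place; "after!" has the same length as "before". */
    memcpy(raw + strlen("DEMO_RAW="), "after!", 6);
    const char *seen = getenv("DEMO_RAW");
    /* raw is intentionally not freed: the environment still refers to it. */
    return (NULL != seen && 0 == strcmp(seen, "after!")) ? 1 : 0;
}

/* Returns 1 if the environment entry is still intact after the caller's
 * original string has been scribbled over and freed, 0 if not, -1 on failure. */
int copy_survives_free(void)
{
    char *orig = strdup("DEMO_COPY=stable");
    if (NULL == orig) {
        return -1;
    }
    if (0 != push_env_copy(orig)) {
        free(orig);
        return -1;
    }
    memset(orig, 'X', strlen(orig));  /* simulate reuse of the caller's buffer */
    free(orig);
    const char *seen = getenv("DEMO_COPY");
    return (NULL != seen && 0 == strcmp(seen, "stable")) ? 1 : 0;
}
```

Calling putenv_aliases_caller_buffer() and copy_survives_free() from a small main() shows the two behaviours side by side. The private copy is leaked on purpose; that is the price of keeping the environ entry valid for the life of the process.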