From: Laurent Nguyen (laurent.nguyen_at_[hidden])
Date: 2007-05-10 12:35:36


Hi Tim,

OK, thank you for all these clarifications. I also added a definition of
"static int pls_poe_cancel_operation(void)", as you did, and the
compilation could continue. But then I hit another problem. In
ompi/mpi/cxx/mpicxx.cc, three names are already defined: the preprocessor
replaces them with the C constants of the same name. So I commented out
these lines:
   //const int SEEK_SET = MPI_SEEK_SET;
   //const int SEEK_CUR = MPI_SEEK_CUR;
   //const int SEEK_END = MPI_SEEK_END;
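
To illustrate the clash, here is a minimal C++ sketch. It assumes that the
C header defines SEEK_SET, SEEK_CUR and SEEK_END as macros (which the C
standard requires), and it shows one possible alternative to commenting the
lines out: save the C values, remove the macros, then declare the
constants. The namespace "demo", the DEMO_MPI_* placeholders and the helper
function are hypothetical names for illustration only; this is not how the
Open MPI sources handle it.

```cpp
#include <cassert>
#include <cstdio>   // defines SEEK_SET, SEEK_CUR and SEEK_END as macros

// Save the C library values before the macros are removed.
static const int c_seek_set = SEEK_SET;
static const int c_seek_cur = SEEK_CUR;
static const int c_seek_end = SEEK_END;

// While the macros are active, a line such as
//     const int SEEK_SET = MPI_SEEK_SET;
// has the name SEEK_SET replaced by the macro's value before compilation,
// so it is no longer a valid declaration.
#undef SEEK_SET
#undef SEEK_CUR
#undef SEEK_END

// Hypothetical stand-ins for MPI_SEEK_SET, MPI_SEEK_CUR and MPI_SEEK_END.
// The numeric values are placeholders, not the real ones.
enum { DEMO_MPI_SEEK_SET = 600, DEMO_MPI_SEEK_CUR = 602, DEMO_MPI_SEEK_END = 604 };

namespace demo {
    // These declarations compile now that the macros are gone.
    const int SEEK_SET = DEMO_MPI_SEEK_SET;
    const int SEEK_CUR = DEMO_MPI_SEEK_CUR;
    const int SEEK_END = DEMO_MPI_SEEK_END;
}

// The C values stay usable through the saved copies.
int rewind_with_saved_value(std::FILE* f) {
    assert(f != nullptr);
    return std::fseek(f, 0L, c_seek_set);
}
```

The drawback of this approach is that any code compiled after the #undef
lines can no longer use the plain C names, which is why the saved copies
are kept.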

After that, I managed to compile Open MPI. I didn't try to launch it in
rsh mode, but I did try to launch it with POE.

But first, let me recall my experience with Open MPI 1.1.x on IBM. My
machine has some restrictions, but I have two ways of launching an
application:
- interactive mode: Open MPI didn't work in this mode. I got this error:
    $ export MP_PROCS=2
    $ mpiexec -n 2 myprog.exe
  ERROR: 0031-125 Fewer nodes (1) specified in /tmpdir/inter/int.ssos181-130093928631562/a-UWUb than tasks (2).

  I think this is because of my machine's configuration.

- batch mode (for queuing): Open MPI worked, but some functions didn't
work (like MPI_Comm_spawn). Communication performance also seems to be
very bad (although within a node it matches the performance of the
vendor's MPI).

So I was hoping Open MPI 1.2.x would work on SP4, but I have the same
problem in interactive mode. And in batch mode, I get this error:
[0,0,0] ORTE_ERROR_LOG: Not implemented in file errmgr_hnp.c at line 90
--------------------------------------------------------------------------
mpiexec was unable to cleanly terminate the daemons for this job.
Returned value Not implemented instead of ORTE_SUCCESS.

--------------------------------------------------------------------------

I think it is as you said before: POE is not yet implemented.
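
As an illustration only, here is a small C++ sketch of how a stub that
returns a "not implemented" code can end up in a message like the one
above. I cannot tell from the log which function actually returned the
error, so everything here is an assumption: the status codes, the module
table and all the demo_* names are hypothetical and are not taken from the
ORTE sources.

```cpp
#include <cassert>
#include <string>

// Hypothetical status codes standing in for ORTE_SUCCESS and
// ORTE_ERR_NOT_IMPLEMENTED; the numeric values are placeholders.
enum demo_status { DEMO_SUCCESS = 0, DEMO_ERR_NOT_IMPLEMENTED = -1 };

// Hypothetical launcher module: a table of function pointers.
struct demo_launcher_module {
    int (*launch_job)();
    int (*terminate_daemons)();
};

static int demo_launch_job() { return DEMO_SUCCESS; }

// A stub in the style of the patched pls_poe_cancel_operation():
// it lets the build succeed but reports that nothing is implemented.
static int demo_terminate_daemons() { return DEMO_ERR_NOT_IMPLEMENTED; }

static const demo_launcher_module demo_module = {
    demo_launch_job,
    demo_terminate_daemons
};

// Translate a status code into readable text.
static const char* demo_strerror(int rc) {
    switch (rc) {
        case DEMO_SUCCESS:             return "Success";
        case DEMO_ERR_NOT_IMPLEMENTED: return "Not implemented";
        default:                       return "Unknown error";
    }
}

// The caller checks the return code and reports anything but success.
std::string demo_shutdown_report(const demo_launcher_module& m) {
    assert(m.terminate_daemons != nullptr);
    const int rc = m.terminate_daemons();
    if (rc == DEMO_SUCCESS) {
        return "daemons terminated cleanly";
    }
    return std::string("Returned value ") + demo_strerror(rc) +
           " instead of DEMO_SUCCESS.";
}
```

In this sketch, the job can be launched, but the shutdown step fails
because one entry of the table is only a stub. That would be consistent
with a launcher whose functions compile but are not all implemented.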

I was interested in Open MPI because it supports MPI-2. Since Open MPI
1.1.1, I have installed every version on my SP4 for testing. My
impressions are:
- it seems to be very difficult for the developers to support Open MPI on
SP4, and I hope they manage it one day ;)
- in my context, my institution puts many restrictions on the use of our
machine, which is why my tests are incomplete. (Likewise, the rsh
command is forbidden between our nodes...)

So, thank you very much for your explanations and clarifications.

Best Regards,

**************************************
NGUYEN Anh-Khai Laurent
User Support Team

Email : laurent.nguyen_at_[hidden]
Tel : 01.69.35.85.66
Address : IDRIS - Institut du Développement et des Ressources en
               Informatique Scientifique
               CNRS
               Batiment 506
               BP 167
               F - 91403 ORSAY Cedex
Website : http://www.idris.fr
**************************************

Tim Prins wrote:
> Hi Laurent,
>
> Unfortunately, as far as I know, none of the current Open MPI developers has
> access to a system with POE, so the POE process launcher has fallen into
> disrepair. Attached is a patch that should allow you to compile (however, you
> may also need to add #include <signal.h> to pls_poe_module.c).
>
> Though this should allow the compile to succeed, launching with POE may not
> work (it has not been tested for quite a while). If it doesn't work, you
> should use the rsh launcher instead (pass -mca pls rsh on the command line,
> or set the parameter using one of the methods here:
> http://www.open-mpi.org/faq/?category=tuning#setting-mca-params).
>
> Sorry about this. We have an IBM machine at my institution which I am told
> will have POE on it 'soon', but I am not sure when. Once it does, we will be
> working on getting POE well supported again.
>
> I should mention that we do use LoadLeveler on one of our machines and Open
> MPI seems to work with it quite well. I would be interested in hearing how it
> works for you.
>
> Hope this helps, let me know if this works.
>
> Thanks,
>
> Tim
>
> On Thursday 10 May 2007 02:57 am, Laurent Nguyen wrote:
>> Hello,
>>
>> I tried to install Open MPI 1.2, but I ran into some problems when
>> compiling the POE files. When Open MPI 1.2.1 was released, I saw in the
>> bug fixes that this problem was fixed. So I tried again, but it still
>> doesn't work. The problem comes from orte/mca/pls/poe/pls_poe_module.c.
>> A static function "static int pls_poe_cancel_operation(void);" is
>> declared but not defined in the file. I don't know whether my
>> configuration is causing this.
>>
>> So, if someone has managed to install Open MPI 1.2.1 on IBM, I would
>> appreciate some advice.
>>
>> Thank you for your help,
>>
>> PS: I attached some output files of my installation
>>
>> ------------------------------------------------------------------------
>>
>> Index: orte/mca/pls/poe/pls_poe_module.c
>> ===================================================================
>> --- orte/mca/pls/poe/pls_poe_module.c (revision 14640)
>> +++ orte/mca/pls/poe/pls_poe_module.c (working copy)
>> @@ -37,6 +37,7 @@
>> #include "opal/mca/base/mca_base_param.h"
>> #include "opal/util/argv.h"
>> #include "opal/util/opal_environ.h"
>> +#include "opal/util/output.h"
>>
>> #include "orte/mca/errmgr/errmgr.h"
>> #include "orte/mca/gpr/gpr.h"
>> @@ -69,7 +70,10 @@
>> static int pls_poe_signal_job(orte_jobid_t jobid, int32_t signal, opal_list_t *attrs);
>> static int pls_poe_signal_proc(const orte_process_name_t *name, int32_t signal);
>> static int pls_poe_finalize(void);
>> -static int pls_poe_cancel_operation(void);
>> +static int pls_poe_cancel_operation(void) {
>> + return ORTE_ERR_NOT_IMPLEMENTED;
>> +}
>> +
>>
>> orte_pls_base_module_t orte_pls_poe_module = {
>> pls_poe_launch_job,