Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OMPI looking for PBS file?
From: John R Cary (cary_at_[hidden])
Date: 2010-03-15 20:02:15


On Mar 14, 2010, at 3:20 PM, Josh Bernstein wrote:

> Hi John,
>
> Mpiexec isn't needed with OMPI. In fact, if you are using the one from OSC, it only works with MPICH.

Hi Josh,

I guess I don't understand. I think we do link against TORQUE, but what I
am trying to do is multiple MPI runs within a single job. So I qsub a
script that might contain

  script1.sh
  script2.sh
  ...

Inside script1.sh is some logic culminating in

  mpiexec <app> -i appinputfile1

script2.sh similarly invokes

  mpiexec <app> -i appinputfile2

but those runs fail as shown below.

So I am not sure what is going on.
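A minimal sketch of the job layout described above, written as shell functions so that each run is a single call. `APP`, `NP`, `DRY_RUN`, `run_case` and `run_all` are hypothetical names: `APP` stands in for the elided `<app>`, and the `#PBS -l nodes=1:ppn=6` request only mirrors the `-np 6` of the original command. Following the `--with-tm` advice quoted below, it passes no `-machinefile` and assumes a TM-enabled Open MPI picks up the allocation from TORQUE.

```shell
#!/bin/sh
#PBS -N multirun
#PBS -l nodes=1:ppn=6
# Sketch only: the resource request, APP, NP and DRY_RUN are hypothetical
# placeholders, not values from the original job.

APP=${APP:-./app}   # stand-in for the elided <app>
NP=${NP:-6}

# run_case INPUT: one MPI run on the job's allocation. No -machinefile is
# passed; the assumption is that a TM-enabled Open MPI gets the node list
# from TORQUE itself. With DRY_RUN=1 the command is printed, not executed.
run_case() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo mpiexec -np "$NP" "$APP" -i "$1"
    else
        mpiexec -np "$NP" "$APP" -i "$1"
    fi
}

# run_all INPUT...: run the cases one after another, like the sequential
# script1.sh, script2.sh, ...; this sketch stops at the first failure.
run_all() {
    for input in "$@"; do
        run_case "$input" || return 1
    done
}
```

In the job script body this would be used as `cd "$PBS_O_WORKDIR"` followed by `run_all appinputfile1 appinputfile2`; setting `DRY_RUN=1` first prints the commands without launching anything.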

Thx....John

>
>
> Instead, just build OMPI with --with-tm; it will then link against TORQUE and start up and track jobs properly.
>
> -Joshua Bernstein
> Penguin Computing
>
> On Mar 14, 2010, at 21:35, "John R. Cary" <cary_at_[hidden]> wrote:
>
>> I have a script that launches a bunch of runs on some compute nodes of
>> a cluster. Once I get through the queue, I query PBS for my machine
>> file, then I copy that to a local file 'nodes' which I use for mpiexec:
>>
>> mpiexec -machinefile /home/research/cary/projects/vpall/vptests/nodes -np 6 \
>>   /home/research/cary/projects/vpall/builds/vorpal/par/vorpal/vorpal \
>>   -i bathtubAntenna.in -dim 2 -o bathtubAntenna2p -n 100 -d 100
>>
>> but this fails with
>>
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 153
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/ras/tm/ras_tm_module.c at line 87
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/ras/base/ras_base_allocate.c at line 133
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../orte/mca/plm/base/plm_base_launch_support.c at line 72
>> [node47:07004] [[25769,0],0] ORTE_ERROR_LOG: File open failure in file ../../../../../orte/mca/plm/tm/plm_tm_module.c at line 167
>> --------------------------------------------------------------------------
>> A daemon (pid unknown) died unexpectedly on signal 1 while attempting to
>> launch so we are aborting.
>>
>> The relevant code snippet is
>>
>>     /* setup the full path to the PBS file */
>>     filename = opal_os_path(false, mca_ras_tm_component.nodefile_dir,
>>                             pbs_jobid, NULL);
>>     fp = fopen(filename, "r");
>>     if (NULL == fp) {
>>         ORTE_ERROR_LOG(ORTE_ERR_FILE_OPEN_FAILURE);
>>         free(filename);
>>         return ORTE_ERR_FILE_OPEN_FAILURE;
>>     }
>>
>> which looks like it might be trying to open my PBS node file instead
>> of the file I gave on the command line? I really don't know; does
>> anyone have any ideas?
>>
>> Thx....John Cary
>> _______________________________________________
>> users mailing list
>> users_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/users
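The quoted snippet builds the path it opens by joining `mca_ras_tm_component.nodefile_dir` with the PBS job ID, so at that point it does not read the file given with `-machinefile`. A small shell sketch that prints that path and checks whether it can be opened on the current node is shown here; `tm_nodefile_path` and `check_tm_nodefile` are hypothetical helpers, not part of Open MPI or TORQUE, and the directory must be supplied by the caller because its compiled-in default is not shown in the snippet (`ompi_info --param ras tm` should list it for a given build).

```shell
#!/bin/sh
# Sketch (assumption): approximate the path the TM RAS module opens in the
# quoted snippet, i.e. <nodefile_dir>/<PBS_JOBID>, and report whether that
# file is readable on this node. Not Open MPI code; helper names are made up.

# tm_nodefile_path DIR JOBID: print DIR/JOBID, dropping one trailing slash
# from DIR (roughly what opal_os_path(false, dir, jobid, NULL) joins).
tm_nodefile_path() {
    printf '%s/%s\n' "${1%/}" "$2"
}

# check_tm_nodefile DIR: uses $PBS_JOBID from the environment.
# Returns 0 if the file is readable, 1 if not, 2 if not inside a PBS job.
check_tm_nodefile() {
    if [ -z "${PBS_JOBID:-}" ]; then
        echo "PBS_JOBID is not set: not inside a PBS/TORQUE job" >&2
        return 2
    fi
    path=$(tm_nodefile_path "$1" "$PBS_JOBID")
    if [ -r "$path" ]; then
        echo "readable: $path"
        return 0
    fi
    echo "cannot open: $path" >&2
    return 1
}
```

Calling `check_tm_nodefile <dir>` at the top of script1.sh and comparing the printed path with `$PBS_NODEFILE` should show whether the file the TM module expects exists on the node where mpiexec starts.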

-- 
John R Cary
cary_at_[hidden]