Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] Tmpdir work for first process only
From: Jeff Squyres (jsquyres_at_[hidden])
Date: 2007-11-15 09:53:28


Are you running all of these processes on the same machine, or
multiple different machines?

If you're running 400 processes on the same machine, it may well be
that you are simply running out of memory or other OS resources. In
particular, I've never seen iof fail that way before (iof is our I/O
forwarding subsystem).

Looking at the iof code, the error you're seeing occurs when iof
tries to create a pipe between our OMPI "helper daemon" and the newly
spawned user executable, and that pipe creation fails. The only reason
I can guess for that is that the machine has already hit its maximum
number of pipes and the OS refuses to create any more...?
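
As a rough illustration of that guess (this is not Open MPI code): on
POSIX systems, pipe() fails with EMFILE once a process reaches its
open-file limit (RLIMIT_NOFILE), and each pipe consumes two file
descriptors. The Python sketch below, assuming a Linux-like system,
temporarily lowers the soft limit and counts how many pipes can be
created before the OS refuses. The function name and the limit of 64
are made up for the example.

```python
import errno
import os
import resource


def pipes_until_failure(soft_limit=64):
    """Create pipes until the OS refuses; return (pipe_count, errno).

    Hypothetical helper for illustration only. It lowers the soft
    RLIMIT_NOFILE so the failure shows up quickly, then restores it.
    """
    old_soft, hard = resource.getrlimit(resource.RLIMIT_NOFILE)
    # The soft limit may not exceed the hard limit (unless it is unlimited).
    if hard != resource.RLIM_INFINITY:
        soft_limit = min(soft_limit, hard)
    resource.setrlimit(resource.RLIMIT_NOFILE, (soft_limit, hard))

    fds = []
    err = None
    try:
        while True:
            try:
                # Each pipe uses two descriptors: a read end and a write end.
                fds.extend(os.pipe())
            except OSError as exc:
                err = exc.errno
                break
    finally:
        # Release the descriptors first, then put the old limit back.
        for fd in fds:
            os.close(fd)
        resource.setrlimit(resource.RLIMIT_NOFILE, (old_soft, hard))
    return len(fds) // 2, err


if __name__ == "__main__":
    count, err = pipes_until_failure()
    print(f"created {count} pipes before failing with {errno.errorcode.get(err)}")
```

Comparing the output of "ulimit -n" on the node with however many
pipes the daemon needs for 400 local processes would be one way to
check whether this is the limit being hit.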

On Nov 14, 2007, at 9:36 PM, Clement Kam Man Chu wrote:

> Hi,
>
> I have figured out why the tmpdir parameter works for the first
> process only. I have another problem when I try to run 400 processes
> (there is no problem with fewer than 400). I get the error
> "ORTE_ERROR_LOG: Out of resource in file base/iof_base_setup.c at
> line 106". The output is attached below:
>
> [ac27:12442] [0,0,0] setting up session dir with
> [ac27:12442] tmpdir /jobfs/z07/247752.ac-pbs
> [ac27:12442] universe default-universe-12442
> [ac27:12442] user kxc565
> [ac27:12442] host ac27
> [ac27:12442] jobid 0
> [ac27:12442] procid 0
> [ac27:12442] procdir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442/0/0
> [ac27:12442] jobdir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442/0
> [ac27:12442] unidir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442
> [ac27:12442] top: openmpi-sessions-kxc565_at_ac27_0
> [ac27:12442] tmp: ??
> [ac27:12442] [0,0,0] contact_file /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442/universe-setup.txt
> [ac27:12442] [0,0,0] wrote setup file
> [ac27:12447] [0,0,1] setting up session dir with
> [ac27:12447] universe default-universe-12442
> [ac27:12447] user kxc565
> [ac27:12447] host ac27
> [ac27:12447] jobid 0
> [ac27:12447] procid 1
> [ac27:12447] procdir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442/0/1
> [ac27:12447] jobdir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442/0
> [ac27:12447] unidir: /jobfs/z07/247752.ac-pbs/openmpi-sessions-kxc565_at_ac27_0/default-universe-12442
> [ac27:12447] top: openmpi-sessions-kxc565_at_ac27_0
> [ac27:12447] tmp: /jobfs/z07/247752.ac-pbs
> [ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file base/iof_base_setup.c at line 106
> [ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file odls_default_module.c at line 663
> [ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file odls_default_module.c at line 1191
> [ac27:12447] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c at line 594
> [ac27:12442] spawn: in job_state_callback(jobid = 1, state = 0x80)
> mpirun noticed that job rank 0 with PID 0 on node ac27 exited on signal 15 (Terminated).
> [ac27:12447] sess_dir_finalize: job session dir not empty - leaving
> [ac27:12447] sess_dir_finalize: proc session dir not empty - leaving
> [ac27:12442] sess_dir_finalize: proc session dir not empty - leaving
>
>
> Thanks,
> Clement
>
> Clement Kam Man Chu wrote:
>> Hi,
>>
>> I am using Open MPI 1.2.3 on an ia64 machine. I ran "mpirun -d
>> --tmpdir /home/565/kxc565/tmpdir -mca btl sm -np 400 ./testprogram".
>> I found that only the first process uses my setting for storing its
>> tmp files, while the second process uses the default and stores its
>> tmp files in the /tmp directory. How can I make all processes store
>> their files in the directory I specify? I have attached the output
>> from Open MPI for more details.
>> Thanks for any help.
>>
>> Cheers,
>> Clement
>>
>>
>> [ac27:27928] [0,0,0] setting up session dir with
>> [ac27:27928] tmpdir /home/565/kxc565/tmpdir
>> [ac27:27928] universe default-universe-27928
>> [ac27:27928] user kxc565
>> [ac27:27928] host ac27
>> [ac27:27928] jobid 0
>> [ac27:27928] procid 0
>> [ac27:27928] procdir: /home/565/kxc565/tmpdir/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928/0/0
>> [ac27:27928] jobdir: /home/565/kxc565/tmpdir/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928/0
>> [ac27:27928] unidir: /home/565/kxc565/tmpdir/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928
>> [ac27:27928] top: openmpi-sessions-kxc565_at_ac27_0
>> [ac27:27928] tmp: ?
>> [ac27:27928] [0,0,0] contact_file /home/565/kxc565/tmpdir/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928/universe-setup.txt
>> [ac27:27928] [0,0,0] wrote setup file
>> [ac27:27932] [0,0,1] setting up session dir with
>> [ac27:27932] universe default-universe-27928
>> [ac27:27932] user kxc565
>> [ac27:27932] host ac27
>> [ac27:27932] jobid 0
>> [ac27:27932] procid 1
>> [ac27:27932] procdir: /tmp/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928/0/1
>> [ac27:27932] jobdir: /tmp/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928/0
>> [ac27:27932] unidir: /tmp/openmpi-sessions-kxc565_at_ac27_0/default-universe-27928
>> [ac27:27932] top: openmpi-sessions-kxc565_at_ac27_0
>> [ac27:27932] tmp: /tmp
>> [ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file base/iof_base_setup.c at line 106
>> [ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file odls_default_module.c at line 663
>> [ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file odls_default_module.c at line 1191
>> [ac27:27932] [0,0,1] ORTE_ERROR_LOG: Out of resource in file orted.c at line 594
>> [ac27:27928] spawn: in job_state_callback(jobid = 1, state = 0x80)
>> mpirun noticed that job rank 0 with PID 0 on node ac27 exited on signal 15 (Terminated).
>> [ac27:27932] sess_dir_finalize: job session dir not empty - leaving
>> [ac27:27932] sess_dir_finalize: proc session dir not empty - leaving
>> [ac27:27928] sess_dir_finalize: proc session dir not empty - leaving
>>
>>
>
>
> --
> Clement Kam Man Chu
> Research Assistant
> Faculty of Information Technology
> Monash University, Caulfield Campus
> Ph: 61 3 9903 2355

-- 
Jeff Squyres
Cisco Systems