Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] mpiblast + openmpi + gridengine job fails to run
From: Reuti (reuti_at_[hidden])
Date: 2008-12-23 06:15:15


Hi,

On 23.12.2008 at 12:03, Sangamesh B wrote:

> Hello,
>
> I've compiled the MPIBLAST-1.5.0-pio app on a Rocks 4.3, Voltaire
> InfiniBand-based Linux cluster using Open MPI 1.2.8 + Intel 10
> compilers.
>
> The job is not running. Let me explain the configs:
>
> SGE job script:
>
> $ cat sge_submit.sh
> #!/bin/bash
>
> #$ -N OMPI-Blast-Job
>
> #$ -S /bin/bash
>
> #$ -cwd
>
> #$ -e err.$JOB_ID.$JOB_NAME
>
> #$ -o out.$JOB_ID.$JOB_NAME
>
> #$ -pe orte 4
>
> /opt/openmpi_intel/1.2.8/bin/mpirun -np $NSLOTS
> /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
> Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
>
> The PE orte is:
>
> $ qconf -sp orte
> pe_name orte
> slots 999
> user_lists NONE
> xuser_lists NONE
> start_proc_args /bin/true
> stop_proc_args /bin/true
> allocation_rule $fill_up
> control_slaves FALSE
> job_is_first_task TRUE

You will need these settings here instead:

control_slaves TRUE
job_is_first_task FALSE
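
A minimal sketch of one way to apply this change non-interactively, assuming it is run by an SGE manager on an admin host. The helper name `fix_pe` and the temporary file are illustrative and not part of Grid Engine; `qconf -sp` (show PE) and `qconf -Mp` (modify PE from file) are the standard options. The reason for the change: Open MPI's gridengine launcher starts its daemons on the nodes via `qrsh -inherit`, which SGE only permits when control_slaves is TRUE.

```shell
#!/bin/bash
# fix_pe: hypothetical helper that patches a PE definition read from stdin.
# It sets control_slaves to TRUE and job_is_first_task to FALSE and leaves
# every other line untouched.
fix_pe() {
    sed -E \
        -e 's/^(control_slaves[[:space:]]+).*/\1TRUE/' \
        -e 's/^(job_is_first_task[[:space:]]+).*/\1FALSE/'
}

# Only touch the cluster configuration if qconf is actually available.
if command -v qconf >/dev/null 2>&1; then
    tmp=$(mktemp)
    qconf -sp orte | fix_pe > "$tmp"   # dump the current PE and patch it
    qconf -Mp "$tmp"                   # load the modified PE from the file
    rm -f "$tmp"
    # Show the two settings to confirm the change took effect.
    qconf -sp orte | grep -E '^(control_slaves|job_is_first_task)'
fi
```

Alternatively, `qconf -mp orte` opens the same definition in an editor so the two lines can be changed by hand.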

-- Reuti

> urgency_slots min
>
> # /opt/openmpi_intel/1.2.8/bin/ompi_info | grep gridengine
> MCA ras: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
> MCA pls: gridengine (MCA v1.0, API v1.3, Component v1.2.8)
>
> The SGE error and output files for the job are as follows:
>
> $ cat err.88.OMPI-Blast-Job
> error: executing task of job 88 failed:
> [compute-0-1.local:06151] ERROR: A daemon on node compute-0-1.local
> failed to start as expected.
> [compute-0-1.local:06151] ERROR: There may be more information
> available from
> [compute-0-1.local:06151] ERROR: the 'qstat -t' command on the Grid
> Engine tasks.
> [compute-0-1.local:06151] ERROR: If the problem persists, please
> restart the
> [compute-0-1.local:06151] ERROR: Grid Engine PE job
> [compute-0-1.local:06151] ERROR: The daemon exited unexpectedly
> with status 1.
>
> $ cat out.88.OMPI-Blast-Job
>
> There is nothing in the output file.
>
> qstat shows that the job is running on one node, but top shows no
> mpiblast processes running on that node.
>
> The ps command:
>
> # ps -ef | grep mpiblast
> locuz 4018 4017 0 16:25 ? 00:00:00
> /opt/openmpi_intel/1.2.8/bin/mpirun -np 4
> /opt/apps/mpiblast-150-pio_OMPI/bin/mpiblast -p blastp -d
> Mtub_CDC1551_.faa -i 586_seq.fasta -o test.out
> root 4120 4022 0 16:27 pts/0 00:00:00 grep mpiblast
>
> shows this.
>
> The ibv_rc_pingpong tests work fine. The output of lsmod:
>
> # lsmod | grep ib
> ib_sdp 57788 0
> rdma_cm 38292 3 rdma_ucm,rds,ib_sdp
> ib_addr 11400 1 rdma_cm
> ib_local_sa 14864 1 rdma_cm
> ib_mthca 157396 2
> ib_ipoib 83928 0
> ib_umad 20656 0
> ib_ucm 21256 0
> ib_uverbs 46896 8 rdma_ucm,ib_ucm
> ib_cm 42536 3 rdma_cm,ib_ipoib,ib_ucm
> ib_sa 28512 4 rdma_cm,ib_local_sa,ib_ipoib,ib_cm
> ib_mad 43432 5 ib_local_sa,ib_mthca,ib_umad,ib_cm,ib_sa
> ib_core 70544 14 rdma_ucm,rds,ib_sdp,rdma_cm,iw_cm,ib_local_sa,ib_mthca,ib_ipoib,ib_umad,ib_ucm,ib_uverbs,ib_cm,ib_sa,ib_mad
> ipv6 285089 23 ib_ipoib
> libata 124585 1 ata_piix
> scsi_mod 144529 2 libata,sd_mod
>
> What might be the problem?
> We've used the Voltaire OFA Roll from Rocks - Gridstack.
>
> Thanks,
> Sangamesh
> _______________________________________________
> users mailing list
> users_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/users