It turned out to be something on my end. The modification actually works. :-)
As Rayson guessed, I am using Hadoop. I didn't know that Open MPI was integrated with Torque, but we're moving away from using Torque. As for making the code public, I'll have to double check, but there weren't too many changes involved.
On 4/29/09 11:37 AM, "Jeff Squyres" <jsquyres_at_[hidden]> wrote:
On Apr 29, 2009, at 1:38 PM, Jerry Ye wrote:
> I'm currently working in an environment where I cannot use SSH to
> launch child processes. Instead, the process with rank 0 skips the
> ssh_child function in plm_rsh_module.c and the child processes are
> all started at the same time on different machines. Coordination is
> done with static jobids and ports. I have successfully modified the
> code to get the hello_c example working.
Excellent. What mechanism are you using to start your jobs? Would it
be easier to fork the rsh plm into your own plugin? Is this code you
can share with the community?
> However, I'm having problems with inter-process communication when
> using MPI_Bcast. Is there something else that I'm obviously missing?
The PLM just starts up jobs -- other plugins are used for MPI
communications. E.g., the TCP BTL is probably what you're using for
MPI communications. Is that where it's failing?