Open MPI User's Mailing List Archives

From: Dirk Eddelbuettel (edd_at_[hidden])
Date: 2007-10-10 13:27:02


Jeff,

Thanks for the reply. I have gotten much closer, and it looks like all
wounds were self-inflicted. More below.

On 9 October 2007 at 22:01, Jeff Squyres wrote:
| On Oct 9, 2007, at 3:50 PM, Dirk Eddelbuettel wrote:
|
| > edd_at_ron:~$ orterun -n 2 --mca mca_component_show_load_errors 1 r -e
| > 'library(Rmpi); print(mpi.comm.rank(0))'
| > [ron:18360] mca: base: component_find: unable to open osc pt2pt:
| > file not found (ignored)
| > [ron:18361] mca: base: component_find: unable to open osc pt2pt:
| > file not found (ignored)
|
| Truly odd. Looking in the code, this error message is displayed when
| lt_dlopen() of the component fails for some reason (the Libtool
| portable wrapper library around dlopen() and friends). We print out
| the error string that libltdl returns to us, and it's apparently
| "file not found". This *usually* refers to the fact that a
| dependency of the DSO that we're trying to open wasn't found (not
| that the DSO itself wasn't found).
|
| Your list of ldd dependencies didn't show anything odd, so I can't
| imagine why it would get a "file not found" kind of error.
|
| An off the wall question: are you compiling / building Open MPI on
| one system and running it on another, where perhaps the dependencies
| are slightly different and therefore causing a failure? This is a
| pretty weak question to ask, because I assume that *many* OMPI
| components would fail to open if this were the case, but I thought
| I'd ask anyway...

It's a fair question, but the Debian dependencies are usually good enough. [
The answer is 'yes and no': I build what gets onto Debian's mirrors in a
standardised chroot, but then run it on my normal system. So it is the
same-yet-different machine. There can be differences, but these are
typically caught by the package management layer. ]

| Another whacky question: does the error happen when you start your
| test program manually (without mpirun)?

That made no difference.

| Does this happen for all MPI programs (potentially only those that
| use the MPI-2 one-sided stuff), or just your R environment?

This is the likely winner.

It does indeed seem to be due to R's Rmpi package. Running a simple mpitest.c
shows no error message. We will look at the Rmpi initialization to see what
could cause this.
 
| At this point, all I can suggest is firing up a debugger and stepping
| through the code in lt_dlopenext() to see why exactly it is failing.

Seems like I avoided that trip to the dentist. ;-)

Moreover, despite my attempts at checking and double checking, my apparent
'works on Debian but not on Ubuntu' was due to a LAM / OpenMPI mix on my
Ubuntu machine at work. Sorry, that was another false alarm.

| Sorry I don't have a better suggestion than this... :-\

You were spot-on and most helpful. Thanks a bunch.

Cheers, Dirk

-- 
Three out of two people have difficulties with fractions.