Open MPI User's Mailing List Archives

Subject: Re: [OMPI users] OpenMPI load data to multiple nodes
From: Gus Correa (gus_at_[hidden])
Date: 2010-07-12 21:44:34


Hi Jack/Jinxu

Jack Bryan wrote:
> Dear All,
>
> I am working on a multi-computer Open MPI cluster system.
>
> If I put some data files in /home/mypath/folder, is it possible that all
> non-head nodes can access the files in the folder ?
>

Yes, it is possible, for instance if the /home/mypath/folder directory is
NFS-mounted on all nodes/computers.
Otherwise, if all disks and directories are local to each computer,
you need to copy the input files to the local disks before you
start, and copy the output files back to your login computer after the
program ends.

> I need to load some data to some nodes, if all nodes can access the
> data, I do not need to load them to each node one by one.
>
> If multiple nodes access the same file to get data, is there conflict ?
>

To some extent.
The OS (on the computer where the file is located)
arbitrates which process gets hold of the file at
any given time.
If you have 1000 processes, this means a lot of arbitration,
and most likely contention.
Even with only two processes, if both are writing data to a
single file, nothing ensures that the output is written
in the order that you want.

> For example,
>
> fopen(myFile) by node 1, at the same time fopen(myFile) by node 2.
>
> Is it allowed to do that on MPI cluster without conflict ?
>

I think MPI has no control over this.
It is up to the operating system, and depends on
which process gets its "fopen" request to the OS first,
which is not a deterministic sequence of events.
Relying on that ordering is not a clean technique.

You could instead:

1) Assign a single process, say, rank 0,
to read and write data from/to the file(s).
Then use, say, MPI_Scatter[v] and MPI_Gather[v],
to distribute and collect the data back and forth
between that process (rank 0) and all other processes.

That is an old-fashioned but very robust technique.
It avoids any I/O conflict or contention among processes.
All the data flows across the processes via MPI.
The OS receives I/O requests from a single process (rank 0).

Besides MPI_Gather/MPI_Scatter, look also at MPI_Bcast,
if you need to send the same data to all processes,
assuming the data is being read by a single process.

2) Alternatively, you could use the MPI I/O functions,
if your files are binary.

I hope this helps,
Gus Correa

> Any help is appreciated.
>
> Jinxu Ding
>
> July 12 2010