Open MPI Development Mailing List Archives

Subject: Re: [OMPI devel] malloc(0) warnings
From: Edgar Gabriel (gabriel_at_[hidden])
Date: 2010-05-06 18:59:45


I'll look into it in the next couple of days.
Thanks
Edgar

George Bosilca wrote:
> This is an artifact of using gatherv (or scatterv) on an inter-communicator without any useful data (i.e., either a count of zero or an empty datatype). It looks more like a synchronization than a real operation.
>
> george.
>
> On May 5, 2010, at 20:17 , Lisandro Dalcin wrote:
>
>> After building 1.4.2 with debug flags to configure, I get this (I've
>> got these warnings in previous releases, too):
>>
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>> malloc debug: Request for 0 bytes (coll_inter_gatherv.c, 94)
>>
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>> malloc debug: Request for 0 bytes (coll_inter_scatterv.c, 82)
>>
>>
>> --
>> Lisandro Dalcin
>> ---------------
>> CIMEC (INTEC/CONICET-UNL)
>> Predio CONICET-Santa Fe
>> Colectora RN 168 Km 472, Paraje El Pozo
>> Tel: +54-342-4511594 (ext 1011)
>> Tel/Fax: +54-342-4511169
>> _______________________________________________
>> devel mailing list
>> devel_at_[hidden]
>> http://www.open-mpi.org/mailman/listinfo.cgi/devel
>
>
> _______________________________________________
> devel mailing list
> devel_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/devel
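
A minimal sketch of how a zero-byte request like the ones reported above can be avoided, assuming the temporary buffer is sized from the per-rank counts and the datatype extent. This is not the code at coll_inter_gatherv.c line 94; the helper names total_bytes and alloc_staging are hypothetical and exist only for illustration.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper (not an Open MPI function): sum the per-rank
 * element counts and convert the total to bytes using the extent. */
static size_t total_bytes(const int *counts, int nranks, size_t extent)
{
    size_t total = 0;
    for (int i = 0; i < nranks; ++i) {
        if (counts[i] > 0) {
            total += (size_t)counts[i] * extent;
        }
    }
    return total;
}

/* Allocate a staging buffer only when there is data to move.
 * Returns 0 on success (*buf stays NULL when nothing is needed)
 * and -1 if the allocation itself fails. */
static int alloc_staging(const int *counts, int nranks, size_t extent,
                         void **buf)
{
    size_t nbytes = total_bytes(counts, nranks, extent);

    *buf = NULL;
    if (nbytes == 0) {
        return 0;   /* zero counts or empty datatype: skip malloc */
    }
    *buf = malloc(nbytes);
    return (*buf != NULL) ? 0 : -1;
}
```

With all counts equal to zero, alloc_staging returns success without calling malloc, so a debug allocator has nothing to warn about. The caller still has to handle a NULL buffer in the communication step that follows.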
> From: rhc_at_[hidden]
> To: svn_at_[hidden]
> Date: Thu, 6 May 2010 16:57:17 -0400
> Subject: [OMPI svn] svn:open-mpi r23106
>
> Author: rhc
> Date: 2010-05-06 16:57:17 EDT (Thu, 06 May 2010)
> New Revision: 23106
> URL: https://svn.open-mpi.org/trac/ompi/changeset/23106
>
> Log:
> More cleanup on paffinity....groan
>
> It is okay to not have a paffinity module IF you aren't using paffinity anyway. So don't error out of MPI_Init because a paffinity module wasn't selected.
>
> Cleanup error reporting in the odls default module to (once and for all!) eliminate messages originating in the fork'd process. Create some new error codes to allow us to pass enough info back to the parent process to provide useful error messages.
>
>
> Text files modified:
> trunk/opal/include/opal/constants.h | 63 +++++++-----
> trunk/opal/mca/paffinity/base/paffinity_base_select.c | 17 +-
> trunk/opal/mca/paffinity/base/paffinity_base_service.c | 72 +++----------
> trunk/opal/mca/paffinity/base/paffinity_base_wrappers.c | 20 +-
> trunk/opal/runtime/opal_init.c | 32 +++++
> trunk/orte/include/orte/constants.h | 79 ++++++++------
> trunk/orte/mca/odls/default/help-odls-default.txt | 34 ++++-
> trunk/orte/mca/odls/default/odls_default_module.c | 206 ++++++++++++++++-----------------------
> trunk/orte/util/error_strings.c | 3
> 9 files changed, 256 insertions(+), 270 deletions(-)
>
>
> Diff not shown due to size (56064 bytes).
> To see the diff, run the following command:
>
> svn diff -r 23105:23106 --no-diff-deleted
>
> _______________________________________________
> svn mailing list
> svn_at_[hidden]
> http://www.open-mpi.org/mailman/listinfo.cgi/svn
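
The r23106 log above describes keeping the fork'd process silent and passing an error code back to the parent, which then prints the message. A minimal sketch of that pattern with a pipe follows. The enum values and function names are hypothetical and are not the constants or functions added in that changeset.

```c
#include <fcntl.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical error codes for illustration only. */
enum launch_err { LAUNCH_OK = 0, LAUNCH_ERR_EXEC = 1 };

/* Only the parent turns a code into human-readable text. */
static const char *launch_err_string(int code)
{
    switch (code) {
    case LAUNCH_OK:       return "child launched";
    case LAUNCH_ERR_EXEC: return "child could not exec the program";
    default:              return "unknown launch error";
    }
}

/* Fork a child that tries to exec `path`. On failure the child prints
 * nothing; it writes an error code to the pipe instead. Returns the code
 * seen by the parent, or -1 if pipe() or fork() fails. */
static int launch_and_collect(const char *path)
{
    int fds[2];
    if (pipe(fds) != 0) {
        return -1;
    }

    pid_t pid = fork();
    if (pid < 0) {
        close(fds[0]);
        close(fds[1]);
        return -1;
    }

    if (pid == 0) {                      /* child */
        close(fds[0]);
        /* Close-on-exec: a successful exec closes the pipe, so the
         * parent reads end-of-file instead of an error code. */
        fcntl(fds[1], F_SETFD, FD_CLOEXEC);
        execl(path, path, (char *)NULL);
        int code = LAUNCH_ERR_EXEC;      /* reached only if exec failed */
        if (write(fds[1], &code, sizeof code) < 0) {
            /* nothing more the child can do silently */
        }
        _exit(127);
    }

    close(fds[1]);                       /* parent */
    int code = LAUNCH_OK;
    ssize_t n = read(fds[0], &code, sizeof code);
    close(fds[0]);
    waitpid(pid, NULL, 0);
    return (n == (ssize_t)sizeof code) ? code : LAUNCH_OK;
}
```

The parent would pass the returned code to launch_err_string and report it itself, so no output ever originates in the child.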
> From: Stefan Wesner <wesner_at_[hidden]>
> To: Rolf Rabenseifner <rabenseifner_at_[hidden]>
> Cc: Edgar Gabriel <gabriel_at_[hidden]>,
> Edgar Gabriel <egabriel_at_[hidden]>,
> Rainer Keller <keller_at_[hidden]>,
> EuroMPI2010 <eurompi2010_at_[hidden]>
> Date: Thu, 6 May 2010 23:12:22 +0200
> Subject: Re: EuroMPI2010
>
> Hi,
>
> For whatever reason, keller_at_[hidden] is configured as the forwarding
> email address...
>
> Stefan.
> --
> Dr.-Ing. Stefan Wesner
> Deputy Director, Head of Applications & Visualization
> High Performance Computing Center of University Stuttgart
> Nobelstrasse 19, D-70569 Stuttgart, Germany
> Phone: +49 711-685 6 4275
> Mobile: +49 172 1354054
> Fax: +49 711-685 5 4275
>
>
> On 06.05.2010, at 21:28, Rolf Rabenseifner wrote:
>
>> Hello Edgar,
>>
>> Cool, I have just uploaded the review for paper 20.
>> It even works without logging in again.
>> Are you now receiving the mails sent to eurompi2010_at_[hidden] as well?
>>
>> Many thanks.
>> Rolf
>>
>> ----- Original Message -----
>>> I think I see the problem:
>>>
>>> The configuration has a menu item 'Can non-chairs add or
>>> modify reviews', and it was set to 'no'. I have now set it
>>> to 'yes', logged in as Rolf, and the menu item for
>>> uploading the papers is now there.
>>>
>>> Best regards
>>> Edgar
>>>
>>> Rolf Rabenseifner wrote:
>>>> Thanks, Edgar,
>>>>
>>>> You should perhaps also check whether other mails to
>>>> eurompi2010_at_[hidden] simply went unanswered,
>>>> or whether mine was the only one,
>>>> and why they are not reaching you, Edgar.
>>>>
>>>> Best regards
>>>> Rolf
>>>>
>>>> ----- Original Message -----
>>>>> Yep, I have additional menu items.
>>>>>
>>>>> However, I logged in as another member of the program committee,
>>>>> and the item does indeed seem to be missing.
>>>>> Rainer/Stefan, since you paid for the EasyChair premium service,
>>>>> could one of you ping the people there about what is wrong?
>>>>>
>>>>> Best regards
>>>>> Edgar
>>>>>
>>>>> Rolf Rabenseifner wrote:
>>>>>> Hello Edgar,
>>>>>>
>>>>>> thanks regarding paper 23.
>>>>>>
>>>>>> Regarding the upload:
>>>>>>
>>>>>> I do not have this menu item!
>>>>>>
>>>>>> In the "Reviews" menu, both the pop-up and the web page show
>>>>>> the following items:
>>>>>> http://www.easychair.org/conferences/review.cgi?a=p0266d40517a
>>>>>>
>>>>>> Reviews
>>>>>> Select one of the following options.
>>>>>>
>>>>>> - Reviews on papers assigned to me
>>>>>> - Download offline review forms
>>>>>> - Subreviewers
>>>>>>
>>>>>> Does the page look different for you?
>>>>>>
>>>>>> Kind regards
>>>>>> Rolf
>>>>>>
>>>>>> ----- Original Message -----
>>>>>>> Hello Rolf,
>>>>>>>
>>>>>>> Rolf Rabenseifner wrote:
>>>>>>>> Hello Stefan, Rainer and Edgar,
>>>>>>>>
>>>>>>>> I do not know whether my mails to eurompi2010_at_[hidden]
>>>>>>>> actually arrive anywhere, so I am now also writing to you directly.
>>>>>>> Hm, to be honest I have not seen any email from you. Does
>>>>>>> eurompi2010_at_[hidden] really forward the emails to us, or
>>>>>>> where do they go? I do not see anything in the spam filters either.
>>>>>>>
>>>>>>>> 2 problems:
>>>>>>>>
>>>>>>>> - At
>>>>>>>> http://www.easychair.org/conferences/review.cgi?a=p0266d40517a
>>>>>>>> i.e. under "Reviews" or "My papers", I cannot find any
>>>>>>>> way to submit my review, i.e. these
>>>>>>>> reviews_form.txt files.
>>>>>>>> Unless I am simply too dumb to spot the obvious place
>>>>>>>> right away, this problem should be solved on your side
>>>>>>>> as quickly as possible, since it then probably affects the
>>>>>>>> entire program committee.
>>>>>>> I just tried it: if you go to the 'Upload reviews' button under
>>>>>>> Reviews, you can upload the form. That seems to work.
>>>>>>>
>>>>>>>> - Paper 23 is still not visible again in my "Reviews-->my
>>>>>>>> papers" list; see the attached mail, to which I never
>>>>>>>> received a reply.
>>>>>>>> (Perhaps I should have written it in German,
>>>>>>>> but I do not know who is subscribed to eurompi2010_at_[hidden].)
>>>>>>> Done, paper 23 is now additionally assigned to you.
>>>>>>> Normally we do not protest when someone volunteers to do more
>>>>>>> work.
>>>>>>>
>>>>>>> Best regards
>>>>>>> Edgar
>>>>>>>
>>>>>>>
>>>>>>>> Kind regards
>>>>>>>> Rolf
>>>>>>>>
>>>>>>>>
>>>>>>>> ----- Original Message -----
>>>>>>>>> Hi all,
>>>>>>>>>
>>>>>>>>> I did not receive any answer within the last 3 days.
>>>>>>>>> I expect that you are in a state where you are not
>>>>>>>>> able to make further changes without problems.
>>>>>>>>> Therefore, I'll review the currently assigned papers 20 and 22.
>>>>>>>>> Please also assign paper 23 to me because I've
>>>>>>>>> already done parts of the review.
>>>>>>>>>
>>>>>>>>> I need the decision because I'm flying many hours
>>>>>>>>> next week, which is always a good time for reviewing.
>>>>>>>>>
>>>>>>>>> Best regards and happy weekend
>>>>>>>>> Rolf
>>>>>>>>>
>>>>>>>>> ----- Original Message (Apr. 27) -----
>>>>>>>>>> Hi Stefan, Rainer, Edgar,
>>>>>>>>>>
>>>>>>>>>> I already started reviewing paper 23 yesterday.
>>>>>>>>>> Yesterday, I was assigned to papers 22, 23, and 24.
>>>>>>>>>>
>>>>>>>>>> Papers in my main area of expertise are:
>>>>>>>>>> - 23 (area of my PhD and work in last 3 years),
>>>>>>>>>> - 2, 10 (related to my work on optimization of
>>>>>>>>>> collective reduction operations).
>>>>>>>>>>
>>>>>>>>>> An additional conflict is paper 21, because Rainer is
>>>>>>>>>> a direct colleague at HLRS. I entered this conflict into
>>>>>>>>>> the EasyChair database and therefore the paper was removed
>>>>>>>>>> from my review list.
>>>>>>>>>>
>>>>>>>>>> It would be nice
>>>>>>>>>> - if I can get back paper 23 for review.
>>>>>>>>>> - if you can substitute papers 20+22 by 2+10.
>>>>>>>>>>
>>>>>>>>>> Best regards
>>>>>>>>>> Rolf
>>>>>>>>>>
>>>>>>>>>>
>>>>>>>>>> ----- Original Message -----
>>>>>>>>>>> Dear Rolf,
>>>>>>>>>>> the papers of the EuroMPI2010 conference have been assigned
>>>>>>>>>>> to the PC members. Please make sure that the automatic
>>>>>>>>>>> conflict detection worked fine.
>>>>>>>>>>>
>>>>>>>>>>> We kindly ask you to please log into EasyChair
>>>>>>>>>>> (http://www.easychair.org) to check & download your assigned
>>>>>>>>>>> papers and notify us of further conflicts by
>>>>>>>>>>> 30th of April.
>>>>>>>>>>>
>>>>>>>>>>>
>>>>>>>>>>> We would like to encourage a discussion on papers where
>>>>>>>>>>> the reviews show different opinions.
>>>>>>>>>>> Therefore we would like to ask you to please submit your
>>>>>>>>>>> review by 12th of May
>>>>>>>>>>> to be able to make the deadline for the notification of
>>>>>>>>>>> authors on 20th of May.
>>>>>>>>>>> If you cannot proceed with the review or this is not
>>>>>>>>>>> convenient for you, please do not hesitate to contact us and
>>>>>>>>>>> we will submit
>>>>>>>>>>> the papers to another reviewer.
>>>>>>>>>>>
>>>>>>>>>>> The review process will be open but anonymous -- you should
>>>>>>>>>>> be able to see other reviewers' input after you have
>>>>>>>>>>> submitted your review. If you have any questions, please do
>>>>>>>>>>> not hesitate
>>>>>>>>>>> to contact us.
>>>>>>>>>>>
>>>>>>>>>>> Thank you very much.
>>>>>>>>>>>
>>>>>>>>>>> Best regards,
>>>>>>>>>>> the Program Chairs of EuroMPI 2010.
>>>>>>>>>> --
>>>>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden]
>>>>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)
>>>>>>>>> --
>>>>>>>>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden]
>>>>>>>>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>>>>>>>>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>>>>>>>>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>>>>>>>>> Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)
>>>>>>> -- Edgar Gabriel
>>>>>>> Assistant Professor
>>>>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>>>>> Department of Computer Science University of Houston
>>>>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>>> -- Edgar Gabriel
>>>>> Assistant Professor
>>>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>>>> Department of Computer Science University of Houston
>>>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>>>
>>>
>>> -- Edgar Gabriel
>>> Assistant Professor
>>> Parallel Software Technologies Lab http://pstl.cs.uh.edu
>>> Department of Computer Science University of Houston
>>> Philip G. Hoffman Hall, Room 524 Houston, TX-77204, USA
>>> Tel: +1 (713) 743-3857 Fax: +1 (713) 743-3335
>>
>> --
>> Dr. Rolf Rabenseifner . . . . . . . . . .. email rabenseifner_at_[hidden]
>> High Performance Computing Center (HLRS) . phone ++49(0)711/685-65530
>> University of Stuttgart . . . . . . . . .. fax ++49(0)711 / 685-65832
>> Head of Dpmt Parallel Computing . . . www.hlrs.de/people/rabenseifner
>> Nobelstr. 19, D-70550 Stuttgart, Germany . (Office: Allmandring 30)
> Return-Path: <users-bounces_at_[hidden]>
> X-Original-To: gabriel_at_[hidden]
> Delivered-To: gabriel_at_[hidden]
> Received: from localhost (dijkstra.cs.uh.edu [127.0.0.1])
> by dijkstra.cs.uh.edu (Postfix) with ESMTP id 1792423CB6D
> for <gabriel_at_[hidden]>; Thu, 6 May 2010 16:29:05 -0500 (CDT)
> X-Virus-Scanned: amavisd-new at cs.uh.edu
> Received: from dijkstra.cs.uh.edu ([127.0.0.1])
> by localhost (dijkstra.cs.uh.edu [127.0.0.1]) (amavisd-new, port 10024)
> with ESMTP id r4MtRA9MloZy for <gabriel_at_[hidden]>;
> Thu, 6 May 2010 16:29:03 -0500 (CDT)
> Received: from milliways.osl.iu.edu (milliways.osl.iu.edu [129.79.245.239])
> by dijkstra.cs.uh.edu (Postfix) with ESMTP id E924823CB5A
> for <gabriel_at_[hidden]>; Thu, 6 May 2010 16:29:02 -0500 (CDT)
> Received: from milliways.osl.iu.edu (localhost [127.0.0.1])
> by milliways.osl.iu.edu (8.13.1/8.13.1/IUCS_2.92) with ESMTP id o46LSlDr022281;
> Thu, 6 May 2010 17:28:49 -0400
> Received: from mail1.ldeo.columbia.edu (mail1.ldeo.columbia.edu
> [129.236.19.100])
> by milliways.osl.iu.edu (8.13.1/8.13.1/IUCS_2.92) with ESMTP id
> o46LSgTQ022276
> for <users_at_[hidden]>; Thu, 6 May 2010 17:28:46 -0400
> Received: from claudius.ldeo.columbia.edu (claudius.ldgo.columbia.edu
> [129.236.21.127]) (user=gus mech=PLAIN bits=0)
> by mail1.ldeo.columbia.edu (8.14.3/8.14.3/MAIL-LDEO-1.9) with ESMTP id
> o46LSg0a001054
> (version=TLSv1/SSLv3 cipher=DHE-RSA-AES256-SHA bits=256 verify=NOT)
> for <users_at_[hidden]>; Thu, 6 May 2010 17:28:42 -0400 (EDT)
> Message-ID: <4BE33485.9090202_at_[hidden]>
> Date: Thu, 06 May 2010 17:28:37 -0400
> From: Gus Correa <gus_at_[hidden]>
> User-Agent: Thunderbird 2.0.0.23 (X11/20090825)
> MIME-Version: 1.0
> To: Open MPI Users <users_at_[hidden]>
> Subject: Re: [OMPI users] How do I run OpenMPI safely on
> a Nehalem standalone machine?
>
> Hi Samuel
>
> Samuel K. Gutierrez wrote:
>> Hi Gus,
>>
>> This may not help, but it's worth a try. If it's not too much trouble,
>> can you please reconfigure your Open MPI installation with
>> --enable-debug and then rebuild? After that, may we see the stack trace
>> from a core file that is produced after the segmentation fault?
>>
>> Thanks,
>>
>> --
>> Samuel K. Gutierrez
>> Los Alamos National Laboratory
>>
>
> Thank you for the suggestion.
>
> I am a bit reluctant to try this because when it fails,
> it *really* fails.
> Most of the time the machine doesn't even return the prompt,
> and in all cases it freezes and requires a hard reboot.
> It is not a segfault that the OS can catch, I guess.
> I wonder if enabling debug mode would do much for us,
> and get to the point of dumping a core, or just die before that.
>
> Gus Correa
> ---------------------------------------------------------------------
> Gustavo Correa
> Lamont-Doherty Earth Observatory - Columbia University
> Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>> On May 6, 2010, at 12:01 PM, Gus Correa wrote:
>>
>>> Hi Eugene
>>>
>>> Thanks for the detailed answer.
>>>
>>> *************
>>>
>>> 1) Now I can see and use the btl_sm_num_fifos component:
>>>
>>> I had committed already "btl = ^sm" to the openmpi-mca-params.conf
>>> file. This apparently hides the btl_sm_num_fifos from ompi_info.
>>>
>>> After I switched to no options in openmpi-mca-params.conf,
>>> then ompi_info showed the btl_sm_num_fifos component.
>>>
>>> ompi_info --all | grep btl_sm_num_fifos
>>> MCA btl: parameter "btl_sm_num_fifos" (current value:
>>> "1", data source: default value)
>>>
>>> A side comment:
>>> This means that the system administrator can
>>> hide some Open MPI options from the users, depending on what
>>> he puts in the openmpi-mca-params.conf file, right?
>>>
>>> *************
>>>
>>> 2) However, running with "sm" still breaks, unfortunately:
>>>
>>> Boomer!
>>> I get the same errors that I reported in my very
>>> first email, if I increase the number of processes to 16,
>>> to explore the hyperthreading range.
>>>
>>> This is using "sm" (i.e. not excluded in the mca config file),
>>> and btl_sm_num_fifos (mpiexec command line)
>>>
>>> The machine hangs, requires a hard reboot, etc, etc,
>>> as reported earlier. See below, please.
>>>
>>> So, I guess the conclusion is that I can use sm,
>>> but I have to remain within the range of physical cores (8),
>>> not oversubscribe, not try to explore the HT range.
>>> Should I expect it to work also for np>number of physical cores?
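To tell whether a given process count stays within the physical cores or reaches into the hyperthreading range, the two counts can be compared before launching. This is a minimal diagnostic sketch for Linux, assuming `lscpu` and `getconf` are available; the variable names and the value `np=16` (taken from the failing 16-process run) are illustrative only, and the check says nothing about why the machine hangs.

```shell
# Logical CPUs (hardware threads) currently online.
logical=$(getconf _NPROCESSORS_ONLN)

# Physical cores: count unique (core, socket) pairs reported by lscpu.
physical=0
if command -v lscpu >/dev/null 2>&1; then
    physical=$(lscpu -p=CORE,SOCKET 2>/dev/null | grep -v '^#' | sort -u | wc -l | tr -d ' ')
fi

# Fall back to the logical count if lscpu is missing or reported nothing.
if [ "${physical:-0}" -lt 1 ]; then
    physical=$logical
fi

np=16   # illustrative: process count of the failing 16-process run

if [ "$np" -gt "$physical" ]; then
    echo "np=$np exceeds $physical physical cores ($logical logical CPUs)"
else
    echo "np=$np fits within $physical physical cores ($logical logical CPUs)"
fi
```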
>>>
>>> I wonder if this would still work with np<=8, but with heavier code.
>>> (I only used hello_c.c so far.)
>>> Not sure I'll be able to test this, the user wants to use the machine.
>>>
>>>
>>> $ mpiexec -mca btl_sm_num_fifos 4 -np 4 a.out
>>> Hello, world, I am 0 of 4
>>> Hello, world, I am 1 of 4
>>> Hello, world, I am 2 of 4
>>> Hello, world, I am 3 of 4
>>>
>>> $ mpiexec -mca btl_sm_num_fifos 8 -np 8 a.out
>>> Hello, world, I am 0 of 8
>>> Hello, world, I am 1 of 8
>>> Hello, world, I am 2 of 8
>>> Hello, world, I am 3 of 8
>>> Hello, world, I am 4 of 8
>>> Hello, world, I am 5 of 8
>>> Hello, world, I am 6 of 8
>>> Hello, world, I am 7 of 8
>>>
>>> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
>>> --------------------------------------------------------------------------
>>>
>>> mpiexec noticed that process rank 8 with PID 3659 on node
>>> spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
>>> --------------------------------------------------------------------------
>>>
>>> $
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:------------[ cut here ]------------
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:invalid opcode: 0000 [#1] SMP
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:last sysfs file:
>>> /sys/devices/system/cpu/cpu15/topology/physical_package_id
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:Stack:
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:Call Trace:
>>>
>>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>>> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00
>>> 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73
>>> 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01
>>>
>>> *****************
>>>
>>> Many thanks,
>>> Gus Correa
>>> ---------------------------------------------------------------------
>>> Gustavo Correa
>>> Lamont-Doherty Earth Observatory - Columbia University
>>> Palisades, NY, 10964-8000 - USA
>>> ---------------------------------------------------------------------
>>>
>>>
>>> Eugene Loh wrote:
>>>> Gus Correa wrote:
>>>>> Hi Eugene
>>>>>
>>>>> Thank you for answering one of my original questions.
>>>>>
>>>>> However, there seems to be a problem with the syntax.
>>>>> Is it really "-mca btl btl_sm_num_fifos=some_number"?
>>>> No. Try "--mca btl_sm_num_fifos 4". Or,
>>>> % setenv OMPI_MCA_btl_sm_num_fifos 4
>>>> % ompi_info -a | grep btl_sm_num_fifos # check that things were
>>>> set correctly
>>>> % mpirun -n 4 a.out
>>>>> When I grep any component starting with btl_sm I get nothing:
>>>>>
>>>>> ompi_info --all | grep btl_sm
>>>>> (No output)
>>>> I'm no guru, but I think the reason has something to do with
>>>> dynamically loaded somethings. E.g.,
>>>> % /home/eugene/ompi/bin/ompi_info --all | grep btl_sm_num_fifos
>>>> (no output)
>>>> % setenv OPAL_PREFIX /home/eugene/ompi
>>>> % set path = ( $OPAL_PREFIX/bin $path )
>>>> % ompi_info --all | grep btl_sm_num_fifos
>>>> MCA btl: parameter "btl_sm_num_fifos" (current value:
>>>> "1", data source: default value)
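The commands above use csh syntax (`setenv`, `set path`). For Bourne-style shells such as sh or bash, a rough equivalent might look like the sketch below. It assumes `ompi_info` is on the PATH; the guard and the fallback messages are illustrative additions, not part of the original exchange.

```shell
# Bourne-shell counterpart of "setenv OMPI_MCA_btl_sm_num_fifos 4".
export OMPI_MCA_btl_sm_num_fifos=4

# Check that the value was picked up, but only if Open MPI is installed.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --all | grep btl_sm_num_fifos \
        || echo "btl_sm_num_fifos not listed (is the sm BTL excluded?)"
else
    echo "ompi_info not found on PATH"
fi

# Then launch as usual, e.g.:  mpirun -n 4 a.out
```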
>>>> _______________________________________________
>>>> users mailing list
>>>> users_at_[hidden]
>>>> http://www.open-mpi.org/mailman/listinfo.cgi/users
>
> From: Jeff Squyres <jsquyres_at_[hidden]>
> Date: Thu, 6 May 2010 17:34:26 -0400
> To: Open MPI Users <users_at_[hidden]>
> Subject: Re: [OMPI users] How do I run OpenMPI safely on a
> Nehalem standalone machine?
>
> On May 6, 2010, at 2:01 PM, Gus Correa wrote:
>
>> 1) Now I can see and use the btl_sm_num_fifos component:
>>
>> I had committed already "btl = ^sm" to the openmpi-mca-params.conf
>> file. This apparently hides the btl_sm_num_fifos from ompi_info.
>>
>> After I switched to no options in openmpi-mca-params.conf,
>> then ompi_info showed the btl_sm_num_fifos component.
>>
>> ompi_info --all | grep btl_sm_num_fifos
>> MCA btl: parameter "btl_sm_num_fifos" (current value: "1", data source: default value)
>>
>> A side comment:
>> This means that the system administrator can
>> hide some Open MPI options from the users, depending on what
>> he puts in the openmpi-mca-params.conf file, right?
>
> Correct.
>
> BUT: a user can always override the "btl" MCA param and see them again. For example, you could also have done this:
>
> echo "btl =" > ~/.openmpi/mca-params.conf
> ompi_info --all | grep btl_sm_num_fifos
> # ...will show the sm params...
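Note that the `echo ... >` form above overwrites any existing per-user file. A slightly more cautious variant is sketched below: it keeps a backup first. The `.bak` suffix and the variable names are illustrative choices, and the `ompi_info` call is skipped when Open MPI is not on the PATH.

```shell
# Per-user MCA parameter file, as in the example above.
conf_dir="$HOME/.openmpi"
conf_file="$conf_dir/mca-params.conf"

mkdir -p "$conf_dir"

# Preserve any existing per-user settings before overwriting them.
if [ -f "$conf_file" ]; then
    cp "$conf_file" "$conf_file.bak"
fi

# Empty "btl" value in the per-user file, exactly as in the example above.
echo "btl =" > "$conf_file"

# Confirm that the sm parameters are visible again, if Open MPI is installed.
if command -v ompi_info >/dev/null 2>&1; then
    ompi_info --all | grep btl_sm_num_fifos \
        || echo "btl_sm_num_fifos still not listed"
else
    echo "ompi_info not found on PATH"
fi
```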
>
>> 2) However, running with "sm" still breaks, unfortunately:
>>
>> Boomer!
>
> Doh!
>
>> I get the same errors that I reported in my very
>> first email, if I increase the number of processes to 16,
>> to explore the hyperthreading range.
>>
>> This is using "sm" (i.e. not excluded in the mca config file),
>> and btl_sm_num_fifos (mpiexec command line)
>>
>> The machine hangs, requires a hard reboot, etc, etc,
>> as reported earlier. See below, please.
>
> I saw that only some probably-unrelated dmesg messages were emitted. Was there anything else revealing on the console and/or /var/log/* files? Hard reboots absolutely should not be caused by Open MPI.
>
>> So, I guess the conclusion is that I can use sm,
>> but I have to remain within the range of physical cores (8),
>> not oversubscribe, not try to explore the HT range.
>> Should I expect it to work also for np>number of physical cores?
>
> Your prior explanations of when HT is useful seemed pretty reasonable to me. Meaning: Nehalem HT will help only in some kinds of codes. Dense computation codes with few conditional branches may not benefit much from HT.
>
> But OMPI applications should always run *correctly*, regardless of HT or not-HT -- even if you're oversubscribing. The performance may suffer (sometimes dramatically) if you oversubscribe physical cores with dense computational code, but it should always run *correctly*.
>
>> I wonder if this would still work with np<=8, but with heavier code.
>> (I only used hello_c.c so far.)
>
> If hello_c is crashing your computer -- even if you're running np>8 or np>16 -- something is wrong outside of Open MPI. I routinely run np=100 hello_c on machines.
>
>> $ mpiexec -mca btl_sm_num_fifos 16 -np 16 a.out
>> --------------------------------------------------------------------------
>> mpiexec noticed that process rank 8 with PID 3659 on node spinoza.ldeo.columbia.edu exited on signal 11 (Segmentation fault).
>> --------------------------------------------------------------------------
>> $
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:------------[ cut here ]------------
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:invalid opcode: 0000 [#1] SMP
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:last sysfs file: /sys/devices/system/cpu/cpu15/topology/physical_package_id
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:Stack:
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:Call Trace:
>>
>> Message from syslogd_at_spinoza at May 6 13:38:13 ...
>> kernel:Code: 48 89 45 a0 4c 89 ff e8 e0 dd 2b 00 41 8b b6 58 03 00 00 4c 89 e7 ff c6 e8 b5 bc ff ff 41 8b 96 5c 03 00 00 48 98 48 39 d0 73 04 <0f> 0b eb fe 48 29 d0 48 89 45 a8 66 41 ff 07 49 8b 94 24 00 01
>
> I unfortunately don't know what these messages mean...
>

-- 
Edgar Gabriel
Assistant Professor
Parallel Software Technologies Lab      http://pstl.cs.uh.edu
Department of Computer Science          University of Houston
Philip G. Hoffman Hall, Room 524        Houston, TX-77204, USA
Tel: +1 (713) 743-3857                  Fax: +1 (713) 743-3335