Speed of writing formatted matrices of floats in Fortran and C++

by: Beliavsky - Fri, 6 May 2022 12:03 UTC

I am finding that writing 100x100 matrices of floats is about 25% faster with gfortran than g++ using printf on Windows and 4 times faster with gfortran than g++ on WSL2. I wonder what people see on Linux and if the C++ performance can be improved (or if the comparison is invalid for some reason). The codes and scripts are at https://github.com/Beliavsky/Formatted_output_speed . Maybe Fortran benefits because a whole row of a matrix can be written with

write (iu,"(*(1x,f0.6))") x(i,:)

instead of looping over each element.
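
For comparison, here is a minimal C++ sketch of the same idea: format a whole row into one buffer and hand it to the stream in a single call, instead of issuing one printf per element. The names `format_row` and `write_matrix` are hypothetical and are not taken from the linked repository, and `" %.6f"` is only assumed to be a close counterpart of `(1x,f0.6)`. Whether this narrows the gap would have to be measured.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdio>
#include <string>
#include <vector>

// Format one row as " v1 v2 ... vn\n" with six decimals per value.
// (Hypothetical helper; assumed to approximate the Fortran format (1x,f0.6).)
std::string format_row(const double* row, std::size_t n) {
    std::string line;
    char buf[352];  // enough for any finite double printed with %.6f
    for (std::size_t j = 0; j < n; ++j) {
        const int len = std::snprintf(buf, sizeof buf, " %.6f", row[j]);
        assert(len > 0 && static_cast<std::size_t>(len) < sizeof buf);
        line.append(buf, static_cast<std::size_t>(len));
    }
    line.push_back('\n');
    return line;
}

// Write a row-major rows x cols matrix with one fwrite call per row.
void write_matrix(std::FILE* out, const std::vector<double>& x,
                  std::size_t rows, std::size_t cols) {
    assert(out != nullptr && x.size() == rows * cols);
    for (std::size_t i = 0; i < rows; ++i) {
        const std::string line = format_row(x.data() + i * cols, cols);
        std::fwrite(line.data(), 1, line.size(), out);
    }
}
```

A call such as `write_matrix(stdout, x, 100, 100)` would then print a 100x100 matrix held in a `std::vector<double>` of 10000 elements.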

Re: Speed of writing formatted matrices of floats in Fortran and C++

X-Copyright: (C) Copyright 2022 Stefan Ram. All rights reserved.
Distribution through any means other than regular usenet
channels is forbidden. It is forbidden to publish this
article in the Web, to change URIs of this article into links,
and to transfer the body without this notice, but quotations
of parts in other Usenet posts are allowed.

by: Stefan Ram - Fri, 6 May 2022 14:05 UTC

Beliavsky <beliavsky@aol.com> writes:
>I am finding that writing 100x100 matrices of floats is about
>25% faster with gfortran than g++ using printf on Windows

Console output can be very slow. If one is writing to a
console, one should make sure to use the same console system
for all languages tested. Writing to a file on disk often
will be faster.
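
One way to keep the output destination out of the comparison is to time the same formatted write against an explicit stream. The following is only a sketch: the function name `time_formatted_write` and the generated values are made up for illustration, and the flush is included in the measured time on purpose.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <cstdio>

// Write a rows x cols matrix of generated values to `out` and
// return the elapsed wall-clock time in seconds (hypothetical helper).
double time_formatted_write(std::FILE* out, std::size_t rows, std::size_t cols) {
    assert(out != nullptr);
    const auto start = std::chrono::steady_clock::now();
    for (std::size_t i = 0; i < rows; ++i) {
        for (std::size_t j = 0; j < cols; ++j) {
            std::fprintf(out, " %.6f", 0.001 * static_cast<double>(i * cols + j));
        }
        std::fputc('\n', out);
    }
    std::fflush(out);  // count the time needed to hand the data to the OS
    const auto stop = std::chrono::steady_clock::now();
    return std::chrono::duration<double>(stop - start).count();
}
```

Passing `stdout` measures the console (or whatever the shell redirects it to), while passing a stream from `std::fopen` measures writing to a file on disk, so both cases run exactly the same formatting code.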

I/O usually is deemed to be slow, so I think that the
overhead of additional instructions for a loop in C++
should be negligible compared to the time for memory
accesses, serialization of float values, and I/O.

In addition to times for input and output, times for
accesses to memory outside of cache memory also play a role.
These could be increased if a matrix is stored distributed
over memory areas that are distant from each other, in
contrast to a matrix that is stored in a single block.
Regular patterns of memory access are faster than
irregular patterns.

This might depend on how matrices are implemented in your
C++ program.
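
As an illustration of the "single block" case, a matrix can be kept in one contiguous `std::vector<double>` with row-major indexing, as opposed to a vector of separately allocated rows. This is a sketch only; the `Matrix` type below is hypothetical and says nothing about how the benchmarked program stores its data. Note also that Fortran arrays are column-major, so a row section such as `x(i,:)` is strided there, whereas a row is contiguous in this layout.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Row-major matrix stored in one contiguous block (hypothetical type).
struct Matrix {
    std::size_t rows;
    std::size_t cols;
    std::vector<double> data;  // element (i, j) lives at data[i * cols + j]

    Matrix(std::size_t r, std::size_t c) : rows(r), cols(c), data(r * c, 0.0) {}

    double& operator()(std::size_t i, std::size_t j) {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }

    double operator()(std::size_t i, std::size_t j) const {
        assert(i < rows && j < cols);
        return data[i * cols + j];
    }

    // Pointer to the first element of row i; the row occupies cols
    // consecutive doubles, so it can be traversed sequentially.
    const double* row(std::size_t i) const {
        assert(i < rows);
        return data.data() + i * cols;
    }
};
```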

An L1 cache reference might take 0.5 ns, an L2 cache
reference 7 ns, while a main memory reference might take 100
ns. When optimizing, it helps to be aware of the orders of
magnitude.

CPUS HAVE A HIERARCHICAL CACHE SYSTEM
From a 2014 talk by Chandler Carruth

One cycle on a 3 GHz processor                1 ns
L1 cache reference                          0.5 ns
Branch mispredict                             5 ns
L2 cache reference                            7 ns           14x L1 cache
Mutex lock/unlock                            25 ns
Main memory reference                       100 ns           20x L2, 200x L1
Compress 1K bytes with Snappy             3,000 ns
Send 1K bytes over 1 Gbps network        10,000 ns  0.01 ms
Read 4K randomly from SSD               150,000 ns  0.15 ms
Read 1 MB sequentially from memory      250,000 ns  0.25 ms
Round trip within same datacenter       500,000 ns   0.5 ms
Read 1 MB sequentially from SSD       1,000,000 ns     1 ms  4x memory
Disk seek                            10,000,000 ns    10 ms  20x datacenter round trip
Read 1 MB sequentially from disk     20,000,000 ns    20 ms  80x memory, 20x SSD
Send packet CA->Netherlands->CA     150,000,000 ns   150 ms

Re: Speed of writing formatted matrices of floats in Fortran and C++

by: Ron Shepard - Fri, 6 May 2022 16:29 UTC

On 5/6/22 7:03 AM, Beliavsky wrote:
> I am finding that writing 100x100 matrices of floats is about 25% faster with gfortran than g++ using printf on Windows and 4 times faster with gfortran than g++ on WSL2. I wonder what people see on Linux and if the C++ performance can be improved (or if the comparison is invalid for some reason). The codes and scripts are at https://github.com/Beliavsky/Formatted_output_speed . Maybe Fortran benefits because a whole row of a matrix can be written with
>
> write (iu,"(*(1x,f0.6))") x(i,:)
>
> instead of looping over each element.

In addition to treating the i/o list as a sequence of scalars or as a
vector, there is also the question, in both languages, of how often the
format string is parsed. With FORMAT statements, compilers would
typically parse the format strings at compile time, so no run time
overhead occurred for that. Then when f77 allowed format strings with
character variables, there were sometimes significant differences in i/o
costs between the two approaches. This was because the format string was
parsed anew for each execution of the write statement. Then over time,
this was optimized by compilers. First, the literal strings and
character parameters were singled out and parsed at compile time the
same way as format statements. Then the compilers started recognizing
when variable strings were unchanged between write statements, and
optimized that parsing at run time.
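
On the C++ side, printf receives its format as a run-time string on every call, so one way to take format handling out of the picture entirely is to convert the numbers without any format string. The sketch below does this for six fixed decimals. The name `append_fixed6` is hypothetical, the routine assumes finite values with magnitude below 1e12, and its rounding may differ from printf in the last digit for rare borderline values (it also prints tiny negative values that round to zero without a sign). Whether it is actually faster has to be measured.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <string>

// Append `v` with exactly six decimals to `out` without using a format string.
// Assumption: v is finite and |v| < 1e12, so |v| * 1e6 fits in a long long.
void append_fixed6(std::string& out, double v) {
    assert(std::isfinite(v) && std::fabs(v) < 1e12);
    const long long scaled = std::llround(std::fabs(v) * 1e6);
    if (std::signbit(v) && scaled != 0) {
        out.push_back('-');
    }
    const long long whole = scaled / 1000000;
    long long frac = scaled % 1000000;
    out += std::to_string(whole);
    out.push_back('.');
    char digits[6];
    for (int k = 5; k >= 0; --k) {  // fill the six fractional digits right to left
        digits[k] = static_cast<char>('0' + frac % 10);
        frac /= 10;
    }
    out.append(digits, 6);
}
```

A row would be built by calling `append_fixed6` for each element, with a space pushed before each value and a newline at the end, and then written with a single `fwrite`.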

I was in a computer users group back in the 1980s. Over a period of a
few years, there were frequent discussions about the costs of the
different fortran compilers on the IBM mainframe machines. Some users
were pushing for the compilers that supported f77, others wanted to keep
the old compilers because at first they were more efficient. My codes
would run for hours at a time and then print a few pages of output, so
my concern was the run time efficiency of the linear algebra, do loop
executions, and so on. Someone in another group presented results that
showed the f77 compiler was about 8x slower than the old f66+ compiler.
He always showed the ratios, not the actual times. This went on for
months. Then just by chance, someone asked him what his actual times
were. It turned out that they were executions of a few seconds each.
His jobs did a little bit of calculation and i/o, and then printed out
listings that were a couple hundred pages long. His computer times were
all dominated by i/o costs, which at that time were slower for the f77
compiler than they were for the old f66+ compiler. So people like me,
running jobs for hours at a time, were being held hostage by people
running codes that consumed a few seconds of time, and it was over
trivial issues like i/o library runtime performance.

The joys of shared computing on mainframes.

$.02 -Ron Shepard

Re: Speed of writing formatted matrices of floats in Fortran and C++

by: Beliavsky - Fri, 6 May 2022 16:43 UTC

On Friday, May 6, 2022 at 12:29:12 PM UTC-4, Ron Shepard wrote:
> The joys of shared computing on mainframes.
>
> $.02 -Ron Shepard

Great story. Please consider also posting it at https://fortran-lang.discourse.group/t/anecdotal-fortran/704 .
