
Subject (Author)
* INN performance curve - why so much time dealing with the history file? (Jesse Rehmer)
`* Re: INN performance curve - why so much time dealing with the history file? (Russ Allbery)
 `* Re: INN performance curve - why so much time dealing with the history (Jesse Rehmer)
  `* Re: INN performance curve - why so much time dealing with the history file? (Russ Allbery)
   +* Re: INN performance curve - why so much time dealing with the history (Jesse Rehmer)
   |`* Re: INN performance curve - why so much time dealing with the history file? (Russ Allbery)
   | +- Re: INN performance curve - why so much time dealing with the history (go-while)
   | +- Re: INN performance curve - why so much time dealing with the history (go-while)
   | `* Re: INN performance curve - why so much time dealing with the history file? (Jesse Rehmer)
   |  `* Re: INN performance curve - why so much time dealing with the history (Billy G. (go-while))
   |   `- Re: INN performance curve - why so much time dealing with the history file? (Jesse Rehmer)
   `- Re: INN performance curve - why so much time dealing with the history (Julien ÉLIE)

INN performance curve - why so much time dealing with the history file?

<u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1981&group=news.software.nntp#1981
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: INN performance curve - why so much time dealing with the history file?
Date: Wed, 26 Jul 2023 02:28:24 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 26 Jul 2023 02:28:24 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com; posting-account="k8cWG9+Y/93vxQYza75s9JQFoL8rgVF3P1Yluveoqs0";
logging-data="74346"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:f1pjMepUcwurrye2YV4Vs1FRwYE= sha256:LLx2RueqMFnqd97NA6tnT+Y50b5qXprY1CU6Y3rNKHQ=
sha1:9eoNTnENtFrP5ClEQsFfk9aUmLA= sha256:Rk4RBtUZIIci4mznNERQ9Q4PFHeU/qsynPUT0phQuK0=
X-Usenapp: v1.27.1/l - Full License
 by: Jesse Rehmer - Wed, 26 Jul 2023 02:28 UTC

Now that I've got faster hardware and have had a chance to feed over 184
million articles to a new server, I notice a steep decline in performance from
the beginning of the process to the end. When looking at the news.daily output
it appears the majority of the time is spent dealing with the history file. I
would expect article/overview writing or perl filtering to take more time, but
perhaps there is more to dealing with the history file than I understand.

Is this kind of curve in performance degradation expected as the history file
grows?

Timer output (note I didn't run news.daily until many days after injection
stopped, accounting for the huge idle time):

INND timer:
Code region                Time    Pct    Invoked  Min(ms)  Avg(ms)     Max(ms)
article cancel     00:05:22.727   0.0%     205040    0.021    1.574       7.677
article cleanup    00:05:44.900   0.0%  183854425    0.002    0.002       0.002
article logging    00:11:56.897   0.1%  184142520    0.003    0.004       0.005
article parse      00:30:34.354   0.2%  265195021    0.005    0.007       0.011
article write      00:43:24.939   0.3%  183809673    0.010    0.014       0.024
data move          00:01:08.029   0.0%   95351907    0.000    0.001       0.002
hisgrep/artcncl    00:04:05.556   0.0%     205040    0.007    1.198       4.204
history grep       00:00:00.000   0.0%          0    0.000    0.000       0.000
history lookup     49:09:14.755  21.9%  183854425    0.001    0.962       2.154
history sync       00:00:01.261   0.0%       2681    0.000    0.470     121.000
history write      41:34:50.164  18.5%  183854156    0.012    0.814       1.747
idle              120:16:16.894  53.6%   22762294    0.003   19.022  600095.000
nntp read          00:12:29.088   0.1%   83339892    0.004    0.009       0.012
overview write     02:45:18.889   1.2%  183809673    0.026    0.054       0.101
perl filter        07:57:01.509   3.5%  183854156    0.124    0.156       0.251
python filter      00:06:03.073   0.0%  183854156    0.000    0.002       0.014
site send          00:00:00.000   0.0%          0    0.000    0.000       0.000

TOTAL: 224:33:0   223:43:33.035  99.6%          -        -        -           -

Performance shown by the hourly breakdown matches what I see with other
metrics such as bandwidth, disk I/O, etc. There is a steady decline in all of
those numbers as time went on. Note I have article size logging turned off; I
was trying to trim down as much logging as possible.

Incoming articles:
Date                        Articles  %Arts  Art/sec    Size  %Size  KB/sec
Jul 16 10:03:43 - 10:59:59   6608118   3.6%  1956.80  0.0 KB   0.0%    0.00
Jul 16 11:00:00 - 11:59:59  13467714   7.3%  3741.03  0.0 KB   0.0%    0.00
Jul 16 12:00:00 - 12:59:59   8000940   4.3%  2222.48  0.0 KB   0.0%    0.00
Jul 16 13:00:00 - 13:59:59   5748819   3.1%  1596.89  0.0 KB   0.0%    0.00
Jul 16 14:00:00 - 14:59:59   5122241   2.8%  1422.84  0.0 KB   0.0%    0.00
Jul 16 15:00:00 - 15:59:59   4287868   2.3%  1191.07  0.0 KB   0.0%    0.00
Jul 16 16:00:00 - 16:59:59   4034400   2.2%  1120.67  0.0 KB   0.0%    0.00
Jul 16 17:00:00 - 17:59:59   3548680   1.9%   985.74  0.0 KB   0.0%    0.00
Jul 16 18:00:00 - 18:59:59   3238737   1.8%   899.65  0.0 KB   0.0%    0.00
Jul 16 19:00:00 - 19:59:59   3179352   1.7%   883.15  0.0 KB   0.0%    0.00
Jul 16 20:00:00 - 20:59:59   2861511   1.6%   794.86  0.0 KB   0.0%    0.00
Jul 16 21:00:00 - 21:59:59   2557273   1.4%   710.35  0.0 KB   0.0%    0.00
Jul 16 22:00:00 - 22:59:59   2476205   1.3%   687.83  0.0 KB   0.0%    0.00
Jul 16 23:00:00 - 23:59:59   2470653   1.3%   686.29  0.0 KB   0.0%    0.00
Jul 17 00:00:00 - 00:59:59   2377684   1.3%   660.47  0.0 KB   0.0%    0.00
Jul 17 01:00:00 - 01:59:59   2214251   1.2%   615.07  0.0 KB   0.0%    0.00
Jul 17 02:00:00 - 02:59:59   2160533   1.2%   600.15  0.0 KB   0.0%    0.00
Jul 17 03:00:00 - 03:59:59   2138072   1.2%   593.91  0.0 KB   0.0%    0.00
Jul 17 04:00:00 - 04:59:59   2064017   1.1%   573.34  0.0 KB   0.0%    0.00
Jul 17 05:00:00 - 05:59:59   1975211   1.1%   548.67  0.0 KB   0.0%    0.00
Jul 17 06:00:00 - 06:59:59   1902269   1.0%   528.41  0.0 KB   0.0%    0.00
Jul 17 07:00:00 - 07:59:59   1859862   1.0%   516.63  0.0 KB   0.0%    0.00
Jul 17 08:00:00 - 08:59:59   1826617   1.0%   507.39  0.0 KB   0.0%    0.00
Jul 17 09:00:00 - 09:59:59   1803042   1.0%   500.85  0.0 KB   0.0%    0.00
Jul 17 10:00:00 - 10:59:59   1779735   1.0%   494.37  0.0 KB   0.0%    0.00
Jul 17 11:00:00 - 11:59:59   1724235   0.9%   478.95  0.0 KB   0.0%    0.00
Jul 17 12:00:00 - 12:59:59   1694753   0.9%   470.76  0.0 KB   0.0%    0.00
Jul 17 13:00:00 - 13:59:59   1687997   0.9%   468.89  0.0 KB   0.0%    0.00
Jul 17 14:00:00 - 14:59:59   1689864   0.9%   469.41  0.0 KB   0.0%    0.00
Jul 17 15:00:00 - 15:59:59   1667083   0.9%   463.08  0.0 KB   0.0%    0.00
Jul 17 16:00:00 - 16:59:59   1614929   0.9%   448.59  0.0 KB   0.0%    0.00
Jul 17 17:00:00 - 17:59:59   1551492   0.8%   430.97  0.0 KB   0.0%    0.00
Jul 17 18:00:00 - 18:59:59   1510100   0.8%   419.47  0.0 KB   0.0%    0.00
Jul 17 19:00:00 - 19:59:59   1516064   0.8%   421.13  0.0 KB   0.0%    0.00
Jul 17 20:00:00 - 20:59:59   1504238   0.8%   417.84  0.0 KB   0.0%    0.00
Jul 17 21:00:00 - 21:59:59   1511102   0.8%   419.75  0.0 KB   0.0%    0.00
Jul 17 22:00:00 - 22:59:59   1498772   0.8%   416.33  0.0 KB   0.0%    0.00
Jul 17 23:00:00 - 23:59:59   1459980   0.8%   405.55  0.0 KB   0.0%    0.00
Jul 18 00:00:00 - 00:59:59   1414708   0.8%   392.97  0.0 KB   0.0%    0.00
Jul 18 01:00:00 - 01:59:59   1404954   0.8%   390.26  0.0 KB   0.0%    0.00
Jul 18 02:00:00 - 02:59:59   1376430   0.7%   382.34  0.0 KB   0.0%    0.00
Jul 18 03:00:00 - 03:59:59   1378259   0.7%   382.85  0.0 KB   0.0%    0.00
Jul 18 04:00:00 - 04:59:59   1390281   0.8%   386.19  0.0 KB   0.0%    0.00
Jul 18 05:00:00 - 05:59:59   1386335   0.8%   385.09  0.0 KB   0.0%    0.00
Jul 18 06:00:00 - 06:59:59   1355294   0.7%   376.47  0.0 KB   0.0%    0.00
Jul 18 07:00:00 - 07:59:59   1327220   0.7%   368.67  0.0 KB   0.0%    0.00
Jul 18 08:00:00 - 08:59:59   1296572   0.7%   360.16  0.0 KB   0.0%    0.00
Jul 18 09:00:00 - 09:59:59   1276394   0.7%   354.55  0.0 KB   0.0%    0.00
Jul 18 10:00:00 - 10:59:59   1280991   0.7%   355.83  0.0 KB   0.0%    0.00
Jul 18 11:00:00 - 11:59:59   1276734   0.7%   354.65  0.0 KB   0.0%    0.00
Jul 18 12:00:00 - 12:59:59   1330133   0.7%   369.48  0.0 KB   0.0%    0.00
Jul 18 13:00:00 - 13:59:59   1321197   0.7%   367.00  0.0 KB   0.0%    0.00
Jul 18 14:00:00 - 14:59:59   1270985   0.7%   353.05  0.0 KB   0.0%    0.00
Jul 18 15:00:00 - 15:59:59   1251052   0.7%   347.51  0.0 KB   0.0%    0.00
Jul 18 16:00:00 - 16:59:59   1238624   0.7%   344.06  0.0 KB   0.0%    0.00
Jul 18 17:00:00 - 17:59:59   1227250   0.7%   340.90  0.0 KB   0.0%    0.00
Jul 18 18:00:00 - 18:59:59   1203798   0.7%   334.39  0.0 KB   0.0%    0.00
Jul 18 19:00:00 - 19:59:59   1225960   0.7%   340.54  0.0 KB   0.0%    0.00
Jul 18 20:00:00 - 20:59:59   1221275   0.7%   339.24  0.0 KB   0.0%    0.00
Jul 18 21:00:00 - 21:59:59   1199617   0.7%   333.23  0.0 KB   0.0%    0.00
Jul 18 22:00:00 - 22:59:59   1178009   0.6%   327.22  0.0 KB   0.0%    0.00
Jul 18 23:00:00 - 23:59:59   1154628   0.6%   320.73  0.0 KB   0.0%    0.00
Jul 19 00:00:00 - 00:59:59   1145227   0.6%   318.12  0.0 KB   0.0%    0.00
Jul 19 01:00:00 - 01:59:59   1111125   0.6%   308.65  0.0 KB   0.0%    0.00
Jul 19 02:00:00 - 02:59:59   1095739   0.6%   304.37  0.0 KB   0.0%    0.00
Jul 19 03:00:00 - 03:59:59   1093542   0.6%   303.76  0.0 KB   0.0%    0.00
Jul 19 04:00:00 - 04:59:59   1089209   0.6%   302.56  0.0 KB   0.0%    0.00
Jul 19 05:00:00 - 05:59:59   1090842   0.6%   303.01  0.0 KB   0.0%    0.00
Jul 19 06:00:00 - 06:59:59   1074421   0.6%   298.45  0.0 KB   0.0%    0.00
Jul 19 07:00:00 - 07:59:59   1054212   0.6%   292.84  0.0 KB   0.0%    0.00
Jul 19 08:00:00 - 08:59:59   1044447   0.6%   290.12  0.0 KB   0.0%    0.00
Jul 19 09:00:00 - 09:59:59    931031   0.5%   258.62  0.0 KB   0.0%    0.00
Jul 19 10:00:00 - 10:59:59   1079244   0.6%   299.79  0.0 KB   0.0%    0.00
Jul 19 11:00:00 - 11:59:59   1069259   0.6%   297.02  0.0 KB   0.0%    0.00
Jul 19 12:00:00 - 12:59:59   1067895   0.6%   296.64  0.0 KB   0.0%    0.00
Jul 19 13:00:00 - 13:59:59   1056197   0.6%   293.39  0.0 KB   0.0%    0.00
Jul 19 14:00:00 - 14:59:59   1064508   0.6%   295.70  0.0 KB   0.0%    0.00
Jul 19 15:00:00 - 15:59:59   1053532   0.6%   292.65  0.0 KB   0.0%    0.00
Jul 19 16:00:00 - 16:59:59   1035573   0.6%   287.66  0.0 KB   0.0%    0.00
Jul 19 17:00:00 - 17:59:59   1017242   0.6%   282.57  0.0 KB   0.0%    0.00
Jul 19 18:00:00 - 18:59:59   1019435   0.6%   283.18  0.0 KB   0.0%    0.00
Jul 19 19:00:00 - 19:59:59   1013209   0.5%   281.45  0.0 KB   0.0%    0.00
Jul 19 20:00:00 - 20:59:59   1003259   0.5%   278.68  0.0 KB   0.0%    0.00
Jul 19 21:00:00 - 21:59:59    998980   0.5%   277.49  0.0 KB   0.0%    0.00
Jul 19 22:00:00 - 22:59:59    999129   0.5%   277.54  0.0 KB   0.0%    0.00
Jul 19 23:00:00 - 23:59:59   1005058   0.5%   279.18  0.0 KB   0.0%    0.00
Jul 20 00:00:00 - 00:59:59   1004154   0.5%   278.93  0.0 KB   0.0%    0.00
Jul 20 01:00:00 - 01:59:59    983326   0.5%   273.15  0.0 KB   0.0%    0.00
Jul 20 02:00:00 - 02:59:59    984133   0.5%   273.37  0.0 KB   0.0%    0.00
Jul 20 03:00:00 - 03:59:59    956548   0.5%   265.71  0.0 KB   0.0%    0.00
Jul 20 04:00:00 - 04:59:59    955865   0.5%   265.52  0.0 KB   0.0%    0.00
Jul 20 05:00:00 - 05:59:59    949230   0.5%   263.68  0.0 KB   0.0%    0.00
Jul 20 06:00:00 - 06:59:59    938727   0.5%   260.76  0.0 KB   0.0%    0.00
Jul 20 07:00:00 - 07:59:59    947608   0.5%   263.22  0.0 KB   0.0%    0.00
Jul 20 08:00:00 - 08:59:59    955909   0.5%   265.53  0.0 KB   0.0%    0.00
Jul 20 09:00:00 - 09:59:59    951293   0.5%   264.25  0.0 KB   0.0%    0.00
Jul 20 10:00:00 - 10:59:59    939124   0.5%   260.87  0.0 KB   0.0%    0.00
Jul 20 11:00:00 - 11:59:59    945338   0.5%   262.59  0.0 KB   0.0%    0.00
Jul 20 12:00:00 - 12:59:59    913587   0.5%   253.77  0.0 KB   0.0%    0.00
Jul 20 13:00:00 - 13:59:59    926263   0.5%   257.30  0.0 KB   0.0%    0.00
Jul 20 14:00:00 - 14:59:59    930761   0.5%   258.54  0.0 KB   0.0%    0.00
Jul 20 15:00:00 - 15:59:59    908011   0.5%   252.23  0.0 KB   0.0%    0.00
Jul 20 16:00:00 - 16:59:59    863137   0.5%   239.76  0.0 KB   0.0%    0.00
Jul 20 17:00:00 - 17:59:59    851113   0.5%   236.42  0.0 KB   0.0%    0.00
Jul 20 18:00:00 - 18:59:59    843252   0.5%   234.24  0.0 KB   0.0%    0.00
Jul 20 19:00:00 - 19:25:57    218494   0.1%   140.33  0.0 KB   0.0%    0.00
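
[Editor's sketch, not part of the original post.] Because the idle timer covers more than half of the accounted time, the percentages in the INND timer table understate how much of the working time went to the history file. A minimal Python sketch, using only durations copied from that table, recomputes the share against non-idle time. The helper name `to_seconds` is made up for this illustration.

```python
def to_seconds(hms: str) -> float:
    """Convert an 'HH:MM:SS.mmm' duration from the timer table to seconds."""
    hours, minutes, seconds = hms.split(":")
    return int(hours) * 3600 + int(minutes) * 60 + float(seconds)

# Durations copied from the INND timer table in the post.
timers = {
    "history lookup": "49:09:14.755",
    "history write": "41:34:50.164",
    "idle": "120:16:16.894",
}
accounted = to_seconds("223:43:33.035")  # second figure on the TOTAL line

history = to_seconds(timers["history lookup"]) + to_seconds(timers["history write"])
busy = accounted - to_seconds(timers["idle"])  # time spent doing actual work
history_share = history / busy

print(f"history share of non-idle time: {history_share:.1%}")
```

With idle time excluded, history lookups and writes come to roughly 88 percent of the time innd spent working, which is consistent with the observation that the history file dominates.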


Re: INN performance curve - why so much time dealing with the history file?

<87351bz91n.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1982&group=news.software.nntp#1982
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Tue, 25 Jul 2023 19:44:04 -0700
Organization: The Eyrie
Message-ID: <87351bz91n.fsf@hope.eyrie.org>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="30902"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:pnUDFBmmHmdLqn6F5keB4yWEdZE=
 by: Russ Allbery - Wed, 26 Jul 2023 02:44 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

> Now that I've got faster hardware and have had a chance to feed over 184
> million articles to a new server, I notice a steep decline in
> performance from the beginning of the process to the end. When looking
> at the news.daily output it appears the majority of the time is spent
> dealing with the history file. I would expect article/overview writing
> or perl filtering to take more time, but perhaps there is more to
> dealing with the history file than I understand.

It sounds like you didn't run news.daily while you were feeding in the
articles. My guess is that the history file index size was too small, so
you got tons of page overflows, which slows everything down considerably.
The history file is dynamically resized as part of the news.daily process,
although that process is not really designed for the case of feeding in
tons of new articles.

Pre-sizing the history file to be much larger at the start might have
helped if I'm right about the possible cause.

There is almost certainly a better algorithm to use for the history
database than what INN currently does, which is more than 30 years old.
(Just throwing everything into a modern SQL database might be a
substantial improvement, although the history file has very specific
characteristics, so at least in theory an algorithm chosen precisely for
its type of data would be fastest.)

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN performance curve - why so much time dealing with the history file?

<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1983&group=news.software.nntp#1983
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Tue, 25 Jul 2023 22:10:52 -0500
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 26 Jul 2023 03:10:52 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com; posting-account="k8cWG9+Y/93vxQYza75s9JQFoL8rgVF3P1Yluveoqs0";
logging-data="46193"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:0VXKQrqETc29dC5zwdD3NtErySI= sha256:qKfrE1vHUj+jLjCPy9Tvu3B1kQ0SoOuRUmYxyzVdsFY=
sha1:aDFttIoiKbvj/DSE76yvt8ZkpRY= sha256:9vfEEpRvZEq1+ERMOl12cWML/SZSeIxt2RE+ZrEDPhE=
In-Reply-To: <87351bz91n.fsf@hope.eyrie.org>
Content-Language: en-US
 by: Jesse Rehmer - Wed, 26 Jul 2023 03:10 UTC

On 7/25/23 9:44 PM, Russ Allbery wrote:
> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> Now that I've got faster hardware and have had a chance to feed over 184
>> million articles to a new server, I notice a steep decline in
>> performance from the beginning of the process to the end. When looking
>> at the news.daily output it appears the majority of the time is spent
>> dealing with the history file. I would expect article/overview writing
>> or perl filtering to take more time, but perhaps there is more to
>> dealing with the history file than I understand.
>
> It sounds like you didn't run news.daily while you were feeding in the
> articles. My guess is that the history file index size was too small, so
> you got tons of page overflows, which slows everything down considerably.
> The history file is dynamically resized as part of the news.daily process,
> although that process is not really designed for the case of feeding in
> tons of new articles.
>
> Pre-sizing the history file to be much larger at the start might have
> helped if I'm right about the possible cause.
>
> There is almost certainly a better algorithm to use for the history
> database than what INN currently does, which is more than 30 years old.
> (Just throwing everything into a modern SQL database might be a
> substantial improvement, although the history file has very specific
> characteristics, so at least in theory an algorithm chosen precisely for
> its type of data would be fastest.)
>

Not sure why my client formatted the first message so horribly; in the
past it has never tried to do HTML-ish things. I trimmed some of the
output so it wouldn't wrap below. For the first 6 hours or so the
performance is stellar, but then it trails off pretty drastically.

Correct, I didn't run news.daily until after the injection run completed.

The history file is ~18GB. How would one go about pre-sizing it? That is
one topic I don't think I've stumbled across.

Date Articles Art/sec
Jul 16 10:03:43 - 10:59:59 6608118 1956.80
Jul 16 11:00:00 - 11:59:59 13467714 3741.03
Jul 16 12:00:00 - 12:59:59 8000940 2222.48
Jul 16 13:00:00 - 13:59:59 5748819 1596.89
Jul 16 14:00:00 - 14:59:59 5122241 1422.84
Jul 16 15:00:00 - 15:59:59 4287868 1191.07
Jul 16 16:00:00 - 16:59:59 4034400 1120.67
Jul 16 17:00:00 - 17:59:59 3548680 985.74
Jul 16 18:00:00 - 18:59:59 3238737 899.65
Jul 16 19:00:00 - 19:59:59 3179352 883.15
Jul 16 20:00:00 - 20:59:59 2861511 794.86
Jul 16 21:00:00 - 21:59:59 2557273 710.35
Jul 16 22:00:00 - 22:59:59 2476205 687.83
Jul 16 23:00:00 - 23:59:59 2470653 686.29
Jul 17 00:00:00 - 00:59:59 2377684 660.47
Jul 17 01:00:00 - 01:59:59 2214251 615.07
Jul 17 02:00:00 - 02:59:59 2160533 600.15
Jul 17 03:00:00 - 03:59:59 2138072 593.91
Jul 17 04:00:00 - 04:59:59 2064017 573.34
Jul 17 05:00:00 - 05:59:59 1975211 548.67
Jul 17 06:00:00 - 06:59:59 1902269 528.41
Jul 17 07:00:00 - 07:59:59 1859862 516.63
Jul 17 08:00:00 - 08:59:59 1826617 507.39
Jul 17 09:00:00 - 09:59:59 1803042 500.85
Jul 17 10:00:00 - 10:59:59 1779735 494.37
Jul 17 11:00:00 - 11:59:59 1724235 478.95
Jul 17 12:00:00 - 12:59:59 1694753 470.76
Jul 17 13:00:00 - 13:59:59 1687997 468.89
Jul 17 14:00:00 - 14:59:59 1689864 469.41
Jul 17 15:00:00 - 15:59:59 1667083 463.08
Jul 17 16:00:00 - 16:59:59 1614929 448.59
Jul 17 17:00:00 - 17:59:59 1551492 430.97
Jul 17 18:00:00 - 18:59:59 1510100 419.47
Jul 17 19:00:00 - 19:59:59 1516064 421.13
Jul 17 20:00:00 - 20:59:59 1504238 417.84
Jul 17 21:00:00 - 21:59:59 1511102 419.75
Jul 17 22:00:00 - 22:59:59 1498772 416.33
Jul 17 23:00:00 - 23:59:59 1459980 405.55
Jul 18 00:00:00 - 00:59:59 1414708 392.97
Jul 18 01:00:00 - 01:59:59 1404954 390.26
Jul 18 02:00:00 - 02:59:59 1376430 382.34
Jul 18 03:00:00 - 03:59:59 1378259 382.85
Jul 18 04:00:00 - 04:59:59 1390281 386.19
Jul 18 05:00:00 - 05:59:59 1386335 385.09
Jul 18 06:00:00 - 06:59:59 1355294 376.47
Jul 18 07:00:00 - 07:59:59 1327220 368.67
Jul 18 08:00:00 - 08:59:59 1296572 360.16
Jul 18 09:00:00 - 09:59:59 1276394 354.55
Jul 18 10:00:00 - 10:59:59 1280991 355.83
Jul 18 11:00:00 - 11:59:59 1276734 354.65
Jul 18 12:00:00 - 12:59:59 1330133 369.48
Jul 18 13:00:00 - 13:59:59 1321197 367.00
Jul 18 14:00:00 - 14:59:59 1270985 353.05
Jul 18 15:00:00 - 15:59:59 1251052 347.51
Jul 18 16:00:00 - 16:59:59 1238624 344.06
Jul 18 17:00:00 - 17:59:59 1227250 340.90
Jul 18 18:00:00 - 18:59:59 1203798 334.39
Jul 18 19:00:00 - 19:59:59 1225960 340.54
Jul 18 20:00:00 - 20:59:59 1221275 339.24
Jul 18 21:00:00 - 21:59:59 1199617 333.23
Jul 18 22:00:00 - 22:59:59 1178009 327.22
Jul 18 23:00:00 - 23:59:59 1154628 320.73
Jul 19 00:00:00 - 00:59:59 1145227 318.12
Jul 19 01:00:00 - 01:59:59 1111125 308.65
Jul 19 02:00:00 - 02:59:59 1095739 304.37
Jul 19 03:00:00 - 03:59:59 1093542 303.76
Jul 19 04:00:00 - 04:59:59 1089209 302.56
Jul 19 05:00:00 - 05:59:59 1090842 303.01
Jul 19 06:00:00 - 06:59:59 1074421 298.45
Jul 19 07:00:00 - 07:59:59 1054212 292.84
Jul 19 08:00:00 - 08:59:59 1044447 290.12
Jul 19 09:00:00 - 09:59:59 931031 258.62
Jul 19 10:00:00 - 10:59:59 1079244 299.79
Jul 19 11:00:00 - 11:59:59 1069259 297.02
Jul 19 12:00:00 - 12:59:59 1067895 296.64
Jul 19 13:00:00 - 13:59:59 1056197 293.39
Jul 19 14:00:00 - 14:59:59 1064508 295.70
Jul 19 15:00:00 - 15:59:59 1053532 292.65
Jul 19 16:00:00 - 16:59:59 1035573 287.66
Jul 19 17:00:00 - 17:59:59 1017242 282.57
Jul 19 18:00:00 - 18:59:59 1019435 283.18
Jul 19 19:00:00 - 19:59:59 1013209 281.45
Jul 19 20:00:00 - 20:59:59 1003259 278.68
Jul 19 21:00:00 - 21:59:59 998980 277.49
Jul 19 22:00:00 - 22:59:59 999129 277.54
Jul 19 23:00:00 - 23:59:59 1005058 279.18
Jul 20 00:00:00 - 00:59:59 1004154 278.93
Jul 20 01:00:00 - 01:59:59 983326 273.15
Jul 20 02:00:00 - 02:59:59 984133 273.37
Jul 20 03:00:00 - 03:59:59 956548 265.71
Jul 20 04:00:00 - 04:59:59 955865 265.52
Jul 20 05:00:00 - 05:59:59 949230 263.68
Jul 20 06:00:00 - 06:59:59 938727 260.76
Jul 20 07:00:00 - 07:59:59 947608 263.22
Jul 20 08:00:00 - 08:59:59 955909 265.53
Jul 20 09:00:00 - 09:59:59 951293 264.25
Jul 20 10:00:00 - 10:59:59 939124 260.87
Jul 20 11:00:00 - 11:59:59 945338 262.59
Jul 20 12:00:00 - 12:59:59 913587 253.77
Jul 20 13:00:00 - 13:59:59 926263 257.30
Jul 20 14:00:00 - 14:59:59 930761 258.54
Jul 20 15:00:00 - 15:59:59 908011 252.23
Jul 20 16:00:00 - 16:59:59 863137 239.76
Jul 20 17:00:00 - 17:59:59 851113 236.42
Jul 20 18:00:00 - 18:59:59 843252 234.24
Jul 20 19:00:00 - 19:25:57 218494 140.33
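
[Editor's sketch, not part of the original post.] To see the rate as a function of how many articles had already been accepted, rather than of wall-clock time, the hourly rows can be turned into a running total. The sketch below uses only the first four rows of the table above as sample input. The function name `cumulative_rates` is hypothetical, and it assumes every row has the seven whitespace-separated fields shown.

```python
rows = """\
Jul 16 10:03:43 - 10:59:59 6608118 1956.80
Jul 16 11:00:00 - 11:59:59 13467714 3741.03
Jul 16 12:00:00 - 12:59:59 8000940 2222.48
Jul 16 13:00:00 - 13:59:59 5748819 1596.89
"""

def cumulative_rates(text: str) -> list[tuple[int, float]]:
    """Return (articles accepted so far, art/sec) for each hourly row."""
    result = []
    total = 0
    for line in text.splitlines():
        fields = line.split()
        if len(fields) != 7:
            continue  # skip blank or malformed lines
        total += int(fields[5])          # Articles column
        result.append((total, float(fields[6])))  # Art/sec column
    return result

for total, rate in cumulative_rates(rows):
    print(f"{total:>12,d} articles so far -> {rate:8.2f} art/sec")
```

Feeding in the whole table gives the pairs needed to plot throughput against the number of entries already accepted.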

Re: INN performance curve - why so much time dealing with the history file?

<87y1j3xsl4.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1984&group=news.software.nntp#1984
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Tue, 25 Jul 2023 20:24:55 -0700
Organization: The Eyrie
Message-ID: <87y1j3xsl4.fsf@hope.eyrie.org>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="30902"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:Bw3Dq2nUi3x9KEhuFoVpdQFN/mY=
 by: Russ Allbery - Wed, 26 Jul 2023 03:24 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

> The history file is ~18GB. How would one go about pre-sizing it? That is
> one topic I don't think I've stumbled across.

(Note that this is unnecessary now; now that you've run news.daily after
feeding in all the articles, it will have been resized to match its
current size, so this problem hopefully won't matter any more. If you're
still seeing ongoing slowness, then I misunderstood.)

Running makedbz manually will let you provide the -s flag, which specifies
the number of entries to size the history file for. news.daily will use
the current size as the expected size. When you're going to feed in a ton
of articles, you want to pass in something much, much larger, roughly the
number of entries you're expecting to have at the end. The relevant
instructions in INSTALL are:

| Next, you need to create an empty F<history> database. To do this, type:
|
| cd <pathdb in inn.conf>
| touch history
| makedbz -i -o

which possibly should mention the -s flag.
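
[Editor's sketch, not part of the original post.] One way to arrive at a value for -s is to count the lines in an existing history text file and add the number of articles about to be fed in. The Python sketch below assumes one line per article. The function names and the 10 percent headroom are assumptions made for this illustration, not anything INN prescribes. A throwaway temporary file stands in for the real history file so the example runs anywhere.

```python
import os
import tempfile

def count_history_entries(path: str) -> int:
    """Count lines in a history text file (assumed: one line per article)."""
    with open(path, "rb") as handle:
        return sum(1 for _ in handle)

def suggested_size(existing: int, expected_new: int, headroom: float = 1.1) -> int:
    """Entries to pass to -s: current plus expected, with assumed headroom."""
    return int((existing + expected_new) * headroom)

# Demo with a temporary file standing in for <pathdb>/history.
with tempfile.NamedTemporaryFile("w", delete=False) as demo:
    demo.write("line1\nline2\nline3\n")
demo_path = demo.name
existing = count_history_entries(demo_path)
os.unlink(demo_path)

size = suggested_size(existing, expected_new=184_000_000)
print(f"makedbz -i -o -s {size}")
```

The printed command follows the INSTALL steps quoted above, with -s added. The result is only a starting point to check against the makedbz manpage.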

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN performance curve - why so much time dealing with the history file?

<u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1985&group=news.software.nntp#1985
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Tue, 25 Jul 2023 22:35:18 -0500
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Wed, 26 Jul 2023 03:35:18 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com; posting-account="k8cWG9+Y/93vxQYza75s9JQFoL8rgVF3P1Yluveoqs0";
logging-data="20605"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:jFyQ+L0Q12Cw8RjezeAM3WoijVA= sha256:HQ1pxMOUEUNqZWL0s1+ebuYyRszwlMR9TjNoevF1Fpk=
sha1:Y2fehT53H/Gi/it0Lvki6vPwVU4= sha256:Cs7kLW0+P5M88LyHeF3/aNa2awlN1DlU2UYNx45o8Hw=
Content-Language: en-US
In-Reply-To: <87y1j3xsl4.fsf@hope.eyrie.org>
 by: Jesse Rehmer - Wed, 26 Jul 2023 03:35 UTC

On 7/25/23 10:24 PM, Russ Allbery wrote:
> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> The history file is ~18GB. How would one go about pre-sizing it? That is
>> one topic I don't think I've stumbled across.
>
> (Note that this is unnecessary now; now that you've run news.daily after
> feeding in all the articles, it will have been resized to match its
> current size, so this problem hopefully won't matter any more. If you're
> still seeing ongoing slowness, then I misunderstood.)
>
> Running makedbz manually will let you provide the -s flag, which specifies
> the number of entries to size the history file for. news.daily will use
> the current size as the expected size. When you're going to feed in a ton
> of articles, you want to pass in something much, much larger, roughly the
> number of entries you're expecting to have at the end. The relevant
> instructions in INSTALL are:
>
> | Next, you need to create an empty F<history> database. To do this, type:
> |
> | cd <pathdb in inn.conf>
> | touch history
> | makedbz -i -o
>
> which possibly should mention the -s flag.
>

Got it! This run was a test to see just how long it would take, so the
new box is kind of a scratch spot for further sorting later.

I'll restore from snapshot and manually create the history file. If I'm
reading the manpage correctly, the value used for the '-s' flag is the
number of articles expected (each line being an article)?

I'd probably want something like this assuming that's correct?

makedbz -i -o -s 200000000

Re: INN performance curve - why so much time dealing with the history file?

<87tttrxrtv.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1986&group=news.software.nntp#1986
Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Tue, 25 Jul 2023 20:41:16 -0700
Organization: The Eyrie
Message-ID: <87tttrxrtv.fsf@hope.eyrie.org>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
<u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="30902"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:4UQLNNORQIdzCHuHyfNsoh7i4jY=
 by: Russ Allbery - Wed, 26 Jul 2023 03:41 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

> I'll restore from snapshot and manually create the history file, if I'm
> reading the manpage correctly, the value used for the '-s' flag is the
> number of articles expected (each line being an article)?

Yes, that's correct. The actual history index size will then be larger
than that to try to make it so that the number of entries is about 2/3rds
the size of the index.

(If I remember correctly, the dbz algorithm uses linear probing for hash
collisions, so you really want a sparse hash. If the linear probe goes
for too long, it goes to a separate overflow table, and once that happens,
the performance really tanks *and* the history file size bloats. It's
just a bad time all around, sort of like a system going to swap. I'm
betting your history file went to multiple overflow tables because it
started massively undersized.)
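
[Editor's sketch, not part of the original post.] The effect of a crowded hash table can be illustrated with a generic open-addressing table that resolves collisions by linear probing. This is not dbz's actual code and it does not model the overflow tables. The table size, the random stand-in for the hash function, and the name `average_probes` are all assumptions for the illustration. It compares the average number of probes per insert when the table ends up about 2/3 full with a table that ends up 95 percent full.

```python
import random

def average_probes(slots: int, entries: int, seed: int = 1) -> float:
    """Insert `entries` random keys into an open-addressing table of
    `slots` cells using linear probing; return mean probes per insert."""
    rng = random.Random(seed)
    table = [False] * slots
    total_probes = 0
    for _ in range(entries):
        index = rng.randrange(slots)   # stand-in for hashing a message-ID
        probes = 1
        while table[index]:            # occupied: step to the next cell
            index = (index + 1) % slots
            probes += 1
        table[index] = True
        total_probes += probes
    return total_probes / entries

slots = 30_000
sparse = average_probes(slots, entries=20_000)   # about 2/3 full at the end
dense = average_probes(slots, entries=28_500)    # 95% full at the end
print(f"2/3 full: {sparse:.2f} probes per insert")
print(f"95% full: {dense:.2f} probes per insert")
```

By the same 2/3 rule of thumb, sizing for 200000000 entries implies an index of roughly 300000000 slots. An index sized for far fewer entries would fill up long before the feed finished.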

> I'd probably want something like this assuming that's correct?

> makedbz -i -o -s 200000000

Looks good to me, assuming that's the number of articles you're dealing
with.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN performance curve - why so much time dealing with the history file?

<I16wM.75152$wsc3.46118@fx13.ams4>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1987&group=news.software.nntp#1987
Path: i2pn2.org!i2pn.org!news.neodome.net!feeder1.feed.usenet.farm!feed.usenet.farm!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx13.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Subject: Re: INN performance curve - why so much time dealing with the history file?
Content-Language: en-US
Newsgroups: news.software.nntp
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
<u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>
<87tttrxrtv.fsf@hope.eyrie.org>
From: no-reply@no.spam (go-while)
In-Reply-To: <87tttrxrtv.fsf@hope.eyrie.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 104
Message-ID: <I16wM.75152$wsc3.46118@fx13.ams4>
X-Complaints-To: abuse@blocknews.net
NNTP-Posting-Date: Wed, 26 Jul 2023 09:52:40 UTC
Organization: blocknews - www.blocknews.net
Date: Wed, 26 Jul 2023 12:33:49 +0200
X-Received-Bytes: 6722
 by: go-while - Wed, 26 Jul 2023 10:33 UTC

On 26.07.23 05:41, Russ Allbery wrote:
> (If I remember correctly, the dbz algorithm uses linear probing for hash
> collisions, so you really want a sparse hash. If the linear probe goes
> for too long, it goes to a separate overflow table, and once that happens,
> the performance really tanks *and* the history file size bloats. It's
> just a bad time all around, sort of like a system going to swap. I'm
> betting your history file went to multiple overflow tables because it
> started massively undersized.)

https://manpages.debian.org/testing/inn2-dev/dbz.3.en.html

DESCRIPTION
These functions provide an indexing system for rapid random access to a
text file (the base file).

Dbz stores offsets into the base text file for rapid retrieval. All
retrievals are keyed on a hash value that is generated by the
HashMessageID() function.

Dbzinit opens a database, an index into the base file base, consisting
of files base.dir , base.index , and base.hash which must already exist.
(If the database is new, they should be zero-length files.) Subsequent
accesses go to that database until dbzclose is called to close the database.

Dbzfetch searches the database for the specified key, returning the
corresponding value if any, if <--enable-tagged-hash at configure> is
specified. If <--enable-tagged-hash at configure> is not specified, it
returns true and content of ivalue is set. Dbzstore stores the key -
value pair in the database, if <--enable-tagged-hash at configure> is
specified. If <--enable-tagged-hash at configure> is not specified, it
stores the content of ivalue. Dbzstore will fail unless the database
files are writable. Dbzexists will verify whether or not the given hash
exists or not. Dbz is optimized for this operation and it may be
significantly faster than dbzfetch().

Dbzfresh is a variant of dbzinit for creating a new database with more
control over details.

Dbzfresh's size parameter specifies the size of the first hash table
within the database, in key-value pairs. Performance will be best if the
number of key-value pairs stored in the database does not exceed about
2/3 of size. (The dbzsize function, given the expected number of
key-value pairs, will suggest a database size that meets these
criteria.) Assuming that an fseek offset is 4 bytes, the .index file
will be 4 * size bytes. The .hash file will be DBZ_INTERNAL_HASH_SIZE *
size bytes (the .dir file is tiny and roughly constant in size) until
the number of key-value pairs exceeds about 80% of size. (Nothing awful
will happen if the database grows beyond 100% of size, but accesses will
slow down quite a bit and the .index and .hash files will grow somewhat.)
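
As a rough worked example of the sizing arithmetic in the paragraph above, the sketch below estimates the table size and file sizes for a given number of expected entries. The 2/3 fill target and the 4-byte offset come from the man page text; the default of 6 bytes for DBZ_INTERNAL_HASH_SIZE is an assumption, as is the exact rounding, since the real dbzsize() may choose a slightly different number. The function name is hypothetical.

```python
def dbz_file_sizes(expected_entries, offset_bytes=4, hash_bytes=6):
    """Estimate dbz table size and file sizes for a 2/3 fill target.

    offset_bytes: size of one .index entry (4 per the man page; builds
                  with large-file support may use more).
    hash_bytes:   assumed value of DBZ_INTERNAL_HASH_SIZE.
    """
    # Smallest size such that expected_entries <= 2/3 * size,
    # computed with integers to avoid floating-point rounding.
    size = (expected_entries * 3 + 1) // 2
    return {
        "size": size,
        "index_bytes": offset_bytes * size,
        "hash_bytes": hash_bytes * size,
    }


estimate = dbz_file_sizes(200_000_000)
for name, value in estimate.items():
    print(f"{name}: {value:,}")
```

With these assumptions, 200,000,000 expected entries lead to a table of 300,000,000 slots and a .index file of 1,200,000,000 bytes.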

Dbz stores up to DBZ_INTERNAL_HASH_SIZE bytes of the message-id's hash
in the .hash file to confirm a hit. This eliminates the need to read the
base file to handle collisions. This replaces the tagmask feature in
previous dbz releases.

A size of ``0'' given to dbzfresh is synonymous with the local default;
the normal default is suitable for tables of 5,000,000 key-value pairs.
Calling dbzinit(name) with the empty name is equivalent to calling
dbzfresh(name, 0).

When databases are regenerated periodically, as in news, it is simplest
to pick the parameters for a new database based on the old one. This
also permits some memory of past sizes of the old database, so that a
new database size can be chosen to cover expected fluctuations. Dbzagain
is a variant of dbzinit for creating a new database as a new generation
of an old database. The database files for oldbase must exist. Dbzagain
is equivalent to calling dbzfresh with a size equal to the result of
applying dbzsize to the largest number of entries in the oldbase
database and its previous 10 generations.

When many accesses are being done by the same program, dbz is massively
faster if its first hash table is in memory. If the ``pag_incore'' flag
is set to INCORE_MEM, an attempt is made to read the table in when the
database is opened, and dbzclose writes it out to disk again (if it was
read successfully and has been modified). Dbzsetoptions can be used to
set the pag_incore and exists_incore flag to new value which should be
``INCORE_NO'', ``INCORE_MEM'', or ``INCORE_MMAP'' for the .hash and
.index files separately; this does not affect the status of a database
that has already been opened. The default is ``INCORE_NO'' for the
.index file and ``INCORE_MMAP'' for the .hash file. The attempt to read
the table in may fail due to memory shortage; in this case dbz fails
with an error. Stores to an in-memory database are not (in general)
written out to the file until dbzclose or dbzsync, so if robustness in
the presence of crashes or concurrent accesses is crucial, in-memory
databases should probably be avoided or the writethrough option should
be set to ``true''.

If the nonblock option is ``true'', then writes to the .hash and .index
files will be done using non-blocking I/O. This can be significantly
faster if your platform supports non-blocking I/O with files.

Dbzsync causes all buffers etc. to be flushed out to the files. It is
typically used as a precaution against crashes or concurrent accesses
when a dbz-using process will be running for a long time. It is a
somewhat expensive operation, especially for an in-memory database.

Concurrent reading of databases is fairly safe, but there is no
(inter)locking, so concurrent updating is not.

An open database occupies three stdio streams and two file descriptors.
Memory consumption is negligible (except for stdio buffers) except for
in-memory databases.

Re: INN performance curve - why so much time dealing with the history file?

<4h6wM.131236$MB_8.73705@fx01.ams4>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1988&group=news.software.nntp#1988

Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer03.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx01.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Subject: Re: INN performance curve - why so much time dealing with the history file?
Content-Language: en-US
Newsgroups: news.software.nntp
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
<u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>
<87tttrxrtv.fsf@hope.eyrie.org>
From: no-reply@no.spam (go-while)
In-Reply-To: <87tttrxrtv.fsf@hope.eyrie.org>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 15
Message-ID: <4h6wM.131236$MB_8.73705@fx01.ams4>
X-Complaints-To: abuse@blocknews.net
NNTP-Posting-Date: Wed, 26 Jul 2023 10:09:04 UTC
Organization: blocknews - www.blocknews.net
Date: Wed, 26 Jul 2023 12:50:14 +0200
X-Received-Bytes: 1519
 by: go-while - Wed, 26 Jul 2023 10:50 UTC

On 26.07.23 05:41, Russ Allbery wrote:
> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> I'd probably want something like this assuming that's correct?
>
>> makedbz -i -o -s 200000000
>
> Looks good to me, assuming that's the number of articles you're dealing
> with.
>

Thanks! This hits me hard.
I noticed the same slowdown, almost to the point of being unusable,
because I didn't want any expiry and disabled the cron jobs... *kiss*

Re: INN performance curve - why so much time dealing with the history file?

<u9r5gp$eei$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1990&group=news.software.nntp#1990

Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Wed, 26 Jul 2023 13:02:17 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u9r5gp$eei$1@nnrp.usenet.blueworldhosting.com>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com> <87y1j3xsl4.fsf@hope.eyrie.org> <u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com> <87tttrxrtv.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 26 Jul 2023 13:02:17 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com; posting-account="k8cWG9+Y/93vxQYza75s9JQFoL8rgVF3P1Yluveoqs0";
logging-data="14802"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:6xyvaMBi0PaogPzrc/hpun5WeBY= sha256:OUzmDHHKVAdDFLoo8176iT5xTdLCwYz71/LmAtDQ+Kk=
sha1:uUXAvkNpzfZeVeC5HXarbsJsLUE= sha256:HBNtk856LglU7MHI5EYLltmkHXj8cGjca5FQL4NQ02s=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Wed, 26 Jul 2023 13:02 UTC

On Jul 25, 2023 at 10:41:16 PM CDT, "Russ Allbery" <eagle@eyrie.org> wrote:

> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> I'll restore from snapshot and manually create the history file, if I'm
>> reading the manpage correctly, the value used for the '-s' flag is the
>> number of articles expected (each line being an article)?
>
> Yes, that's correct. The actual history index size will then be larger
> than that to try to make it so that the number of entries is about 2/3rds
> the size of the index.
>
> (If I remember correctly, the dbz algorithm uses linear probing for hash
> collisions, so you really want a sparse hash. If the linear probe goes
> for too long, it goes to a separate overflow table, and once that happens,
> the performance really tanks *and* the history file size bloats. It's
> just a bad time all around, sort of like a system going to swap. I'm
> betting your history file went to multiple overflow tables because it
> started massively undersized.)
>
>> I'd probably want something like this assuming that's correct?
>
>> makedbz -i -o -s 200000000
>
> Looks good to me, assuming that's the number of articles you're dealing
> with.

9 hours and ~69 million articles in and it looks to be keeping a steady pace.

Looking at the "ME time" lines, the perl filter is taking the most time now, which
I'd expect (it is on to remove misplaced binaries brought in from pullnews).

innd[11136]: ME time 600000 hishave 2543(1522320) hiswrite 130947(1522320)
hissync 6637(2) idle 126732(1830166) artclean 2611(1522320) artwrite
19037(1522211) artcncl 69(547) hisgrep/artcncl 20(547) overv 42242(1522211)
perl 229908(1522320) python 95(1522320) nntpread 3881(1830166) artparse
11883(3089741) artlog 4580(1523299) datamove 491(1886519)
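
For anyone who wants to rank the timers in a line like this, here is a small parsing sketch. It assumes each timer is written as name, elapsed time, then a call count in parentheses, and that the times share the unit of the "ME time" interval (milliseconds, as far as I know). The function and variable names are made up for the example.

```python
import re

# The log line from above, joined back into a single string.
ME_TIME_LINE = (
    "innd[11136]: ME time 600000 hishave 2543(1522320) hiswrite 130947(1522320) "
    "hissync 6637(2) idle 126732(1830166) artclean 2611(1522320) artwrite "
    "19037(1522211) artcncl 69(547) hisgrep/artcncl 20(547) overv 42242(1522211) "
    "perl 229908(1522320) python 95(1522320) nntpread 3881(1830166) artparse "
    "11883(3089741) artlog 4580(1523299) datamove 491(1886519)"
)


def parse_me_time(line):
    """Return (interval, {timer_name: (elapsed, calls)}) for one ME time line."""
    interval = int(re.search(r"ME time (\d+)", line).group(1))
    timers = {
        name: (int(elapsed), int(calls))
        for name, elapsed, calls in re.findall(r"(\S+) (\d+)\((\d+)\)", line)
    }
    return interval, timers


interval, timers = parse_me_time(ME_TIME_LINE)
# Print the timers sorted by elapsed time, largest first.
for name, (elapsed, calls) in sorted(timers.items(), key=lambda kv: -kv[1][0]):
    share = 100.0 * elapsed / interval
    print(f"{name:16s} {elapsed:8d}  {share:5.1f}%  ({calls} calls)")
```

On this line the perl timer comes out on top at roughly 38% of the interval, with hiswrite at roughly 22%.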

Re: INN performance curve - why so much time dealing with the history file?

<pLnxM.253485$yXa4.14509@fx14.ams4>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2016&group=news.software.nntp#2016

Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer01.ams1!peer.ams1.xlned.com!news.xlned.com!peer02.ams4!peer.am4.highwinds-media.com!news.highwinds-media.com!fx14.ams4.POSTED!not-for-mail
MIME-Version: 1.0
User-Agent: NoZilla/3.11 (Hackint; Unicorn; rv:0.8.15) go-while/19720229 NewsRW/4.2.0
Subject: Re: INN performance curve - why so much time dealing with the history file?
Content-Language: en-US
Newsgroups: news.software.nntp
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
<u9q49m$k3t$1@nnrp.usenet.blueworldhosting.com>
<87tttrxrtv.fsf@hope.eyrie.org>
<u9r5gp$eei$1@nnrp.usenet.blueworldhosting.com>
From: no-reply@no.spam (Billy G. (go-while))
Organization: github.com/go-while
In-Reply-To: <u9r5gp$eei$1@nnrp.usenet.blueworldhosting.com>
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Lines: 20
Message-ID: <pLnxM.253485$yXa4.14509@fx14.ams4>
X-Complaints-To: abuse@blocknews.net
NNTP-Posting-Date: Sun, 30 Jul 2023 06:51:01 UTC
Date: Sun, 30 Jul 2023 09:34:54 +0200
X-Received-Bytes: 1815
 by: Billy G. (go-while) - Sun, 30 Jul 2023 07:34 UTC

On 26.07.23, Jesse Rehmer wrote:
>
> 9 hours and ~69 million articles in and it looks to be keeping a steady pace.
>
> Looking at "ME time" lines, the perl filter is taking the most time now, which
> I'd expect (on to remove misplaced binaries brought in from pullnews).
>
> innd[11136]: ME time 600000 hishave 2543(1522320) hiswrite 130947(1522320)
> hissync 6637(2) idle 126732(1830166) artclean 2611(1522320) artwrite
> 19037(1522211) artcncl 69(547) hisgrep/artcncl 20(547) overv 42242(1522211)
> perl 229908(1522320) python 95(1522320) nntpread 3881(1830166) artparse
> 11883(3089741) artlog 4580(1523299) datamove 491(1886519)

what hardware is this?

zfs on hdd/ssd/cache?

do you use default cleanfeed or changed any "optimization"?

Re: INN performance curve - why so much time dealing with the history file?

<ua60lk$d2hc$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2023&group=news.software.nntp#2023

Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Sun, 30 Jul 2023 17:47:00 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <ua60lk$d2hc$1@news.trigofacile.com>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com>
<87351bz91n.fsf@hope.eyrie.org>
<u9q2rs$1d3h$1@nnrp.usenet.blueworldhosting.com>
<87y1j3xsl4.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 30 Jul 2023 15:47:00 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="428588"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0) Gecko/20100101 Thunderbird/102.13.0
Cancel-Lock: sha1:j2+STRbwyAKcXfgcIzQyOygF/zU= sha256:5YW1+rsat6rlIW19X1r3PMgLplwovsdrypf37GfFszA=
sha1:Inogyv7xr1LATvOkiMezDEOuiaQ= sha256:ZtLLuefXXVonATxestGzCdBCK5EBzGPC/wn7M/foJWU=
In-Reply-To: <87y1j3xsl4.fsf@hope.eyrie.org>
 by: Julien ÉLIE - Sun, 30 Jul 2023 15:47 UTC

Hi Russ,

Once again, an interesting insight into the dbz database. Thanks, Jesse, for this thread!

> Running makedbz manually will let you provide the -s flag, which specifies
> the number of entries to size the history file for. news.daily will use
> the current size as the expected size. When you're going to feed in a ton
> of articles, you want to pass in something much, much larger, roughly the
> number of entries you're expecting to have at the end. The relevant
> instructions in INSTALL are:
>
> | Next, you need to create an empty F<history> database. To do this, type:
> |
> | cd <pathdb in inn.conf>
> | touch history
> | makedbz -i -o
>
> which possibly should mention the -s flag.

Indeed!
Additional wording:

Next, you need to create an empty history database. To do this, type:

cd <pathdb in inn.conf>
touch history
makedbz -i -o

+ makedbz will then create a database optimized for handling about
+ 6,000,000 articles (or 500,000 if the slower tagged hash format is
+ used). If you expect to inject more articles than that, use the -s flag
+ to specify the number of entries to size the initial history file for.
+ To pre-size it for 100,000,000 articles, type:
+
+ makedbz -i -o -s 100000000
+
+ This initial size does not limit the number of articles the news server
+ will accept. It will just get slower when that size is exceeded, until
+ the next run of news.daily which will appropriately resize it.

I'll also update the makedbz documentation to make it clearer:

-i To ignore the old database when determining the size of the new one
to create, use the -i flag. Using the -o or -s flags implies the -i
flag.

+ When the old database is ignored, and a size is not specified with
+ -s, makedbz will count the number of lines of the current text
+ history file, add 10% to that count (for the next articles to
+ arrive), and another 50% (or 100% if the slower tagged hash format
+ is used) to determine the size of the new database to create. The
+ aim is to optimize the performance of the database, keeping it
+ filled below 2/3 of its size (or 1/2 with the tagged hash format).
+
+ If no text history file exists, the new one will have the default
+ creation size (see -s).

-s *size*
makedbz will also ignore any old database if the -s flag is used to
specify the approximate number of entries in the new database.
Accurately specifying the size is an optimization that will create a
more efficient database.
+ The news server will still accept more articles, but will be slower.
Size is measured in key-value pairs (i.e. lines). (The size should
be the estimated eventual size of the file, typically the size of
the old file.)
+
+ The effective size used will be larger, to optimize the performance
+ of the database.
For more information, see -i and the discussion of dbzfresh and
dbzsize in libinn_dbz(3).

+ The default is 6,666,666 when creating a new history database. (If
+ the slower tagged hash format is used, the default is 500,000.)
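
As a worked example of the sizing rule proposed for the -i flag above (count the history lines, add 10%, then add another 50% or 100%), here is a small sketch. It only mirrors that wording; the rounding used by makedbz and dbzsize may differ, and the function name and the 69,000,000-line input are purely illustrative.

```python
def makedbz_default_size(history_lines, tagged_hash=False):
    """Database size implied by the wording above for -i without -s.

    Integer arithmetic only; the rounding of the real tools is an unknown here.
    """
    with_growth = history_lines * 110 // 100   # +10% for the next articles
    headroom = 200 if tagged_hash else 150     # +100% (tagged hash) or +50%
    return with_growth * headroom // 100


lines = 69_000_000  # illustrative line count
for tagged in (False, True):
    size = makedbz_default_size(lines, tagged_hash=tagged)
    fill = lines * 110 // 100 / size
    print(f"tagged_hash={tagged}: size {size:,}, fill after 10% growth {fill:.3f}")
```

After the 10% growth the table sits at the 2/3 fill target (or 1/2 with tagged hash), which matches the stated aim.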

--
Julien ÉLIE

« Ne craignez pas d'être lent, craignez seulement d'être à l'arrêt. »

Re: INN performance curve - why so much time dealing with the history file?

<ua8fck$2v70$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2027&group=news.software.nntp#2027

Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN performance curve - why so much time dealing with the history file?
Date: Mon, 31 Jul 2023 14:10:28 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <ua8fck$2v70$1@nnrp.usenet.blueworldhosting.com>
References: <u9q0c8$28ja$1@nnrp.usenet.blueworldhosting.com> <87tttrxrtv.fsf@hope.eyrie.org> <u9r5gp$eei$1@nnrp.usenet.blueworldhosting.com> <pLnxM.253485$yXa4.14509@fx14.ams4>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 31 Jul 2023 14:10:28 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com; posting-account="k8cWG9+Y/93vxQYza75s9JQFoL8rgVF3P1Yluveoqs0";
logging-data="97504"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:bnvvSXjI+TNvw58nOmti7/78vck= sha256:QGxmyI20YnLDy6D39xexCZoaIp/RH34m9RsIiptd5Yk=
sha1:Xbsy+dKhn6LYb2E45AQAY0Nr/ws= sha256:2xs7u0BE+eSbs6Yef+5cXEyS6nJhWOLZY0zN1O4EuRk=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Mon, 31 Jul 2023 14:10 UTC

On Jul 30, 2023 at 2:34:54 AM CDT, "Billy G. (go-while)" <no-reply@no.spam>
wrote:

> On 26.07.23, Jesse Rehmer wrote:
>>
>> 9 hours and ~69 million articles in and it looks to be keeping a steady pace.
>>
>> Looking at "ME time" lines, the perl filter is taking the most time now, which
>> I'd expect (on to remove misplaced binaries brought in from pullnews).
>>
>> innd[11136]: ME time 600000 hishave 2543(1522320) hiswrite 130947(1522320)
>> hissync 6637(2) idle 126732(1830166) artclean 2611(1522320) artwrite
>> 19037(1522211) artcncl 69(547) hisgrep/artcncl 20(547) overv 42242(1522211)
>> perl 229908(1522320) python 95(1522320) nntpread 3881(1830166) artparse
>> 11883(3089741) artlog 4580(1523299) datamove 491(1886519)
>
> what hardware is this?
>
> zfs on hdd/ssd/cache?
>
> do you use default cleanfeed or changed any "optimization"?

Dell R440 with NVMe storage, ESXi 8 with FreeBSD 13.2 using ZFS.

The cleanfeed I was using here is stripped down and only checks for misplaced
binaries.

As far as 'tuning' goes, I set icdsynccount to 10000 in inn.conf. I
experimented with values up to 50000 but found no gain. When it is set too
high, I get pauses of several seconds (3-7) in article acceptance while it
does the operation, so it seemed more efficient overall to keep it lower.

I also set cycbuffupdate to 10000 in cycbuff.conf, though I'm not sure
whether this had any impact. Changing icdsynccount from 10 to 10000,
however, was a big change for me.
