Rocksolid Light

Welcome to RetroBBS

mail  files  register  newsreader  groups  login

Message-ID:  

Why won't sharks eat lawyers? Professional courtesy.


computers / news.software.nntp / INN 2.7.1 death spiral under high load?

SubjectAuthor
* INN 2.7.1 death spiral under high load?Jesse Rehmer
+* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
|`- Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
+- Re: INN 2.7.1 death spiral under high load?Syber Shock
`* Re: INN 2.7.1 death spiral under high load?Thomas Hochstein
 `* Re: INN 2.7.1 death spiral under high load?Russ Allbery
  `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
   `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
    `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
     `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
      `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
       `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
        `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
         `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
          `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
           `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
            `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
             `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
              `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               +* Re: INN 2.7.1 death spiral under high load?Russ Allbery
               |`* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               | +* Re: INN 2.7.1 death spiral under high load?Russ Allbery
               | |`* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               | | +- Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               | | +* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               | | |`* Re: INN 2.7.1 death spiral under high load?Russ Allbery
               | | | `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               | | |  `- Re: INN 2.7.1 death spiral under high load?Russ Allbery
               | | `- Re: INN 2.7.1 death spiral under high load?Russ Allbery
               | `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               |  `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               |   +* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               |   |+- Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               |   |`* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               |   | `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               |   |  `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               |   |   `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               |   |    `- Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
               |   `- Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
               `* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
                `* Re: INN 2.7.1 death spiral under high load?Russ Allbery
                 +* Re: INN 2.7.1 death spiral under high load?Jesse Rehmer
                 |`- Re: INN 2.7.1 death spiral under high load?Russ Allbery
                 `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
                  `* Re: INN 2.7.1 death spiral under high load?Julien ÉLIE
                   `* Re: INN 2.7.1 death spiral under high load?Russ Allbery
                    `- Re: INN 2.7.1 death spiral under high load?Julien ÉLIE

Pages:12
INN 2.7.1 death spiral under high load?

<u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1806&group=news.software.nntp#1806

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 00:29:09 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 00:29:09 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="60235"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:Vu3xVl9IPz5iKxcEE6Aq9KGLUj8= sha256:obTxkksFkD6ENJce2NW8+17uV7dVyCyLXzcFbzsfW1E=
sha1:fHozqwy5rI/Ts2fO9ebq6MCUsdM= sha256:/mLmw22OOw/q8qe03HfqnxBhUhcsQkD28u91gRpIdZs=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 00:29 UTC

I've been working on transitioning my tradspool setup to a new server with
CNFS buffers. Some of my previous threads I asked about ways the increase
concurrency to respool all the articles from server to server.

When I used innfeed no matter the connection count or dynamic-method chosen,
it takes 35 days to transmit 256 million articles to the new server. I decided
to start experimenting with multiple innxmit processes instead. When I split
my batch file into 20 parts and start the innxmit processes, somewhere between
10-15 innxmit processes (so the same number of connection to the remote
server) the remote server starts using 100% CPU but stops accepting articles
and prints the following in the logs:

Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file
descriptor

I have to forcefully kill innd, it will not respond to any commands, etc. I'm
wondering exactly what these errors mean. This server is running FreeBSD 13.2,
has 32GB of physical memory, 64GB of swap (not using any) and INN is the only
thing running.

With 10 innxmit processes on the remote server running I'm pushing ~18Mbps of
traffic and according the stats am accepting ~400,000 articles per 10 minutes,
but I have quite a bit of spare CPU and I/O cycles on both ends and would like
to go faster, if I can.

Regards,

Jesse

Re: INN 2.7.1 death spiral under high load?

<u7ljr8$4de$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1807&group=news.software.nntp#1807

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 03:57:28 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7ljr8$4de$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 03:57:28 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="4526"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:5aE8Jf1j/hX7PN+hl4Rd8iLKOpc= sha256:Sw5sLU5Cplta8VeZ3XeBMcD5m+pG6qncsH+KNRtoKGg=
sha1:doonbZjbiJu6DXNlr1k6oB0dB9s= sha256:PpDl67n/sfjFeQo3o1t2HBU8uTn0WfBFmDQrJe16bbg=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 03:57 UTC

On Jun 29, 2023 at 7:29:09 PM CDT, "Jesse Rehmer"
<jesse.rehmer@blueworldhosting.com> wrote:

> I've been working on transitioning my tradspool setup to a new server with
> CNFS buffers. Some of my previous threads I asked about ways the increase
> concurrency to respool all the articles from server to server.
>
> When I used innfeed no matter the connection count or dynamic-method chosen,
> it takes 35 days to transmit 256 million articles to the new server. I decided
> to start experimenting with multiple innxmit processes instead. When I split
> my batch file into 20 parts and start the innxmit processes, somewhere between
> 10-15 innxmit processes (so the same number of connection to the remote
> server) the remote server starts using 100% CPU but stops accepting articles
> and prints the following in the logs:
>
> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file
> descriptor
>
> I have to forcefully kill innd, it will not respond to any commands, etc. I'm
> wondering exactly what these errors mean. This server is running FreeBSD 13.2,
> has 32GB of physical memory, 64GB of swap (not using any) and INN is the only
> thing running.
>
> With 10 innxmit processes on the remote server running I'm pushing ~18Mbps of
> traffic and according the stats am accepting ~400,000 articles per 10 minutes,
> but I have quite a bit of spare CPU and I/O cycles on both ends and would like
> to go faster, if I can.
>
> Regards,
>
> Jesse

Purposely put it into this state and gathered some more information and the
logs eventually gave more details.

When running truss I see this happening at a high rate and nothing else:

select(1047,{ 12 14 15 17 },{ 1027 1028 1030 1031 1035 1036 1037 1041 },0x0,{
198.000000 }) = 8 (0x8)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

After a few minutes the following is printed to news.notice:

Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com:647 NCmode
"MODE STREAM" received
Jun 29 22:51:18 gatekeeper innd[16269]: free:-1 20 free but was in SMASK
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
0x7c4d40000: Bad file descriptor
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
0x7c4d40000: Bad file descriptor
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
0x7c4d40000: Bad file descriptor
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed
Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com connected 20
streaming allowed
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
0x7c4d40000: Illegal seek
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
0x7c4d40000: Illegal seek
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed
Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com:22 22
sleeping without Waker
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: CNFSflushallheads: flushing TEXT
Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: CNFSflushallheads: flushing
BINARY
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER throttled Illegal seek writing
SMstore file -- throttling
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER throttle Illegal seek writing
SMstore file -- throttling
Jun 29 22:51:18 gatekeeper innd[16269]: SERVER cant store article: lseek
failed

Re: INN 2.7.1 death spiral under high load?

<83cb1679b1a662d781581d625828492a$1@sybershock.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1808&group=news.software.nntp#1808

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!rocksolid2!.POSTED!not-for-mail
From: admin@sybershock.com (Syber Shock)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Thu, 29 Jun 2023 22:51:36 -0500
Organization: sybershock.com
Message-ID: <83cb1679b1a662d781581d625828492a$1@sybershock.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: novabbs.org;
logging-data="4052591"; mail-complaints-to="usenet@novabbs.org";
posting-account="TzG3Hl99Aa0Fgb506WreKRgRTO2mG9+aGjVDifyfNqo";
 by: Syber Shock - Fri, 30 Jun 2023 03:51 UTC

On Fri, 30 Jun 2023 00:29:09 -0000 (UTC)
Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:

> I've been working on transitioning my tradspool setup to a new server
> with CNFS buffers. Some of my previous threads I asked about ways the
> increase concurrency to respool all the articles from server to
> server.

<snip>
> With 10 innxmit processes on the remote server running I'm pushing
> ~18Mbps of traffic and according the stats am accepting ~400,000
> articles per 10 minutes, but I have quite a bit of spare CPU and I/O
> cycles on both ends and would like to go faster, if I can.
>
> Regards,
>
> Jesse

My immediate thought is to ask, can the tradspool directories be
pulled from a local tradspool mirror to CNFS storage on the same
SSD/HDD? If yes, would it be faster to script some batch jobs of pushing
the tradspool to tarballs and/or rsync and send the data over the
network to the target machine?

My thought is to tarball or rsync blocks of folders to the target
machine in compressed tradspool blobs. Then deflate and pipe from local
to the CNFS import jobs on a local proxy or unix socket, if that is
possible.

--
SugarBug | https://sybershock.com

Re: INN 2.7.1 death spiral under high load?

<nsn.20230630081035.779@scatha.ancalagon.de>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1810&group=news.software.nntp#1810

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!news.chmurka.net!weretis.net!feeder8.news.weretis.net!news.szaf.org!thangorodrim.ancalagon.de!.POSTED.scatha.ancalagon.de!not-for-mail
From: thh@thh.name (Thomas Hochstein)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 08:10:36 +0200
Message-ID: <nsn.20230630081035.779@scatha.ancalagon.de>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=us-ascii
Content-Transfer-Encoding: 7bit
Injection-Info: thangorodrim.ancalagon.de; posting-host="scatha.ancalagon.de:10.0.1.1";
logging-data="11206"; mail-complaints-to="abuse@th-h.de"
User-Agent: ForteAgent/8.00.32.1272
Cancel-Lock: sha1:wmb0uaVw852EcYj5KczHnnCFYB0=
X-NNTP-Posting-Date: Fri, 30 Jun 2023 08:10:35 +0200
X-Clacks-Overhead: GNU Terry Pratchett
X-Face: *OX>R5kq$7DjZ`^-[<HL?'n9%\ZDfCz/_FfV0_tpx7w{Vv1*byr`TC\[hV:!SJosK'1gA>1t8&@'PZ-tSFT*=<}JJ0nXs{WP<@(=U!'bOMMOH&Q0}/(W_d(FTA62<r"l)J\)9ERQ9?6|_7T~ZV2Op*UH"2+1f9[va
 by: Thomas Hochstein - Fri, 30 Jun 2023 06:10 UTC

Jesse Rehmer schrieb:

> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

Perhaps you're running into resource limits?

| news@weidegrund /etc/news $ ulimit -a
[...]
| open files (-n) 1024
[...]

Re: INN 2.7.1 death spiral under high load?

<87y1k1cw0o.fsf@hope.eyrie.org>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1811&group=news.software.nntp#1811

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!news.chmurka.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Thu, 29 Jun 2023 23:19:19 -0700
Organization: The Eyrie
Message-ID: <87y1k1cw0o.fsf@hope.eyrie.org>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<nsn.20230630081035.779@scatha.ancalagon.de>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="28946"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:0Z5xbWydaX+cT7NoS22GrRat+aw=
 by: Russ Allbery - Fri, 30 Jun 2023 06:19 UTC

Thomas Hochstein <thh@thh.name> writes:
> Jesse Rehmer schrieb:

>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
>> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor

> Perhaps you're running into resource limits?

> | news@weidegrund /etc/news $ ulimit -a
> [...]
> | open files (-n) 1024
> [...]

I was wondering about that too, but the SMASK error is very specific to a
file descriptor being included in the select mask despite not
corresponding to an open channel. That implies something has gone
seriously wrong with the file descriptor tracking. innd then closes that
file descriptor, which I bet is what's causing the CNFS errors since I bet
it's closing the cycbuff file descriptor thinking it's a remote peer.

Why this is happening, I don't know. The number of open connections
described is not large and is way, way lower than I have run servers with
in the past. You can see that the file descriptor numbers (35, 36, 39)
are also quite low.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Re: INN 2.7.1 death spiral under high load?

<u7mee5$2gqf2$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1814&group=news.software.nntp#1814

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:31:17 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7mee5$2gqf2$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7ljr8$4de$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 11:31:17 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="2648546"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:YsZkjMV6QULcCUDdJxptadEUfG0= sha256:jYJR53l/v2sdO7uiGhTLdxo2vqYGa8TDlqwW2XSzOWU=
sha1:gvWp76xorC2PS75bKb0dnVBl9mg= sha256:WQII7E9FwzAxS/B0unvxqpytZ9QGti59Z3x8vNivOjI=
In-Reply-To: <u7ljr8$4de$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Fri, 30 Jun 2023 11:31 UTC

Hi Jesse,

> When running truss I see this happening at a high rate and nothing else:
>
> select(1047,{ 12 14 15 17 },{ 1027 1028 1030 1031 1035 1036 1037 1041 },0x0,{
> 198.000000 }) = 8 (0x8)
> sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
> (0x0)
> sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

Seems like when all the channels are sleeping, the loop eats the CPU
with tons of select()?
We would need understanding why they are all in sleep state whereas
you're pushing bunches of articles.

At the very end of innd/chan.c:

while (1) {
select(...);

/* If there is nothing to do, break out.
FIXME: Do we really want to keep going until sleep_count is
zero? That means that when there are sleeping channels, we do
a full traversal every time through the select loop. */

if (count == 0 && channels.sleep_count == 0)
break;
} while (fd != startpoint);

> After a few minutes the following is printed to news.notice:
>
> Jun 29 22:51:18 gatekeeper innd[16269]: news.blueworldhosting.com:647 NCmode
> "MODE STREAM" received
> Jun 29 22:51:18 gatekeeper innd[16269]: free:-1 20 free but was in SMASK
> Jun 29 22:51:18 gatekeeper innd[16269]: CNFS: lseek failed for 'TEXT' offset
> 0x7c4d40000: Bad file descriptor

It would be interesting to activate trace logs on channels.
=> ctlinnd trace i y
(on the receiving peer)
and then you start the innxmit processes on the sending peer.

Maybe we'll have interesting information about what's going on.
You'll have to make sure the debug level for news is correctly logged
(and not discarded) in your syslog configuration.

--
Julien ÉLIE

« Sème du bonheur dans le champ du voisin, tu seras surpris de constater
ce que le vent fera produire au tien. » (Juliette Saint Gelais)

Re: INN 2.7.1 death spiral under high load?

<u7mee7$2gqf2$2@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1815&group=news.software.nntp#1815

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:31:19 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7mee7$2gqf2$2@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<nsn.20230630081035.779@scatha.ancalagon.de> <87y1k1cw0o.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 11:31:19 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="2648546"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:Kp5enxIswSAoQTJoaBJrXEWBwDA= sha256:sbNZwuippzgro8t1aHKa8wkVwNGtg12Dcac8oZulS6o=
sha1:uh20Am3oz1kUjDv3lTen2ydO+Rc= sha256:cBOrvmRnsiy0TpWwGjb5QcO5HT8uxMJxZiD2I699G/Y=
In-Reply-To: <87y1k1cw0o.fsf@hope.eyrie.org>
 by: Julien ÉLIE - Fri, 30 Jun 2023 11:31 UTC

Hi Russ,

>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
>>> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor
>
> I was wondering about that too, but the SMASK error is very specific to a
> file descriptor being included in the select mask despite not
> corresponding to an open channel. That implies something has gone
> seriously wrong with the file descriptor tracking. innd then closes that
> file descriptor, which I bet is what's causing the CNFS errors since I bet
> it's closing the cycbuff file descriptor thinking it's a remote peer.

It seems like a channel can both be waiting to write and sleeping.
SITEchanclose() sets them both. (It may not be the right thing as
normally only one state should be given at a time as far as I understand
how states are supposed to work.)

But anyway, it does not explain why we end up with a channel of type
CTfree considered to be sleeping (which is the condition causing the
"free:-1 35 free but was in SMASK" error).
As far as I see, the sleeping state is removed when a channel gets the
CTfree type.

** Jesse, do you happen to have any "internal closing free channel"
error in your logs? In this case, the sleeping state is not explicitly
enforced.

--
Julien ÉLIE

« Whenever you set out to do something, something else must be done
first. » (Murphy's Fourth Corollary)

Re: INN 2.7.1 death spiral under high load?

<u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1818&group=news.software.nntp#1818

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!2.eu.feeder.erje.net!feeder.erje.net!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 12:36:07 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <nsn.20230630081035.779@scatha.ancalagon.de> <87y1k1cw0o.fsf@hope.eyrie.org> <u7mee7$2gqf2$2@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 12:36:07 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="74748"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:RbzaaXC1tWIvm9SN9AzL4BtffII= sha256:94DCLTihGEWquYYm4VwQop9qq4FH6Mn30bhppP4ATPY=
sha1:2aTS8YdayWWeIfNeBq0AbZEbIbA= sha256:jOIcODwe1/upB/wBEmvg6kgY36d/TT+j2A8rc5yuwpI=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 12:36 UTC

On Jun 30, 2023 at 6:31:19 AM CDT, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Russ,
>
>>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 35 free but was in SMASK
>>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 36 free but was in SMASK
>>>> Jun 29 19:13:43 gatekeeper innd[90655]: free:-1 39 free but was in SMASK
>>>> Jun 29 19:13:43 gatekeeper innd[90655]: SERVER cant select: Bad file descriptor
>>
>> I was wondering about that too, but the SMASK error is very specific to a
>> file descriptor being included in the select mask despite not
>> corresponding to an open channel. That implies something has gone
>> seriously wrong with the file descriptor tracking. innd then closes that
>> file descriptor, which I bet is what's causing the CNFS errors since I bet
>> it's closing the cycbuff file descriptor thinking it's a remote peer.
>
> It seems like a channel can both be waiting to write and sleeping.
> SITEchanclose() sets them both. (It may not be the right thing as
> normally only one state should be given at a time as far as I understand
> how states are supposed to work.)
>
> But anyway, it does not explain why we end up with a channel of type
> CTfree considered to be sleeping (which is the condition causing the
> "free:-1 35 free but was in SMASK" error).
> As far as I see, the sleeping state is removed when a channel gets the
> CTfree type.
>
>
> ** Jesse, do you happen to have any "internal closing free channel"
> error in your logs? In this case, the sleeping state is not explicitly
> enforced.

I enabled the trace logs you mentioned. I don't have the specific error you
mentioned above, but on this latest attempt and turning on debug logging in
syslog, I have more tidbits in news.notice:

*****
Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
Jun 30 07:24:18 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 168 in
/usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal
seek
Jun 30 07:24:18 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F846A00000001@
Jun 30 07:24:21 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 169 in
/usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal
seek
Jun 30 07:24:21 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F885000000001@
Jun 30 07:24:22 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2902 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:22 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F896600000001@
Jun 30 07:24:22 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2903 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:22 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F898C00000001@
Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2904 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F8AB300000001@
Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2905 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F8ADB00000001@
Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2906 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F8AED00000001@
Jun 30 07:24:23 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2907 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:23 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F8B0100000001@
Jun 30 07:24:29 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 170 in
/usr/local/news/spool/overview/a/d/u/n/alt.dating.uk.north-west.DAT: Illegal
seek
Jun 30 07:24:29 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F91DC00000001@
Jun 30 07:24:39 gatekeeper innd[84944]: tradindexed: cannot get offset for
article 2908 in /usr/local/news/spool/overview/m/r/misc.rural.DAT: Illegal
seek
Jun 30 07:24:39 gatekeeper innd[84944]: SERVER cant store overview for
@03015445585400000000012F9C8900000001@
Jun 30 07:24:58 gatekeeper innd[84944]: news.blueworldhosting.com:878
checkpoint seconds 52 accepted 2000 refused 0 rejected 0 duplicate 0 accepted
size 5263200 duplicate size 0 rejected size 0
Jun 30 07:25:13 gatekeeper innd[84944]: news.blueworldhosting.com:22
checkpoint seconds 98 accepted 1998 refused 0 rejected 2 duplicate 0 accepted
size 10925664 duplicate size 0 rejected size 237059
Jun 30 07:25:18 gatekeeper innd[84944]: news.blueworldhosting.com:821
checkpoint seconds 65 accepted 2000 refused 0 rejected 0 duplicate 0 accepted
size 5087052 duplicate size 0 rejected size 0
Jun 30 07:25:20 gatekeeper innd[84944]: SERVER cant select: Bad file
descriptor
Jun 30 07:25:50 gatekeeper syslogd: last message repeated 561451 times
*****

The news.debug file has a ton of entries so I'm not sure what I'm looking for,
but it 'ends' like this:

*****
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader
Used=2896
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc
Used=3509
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
cp->Start=0 cp->Next=2896 bp->Used=3509
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc
state=8 next "y 'Quickie' Sto"
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20 gatekeeper
innd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc
state=2 Start=3509 Next=3509 Used=3509
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
cp->Start=3509 cp->Next=3509 bp->Used=3509
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc
state=2 next "as an^M >> "inva"
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCproc
state=2 Start=3509 Next=3509 Used=3509
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCreader
Used=2873
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc
Used=2890
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821
cp->Start=0 cp->Next=2873 bp->Used=2890
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc
state=8 next "sympatico.ca^M ."
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821
NCwritereply 18=write(821, "239 <76204197@1", 18)Jun 30 07:25:20 gatekeeper
innd[84944]: news.blueworldhosting.com:821 >239 <76204197@1>
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc
state=2 Start=2890 Next=2890 Used=2890
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821
cp->Start=2890 cp->Next=2890 bp->Used=2890
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc
state=2 next "f: Technique, c"
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:821 NCproc
state=2 Start=2890 Next=2890 Used=2890
Jun 30 07:25:20 gatekeeper innd[84944]: SERVER cant select: Bad file
descriptor
Jun 30 07:25:50 gatekeeper syslogd: last message repeated 561451 times
Jun 30 07:27:51 gatekeeper syslogd: last message repeated 2241177 times
Jun 30 07:29:17 gatekeeper syslogd: last message repeated 1581962 times
Jun 30 07:29:17 gatekeeper innd[84944]: ME time 600000 hishave 5402(83753)
hiswrite 11495(83560) hissync 11(2) idle 284870(4542195) artclean 499(83752)
artwrite 4690(83554) artcncl 77(240) hisgrep/artcncl 35(240) overv
24253(83554) perl 49799(83560) python 26(83561) nntpread 1307(175291) artparse
2053(239726) artlog 947(83786) datamove 253(145273)
Jun 30 07:29:17 gatekeeper innd[84944]: ME HISstats 77 hitpos 0 hitneg 115
missed 83561 dne
Jun 30 07:29:17 gatekeeper innd[84944]: SERVER cant select: Bad file
descriptor
Jun 30 07:29:47 gatekeeper syslogd: last message repeated 574345 times
*****

The innd process is using 100% of CPU and truss shows the following over and
over (I'm going to let it run for a while and see if it eventually dies):

*****
__sysctl("kern.hostname",2,0x2bfb054c64d0,0x2bfb054c23b8,0x0,0) = 0 (0x0)
getpid() = 84944 (0x14bd0)
sendto(3,"<59>1 2023-06-30T07:31:55.121776"...,125,0,NULL,0) = 125 (0x7d)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1057,{ 3 12 14 15 17 22 24 29 30 31 32 226 736 737 821 878 },{ },0x0,{
442.000000 }) ERR#9 'Bad file descriptor'
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
write(2,"innd: SERVER cant select: Bad fi"...,46) = 46 (0x2e)
*****


Click here to read the complete article
Re: INN 2.7.1 death spiral under high load?

<u7mjh7$2h1vi$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1819&group=news.software.nntp#1819

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 14:58:15 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7mjh7$2h1vi$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<nsn.20230630081035.779@scatha.ancalagon.de> <87y1k1cw0o.fsf@hope.eyrie.org>
<u7mee7$2gqf2$2@news.trigofacile.com>
<u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 12:58:15 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="2656242"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:OsNQYGjXiUAYWEa5IoKfI04AME8= sha256:3pvsNdmtuV3lkwKTzk3CaDGvzrplI1s+cuFSKNiyvZ4=
sha1:cRa9+krzPoMKeB+50/PB15LsltQ= sha256:Jwnka5uXNnEnyg0XP1/Gfg1w1GGGcFQcRw6i8jQGtq8=
In-Reply-To: <u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Fri, 30 Jun 2023 12:58 UTC

Hi Jesse,

> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
[...]
> Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader
> Used=2896
> Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
> NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20 gatekeeper
> innd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>

It would be interesting to know what happened to channels 29 to 32.
Do they appear in the logs as channels?
Do you see patterns like "write(29, " or ":29 "?

--
Julien ÉLIE

« Animi limina. »

Re: INN 2.7.1 death spiral under high load?

<u7ml5b$28a3$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1820&group=news.software.nntp#1820

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:26:03 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7ml5b$28a3$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mee7$2gqf2$2@news.trigofacile.com> <u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com> <u7mjh7$2h1vi$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 13:26:03 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="74051"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:MIobt7uALvJznaj0jN52yU3JjyA= sha256:vhKFCpMkG4OmnaJ3vMqjKLKwmup3POJsgcUJ+K6W1pQ=
sha1:bvi1mk0fMaDC/45rnxOER7qg7I8= sha256:+cO/ACy3G/KAxdOtmuXCLR96UOosS+PtQIz6nivQAyc=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 13:26 UTC

On Jun 30, 2023 at 7:58:15 AM CDT, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>
>> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 29 free but was in SMASK
>> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 30 free but was in SMASK
>> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 31 free but was in SMASK
>> Jun 30 07:24:18 gatekeeper innd[84944]: free:-1 32 free but was in SMASK
> [...]
>> Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226 NCreader
>> Used=2896
>> Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:226
>> NCwritereply 18=write(226, "239 <58663374@1", 18)Jun 30 07:25:20
>> gatekeeperinnd[84944]: news.blueworldhosting.com:226 >239 <58663374@1>
>
> It would be interesting to know what happened to channels 29 to 32.
> Do they appear in the logs as channels?
> Do you see patterns like "write(29, " or ":29 "?

In news.debug I have this which seems to be the last messages logged by the
channels?

Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29
NCwritereply 19=write(29, "239 <168029265@", 19)Jun 30 07:25:20 gatekeeper
innd[84944]: news.blueworldhosting.com:29 >239 <168029265@1>
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc
state=2 Start=1372 Next=1372 Used=1372
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29
cp->Start=1372 cp->Next=1372 bp->Used=1372
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc
state=2 next "@mid.individual"
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:29 NCproc
state=2 Start=1372 Next=1372 Used=1372

Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc
state=2 next "TAKETHIS <14960"Jun 30 07:25:20 gatekeeper innd[84944]:
news.blueworldhosting.com:30 < TAKETHIS <149602012@1>
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc
state=7 Start=1550 Next=1550 Used=2896
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30
cp->Start=1550 cp->Next=1550 bp->Used=2896
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc
state=7 next "Path: spool1.us"
Jun 30 07:25:20 gatekeeper innd[84944]: news.blueworldhosting.com:30 NCproc
state=8 Start=1550 Next=2896 Used=2896

There is so much output it is difficult for me to pick out specifics. If it
would help I can upload a core dump I got and logs to a web server where they
can be grabbed?

Re: INN 2.7.1 death spiral under high load?

<u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1821&group=news.software.nntp#1821

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:39:32 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mee7$2gqf2$2@news.trigofacile.com> <u7mi7n$28vs$1@nnrp.usenet.blueworldhosting.com> <u7mjh7$2h1vi$1@news.trigofacile.com> <u7ml5b$28a3$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 13:39:32 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="63318"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:bDEdVZCDTr/dboL+tJ0vr53n4eM= sha256:/3Rm00vO2UMJOQm54DI2QjA9Wd8KvtPRPQXyYsZsFdw=
sha1:gN9XwVLjP/hX+ZEC5IEbI/DHEX0= sha256:qpw9Xsb3+KQZQDhwPg2Ab7W1RFmXqrgFGB9DVpKKXS0=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 13:39 UTC

On this latest run innd hasn't gone completely into the state I initially saw,
but I do have quite a bit of bad file descriptor messages from tradindexed:

Jun 30 08:13:11 gatekeeper innd[60826]: free:-1 4 free but was in SMASK
Jun 30 08:13:22 gatekeeper innd[60826]: free:-1 21 free but was in SMASK
Jun 30 08:14:01 gatekeeper innd[60826]: tradindexed: cannot write index record
for 1202 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
file descriptor
Jun 30 08:14:01 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000131775A00000001@
Jun 30 08:14:03 gatekeeper innd[60826]: tradindexed: cannot write index record
for 1203 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
file descriptor
Jun 30 08:14:03 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000131780200000001@
Jun 30 08:14:05 gatekeeper innd[60826]: tradindexed: cannot write index record
for 1204 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
file descriptor
Jun 30 08:14:05 gatekeeper innd[60826]: SERVER cant store overview for
@0301544558540000000001317C9B00000001@
Jun 30 08:24:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 15855 in
/usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
descriptor
Jun 30 08:24:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000133D26300000001@
Jun 30 08:24:23 gatekeeper innd[60826]: tradindexed: cannot write index record
for 15856 in
/usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
descriptor
Jun 30 08:24:23 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000133DF7700000001@
Jun 30 08:24:47 gatekeeper innd[60826]: tradindexed: cannot write index record
for 15857 in
/usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
descriptor
Jun 30 08:24:47 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000133F33D00000001@
Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19832 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136542C00000001@
Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19833 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136545A00000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19834 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136548900000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19835 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013654A800000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19836 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013654B400000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19837 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013654C900000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19838 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013654E800000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19839 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013654F400000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19840 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136550400000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19841 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136550600000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19842 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136559900000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19843 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013655BF00000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19844 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013655C500000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19845 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013655D900000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19846 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013655FE00000001@
Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19847 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136560300000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19848 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136560900000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19849 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136561400000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19850 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136561C00000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19851 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136562F00000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19852 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136563C00000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19853 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136564B00000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19854 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013656B000000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19855 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013656B100000001@
Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19856 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013656CB00000001@
Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19857 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136571900000001@
Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19858 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136574600000001@
Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19859 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136575100000001@
Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19860 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013657F200000001@
Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19861 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136588400000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19862 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136593300000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19863 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136593B00000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19864 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136594B00000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19865 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136595600000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19866 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136596700000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19867 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136597900000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19868 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136598100000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19869 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000136598B00000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19870 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013659CB00000001@
Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19871 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013659DA00000001@
Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19872 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013659EB00000001@
Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
for 19873 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
file descriptor
Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for
@03015445585400000000013659FE00000001@


Click here to read the complete article
Re: INN 2.7.1 death spiral under high load?

<u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1822&group=news.software.nntp#1822

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:51:08 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mjh7$2h1vi$1@news.trigofacile.com> <u7ml5b$28a3$1@nnrp.usenet.blueworldhosting.com> <u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 13:51:08 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="42652"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:J5FUZ4m29foXcVBJmFi4ymPXGBU= sha256:Kkt83Pz8mSIIMH0VLV46lFFGgTI/lcfrTgrStPPYdmI=
sha1:Qw88ZjZW768iAfYj4oGpGXd2PFk= sha256:/HrVmX8T9kQ5vkxhswsnKk9EkZLXEHgnTR+V/jiUeEc=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 13:51 UTC

On Jun 30, 2023 at 8:39:32 AM CDT, "Jesse Rehmer"
<jesse.rehmer@blueworldhosting.com> wrote:

> On this latest run innd hasn't gone completely into the state I initially saw,
> but I do have quite a bit of bad file descriptor messages from tradindexed:
>
> Jun 30 08:13:11 gatekeeper innd[60826]: free:-1 4 free but was in SMASK
> Jun 30 08:13:22 gatekeeper innd[60826]: free:-1 21 free but was in SMASK
> Jun 30 08:14:01 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 1202 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
> file descriptor
> Jun 30 08:14:01 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000131775A00000001@
> Jun 30 08:14:03 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 1203 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
> file descriptor
> Jun 30 08:14:03 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000131780200000001@
> Jun 30 08:14:05 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 1204 in /usr/local/news/spool/overview/d/a/d/de.alt.dummschwatz.IDX: Bad
> file descriptor
> Jun 30 08:14:05 gatekeeper innd[60826]: SERVER cant store overview for
> @0301544558540000000001317C9B00000001@
> Jun 30 08:24:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 15855 in
> /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
> descriptor
> Jun 30 08:24:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000133D26300000001@
> Jun 30 08:24:23 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 15856 in
> /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
> descriptor
> Jun 30 08:24:23 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000133DF7700000001@
> Jun 30 08:24:47 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 15857 in
> /usr/local/news/spool/overview/a/p/l/alt.politics.libertarian.IDX: Bad file
> descriptor
> Jun 30 08:24:47 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000133F33D00000001@
> Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19832 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136542C00000001@
> Jun 30 08:34:09 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19833 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:09 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136545A00000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19834 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136548900000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19835 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013654A800000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19836 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013654B400000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19837 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013654C900000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19838 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013654E800000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19839 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013654F400000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19840 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136550400000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19841 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136550600000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19842 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136559900000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19843 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013655BF00000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19844 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013655C500000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19845 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013655D900000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19846 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013655FE00000001@
> Jun 30 08:34:10 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19847 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:10 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136560300000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19848 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136560900000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19849 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136561400000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19850 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136561C00000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19851 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136562F00000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19852 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136563C00000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19853 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136564B00000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19854 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013656B000000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19855 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013656B100000001@
> Jun 30 08:34:11 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19856 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:11 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013656CB00000001@
> Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19857 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136571900000001@
> Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19858 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136574600000001@
> Jun 30 08:34:12 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19859 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:12 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136575100000001@
> Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19860 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013657F200000001@
> Jun 30 08:34:13 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19861 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:13 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136588400000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19862 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136593300000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19863 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136593B00000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19864 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136594B00000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19865 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136595600000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19866 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136596700000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19867 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136597900000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19868 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136598100000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19869 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @030154455854000000000136598B00000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19870 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013659CB00000001@
> Jun 30 08:34:14 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19871 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:14 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013659DA00000001@
> Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19872 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013659EB00000001@
> Jun 30 08:34:15 gatekeeper innd[60826]: tradindexed: cannot write index record
> for 19873 in /usr/local/news/spool/overview/a/m/c/alt.machines.cnc.IDX: Bad
> file descriptor
> Jun 30 08:34:15 gatekeeper innd[60826]: SERVER cant store overview for
> @03015445585400000000013659FE00000001@
>
> I don't see the OS reporting any filesystem or issues with the block device
> and tdx-util doesn't report any issues.


Click here to read the complete article
Re: INN 2.7.1 death spiral under high load?

<u7mn40$o5i$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1823&group=news.software.nntp#1823

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 13:59:28 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7mn40$o5i$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mjh7$2h1vi$1@news.trigofacile.com> <u7ml5b$28a3$1@nnrp.usenet.blueworldhosting.com> <u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com> <u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 13:59:28 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="24754"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:2fNkCmGj0RH3WdsXNlDARebjde4= sha256:5iKWirAwWVvrdv/RDCG5v/JFiazEewONCftmDGDVUWg=
sha1:jOCoBZUp276XNCEyF21mx4SWd14= sha256:ld1ifqVJbNqQqxELIMGDedbExDjDXiGxVLE4JvE8StI=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 13:59 UTC

It seems I've hit a 'stop' of some sort, or maybe it's just the throttle
behavior but nothing has changed for several minutes.

news.notice:

Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttled Bad file descriptor
writing creating overview file -- throttling
Jun 30 08:44:12 gatekeeper innd[60826]: SERVER throttle Bad file descriptor
writing creating overview file -- throttling
Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant store overview for
@030154455854000000000138F94B00000001@
Jun 30 08:44:12 gatekeeper innd[60826]: SERVER cant write history
<20051124113600.38de321b@debianmail.imac> Bad file descriptor
****SNIPPED checkpoint / closed messages****
Jun 30 08:54:02 gatekeeper innd[60826]: ME status seconds 0 accepted 0 refused
0 rejected 0 duplicate 0 accepted size 1 duplicate size 0 rejected size 0
Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1028
inactive 40
Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1045
inactive 40
Jun 30 08:54:02 gatekeeper innd[60826]: ME time 600939 hishave 192(3318)
hiswrite 471(3317) hissync 0(0) idle 594219(5121) artclean 23(3318) artwrite
498(3318) artcncl 0(1) hisgrep/artcncl 0(1) overv 2343(3318) hissync/overv
13(1) perl 1619(3318) python 81(3318) nntpread 58(7125) artparse 75(8613)
artlog 35(3337) datamove 12(7346)
Jun 30 08:54:02 gatekeeper innd[60826]: ME HISstats 0 hitpos 0 hitneg 0 missed
0 dne

news.debug:

Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 NCproc
Used=3897
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92
cp->Start=0 cp->Next=2896 bp->Used=3897
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 NCproc
state=8 next "ivist ^M Richard"
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92
checkpoint seconds 203 accepted 1842 refused 0 rejected 2 duplicate 0 accepted
size 5794716 duplicate size 0 rejected size 181352
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:92 closed
seconds 1851 accepted 17826 refused 0 rejected 18 duplicate 16 accepted size
52758348 duplicate size 39793 rejected size 221145
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCreader
Used=1448
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCproc
Used=2310
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882
cp->Start=0 cp->Next=1448 bp->Used=2310
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 NCproc
state=8 next "-42e3-8733-69a8"
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882
checkpoint seconds 115 accepted 881 refused 0 rejected 1 duplicate 0 accepted
size 2532336 duplicate size 0 rejected size 2310
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:882 closed
seconds 1877 accepted 14862 refused 0 rejected 20 duplicate 16 accepted size
77809304 duplicate size 294673 rejected size 967053
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCreader
Used=2896
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCproc
Used=4323
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649
cp->Start=0 cp->Next=2896 bp->Used=4323
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 NCproc
state=8 next "surface, GFS ha"
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649
checkpoint seconds 141 accepted 1905 refused 0 rejected 1 duplicate 0 accepted
size 3706440 duplicate size 0 rejected size 4323
Jun 30 08:44:12 gatekeeper innd[60826]: news.blueworldhosting.com:649 closed
seconds 956 accepted 11887 refused 0 rejected 19 duplicate 16 accepted size
23317392 duplicate size 27153 rejected size 38795
Jun 30 08:54:02 gatekeeper innd[60826]: ME status seconds 0 accepted 0 refused
0 rejected 0 duplicate 0 accepted size 1 duplicate size 0 rejected size 0
Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1028
inactive 40
Jun 30 08:54:02 gatekeeper innd[60826]: news.blueworldhosting.com:1045
inactive 40
Jun 30 08:54:02 gatekeeper innd[60826]: ME time 600939 hishave 192(3318)
hiswrite 471(3317) hissync 0(0) idle 594219(5121) artclean 23(3318) artwrite
498(3318) artcncl 0(1) hisgrep/artcncl 0(1) overv 2343(3318) hissync/overv
13(1) perl 1619(3318) python 81(3318) nntpread 58(7125) artparse 75(8613)
artlog 35(3337) datamove 12(7346)
Jun 30 08:54:02 gatekeeper innd[60826]: ME HISstats 0 hitpos 0 hitneg 0 missed
0 dne

Truss output just stops with this:

root@gatekeeper:/var/log/news # truss -p 60826
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)

Re: INN 2.7.1 death spiral under high load?

<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1824&group=news.software.nntp#1824

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 14:38:18 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com> <u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com> <u7mn40$o5i$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 14:38:18 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="39888"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:t+h99ojNmESfXWd3OksjgSEk8fg= sha256:4q0zYMNtFflnV2ezdx+Eu9Skknk6LfCxvebqO4txH9s=
sha1:WBK47mMAJnqdSCW6Pq+DBoFW6PA= sha256:e+Qv9vfzvFk885BGcasD58mIF2gDNPv/KTWABN57bDI=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Fri, 30 Jun 2023 14:38 UTC

After letting things sit for a while without any connections truss did output
a little bit, and I decided to run a innxmit process on the remote side and in
the logs I see:

Jun 30 09:32:58 gatekeeper innd[60826]: any:0 1033 sleeping without Waker
Jun 30 09:32:58 gatekeeper innd[60826]: any:0 closed
Jun 30 09:32:58 gatekeeper innd[60826]: free:-1 1033 free but was in SMASK
Jun 30 09:33:12 gatekeeper innd[60826]: any:0 1024 sleeping without Waker
Jun 30 09:33:12 gatekeeper innd[60826]: any:0 closed
Jun 30 09:33:21 gatekeeper innd[60826]: free:-1 1024 free but was in SMASK
Jun 30 09:33:22 gatekeeper innd[60826]: any:0 1034 sleeping without Waker
Jun 30 09:33:22 gatekeeper innd[60826]: any:0 closed
Jun 30 09:33:22 gatekeeper innd[60826]: free:-1 1034 free but was in SMASK
Jun 30 09:33:23 gatekeeper innd[60826]: free:-1 1024 free but was in SMASK

root@gatekeeper:/var/log/news # truss -p 60826
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 137.000000 }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9
(0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
getpid() = 60826 (0xed9a)
__sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:04:03.389851"...,198,0,NULL,0) = 198 (0xc6)
write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e)
open("/dev/null",O_RDONLY,0666) = 9 (0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:04:03.392944"...,128,0,NULL,0) = 128 (0x80)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:04:03.393756"...,128,0,NULL,0) = 128 (0x80)
__sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:04:03.394272"...,325,0,NULL,0) = 325 (0x145)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:04:03.394707"...,130,0,NULL,0) = 130 (0x82)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 104.152.52.99:53903 },0x2231d0315a5c) = 9 (0x9)
setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03158dc,4) = 0 (0x0)
fcntl(9,F_GETFL,) = 6 (0x6)
fcntl(9,F_SETFL,O_RDWR) = 0 (0x0)
fork() = 85116 (0x14c7c)
close(9) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 511.000000 }) ERR#4 'Interrupted system
call'
SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=85116 uid=8 status=1
sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0)
wait4(-1,{ EXITED,val=1 },WNOHANG,0x0) = 85116 (0x14c7c)
wait4(-1,0x2231d0316484,WNOHANG,0x0) ERR#10 'No child processes'
sigreturn(0x2231d03164b0) EJUSTRETURN
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 511.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 104.152.52.199:50860 },0x2231d0315a5c) = 9 (0x9)
setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03158dc,4) = 0 (0x0)
fcntl(9,F_GETFL,) = 6 (0x6)
fcntl(9,F_SETFL,O_RDWR) = 0 (0x0)
fork() = 85997 (0x14fed)
close(9) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 464.000000 }) ERR#4 'Interrupted system
call'
SIGNAL 20 (SIGCHLD) code=CLD_EXITED pid=85997 uid=8 status=1
sigprocmask(SIG_SETMASK,{ SIGCHLD },0x0) = 0 (0x0)
wait4(-1,{ EXITED,val=1 },WNOHANG,0x0) = 85997 (0x14fed)
wait4(-1,0x2231d0316484,WNOHANG,0x0) ERR#10 'No child processes'
sigreturn(0x2231d03164b0) EJUSTRETURN
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1046,{ 12 14 15 17 },{ },0x0,{ 463.000000 }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9
(0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
getpid() = 60826 (0xed9a)
__sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.408448"...,198,0,NULL,0) = 198 (0xc6)
write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e)
open("/dev/null",O_RDONLY,0666) = 9 (0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.409333"...,128,0,NULL,0) = 128 (0x80)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.409640"...,124,0,NULL,0) = 124 (0x7c)
__sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.409953"...,233,0,NULL,0) = 233 (0xe9)
__sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.410270"...,229,0,NULL,0) = 229 (0xe5)
close(1028) = 0 (0x0)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.410712"...,128,0,NULL,0) = 128 (0x80)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.411017"...,124,0,NULL,0) = 124 (0x7c)
__sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.411323"...,233,0,NULL,0) = 233 (0xe9)
__sysctl("kern.hostname",2,0x2231d0316940,0x2231d0312828,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.411634"...,229,0,NULL,0) = 229 (0xe5)
close(1045) = 0 (0x0)
__sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.412088"...,324,0,NULL,0) = 324 (0x144)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:14:03.412390"...,130,0,NULL,0) = 130 (0x82)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
open("/usr/local/news/http/inn_status.html",O_WRONLY|O_CREAT|O_TRUNC,0666) = 9
(0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
fstat(4,{ mode=-rw-rw-r-- ,inode=3126,size=0,blksize=4096 }) = 0 (0x0)
getpid() = 60826 (0xed9a)
__sysctl("kern.hostname",2,0x2231d0316620,0x2231d0312508,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:24:03.449870"...,198,0,NULL,0) = 198 (0xc6)
write(4,"<!DOCTYPE html>\n<html lang="en""...,1166) = 1166 (0x48e)
open("/dev/null",O_RDONLY,0666) = 9 (0x9)
dup2(9,4) = 4 (0x4)
close(9) = 0 (0x0)
__sysctl("kern.hostname",2,0x2231d0316930,0x2231d0312818,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:24:03.450754"...,324,0,NULL,0) = 324 (0x144)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:24:03.451068"...,130,0,NULL,0) = 130 (0x82)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 600.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 199.102.165.15:54669 },0x2231d0315a5c) = 9 (0x9)
fcntl(9,F_GETFD,) = 0 (0x0)
fcntl(9,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fcntl(9,F_GETFL,) = 6 (0x6)
fcntl(9,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0)
setsockopt(9,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(9,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(9,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
write(1,"Jun 30 08:44:12.171 - spool1.use"...,3147) = 3147 (0xc4b)
write(2,"innd: any:0 1033 sleeping withou"...,40) = 40 (0x28)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:32:58.242206"...,119,0,NULL,0) = 119 (0x77)
__sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:32:58.242567"...,98,0,NULL,0) = 98 (0x62)
close(0) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 1033 },{ 9 },0x0,{ 66.000000 }) = 2 (0x2)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
write(2,"innd: free:-1 1033 free but was "...,41) = 41 (0x29)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:32:58.243474"...,120,0,NULL,0) = 120 (0x78)
close(1033) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 66.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 199.102.165.15:54670 },0x2231d0315a5c) = 0 (0x0)
fcntl(0,F_GETFD,) = 0 (0x0)
fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fcntl(0,F_GETFL,) = 6 (0x6)
fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
write(2,"innd: any:0 1024 sleeping withou"...,40) = 40 (0x28)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:33:12.269818"...,119,0,NULL,0) = 119 (0x77)
__sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:33:12.270176"...,98,0,NULL,0) = 98 (0x62)
close(0) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 52.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 199.102.165.15:16010 },0x2231d0315a5c) = 0 (0x0)
fcntl(0,F_GETFD,) = 0 (0x0)
fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fcntl(0,F_GETFL,) = 6 (0x6)
fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
write(2,"innd: free:-1 1024 free but was "...,41) = 41 (0x29)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:33:21.619875"...,120,0,NULL,0) = 120 (0x78)
close(1024) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 42.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 199.102.165.15:17875 },0x2231d0315a5c) = 10 (0xa)
fcntl(10,F_GETFD,) = 0 (0x0)
fcntl(10,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fcntl(10,F_GETFL,) = 6 (0x6)
fcntl(10,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0)
setsockopt(10,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(10,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(10,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
write(2,"innd: any:0 1034 sleeping withou"...,40) = 40 (0x28)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:33:22.750984"...,119,0,NULL,0) = 119 (0x77)
__sysctl("kern.hostname",2,0x2231d0316950,0x2231d0312838,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<61>1 2023-06-30T09:33:22.751310"...,98,0,NULL,0) = 98 (0x62)
close(0) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 1034 },{ 10 },0x0,{ 41.000000 }) = 2 (0x2)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
write(2,"innd: free:-1 1034 free but was "...,41) = 41 (0x29)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:33:22.752212"...,120,0,NULL,0) = 120 (0x78)
close(1034) = 0 (0x0)
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)
select(1045,{ 12 14 15 17 },{ },0x0,{ 41.000000 }) = 1 (0x1)
sigprocmask(SIG_SETMASK,{ SIGHUP|SIGPIPE|SIGTERM|SIGCHLD|SIGUSR1 },0x0) = 0
(0x0)
accept(17,{ AF_INET 199.102.165.15:24294 },0x2231d0315a5c) = 0 (0x0)
fcntl(0,F_GETFD,) = 0 (0x0)
fcntl(0,F_SETFD,FD_CLOEXEC) = 0 (0x0)
fcntl(0,F_GETFL,) = 6 (0x6)
fcntl(0,F_SETFL,O_RDWR|O_NONBLOCK) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_SNDBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_RCVBUF,0x2231d03159e4,4) = 0 (0x0)
setsockopt(0,SOL_SOCKET,SO_KEEPALIVE,0x2231d03159e4,4) = 0 (0x0)
write(2,"innd: free:-1 1024 free but was "...,41) = 41 (0x29)
__sysctl("kern.hostname",2,0x2231d03169a0,0x2231d0312888,0x0,0) = 0 (0x0)
getpid() = 60826 (0xed9a)
sendto(3,"<59>1 2023-06-30T09:33:23.636694"...,120,0,NULL,0) = 120 (0x78)
close(1024) ERR#9 'Bad file descriptor'
sigprocmask(SIG_SETMASK,{ },0x0) = 0 (0x0)


Click here to read the complete article
Re: INN 2.7.1 death spiral under high load?

<u7mtup$2h67a$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1825&group=news.software.nntp#1825

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176-143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Fri, 30 Jun 2023 17:56:09 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7mtup$2h67a$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com>
<u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com>
<u7mn40$o5i$1@nnrp.usenet.blueworldhosting.com>
<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 30 Jun 2023 15:56:09 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176-143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="2660586"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:Q6WmkmdQm4dtrmr6zuHNlZJaI2k= sha256:+rV18Mq5B8MutCQ29rFAT913xTBhCS5Q8P7E8pkae+M=
sha1:lnngYhunmTIZ0v5qMMXlyiHAXs0= sha256:695T9ja8w2/jn+ZbIRCfe1w2GjP6E6jpxQm1dkLE6og=
In-Reply-To: <u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Fri, 30 Jun 2023 15:56 UTC

Hi Jesse,

> After letting things sit for a while without any connections truss did output
> a little bit, and I decided to run a innxmit process on the remote side and in
> the logs I see:
>
> Jun 30 09:32:58 gatekeeper innd[60826]: any:0 1033 sleeping without Waker
> Jun 30 09:32:58 gatekeeper innd[60826]: any:0 closed

Hmm, now the file descriptors exceed 1024.

In the man page of select(2) for FreeBSD, I see:

The default size of FD_SETSIZE is currently 1024. In order to accommo-
date programs which might potentially use a larger number of open
files
with select(), it is possible to increase this size by having the
program
define FD_SETSIZE before the inclusion of any header which includes
<sys/types.h>.

There's maybe something to do regarding that as we do not do that...
Nonetheless, it does not explain the previous issue with file
descriptors < 1024.

--
Julien ÉLIE

« Un clavier azerty en vaut deux. »

Re: INN 2.7.1 death spiral under high load?

<u7ogvd$2ifbq$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1828&group=news.software.nntp#1828

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 08:26:53 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7ogvd$2ifbq$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7mluk$1tqm$1@nnrp.usenet.blueworldhosting.com>
<u7mmkc$19ks$1@nnrp.usenet.blueworldhosting.com>
<u7mn40$o5i$1@nnrp.usenet.blueworldhosting.com>
<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
<u7mtup$2h67a$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 06:26:53 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="2702714"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:p4VMHo7G7IWwXdhZQoLDyjGx1Kg= sha256:aL6k5IOUGGfqaR7wlqkSRPt+GSHhsF/gTqsf8vOYU9g=
sha1:Khev0GT8txra1CgLLdaNTCJaaQg= sha256:yKXpbtf8pagShjCqtQU6cmVCCFmFFZhHO+lbRAZdvZE=
In-Reply-To: <u7mtup$2h67a$1@news.trigofacile.com>
 by: Julien ÉLIE - Sat, 1 Jul 2023 06:26 UTC

Hi Jesse,

> In the man page of select(2) for FreeBSD, I see:
>
> The default size of FD_SETSIZE is currently 1024. In order to accommo-
> date programs which might potentially use a larger number of open files
> with select(), it is possible to increase this size by having the program
> define FD_SETSIZE before the inclusion of any header which includes
> <sys/types.h>.
>
> There's maybe something to do regarding that as we do not do that...
> Nonetheless, it does not explain the previous issue with file
> descriptors < 1024.

Could this be tried please?

The sleep set is just after the read set in memory, and I am under the
impression that when FD_SET is used on a higher file descriptor than
1024 by select() to advertise that this file descriptor should be read,
it just marks that number minus 1024 as sleeping in the sleep set. The
length of the set is not large enough.

You could either try to rebuild INN with 4096 set in FD_SETSIZE in your
system headers /usr/include/sys/select.h (and then revert the change
after the build, just to test) or try this one for innd/chan.c only:

--- a/innd/chan.c
+++ b/innd/chan.c
@@ -17,6 +17,9 @@

/* Needed on AIX 4.1 to get fd_set and friends. */
#ifdef HAVE_SYS_SELECT_H
+# ifndef FD_SETSIZE
+# define FD_SETSIZE 4096
+# endif
# include <sys/select.h>
#endif

Please tell if it happens to work better.

--
Julien ÉLIE

« Un clavier azerty en vaut deux. »

Re: INN 2.7.1 death spiral under high load?

<u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1829&group=news.software.nntp#1829

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 07:45:59 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com> <u7mtup$2h67a$1@news.trigofacile.com> <u7ogvd$2ifbq$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 07:45:59 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="45611"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:ffYSdHKyaEhos9h1fhxTb7uzIiw= sha256:Q++Q+eFYoKkp3yDrnaIUBlAdvmS3KGLLDEz+mpJ8qHQ=
sha1:hfl8T/7Uq7176qli3uRToSjzfXk= sha256:aQM6F6jK9E2EacEjBBBTIyMCQawJSyNW25QErk9ruRc=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Sat, 1 Jul 2023 07:45 UTC

On Jul 1, 2023 at 1:26:53 AM CDT, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>
> The sleep set is just after the read set in memory, and I am under the
> impression that when FD_SET is used on a higher file descriptor than
> 1024 by select() to advertise that this file descriptor should be read,
> it just marks that number minus 1024 as sleeping in the sleep set. The
> length of the set is not large enough.
>
> You could either try to rebuild INN with 4096 set in FD_SETSIZE in your
> system headers /usr/include/sys/select.h (and then revert the change
> after the build, just to test) or try this one for innd/chan.c only:
>
> --- a/innd/chan.c
> +++ b/innd/chan.c
> @@ -17,6 +17,9 @@
>
> /* Needed on AIX 4.1 to get fd_set and friends. */
> #ifdef HAVE_SYS_SELECT_H
> +# ifndef FD_SETSIZE
> +# define FD_SETSIZE 4096
> +# endif
> # include <sys/select.h>
> #endif
>
>
> Please tell if it happens to work better.

Hi Julien, I spent today in the data center setting up a Dell R440 with a
couple PCI NVMe drives. This is a big upgrade from my source and destination
hardware (R710s on old SAS RAID5) and am in the process of migrating the VMs
to this new host. I will rebuild INN with the larger FD_SETSIZE and see what
happens.

One thing of note, I have been doing various things over the last year that
involve shoving large amount of articles to INN as fast as I can, but it
wasn't until I messed with a separate destination machine that has SSD storage
and could write a ton faster that I ran into this, so I'm curious to see how
things go when both source and destination are on NVMe drives.

Much appreciate everyone's eyes and suggestions. Will report back when I've
got the new environments running.

Re: INN 2.7.1 death spiral under high load?

<u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1830&group=news.software.nntp#1830

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 07:47:10 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com> <u7mtup$2h67a$1@news.trigofacile.com> <u7ogvd$2ifbq$1@news.trigofacile.com> <u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 07:47:10 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="52947"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:/GpW1ydJV3pw4a4HXy8mnk9BHPg= sha256:kPg50u5x1M/vNy2hB5IP9EyvztE17MCiQ8+vRsOPxZs=
sha1:0l7re7Jrm1ykwt339QL13BjNgCE= sha256:SNqNEkCyKqacJT5EokGKTwYW4YKX6bOnzSuW5A7c68k=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Sat, 1 Jul 2023 07:47 UTC

On Jul 1, 2023 at 2:45:59 AM CDT, "Jesse Rehmer"
<jesse.rehmer@blueworldhosting.com> wrote:

> On Jul 1, 2023 at 1:26:53 AM CDT, "Julien ÉLIE"
> <iulius@nom-de-mon-site.com.invalid> wrote:
>
>> Hi Jesse,
>>
>> The sleep set is just after the read set in memory, and I am under the
>> impression that when FD_SET is used on a higher file descriptor than
>> 1024 by select() to advertise that this file descriptor should be read,
>> it just marks that number minus 1024 as sleeping in the sleep set. The
>> length of the set is not large enough.
>>
>> You could either try to rebuild INN with 4096 set in FD_SETSIZE in your
>> system headers /usr/include/sys/select.h (and then revert the change
>> after the build, just to test) or try this one for innd/chan.c only:
>>
>> --- a/innd/chan.c
>> +++ b/innd/chan.c
>> @@ -17,6 +17,9 @@
>>
>> /* Needed on AIX 4.1 to get fd_set and friends. */
>> #ifdef HAVE_SYS_SELECT_H
>> +# ifndef FD_SETSIZE
>> +# define FD_SETSIZE 4096
>> +# endif
>> # include <sys/select.h>
>> #endif
>>
>>
>> Please tell if it happens to work better.
>
> Hi Julien, I spent today in the data center setting up a Dell R440 with a
> couple PCI NVMe drives. This is a big upgrade from my source and destination
> hardware (R710s on old SAS RAID5) and am in the process of migrating the VMs
> to this new host. I will rebuild INN with the larger FD_SETSIZE and see what
> happens.
>
> One thing of note, I have been doing various things over the last year that
> involve shoving large amount of articles to INN as fast as I can, but it
> wasn't until I messed with a separate destination machine that has SSD storage
> and could write a ton faster that I ran into this, so I'm curious to see how
> things go when both source and destination are on NVMe drives.
>
> Much appreciate everyone's eyes and suggestions. Will report back when I've
> got the new environments running.

Something I just noticed when looking at my Diablo feeder source code... some
time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)

Re: INN 2.7.1 death spiral under high load?

<u7phln$2is6p$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1831&group=news.software.nntp#1831

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 17:44:55 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7phln$2is6p$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
<u7mtup$2h67a$1@news.trigofacile.com> <u7ogvd$2ifbq$1@news.trigofacile.com>
<u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
<u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 15:44:55 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="san13-h02-176-143-2-105.dsl.sta.abo.bbox.fr:176.143.2.105";
logging-data="2715865"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:cWdYciuCOhARz+pO3M8GkTj2jz4= sha256:pjQqV9tXqHHqzQfl4TLwkwvLW0iB0OPdsYM8izycZZQ=
sha1:GD4u83oYq2vF0YkS7uCIET9kRxM= sha256:V0WTAbeC1uFVm/6wDgB7sHo35DgkhOv5aoiAkwZ9E5U=
In-Reply-To: <u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Sat, 1 Jul 2023 15:44 UTC

Hi Jesse,

>> One thing of note, I have been doing various things over the last year that
>> involve shoving large amount of articles to INN as fast as I can, but it
>> wasn't until I messed with a separate destination machine that has SSD storage
>> and could write a ton faster that I ran into this, so I'm curious to see how
>> things go when both source and destination are on NVMe drives.

Thanks again for your bug report and all your testings!

> Something I just noticed when looking at my Diablo feeder source code... some
> time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)

Oh, thanks for the pointer.
Maybe we should do something like that for INN too as it seems to have
done the work for Diablo. Here is the related patch:
https://github.com/jpmens/diablo/commit/9505abdc9d100e97ef89e2f58482fe9bd49ca215

/*
* Cannot increase FD_SETSIZE on Linux, but we can increase __FD_SETSIZE
* with glibc 2.2 and 2.3 at least. We do this by including
* bits/types.h which defines __FD_SETSIZE first (before any other
include),
* then we redefine __FD_SETSIZE. Ofcourse a user program may NEVER
* include bits/whatever.h directly, so this is a dirty hack!
*/
#if LARGE_FD_SETSIZE > 1024
# ifdef __linux__
# include <features.h>
# if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2)
# include <bits/types.h>
# undef __FD_SETSIZE
# define __FD_SETSIZE LARGE_FD_SETSIZE
# endif
# else
/* Works on most BSDs */
# define FD_SETSIZE LARGE_FD_SETSIZE
# endif
#endif

So it would for instance mean for INN to say in the documentation that
-DLARGE_FD_SETSIZE=4096 should be given at build time to increase the
limit of the number of file descriptors to 4096.

Having googled a bit, there does not seem to be a more elegant way to
achieve that, unless switching to another method to read sockets
(poll(), libevent or).

Any worries with that piece of code to add into INN?

Jesse, the suggestion for
#define FD_SETSIZE 4096
I gave you in innd/chan.c may not work. The #define should be before
any inclusion of <sys/types.h>, and we do that in several header files.
Maybe it should be in include/inn/libinn.h instead, or maybe in
include/portable/system.h or maybe elsewhere, I should have a deeper look.

--
Julien ÉLIE

« La bête aux douze pieds qui marche sur la tête. » (Nougaro)

Re: INN 2.7.1 death spiral under high load?

<871qhrd13m.fsf@hope.eyrie.org>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1832&group=news.software.nntp#1832

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 01 Jul 2023 09:54:05 -0700
Organization: The Eyrie
Message-ID: <871qhrd13m.fsf@hope.eyrie.org>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
<u7mtup$2h67a$1@news.trigofacile.com>
<u7ogvd$2ifbq$1@news.trigofacile.com>
<u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
<u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
<u7phln$2is6p$1@news.trigofacile.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: hope.eyrie.org;
logging-data="32674"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:/4G5splaYPTWJPaCiTyTbWP5RO0=
 by: Russ Allbery - Sat, 1 Jul 2023 16:54 UTC

Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

> Having googled a bit, there does not seem to be a more elegant way to
> achieve that, unless switching to another method to read sockets
> (poll(), libevent or).

Switching to libevent is almost certainly the right long-term solution
(I've been thinking about this for ages and previously converted remctld
to it), since then it handles all the many confusing variations of
select/poll/epoll/etc. for you and you don't have to worry about figuring
out which system has which variation. It provides an equivalent of innd's
current select loop that's much cleaner but works in a very similar way.
But unfortunately it's a whole lot of quite disruptive work.

It would probably produce a noticable performance increase for really busy
servers, too. select is notoriously inefficient.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN 2.7.1 death spiral under high load?

<u7q8ac$1icq$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1833&group=news.software.nntp#1833

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 22:11:24 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7q8ac$1icq$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com> <u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com> <u7phln$2is6p$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 22:11:24 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="51610"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:gfjLqOK6Or6aWoQDcpHopixB/8U= sha256:nqzST2r7SrYO+8u3T27zqKBY+JUS1MS+3BE54Odj3MU=
sha1:H+HuTmY9IkrpTDr83JIVs6bveNA= sha256:Wikt/2LS+ZM0+atB/fzGnxEHEdZc5joDMICi4UwhfHg=
X-Usenapp: v1.27.1/d - Full License
 by: Jesse Rehmer - Sat, 1 Jul 2023 22:11 UTC

On Jul 1, 2023 at 10:44:55 AM CDT, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>
>>> One thing of note, I have been doing various things over the last year that
>>> involve shoving large amount of articles to INN as fast as I can, but it
>>> wasn't until I messed with a separate destination machine that has SSD storage
>>> and could write a ton faster that I ran into this, so I'm curious to see how
>>> things go when both source and destination are on NVMe drives.
>
> Thanks again for your bug report and all your testings!
>
>
>> Something I just noticed when looking at my Diablo feeder source code... some
>> time ago one of the patches submitted increased FD_SETSIZE to 4096. Hmmm. :-)
>
> Oh, thanks for the pointer.
> Maybe we should do something like that for INN too as it seems to have
> done the work for Diablo. Here is the related patch:
>
> https://github.com/jpmens/diablo/commit/9505abdc9d100e97ef89e2f58482fe9bd49ca215
>
> /*
> * Cannot increase FD_SETSIZE on Linux, but we can increase __FD_SETSIZE
> * with glibc 2.2 and 2.3 at least. We do this by including
> * bits/types.h which defines __FD_SETSIZE first (before any other
> include),
> * then we redefine __FD_SETSIZE. Ofcourse a user program may NEVER
> * include bits/whatever.h directly, so this is a dirty hack!
> */
> #if LARGE_FD_SETSIZE > 1024
> # ifdef __linux__
> # include <features.h>
> # if (__GLIBC__ > 2) || (__GLIBC__ == 2 && __GLIBC_MINOR__ >= 2)
> # include <bits/types.h>
> # undef __FD_SETSIZE
> # define __FD_SETSIZE LARGE_FD_SETSIZE
> # endif
> # else
> /* Works on most BSDs */
> # define FD_SETSIZE LARGE_FD_SETSIZE
> # endif
> #endif
>
>
>
> So it would for instance mean for INN to say in the documentation that
> -DLARGE_FD_SETSIZE=4096 should be given at build time to increase the
> limit of the number of file descriptors to 4096.
>
> Having googled a bit, there does not seem to be a more elegant way to
> achieve that, unless switching to another method to read sockets
> (poll(), libevent or).
>
> Any worries with that piece of code to add into INN?
>
>
> Jesse, the suggestion for
> #define FD_SETSIZE 4096
> I gave you in innd/chan.c may not work. The #define should be before
> any inclusion of <sys/types.h>, and we do that in several header files.
> Maybe it should be in include/inn/libinn.h instead, or maybe in
> include/portable/system.h or maybe elsewhere, I should have a deeper look.

I spun up a new FreeBSD VM backed by SSD storage, set FD_SETSIZE to 4096 in
/usr/include/sys/select.h and compiled INN and started my process. Honestly,
it's going so fast with one innxmit process now that both machines are on
solid state storage I'm not sure I can make it go any faster on the receiving
end. Doing ~3200 articles per second with a single innxmit process on the
source side.

INN isn't CPU bound yet, so tomorrow when I plan to do more playing around I
will try multiple innxmit processes and see just how far I can push it and if
I run into a similar crash. That said, the previous crashes I was not getting
anywhere near this article rate.

Can we talk about the implications of setting icdsynccount to a much much
higher value than 10? I have it set to 500 now, but curious if there could be
gains by setting this really high in my scenario where I know I'm not
receiving duplicate articles and no articles exist on the destination? If
there are any other tuning suggestions that can be had as far as reducing
extraneous things (I did set incominglogfrequency to 2000 which still logs
about twice a second, for example)?

Jul 1 16:46:32 sorter innd[1098]: ME status seconds 1183 accepted 3722328
refused 0 rejected 1017 duplicate 16 accepted size 10797672448 duplicate size
42912 rejected size 149576720
Jul 1 16:56:32 sorter innd[1098]: ME time 600000 hishave 2087(1997776)
hiswrite 37175(1997776) hissync 6(2) idle 127733(2314098) artclean
3311(1997776) artwrite 20582(1997709) artcncl 90(431) hisgrep/artcncl 2(431)
overv 50717(1997709) perl 295793(1997776) python 13162(1997776) nntpread
4811(2314098) artparse 14916(4102412) artlog 5645(1998040) datamove
709(2472871)
Jul 1 16:56:32 sorter innd[1098]: ME HISstats 0 hitpos 0 hitneg 0 missed
1997776 dne
Jul 1 16:56:32 sorter innd[1098]: ME status seconds 1783 accepted 5720026
refused 0 rejected 1084 duplicate 16 accepted size 16114387968 duplicate size
42912 rejected size 151971536

Re: INN 2.7.1 death spiral under high load?

<87h6qnb6jl.fsf@hope.eyrie.org>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1834&group=news.software.nntp#1834

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 01 Jul 2023 15:39:26 -0700
Organization: The Eyrie
Message-ID: <87h6qnb6jl.fsf@hope.eyrie.org>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
<u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
<u7phln$2is6p$1@news.trigofacile.com>
<u7q8ac$1icq$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="17487"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:OT+wiTSs+xHoFHUXht2hCQmoPSk=
 by: Russ Allbery - Sat, 1 Jul 2023 22:39 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

> Can we talk about the implications of setting icdsynccount to a much
> much higher value than 10? I have it set to 500 now, but curious if
> there could be gains by setting this really high in my scenario where I
> know I'm not receiving duplicate articles and no articles exist on the
> destination?

It decreases the frequency of syncs so it's a straight tradeoff between
faster speed and more data loss if innd crashes (or the system crashes, or
loses power, etc.). The risk is that if something goes wrong and innd
crashes before updating the history and active files, the sending system
may think it has successfully transmitted articles (they were acked), but
the receiving system never recorded they were received and thus while
they're on disk they're orphaned.

If innd doesn't crash, you could set it arbitrarily high and I think it
would just make things faster. Think of it as the checkpointing
frequency.

> If there are any other tuning suggestions that can be had as far as
> reducing extraneous things (I did set incominglogfrequency to 2000 which
> still logs about twice a second, for example)?

Make sure that you've disabled sync for all of your news logs. I think
that's already covered in INSTALL but a lot of people forget about it.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN 2.7.1 death spiral under high load?

<u7qaf2$15t1$1@nnrp.usenet.blueworldhosting.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1835&group=news.software.nntp#1835

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!newsreader4.netcologne.de!news.netcologne.de!peer03.ams1!peer.ams1.xlned.com!news.xlned.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 1 Jul 2023 22:48:02 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <u7qaf2$15t1$1@nnrp.usenet.blueworldhosting.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com> <u7phln$2is6p$1@news.trigofacile.com> <u7q8ac$1icq$1@nnrp.usenet.blueworldhosting.com> <87h6qnb6jl.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 1 Jul 2023 22:48:02 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="38817"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:L5rfKP1gsGGA52FIdtGHV138VtU= sha256:Nhq1g+7DewDP/gsCf9B3/CWh23p1YJj8zZbGxArw+GY=
sha1:UhRx0Rftyn5byoChMPVlqhAeUwI= sha256:o7d+2OQzMNeCpYLmffOl+2NtdjT77MX5BctqFaPvy5E=
X-Usenapp: v1.27.1/d - Full License
X-Received-Bytes: 2817
 by: Jesse Rehmer - Sat, 1 Jul 2023 22:48 UTC

On Jul 1, 2023 at 5:39:26 PM CDT, "Russ Allbery" <eagle@eyrie.org> wrote:

> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> Can we talk about the implications of setting icdsynccount to a much
>> much higher value than 10? I have it set to 500 now, but curious if
>> there could be gains by setting this really high in my scenario where I
>> know I'm not receiving duplicate articles and no articles exist on the
>> destination?
>
> It decreases the frequency of syncs so it's a straight tradeoff between
> faster speed and more data loss if innd crashes (or the system crashes, or
> loses power, etc.). The risk is that if something goes wrong and innd
> crashes before updating the history and active files, the sending system
> may think it has successfully transmitted articles (they were acked), but
> the receiving system never recorded they were received and thus while
> they're on disk they're orphaned.
>
> If innd doesn't crash, you could set it arbitrarily high and I think it
> would just make things faster. Think of it as the checkpointing
> frequency.
>
>> If there are any other tuning suggestions that can be had as far as
>> reducing extraneous things (I did set incominglogfrequency to 2000 which
>> still logs about twice a second, for example)?
>
> Make sure that you've disabled sync for all of your news logs. I think
> that's already covered in INSTALL but a lot of people forget about it.

I'm not finding anything about that in the current INSTALL I have on my
machine?

Re: INN 2.7.1 death spiral under high load?

<87cz1bb502.fsf@hope.eyrie.org>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1836&group=news.software.nntp#1836

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.szaf.org!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Sat, 01 Jul 2023 16:12:45 -0700
Organization: The Eyrie
Message-ID: <87cz1bb502.fsf@hope.eyrie.org>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7phln$2is6p$1@news.trigofacile.com>
<u7q8ac$1icq$1@nnrp.usenet.blueworldhosting.com>
<87h6qnb6jl.fsf@hope.eyrie.org>
<u7qaf2$15t1$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="17487"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:FRiZIo7tRQu07+CKUOZoyb3o1Lg=
 by: Russ Allbery - Sat, 1 Jul 2023 23:12 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
> On Jul 1, 2023 at 5:39:26 PM CDT, "Russ Allbery" <eagle@eyrie.org> wrote:

>> Make sure that you've disabled sync for all of your news logs. I think
>> that's already covered in INSTALL but a lot of people forget about it.

> I'm not finding anything about that in the current INSTALL I have on my
> machine?

Huh, yes, indeed. Edit /etc/syslog.conf to add - as the first character
for the paths to which news logs are written, at least for news.notice.
For example:

news.notice -/var/log/news/news.notice

Otherwise syslogd syncs on every write.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: INN 2.7.1 death spiral under high load?

<u7u9tc$2ls19$1@news.trigofacile.com>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=1837&group=news.software.nntp#1837

  copy link   Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176-143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: INN 2.7.1 death spiral under high load?
Date: Mon, 3 Jul 2023 13:03:08 +0200
Organization: Groupes francophones par TrigoFACILE
Message-ID: <u7u9tc$2ls19$1@news.trigofacile.com>
References: <u7l7kl$1qqb$1@nnrp.usenet.blueworldhosting.com>
<u7mpcq$16ug$1@nnrp.usenet.blueworldhosting.com>
<u7mtup$2h67a$1@news.trigofacile.com> <u7ogvd$2ifbq$1@news.trigofacile.com>
<u7oljn$1chb$1@nnrp.usenet.blueworldhosting.com>
<u7ollu$1jmj$1@nnrp.usenet.blueworldhosting.com>
<u7phln$2is6p$1@news.trigofacile.com> <871qhrd13m.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 3 Jul 2023 11:03:08 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176-143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="2813993"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.12.0
Cancel-Lock: sha1:u2BmRIClUwoahOTExvOnamh3yKQ= sha256:Z/hNEs3eSqS5wFINReUK8df08RwGSTqtPzRpHCA4EZo=
sha1:Q6WVen6DL7xmYv4q3z1njElZXRg= sha256:zFy6LMopj6+iMPPY2PPQRB59f/DIA8aH4YO5PGKeEpc=
In-Reply-To: <871qhrd13m.fsf@hope.eyrie.org>
 by: Julien ÉLIE - Mon, 3 Jul 2023 11:03 UTC

Hi Russ,

>> Having googled a bit, there does not seem to be a more elegant way to
>> achieve that, unless switching to another method to read sockets
>> (poll(), libevent or).
>
> Switching to libevent is almost certainly the right long-term solution
> (I've been thinking about this for ages and previously converted remctld
> to it), since then it handles all the many confusing variations of
> select/poll/epoll/etc. for you and you don't have to worry about figuring
> out which system has which variation. It provides an equivalent of innd's
> current select loop that's much cleaner but works in a very similar way.
> But unfortunately it's a whole lot of quite disruptive work.
>
> It would probably produce a noticable performance increase for really busy
> servers, too. select is notoriously inefficient.

Yes, I totally agree for the long-term solution to switch to libevent.

For the time being, I've tried to integrate Diablo's hack into INN. It
seems to work (at least innd sees the wanted FD_SETSIZE size).

In INSTALL and in the documentation of rlimitnofile in inn.conf, I just
plan to add:

If you really need using more than 1024 file descriptors, you can try
to rebuild INN with for instance the "-DLARGE_FD_SETSIZE=4096" option
given to the compiler to increase the possible value to 4096.

Incidentally, is the "File Descriptor Limits" still accurate in INSTALL?

INN won't be able to increase the limits above the hard limits set by
your operating system; on some systems, that hard limit is normally
256 file descriptors (Linux, for example). On others, like Solaris,
it's 1024. Increasing the limit beyond that value may require serious
system configuration work. (On some operating systems, it requires
patching and recompiling the kernel. On Solaris it can be changed in
/etc/system, but for 2.6 or earlier the limit cannot be increased
beyond 1024 without breaking select(2) and thereby breaking all of
INN. For current versions of Linux, you may be able to change the
maximum by writing to /proc/sys/fs/file-max.)

Isn't the hard limit 1024 (or more?) for Linux, and 256 for Solaris?

I've found out that when systemd is in use, systemd/sd-daemon.h includes
asm/socket.h when __USE_MISC is set, which in turn includes
linux/posix_types.h which unconditionnaly redefines __FD_SETSIZE to 1024
(even if it has another value) for use with __kernel_fd_set.
So __FD_SETSIZE has to be forced before any inclusion of bits/types.h
and after any inclusion of systemd/sd-daemon.h on Linux
(systemd/sd-daemon.h also includes bits/types.h).

I am unsure of the consequences but systemd might not work with INN if a
systemd file descriptor exceeds 1024 (?) (though the fd_set size used
with select(2) has the wanted LARGE_FD_SETSIZE value, from sys/select.h)
In doubt, I could prevent LARGE_FD_SETSIZE from being used when systemd
is also used?

Jesse, I'll also change the log line:
xxx:yyy zzz free but was in SMASK
to:
xxx:yyy zzz free but was in SMASK; looks like fd_set is not large
enough (you may want to increase FD_SETSIZE to a higher value by
rebuilding INN with -DLARGE_FD_SETSIZE=4096)

This way, I hope it will be of help for other people coming across that
issue.

Also, the documentation of the rlimitnofile parameter should say that
innd may malfunction, and not only "suffer reduced functionality" :)

The maximum number of file descriptors that innd(8) or innfeed(8) can have
open at once. If innd(8) or innfeed(8) attempts to open more file
descriptors than this value, it is possible the program may throttle or
-otherwise suffer reduced functionality. The number of open file
+otherwise malfunction or suffer reduced functionality.

I hope all of these changes will be useful, and you're happy with your
new hardware setup Jesse!

--
Julien ÉLIE

« – Quel a été votre plus beau jour ?
– Une nuit. » (Brigitte Bardot)

Pages:12
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor