Subject  Author
* Not very useful logging information  Nigel Reed
+- Re: Not very useful logging information  Tim
+* Re: Not very useful logging information  Jesse Rehmer
|`* Re: Not very useful logging information  yamo'
| `* Re: Not very useful logging information  Jesse Rehmer
|  `* Re: Not very useful logging information  yamo'
|   `* Re: Not very useful logging information  Jesse Rehmer
|    `* Re: Not very useful logging information  Jesse Rehmer
|     `* Re: Not very useful logging information  Julien ÉLIE
|      +- Re: Not very useful logging information  Russ Allbery
|      `* Re: Not very useful logging information  Jesse Rehmer
|       `* Re: Not very useful logging information  Julien ÉLIE
|        `* Re: Not very useful logging information  Jesse Rehmer
|         `* Re: Not very useful logging information  Julien ÉLIE
|          `* Re: Not very useful logging information  Jesse Rehmer
|           `- Re: Not very useful logging information  Julien ÉLIE
`* Re: Not very useful logging information  yamo '
 `* Re: Not very useful logging information  Russ Allbery
  `- Re: Not very useful logging information  yamo'

Not very useful logging information

<20230131155925.36dd4013@wibble.sysadmininc.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1355&group=news.software.nntp#1355

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.endofthelinebbs.com!.POSTED.47.186.32.124!not-for-mail
From: sysop@endofthelinebbs.com (Nigel Reed)
Newsgroups: news.software.nntp
Subject: Not very useful logging information
Date: Tue, 31 Jan 2023 15:59:25 -0600
Organization: End Of The Line BBS
Message-ID: <20230131155925.36dd4013@wibble.sysadmininc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=US-ASCII
Content-Transfer-Encoding: 7bit
Injection-Info: www.sysadmininc.com; posting-host="47.186.32.124";
logging-data="3160250"; mail-complaints-to="usenet@www.sysadmininc.com"
X-Newsreader: Claws Mail 4.1.1git14 (GTK 3.24.20; x86_64-pc-linux-gnu)
 by: Nigel Reed - Tue, 31 Jan 2023 21:59 UTC

Hi all,

Jan 31 15:56:12 www innd: rejecting[perl] <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary
Jan 31 15:56:14 www innd: rejecting[perl] <U5gCL.2931041$miq3.2915980@usenetxs.com> 439 Binary: misplaced binary

I'm getting hundreds, if not thousands of these.

It doesn't really tell me which news server is sending out misplaced
binaries or which newsgroup is the culprit.

Open to suggestions on figuring this one out. I'd really like it to
stop. I don't accept binaries for a reason. I don't have unlimited
bandwidth so I'd like to nip this junk in the bud.

Thanks,

--
End Of The Line BBS - Plano, TX
telnet endofthelinebbs.com 23

Re: Not very useful logging information

<6855133b-29f6-4148-8b57-f055acb6511f@none>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1356&group=news.software.nntp#1356

Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
From: no@none (Tim)
Date: Wed, 01 Feb 2023 00:50:52 +0000
Message-ID: <6855133b-29f6-4148-8b57-f055acb6511f@none>
Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!sewer!.POSTED.localhost!not-for-mail
 by: Tim - Wed, 1 Feb 2023 00:50 UTC

> Jan 31 15:56:12 www innd: rejecting[perl] <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary

> It doesn't really tell me which news server is sending out misplaced
> binaries

The news log (/var/log/news/news) should tell you which peer caused the message.
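
A rough sketch of that first suggestion: tallying rejections per peer from the news log. It assumes each log line has the shape `Mon DD HH:MM:SS.mmm status peer <message-id> code reason`, with `-` as the status of a rejected article. The function name and the sample peers are made up for illustration, and the log location depends on the install.

```python
from collections import Counter


def count_rejects_by_peer(lines):
    """Tally (peer, reason) pairs for rejected articles in news-log lines.

    Assumed line shape (an assumption; compare with your own log):
      Feb  3 11:05:55.650 - peer.example <msgid> 439 Binary: misplaced binary
    """
    counts = Counter()
    for line in lines:
        fields = line.split()
        # month, day, time, status, peer, message-ID, then the reason text
        if len(fields) < 7 or fields[3] != "-":
            continue
        peer = fields[4]
        reason = " ".join(fields[6:])
        counts[(peer, reason)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical sample lines; in practice, iterate over the log file instead.
    sample = [
        "Feb  3 11:05:55.650 - peer-a.example <a@example> 439 Binary: misplaced binary",
        "Feb  3 11:05:56.100 + peer-b.example <b@example> 1234 inpaths!",
        "Feb  3 11:05:57.200 - peer-a.example <c@example> 439 Binary: misplaced binary",
    ]
    for (peer, reason), n in count_rejects_by_peer(sample).most_common():
        print(n, peer, reason)
```

Sorting by count should make a single noisy peer stand out quickly.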

> or which newsgroup is the culprit.

You could try and connect to the peer, then issue a "head <msgid>" command.
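
Here is a hedged sketch of that HEAD lookup done from a script rather than by hand. `fetch_head` and `parse_head_response` are hypothetical helper names, plain port 119 without TLS or authentication is an assumption, and a peer may well refuse HEAD from a host it only feeds.

```python
import socket


def parse_head_response(lines):
    """Convert the lines of an NNTP HEAD reply into a {name: value} dict."""
    if not lines or not lines[0].startswith("221"):
        raise ValueError("no headers returned: %r" % (lines[:1],))
    headers = {}
    last = None
    for line in lines[1:]:
        if line == ".":                      # end of the multi-line reply
            break
        if line.startswith(".."):            # undo NNTP dot-stuffing
            line = line[1:]
        if line[:1] in (" ", "\t") and last:
            headers[last] += " " + line.strip()   # folded continuation line
        elif ":" in line:
            name, _, value = line.partition(":")
            last = name.strip().lower()
            headers[last] = value.strip()
    return headers


def fetch_head(host, msgid, port=119, timeout=10):
    """Send HEAD <msgid> to a peer and return its headers (needs network access)."""
    with socket.create_connection((host, port), timeout=timeout) as sock:
        stream = sock.makefile("rwb")
        stream.readline()                    # greeting line (200 or 201)
        stream.write(("HEAD %s\r\n" % msgid).encode("ascii"))
        stream.flush()
        lines = []
        while True:
            raw = stream.readline()
            if not raw:
                break
            lines.append(raw.decode("utf-8", "replace").rstrip("\r\n"))
            # stop at the terminating dot, or right away on an error status
            if lines[-1] == "." or not lines[0].startswith("221"):
                break
        stream.write(b"QUIT\r\n")
        stream.flush()
    return parse_head_response(lines)
```

The `newsgroups` key of the returned dict would then show where the article was posted, provided the peer still carries it.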

Or edit the filter and make it output the Newsgroups, too. cleanfeed sets
@groups and $sortgrps.

Re: Not very useful logging information

<trclcl$pqg$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1357&group=news.software.nntp#1357

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Wed, 1 Feb 2023 03:13:25 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trclcl$pqg$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 1 Feb 2023 03:13:25 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="26448"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:EGmJIGlCyvl8LZd3GKERHD5GG2E= sha256:9PIVUY2fuvhcY5wbvyPPPL2++KteOu3tqKyLh15LuzI=
sha1:IzvkLL3DGu4b5TpXwIVC+NywS7o= sha256:0EyUWahXFha5sj235uXn00iLqZOLpukY0AJJChndu6Q=
X-Usenapp: v1.26/d - Full License
 by: Jesse Rehmer - Wed, 1 Feb 2023 03:13 UTC

On Jan 31, 2023 at 3:59:25 PM CST, "Nigel Reed" <sysop@endofthelinebbs.com>
wrote:

> Hi all,
>
>
> Jan 31 15:56:12 www innd: rejecting[perl]
> <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary
> Jan 31 15:56:14 www innd: rejecting[perl]
> <U5gCL.2931041$miq3.2915980@usenetxs.com> 439 Binary: misplaced binary
>
>
> I'm getting hundreds, if not thousands of these.
>
> It doesn't really tell me which news server is sending out misplaced
> binaries or which newsgroup is the culprit.
>
> Open to suggestions on figuring this one out. I'd really like it to
> stop. I don't accept binaries for a reason. I don't have unlimited
> bandwidth so I'd like to nip this junk in the bud.
>
> Thanks,

It is probably coming from HighWinds. They recently started leaking "alt.b" to
me as well, which looks like test traffic between two commercial providers.
I'm not propagating the articles, and have asked them to filter them out, but
no response yet.

Cheers,
Jesse

Re: Not very useful logging information

<trdbda$g6$1@rasp.pasdenom.info>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1361&group=news.software.nntp#1361

Path: i2pn2.org!i2pn.org!news.nntp4.net!pasdenom.info!.POSTED.newsportal.pasdenom.info!newsportal
From: News@pasdenom.info (yamo ')
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Wed, 1 Feb 2023 09:29:14 -0000 (UTC)
Organization: <https://pasdenom.info/news.html>
Message-ID: <trdbda$g6$1@rasp.pasdenom.info>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 1 Feb 2023 09:29:14 -0000 (UTC)
Injection-Info: newsportal.pasdenom.info; posting-account="stephane@usenet";
posting-host="2a01:e0a:21:ea80:10b9:20f6:2d91:e3bb" logging-data="http";
mail-complaints-to="abuse@pasdenom.info"
User-Agent: NewsPortal/0.52.a8
( https://gitlab.com/yamo-nntp/newsportal )
Cancel-Lock: sha256:FlOtvlxftX3iq0RxOcZm/GI3c0kyRL16p+z7XoGITFk=
Http-User-Agent: Mozilla/5.0 (Linux; Android 11) AppleWebKit/537.36 (KHTML,
like Gecko) Version/4.0 Chrome/109.0.5414.118 Mobile DuckDuckGo/5
Safari/537.36
 by: yamo ' - Wed, 1 Feb 2023 09:29 UTC

Hi,
Nigel Reed wrote:

> Jan 31 15:56:12 www innd: rejecting[perl] <T5gCL.2931033$miq3.1112593@usenetxs.com> 439 Binary: misplaced binary
> Jan 31 15:56:14 www innd: rejecting[perl] <U5gCL.2931041$miq3.2915980@usenetxs.com> 439 Binary: misplaced binary

> I'm getting hundreds, if not thousands of these.

> It doesn't really tell me which news server is sending out misplaced
> binaries or which newsgroup is the culprit.

You can see which feed is sending those posts in /var/log/news/news, but the
first server is not in the log files.

--
Stéphane
Sorry for my bad English...

Re: Not very useful logging information

<877cx1nzmd.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1363&group=news.software.nntp#1363

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Wed, 01 Feb 2023 08:01:46 -0800
Organization: The Eyrie
Message-ID: <877cx1nzmd.fsf@hope.eyrie.org>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trdbda$g6$1@rasp.pasdenom.info>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="20486"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:EVuxVanr4log1tHTx3WetsLHQGU=
 by: Russ Allbery - Wed, 1 Feb 2023 16:01 UTC

yamo ' <News@pasdenom.info > writes:

> You can see which feed is sending those posts in /var/log/news/news, but
> the first server is not in the log files.

The first server is to some extent fundamentally unknowable since the Path
header can be manipulated, but you could make the Perl filter log the Path
header as well if you wanted to try to track things down to that extent.
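
To give an idea of what such extra logging could contain, here is a standalone sketch in Python (not the Perl filter's own API, which is not shown in this thread). `summarize_path` and `format_reject` are hypothetical names, and the header dict is assumed to be whatever the filter hands you. The "claimed origin" is only what the Path header says, so it carries the caveat above.

```python
def summarize_path(path):
    """Split a Path header into the nearest relay and the claimed origin.

    The leftmost entry is the most recent relay; the rightmost entry is a
    tail such as "not-for-mail", and entries starting with "." (for example
    ".POSTED...") are diagnostics rather than servers.
    """
    entries = [e for e in path.strip().split("!") if e]
    if not entries:
        return {"nearest_relay": None, "claimed_origin": None, "diagnostics": []}
    hops = entries[:-1]  # drop the tail entry
    relays = [h for h in hops if not h.startswith(".")]
    diagnostics = [h for h in hops if h.startswith(".")]
    return {
        "nearest_relay": relays[0] if relays else None,
        "claimed_origin": relays[-1] if relays else None,  # can be forged
        "diagnostics": diagnostics,
    }


def format_reject(msgid, reason, headers):
    """Build one log line carrying the Newsgroups and Path context."""
    info = summarize_path(headers.get("Path", ""))
    return "reject mid=%s reason=%s groups=%s claimed_origin=%s via=%s" % (
        msgid,
        reason,
        headers.get("Newsgroups", "?"),
        info["claimed_origin"],
        info["nearest_relay"],
    )


if __name__ == "__main__":
    # Path taken from the first article in this thread.
    hdrs = {
        "Path": "i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!"
                "news.endofthelinebbs.com!.POSTED.47.186.32.124!not-for-mail",
        "Newsgroups": "news.software.nntp",
    }
    print(format_reject("<example@invalid>", "Binary: misplaced binary", hdrs))
```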

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Not very useful logging information

<trgmm4$rh9$1@rasp.pasdenom.info>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1373&group=news.software.nntp#1373

Path: i2pn2.org!rocksolid2!i2pn.org!paganini.bofh.team!pasdenom.info!.POSTED.2a01:e0a:21:ea80:add6:7a44:79cc:951f!not-for-mail
From: yamo@beurdin.invalid (yamo')
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Thu, 2 Feb 2023 17:00:02 +0100
Organization: <https://pasdenom.info/news.html>
Message-ID: <trgmm4$rh9$1@rasp.pasdenom.info>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trdbda$g6$1@rasp.pasdenom.info> <877cx1nzmd.fsf@hope.eyrie.org>
Reply-To: yamo@groumpf.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 2 Feb 2023 16:00:04 -0000 (UTC)
Injection-Info: rasp.pasdenom.info; posting-account="stephane@usenet"; posting-host="2a01:e0a:21:ea80:add6:7a44:79cc:951f";
logging-data="28201"; mail-complaints-to="abuse@pasdenom.info"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha256:AyjbqcC3EnNCeTx3PVk/071Qfh4VOaUB97C5XYITN30=
In-Reply-To: <877cx1nzmd.fsf@hope.eyrie.org>
X-Seamonkey: <https://www.seamonkey-project.org/>
 by: yamo' - Thu, 2 Feb 2023 16:00 UTC

Hi,
Russ Allbery wrote on 1 Feb 2023, 17:01:
> The first server is to some extent fundamentally unknowable since the Path
> header can be manipulated, but you could make the Perl filter log the Path
> header as well if you wanted to try to track things down to that extent.

Yes but it may not be useful.

--
Stéphane

Re: Not very useful logging information

<triiru$24l$1@rasp.pasdenom.info>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1375&group=news.software.nntp#1375

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!pasdenom.info!.POSTED.2a01:e0a:21:ea80:6202:fd18:55d0:8e01!not-for-mail
From: yamo@beurdin.invalid (yamo')
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 10:07:10 +0100
Organization: <https://pasdenom.info/news.html>
Message-ID: <triiru$24l$1@rasp.pasdenom.info>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trclcl$pqg$1@nnrp.usenet.blueworldhosting.com>
Reply-To: yamo@groumpf.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 09:07:10 -0000 (UTC)
Injection-Info: rasp.pasdenom.info; posting-account="stephane@usenet"; posting-host="2a01:e0a:21:ea80:6202:fd18:55d0:8e01";
logging-data="2197"; mail-complaints-to="abuse@pasdenom.info"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha256:fyhfX9zppnNeHw7MBr08roZflMybnMFwIexGKiqlBhw=
In-Reply-To: <trclcl$pqg$1@nnrp.usenet.blueworldhosting.com>
X-Seamonkey: <https://www.seamonkey-project.org/>
 by: yamo' - Fri, 3 Feb 2023 09:07 UTC

Hi,
Jesse Rehmer wrote on 1 Feb 2023, 04:13:
> I'm not propagating the articles, and have asked them to filter them out, but
> no response yet.

There may be something wrong in your configuration.

You are sending me some of these:
<https://pasdenom.info/usenet/news-notice.2023.02.02-06.15.04.html#inn_unwanted>

I think there should be a "Binary" entry in this table.

Example (today): <48a6d37ee00645eaa197950f15bd9c66@ngPost>

--
Stéphane

Re: Not very useful logging information

<trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1376&group=news.software.nntp#1376

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 10:23:33 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trclcl$pqg$1@nnrp.usenet.blueworldhosting.com> <triiru$24l$1@rasp.pasdenom.info>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 10:23:33 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="91012"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:Jwqu8ES6d25Xs+qjMpdG1wWvvok= sha256:eRHHJKGUCVXuBCSfXoLlUIHpYraumAmhXq+2vtYix8Q=
sha1:ttOFbM0Jx89k+AGhTDW5q9uc1qg= sha256:6HnWhGPSqzzjgdiOG/LNirH66jKpQYIm8FScM0/tCyw=
X-Usenapp: v1.26/d - Full License
 by: Jesse Rehmer - Fri, 3 Feb 2023 10:23 UTC

On Feb 3, 2023 at 3:07:10 AM CST, "yamo'" <yamo@beurdin.invalid> wrote:

> Hi,
> Jesse Rehmer wrote on 1 Feb 2023, 04:13:
>> I'm not propagating the articles, and have asked them to filter them out, but
>> no response yet.
>
> There may be something wrong in your configuration.
>
> You are sending me some of these:
> <https://pasdenom.info/usenet/news-notice.2023.02.02-06.15.04.html#inn_unwanted>
>
> I think there should be a "Binary" entry in this table.
>
> Example (today): <48a6d37ee00645eaa197950f15bd9c66@ngPost>

The original post was referencing "alt.b", which I was getting a ton of for
months and filtering. Your example is from a.b.erotica, and should have been
rejected by pyClean. I'm investigating, but have added more specific group
filters to your feed in the meantime.

Please reach out to me with more examples if it is still an issue.

Re: Not very useful logging information

<triodh$f43$1@rasp.pasdenom.info>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1377&group=news.software.nntp#1377

Path: i2pn2.org!i2pn.org!paganini.bofh.team!pasdenom.info!.POSTED.2a01:e0a:21:ea80:6202:fd18:55d0:8e01!not-for-mail
From: yamo@beurdin.invalid (yamo')
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 11:41:53 +0100
Organization: <https://pasdenom.info/news.html>
Message-ID: <triodh$f43$1@rasp.pasdenom.info>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trclcl$pqg$1@nnrp.usenet.blueworldhosting.com>
<triiru$24l$1@rasp.pasdenom.info>
<trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>
Reply-To: yamo@groumpf.org
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 10:41:53 -0000 (UTC)
Injection-Info: rasp.pasdenom.info; posting-account="stephane@usenet"; posting-host="2a01:e0a:21:ea80:6202:fd18:55d0:8e01";
logging-data="15491"; mail-complaints-to="abuse@pasdenom.info"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:91.0) Gecko/20100101
Firefox/91.0 SeaMonkey/2.53.15
Cancel-Lock: sha256:XPPigV47LEQE8wjj2tR5XefxHiMd0bVwzvvdVGNk3Y0=
X-Seamonkey: <https://www.seamonkey-project.org/>
In-Reply-To: <trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>
 by: yamo' - Fri, 3 Feb 2023 10:41 UTC

Hi,
Jesse Rehmer wrote on 3 Feb 2023, 11:23:

> The original post was referencing "alt.b", which I was getting a ton of for
> months and filtering. Your example is from a.b.erotica, and should have been
> rejected by pyClean. I'm investigating, but have added more specific group
> filters to your feed in the meantime.
>
> Please reach out to me with more examples if it is still an issue.
>

You may have found the bug; the last one filtered by Cleanfeed is:

Feb 3 11:05:55.650 - usenet.blueworldhosting.com <HX4DL.844661$US27.24444@usenetxs.com> 439 Binary: misplaced binary

Thanks!

pyClean didn't work on my server...

--
Stéphane

Re: Not very useful logging information

<trionq$om1$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1378&group=news.software.nntp#1378

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 10:47:22 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trionq$om1$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trclcl$pqg$1@nnrp.usenet.blueworldhosting.com> <triiru$24l$1@rasp.pasdenom.info> <trinb5$2os4$1@nnrp.usenet.blueworldhosting.com> <triodh$f43$1@rasp.pasdenom.info>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 10:47:22 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="25281"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:XY427TY1ggvTDNP31FhQUkv4frE= sha256:6ThLJlujrdIypv7M+RXZUiArcER93b46g4qmm55Qckk=
sha1:FIjB477bFP3SZls5NlsoBWRyPCI= sha256:PL2UH6Lv8S0HvQxtmhvXRsyKpmR9GDMQ9vXlYyPLCQc=
X-Usenapp: v1.26/d - Full License
 by: Jesse Rehmer - Fri, 3 Feb 2023 10:47 UTC

On Feb 3, 2023 at 4:41:53 AM CST, "yamo'" <yamo@beurdin.invalid> wrote:

> Hi,
> Jesse Rehmer wrote on 3 Feb 2023, 11:23:
>
>> The original post was referencing "alt.b", which I was getting a ton of for
>> months and filtering. Your example is from a.b.erotica, and should have been
>> rejected by pyClean. I'm investigating, but have added more specific group
>> filters to your feed in the meantime.
>>
>> Please reach out to me with more examples if it is still an issue.
>>
>
> You may have found the bug, the last one filtered by cleanfeed is :
>
> Feb 3 11:05:55.650 - usenet.blueworldhosting.com
> <HX4DL.844661$US27.24444@usenetxs.com> 439 Binary: misplaced binary
>
> Thanks!
>
> pyClean didn't work on my server...

I think I figured it out, or at least I am seeing it reject misplaced binaries
now. There were no error messages to be found, but on a hunch I removed
pyclean/lib/* and restarted INN.

2023-02-03 04:45:09 INFO reject: mid=<5b323cbb146d4e14a9ec0afabb7d3660@ngPost>, reason=Binary (yEnc)
2023-02-03 04:45:11 INFO reject: mid=<63800.TQ.20230203.114510@teamquest.pl>, reason=EMP Body Reject

Re: Not very useful logging information

<triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1379&group=news.software.nntp#1379

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 11:03:26 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trinb5$2os4$1@nnrp.usenet.blueworldhosting.com> <triodh$f43$1@rasp.pasdenom.info> <trionq$om1$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 11:03:26 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="58555"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:rcfOoqcdH54o3jvg4hGY3K61r7c= sha256:EaZJR33mA+WeFpxEiAQ8/N9tRsHqrsiCYXx6bzI59Lg=
sha1:VGI0dvuLDijZrTQB7oQbb+idDJY= sha256:CSXvEilg/4ta+EhLWpcY5ymcwcFPWFdMTA/07IRQRgY=
X-Usenapp: v1.26/d - Full License
 by: Jesse Rehmer - Fri, 3 Feb 2023 11:03 UTC

On Feb 3, 2023 at 4:47:22 AM CST, "Jesse Rehmer"
<jesse.rehmer@blueworldhosting.com> wrote:

> On Feb 3, 2023 at 4:41:53 AM CST, "yamo'" <yamo@beurdin.invalid> wrote:
>
>> Hi,
>> Jesse Rehmer wrote on 3 Feb 2023, 11:23:
>>
>>> The original post was referencing "alt.b", which I was getting a ton of for
>>> months and filtering. Your example is from a.b.erotica, and should have been
>>> rejected by pyClean. I'm investigating, but have added more specific group
>>> filters to your feed in the meantime.
>>>
>>> Please reach out to me with more examples if it is still an issue.
>>>
>>
>> You may have found the bug, the last one filtered by cleanfeed is :
>>
>> Feb 3 11:05:55.650 - usenet.blueworldhosting.com
>> <HX4DL.844661$US27.24444@usenetxs.com> 439 Binary: misplaced binary
>>
>> Thanks!
>>
>> pyClean didn't work on my server...
>
> I think I figured it out, or at least I am seeing it reject misplaced binaries
> now. There were no error messages to be found, but on a hunch I removed
> pyclean/lib/* and restarted INN.
>
> 2023-02-03 04:45:09 INFO reject:
> mid=<5b323cbb146d4e14a9ec0afabb7d3660@ngPost>, reason=Binary (yEnc)
> 2023-02-03 04:45:11 INFO reject: mid=<63800.TQ.20230203.114510@teamquest.pl>,
> reason=EMP Body Reject

Cleanfeed isn't much better in this regard, but I've been running the latest
pyClean from https://github.com/crooks/PyClean, and the CPU consumption is
excessive for a small stream of binaries:

PID   USERNAME THR PRI NICE SIZE RES  STATE C TIME WCPU   COMMAND
33293 news     1   98  0    212M 152M CPU6  6 8:46 91.04% /usr/local/news/bin/innd -u

When receiving ~10Mbps I'm highly CPU bound on a big server. That's another
reason for me to keep going down the Diablo path, even though there is plenty
of pain involved. Diablo's "feeder" design is more scalable, even with
Cleanfeed in the mix, but I will admit INN works best as a backend spool for
me. I like having working control message processing, Cancel-Lock support,
native TLS, etc.

Re: Not very useful logging information

<trjm6p$if11$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1381&group=news.software.nntp#1381

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176.143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 20:10:17 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <trjm6p$if11$1@news.trigofacile.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>
<triodh$f43$1@rasp.pasdenom.info>
<trionq$om1$1@nnrp.usenet.blueworldhosting.com>
<triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 19:10:17 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176.143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="605217"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.6.1
Cancel-Lock: sha1:ropVLpuwrwumhr5E81VKqdfYUoM= sha256:hlPLI80Ny/QPV0Xqdwz2m0Y0Tu6zaGqcWcJ8etZp7Eg=
sha1:UtbIrM0HpnjjN61FDqbFcpwlGz8= sha256:XjX9Zf5AnX6TW8O7YnhdnNGumxys34XAnMsWLT5TZJ0=
In-Reply-To: <triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Fri, 3 Feb 2023 19:10 UTC

Hi Jesse,

> Diablo's "feeder" design is more scalable, even with Cleanfeed in
> the mix, but will admit INN works best for a backend spool for me. I like
> having working control message processing, Cancel-Lock support, native TLS,
> etc.

FWIW, Miquel van Smoorenburg added support in INN for Diablo's hashfeed.
It's the Q flag value in newsfeeds. It permits scaling feeders like
Diablo does with backend-servers.

You may then have the following architecture:

- a front innd transit server without overview/reader/filtering, just
feeding to N internal innd feeders (using Q to split the feed in N parts
in newsfeeds; or binaries to some feeders, text to others);
- N internal innd feeders doing filtering, without overview and reader;
- a final innd/nnrpd serving server, without filtering, but numbering
and storing articles, generating overview data and handling readers.

I don't know whether it would help in your case, notably by spreading the
CPU load of filtering across several intermediate feeders.
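
The principle of splitting a feed by hashing the Message-ID can be sketched as follows. This only illustrates the idea: the use of MD5 and of the last four digest bytes here is an assumption, so the slice numbers it gives should not be expected to match what innd computes for the Q flag (newsfeeds(5) has the actual definition). `feeder_for` is a hypothetical name.

```python
import hashlib


def feeder_for(message_id, n_feeders):
    """Map a Message-ID to a feeder number in 1..n_feeders.

    The same Message-ID always lands on the same feeder, and every article
    lands on exactly one of them, which is what lets N filtering feeders
    share the load without coordinating with each other.
    """
    digest = hashlib.md5(message_id.encode("utf-8")).digest()
    value = int.from_bytes(digest[-4:], "big")  # assumption: low 32 bits
    return value % n_feeders + 1


if __name__ == "__main__":
    # Message-IDs quoted earlier in this thread.
    for mid in ("<T5gCL.2931033$miq3.1112593@usenetxs.com>",
                "<U5gCL.2931041$miq3.2915980@usenetxs.com>",
                "<48a6d37ee00645eaa197950f15bd9c66@ngPost>"):
        print(mid, "-> feeder", feeder_for(mid, 3))
```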

--
Julien ÉLIE

"Happiness is continuing to desire what one already has."
(Saint Augustine)

Re: Not very useful logging information

<871qn6eeqd.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1382&group=news.software.nntp#1382

Path: i2pn2.org!i2pn.org!news.nntp4.net!nntp.terraraq.uk!nntp-feed.chiark.greenend.org.uk!ewrotcd!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 03 Feb 2023 11:22:18 -0800
Organization: The Eyrie
Message-ID: <871qn6eeqd.fsf@hope.eyrie.org>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trinb5$2os4$1@nnrp.usenet.blueworldhosting.com>
<triodh$f43$1@rasp.pasdenom.info>
<trionq$om1$1@nnrp.usenet.blueworldhosting.com>
<triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>
<trjm6p$if11$1@news.trigofacile.com>
Mime-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 8bit
Injection-Info: hope.eyrie.org;
logging-data="5826"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/28.2 (gnu/linux)
Cancel-Lock: sha1:fV0d+4v75mHSWIHT4S8X8+0raSg=
 by: Russ Allbery - Fri, 3 Feb 2023 19:22 UTC

Julien ÉLIE <iulius@nom-de-mon-site.com.invalid> writes:

> I don't know whether it could be of help for your usage, notably for the
> reduced CPU amount needed for filtering across several intermediate
> feeders.

I suspect the available filtering software is fairly inefficient on CPU
cycles, although that's a much harder problem to solve. Last time I
looked at it, it was tons of regex matches written in languages that
aren't the fastest. I know Jeremy Nixon did a lot of benchmarking and
optimization of the regexes back in the day, but I'm not sure if those
optimizations still work with the latest Perl or have been carried
forward, and I suspect Python will be even slower. (The Perl regex engine
is terrifying, but it's absurdly optimized.)

For something like binary detection, there are probably optimization gains
to be had, but it would be a lot of development work. (Spam detection is
a lot harder and inherently seems to involve a lot of complex pattern
matching.)
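
As an illustration of the kind of gain that might be available for binary detection, a plain substring search can rule out most text articles before any regex runs. This is a sketch under assumptions, not how Cleanfeed or PyClean detect binaries: `looks_binary` is a hypothetical name, and it only recognises a yEnc `=ybegin` line or a uuencode `begin <mode> <name>` line.

```python
import re

# Only consulted when the cheap substring test below has already matched.
UUENCODE_BEGIN = re.compile(rb"^begin [0-7]{3,4} \S", re.MULTILINE)


def looks_binary(body: bytes) -> str:
    """Return a short reason if the body looks like a binary post, else ''."""
    # Fast path: byte-string searches are far cheaper than regex matching.
    if body.startswith(b"=ybegin ") or b"\n=ybegin " in body:
        return "Binary (yEnc)"
    # Run the regex only if the literal "begin " occurs somewhere at all.
    if b"begin " in body and UUENCODE_BEGIN.search(body):
        return "Binary (uuencode)"
    return ""


if __name__ == "__main__":
    print(repr(looks_binary(b"=ybegin line=128 size=10 name=a.bin\r\n...")))
    print(repr(looks_binary(b"Let us begin the discussion.\r\n")))
```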

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: Not very useful logging information

<trjqja$sdm$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1386&group=news.software.nntp#1386

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 20:25:14 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trjqja$sdm$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trionq$om1$1@nnrp.usenet.blueworldhosting.com> <triplu$1p5r$1@nnrp.usenet.blueworldhosting.com> <trjm6p$if11$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 20:25:14 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="29110"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:1fIxo9T5pvT8n8lLRePyQdePPi0= sha256:f9dg+RJJ7McgWUeNFHg255qVW5C19fPvIHsXWNcVF3Q=
sha1:kcxOcyErmSVx41K86pkROv12Epo= sha256:eCxCbL/hR0dL9E9COTZ8sQJ178nPSldpy4uuiznBcLE=
X-Usenapp: v1.26/d - Full License
 by: Jesse Rehmer - Fri, 3 Feb 2023 20:25 UTC

On Feb 3, 2023 at 1:10:17 PM CST, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>
>> Diablo's "feeder" design is more scalable, even with Cleanfeed in
>> the mix, but will admit INN works best for a backend spool for me. I like
>> having working control message processing, Cancel-Lock support, native TLS,
>> etc.
>
> FWIW, Miquel van Smoorenburg added support in INN for Diablo's hashfeed.
> It's the Q flag value in newsfeeds. It permits scaling feeders like
> Diablo does with backend-servers.
>
> You may then have the following architecture:
>
> - a front innd transit server without overview/reader/filtering, just
> feeding to N internal innd feeders (using Q to split the feed in N parts
> in newsfeeds; or binaries to some feeders, text to others);
> - N internal innd feeders doing filtering, without overview and reader;
> - a final innd/nnrpd serving server, without filtering, but numbering
> and storing articles, generating overview data and handling readers.
>
> I don't know whether it could be of help for your usage, notably for the
> reduced CPU amount needed for filtering across several intermediate feeders.

This is similar to my current setup, but I don't have the middle "internal"
feeder layer doing filtering, it's being done by the transit server to protect
text-only peers.

As Russ stated, it's the filtering that's eating up CPU cycles. I can handle
more than 50Mbps of incoming and more than 80Mbps of outgoing traffic using
~20% of the CPU without filtering, but when I turn on Cleanfeed or PyClean,
innd maxes out its core and upstream peers start to spool.

I will admit, I'm likely an outlier in that I absolutely want to be a good
Usenet citizen to all of Usenet, while also participating (partially) in all
of it.

Re: Not very useful logging information

<trjrs6$j087$2@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1388&group=news.software.nntp#1388

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176.143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Fri, 3 Feb 2023 21:47:01 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <trjrs6$j087$2@news.trigofacile.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trionq$om1$1@nnrp.usenet.blueworldhosting.com>
<triplu$1p5r$1@nnrp.usenet.blueworldhosting.com>
<trjm6p$if11$1@news.trigofacile.com>
<trjqja$sdm$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 3 Feb 2023 20:47:02 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176.143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="622855"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.6.1
Cancel-Lock: sha1:xmkhnUnAWQN/wliguclq3dnjheQ= sha256:02zan9aBipNpcRqDIGPfgn9L7aasWg9SJRpaIe8d16I=
sha1:ztBoL6LWU5Kv5r5TjvqIpP3/B/E= sha256:KMNe3f3T/9S5IAQTKahuukiKgKlb6F3LmEvUSkySqko=
In-Reply-To: <trjqja$sdm$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Fri, 3 Feb 2023 20:47 UTC

Hi Jesse,
> As Russ stated, it's the filtering that's eating up CPU cycles. I can handle
> >50Mbps of incoming and >80Mbps outgoing traffic using ~20% of the CPU without
> filtering, but when I turn on Cleanfeed or PyClean innd maxes out its core and
> upstream peers start to spool.

Do you think reordering the checks Cleanfeed does would be of help?
For instance having the is_binary() check as soon as possible instead of
always having body parsing like:

$body = lc substr($hdr{__BODY__}, 0, 4000);

$state{badlines}++ while $hdr{__BODY__} =~ /[^\r]\n/g;

$hdr{__BODY__} =~ /
^[Bb][Ee][Gg][Ii][Nn]$hws+[0-7]{3,4}$hws+ # begin 666
(...long regexp for UUencoded...)
/mx;
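
A minimal Python sketch of the reordering idea, assuming a filter where a cheap binary test runs first and short-circuits the remaining body checks. The patterns and the names `is_binary` and `filter_article` are simplified stand-ins written for this example; they are not Cleanfeed's actual code.

```python
import re

# Hypothetical, deliberately simple markers; production filters use
# much longer expressions.
UU_BEGIN = re.compile(r"^begin [0-7]{3,4} \S", re.MULTILINE)
YENC_BEGIN = re.compile(r"^=ybegin ", re.MULTILINE)
BARE_LF = re.compile(r"[^\r]\n")


def is_binary(body: str) -> bool:
    """Cheap test: does the body contain a uuencode or yEnc start line?"""
    return bool(UU_BEGIN.search(body) or YENC_BEGIN.search(body))


def filter_article(body: str):
    """Return (verdict, names of the checks that ran, bare-LF count or None)."""
    ran = ["is_binary"]
    if is_binary(body):
        # Short-circuit: none of the remaining body checks are executed.
        return "binary", ran, None
    ran.append("badlines")
    badlines = len(BARE_LF.findall(body))
    return "text", ran, badlines


print(filter_article("begin 644 a.bin\nM123\n"))
print(filter_article("hello\r\nworld\r\n"))
```

The second element of the result makes the saving visible: for a binary article only the first check is listed.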

--
Julien ÉLIE

« Les grands artistes n'ont pas de patrie. » (Alfred de Musset)

Re: Not very useful logging information

<trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1391&group=news.software.nntp#1391

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Sat, 4 Feb 2023 05:15:21 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trjm6p$if11$1@news.trigofacile.com> <trjqja$sdm$1@nnrp.usenet.blueworldhosting.com> <trjrs6$j087$2@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 4 Feb 2023 05:15:21 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="65024"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:gOoY2tC7zRiwVwR21P5QFd2/SUE= sha256:H2F7+Je7rfPomMbY49ITz7a/6N1kEZU5r/PBdqqTNvk=
sha1:tiNZPan0cZ6fgt1rbNDKLkx41j8= sha256:62gZfpCLlR/Uaa9Y20XXEWWrm8aEEpS328ryA+coTzE=
X-Usenapp: v1.26.1/d - Full License
 by: Jesse Rehmer - Sat, 4 Feb 2023 05:15 UTC

On Feb 3, 2023 at 2:47:01 PM CST, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>> As Russ stated, it's the filtering that's eating up CPU cycles. I can handle
>> >50Mbps of incoming and >80Mbps outgoing traffic using ~20% of the CPU without
>> filtering, but when I turn on Cleanfeed or PyClean innd maxes out its core and
>> upstream peers start to spool.
>
> Do you think reordering the checks Cleanfeed does would be of help?
> For instance having the is_binary() check as soon as possible instead of
> always having body parsing like:
>
> $body = lc substr($hdr{__BODY__}, 0, 4000);
>
> $state{badlines}++ while $hdr{__BODY__} =~ /[^\r]\n/g;
>
> $hdr{__BODY__} =~ /
> ^[Bb][Ee][Gg][Ii][Nn]$hws+[0-7]{3,4}$hws+ # begin 666
> (...long regexp for UUencoded...)
> /mx;

I'm sure it would help. If the binary detection were performed first, and an
article identified as a binary were then passed through without further
checks, that would eliminate a lot of unnecessary cycles.

I've been able to strip Cleanfeed down to a single check that only looks for
misplaced binaries, and the CPU impact is negligible compared to before. Beyond
ripping everything out, I can't do much: I'm not a developer, but I sometimes
manage to modify things to suit my needs. Likely not efficiently. :)

The one thing I noticed early on with PyClean was that binary articles pass
through all the other checks, or at least the EMP check, and every so often
one would be rejected by that part of the filter. That always seemed
inefficient, and I experimented with the variable it uses to exclude groups,
but that didn't seem to exclude them from the EMP filter.

If anyone has modified versions/patches they'd be interested in having me test
for performance, I'm capable of tossing a steady stream of mixed article flow
at them.

Re: Not very useful logging information

<trlgqa$kbge$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1393&group=news.software.nntp#1393

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176-143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Sat, 4 Feb 2023 12:50:34 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <trlgqa$kbge$1@news.trigofacile.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trjm6p$if11$1@news.trigofacile.com>
<trjqja$sdm$1@nnrp.usenet.blueworldhosting.com>
<trjrs6$j087$2@news.trigofacile.com>
<trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 4 Feb 2023 11:50:34 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176-143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="667150"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.6.1
Cancel-Lock: sha1:zc855k4GKvnBKzi5EDdaNuNjJyM= sha256:p+fWHEPE4Gl/fYgKx55xWVpst8m7saa2DFRYYflzShI=
sha1:QQci9bVjymSxGrdnn/vxFjR++Z4= sha256:oPBoavel+c5mZLwNrA1Se5uTHvMoZ0qyjTtP5g/wpA8=
In-Reply-To: <trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Sat, 4 Feb 2023 11:50 UTC

Hi Jesse,

> I'm sure it would help. If the binary detection were performed first, and an
> article is identified as a binary then passed through without further checks
> would eliminate a lot of unnecessary cycles.

Does it mean that if you turn off binary detection in Cleanfeed or
PyClean, there's no longer any huge CPU load?

Something like "block_binaries => 0" for Cleanfeed and "bin_allowed =
['\.']" (any newsgroup containing a dot will match) for PyClean.

It would be interesting to know, if CPU load is still high, which checks
cause that. For instance by disabling in Cleanfeed things like:

do_md5 => 0
do_scoring_filter => 0
fuzzy_md5 => 0
block_mime_html => 0 (and other block_html_* stuff)

and telling PyClean (in pyclean.py) that all groups are test groups, so
as to disable many checks:

test = ['\.']

Does the load become acceptable?

If that's not the case, there's a more fundamental problem that cannot
be fixed by just optimizing Cleanfeed and PyClean regexps.

> I've been able to strip out most of the checks from Cleanfeed to one that is
> only checking for misplaced binaries and CPU impact is negligible compared to
> before. Beyond ripping everything out, I'm not a developer but sometimes
> manage to modify things to suit my needs. Likely not efficiently. :)

Did you also try to directly disable the checks with do_xxx => 0?

> The one thing I noticed early on with PyClean was the binary articles pass
> through all the other checks, or at least the EMP check and every so often one
> would be rejected from that part of the filter. That always seemed inefficient
> and I experimented with the variable it uses to exclude groups but that
> didn't seem to work to exclude them from the EMP filter.

Even with:

emp_exclude = ['\.']

in pyclean.py?

> If anyone has modified versions/patches they'd be interested in having me test
> for performance I'm capable of tossing a steady stream of mixed article flow
> at it.

Looking at Diablo's code to detect binaries... well, it's pretty
straight-forward. There aren't many checks done:
https://github.com/jpmens/diablo/blob/master/lib/arttype.c

/*
*
* Try and categorise an article into a range of types
*
* We keep state info as the article is passed to us line by line
*
* Once we have found an article type, we keep to that type unless
* we find a different type. We never reset.
*
* Once we have found a binary, we stop scanning the article - save CPU
*
*/

"save CPU" :-)

It could maybe be useful to do something similar in INN, and make the
function available to embedded Perl and Python filters (like
INN::havehist() and the like). They may run faster.
That is, of course, only if Perl and Python filters already run fast
enough with binary checking turned off.
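
The approach described in the Diablo comment (keep state line by line, stop scanning once a binary is found) could look roughly like this in Python. It is a sketch written for this discussion, not a port of arttype.c; the class name `ArticleTypeScanner`, the helper `looks_like_uu_begin` and the two markers it recognises are assumptions.

```python
def looks_like_uu_begin(line: str) -> bool:
    """True for lines shaped like 'begin 644 filename'."""
    parts = line.split()
    return (len(parts) >= 3 and parts[0] == "begin"
            and len(parts[1]) in (3, 4)
            and all(c in "01234567" for c in parts[1]))


class ArticleTypeScanner:
    """Classify an article fed one line at a time; never resets."""

    def __init__(self):
        self.kind = "text"
        self.lines_scanned = 0

    def feed(self, line: str) -> str:
        if self.kind == "binary":
            # Already decided: do no further work on this article.
            return self.kind
        self.lines_scanned += 1
        if line.startswith("=ybegin ") or looks_like_uu_begin(line):
            self.kind = "binary"
        return self.kind


scanner = ArticleTypeScanner()
for line in ["Hello,", "begin 644 photo.jpg", "M-data-1", "M-data-2"]:
    scanner.feed(line)
print(scanner.kind, scanner.lines_scanned)
```

In this run only the first two lines are inspected; the encoded payload, which is the bulk of a binary article, is skipped.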

--
Julien ÉLIE

« Passion is inversely proportional to the amount of real information
available. » (Benford's law)

Re: Not very useful logging information

<trm35v$1rja$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1396&group=news.software.nntp#1396

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!feed1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Sat, 4 Feb 2023 17:03:59 -0000 (UTC)
Organization: BlueWorld Hosting Usenet (https://usenet.blueworldhosting.com)
Message-ID: <trm35v$1rja$1@nnrp.usenet.blueworldhosting.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com> <trjrs6$j087$2@news.trigofacile.com> <trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com> <trlgqa$kbge$1@news.trigofacile.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 4 Feb 2023 17:03:59 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="61034"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:yCqHiRnXey4B4EjDh/wP3RWfTKs= sha256:37v2KPR0GJVPLmnMoA4LfYSJJYZWB1gVzg1t6LNlV/M=
sha1:fi3Uw0BO5X5A2eYirkPZ/1q16ho= sha256:OhuCN53sbSFOPthV6Mttkkqq6xoPsonsqa4JghymmVM=
X-Usenapp: v1.26.1/d - Full License
 by: Jesse Rehmer - Sat, 4 Feb 2023 17:03 UTC

On Feb 4, 2023 at 5:50:34 AM CST, "Julien ÉLIE"
<iulius@nom-de-mon-site.com.invalid> wrote:

> Hi Jesse,
>
>> I'm sure it would help. If the binary detection were performed first, and an
>> article is identified as a binary then passed through without further checks
>> would eliminate a lot of unnecessary cycles.
>
> Does it mean that if you turn off binary detection in Cleanfeed or
> PyClean, there's no longer any huge CPU load?

These are good questions, and I will take some time to do testing of various
scenarios and report back with findings.

When I brought my systems back online last year I started with pyClean, no
modifications or configuration. I began backfilling my spool from commercial
entities using 10-20 instances of pullnews running at the same time. This
resulted in a very mixed article stream (lots of misplaced binaries on
commercial spools!), and is when I first noticed the filtering bottleneck.

In my observation, the more binaries going through, the worse things got.
This is when I noticed that binaries were also going through the EMP filter as
occasionally one would be rejected by the EMP filter. That got me thinking I
just needed to add the binary patterns to emp_exclude, so I used this:

emp_exclude = ['^alt\.anonymous\.messages',
'^free\.', '^local\.', '\.answers',
'^news\.answers', '^comp\.answers',
'^relcom\.', '^mailing\.', '^fa\.', '\.cvs\.',
'^gnu\.', 'lists\.freebsd\.ports\.bugs',
'^bin[a.]', '\.bin[aei.]', '\.bin$']

But that didn't help with CPU load, and binaries were still passing through
the EMP filter and sometimes rejected there.
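
To see which newsgroup names such a list would cover, here is a small sketch. It assumes the patterns are joined with "|" and searched against each newsgroup name, which is a guess about how pyClean uses the list rather than something confirmed in this thread; `make_matcher` is a hypothetical helper.

```python
import re

emp_exclude = [r'^alt\.anonymous\.messages',
               r'^free\.', r'^local\.', r'\.answers',
               r'^news\.answers', r'^comp\.answers',
               r'^relcom\.', r'^mailing\.', r'^fa\.', r'\.cvs\.',
               r'^gnu\.', r'lists\.freebsd\.ports\.bugs',
               r'^bin[a.]', r'\.bin[aei.]', r'\.bin$']


def make_matcher(patterns):
    """Build a predicate that is True when any pattern matches the group name."""
    combined = re.compile("|".join(patterns))
    return lambda group: combined.search(group) is not None


excluded = make_matcher(emp_exclude)
catch_all = make_matcher([r'\.'])  # the ['\.'] suggestion: any name with a dot

for group in ["alt.binaries.test", "comp.lang.python", "news.answers"]:
    print(group, excluded(group), catch_all(group))
```

Under that assumption, exclusion depends only on the newsgroup name, so a binary posted to a text group such as comp.lang.python would not be covered by the list above, while the catch-all pattern covers every dotted group name.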

With pyClean, I found it harder to understand how to turn off some filters
(can you even fully disable the EMP filter in pyClean?), so I switched to
Cleanfeed because its configuration made it obvious how to enable/disable all
of the filters.

Your suggestion of:

emp_exclude = ['\.']

Is a good one that I didn't think to try, but will!

I didn't understand what the test groups definition in pyClean was for, so I
didn't experiment with that either.

Re: Not very useful logging information

<trnu9j$ltit$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=1400&group=news.software.nntp#1400

Path: i2pn2.org!i2pn.org!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.176-143-2-105.abo.bbox.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: Not very useful logging information
Date: Sun, 5 Feb 2023 10:52:51 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <trnu9j$ltit$1@news.trigofacile.com>
References: <20230131155925.36dd4013@wibble.sysadmininc.com>
<trjrs6$j087$2@news.trigofacile.com>
<trkpl9$1vg0$1@nnrp.usenet.blueworldhosting.com>
<trlgqa$kbge$1@news.trigofacile.com>
<trm35v$1rja$1@nnrp.usenet.blueworldhosting.com>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 5 Feb 2023 09:52:51 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="176-143-2-105.abo.bbox.fr:176.143.2.105";
logging-data="718429"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 10.15; rv:102.0)
Gecko/20100101 Thunderbird/102.6.1
Cancel-Lock: sha1:syARgjzpBRNPJ/BmO4DA3+0Y99k= sha256:yAayikph4Oo/yBzpd4nPo0MhZbvBFqk2FZqDIhLzS3Y=
sha1:AqiovFvNv3AhRlDKtwp/KiyZod0= sha256:0mP2vXfqLO8bXkPqb+ri2Gau2962A3i2il2CwLOMv70=
In-Reply-To: <trm35v$1rja$1@nnrp.usenet.blueworldhosting.com>
 by: Julien ÉLIE - Sun, 5 Feb 2023 09:52 UTC

Hi Jesse,

> This resulted in a very mixed article stream (lots of misplaced
> binaries on commercial spools!), and is when I first noticed the
> filtering bottleneck.

Seems like they're using light heuristics to detect binaries, which
generate false negatives (and probably also false positives). But these
heuristics are less CPU intensive!

> can you even fully disable the EMP filter in pyClean?
>
> Your suggestion of:
>
> emp_exclude = ['\.']
>
> Is a good one that I didn't think to try, but will!

:)

> I didn't understand what the test groups definition in pyClean was for, so I
> didn't experiment with that either.

In the current version, test groups are treated like groups to exclude
from the EMP filter. The check for test_bool appears in only one place:

    # Start of EMP checks.
    if (not self.groups['emp_exclude_bool'] and
            not self.groups['test_bool']):

The intent is probably to also exclude them from other checks in future
versions.
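
The effect of that condition can be spelled out with a tiny standalone sketch. Only the two dictionary keys come from the snippet above; the function name `emp_checks_run` and the plain dict standing in for `self.groups` are assumptions for illustration.

```python
def emp_checks_run(groups: dict) -> bool:
    """Mirror the quoted condition: EMP checks run only if neither flag is set."""
    return (not groups['emp_exclude_bool'] and
            not groups['test_bool'])


for excl in (False, True):
    for test in (False, True):
        flags = {'emp_exclude_bool': excl, 'test_bool': test}
        print(flags, '->', emp_checks_run(flags))
```

Either flag alone is enough to skip the EMP checks, which is why marking every group as a test group has the same effect on EMP as excluding every group.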

--
Julien ÉLIE

« Le rire est une chose sérieuse avec laquelle il ne faut pas
plaisanter. » (Raymond Devos)
