
Subject  Author
* INN expiring from overview but not history or disk  John Goerzen
`* tradspool and crosspost expiry (was Re: INN expiring from overview but not histo  John Goerzen
 `* Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not h  Russ Allbery
  +- Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not h  D
  +* Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not h  John Goerzen
  |+- Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not h  Russ Allbery
  |`* Re: tradspool and crosspost expiry  Jesse Rehmer
  | +- Re: tradspool and crosspost expiry  John Goerzen
  | `* Re: tradspool and crosspost expiry  Russ Allbery
  |  `* Re: tradspool and crosspost expiry  John Goerzen
  |   `- Re: tradspool and crosspost expiry  Julien ÉLIE
  `- Re: tradspool and crosspost expiry  Julien ÉLIE

INN expiring from overview but not history or disk

<slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2793&group=news.software.nntp#2793

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.quux.org!alexnews.alexandria.complete.org!.POSTED!not-for-mail
From: jgoerzen@complete.org (John Goerzen)
Newsgroups: news.software.nntp
Subject: INN expiring from overview but not history or disk
Date: Tue, 16 Jan 2024 17:32:10 -0000 (UTC)
Organization: Alexandria NNCP news system
Message-ID: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
Injection-Date: Tue, 16 Jan 2024 17:32:10 -0000 (UTC)
Injection-Info: alexnews.alexandria.complete.org;
logging-data="146248"; mail-complaints-to="jgoerzen@complete.org"
User-Agent: slrn/1.0.3 (Linux)
 by: John Goerzen - Tue, 16 Jan 2024 17:32 UTC

Hello,

On my private INN leaf system, I am running INN 2.7.1 with groupbaseexpiry=true,
ovmethod=ovdb, hismethod=hisv6, and tradspool.

I am having an issue where the news.daily job is:

* For SOME articles, correctly expiring them from all places

* For others, even in the same group, removing them from lownumber in active
(and presumably also overview) but leaving them on disk (and sometimes in
history).

I noticed this when doing a routine examination of what was using space on this
system, which is getting a full text-only feed.

I picked the group at the top of the storage space list under alt. (Please do
not derail; this group was picked solely by its disk space usage, and is not one
I read anyhow; this is a purely technical question.)

news:~/articles/alt/atheism$ ls | wc -l
118089
news:~/articles/alt/atheism$ grep ^alt.atheism\ /var/lib/news/active
alt.atheism 0000289856 0000288979 y

According to active, there are at most 878 articles in that group (high minus
low, plus one). But according to ls, there are over 100,000 files.
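
As a rough cross-check, the count implied by an active entry is the high water mark minus the low water mark, plus one. The sketch below assumes the usual active layout of group, high, low, and flag; the function name active_count is made up for illustration. The result is only an upper bound, since article numbers inside the range may already be gone.

```shell
# Print the article count implied by a group's active entry.
# Assumed field order: group name, high mark, low mark, flag.
active_count() {
    # $1 = group name, $2 = path to the active file
    # awk's numeric conversion reads the zero-padded fields as decimal.
    awk -v g="$1" '$1 == g { print $2 - $3 + 1 }' "$2"
}
```

For the entry shown above, `active_count alt.atheism /var/lib/news/active` prints 878, which can then be compared with the output of `ls | wc -l`.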

I added a specific line to expire.ctl for this hierarchy for testing:

alt.atheism.*:A:5:5:5
alt.atheism:A:5:5:5

My expire.log shows:

expireover start Tue Jan 16 04:18:25 CST 2024: ( -z/var/log/news/expire.rm -Z/var/log/news/expire.lowmark)
Article lines processed 9050290
Articles dropped 16771
Overview index dropped 17663
expireover end Tue Jan 16 04:37:41 CST 2024
lowmarkrenumber begin Tue Jan 16 04:37:41 CST 2024: (/var/log/news/expire.lowmark)
lowmarkrenumber end Tue Jan 16 04:37:41 CST 2024
expirerm start Tue Jan 16 04:37:41 CST 2024
expirerm end Tue Jan 16 04:38:02 CST 2024
expire begin Tue Jan 16 04:38:32 CST 2024: (-v1)
Article lines processed 8051376
Articles retained 5934390
Entries expired 2116986
expire end Tue Jan 16 04:52:01 CST 2024
all done Tue Jan 16 04:52:01 CST 2024

Well that's weird. My expire.list file does have some alt/atheism/* files in
it, and THOSE files are gone. But:

news:~/articles/alt/atheism$ ls -ltr | head
total 584041
-rw-rw-r-- 4 news news 5967 Aug 30 2021 1
-rw-rw-r-- 4 news news 2612 Aug 30 2021 2
-rw-rw-r-- 4 news news 2609 Aug 30 2021 3
-rw-rw-r-- 4 news news 3596 Aug 30 2021 16
-rw-rw-r-- 4 news news 3511 Aug 30 2021 24
-rw-rw-r-- 4 news news 4217 Aug 30 2021 25
-rw-rw-r-- 4 news news 3303 Aug 30 2021 27
-rw-rw-r-- 4 news news 2362 Aug 30 2021 28
-rw-rw-r-- 4 news news 3994 Aug 30 2021 32

Clearly something isn't right here.

Looking at the message-IDs from these, for some of them (for instance, the very
oldest in the list) grephistory doesn't show any entries. For others -- such as
this one from February 2023, nearly a year ago:

news:~/articles/alt/atheism$ grephistory -l 'c0eeuhhbc3ebr3t8vlvfu4ftd4bdbb49u7@4ax.com'
[43098AA196CF303E586EDE4477A98426] 1676098879~-~1676097578 @0500000002F00000000000030D4000000000@

And those dates match. But this is clearly outside what is listed in active.
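
To read the timestamps in a history entry without converting them by hand, a small helper can decode them. This is a sketch that assumes the field layout arrived~expires~posted, with "-" meaning no value, and it relies on GNU date; the function name history_dates is hypothetical.

```shell
# Decode the timestamp field of a history entry into readable UTC dates.
# Assumed layout, as in the grephistory -l output: arrived~expires~posted.
# Requires GNU date for the "-d @SECONDS" form.
history_dates() {
    # $1 = timestamp field, e.g. "1676098879~-~1676097578"
    local label value i=1
    for label in arrived expires posted; do
        value=$(printf '%s\n' "$1" | cut -d'~' -f"$i")
        i=$((i + 1))
        case $value in
            ''|-) echo "$label -" ;;
            *)    echo "$label $(date -u -d "@$value" +%Y-%m-%dT%H:%M:%SZ)" ;;
        esac
    done
}
```

Called with the field from the entry above, it reports arrival and posting times that both fall on 11 February 2023, consistent with the "nearly a year ago" observation.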

ovdb seems to match active:

news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
alt.atheism: counted: low: 288979, high: 289856, count: 878
news:~/articles/alt/atheism$ ovdb_stat -g alt.atheism
alt.atheism: groupstats: low: 288979, high: 289856, count: 878, flag: y
news:~/articles/alt/atheism$ ovdb_stat -i alt.atheism
alt.atheism: flags: none
alt.atheism: gid: 752; Stored in: ov00027
alt.atheism: last expired: 2024-01-16 04:20:27 CST
alt.atheism: by process id: 138688

Even expire.list is weird:

alt/atheism/288826
alt/atheism/288827
alt/atheism/288828
alt/atheism/288829
alt/atheism/288830
alt/atheism/288833
alt/atheism/288834
alt/atheism/288835
alt/atheism/288836
alt/atheism/288840
alt/atheism/288842

I don't know how to explain those gaps; 288831 and 288832 do exist on disk and
should be expired, for instance.

I believe this is happening in a number of other groups as well. That is, it's
not specific to this one. This is just the biggest example.

So my questions are:

1) Why is this happening?

2) Once I fix #1, how can I clean up the articles already left behind? I could
brute force a 'find . -mtime +x -delete' but that might not leave the history in
a consistent state.

I am using the default cronjob for calling news.daily expireover lowmark delayrm.

Thanks,

John

tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)

<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2794&group=news.software.nntp#2794

Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!news.quux.org!alexnews.alexandria.complete.org!.POSTED!not-for-mail
From: jgoerzen@complete.org (John Goerzen)
Newsgroups: news.software.nntp
Subject: tradspool and crosspost expiry (was Re: INN expiring from overview
but not history or disk)
Date: Wed, 17 Jan 2024 01:33:14 -0000 (UTC)
Organization: Alexandria NNCP news system
Message-ID: <slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
Injection-Date: Wed, 17 Jan 2024 01:33:14 -0000 (UTC)
Injection-Info: alexnews.alexandria.complete.org;
logging-data="154959"; mail-complaints-to="jgoerzen@complete.org"
User-Agent: slrn/1.0.3 (Linux)
 by: John Goerzen - Wed, 17 Jan 2024 01:33 UTC

In reading some more docs, I have a clue, but it is not an answer.

The expireover manpage documents these options:

-e Remove articles from the news spool and all overview databases as
soon as they expire out of any newsgroup to which they are posted,
rather than retain them until they expire out of all newsgroups. -e
and -k cannot be used at the same time. This flag is ignored if
groupbaseexpiry is false.

-k Retain all overview information for an article, as well as the
article itself, until it expires out of all newsgroups to which it
was posted. This can cause articles to stick around in a newsgroup
for longer than the expire.ctl rules indicate, when they're
crossposted. -e and -k cannot be used at the same time. This flag
is ignored if groupbaseexpiry is false.

It doesn't state which is the default. However, some anecdotal inspection
suggests that every post that seems to be mysteriously present was crossposted
into newsgroups with longer expiry times (the default on this server is never,
which I only override for ones that are using excessive disk).

So now I have a ton of questions:

- Since a tradspool article is hardlinked into every newsgroup it is posted to,
how does -e even know where to remove it from? Same for ctlinnd cancel and
such. It would have to rm/unlink it from multiple places if crossposted.

- Part of the reason I picked tradspool was to have more fine-grained control
over expiry. I had thought that it would simply unlink a crossposted article
from any group whose rules would expire it. It sounds like it's
all-or-nothing, and perhaps -k is the default.

- It seems odd that there's no medium between -e and -k that means "just unlink
it from groups as it comes up for expiry there." Or am I missing something?

Thanks,

John


Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)

<8734uw4qs3.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2795&group=news.software.nntp#2795

Path: i2pn2.org!i2pn.org!news.furie.org.uk!usenet.goja.nl.eu.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)
Date: Tue, 16 Jan 2024 18:50:20 -0800
Organization: The Eyrie
Message-ID: <8734uw4qs3.fsf@hope.eyrie.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="2019"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:vfEzTma4UVTAwCSPAMduZCeeFcM=
 by: Russ Allbery - Wed, 17 Jan 2024 02:50 UTC

John Goerzen <jgoerzen@complete.org> writes:

> In reading some more docs, I have a clue, but it is not an answer.

> The expireover manpage documents these options:

> -e Remove articles from the news spool and all overview databases
> as soon as they expire out of any newsgroup to which they are
> posted, rather than retain them until they expire out of all
> newsgroups. -e and -k cannot be used at the same time. This
> flag is ignored if groupbaseexpiry is false.

> -k Retain all overview information for an article, as well as the
> article itself, until it expires out of all newsgroups to
> which it was posted. This can cause articles to stick around
> in a newsgroup for longer than the expire.ctl rules indicate,
> when they're crossposted. -e and -k cannot be used at the
> same time. This flag is ignored if groupbaseexpiry is false.

> It doesn't state which is the default.

I believe the default is neither of those: the overview information is
removed as the article expires from each group, but the article itself is
retained until it has expired from all groups.

Crossposting was my first guess about what behavior you were seeing, but
then I decided it couldn't be that since you had thousands of unexpired
messages. But now that you say that you keep everything forever except
certain groups, it all makes sense. Your example group, alt.atheism, is
notorious for getting tons of crossposted troll posts from tons of other
groups, so much of its traffic is crossposted.

> - Since a tradspool article is hardlinked into every newsgroup it is
> posted to, how does -e even know where to remove it from? Same for
> ctlinnd cancel and such. It would have to rm/unlink it from multiple
> places if crossposted.

It uses the Xref header of the article to find all the groups to which it
was posted. It has to retrieve that anyway to know what expiration rules
to apply. This is why although the standard doesn't require it, INN
requires the Xref header be present in the overview database.
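
To make the Xref lookup concrete, here is a minimal sketch that turns an Xref header into the relative spool paths it implies. It assumes tradspool's layout of group name with dots replaced by slashes, followed by the article number, which matches the alt/atheism/288826 style paths in expire.list. The function name and the sample header are made up; this is not how INN itself is implemented.

```shell
# Turn an Xref header into the relative tradspool paths it implies.
# Assumption: articles live at group/with/slashes/NUMBER under the spool.
xref_to_paths() {
    # $1 = header line, e.g. "Xref: host group.a:1 group.b:2"
    # Skip the "Xref:" label and the host name, then map each group:number.
    printf '%s\n' "$1" | tr -s ' ' '\n' |
        awk -F: 'NR > 2 && NF == 2 { gsub(/\./, "/", $1); print $1 "/" $2 }'
}
```

For a hypothetical header such as `Xref: news.example.org alt.atheism:288831 alt.test:42`, this prints one path per group, which is the set of links that would have to go when the article is removed.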

> - Part of the reason I picked tradspool was to have more fine-grained
> control over expiry. I had thought that it would simply unlink a
> crossposted article from any group whose rules would expire it. It
> sounds like it's all-or-nothing, and perhaps -k is the default.

I forget why we don't do that. I think it's because the links aren't
always hard links; sometimes they're symlinks and in that case you can't
just delete the article out of each group independently? But I'm not
really sure.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)

<3d66a63c58166f8ac09b6820df08470f@dizum.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2796&group=news.software.nntp#2796

From: J@M (D)
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org> <8734uw4qs3.fsf@hope.eyrie.org>
Subject: Re: tradspool and crosspost expiry (was Re: INN expiring from
overview but not history or disk)
Content-Transfer-Encoding: 7bit
Message-ID: <3d66a63c58166f8ac09b6820df08470f@dizum.com>
Date: Wed, 17 Jan 2024 05:27:50 +0100 (CET)
Newsgroups: news.software.nntp
Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!news.in-chemnitz.de!news2.arglkargh.de!alphared!sewer!news.dizum.net!not-for-mail
Organization: dizum.com - The Internet Problem Provider
X-Abuse: abuse@dizum.com
Injection-Info: sewer.dizum.com - 2001::1/128
 by: D - Wed, 17 Jan 2024 04:27 UTC

On Tue, 16 Jan 2024 18:50:20 -0800, Russ Allbery <eagle@eyrie.org> wrote:
>John Goerzen <jgoerzen@complete.org> writes:
>> In reading some more docs, I have a clue, but it is not an answer.
>> The expireover manpage documents these options:
>> -e Remove articles from the news spool and all overview databases
>> as soon as they expire out of any newsgroup to which they are
>> posted, rather than retain them until they expire out of all
>> newsgroups. -e and -k cannot be used at the same time. This
>> flag is ignored if groupbaseexpiry is false.
>> -k Retain all overview information for an article, as well as the
>> article itself, until it expires out of all newsgroups to
>> which it was posted. This can cause articles to stick around
>> in a newsgroup for longer than the expire.ctl rules indicate,
>> when they're crossposted. -e and -k cannot be used at the
>> same time. This flag is ignored if groupbaseexpiry is false.
>> It doesn't state which is the default.
>
>I believe the default is neither of those: the overview information is
>removed as the article expires from each group, but the article itself is
>retained until it has expired from all groups.
>Crossposting was my first guess about what behavior you were seeing, but
>then I decided it couldn't be that since you had thousands of unexpired
>messages. But now that you say that you keep everything forever except
>certain groups, it all makes sense. Your example group, alt.atheism, is
>notorious for getting tons of crossposted troll posts from tons of other
>groups, so much of its traffic is crossposted.
>
>> - Since a tradspool article is hardlinked into every newsgroup it is
>> posted to, how does -e even know where to remove it from? Same for
>> ctlinnd cancel and such. It would have to rm/unlink it from multiple
>> places if crossposted.
>
>It uses the Xref header of the article to find all the groups to which it
>was posted. It has to retrieve that anyway to know what expiration rules
>to apply. This is why although the standard doesn't require it, INN
>requires the Xref header be present in the overview database.
>
>> - Part of the reason I picked tradspool was to have more fine-grained
>> control over expiry. I had thought that it would simply unlink a
>> crossposted article from any group whose rules would expire it. It
>> sounds like it's all-or-nothing, and perhaps -k is the default.
>
>I forget why we don't do that. I think it's because the links aren't
>always hard links; sometimes they're symlinks and in that case you can't
>just delete the article out of each group independently? But I'm not
>really sure.

technical discussions about news servers are beyond my simple understanding,
but it has always seemed like usenet would have been better off without any
crossposting . . . e.g. "alt.atheism" has been crossposted to from opposing
religious groups more often than not; alt.fan.nietzsche, alt.fan.voltaire(?)
soc.atheism, talk.atheism, uk.philosophy.atheism etc. might make more sense,
but even then why crosspost at all? alt.atheism is by far the busiest group,
so subscribers to other relatively less active atheist-centric forums would
almost certainly have alt.atheism on their common list of subscribed groups;
even the word atheism seems dubious, whereas atheist is clearly unequivocal,
an "atheismist" is an adherent of atheism and atheist is only a human being

Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)

<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2797&group=news.software.nntp#2797

Path: i2pn2.org!i2pn.org!news.samoylyk.net!1.us.feeder.erje.net!feeder.erje.net!news.quux.org!alexnews.alexandria.complete.org!.POSTED!not-for-mail
From: jgoerzen@complete.org (John Goerzen)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry (was Re: INN expiring from
overview but not history or disk)
Date: Wed, 17 Jan 2024 05:10:16 -0000 (UTC)
Organization: Alexandria NNCP news system
Message-ID: <slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
<8734uw4qs3.fsf@hope.eyrie.org>
Injection-Date: Wed, 17 Jan 2024 05:10:16 -0000 (UTC)
Injection-Info: alexnews.alexandria.complete.org;
logging-data="158998"; mail-complaints-to="jgoerzen@complete.org"
User-Agent: slrn/1.0.3 (Linux)
 by: John Goerzen - Wed, 17 Jan 2024 05:10 UTC

On 2024-01-17, Russ Allbery <eagle@eyrie.org> wrote:
> John Goerzen <jgoerzen@complete.org> writes:
>
>> In reading some more docs, I have a clue, but it is not an answer.
>
>> It doesn't state which is the default.
>
> I believe the default is neither of those: the overview information is
> removed as the article expires from each group, but the article itself is
> retained until it has expired from all groups.

Hi Russ, and thanks for the reply!

I am a little fuzzy on the relationship between the triple of (overview,
history, article storage on disk).

It sounds like you're saying:

- The overview information is per-group (which it pretty much has to be, given
what overview is for). So expireover could remove the article from overview
for a group per expire.ctl without removing the hardlink to it from that
group, or the history entry.

- Then after it is expired from all groups (I guess by checking the Xref
header?) it is finally removed from the history and on-disk as well.

Did I get that right? I think this could explain the behavior I was seeing, of
ovdb and active not showing all the old articles, but them still being on disk
and (mostly?) in history.

In that case, the only "penalty" I am paying here is the cost of the directory
entry, since the inode and data are already spoken for with the other crossposts.

So, expireover -k would modify that by not even removing it from overview
either, until all references are removed.

And -e would change it to remove the article from everywhere as soon as even
one of the groups in its Xref expires it.

> Crossposting was my first guess about what behavior you were seeing, but
> then I decided it couldn't be that since you had thousands of unexpired
> messages. But now that you say that you keep everything forever except
> certain groups, it all makes sense. Your example group, alt.atheism, is
> notorious for getting tons of crossposted troll posts from tons of other
> groups, so much of its traffic is crossposted.

Oh interesting. That could account for it indeed. And with all the hardlinks,
space attribution is notoriously difficult, and it could be that the tool I was
using (ncdu) was arbitrarily assigning the space to the first instance it
encountered (and with alt.atheism sorting near the top of the group list, that
seems plausible).
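
One way to see how much of a group directory is shared with other groups is to split its files by link count. The sketch below assumes crossposts are stored as hard links, as they appear to be on this spool (the ls output shows a link count of 4); symlinked crossposts would not be counted. The function name count_by_links is hypothetical.

```shell
# Count spool files in one group directory by hard link count.
# Files with more than one link share their data with another group.
count_by_links() {
    # $1 = group directory, e.g. ~/articles/alt/atheism
    local single multi
    single=$(find "$1" -maxdepth 1 -type f -links 1 | wc -l)
    multi=$(find "$1" -maxdepth 1 -type f -links +1 | wc -l)
    # Arithmetic expansion strips any padding wc may add.
    echo "single-link: $((single)) multi-link: $((multi))"
}
```

Deleting a multi-link file frees only its directory entry, so a large multi-link count suggests that little of the space attributed to the directory could be reclaimed there.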

>> - Since a tradspool article is hardlinked into every newsgroup it is
>> posted to, how does -e even know where to remove it from? Same for
>> ctlinnd cancel and such. It would have to rm/unlink it from multiple
>> places if crossposted.
>
> It uses the Xref header of the article to find all the groups to which it
> was posted. It has to retrieve that anyway to know what expiration rules
> to apply. This is why although the standard doesn't require it, INN
> requires the Xref header be present in the overview database.

Ahhhh. That makes sense.

>> - Part of the reason I picked tradspool was to have more fine-grained
>> control over expiry. I had thought that it would simply unlink a
>> crossposted article from any group whose rules would expire it. It
>> sounds like it's all-or-nothing, and perhaps -k is the default.
>
> I forget why we don't do that. I think it's because the links aren't
> always hard links; sometimes they're symlinks and in that case you can't
> just delete the article out of each group independently? But I'm not
> really sure.

I've been wondering if I could use different storage classes to solve this
problem. Or maybe that would introduce more; I've never had more than one
storage class before.

What is the mechanism when an article is crossposted into groups that are in
different classes? Is it safe to use tradspool for multiple classes? (I think
it is, given the example in the manpage of timehash for multiple classes) I
imagine that in the case where an article is crossposted to multiple classes
using different storage methods, it would either have to be duplicated into all
of them, or somehow everything will need to know where to find it (implying that
cnfs rolling expiry may hold some surprises for some messages that are
crossposted outside of cnfs).

Can one move groups between classes if the storage method is the same? (I note
the warning "arbitrary but permanent" and am not sure how it might apply here.)

Thanks again,

John

Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)

<87r0ig34lm.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2798&group=news.software.nntp#2798

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry (was Re: INN expiring from overview but not history or disk)
Date: Tue, 16 Jan 2024 21:34:45 -0800
Organization: The Eyrie
Message-ID: <87r0ig34lm.fsf@hope.eyrie.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
<8734uw4qs3.fsf@hope.eyrie.org>
<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="6672"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:5DnAuzwuDJNjYi7lca7bHvrwdBY=
 by: Russ Allbery - Wed, 17 Jan 2024 05:34 UTC

John Goerzen <jgoerzen@complete.org> writes:

> I am a little fuzzy on the relationship between the triple of (overview,
> history, article storage on disk).

> It sounds like you're saying:

> - The overview information is per-group (which it pretty much has to be,
> given what overview is for). So expireover could remove the article
> from overview for a group per expire.ctl without removing the hardlink
> to it from that group, or the history entry.

Correct. An article has one and only one history entry, which is only
dropped after it has expired completely *and* its Date header is older
than the history retention cutoff.

> - Then after it is expired from all groups (I guess by checking the Xref
> header?) it is finally removed from the history and on-disk as well.

> Did I get that right? I think this could explain the behavior I was
> seeing, of ovdb and active not showing all the old articles, but them
> still being on disk and (mostly?) in history.

Yup, I think that's correct.

> In that case, the only "penalty" I am paying here is the cost of the
> directory entry, since the inode and data is already spoken for with the
> other crossposts.

Also correct.

> So, expireover -k would modify that by not even removing it from
> overview either, until all references are removed.

> And -e would change it to remove from everywhere as soon as even one of
> the Xrefs expire it.

Yes, indeed. This may be what you want for your expiration configuration.
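
The three behaviours confirmed above can be summarised as a toy model. It only restates the rules as described in this thread; it is not taken from the expireover source, and the function names and arguments are invented for illustration.

```shell
# Toy model of crosspost expiry as described in this thread.
# mode:    default | -k | -e
# expired: number of groups whose expire.ctl rule has expired the article
# total:   number of groups the article was posted to

# Succeeds when the article itself would be removed from the spool.
article_removed() {
    local mode=$1 expired=$2 total=$3
    case $mode in
        -e) [ "$expired" -ge 1 ] ;;          # gone once any group expires it
        *)  [ "$expired" -ge "$total" ] ;;   # default and -k wait for all groups
    esac
}

# Succeeds when the overview entry in one group would be removed.
# here: 1 if this group's own rule has expired the article, else 0
overview_removed() {
    local mode=$1 here=$2 expired=$3 total=$4
    case $mode in
        -e) [ "$expired" -ge 1 ] ;;          # removed everywhere at once
        -k) [ "$expired" -ge "$total" ] ;;   # kept until all groups expire it
        *)  [ "$here" -eq 1 ] ;;             # default: handled per group
    esac
}
```

In the default mode, an article expired in one of three groups loses that group's overview entry while its file stays on disk, which is the state observed on this spool.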

>> I forget why we don't do that. I think it's because the links aren't
>> always hard links; sometimes they're symlinks and in that case you
>> can't just delete the article out of each group independently? But I'm
>> not really sure.

> I've been wondering if I could use different storage classes to solve
> this problem. Or maybe that would introduce more; I've never had more
> than one storage class before.

Oh, that's an interesting question. My first instinct is that tradspool
doesn't care about the storage classes at all (it predates them), just
whether the article is stored in a tradspool class, so I don't think this
would change behavior, but the class goes into the token, so maybe this
isn't correct.

> What is the mechanism when an article is crossposted into groups that
> are in different classes?

Whichever rule matches the article first takes ownership of the article
and the remaining rules are ignored. So one article is always stored in
one and only one storage class.

This is the technique I used to control disk space usage long ago when I
was running servers for other people. I sent the high-volume groups (and
thus also all the articles crossposted to them) into CNFS and let them
autoexpire, and only used tradspool for the lower volume groups.
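
A storage.conf along those lines might look like the fragment below. It is only a sketch: the class numbers are arbitrary, the group pattern is an example, and BIGGROUPS is a hypothetical metacycbuff name that would have to be defined in cycbuff.conf. Entries are tried in order, so the CNFS entry has to come before the catch-all.

```text
# High-volume groups, and anything crossposted to them, go to CNFS.
method cnfs {
    class: 1
    newsgroups: alt.atheism,alt.atheism.*
    options: BIGGROUPS
}

# Everything else stays in tradspool.
method tradspool {
    class: 0
    newsgroups: *
}
```

With this ordering, an article crossposted to alt.atheism and a low-volume group should match the first entry and be stored once, in CNFS.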

I think you need different storage backends to pull this off, though. I
don't think making multiple storage classes and assigning them all to
tradspool will do anything.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: tradspool and crosspost expiry

<uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2799&group=news.software.nntp#2799

Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!nnrp.usenet.blueworldhosting.com!.POSTED!not-for-mail
From: jesse.rehmer@blueworldhosting.com (Jesse Rehmer)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Wed, 17 Jan 2024 06:19:21 -0000 (UTC)
Organization: BWH Usenet Archive (https://usenet.blueworldhosting.com)
Message-ID: <uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org> <slrnuqebiq.borl.jgoerzen@slrnh.complete.org> <8734uw4qs3.fsf@hope.eyrie.org> <slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8; format=fixed
Content-Transfer-Encoding: 8bit
Injection-Date: Wed, 17 Jan 2024 06:19:21 -0000 (UTC)
Injection-Info: nnrp.usenet.blueworldhosting.com;
logging-data="97564"; mail-complaints-to="usenet@blueworldhosting.com"
User-Agent: Usenapp for MacOS
Cancel-Lock: sha1:gLdvIh0RLjRuDAYaOiGtDg/p1K0= sha256:RKY+MmTli/SUpHiOBVGaKyuOM6z9Fs9kZaJ3X3AUMwY=
sha1:jRqeIe7KchPfLuBsCTlBjinpXR0= sha256:m/cYSLlShij803XzZ1HUdIvEaQ3pCAu/cB3swauqmjQ=
X-Usenapp: v1.27.2/d - Full License
 by: Jesse Rehmer - Wed, 17 Jan 2024 06:19 UTC

On Jan 16, 2024 at 11:10:16 PM CST, "John Goerzen" <jgoerzen@complete.org>
wrote:

> On 2024-01-17, Russ Allbery <eagle@eyrie.org> wrote:
>> John Goerzen <jgoerzen@complete.org> writes:
>>
>>> In reading some more docs, I have a clue, but it is not an answer.
>>
>>> It doesn't state which is the default.
>>
>> I believe the default is neither of those: the overview information is
>> removed as the article expires from each group, but the article itself is
>> retained until it has expired from all groups.
>
> Hi Russ, and thanks for the reply!
>
> I am a little fuzzy on the relationship between the triple of (overview,
> history, article storage on disk).
>
> It sounds like you're saying:
>
> - The overview information is per-group (which it pretty much has to be, given
> what overview is for). So expireover could remove the article from overview
> for a group per expire.ctl without removing the hardlink to it from that
> group, or the history entry.
>
> - Then after it is expired from all groups (I guess by checking the Xref
> header?) it is finally removed from the history and on-disk as well.
>
> Did I get that right? I think this could explain the behavior I was seeing, of
> ovdb and active not showing all the old articles, but them still being on disk
> and (mostly?) in history.
>
> In that case, the only "penalty" I am paying here is the cost of the directory
> entry, since the inode and data is already spoken for with the other
> crossposts.
>
> So, expireover -k would modify that by not even removing it from overview
> either, until all references are removed.
>
> And -e would change it to remove from everywhere as soon as even one of the
> Xrefs expire it.

If you want to keep tradspool, you may want to add the 'X' flag to the
expire.ctl entry for alt.atheism, and for any other similar group from which
you wish to expire crossposted articles. With that flag, an article that
expires from that group is also removed from every other group it was
crossposted to. Using the '-e' flag with expireover may result in behavior you
do not expect if you have many entries in expire.ctl.

"An expiration policy is applied to every article in a newsgroup it
matches. There is no way to set an expiration policy for articles
crossposted to groups you don't carry that's different than other
articles in the same group. Normally, articles are not completely
deleted until they expire out of every group to which they were posted,
but if an article is expired following a rule where <flag> contains
"X", it is deleted out of all newsgroups to which it was posted
immediately. (If that behaviour is wanted for all rules, you may want
to give the -e flag to expireover.)"

Re: tradspool and crosspost expiry

<slrnuqfl21.borl.jgoerzen@slrnh.complete.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2800&group=news.software.nntp#2800

Path: i2pn2.org!rocksolid2!news.neodome.net!weretis.net!feeder6.news.weretis.net!news.quux.org!alexnews.alexandria.complete.org!.POSTED!not-for-mail
From: jgoerzen@complete.org (John Goerzen)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Wed, 17 Jan 2024 13:21:05 -0000 (UTC)
Organization: Alexandria NNCP news system
Message-ID: <slrnuqfl21.borl.jgoerzen@slrnh.complete.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
<8734uw4qs3.fsf@hope.eyrie.org>
<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
<uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>
Injection-Date: Wed, 17 Jan 2024 13:21:05 -0000 (UTC)
Injection-Info: alexnews.alexandria.complete.org;
logging-data="167381"; mail-complaints-to="jgoerzen@complete.org"
User-Agent: slrn/1.0.3 (Linux)
 by: John Goerzen - Wed, 17 Jan 2024 13:21 UTC

On 2024-01-17, Jesse Rehmer <jesse.rehmer@blueworldhosting.com> wrote:
> If you want to keep tradspool, you may want to add the 'X' flag to the
> expire.ctl entry for alt.atheism, and any other similar group that you wish to
> expire crossposted articles. That will expire articles posted to that group
> that are also crossposted to other groups. Using the '-e' flag with expireover
> may result in behavior you do not expect if you have many entries in
> expire.ctl.

Ah! Yes, that is nice. That is a lot less of a blunt instrument than -e.
Thank you!

(It's still not what my mind thought would happen -- just unlinking from the
groups that expire earlier -- but it looks like this doesn't exist, so flag X is
the next best thing.)

- John

Re: tradspool and crosspost expiry

<87edef6grq.fsf@hope.eyrie.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2801&group=news.software.nntp#2801

Path: i2pn2.org!i2pn.org!paganini.bofh.team!news.killfile.org!news.eyrie.org!.POSTED!not-for-mail
From: eagle@eyrie.org (Russ Allbery)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Wed, 17 Jan 2024 08:55:53 -0800
Organization: The Eyrie
Message-ID: <87edef6grq.fsf@hope.eyrie.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
<8734uw4qs3.fsf@hope.eyrie.org>
<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
<uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>
Mime-Version: 1.0
Content-Type: text/plain
Injection-Info: hope.eyrie.org;
logging-data="4040"; mail-complaints-to="news@eyrie.org"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:eMvpeu9mP61gjSP4V2y58n7+mi4=
 by: Russ Allbery - Wed, 17 Jan 2024 16:55 UTC

Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:

> If you want to keep tradspool, you may want to add the 'X' flag to the
> expire.ctl entry for alt.atheism, and any other similar group that you
> wish to expire crossposted articles.

Oh, good call, I completely forgot about that feature.

--
Russ Allbery (eagle@eyrie.org) <https://www.eyrie.org/~eagle/>

Please post questions rather than mailing me directly.
<https://www.eyrie.org/~eagle/faqs/questions.html> explains why.

Re: tradspool and crosspost expiry

<slrnuqir42.psp8.jgoerzen@slrnh.complete.org>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2802&group=news.software.nntp#2802

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!news.quux.org!alexnews.alexandria.complete.org!.POSTED!not-for-mail
From: jgoerzen@complete.org (John Goerzen)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Thu, 18 Jan 2024 18:22:58 -0000 (UTC)
Organization: Alexandria NNCP news system
Message-ID: <slrnuqir42.psp8.jgoerzen@slrnh.complete.org>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org>
<8734uw4qs3.fsf@hope.eyrie.org>
<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
<uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>
<87edef6grq.fsf@hope.eyrie.org>
Injection-Date: Thu, 18 Jan 2024 18:22:58 -0000 (UTC)
Injection-Info: alexnews.alexandria.complete.org;
logging-data="494105"; mail-complaints-to="jgoerzen@complete.org"
User-Agent: slrn/1.0.3 (Linux)
 by: John Goerzen - Thu, 18 Jan 2024 18:22 UTC

On 2024-01-17, Russ Allbery <eagle@eyrie.org> wrote:
> Jesse Rehmer <jesse.rehmer@blueworldhosting.com> writes:
>
>> If you want to keep tradspool, you may want to add the 'X' flag to the
>> expire.ctl entry for alt.atheism, and any other similar group that you
>> wish to expire crossposted articles.
>
> Oh, good call, I completely forgot about that feature.
>

So a followup here...

I tried adding the X flag for alt.atheism. Now, showing a few excerpts from
expire.ctl, it looks like this:

/remember/:11
*:A:never:never:never
uk.*:A:30:30:30
uk.sci.weather:A:7:7:7
alt.politics.*:A:30:30:30
comp.lang.c:A:30:30:30
comp.lang.fortran:A:30:30:30
comp.protocols.dicom:A:5:5:5
alt.atheism.*:AX:5:5:5
alt.atheism:AX:5:5:5
alt.fan.rush-limbaugh:AX:5:5:5
alt.test:AX:5:5:5
alt.test.*:AX:5:5:5
alt.politics:AX:30:30:30
alt.politics.*:AX:30:30:30
alt.society.liberalism:AX:5:5:5
talk.politics.guns:AX:5:5:5
alt.religion.christian:A:90:90:90
rec.arts.tv:A:90:90:90
it.*:A:10:10:10
de.*:A:30:30:30
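
For reading a listing like this one, it may help to check which entry actually applies to a given group. As far as I know, expire.ctl is read in order and the last matching entry wins, which is why the default "*" line comes first. The sketch below assumes one plain pattern per line (no comma-separated or negated wildmat lists) and approximates wildmat with shell globs; effective_rule is a hypothetical helper, not an INN tool:

```shell
# effective_rule GROUP FILE
# Prints the last line of FILE whose pattern matches GROUP, using the
# pattern:flags:min:default:max layout shown in the listing above.
effective_rule() {
    group=$1
    file=$2
    match=
    while IFS=: read -r pattern flags min default max; do
        case $pattern in
            ''|'#'*|/remember/) continue ;;   # blank, comment, /remember/
        esac
        case $group in
            $pattern) match="$pattern:$flags:$min:$default:$max" ;;
        esac
    done < "$file"
    [ -n "$match" ] && echo "$match"
}

# Example (expire.ctl here is whatever path your installation uses):
#   effective_rule alt.atheism expire.ctl
```

Run against the excerpt above, this would report alt.atheism:AX:5:5:5 for alt.atheism, and the later AX entry for alt.politics.* rather than the earlier A entry.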

The added X had no effect. However, adding expireoverflags=-e caused approximately
half of the suspiciously-old messages in alt.atheism to go away.

The mystery now is: why are there still a bunch of other ones?

news:/var/spool/news/articles/alt/atheism$ ls | wc -l
64302

news:~/articles/alt/atheism$ ls -ltr | head
total 312503
-rw-rw-r-- 4 news news 5967 Aug 30 2021 1
-rw-rw-r-- 4 news news 2612 Aug 30 2021 2
-rw-rw-r-- 4 news news 2609 Aug 30 2021 3
-rw-rw-r-- 4 news news 3596 Aug 30 2021 16
-rw-rw-r-- 4 news news 3511 Aug 30 2021 24
-rw-rw-r-- 4 news news 4217 Aug 30 2021 25
-rw-rw-r-- 4 news news 3303 Aug 30 2021 27
-rw-rw-r-- 4 news news 2362 Aug 30 2021 28
-rw-rw-r-- 4 news news 3994 Aug 30 2021 32

Hmm. So are these still in history?

news:~/articles/alt/atheism$ grep '^Message-ID: ' 16
news:~/articles/alt/atheism$ grephistory "`grep '^Message-ID:' 16 | sed -e 's/^Message-ID: //' -e 's/<//' -e 's/>.*//'`"
grephistory: not found

Not that one..

news:~/articles/alt/atheism$ grephistory "`grep '^Message-ID:' 36557 | sed -e 's/^Message-ID: //' -e 's/<//' -e 's/>.*//'`"
@0500000002F00000000000008ECD00000000@

Well that one is there.

news:~/articles/alt/atheism$ ls -l 36557
-rw-rw-r-- 3 news news 3838 Nov 29 2021 36557

And, what does ovdb say?

news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
alt.atheism: counted: low: 289303, high: 290329, count: 1023

OK, so from incidental inspection, MOST but not ALL of the 60,000 articles still
in alt.atheism are visible to grephistory. These from 2021 are clearly outside
the expire range. The article count is down from the roughly 120,000 that it
was before adding -e to expireover, but the 60,000 files still on disk are way
more than the 1023 articles that ovdb thinks I have.

news:~/articles/alt/atheism$ grep ^Newsgroups: 16 36557
16:Newsgroups: alt.fan.rush-limbaugh,alt.society.liberalism,alt.atheism,alt.politics.democrats.d,talk.politics.guns
36557:Newsgroups: alt.atheism,alt.religion.christian.catholic,alt.christnet.christianlife
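
One way to relate these headers to the hard-link counts in the ls -l output above is to count the groups in each Newsgroups line. This is only a sketch; count_groups is a hypothetical helper and it assumes the header fits on one unfolded line:

```shell
# count_groups HEADER_LINE
# Counts the comma-separated groups in a single "Newsgroups:" line.
count_groups() {
    printf '%s\n' "$1" \
        | sed -e 's/^Newsgroups:[[:space:]]*//' \
        | tr ',' '\n' \
        | grep -c .
}

count_groups 'Newsgroups: alt.atheism,alt.religion.christian.catholic,alt.christnet.christianlife'
```

Article 36557 names three groups and shows a link count of 3. Article 16 names five groups and shows a link count of 4, which would be consistent with one of the five not being carried locally. If that reading is right, none of the crosspost links for these articles has been removed yet.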

Upon additional investigation, it seems that every old message still remaining in
alt.atheism is also crossposted to another group.

news:~/articles/alt/atheism$ find . -maxdepth 1 -type f -exec grep -l '^Newsgroups: alt.atheism.$' {} +

only returned recent posts - that is, they all seemed to be less than 5 days old.

So, the problem seems definitively to still involve crossposts. Anecdotally, it
seems that they mostly (all?) involve crossposts to other groups explicitly
listed in expire.ctl. Still, the highest explicit retention time on those is 30
days and we've got articles from 2021 here.

The two examples I pulled up above illustrate this. Number 16 was crossposted
entirely to groups listed in expire.ctl. (Side note: although
alt.fan.rush-limbaugh is listed in expire.ctl, that group isn't carried
locally.)

Number 36557 is crossposted only to groups that would match the default "never"
expiration.

So, I am still puzzled.

Thanks,

- John

Re: tradspool and crosspost expiry

<uoekcd$3fkf7$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2803&group=news.software.nntp#2803

Path: i2pn2.org!i2pn.org!usenet.network!news.neodome.net!weretis.net!feeder8.news.weretis.net!news.trigofacile.com!.POSTED.2a01cb080adc110029b5328be2eee065.ipv6.abo.wanadoo.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Fri, 19 Jan 2024 21:00:13 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <uoekcd$3fkf7$1@news.trigofacile.com>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org> <8734uw4qs3.fsf@hope.eyrie.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 19 Jan 2024 20:00:13 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="2a01cb080adc110029b5328be2eee065.ipv6.abo.wanadoo.fr:2a01:cb08:adc:1100:29b5:328b:e2ee:e065";
logging-data="3658215"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:MK1sZKsJlV4SETfBSbiwOMD7QPw= sha256:+XNPI9Yz6fX0QFQX6x2QFG87UVvbUmt8s6cHKeg+d2o=
sha1:04WqMS5QZYuQmpdHK2Yc8/RHStk= sha256:TFONuEXkPv7BqeE271QTs1oOFxnGuiqtTfWLJzA+64g=
In-Reply-To: <8734uw4qs3.fsf@hope.eyrie.org>
 by: Julien ÉLIE - Fri, 19 Jan 2024 20:00 UTC

Hi John, Russ,

>> The expireover manpage documents these options:
>
>> -e Remove articles from the news spool and all overview databases
>> as soon as they expire out of any newsgroup to which they are
>> posted, rather than retain them until they expire out of all
>> newsgroups. -e and -k cannot be used at the same time. This
>> flag is ignored if groupbaseexpiry is false.
>
>> -k Retain all overview information for an article, as well as the
>> article itself, until it expires out of all newsgroups to
>> which it was posted. This can cause articles to stick around
>> in a newsgroup for longer than the expire.ctl rules indicate,
>> when they're crossposted. -e and -k cannot be used at the
>> same time. This flag is ignored if groupbaseexpiry is false.
>
>> It doesn't state which is the default.
>
> I believe the default is neither of those: the overview information is
> removed as the article expires from each group, but the article itself is
> retained until it has expired from all groups.

Indeed, and this is documented at the beginning of the expireover(8)
manual page:

"When groupbaseexpiry is set, the default behavior of expireover is to
remove the article from the spool once it expires out of all of the
newsgroups to which it was crossposted. The article is, however,
removed from the overview database of each newsgroup as soon as it
expires out of that individual newsgroup."
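
The three documented modes can be summarized as a small decision table. This is a sketch of the behavior as described in the manual page excerpts quoted here, not of expireover's actual code; spool_action and overview_action are hypothetical names, and "expired" means the number of crossposted groups whose expire.ctl rule has already expired the article:

```shell
# spool_action MODE EXPIRED TOTAL
# MODE is "default", "-e" or "-k". Prints whether the article file
# itself would be removed from the spool or kept.
spool_action() {
    mode=$1 expired=$2 total=$3
    case $mode in
        -e) [ "$expired" -ge 1 ] && echo remove || echo keep ;;
        default|-k) [ "$expired" -ge "$total" ] && echo remove || echo keep ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# overview_action MODE EXPIRED_HERE EXPIRED TOTAL
# EXPIRED_HERE is "yes" or "no" for the one group being considered.
# Prints whether that group's overview entry would be removed or kept.
overview_action() {
    mode=$1 here=$2 expired=$3 total=$4
    case $mode in
        default) [ "$here" = yes ] && echo remove || echo keep ;;
        -e) [ "$expired" -ge 1 ] && echo remove || echo keep ;;
        -k) [ "$expired" -ge "$total" ] && echo remove || echo keep ;;
        *) echo "unknown mode: $mode" >&2; return 2 ;;
    esac
}

# An article crossposted to 3 groups and expired in 1 of them:
spool_action default 1 3
overview_action default yes 1 3
```

In the default mode the two calls differ: the file stays on disk while the overview entry for the expired group goes away, which is the combination described at the start of this thread.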

>> - Part of the reason I picked tradspool was to have more fine-grained
>> control over expiry. I had thought that it would simply unlink a
>> crossposted article from any group whose rules would expire it. It
>> sounds like it's all-or-nothing, and perhaps -k is the default.
>
> I forget why we don't do that. I think it's because the links aren't
> always hard links; sometimes they're symlinks and in that case you can't
> just delete the article out of each group independently? But I'm not
> really sure.

When the creation of a hard link fails, tradspool falls back to creating a
symlink, and if that does not work either, no link is created at all and an
error is logged. That's certainly why files aren't directly removed, as
they can be symlinks on rare occasions.

>> Can one move groups between classes if the storage method is the
>> same? (I note the warning "arbitrary but permanent" and am not
>> sure how it might apply here.)

Yes, groups can be moved between classes, and even between storage methods.
The warning just means that if, when an article arrives in
news.software.nntp, that group is assigned to tradspool with class
number 3, then the article is stored in tradspool and given a storage
token @0503...@ which allows it to be retrieved from tradspool.
"05" means tradspool, and "03" is the class number (used for expiration
per storage class when groupbaseexpiry is false).
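
As an illustration of reading those first two bytes, here is a small sketch. decode_token is a hypothetical helper, not an INN command; it only knows the two method codes named in this message and reports anything else as unknown:

```shell
# decode_token TOKEN
# Prints the storage method and class encoded at the start of a
# storage token such as the one grephistory prints.
decode_token() {
    token=${1#@}                                   # drop the leading "@"
    method_hex=$(printf '%s' "$token" | cut -c1-2)
    class_hex=$(printf '%s' "$token" | cut -c3-4)
    case $method_hex in
        02) method=timehash ;;
        05) method=tradspool ;;
        *)  method="unknown($method_hex)" ;;
    esac
    # The class is two hexadecimal digits; print it in decimal.
    printf 'method=%s class=%d\n' "$method" "0x$class_hex"
}

# The token grephistory returned for article 36557 earlier in the thread:
decode_token '@0500000002F00000000000008ECD00000000@'
```

For that token the sketch reports tradspool and class 0.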

If you then reconfigure news.software.nntp to be stored in timehash with
class number 7, then the next article in that newsgroup will have a
storage token of the form @0207...@ where "02" means timehash, and "07"
is the class number.

The previous article in tradspool will naturally still be retrievable.
What INN needs to know is in the storage token.

And if you remove the storage.conf entry for class number 3, the
articles will still be there and retrievable.
If you later introduce a new storage.conf entry for class number 3
covering a different set of groups, and maybe with a different retention
policy, this is where the "arbitrary but permanent" warning applies: your
old articles stored in class number 3 will share that class number with
the new articles.

>> I've been wondering if I could use different storage classes to
>> solve this problem.

In case there's a bug in how crossposted articles are marked by
expireover as to be expired from the news spool, using different storage
classes *and* setting groupbaseexpiry to *false* may improve the
expiration of such articles.
(Naturally, the possible bug in expireover should be fixed but it should
first be found...)

The expiration process is driven by expireover when groupbaseexpiry is
true. The contents of the overview database are used to find articles to
expire; these are then removed from the overview and, if needed, from the
news spool according to the rules in expire.ctl. The expire program purges
the history file afterwards.

When groupbaseexpiry is false, the expire program drives the expiration
process. The articles are removed from the history file and the news
spool according to their storage class (and expire.ctl rules). The
expireover program purges the overview database afterwards.

So if you have several storage classes (all in tradspool if you wish,
one for 5 days of retention, another for 7 days, 10 days, 30 days,
etc.), I am under the impression that the expiration would work better.
There's no longer any rule about crossposts.

Just a thought I wanted to share anyway; I have not tested it.

--
Julien ÉLIE

« – Tu parles ?
– Tu parles ! » (Astérix)

Re: tradspool and crosspost expiry

<uoel6a$3fkf6$1@news.trigofacile.com>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2804&group=news.software.nntp#2804

Path: i2pn2.org!i2pn.org!nntp.comgw.net!paganini.bofh.team!news.trigofacile.com!.POSTED.2a01cb080adc110029b5328be2eee065.ipv6.abo.wanadoo.fr!not-for-mail
From: iulius@nom-de-mon-site.com.invalid (Julien ÉLIE)
Newsgroups: news.software.nntp
Subject: Re: tradspool and crosspost expiry
Date: Fri, 19 Jan 2024 21:14:02 +0100
Organization: Groupes francophones par TrigoFACILE
Message-ID: <uoel6a$3fkf6$1@news.trigofacile.com>
References: <slrnuqdfcq.9bhe.jgoerzen@slrnh.complete.org>
<slrnuqebiq.borl.jgoerzen@slrnh.complete.org> <8734uw4qs3.fsf@hope.eyrie.org>
<slrnuqeo9o.borl.jgoerzen@slrnh.complete.org>
<uo7rh9$2v8s$1@nnrp.usenet.blueworldhosting.com>
<87edef6grq.fsf@hope.eyrie.org> <slrnuqir42.psp8.jgoerzen@slrnh.complete.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 19 Jan 2024 20:14:02 -0000 (UTC)
Injection-Info: news.trigofacile.com; posting-account="julien"; posting-host="2a01cb080adc110029b5328be2eee065.ipv6.abo.wanadoo.fr:2a01:cb08:adc:1100:29b5:328b:e2ee:e065";
logging-data="3658214"; mail-complaints-to="abuse@trigofacile.com"
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:AI7XKYblB4OqMBQeRKRH0j/V+EI= sha256:MesYb4SL0NRa8qNh2W1UMBlsnCMDNYelijoAPjZGPzo=
sha1:hXflIyGUlWcGLV5AsvfhVXSkzoI= sha256:sUj3L5L8PeMkO5l2oS71yXfe4oHJH2ddGeK9uX3hlF0=
In-Reply-To: <slrnuqir42.psp8.jgoerzen@slrnh.complete.org>
 by: Julien ÉLIE - Fri, 19 Jan 2024 20:14 UTC

Hi John,

> news:~/articles/alt/atheism$ ls -ltr | head
> total 312503
> -rw-rw-r-- 4 news news 5967 Aug 30 2021 1
> -rw-rw-r-- 4 news news 2612 Aug 30 2021 2
> -rw-rw-r-- 4 news news 2609 Aug 30 2021 3
> -rw-rw-r-- 4 news news 3596 Aug 30 2021 16
> -rw-rw-r-- 4 news news 3511 Aug 30 2021 24
> -rw-rw-r-- 4 news news 4217 Aug 30 2021 25
> -rw-rw-r-- 4 news news 3303 Aug 30 2021 27
> -rw-rw-r-- 4 news news 2362 Aug 30 2021 28
> -rw-rw-r-- 4 news news 3994 Aug 30 2021 32
>
> news:~/articles/alt/atheism$ ovdb_stat -c alt.atheism
> alt.atheism: counted: low: 289303, high: 290329, count: 1023
>
> OK, so from incidental inspection, MOST but not ALL of the 60,000 articles still
> in alt.atheism are visible to grephistory. These from 2021 are clearly outside
> the expire range. The article count is down from the roughly 120,000 that it
> was before adding -e to expireover, but still the 60,000 files on disk is way
> higher than the 1023 articles that ovdb thinks I have.

I think these articles somehow failed to be marked for removal from
the news spool by expireover when their entries were deleted from ovdb.
They remain orphaned in the news spool and are never re-processed,
because expireover only processes articles it still knows about through
the overview.

When there is no longer even a history entry, it would mean that
expireover somehow did not manage to remove the file, but expire did its
job of removing the history entry.
Sounds like a bug... somewhere...

FWIW, running the scanspool program will normally report these articles,
which should no longer be here.
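
To put a rough number on the gap before running scanspool, the file count on disk can be compared with the count line from ovdb_stat -c quoted above. The helpers below are hypothetical and only parse a line of exactly that shape:

```shell
# overview_count LINE
# Extracts the trailing "count: N" value from an ovdb_stat -c line.
overview_count() {
    printf '%s\n' "$1" | sed -n 's/.*count: \([0-9][0-9]*\)$/\1/p'
}

# spool_surplus FILES_ON_DISK LINE
# Prints how many more files are on disk than overview knows about.
spool_surplus() {
    echo $(( $1 - $(overview_count "$2") ))
}

# Figures from this thread: 64302 files from "ls | wc -l", and the
# ovdb_stat line for alt.atheism.
spool_surplus 64302 'alt.atheism: counted: low: 289303, high: 290329, count: 1023'
```

With the figures posted in this thread the difference is 63279 files, which is only an upper bound on the orphans since some of those files may still be legitimately retained.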

--
Julien ÉLIE

« The computer obeys your orders, not your intentions. »
