Rocksolid Light

Welcome to RetroBBS

mail  files  register  newsreader  groups  login

Message-ID:  

Research is to see what everybody else has seen, and think what nobody else has thought.


devel / comp.lang.lisp / why is this procedure is so slow?

SubjectAuthor
* why is this procedure is so slow?Julieta Shem
+- Re: why is this procedure is so slow?Spiros Bousbouras
+* Re: why is this procedure is so slow?Raymond Wiker
|+* Re: why is this procedure is so slow?Spiros Bousbouras
||`- Re: why is this procedure is so slow?Spiros Bousbouras
|`* Re: why is this procedure is so slow?Julieta Shem
| +* Re: why is this procedure is so slow?Raymond Wiker
| |+* Re: why is this procedure is so slow?Spiros Bousbouras
| ||`* Re: why is this procedure is so slow?Julieta Shem
| || `* Re: why is this procedure is so slow?Spiros Bousbouras
| ||  `* Re: why is this procedure is so slow?Julieta Shem
| ||   +* Re: why is this procedure is so slow?Alan Bawden
| ||   |`* Re: why is this procedure is so slow?Jeff Barnett
| ||   | +* Re: why is this procedure is so slow?Ben Bacarisse
| ||   | |+* Re: why is this procedure is so slow?Spiros Bousbouras
| ||   | ||`- Re: why is this procedure is so slow?Ben Bacarisse
| ||   | |+* Re: why is this procedure is so slow?Jeff Barnett
| ||   | ||`* Re: why is this procedure is so slow?Ben Bacarisse
| ||   | || `* Re: why is this procedure is so slow?Jeff Barnett
| ||   | ||  `* Re: why is this procedure is so slow?Ben Bacarisse
| ||   | ||   `* Re: why is this procedure is so slow?Jeff Barnett
| ||   | ||    `- Re: why is this procedure is so slow?Ben Bacarisse
| ||   | |`- Re: why is this procedure is so slow?Spiros Bousbouras
| ||   | +* Re: why is this procedure is so slow?Spiros Bousbouras
| ||   | |`- Re: why is this procedure is so slow?Spiros Bousbouras
| ||   | `- Re: why is this procedure is so slow?Alan Bawden
| ||   `- Re: why is this procedure is so slow?Spiros Bousbouras
| |`- Re: why is this procedure is so slow?Julieta Shem
| `* Re: why is this procedure is so slow?Spiros Bousbouras
|  +- Re: why is this procedure is so slow?Julieta Shem
|  `- Re: why is this procedure is so slow?Lawrence D'Oliveiro
`- Re: why is this procedure is so slow?Kaz Kylheku

Pages:12
why is this procedure is so slow?

<87sf22ykaw.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17979&group=comp.lang.lisp#17979

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: why is this procedure is so slow?
Date: Thu, 08 Feb 2024 13:47:51 -0300
Organization: A noiseless patient Spider
Lines: 132
Message-ID: <87sf22ykaw.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="39892e5664978d74c121af6a7fa90c9d";
logging-data="2162523"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+gY4kdQ35qzlO+5K/OdKtxqPtEQga40m0="
Cancel-Lock: sha1:C1mHiAzJL91tfbV0IATRcTrBJT0=
sha1:4RV/mzlsa7b1CRT8V4o1qfRy48c=
 by: Julieta Shem - Thu, 8 Feb 2024 16:47 UTC

First I define x to be a list of bytes of an NNTP article.

(defvar x (fetch-article "local.test" "28"))

Now x is a list of integers, the bytes of the article 28, which is a
MIME message containing a PDF of 610 KiB.

Now I want to split it line by line. I am able to do it, but it's very
slow.

* (time (length (split-sequence (list 13 10) x nil)))
Evaluation took:
65.710 seconds of real time
65.671875 seconds of total run time (47.093750 user, 18.578125 system)
[ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
99.94% CPU
170,322,749,168 processor cycles
79,439,358,864 bytes consed

11585
*

Less than 12k lines. The number 79,439,358,864 of bytes is way more
than I would have expected for anything regarding this invocation above,
but I have no idea what this number really measures here. I appreciate
any help. Thank you.

Here's the procedure:

(defun split-sequence (delim ls acc &key limit (so-far 1))
(let* ((len (length ls))
(delim delim)
(pos (search delim ls))
(n-take (or pos len))
(n-drop (if pos
(+ n-take (length delim))
n-take)))
(cond ((zerop len) acc)
((and limit (= so-far limit)) (list ls))
(t (split-sequence
delim (drop n-drop ls)
(cons (take n-take ls) acc)
:limit limit
:so-far (1+ so-far))))))

(defun take (n seq) (subseq seq 0 n))
(defun drop (n seq) (subseq seq n))

(*) A sample of the article

All lines in it should be pretty short.

--8<---------------cut here---------------start------------->8---
Message-Id: <tnkqcqnuujaljsvmzvuc@loop>
Content-Type: multipart/mixed; boundary="------------PeB0GiqcER01ZhCmBvnP2yr6"
Date: Wed, 7 Feb 2024 22:22:57 -0300
Mime-Version: 1.0
User-Agent: Mozilla Thunderbird
Newsgroups: local.test
Content-Language: en-US
From: Someone <someone@somewhere.org>
Subject: juris hartmanis

This is a multi-part message in MIME format.
--------------PeB0GiqcER01ZhCmBvnP2yr6
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 7bit

--------------PeB0GiqcER01ZhCmBvnP2yr6
Content-Type: application/pdf;
name="juris-hartmanis-godel-von-neumann-1956-89-994.pdf"
Content-Disposition: attachment;
filename="juris-hartmanis-godel-von-neumann-1956-89-994.pdf"
Content-Transfer-Encoding: base64

JVBERi0xLjIKekdf1fnfSqQYt7AjczYfpmRSIEyEcx8KMSAwIG9iago8PAovVHlwZSAvQ2F0
YWxvZwovUGFnZXMgMyAwIFIKL091dGxpbmVzIDIgMCBSCj4+CmVuZG9iagoyIDAgb2JqCjw8
[...]
IDAwMDAwIG4gCjAwMDA2MjI5NTQgMDAwMDAgbiAKdHJhaWxlcgo8PAovU2l6ZSA0NgovUm9v
dCAxIDAgUgovSW5mbyA0NSAwIFIKPj4Kc3RhcnR4cmVmCjYyMzI4NgolJUVPRgo=

--------------PeB0GiqcER01ZhCmBvnP2yr6--
--8<---------------cut here---------------end--------------->8---

Here's the first 1000 bytes of it.

* (take 1000 x)
(77 101 115 115 97 103 101 45 73 100 58 32 60 116 110 107 113 99 113 110 117
117 106 97 108 106 115 118 109 122 118 117 99 64 108 111 111 112 62 13 10 67
111 110 116 101 110 116 45 84 121 112 101 58 32 109 117 108 116 105 112 97 114
116 47 109 105 120 101 100 59 32 98 111 117 110 100 97 114 121 61 34 45 45 45
45 45 45 45 45 45 45 45 45 80 101 66 48 71 105 113 99 69 82 48 49 90 104 67
109 66 118 110 80 50 121 114 54 34 13 10 68 97 116 101 58 32 87 101 100 44 32
55 32 70 101 98 32 50 48 50 52 32 50 50 58 50 50 58 53 55 32 45 48 51 48 48 13
10 77 105 109 101 45 86 101 114 115 105 111 110 58 32 49 46 48 13 10 85 115
101 114 45 65 103 101 110 116 58 32 77 111 122 105 108 108 97 32 84 104 117
110 100 101 114 98 105 114 100 13 10 78 101 119 115 103 114 111 117 112 115 58
32 108 111 99 97 108 46 116 101 115 116 13 10 67 111 110 116 101 110 116 45 76
97 110 103 117 97 103 101 58 32 101 110 45 85 83 13 10 70 114 111 109 58 32 83
97 98 114 105 110 97 32 87 97 100 115 119 111 114 116 104 32 60 115 97 98 64
114 105 110 97 46 111 114 103 62 13 10 83 117 98 106 101 99 116 58 32 106 117
114 105 115 32 104 97 114 116 109 97 110 105 115 13 10 13 10 84 104 105 115 32
105 115 32 97 32 109 117 108 116 105 45 112 97 114 116 32 109 101 115 115 97
103 101 32 105 110 32 77 73 77 69 32 102 111 114 109 97 116 46 13 10 45 45 45
45 45 45 45 45 45 45 45 45 45 45 80 101 66 48 71 105 113 99 69 82 48 49 90 104
67 109 66 118 110 80 50 121 114 54 13 10 67 111 110 116 101 110 116 45 84 121
112 101 58 32 116 101 120 116 47 112 108 97 105 110 59 32 99 104 97 114 115
101 116 61 85 84 70 45 56 13 10 67 111 110 116 101 110 116 45 84 114 97 110
115 102 101 114 45 69 110 99 111 100 105 110 103 58 32 55 98 105 116 13 10 13
10 45 45 45 45 45 45 45 45 45 45 45 45 45 45 80 101 66 48 71 105 113 99 69 82
48 49 90 104 67 109 66 118 110 80 50 121 114 54 13 10 67 111 110 116 101 110
116 45 84 121 112 101 58 32 97 112 112 108 105 99 97 116 105 111 110 47 112
100 102 59 13 10 32 110 97 109 101 61 34 106 117 114 105 115 45 104 97 114 116
109 97 110 105 115 45 103 111 100 101 108 45 118 111 110 45 110 101 117 109 97
110 110 45 49 57 53 54 45 56 57 45 57 57 52 46 112 100 102 34 13 10 67 111 110
116 101 110 116 45 68 105 115 112 111 115 105 116 105 111 110 58 32 97 116 116
97 99 104 109 101 110 116 59 13 10 32 102 105 108 101 110 97 109 101 61 34 106
117 114 105 115 45 104 97 114 116 109 97 110 105 115 45 103 111 100 101 108 45
118 111 110 45 110 101 117 109 97 110 110 45 49 57 53 54 45 56 57 45 57 57 52
46 112 100 102 34 13 10 67 111 110 116 101 110 116 45 84 114 97 110 115 102
101 114 45 69 110 99 111 100 105 110 103 58 32 98 97 115 101 54 52 13 10 13 10
74 86 66 69 82 105 48 120 76 106 73 75 101 107 100 102 49 102 110 102 83 113
81 89 116 55 65 106 99 122 89 102 112 109 82 83 73 69 121 69 99 120 56 75 77
83 65 119 73 71 57 105 97 103 111 56 80 65 111 118 86 72 108 119 90 83 65 118
81 50 70 48 13 10 89 87 120 118 90 119 111 118 85 71 70 110 90 88 77 103 77
121 65 119 73 70 73 75 76 48 57 49 100 71 120 112 98 109 86 122 73 68 73 103
77 67 66 83 67 106 52 43 67 109 86 117 90 71 57 105 97 103 111 121 73 68 65
103 98 50 74 113 67 106 119 56 13 10 67 105 57 85 101 88 66 108 73 67 57 80
100 88 82 115 97 87 53 108 99 119 111 118 81 50 57 49 98 110 81 103 77 84 65
75 76 48 90 112 99 110 78 48 73 68 77 49 73 68 65 103 85 103 111 118 84 71 70
122 100 67 65 48 78 67 65 119 73 70 73 75 13 10 80 106 52 75 90 87 53 107 98
50 74 113 67 106 77 103 77 67 66 118 89 109 111)

Re: why is this procedure is so slow?

<pTn4Nk52tBdYJaZ43@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17980&group=comp.lang.lisp#17980

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Thu, 8 Feb 2024 18:26:39 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 121
Message-ID: <pTn4Nk52tBdYJaZ43@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 8 Feb 2024 18:26:39 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2f52725f7ab632db4d1225907b110523";
logging-data="2197544"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18shkPhkxNqDcoSfDp1lqx6"
Cancel-Lock: sha1:s/KV5I4cPYShEUfoSCfHuW1ViZk=
In-Reply-To: <87sf22ykaw.fsf@yaxenu.org>
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Thu, 8 Feb 2024 18:26 UTC

On Thu, 08 Feb 2024 13:47:51 -0300
Julieta Shem <jshem@yaxenu.org> wrote:
> First I define x to be a list of bytes of an NNTP article.
>
> (defvar x (fetch-article "local.test" "28"))
>
> Now x is a list of integers, the bytes of the article 28, which is a
> MIME message containing a PDF of 610 KiB.
>
> Now I want to split it line by line. I am able to do it, but it's very
> slow.
>
> * (time (length (split-sequence (list 13 10) x nil)))
> Evaluation took:
> 65.710 seconds of real time
> 65.671875 seconds of total run time (47.093750 user, 18.578125 system)
> [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
> 99.94% CPU
> 170,322,749,168 processor cycles
> 79,439,358,864 bytes consed
>
> 11585
> *

For questions about speed , it is good if you specify which version of
which implementation you are using. It seems to be SBCL.

Note also that

(length (split-sequence (list 13 10) x nil))

has to traverse the list returned by split-sequence so your measuring of
time also includes that as opposed to just the time split-sequence
itself takes. I don't think it contributes much here but something to keep in
mind.

> Less than 12k lines. The number 79,439,358,864 of bytes is way more
> than I would have expected for anything regarding this invocation above,
> but I have no idea what this number really measures here. I appreciate
> any help. Thank you.
>
> Here's the procedure:
>
> (defun split-sequence (delim ls acc &key limit (so-far 1))

You say above "x is a list of integers" .You should have been using a
vector. On every recursive call to split-sequence your code does

(length ls) which means traversing the whole of the remaining list. This
already ensures quadratic execution time [for your task , it should be
linear] on top of the quadratic execution time for the unrelated reason I
point out below.

> (let* ((len (length ls))
> (delim delim)
What's the purpose of this last line ? delim remains constant over the
whole execution , no ?

> (pos (search delim ls))
> (n-take (or pos len))
> (n-drop (if pos
> (+ n-take (length delim))
delim remains constant over the whole execution , no ? So no reason to
calculate again and again its length.

> n-take)))
> (cond ((zerop len) acc)
> ((and limit (= so-far limit)) (list ls))
> (t (split-sequence
> delim (drop n-drop ls)
> (cons (take n-take ls) acc)
> :limit limit
> :so-far (1+ so-far))))))
>
> (defun take (n seq) (subseq seq 0 n))
> (defun drop (n seq) (subseq seq n))
>
> (*) A sample of the article
>
> All lines in it should be pretty short.

The
delim (drop n-drop ls)
(cons (take n-take ls) acc)

part copies the whole of ls so you get quadratic performance. Lets do some
calculations. Lets assume that every line is 80 bytes long including delim
and that delim is 2 bytes.

610 KiB = 624640 bytes. So over the whole execution you copy

624640-2 + (624640-80)-2 + (624640-2*80)-2 + (624640-3*80)-2 + ...
+ (624640-7807*80)-2 = 2,438,891,264 bytes. That's still quite a distance from
what SBCL gives but basically quadratic performance is no good. Note that
this copying would happen even if you were using a vector to store the
article. The fact that you use a list , makes things worse.

So then , use an auxiliary function :

(defun split-sequence (delim ls acc &key limit (so-far 1)
&aux (len (length ls)) (len-del (length delim)))
(flet ((split-sequence-aux (start) ...)))
......
)

..It is that which should be called recursively. ls , delim , etc. do not
need local copies in every recursive invocation.

I said above that ls should be a vector and obviously the same goes for
delim .I don't know what the rest of your code does but I suspect that the
lines returned should also go into a vector rather than a list. And ,
depending on what the rest of your code does , it is possible that you don't
need to copy lines but rather return a sequence of positions in the vector
ls where the line breaks happen.

Ideally , abandon the recursion too and rewrite the whole thing as a loop.

--
I don't understand why there are so many unsolved crimes. It seemed
like every time I did something wrong, it was solved overnight.
Joe Mammana

Re: why is this procedure is so slow?

<m24jeiyeb1.fsf@MacBook-Pro-2.home>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17981&group=comp.lang.lisp#17981

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rwiker@gmail.com (Raymond Wiker)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Thu, 08 Feb 2024 19:57:22 +0100
Organization: A noiseless patient Spider
Lines: 93
Message-ID: <m24jeiyeb1.fsf@MacBook-Pro-2.home>
References: <87sf22ykaw.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="2a3b340885d03ff93907eb91e52df5ee";
logging-data="2207916"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oaUyNIrH91E9N9Z7IzZ8Y"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:oofJqs5NdJ9oK6BdPK+ODYxqNJg=
sha1:b6Y+n9HkUrZagRCTlXhl+F6fjKA=
 by: Raymond Wiker - Thu, 8 Feb 2024 18:57 UTC

Julieta Shem <jshem@yaxenu.org> writes:

> First I define x to be a list of bytes of an NNTP article.
>
> (defvar x (fetch-article "local.test" "28"))
>
> Now x is a list of integers, the bytes of the article 28, which is a
> MIME message containing a PDF of 610 KiB.
>
> Now I want to split it line by line. I am able to do it, but it's very
> slow.
>
> * (time (length (split-sequence (list 13 10) x nil)))
> Evaluation took:
> 65.710 seconds of real time
> 65.671875 seconds of total run time (47.093750 user, 18.578125 system)
> [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
> 99.94% CPU
> 170,322,749,168 processor cycles
> 79,439,358,864 bytes consed
>
> 11585
> *
>
> Less than 12k lines. The number 79,439,358,864 of bytes is way more
> than I would have expected for anything regarding this invocation above,
> but I have no idea what this number really measures here. I appreciate
> any help. Thank you.
>
> Here's the procedure:
>
> (defun split-sequence (delim ls acc &key limit (so-far 1))
> (let* ((len (length ls))
> (delim delim)
> (pos (search delim ls))
> (n-take (or pos len))
> (n-drop (if pos
> (+ n-take (length delim))
> n-take)))
> (cond ((zerop len) acc)
> ((and limit (= so-far limit)) (list ls))
> (t (split-sequence
> delim (drop n-drop ls)
> (cons (take n-take ls) acc)
> :limit limit
> :so-far (1+ so-far))))))
>
> (defun take (n seq) (subseq seq 0 n))
> (defun drop (n seq) (subseq seq n))

It is slow because

* You're repeatedly calling #'length on whatever is left of the
list. Computing the length of a list is an O(n) operation; repeatedly
calling #'length on the remainder of the list is then O(n^2).

* You are generating new lists for both the initial segments and the
rest of the list. In order to generate 12k lines you have to generate
lists containing the lines, but also 12 lists containing the remainder
(at various points) of the input.

* Each character of the input requires a cons cell, which is probably 16
bytes (assuming a 64-bit Lisp implementation). So, your input now
requires 610000*16, so about 9.76MB.

* The remainder lists will consume about 0.5*610000*16*12000, so about
55GB. This is not too far from the 74GB that time reports. (My
calculations are probably off, but clearly not _too_ far off.)

* I'm going to assume that 74GB is going to be a significant chunk of
the available memory in your computer. That means that there will be
significant garbage-collection activity, and you may also end up
swapping to disk.

If you used strings (or byte arrays instead), you would

* Require much less memory.

* Computing #'length for a string or array is O(1), so the overall complexity
becomes O(n).

* You don't have to create new arrays containing the successive
remainders; instead, you can use the :start keyword to #'search.

* Also, if you used strings, you could just create a string stream from
the input string and repeatedly call #'read-line to collect the
lines. The entire program would be something like this (untested):

(defun split-article (article-as-string)
(with-input-from-string (str article-as-string)
(loop for line = (read-line str nil)
while line
collect line)))

Re: why is this procedure is so slow?

<20240208111205.216@kylheku.com>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17982&group=comp.lang.lisp#17982

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: 433-929-6894@kylheku.com (Kaz Kylheku)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Thu, 8 Feb 2024 19:24:57 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 47
Message-ID: <20240208111205.216@kylheku.com>
References: <87sf22ykaw.fsf@yaxenu.org>
Injection-Date: Thu, 8 Feb 2024 19:24:57 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="163263dc92baed57fd9aed84cf1a8c15";
logging-data="2217553"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/7/dbpisHB67XEn65dsMQhy2fWvXF5kes="
User-Agent: slrn/pre1.0.4-9 (Linux)
Cancel-Lock: sha1:1dkk/+YooflB2ONg0zL5m2mLBRM=
 by: Kaz Kylheku - Thu, 8 Feb 2024 19:24 UTC

On 2024-02-08, Julieta Shem <jshem@yaxenu.org> wrote:
> First I define x to be a list of bytes of an NNTP article.
>
> (defvar x (fetch-article "local.test" "28"))
>
> Now x is a list of integers, the bytes of the article 28, which is a
> MIME message containing a PDF of 610 KiB.
>
> Now I want to split it line by line. I am able to do it, but it's very
> slow.
>
> * (time (length (split-sequence (list 13 10) x nil)))
> Evaluation took:
> 65.710 seconds of real time

Your code has O(N^2) behavior because each recursive call performs
operations that take a time proportional to the length of input
list, such as taking its length.

The operation (subseq list n) is required to produce a copy of the
cell structure. You can skip n elements of a list with (nthcdr n list)
without allocating memory. It still requires traversing the list
from the beginning to find the n-th cell.

This kind of function is best implemented separately for lists
and vectors.

To develop it reasonably efficiently in a generic way, what we can do is
have some sort of abstract iteration primitives first: a way to get an
iterator object over any kind of sequence, and then step it. The
splitting is then done as a state machine. Getting successive items,
collecting them and watching for the delimiter, collecting the items
into sequences of the same kind as the input.

To make it better, you'd really want to specialize this. If the input
sequence is a list, dispatch code that does efficient list processing
with minimal consing, and no redundant walking over the list (single
pass only).

Otherwise if the sequence is a vector-like sequence, then use subseq.
But don't wastefully calculate the subseq of the rest of the list.
Just extract the splits.

--
TXR Programming Language: http://nongnu.org/txr
Cygnal: Cygwin Native Application Library: http://kylheku.com/cygnal
Mastodon: @Kazinator@mstdn.ca

Re: why is this procedure is so slow?

<vkYf5CCW9ZWpdZWsy@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17983&group=comp.lang.lisp#17983

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Thu, 8 Feb 2024 19:30:52 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 16
Message-ID: <vkYf5CCW9ZWpdZWsy@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 8 Feb 2024 19:30:52 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="2f52725f7ab632db4d1225907b110523";
logging-data="2219591"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX190RH2qLKP8J/c8UhsuLlfe"
Cancel-Lock: sha1:LY/s4nkE6nmFGS+uh54arp5RVtM=
X-Organisation: Weyland-Yutani
In-Reply-To: <m24jeiyeb1.fsf@MacBook-Pro-2.home>
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Thu, 8 Feb 2024 19:30 UTC

On Thu, 08 Feb 2024 19:57:22 +0100
Raymond Wiker <rwiker@gmail.com> wrote:
> * Also, if you used strings, you could just create a string stream from
> the input string and repeatedly call #'read-line to collect the
> lines. The entire program would be something like this (untested):
>
> (defun split-article (article-as-string)
> (with-input-from-string (str article-as-string)
> (loop for line = (read-line str nil)
> while line
> collect line)))

READ-LINE will use whatever sequence of bytes the host operating system
regards as terminating lines whereas Julieta wants what the RFCs for NNTP
articles consider end of lines and that is the sequence CRLF .The 2 sequences
may coincide or they may not.

Re: why is this procedure is so slow?

<ULZx09i+e0unLPVVX@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17984&group=comp.lang.lisp#17984

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!paganini.bofh.team!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Thu, 8 Feb 2024 19:47:07 -0000 (UTC)
Organization: To protect and to server
Message-ID: <ULZx09i+e0unLPVVX@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <vkYf5CCW9ZWpdZWsy@bongo-ra.co>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 8 Feb 2024 19:47:07 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="2838440"; posting-host="9H7U5kayiTdk7VIdYU44Rw.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
Cancel-Lock: sha256:XdSnO/QV29jYzwqv1h8tUmanojAEzXW9mBbjQrB7GxQ=
X-Server-Commands: nowebcancel
X-Notice: Filtered by postfilter v. 0.9.3
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Thu, 8 Feb 2024 19:47 UTC

On Thu, 8 Feb 2024 19:30:52 -0000 (UTC)
Spiros Bousbouras <spibou@gmail.com> wrote:
> On Thu, 08 Feb 2024 19:57:22 +0100
> Raymond Wiker <rwiker@gmail.com> wrote:
> > * Also, if you used strings, you could just create a string stream from
> > the input string and repeatedly call #'read-line to collect the
> > lines. The entire program would be something like this (untested):
> >
> > (defun split-article (article-as-string)
> > (with-input-from-string (str article-as-string)
> > (loop for line = (read-line str nil)
> > while line
> > collect line)))
>
> READ-LINE will use whatever sequence of bytes the host operating system
> regards as terminating lines whereas Julieta wants what the RFCs for NNTP
> articles consider end of lines and that is the sequence CRLF .The 2 sequences
> may coincide or they may not.

And actually a string object is no good for reading stuff off the internet ,
individual elements must be octets therefore integers [or an appropriate
subset of the Common Lisp integer type].

Re: why is this procedure is so slow?

<87v86xpy8a.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17985&group=comp.lang.lisp#17985

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Fri, 09 Feb 2024 10:24:05 -0300
Organization: A noiseless patient Spider
Lines: 170
Message-ID: <87v86xpy8a.fsf@yaxenu.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="552d2bdf92c1545696ec9c1cd7c94b6e";
logging-data="2784461"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19r4zHjTZvgLh8b4fOJffVcLSXMTPTl9WM="
Cancel-Lock: sha1:f3UQq1JW79evfYg3sQgTswBUezs=
sha1:eDRynhLwr9iN1YLPzxusJn1HcJE=
 by: Julieta Shem - Fri, 9 Feb 2024 13:24 UTC

I'd like to thank everyone in this thread. Valuable help. Thank you.

Raymond Wiker <rwiker@gmail.com> writes:

> Julieta Shem <jshem@yaxenu.org> writes:
>
>> First I define x to be a list of bytes of an NNTP article.
>>
>> (defvar x (fetch-article "local.test" "28"))
>>
>> Now x is a list of integers, the bytes of the article 28, which is a
>> MIME message containing a PDF of 610 KiB.
>>
>> Now I want to split it line by line. I am able to do it, but it's very
>> slow.
>>
>> * (time (length (split-sequence (list 13 10) x nil)))
>> Evaluation took:
>> 65.710 seconds of real time
>> 65.671875 seconds of total run time (47.093750 user, 18.578125 system)
>> [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
>> 99.94% CPU
>> 170,322,749,168 processor cycles
>> 79,439,358,864 bytes consed
>>
>> 11585
>> *
>>
>> Less than 12k lines. The number 79,439,358,864 of bytes is way more
>> than I would have expected for anything regarding this invocation above,
>> but I have no idea what this number really measures here. I appreciate
>> any help. Thank you.
>>
>> Here's the procedure:
>>
>> (defun split-sequence (delim ls acc &key limit (so-far 1))
>> (let* ((len (length ls))
>> (delim delim)
>> (pos (search delim ls))
>> (n-take (or pos len))
>> (n-drop (if pos
>> (+ n-take (length delim))
>> n-take)))
>> (cond ((zerop len) acc)
>> ((and limit (= so-far limit)) (list ls))
>> (t (split-sequence
>> delim (drop n-drop ls)
>> (cons (take n-take ls) acc)
>> :limit limit
>> :so-far (1+ so-far))))))
>>
>> (defun take (n seq) (subseq seq 0 n))
>> (defun drop (n seq) (subseq seq n))
>
> It is slow because
>
> * You're repeatedly calling #'length on whatever is left of the
> list. Computing the length of a list is an O(n) operation; repeatedly
> calling #'length on the remainder of the list is then O(n^2).
>
> * You are generating new lists for both the initial segments and the
> rest of the list. In order to generate 12k lines you have to generate
> lists containing the lines, but also 12 lists containing the remainder
> (at various points) of the input.

That's the take and drop, I think.

> * Each character of the input requires a cons cell, which is probably 16
> bytes (assuming a 64-bit Lisp implementation). So, your input now
> requires 610000*16, so about 9.76MB.

So that's one of the advantages of using a vector, I think.

> * The remainder lists will consume about 0.5*610000*16*12000, so about
> 55GB. This is not too far from the 74GB that time reports. (My
> calculations are probably off, but clearly not _too_ far off.)

Impressive.

> * I'm going to assume that 74GB is going to be a significant chunk of
> the available memory in your computer. That means that there will be
> significant garbage-collection activity, and you may also end up
> swapping to disk.

What do you mean by ``available memory;'''? I have 24 GiB of RAM---23.9
GiB usable, says Windows. I don't know how to show this right now, but
I think I don't have any swap space (on Windows). I tried to turn it
off some time ago I think I succeeded.

> If you used strings (or byte arrays instead), you would
>
> * Require much less memory.
>
> * Computing #'length for a string or array is O(1), so the overall complexity
> becomes O(n).

Because strings and arrays store their size somewhere in their data
structure, I suppose, and lists do not.

> * You don't have to create new arrays containing the successive
> remainders; instead, you can use the :start keyword to #'search.

I tried this. Here's how I rewrote it. First some user interface that
gets the length just once, even though now it's probably okay to call
length multiple times. (The /acc/umulator should be probably be
optional here, but I was afraid of mixing &optional with &key. This is
my first Common Lisp program.)

(defun split-vector (delim v acc &key limit (so-far 1))
(let ((len (length v)))
(split-vector-helper delim v len acc limit so-far 0)))

Now, find the position of /delim/ between /start/ and /len/. If you
can't find it, we're done. If we've reached the limit of lines the
caller requested, we're done. Now we found /delim/: copy the slice of
the vector that goes from /start/ to the /pos/ition we found /delim/;
save it in the accumulator. Update /start/ we along over the vector.
Now repeat everything.

(defun split-vector-helper (delim v len acc limit so-far start)
(if (zerop len)
acc
(let ((pos (search delim v :start2 start :end2 len)))
(cond ((or (not pos) (and limit (= so-far limit)))
(nreverse (cons (subseq v start len) acc)))
(t (split-vector-helper delim
v
len
(cons (subseq v start (or pos len)) acc)
limit
(1+ so-far)
(+ pos (length delim))))))))

The result is better:

* (time (length (split-vector (vector 13 10) x nil)))
Evaluation took:
0.011 seconds of real time
0.000000 seconds of total run time (0.000000 user, 0.000000 system)
0.00% CPU
30,857,378 processor cycles
7,000,064 bytes consed

11576

(Difficult to understand 0.011 real time atgainst ``total run time''.)

(I call /length/ so I won't get the output of split-vector, which is
long.)

Question. When split-vector-helper is repeated, will the arguments be
copied over to the new invokation, or will SBCL somehow pass on
pointers?

I haven't studied about the /loop/ macro yet.

> * Also, if you used strings, you could just create a string stream from
> the input string and repeatedly call #'read-line to collect the
> lines. The entire program would be something like this (untested):
>
> (defun split-article (article-as-string)
> (with-input-from-string (str article-as-string)
> (loop for line = (read-line str nil)
> while line
> collect line)))

As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
understand article content because the user has the right to use
whatever encoding they please in NNTP articles. So I've been trying
handle them as byte sequences.

Re: why is this procedure is so slow?

<m2a5o9h1bw.fsf@MacBook-Pro-2.home>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17986&group=comp.lang.lisp#17986

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: rwiker@gmail.com (Raymond Wiker)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Fri, 09 Feb 2024 20:41:55 +0100
Organization: A noiseless patient Spider
Lines: 238
Message-ID: <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="23534c70185b2087d28c21f07f658813";
logging-data="2920358"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/3zspmnDUFIIkAB20AkIWl"
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:w/wSh+iW149qHq8z0LRCIawO270=
sha1:uU1arE0vS0JliQWplMOiyxvXzm0=
 by: Raymond Wiker - Fri, 9 Feb 2024 19:41 UTC

Julieta Shem <jshem@yaxenu.org> writes:

> I'd like to thank everyone in this thread. Valuable help. Thank you.
>
> Raymond Wiker <rwiker@gmail.com> writes:
>
>> Julieta Shem <jshem@yaxenu.org> writes:
>>
>>> First I define x to be a list of bytes of an NNTP article.
>>>
>>> (defvar x (fetch-article "local.test" "28"))
>>>
>>> Now x is a list of integers, the bytes of the article 28, which is a
>>> MIME message containing a PDF of 610 KiB.
>>>
>>> Now I want to split it line by line. I am able to do it, but it's very
>>> slow.
>>>
>>> * (time (length (split-sequence (list 13 10) x nil)))
>>> Evaluation took:
>>> 65.710 seconds of real time
>>> 65.671875 seconds of total run time (47.093750 user, 18.578125 system)
>>> [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
>>> 99.94% CPU
>>> 170,322,749,168 processor cycles
>>> 79,439,358,864 bytes consed
>>>
>>> 11585
>>> *
>>>
>>> Less than 12k lines. The number 79,439,358,864 of bytes is way more
>>> than I would have expected for anything regarding this invocation above,
>>> but I have no idea what this number really measures here. I appreciate
>>> any help. Thank you.
>>>
>>> Here's the procedure:
>>>
>>> (defun split-sequence (delim ls acc &key limit (so-far 1))
>>> (let* ((len (length ls))
>>> (delim delim)
>>> (pos (search delim ls))
>>> (n-take (or pos len))
>>> (n-drop (if pos
>>> (+ n-take (length delim))
>>> n-take)))
>>> (cond ((zerop len) acc)
>>> ((and limit (= so-far limit)) (list ls))
>>> (t (split-sequence
>>> delim (drop n-drop ls)
>>> (cons (take n-take ls) acc)
>>> :limit limit
>>> :so-far (1+ so-far))))))
>>>
>>> (defun take (n seq) (subseq seq 0 n))
>>> (defun drop (n seq) (subseq seq n))
>>
>> It is slow because
>>
>> * You're repeatedly calling #'length on whatever is left of the
>> list. Computing the length of a list is an O(n) operation; repeatedly
>> calling #'length on the remainder of the list is then O(n^2).
>>
>> * You are generating new lists for both the initial segments and the
>> rest of the list. In order to generate 12k lines you have to generate
>> lists containing the lines, but also 12 lists containing the remainder
>> (at various points) of the input.
>
> That's the take and drop, I think.

Yup.

>
>> * Each character of the input requires a cons cell, which is probably 16
>> bytes (assuming a 64-bit Lisp implementation). So, your input now
>> requires 610000*16, so about 9.76MB.
>
> So that's one of the advantages of using a vector, I think.

Right. The other is that accessing any element in an array by position
(index) is O(1); for a list, it's O(n).

>> * The remainder lists will consume about 0.5*610000*16*12000, so about
>> 55GB. This is not too far from the 74GB that time reports. (My
>> calculations are probably off, but clearly not _too_ far off.)
>
> Impressive.

Maybe... but it also appears to be wrong :-)

>> * I'm going to assume that 74GB is going to be a significant chunk of
>> the available memory in your computer. That means that there will be
>> significant garbage-collection activity, and you may also end up
>> swapping to disk.
>
> What do you mean by ``available memory;'''? I have 24 GiB of RAM---23.9
> GiB usable, says Windows. I don't know how to show this right now, but
> I think I don't have any swap space (on Windows). I tried to turn it
> off some time ago I think I succeeded.

What implementation of Common Lisp are you using? Is it a 32-bit or
64-bit version?

>> If you used strings (or byte arrays instead), you would
>>
>> * Require much less memory.
>>
>> * Computing #'length for a string or array is O(1), so the overall complexity
>> becomes O(n).
>
> Because strings and arrays store their size somewhere in their data
> structure, I suppose, and lists do not.

Right. Lists are literally composed of objects that contain two values
(cons cells), where the second value is a pointer to the next cons cell.

>> * You don't have to create new arrays containing the successive
>> remainders; instead, you can use the :start keyword to #'search.
>
> I tried this. Here's how I rewrote it. First some user interface that
> gets the length just once, even though now it's probably okay to call
> length multiple times. (The /acc/umulator should be probably be
> optional here, but I was afraid of mixing &optional with &key. This is
> my first Common Lisp program.)
>
> (defun split-vector (delim v acc &key limit (so-far 1))
> (let ((len (length v)))
> (split-vector-helper delim v len acc limit so-far 0)))
>
> Now, find the position of /delim/ between /start/ and /len/. If you
> can't find it, we're done. If we've reached the limit of lines the
> caller requested, we're done. Now we found /delim/: copy the slice of
> the vector that goes from /start/ to the /pos/ition we found /delim/;
> save it in the accumulator. Update /start/ we along over the vector.
> Now repeat everything.
>
> (defun split-vector-helper (delim v len acc limit so-far start)
> (if (zerop len)
> acc
> (let ((pos (search delim v :start2 start :end2 len)))
> (cond ((or (not pos) (and limit (= so-far limit)))
> (nreverse (cons (subseq v start len) acc)))
> (t (split-vector-helper delim
> v
> len
> (cons (subseq v start (or pos len)) acc)
> limit
> (1+ so-far)
> (+ pos (length delim))))))))
>
> The result is better:
>
> * (time (length (split-vector (vector 13 10) x nil)))
> Evaluation took:
> 0.011 seconds of real time
> 0.000000 seconds of total run time (0.000000 user, 0.000000 system)
> 0.00% CPU
> 30,857,378 processor cycles
> 7,000,064 bytes consed
>
> 11576
>
> (Difficult to understand 0.011 real time atgainst ``total run time''.)

That's a good second implementation, and a very nice improvement in run
time and memory use!

Note that you do not actually need the "len" parameter: if you don't
specify :end2 for #'search or the end position for #'subseq, they will
assume the value nil, which means the end of the sequence.

"real time" is the actual time taken for the operation, while "total
time" is the time that the Lisp runtime will admit to. So, basically,
the time taken to execute your new implementation is so small that it
cannot be measured by the Lisp runtime.

> (I call /length/ so I won't get the output of split-vector, which is
> long.)

The canonical way of doing this in Common Lisp would be to store the
value in a variable, instead, and return either nil or or no values at
all.

Something like

(time (defparameter *lines* (split-vector (vector 13 10) x nil)))

or

(time
(progn
(setf *lines* (split-vector (vector 13 10) x nil))
(values)))

> Question. When split-vector-helper is repeated, will the arguments be
> copied over to the new invokation, or will SBCL somehow pass on
> pointers?

Common Lisp in general does not copy complex structures unless
explicitly asked to. So, a list is passed around as a pointer to the
first cons cell, and an array is passed as a pointer to a structure that
defines the shape and size of the array. The pointers may also include
information in the least significant bits that describe what the
pointers point to.

Certain types of values (e.g, characters and a subset of integers) are
passed directly; the least significant bits of the value then indicate
that there is an integer or a character encoded in the rest of the
value.

>
> I haven't studied about the /loop/ macro yet.

Some people abhor loop, while others find it strangely attractive (or
even addictive).

>> * Also, if you used strings, you could just create a string stream from
>> the input string and repeatedly call #'read-line to collect the
>> lines. The entire program would be something like this (untested):
>>
>> (defun split-article (article-as-string)
>> (with-input-from-string (str article-as-string)
>> (loop for line = (read-line str nil)
>> while line
>> collect line)))
>
> As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
> understand article content because the user has the right to use
> whatever encoding they please in NNTP articles. So I've been trying
> handle them as byte sequences.


Click here to read the complete article
Re: why is this procedure is so slow?

<ogD0Mv+vfv1CrR+QE@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17988&group=comp.lang.lisp#17988

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!news.hispagatos.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Fri, 9 Feb 2024 20:46:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 55
Message-ID: <ogD0Mv+vfv1CrR+QE@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 9 Feb 2024 20:46:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="ec9598d95134c279cbf9337608deb2a7";
logging-data="2942042"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Lzz3U8aubfYKYGWxxoDFg"
Cancel-Lock: sha1:XSentLTyZKWbhoZEum9wYIhx/jo=
In-Reply-To: <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Fri, 9 Feb 2024 20:46 UTC

On Fri, 09 Feb 2024 20:41:55 +0100
Raymond Wiker <rwiker@gmail.com> wrote:
> Julieta Shem <jshem@yaxenu.org> writes:
> > Question. When split-vector-helper is repeated, will the arguments be
> > copied over to the new invokation, or will SBCL somehow pass on
> > pointers?
>
> Common Lisp in general does not copy complex structures unless
> explicitly asked to. So, a list is passed around as a pointer to the
> first cons cell, and an array is passed as a pointer to a structure that
> defines the shape and size of the array. The pointers may also include
> information in the least significant bits that describe what the
> pointers point to.
>
> Certain types of values (e.g, characters and a subset of integers) are
> passed directly; the least significant bits of the value then indicate
> that there is an integer or a character encoded in the rest of the
> value.

What you say about integers can be interpreted to mean that if you pass as an
argument to a function an integer which does not belong to this subset and
the fuction modifies the integer , it will be modified in the caller too.
This will not happen. But internally a Common Lisp implementation may store
fixnums "unboxed" and bignums boxed.

> > I haven't studied about the /loop/ macro yet.
>
> Some people abhor loop, while others find it strangely attractive (or
> even addictive).

For turning Julieta's recursion into a loop , DO will do just fine.

> > As Spiros Bousbouras remarked in the thread, I'd prefer not to try to
> > understand article content because the user has the right to use
> > whatever encoding they please in NNTP articles. So I've been trying
> > handle them as byte sequences.
>
> At some point you will probably have to encode the article as
> characters. In order to do so, you will have to parse the article
> headers to find what encoding is used. I'm not convinced that this is
> possible, but if you have a 16-bit character encoding, you may have an
> octet 13 from one character, and an octet 10 from the following
> character.

I think the code by Julieta is for before any decoding is done to the article
in which case CR and LF can only appear as CRLF .After decoding , anything
goes for an appropriate Content-Type and encoding. But from other posts I
think Julieta is writing an NNTP server and I doubt that a server will ever
have to do decoding of the body. A server should check that the syntax
restrictions for header and body are satisfied but otherwise may not be
concerned with the content of the body. If it also wants to do spam filtering
then yes , it will have to decode the body.

--
vlaho.ninja/menu

Re: why is this procedure is so slow?

<oFTEKHKSwjuoXLNeu@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17989&group=comp.lang.lisp#17989

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!news.hispagatos.org!news.nntp4.net!paganini.bofh.team!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Fri, 9 Feb 2024 21:13:18 -0000 (UTC)
Organization: To protect and to server
Message-ID: <oFTEKHKSwjuoXLNeu@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Fri, 9 Feb 2024 21:13:18 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="3127231"; posting-host="9H7U5kayiTdk7VIdYU44Rw.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
Cancel-Lock: sha256:KD+hrxDiNF+zXpobv0si+hlnjuxbGG/EUBhnMyTYiVQ=
X-Server-Commands: nowebcancel
X-Notice: Filtered by postfilter v. 0.9.3
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Fri, 9 Feb 2024 21:13 UTC

On Fri, 09 Feb 2024 10:24:05 -0300
Julieta Shem <jshem@yaxenu.org> wrote:

I don't know if you have learned declaration syntax yet but some declarations
would help make the code clearer and help the compiler do optimisations or
catch errors. For example

(declaim (ftype (function (vector vector list
&key (:limit (or null integer)) (:so-far integer))
list)
split-vector))

> (defun split-vector (delim v acc &key limit (so-far 1))
> (let ((len (length v)))
> (split-vector-helper delim v len acc limit so-far 0)))
>
> Now, find the position of /delim/ between /start/ and /len/. If you
> can't find it, we're done. If we've reached the limit of lines the
> caller requested, we're done. Now we found /delim/: copy the slice of
> the vector that goes from /start/ to the /pos/ition we found /delim/;
> save it in the accumulator. Update /start/ we along over the vector.
> Now repeat everything.
>
> (defun split-vector-helper (delim v len acc limit so-far start)
> (if (zerop len)
> acc
> (let ((pos (search delim v :start2 start :end2 len)))
> (cond ((or (not pos) (and limit (= so-far limit)))
> (nreverse (cons (subseq v start len) acc)))
> (t (split-vector-helper delim
> v
> len
> (cons (subseq v start (or pos len)) acc)
^^^^^^^^^^
If this branch is taken , you already know that pos is non NIL .

> limit
> (1+ so-far)
> (+ pos (length delim))))))))

> Question. When split-vector-helper is repeated, will the arguments be
> copied over to the new invokation, or will SBCL somehow pass on
> pointers?

This has been answered in a different post but you can also write the
helper function within the body of split-vector in which case the
helper will inherit the surrounding lexical environment so you won't
need to pass so many arguments to it.

--
Ladies and Gentlemen, this is your Captain speaking. We have a small
problem. All four engines have stopped. We are doing our damnedest
to get it under control. I trust you are not in too much distress.
http://en.wikipedia.org/wiki/British_Airways_Flight_9

Re: why is this procedure is so slow?

<871q9kjsgj.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17990&group=comp.lang.lisp#17990

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sat, 10 Feb 2024 11:37:32 -0300
Organization: A noiseless patient Spider
Lines: 264
Message-ID: <871q9kjsgj.fsf@yaxenu.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="4e28b53434257e8ae1ee25e3f601f368";
logging-data="3378863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/lQAgWzRVWvl6W+Jtoe87pZRt1LyeGWFE="
Cancel-Lock: sha1:yhqRAK/BHSMKtH4GKGRTEg9QYkA=
sha1:UGMPJyCbDUiFecDbH/m70ZzqjNM=
 by: Julieta Shem - Sat, 10 Feb 2024 14:37 UTC

Raymond Wiker <rwiker@gmail.com> writes:

> Julieta Shem <jshem@yaxenu.org> writes:
>
>> I'd like to thank everyone in this thread. Valuable help. Thank you.
>>
>> Raymond Wiker <rwiker@gmail.com> writes:
>>
>>> Julieta Shem <jshem@yaxenu.org> writes:
>>>
>>>> First I define x to be a list of bytes of an NNTP article.
>>>>
>>>> (defvar x (fetch-article "local.test" "28"))
>>>>
>>>> Now x is a list of integers, the bytes of the article 28, which is a
>>>> MIME message containing a PDF of 610 KiB.
>>>>
>>>> Now I want to split it line by line. I am able to do it, but it's very
>>>> slow.
>>>>
>>>> * (time (length (split-sequence (list 13 10) x nil)))
>>>> Evaluation took:
>>>> 65.710 seconds of real time
>>>> 65.671875 seconds of total run time (47.093750 user, 18.578125 system)
>>>> [ Run times consist of 23.968 seconds GC time, and 41.704 seconds non-GC time. ]
>>>> 99.94% CPU
>>>> 170,322,749,168 processor cycles
>>>> 79,439,358,864 bytes consed
>>>>
>>>> 11585
>>>> *
>>>>
>>>> Less than 12k lines. The number 79,439,358,864 of bytes is way more
>>>> than I would have expected for anything regarding this invocation above,
>>>> but I have no idea what this number really measures here. I appreciate
>>>> any help. Thank you.
>>>>
>>>> Here's the procedure:
>>>>
>>>> (defun split-sequence (delim ls acc &key limit (so-far 1))
>>>> (let* ((len (length ls))
>>>> (delim delim)
>>>> (pos (search delim ls))
>>>> (n-take (or pos len))
>>>> (n-drop (if pos
>>>> (+ n-take (length delim))
>>>> n-take)))
>>>> (cond ((zerop len) acc)
>>>> ((and limit (= so-far limit)) (list ls))
>>>> (t (split-sequence
>>>> delim (drop n-drop ls)
>>>> (cons (take n-take ls) acc)
>>>> :limit limit
>>>> :so-far (1+ so-far))))))
>>>>
>>>> (defun take (n seq) (subseq seq 0 n))
>>>> (defun drop (n seq) (subseq seq n))
>>>
>>> It is slow because
>>>
>>> * You're repeatedly calling #'length on whatever is left of the
>>> list. Computing the length of a list is an O(n) operation; repeatedly
>>> calling #'length on the remainder of the list is then O(n^2).
>>>
>>> * You are generating new lists for both the initial segments and the
>>> rest of the list. In order to generate 12k lines you have to generate
>>> lists containing the lines, but also 12 lists containing the remainder
>>> (at various points) of the input.
>>
>> That's the take and drop, I think.
>
> Yup.
>
>>> * Each character of the input requires a cons cell, which is probably 16
>>> bytes (assuming a 64-bit Lisp implementation). So, your input now
>>> requires 610000*16, so about 9.76MB.
>>
>> So that's one of the advantages of using a vector, I think.
>
> Right. The other is that accessing any element in an array by position
> (index) is O(1); for a list, it's O(n).
>
>>> * The remainder lists will consume about 0.5*610000*16*12000, so about
>>> 55GB. This is not too far from the 74GB that time reports. (My
>>> calculations are probably off, but clearly not _too_ far off.)
>>
>> Impressive.
>
> Maybe... but it also appears to be wrong :-)

That's okay. :-)

>>> * I'm going to assume that 74GB is going to be a significant chunk of
>>> the available memory in your computer. That means that there will be
>>> significant garbage-collection activity, and you may also end up
>>> swapping to disk.
>>
>> What do you mean by ``available memory;'''? I have 24 GiB of RAM---23.9
>> GiB usable, says Windows. I don't know how to show this right now, but
>> I think I don't have any swap space (on Windows). I tried to turn it
>> off some time ago I think I succeeded.
>
> What implementation of Common Lisp are you using? Is it a 32-bit or
> 64-bit version?

I believe it's a 64-bit version.

%file sbcl.exe
sbcl.exe: PE32+ executable (console) x86-64, for MS Windows

I got it from

https://www.sbcl.org/platform-table.html

and I downloaded the MSI-file from column AMD64, version 2.3.2.

%./sbcl.exe --version
SBCL 2.3.2

>>> If you used strings (or byte arrays instead), you would
>>>
>>> * Require much less memory.
>>>
>>> * Computing #'length for a string or array is O(1), so the overall complexity
>>> becomes O(n).
>>
>> Because strings and arrays store their size somewhere in their data
>> structure, I suppose, and lists do not.
>
> Right. Lists are literally composed of objects that contain two values
> (cons cells), where the second value is a pointer to the next cons cell.
>
>>> * You don't have to create new arrays containing the successive
>>> remainders; instead, you can use the :start keyword to #'search.
>>
>> I tried this. Here's how I rewrote it. First some user interface that
>> gets the length just once, even though now it's probably okay to call
>> length multiple times. (The /acc/umulator should be probably be
>> optional here, but I was afraid of mixing &optional with &key. This is
>> my first Common Lisp program.)
>>
>> (defun split-vector (delim v acc &key limit (so-far 1))
>> (let ((len (length v)))
>> (split-vector-helper delim v len acc limit so-far 0)))
>>
>> Now, find the position of /delim/ between /start/ and /len/. If you
>> can't find it, we're done. If we've reached the limit of lines the
>> caller requested, we're done. Now we found /delim/: copy the slice of
>> the vector that goes from /start/ to the /pos/ition we found /delim/;
>> save it in the accumulator. Update /start/ we along over the vector.
>> Now repeat everything.
>>
>> (defun split-vector-helper (delim v len acc limit so-far start)
>> (if (zerop len)
>> acc
>> (let ((pos (search delim v :start2 start :end2 len)))
>> (cond ((or (not pos) (and limit (= so-far limit)))
>> (nreverse (cons (subseq v start len) acc)))
>> (t (split-vector-helper delim
>> v
>> len
>> (cons (subseq v start (or pos len)) acc)
>> limit
>> (1+ so-far)
>> (+ pos (length delim))))))))
>>
>> The result is better:
>>
>> * (time (length (split-vector (vector 13 10) x nil)))
>> Evaluation took:
>> 0.011 seconds of real time
>> 0.000000 seconds of total run time (0.000000 user, 0.000000 system)
>> 0.00% CPU
>> 30,857,378 processor cycles
>> 7,000,064 bytes consed
>>
>> 11576
>>
>> (Difficult to understand 0.011 real time atgainst ``total run time''.)
>
> That's a good second implementation, and a very nice improvement in run
> time and memory use!

Thanks!

> Note that you do not actually need the "len" parameter: if you don't
> specify :end2 for #'search or the end position for #'subseq, they will
> assume the value nil, which means the end of the sequence.

Noted.

>> I haven't studied about the /loop/ macro yet.
>
> Some people abhor loop, while others find it strangely attractive (or
> even addictive).

People say loop is hard to understand. I have a feeling I won't like it
much: hard-to-understand things don't give us a feeling of control.

``The details of the interaction matter, ease of use matters, but I
want more than correct details, more than a system that is easy to
learn or to use: I want a system that is enjoyable to use. This is
an important, dominating design philosophy, easier to say than to
do. It implies developing systems that provide a strong sense of
understanding and control. This means tools that reveal their
underlying conceptual model and allow for interaction, tools that
emphasize comfort, ease, and pleasure of use [...]. A major factor
in this debate is the feeling of control that the user has over the
operations that are being performed. A `powerful,' `intelligent'
system can lead to the well documented problems of `overautomation,'
causing the user to be a passive observer of operations, no longer
in control of either what operations take place, or of how they are
done. On the other hand, systems that are not sufficiently powerful
or intelligent can leave too large a gap in the mappings from
intention to action execution and from system state to psychological
interpretation. The result is that operation and interpretation are
complex and difficult, and the user again feels out of control,
distanced from the system.'' -- ``User Centered System Design'',
chapter 3, ``cognitive engineering'', ``on the quality of
human-computer interaction'', pages 48--49, Donald A. Norman, CRC
Press, 1986, ISBN 0-89859-872-9.


Click here to read the complete article
Re: why is this procedure is so slow?

<87mss8idlr.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17991&group=comp.lang.lisp#17991

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!weretis.net!feeder6.news.weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sat, 10 Feb 2024 11:43:44 -0300
Organization: A noiseless patient Spider
Lines: 22
Message-ID: <87mss8idlr.fsf@yaxenu.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="4e28b53434257e8ae1ee25e3f601f368";
logging-data="3378863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+25bkMel+rU5BmfiW1PaIOkfZqiXVTSoE="
Cancel-Lock: sha1:sbAtyAYhzIfUO0CNsTyM4nA3rP0=
sha1:zpKtov8a+q1c1Xst8dAl0C9mUcw=
 by: Julieta Shem - Sat, 10 Feb 2024 14:43 UTC

Spiros Bousbouras <spibou@gmail.com> writes:

[...]

>> > I haven't studied about the /loop/ macro yet.
>>
>> Some people abhor loop, while others find it strangely attractive (or
>> even addictive).
>
> For turning Julieta's recursion into a loop , DO will do just fine.

Good to know. I haven't studied DO and DO* yet either. Lol. How am I
writing code over here? Perhaps you'd all be horrified. It's hard to
learn everything. It seems better to write with the little we have and
then rewrite it as the years go by. I did learn to use DOLIST.

[...]

> If it also wants to do spam filtering
> then yes , it will have to decode the body.

Spam should be the problem of a spam filter---not my problem. :)

Re: why is this procedure is so slow?

<877cjcidfo.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17992&group=comp.lang.lisp#17992

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sat, 10 Feb 2024 11:47:23 -0300
Organization: A noiseless patient Spider
Lines: 58
Message-ID: <877cjcidfo.fsf@yaxenu.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <oFTEKHKSwjuoXLNeu@bongo-ra.co>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="4e28b53434257e8ae1ee25e3f601f368";
logging-data="3378863"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19UPvE/kHXJSd0brMnltIsSeXJEv1yXnoA="
Cancel-Lock: sha1:70Zo/2b3bvd3SMsDcoqvW6thA7Y=
sha1:QSsyj4epogiy49uAZMVqQJksXOQ=
 by: Julieta Shem - Sat, 10 Feb 2024 14:47 UTC

Spiros Bousbouras <spibou@gmail.com> writes:

> On Fri, 09 Feb 2024 10:24:05 -0300
> Julieta Shem <jshem@yaxenu.org> wrote:
>
> I don't know if you have learned declaration syntax yet but some declarations
> would help make the code clearer and help the compiler do optimisations or
> catch errors. For example
>
> (declaim (ftype (function (vector vector list
> &key (:limit (or null integer)) (:so-far integer))
> list)
> split-vector))

I have not. Very interesting. Sounds like ``gradual typing'', if
``gradual typing'' is what it sounds like. :)

>> (defun split-vector (delim v acc &key limit (so-far 1))
>> (let ((len (length v)))
>> (split-vector-helper delim v len acc limit so-far 0)))
>>
>> Now, find the position of /delim/ between /start/ and /len/. If you
>> can't find it, we're done. If we've reached the limit of lines the
>> caller requested, we're done. Now we found /delim/: copy the slice of
>> the vector that goes from /start/ to the /pos/ition we found /delim/;
>> save it in the accumulator. Update /start/ we along over the vector.
>> Now repeat everything.
>>
>> (defun split-vector-helper (delim v len acc limit so-far start)
>> (if (zerop len)
>> acc
>> (let ((pos (search delim v :start2 start :end2 len)))
>> (cond ((or (not pos) (and limit (= so-far limit)))
>> (nreverse (cons (subseq v start len) acc)))
>> (t (split-vector-helper delim
>> v
>> len
>> (cons (subseq v start (or pos len)) acc)
> ^^^^^^^^^^
> If this branch is taken , you already know that pos is non NIL .

Thanks!

>> limit
>> (1+ so-far)
>> (+ pos (length delim))))))))
>
>> Question. When split-vector-helper is repeated, will the arguments be
>> copied over to the new invokation, or will SBCL somehow pass on
>> pointers?
>
> This has been answered in a different post but you can also write the
> helper function within the body of split-vector in which case the
> helper will inherit the surrounding lexical environment so you won't
> need to pass so many arguments to it.

Sounds very useful. Your ``&aux'' made me think that's possible.
Great news.

Re: why is this procedure is so slow?

<rjGq+aS7Ay3xAtLwb@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17993&group=comp.lang.lisp#17993

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!paganini.bofh.team!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sat, 10 Feb 2024 15:41:36 -0000 (UTC)
Organization: To protect and to server
Message-ID: <rjGq+aS7Ay3xAtLwb@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home> <ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sat, 10 Feb 2024 15:41:36 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="3378138"; posting-host="9H7U5kayiTdk7VIdYU44Rw.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
Cancel-Lock: sha256:Zm3KjUuKSLeiuHQkp+Usq4dIk+jpeNhd0uSi9k5/mQo=
X-Server-Commands: nowebcancel
X-Notice: Filtered by postfilter v. 0.9.3
X-Organisation: Weyland-Yutani
 by: Spiros Bousbouras - Sat, 10 Feb 2024 15:41 UTC

On Sat, 10 Feb 2024 11:43:44 -0300
Julieta Shem <jshem@yaxenu.org> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
>
> [...]
>
> >> > I haven't studied about the /loop/ macro yet.
> >>
> >> Some people abhor loop, while others find it strangely attractive (or
> >> even addictive).
> >
> > For turning Julieta's recursion into a loop , DO will do just fine.
>
> Good to know. I haven't studied DO and DO* yet either. Lol. How am I
> writing code over here? Perhaps you'd all be horrified. It's hard to
> learn everything. It seems better to write with the little we have and
> then rewrite it as the years go by. I did learn to use DOLIST.

It will only take you 10 minutes tops to learn to use DO and it will make
your life easier. It must be hard trying to write everything as recursion.
Just be careful that the CLHS says
do iterates over a group of statements while a test condition holds.

which is the opposite than the correct behaviour which is that the iteration
terminates when the test condition becomes true. That's an embarrassing
mistake in the specification. All the implementations implement the correct
behaviour.

> [...]
>
> > If it also wants to do spam filtering
> > then yes , it will have to decode the body.
>
> Spam should be the problem of a spam filter---not my problem. :)

That depends on whether you are planning to use your server "properly"
i.e. put it online and peer with other servers , etc. People won't peer
with you unless you have antispam measures in place. So perhaps spam
filtering won't be implemented within the server but you probably should
give some thought on how well your server would integrate with preexisting
spam filters. I've seen peering policies specifying that in order to peer
with a server , it has to use a piece of software called cleanfeed .

--
vlaho.ninja/menu

Re: why is this procedure is so slow?

<87il2wdyqy.fsf@yaxenu.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17994&group=comp.lang.lisp#17994

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jshem@yaxenu.org (Julieta Shem)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sat, 10 Feb 2024 14:18:13 -0300
Organization: A noiseless patient Spider
Lines: 119
Message-ID: <87il2wdyqy.fsf@yaxenu.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="4e28b53434257e8ae1ee25e3f601f368";
logging-data="3432741"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+y79dE2eUQkqWNlRHnOGn+ZJOsxbI5vFo="
Cancel-Lock: sha1:/CI7UlOjRaDZNUFHUcUiFdZKfSc=
sha1:JC4IPNLeJI+b3n/xMDJDgHv+4BA=
 by: Julieta Shem - Sat, 10 Feb 2024 17:18 UTC

Spiros Bousbouras <spibou@gmail.com> writes:

> On Sat, 10 Feb 2024 11:43:44 -0300
> Julieta Shem <jshem@yaxenu.org> wrote:
>> Spiros Bousbouras <spibou@gmail.com> writes:
>>
>> [...]
>>
>> >> > I haven't studied about the /loop/ macro yet.
>> >>
>> >> Some people abhor loop, while others find it strangely attractive (or
>> >> even addictive).
>> >
>> > For turning Julieta's recursion into a loop , DO will do just fine.
>>
>> Good to know. I haven't studied DO and DO* yet either. Lol. How am I
>> writing code over here? Perhaps you'd all be horrified. It's hard to
>> learn everything. It seems better to write with the little we have and
>> then rewrite it as the years go by. I did learn to use DOLIST.
>
> It will only take you 10 minutes tops to learn to use DO and it will make
> your life easier.

That's what non-beginners think, which includes beginners that didn't
begin yet---I've thought that too. Things are always more complicated
than it appears. You have no idea how many stupid things I've done
here. :) Here's a not-first attempt at using DO.

(defun nntp-read-line-helper (bs)
(let ((len (length bs))
acc)
(do ((i 0 (1+ i)))
((= i len) (nreverse acc))
(let ((x (svref bs i)))
(if (= x 10)
(return (nreverse acc))
(push x acc))))))

CL-USER> (nntp-read-line-helper (vector 1 2 3 13 10 4 5 6 13 10))
(1 2 3 13)

CL-USER> (nntp-read-line-helper (vector 4 5 6 13 10))
(4 5 6 13)

CL-USER> (nntp-read-line-helper (vector))
NIL

It'd be used like that. (This one might be successful.) I haven't
written nntp-read-line driver of this helper yet. (There are some thing
that I consider non-obvious. Network connections can deliver partial
lines. What do I do when I get a partial line? Also, for good
performance, I likely need read-sequence here. So, my first version
uses read-byte and now I have a slow reading procedure that shows its
performance when you attach a file, say. To be honest, I'm not even
going to allow files to be attached, but I would like to know what to do
if I wanted a high-performance reader.)

Vectors are not convenient when we want to build something one element
at a time. I think coercing a list into a vector is O(n), but that
seems inevitable because to make a vector we need to know the size and
we don't know the size up front.

> It must be hard trying to write everything as recursion. Just be
> careful that the CLHS says do iterates over a group of statements
> while a test condition holds.
>
> which is the opposite than the correct behaviour which is that the iteration
> terminates when the test condition becomes true. That's an embarrassing
> mistake in the specification. All the implementations implement the correct
> behaviour.

I don't read the CLHS at first. I first, say, consult Paul Graham's
ANSI Common Lisp.

>> [...]
>>
>> > If it also wants to do spam filtering
>> > then yes , it will have to decode the body.
>>
>> Spam should be the problem of a spam filter---not my problem. :)
>
> That depends on whether you are planning to use your server "properly"
> i.e. put it online and peer with other servers , etc. People won't peer
> with you unless you have antispam measures in place. So perhaps spam
> filtering won't be implemented within the server but you probably should
> give some thought on how well your server would integrate with preexisting
> spam filters. I've seen peering policies specifying that in order to peer
> with a server , it has to use a piece of software called cleanfeed .

I'm not going to join the USENET---that would likely be a lot of work.
The idea of this server is to offer Lisp programmers a little private
club in which they can play with things. The server will be
closed---you need an account to join. Accounts can be created by any
member, so any member can invite other members---giving members a sense
they own the software that runs the community. (Instead of mailing
lists, say, you can run this little NNTP guy.) That's the idea---a
playground. It's also a small-community tool.

I've got a skeleton of it running. It's been great fun. I've never had
so much fun in programming. There was a time I was seriously
considering stopping programming, but then I discovered Lisp. In
choosing a dialect, I chose Racket and spent considerable time studying
it. I've written small things in Racket (for the web). I loved their
web-server based on continuations. But it seems that life is much
easier and more fun in Common Lisp---not by a small delta. I always
felt that pretty much any Lisp would be more or less the same fun, but
that's not true.

Big thanks to the people here who directly or indirectly made me
consider Common Lisp. There's a thread here that began with Racket and
was eventually renamed to ``Choosing between Lisps and Schemes''. I
believe it was the POSIX discussion that gave me the idea that maybe I
would find the Common Lisp world easier to understand than Racket's.
Maybe that's what made me decide to give it a try. I'm very grateful to
Racket, though, in that their books were what I needed to stop thinking
so much like a C programmer and start thinking more like a functional
programmer. The project now is to think like a language-oriented
programmer and, ironically, I gave up on Racket for that. (Maybe I am
the problem.)

Re: why is this procedure is so slow?

<86le7rqycf.fsf@williamsburg.bawden.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17995&group=comp.lang.lisp#17995

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!bawden.eternal-september.org!.POSTED!not-for-mail
From: alan@csail.mit.edu (Alan Bawden)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 02:00:48 -0500
Organization: ITS Preservation Society
Lines: 13
Message-ID: <86le7rqycf.fsf@williamsburg.bawden.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: bawden.eternal-september.org; posting-host="a13d222465dca45e8ec5bc9977c9a50b";
logging-data="932534"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18Zzv25Qf0/XDNHD6Z7EfjM"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)
Cancel-Lock: sha1:4GaseXoHsf2yE96wnNoTS71mLsU=
sha1:v/xr958dkIrhjzBhKajsTzCUKNI=
 by: Alan Bawden - Sun, 11 Feb 2024 07:00 UTC

Julieta Shem <jshem@yaxenu.org> writes:

Vectors are not convenient when we want to build something one element
at a time. I think coercing a list into a vector is O(n), but that
seems inevitable because to make a vector we need to know the size and
we don't know the size up front.

This is why we have arrays with fill pointers. You should read up on
how arrays (and vectors) really work, with an eye on eventually being
able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
just 8-bit bytes.

- Alan

Re: why is this procedure is so slow?

<uq9t8d$spnu$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17996&group=comp.lang.lisp#17996

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jbb@notatt.com (Jeff Barnett)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 00:33:29 -0700
Organization: A noiseless patient Spider
Lines: 25
Message-ID: <uq9t8d$spnu$1@dont-email.me>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
<86le7rqycf.fsf@williamsburg.bawden.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 11 Feb 2024 07:33:33 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1e1b4014ad4daa54fc63f1a4cc11d994";
logging-data="943870"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19jbUdSj5V4Ct5lUZXyM5abypmfCA2MSls="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:phiTKzQliy/1itZDbSPSSfH2qxA=
Content-Language: en-US
In-Reply-To: <86le7rqycf.fsf@williamsburg.bawden.org>
 by: Jeff Barnett - Sun, 11 Feb 2024 07:33 UTC

On 2/11/2024 12:00 AM, Alan Bawden wrote:
> Julieta Shem <jshem@yaxenu.org> writes:
>
> Vectors are not convenient when we want to build something one element
> at a time. I think coercing a list into a vector is O(n), but that
> seems inevitable because to make a vector we need to know the size and
> we don't know the size up front.
>
> This is why we have arrays with fill pointers. You should read up on
> how arrays (and vectors) really work, with an eye on eventually being
> able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
> just 8-bit bytes.

N.B. The use of vector-push-extend, where extend happens many (for many
definitions of "many") times, will consume O(n^2) time AND probably
memory too. There is no free (or even cheap) lunch when you don't have a
good estimate of final total size up front.

I think the best you can do is some sort of balanced tree, e.g., a heap,
that will consume total build time of O(n*log n) and just O(n) space.
The average/median retrieval time of a single element by index: O(1) for
vector; O(log n) for heap; O(n) for linked list.
--
Jeff Barnett

Re: why is this procedure is so slow?

<87wmrbmgju.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17997&group=comp.lang.lisp#17997

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 10:38:45 +0000
Organization: A noiseless patient Spider
Lines: 38
Message-ID: <87wmrbmgju.fsf@bsb.me.uk>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
<86le7rqycf.fsf@williamsburg.bawden.org> <uq9t8d$spnu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="956c294dc931e3fb273392ff133791d3";
logging-data="997750"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19EUtfuf9wEx6E3APNWHxaCedny9TzjS0o="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:UgPVn/B8DSQWqqry9wAFBfsGSRU=
sha1:obSlHdLB2pmATynEztXdrdvDN/o=
X-BSB-Auth: 1.3432e215005b9a8971f4.20240211103845GMT.87wmrbmgju.fsf@bsb.me.uk
 by: Ben Bacarisse - Sun, 11 Feb 2024 10:38 UTC

Jeff Barnett <jbb@notatt.com> writes:

> On 2/11/2024 12:00 AM, Alan Bawden wrote:
>> Julieta Shem <jshem@yaxenu.org> writes:
>> Vectors are not convenient when we want to build something one
>> element
>> at a time. I think coercing a list into a vector is O(n), but that
>> seems inevitable because to make a vector we need to know the size and
>> we don't know the size up front.
>> This is why we have arrays with fill pointers. You should read up on
>> how arrays (and vectors) really work, with an eye on eventually being
>> able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
>> just 8-bit bytes.
>
> N.B. The use of vector-push-extend, where extend happens many (for many
> definitions of "many") times, will consume O(n^2) time AND probably memory
> too. There is no free (or even cheap) lunch when you don't have a good
> estimate of final total size up front.

Is this true for real implementations? I would have thought they would
typically use an exponential allocation strategy to give what is often
called "amortised O(n)" growth.

That is certainly what I found in a quick experiment using clisp:

(defun build-vec (size)
(let ((a (make-array 0 :adjustable t :fill-pointer t)))
(dotimes (i size (length a)) (vector-push-extend i a))))

shows linear growth in execution time.

> I think the best you can do is some sort of balanced tree, e.g., a heap,
> that will consume total build time of O(n*log n) and just O(n) space. The
> average/median retrieval time of a single element by index: O(1) for
> vector; O(log n) for heap; O(n) for linked list.

--
Ben.

Re: why is this procedure is so slow?

<wBataYNc=aR5lboLU@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17998&group=comp.lang.lisp#17998

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!news.swapon.de!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 20:35:49 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 113
Message-ID: <wBataYNc=aR5lboLU@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home> <ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 11 Feb 2024 20:35:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="449cdbe5578df7a18af127a2329ceacc";
logging-data="1186650"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19W+VVf/chkEuqqMrepKAzo"
Cancel-Lock: sha1:wA7TePldCYxkIEP4boF7WNTvK4M=
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
In-Reply-To: <87il2wdyqy.fsf@yaxenu.org>
 by: Spiros Bousbouras - Sun, 11 Feb 2024 20:35 UTC

On Sat, 10 Feb 2024 14:18:13 -0300
Julieta Shem <jshem@yaxenu.org> wrote:
> Spiros Bousbouras <spibou@gmail.com> writes:
> > It will only take you 10 minutes tops to learn to use DO and it will make
> > your life easier.
>
> That's what non-beginners think, which includes beginners that didn't
> begin yet---I've thought that too. Things are always more complicated
> than it appears. You have no idea how many stupid things I've done
> here. :) Here's a not-first attempt at using DO.
>
> (defun nntp-read-line-helper (bs)
> (let ((len (length bs))
> acc)
> (do ((i 0 (1+ i)))
> ((= i len) (nreverse acc))
> (let ((x (svref bs i)))
> (if (= x 10)
> (return (nreverse acc))
> (push x acc))))))
>
> CL-USER> (nntp-read-line-helper (vector 1 2 3 13 10 4 5 6 13 10))
> (1 2 3 13)
>
> CL-USER> (nntp-read-line-helper (vector 4 5 6 13 10))
> (4 5 6 13)
>
> CL-USER> (nntp-read-line-helper (vector))
> NIL
>
> It'd be used like that. (This one might be successful.) I haven't
> written nntp-read-line driver of this helper yet. (There are some thing
> that I consider non-obvious. Network connections can deliver partial
> lines. What do I do when I get a partial line? Also, for good
> performance, I likely need read-sequence here. So, my first version
> uses read-byte and now I have a slow reading procedure that shows its
> performance when you attach a file, say. To be honest, I'm not even
> going to allow files to be attached, but I would like to know what to do
> if I wanted a high-performance reader.)

Too many unknowns to answer this. Are you using a separate thread for each
connection ? [SBCL has multithreading extensions] Do you use some poll()
wrapper ? Also I don't see why you are thinking in terms of lines. It seems
to me that you should read the whole article first [up to a maximum size ,
otherwise you reject it sending back an appropriate error code and message]
and then you parse it. When you parse it then you start worrying about lines.

[ Speaking of lines , some code you included in a previous post allowed for
the possibility that the final "line" of an NNTP article does not end in
CRLF. Does the grammar actually allow this ? If not then you probably need
to return an error message to the sender at that point.
]

> Vectors are not convenient when we want to build something one element
> at a time. I think coercing a list into a vector is O(n), but that
> seems inevitable because to make a vector we need to know the size and
> we don't know the size up front.

Common Lisp offers extendable vectors. The question is by how much to extend
them. With SBCL I have found that extending them by a constant size [in a
loop] leads to poor performance , likely because SBCL copies the vector each
time so you get quadratic execution time. On the other hand , multiplying by
a constant C works fine. The question is how much should C be. In the
beginning , C = 2 is fine. As the vector gets "too large" then you may want
to make C = 3/2 .How large is "too large" depends on your application ,
implementation [in particular value of MOST-POSITIVE-FIXNUM] and how much
memory you have. In this case you want a limit on the size of each article
accepted anyway. Since you won't allow attachments , 1,048,576 bytes seems
perfectly adequate in which case you can allocate a vector of that size right
away and , after you have read the article , adjust the size of the vector to
be the same as the article. But for general reference , here is an example

(defun store-in-vector (vec pos elem &aux (len (length vec)))
"vec must be an adjustable vector. The function stores elem at
position pos adjusting the size of the vector if necessary. The
function is written so that it is efficient to call it in a loop."

(when (< pos len) (return-from store-in-vector (setf (aref vec pos) elem)))
(let ((new-size (max (+ len len) (+ 1 pos))))
(adjust-array vec new-size))
(setf (aref vec pos) elem))

(defun nntp-read-line-helper-2
(bs &aux (res (make-array 128 :adjustable t))
(len (length bs)))
(do ((i 0 (1+ i)))
((eql i len) (adjust-array res len))
(let ((x (svref bs i)))
(when (eql x 10) (return (adjust-array res i)))
(store-in-vector res i x))))

> Big thanks to the people here who directly or indirectly made me
> consider Common Lisp. There's a thread here that began with Racket and
> was eventually renamed to ``Choosing between Lisps and Schemes''. I
> believe it was the POSIX discussion that gave me the idea that maybe I
> would find the Common Lisp world easier to understand than Racket's.
> Maybe that's what made me decide to give it a try. I'm very grateful to
> Racket, though, in that their books were what I needed to stop thinking
> so much like a C programmer and start thinking more like a functional
> programmer. The project now is to think like a language-oriented
> programmer and, ironically, I gave up on Racket for that. (Maybe I am
> the problem.)

I very much think like a C programmer when I write Common Lisp code and
this is how I prefer it. These days , when you want to use a functional
style , Lisp's are not the best choice and Common Lisp is not the best
choice among Lisp's.

--
Alalalala char alalalala char
alalalala char char cha char chi char
Alalalala int alalalala int
alalalala int-int int-int int int

Re: why is this procedure is so slow?

<A63+jAflJIP=hOfBM@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=17999&group=comp.lang.lisp#17999

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!paganini.bofh.team!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 20:45:39 -0000 (UTC)
Organization: To protect and to server
Message-ID: <A63+jAflJIP=hOfBM@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home> <ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org> <86le7rqycf.fsf@williamsburg.bawden.org>
<uq9t8d$spnu$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 11 Feb 2024 20:45:39 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="3686182"; posting-host="9H7U5kayiTdk7VIdYU44Rw.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
Cancel-Lock: sha256:F20SfHjCHjRMOO+ujgoD1YxySvMeh7zkx59zqPVvwCU=
X-Organisation: Weyland-Yutani
X-Notice: Filtered by postfilter v. 0.9.3
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Sun, 11 Feb 2024 20:45 UTC

On Sun, 11 Feb 2024 00:33:29 -0700
Jeff Barnett <jbb@notatt.com> wrote:
> On 2/11/2024 12:00 AM, Alan Bawden wrote:
> > Julieta Shem <jshem@yaxenu.org> writes:
> >
> > Vectors are not convenient when we want to build something one element
> > at a time. I think coercing a list into a vector is O(n), but that
> > seems inevitable because to make a vector we need to know the size and
> > we don't know the size up front.
> >
> > This is why we have arrays with fill pointers. You should read up on
> > how arrays (and vectors) really work, with an eye on eventually being
> > able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
> > just 8-bit bytes.
>
> N.B. The use of vector-push-extend, where extend happens many (for many
> definitions of "many") times, will consume O(n^2) time AND probably
> memory too. There is no free (or even cheap) lunch when you don't have a
> good estimate of final total size up front.

The method I described in <wBataYNc=aR5lboLU@bongo-ra.co> will add
log(N) time due to resizing [with the likely attendant copying] to the
execution time of N which is unavoidable for the application discussed
here. I would classify this as a "cheap lunch".

> I think the best you can do is some sort of balanced tree, e.g., a heap,
> that will consume total build time of O(n*log n) and just O(n) space.
> The average/median retrieval time of a single element by index: O(1) for
> vector; O(log n) for heap; O(n) for linked list.

If the input won't fit into 1 vector [perhaps because it is larger than
MOST-POSITIVE-FIXNUM] then a vector of vectors will still ensure constant
access time so no need to mess around with balanced trees.

But I agree in not using VECTOR-PUSH-EXTEND .The appropriate reallocation
strategy depends on the application so one should not rely on some generic
algorithm used by the implementation [which may or may not give O(N^2)
performance].

--
vlaho.ninja/menu

Re: why is this procedure is so slow?

<UZlEnnFkR+7fvs18m@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=18000&group=comp.lang.lisp#18000

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!paganini.bofh.team!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 20:51:08 -0000 (UTC)
Organization: To protect and to server
Message-ID: <UZlEnnFkR+7fvs18m@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home> <ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org> <86le7rqycf.fsf@williamsburg.bawden.org>
<uq9t8d$spnu$1@dont-email.me> <87wmrbmgju.fsf@bsb.me.uk>
Mime-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 11 Feb 2024 20:51:08 -0000 (UTC)
Injection-Info: paganini.bofh.team; logging-data="3686684"; posting-host="9H7U5kayiTdk7VIdYU44Rw.user.paganini.bofh.team"; mail-complaints-to="usenet@bofh.team"; posting-account="9dIQLXBM7WM9KzA+yjdR4A";
Cancel-Lock: sha256:5Aaah63grlV7yyTxWqqYuIlomUQO1Wl9E1RYNfjkJNs=
X-Notice: Filtered by postfilter v. 0.9.3
X-Organisation: Weyland-Yutani
X-Server-Commands: nowebcancel
 by: Spiros Bousbouras - Sun, 11 Feb 2024 20:51 UTC

On Sun, 11 Feb 2024 10:38:45 +0000
Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
> Jeff Barnett <jbb@notatt.com> writes:
> > N.B. The use of vector-push-extend, where extend happens many (for many
> > definitions of "many") times, will consume O(n^2) time AND probably memory
> > too. There is no free (or even cheap) lunch when you don't have a good
> > estimate of final total size up front.
>
> Is this true for real implementations? I would have thought they would
> typically use an exponential allocation strategy to give what is often
> called "amortised O(n)" growth.
>
> That is certainly what I found in a quick experiment using clisp:
>
> (defun build-vec (size)
> (let ((a (make-array 0 :adjustable t :fill-pointer t)))
> (dotimes (i size (length a)) (vector-push-extend i a))))
>
> shows linear growth in execution time.

What did you measure ?

Re: why is this procedure is so slow?

<87r0himzjb.fsf@bsb.me.uk>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=18001&group=comp.lang.lisp#18001

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ben.usenet@bsb.me.uk (Ben Bacarisse)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 22:00:56 +0000
Organization: A noiseless patient Spider
Lines: 33
Message-ID: <87r0himzjb.fsf@bsb.me.uk>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
<86le7rqycf.fsf@williamsburg.bawden.org> <uq9t8d$spnu$1@dont-email.me>
<87wmrbmgju.fsf@bsb.me.uk> <UZlEnnFkR+7fvs18m@bongo-ra.co>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: dont-email.me; posting-host="956c294dc931e3fb273392ff133791d3";
logging-data="1214595"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX192ylEZ63OaNCuOY3ACgQra7tk3Mfcaxtc="
User-Agent: Gnus/5.13 (Gnus v5.13)
Cancel-Lock: sha1:D5P7YHz8dL7vQhEPa+u1m/+mKxg=
sha1:bqiCwPLyyQxc/K5tXefb7BYnyh0=
X-BSB-Auth: 1.58c1b39e5bd33dbbc670.20240211220056GMT.87r0himzjb.fsf@bsb.me.uk
 by: Ben Bacarisse - Sun, 11 Feb 2024 22:00 UTC

Spiros Bousbouras <spibou@gmail.com> writes:

> On Sun, 11 Feb 2024 10:38:45 +0000
> Ben Bacarisse <ben.usenet@bsb.me.uk> wrote:
>> Jeff Barnett <jbb@notatt.com> writes:
>> > N.B. The use of vector-push-extend, where extend happens many (for many
>> > definitions of "many") times, will consume O(n^2) time AND probably memory
>> > too. There is no free (or even cheap) lunch when you don't have a good
>> > estimate of final total size up front.
>>
>> Is this true for real implementations? I would have thought they would
>> typically use an exponential allocation strategy to give what is often
>> called "amortised O(n)" growth.
>>
>> That is certainly what I found in a quick experiment using clisp:
>>
>> (defun build-vec (size)
>> (let ((a (make-array 0 :adjustable t :fill-pointer t)))
>> (dotimes (i size (length a)) (vector-push-extend i a))))
>>
>> shows linear growth in execution time.
>
> What did you measure ?

I did essentially this

(dolist (i '(16 17 18 19 20 21 22 23)) (time (build-vec (expt 2 i))))

though I was actually running it interactively to see what gave
reliable timings.

--
Ben.

Re: why is this procedure is so slow?

<c2HaMHVG89KbqjXrz@bongo-ra.co>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=18002&group=comp.lang.lisp#18002

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: spibou@gmail.com (Spiros Bousbouras)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 23:39:43 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 9
Message-ID: <c2HaMHVG89KbqjXrz@bongo-ra.co>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home> <87v86xpy8a.fsf@yaxenu.org>
<m2a5o9h1bw.fsf@MacBook-Pro-2.home> <ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org> <86le7rqycf.fsf@williamsburg.bawden.org>
<uq9t8d$spnu$1@dont-email.me> <A63+jAflJIP=hOfBM@bongo-ra.co>
MIME-Version: 1.0
Content-Type: text/plain; charset=ISO-8859-1
Content-Transfer-Encoding: 8bit
Injection-Date: Sun, 11 Feb 2024 23:39:43 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4735e20e02677e58ec38222b2752a50f";
logging-data="1244504"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+qFDsbBryS5i1NFXSQeZ6k"
Cancel-Lock: sha1:PgJWJWLEGNULkeNiQ2AvKFrtOGE=
X-Server-Commands: nowebcancel
X-Organisation: Weyland-Yutani
In-Reply-To: <A63+jAflJIP=hOfBM@bongo-ra.co>
 by: Spiros Bousbouras - Sun, 11 Feb 2024 23:39 UTC

On Sun, 11 Feb 2024 20:45:39 -0000 (UTC)
Spiros Bousbouras <spibou@gmail.com> wrote:
> The method I described in <wBataYNc=aR5lboLU@bongo-ra.co> will add
> log(N) time due to resizing [with the likely attendant copying] to the
> execution time of N which is unavoidable for the application discussed
> here. I would classify this as a "cheap lunch".

Actually it's up to 2*N rather than log(N) .But the important thing
is that you still get linear performance.

Re: why is this procedure is so slow?

<86h6ieqvvn.fsf@williamsburg.bawden.org>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=18003&group=comp.lang.lisp#18003

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!bawden.eternal-september.org!.POSTED!not-for-mail
From: alan@csail.mit.edu (Alan Bawden)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 21:06:20 -0500
Organization: ITS Preservation Society
Lines: 58
Message-ID: <86h6ieqvvn.fsf@williamsburg.bawden.org>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
<86le7rqycf.fsf@williamsburg.bawden.org> <uq9t8d$spnu$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain
Injection-Info: bawden.eternal-september.org; posting-host="76c037d461f00ef6ae3711d9f8340ea1";
logging-data="1282099"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18c9BKuL8B4K/PSN1y9Ren3"
User-Agent: Gnus/5.13 (Gnus v5.13) Emacs/24.4 (gnu/linux)
Cancel-Lock: sha1:mcV88mYiCYinho/BF80QJKjtUX8=
sha1:lzWhtIXUNl9KoP3Sg2Z4mVpqft4=
 by: Alan Bawden - Mon, 12 Feb 2024 02:06 UTC

Jeff Barnett <jbb@notatt.com> writes:

On 2/11/2024 12:00 AM, Alan Bawden wrote:
> Julieta Shem <jshem@yaxenu.org> writes:
>
> Vectors are not convenient when we want to build something one element
> at a time. I think coercing a list into a vector is O(n), but that
> seems inevitable because to make a vector we need to know the size and
> we don't know the size up front.
>
> This is why we have arrays with fill pointers. You should read up on
> how arrays (and vectors) really work, with an eye on eventually being
> able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
> just 8-bit bytes.

N.B. The use of vector-push-extend, where extend happens many (for many
definitions of "many") times, will consume O(n^2) time AND probably memory
too. There is no free (or even cheap) lunch when you don't have a good
estimate of final total size up front.

I think the best you can do is some sort of balanced tree, e.g., a heap,
that will consume total build time of O(n*log n) and just O(n) space. The
average/median retrieval time of a single element by index: O(1) for
vector; O(log n) for heap; O(n) for linked list.

No. (Well technically you're correct, but this is not what the OP needs
to hear.)

Look, we added fill pointers, and VECTOR-PUSH and VECTOR-PUSH-EXTEND to
Common Lisp PRECISELY TO COVER SITUATIONS LIKE THIS. There's no need
here for balanced trees or other complicated data structures.

First you allocate an array with sufficient capacity for an estimated
typical case. If you're reading bytes from a stream, a size of
something like 16K is probably enough. Set the fill pointer to zero
and start accumulating bytes using VECTOR-PUSH-EXTEND.

Note that the third argument to VECTOR-PUSH-EXTEND is the amount to
extend the buffer if you get to the point of having more than 16K (or
whatever) elements. Pick something big enough so that extension is
rare. Like another 16K for example.

If you're doing this multiple times, just reset the fill pointer and
reuse this same array for the next buffer load of input. This way even
if you estimated wrong about the size of a typical buffer load, you'll
only pay for as many extensions as are need for the largest buffer load
you see.

This is pretty much the way you would handle things in C, and there is
no need to get any more complicated than that.

Yes, technically this is O(n^2) -- if you set the initial buffer
capacity to 1, and extend it by 1 element each time, this will matter.
But if you set the initial buffer capacity and extension amount to
something reasonable, you'll extend your buffer so few times that you'll
never know it isn't O(n).

- Alan

Re: why is this procedure is so slow?

<uqc7e8$1c368$1@dont-email.me>

  copy mid

https://www.rocksolidbbs.com/devel/article-flat.php?id=18004&group=comp.lang.lisp#18004

  copy link   Newsgroups: comp.lang.lisp
Path: i2pn2.org!i2pn.org!usenet.network!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: jbb@notatt.com (Jeff Barnett)
Newsgroups: comp.lang.lisp
Subject: Re: why is this procedure is so slow?
Date: Sun, 11 Feb 2024 21:39:34 -0700
Organization: A noiseless patient Spider
Lines: 57
Message-ID: <uqc7e8$1c368$1@dont-email.me>
References: <87sf22ykaw.fsf@yaxenu.org> <m24jeiyeb1.fsf@MacBook-Pro-2.home>
<87v86xpy8a.fsf@yaxenu.org> <m2a5o9h1bw.fsf@MacBook-Pro-2.home>
<ogD0Mv+vfv1CrR+QE@bongo-ra.co> <87mss8idlr.fsf@yaxenu.org>
<rjGq+aS7Ay3xAtLwb@bongo-ra.co> <87il2wdyqy.fsf@yaxenu.org>
<86le7rqycf.fsf@williamsburg.bawden.org> <uq9t8d$spnu$1@dont-email.me>
<87wmrbmgju.fsf@bsb.me.uk>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
Injection-Date: Mon, 12 Feb 2024 04:39:36 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="1f63bc2fbc9cd46b633e946a52b04f36";
logging-data="1445064"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18g6qzTGIlyWA/c4+A8hIbAbv7+2hGGXZA="
User-Agent: Mozilla Thunderbird
Cancel-Lock: sha1:+SqHl4GajZncqria2lmPcmprn/o=
In-Reply-To: <87wmrbmgju.fsf@bsb.me.uk>
Content-Language: en-US
 by: Jeff Barnett - Mon, 12 Feb 2024 04:39 UTC

On 2/11/2024 3:38 AM, Ben Bacarisse wrote:
> Jeff Barnett <jbb@notatt.com> writes:
>
>> On 2/11/2024 12:00 AM, Alan Bawden wrote:
>>> Julieta Shem <jshem@yaxenu.org> writes:
>>> Vectors are not convenient when we want to build something one
>>> element
>>> at a time. I think coercing a list into a vector is O(n), but that
>>> seems inevitable because to make a vector we need to know the size and
>>> we don't know the size up front.
>>> This is why we have arrays with fill pointers. You should read up on
>>> how arrays (and vectors) really work, with an eye on eventually being
>>> able to use VECTOR-PUSH-EXTEND on a array that's specialized to contain
>>> just 8-bit bytes.
>>
>> N.B. The use of vector-push-extend, where extend happens many (for many
>> definitions of "many") times, will consume O(n^2) time AND probably memory
>> too. There is no free (or even cheap) lunch when you don't have a good
>> estimate of final total size up front.
>
> Is this true for real implementations? I would have thought they would
> typically use an exponential allocation strategy to give what is often
> called "amortised O(n)" growth.
>
> That is certainly what I found in a quick experiment using clisp:
>
> (defun build-vec (size)
> (let ((a (make-array 0 :adjustable t :fill-pointer t)))
> (dotimes (i size (length a)) (vector-push-extend i a))))
>
> shows linear growth in execution time.

Every time you do a rebuild initiated by a push-extend you end up
copying all the current items in the vector; and each time you do the
copy, more items are copied than for the last copy. This leads, finally,
to O(n^2) complexity. And as to storage, you must look at all the
allocations you have done and the overhead to either GC or tack them
into your heap. Usually, you only notice this degradation if you push a
whole lot of items - recall, O(.) is really an asymptotic measure. Lots
of tricks will hide the problem for a while but one usually gets bit in
the end.

When you ran build-vec, did you let size be a few billion? What happened
to run time? My guess is that most storage heap management schemes take
about the same amount of time and storage to allocate a few bytes and a
few thousand bytes. The growth effect, if it is real in the
implementation you are using for testing, must build vectors many many
times longer than the internal allocation chunk size to see the
quadratic effect.

>> I think the best you can do is some sort of balanced tree, e.g., a heap,
>> that will consume total build time of O(n*log n) and just O(n) space. The
>> average/median retrieval time of a single element by index: O(1) for
>> vector; O(log n) for heap; O(n) for linked list.
--
Jeff Barnett

Pages:12
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor