How to read an identified part of a huge text file?

<u0c19l$2hc4f$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18092&group=comp.lang.javascript#18092

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.javascript
Subject: How to read an identified part of a huge text file?
Date: Sun, 2 Apr 2023 15:51:49 +0200
Organization: A noiseless patient Spider
Lines: 24
Message-ID: <u0c19l$2hc4f$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset=utf-8
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 2 Apr 2023 13:51:49 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f8252982302c6ac04b7f368a173f26a6";
logging-data="2666639"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+683ffL/Kl/9w8Xf478x1d"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:2jz7zOKoTquSSJhR/f+ctcsYQfo=
X-Mozilla-News-Host: news://news.eternal-september.org:119
X-Enigmail-Draft-Status: N1110

by: Janis Papanagnou - Sun, 2 Apr 2023 13:51 UTC

I want to read identified content from a huge text file that resides in
the file system. (My JavaScript code is embedded in an HTML page. I am
running all code client side and have no application servers or
database systems running.)

I've found a suggestion using 'require("fs")', but the samples load
the whole file content, which doesn't suit my multi-megabyte data
file; I strictly want to avoid loading it as a whole.

My data file is actually structured as <key> <TAB> <text-data> lines
and I just want to extract the <text-data> given the respective <key>.
Is there some simple standard way to achieve that extraction?

The second question is whether it is possible to find the <key>s given
a text-match (a string match or ideally a regular expression match) on
the respective <text-data> on the external file?

For a solution/workaround to both questions it might also be useful to
call an external extractor (awk, perl, ...) from JavaScript and read in
the output of such an external tool invocation. - Is that possible?

Thanks for any hints.

Janis

Re: How to read an identified part of a huge text file?

<slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18093&group=comp.lang.javascript#18093

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: jon+usenet@unequivocal.eu (Jon Ribbens)
Newsgroups: comp.lang.javascript
Subject: Re: How to read an identified part of a huge text file?
Date: Sun, 2 Apr 2023 14:49:27 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 36
Message-ID: <slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu>
References: <u0c19l$2hc4f$1@dont-email.me>
Injection-Date: Sun, 2 Apr 2023 14:49:27 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4a36298b2b88eaa53446fa2d2ce2654b";
logging-data="2681716"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18xfTJrJnTKC2YWKaDVRv5AioaREbabmFg="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:SMgvqSBXd4p9m4rK0wCfvzWi5bQ=

by: Jon Ribbens - Sun, 2 Apr 2023 14:49 UTC

On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> I want to read identified content from a huge text file that resides in
> the file system. (My javascript code is embedded in a HTML page. I am
> running all code client side and have no application servers or data
> base systems running.)
>
> I've found a suggestion using 'require("fs")' but the samples required
> to load the whole file content so doesn't seem to fit for my megabytes
> large data file which I strictly want to avoid loading as a whole.

require('fs') is a Node.js thing, which is not going to work if you're
using in-browser JavaScript.

> My data file is actually structured as <key> <TAB> <text-data> lines
> and I just want to extract the <text-data> given the respective <key>.
> Is there some simple standard way to achieve that extraction?

I think in a modern browser you might be able to use the Fetch and
Streams APIs to read the file a chunk at a time, e.g.:

    const response = await fetch('myfile.txt')
    // response.body is a ReadableStream that yields Uint8Array chunks
    for await (const chunk of response.body) {
        // Do something with each chunk
    }

> The second question is whether it is possible to find the <key>s given
> a text-match (a string match or ideally a regular expression match) on
> the respective <text-data> on the external file?

Yes? I'm not sure I understand that question.

> For a solution/workaround to both questions it might be also useful to
> call an external extractor (awk, perl, ...) from javascript and read in
> the output of such an external tool invocation. - Is that possible?

Not from inside a browser, no.

Re: How to read an identified part of a huge text file?

<u0cg5b$2jgjp$1@dont-email.me>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18094&group=comp.lang.javascript#18094

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.javascript
Subject: Re: How to read an identified part of a huge text file?
Date: Sun, 2 Apr 2023 20:05:31 +0200
Organization: A noiseless patient Spider
Lines: 43
Message-ID: <u0cg5b$2jgjp$1@dont-email.me>
References: <u0c19l$2hc4f$1@dont-email.me>
<slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu>
MIME-Version: 1.0
Content-Type: text/plain; charset=windows-1252
Content-Transfer-Encoding: 7bit
Injection-Date: Sun, 2 Apr 2023 18:05:31 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="f8252982302c6ac04b7f368a173f26a6";
logging-data="2736761"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1+SJRuNC4VVn52SqZ9zMU80"
User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:45.0) Gecko/20100101
Thunderbird/45.8.0
Cancel-Lock: sha1:B/evVQ8sOvbFOyktECpT78ps6tI=
In-Reply-To: <slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu>
X-Enigmail-Draft-Status: N1110

by: Janis Papanagnou - Sun, 2 Apr 2023 18:05 UTC

Thanks for your hints and insights thus far!

On 02.04.2023 16:49, Jon Ribbens wrote:
> On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>> My data file is actually structured as <key> <TAB> <text-data> lines

>> The second question is whether it is possible to find the <key>s given
>> a text-match (a string match or ideally a regular expression match) on
>> the respective <text-data> on the external file?
>
> Yes? I'm not sure I understand that question.

Where my first question was (informally described) by something like

Select <text-data> From <text-file> Where <key> Equals <search-key>

the second one operates on the text-data and returns the keys that
identify the matching records, like

Select <keys> From <text-file> Where <text-data> Matches <s1> And <s2>

with the possibility either to get the keys of all s1/s2-matching
records in one returned set, or to get these keys sequentially, or to
operate on the matching records (which are identified by their keys).

Basically, in both questions I want (line-wise, record-wise) access
to the data: either the <text-data> selected by <key>, or the <keys>
where the <text-data> matches a search criterion.

The point is: once the data is read into memory accessible to JS I can
do everything (including matching), but the problem is the bottleneck
due to the mass of data in the file, so I need to preselect the desired
records (so as not to have to load the file completely into memory).

(I hope that clears it up rather than muddying it further.)

The suggestion of using await fetch('myfile.txt') sounds like it's
a raw (byte-oriented) data function (not a line/record-oriented one),
but I will be looking into that as well. Thanks again.

Janis

Re: How to read an identified part of a huge text file?

<slrnu2jnj1.2bs.jon+usenet@raven.unequivocal.eu>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18095&group=comp.lang.javascript#18095

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: jon+usenet@unequivocal.eu (Jon Ribbens)
Newsgroups: comp.lang.javascript
Subject: Re: How to read an identified part of a huge text file?
Date: Sun, 2 Apr 2023 19:58:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 59
Message-ID: <slrnu2jnj1.2bs.jon+usenet@raven.unequivocal.eu>
References: <u0c19l$2hc4f$1@dont-email.me>
<slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu>
<u0cg5b$2jgjp$1@dont-email.me>
Injection-Date: Sun, 2 Apr 2023 19:58:25 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4a36298b2b88eaa53446fa2d2ce2654b";
logging-data="2768246"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18S+a3xEo8LrCOJJLInfSETfAqWQ9Hj7M4="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:iyjz5bp+ozRu+bNd9SYmErO0Vg4=

by: Jon Ribbens - Sun, 2 Apr 2023 19:58 UTC

On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
> Thanks for your hints and insights thus far!
>
> On 02.04.2023 16:49, Jon Ribbens wrote:
>> On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
>>> My data file is actually structured as <key> <TAB> <text-data> lines
>
>>> The second question is whether it is possible to find the <key>s given
>>> a text-match (a string match or ideally a regular expression match) on
>>> the respective <text-data> on the external file?
>>
>> Yes? I'm not sure I understand that question.
>
> Where my first question was (informally described) by something like
>
> Select <text-data> From <text-file> Where <key> Equals <search-key>
>
> the second one operates on the data and returns text-data matching keys
> that identify the data records like
>
> Select <keys> From <text-file> Where <text-data> Matches <s1> And <s2>
>
> with a possibility to either get all the s1/s2-matching key-identifier
> in one returned set or which lets me sequentially get these keys or
> let me operate on matching records (that are identified by the keys of
> matching records).
>
> Basically in both questions I have want access (line-wise, record-wise)
> to the data, either the <text-data> selected by <key> or the <keys>
> where the <text-data> match a search criterion.
>
> The point is; once data is read into memory accessible to JS I can do
> everything (including matching), but the problem is the bottleneck due
> to the mass of data in the file, so I need to preselect the desired
> records (to not have to load it completely into memory).
>
> (I hope it got cleared and doesn't muddy it further.)
>
> The suggestion of using await fetch('myfile.txt') sounds like it's
> a raw (byte-oriented) data function (not line/record oriented one),
> but I will be looking into that as well. Thanks again.

Yes, although there's an example of how to use it to read line-by-line
here:

https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/read#example_2_-_handling_text_line_by_line
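
As a rough sketch of that line-by-line idea (this is not the MDN example itself), the byte chunks can be decoded incrementally and split on newlines, after which both of the queries described above become simple loops. The function names linesOf, findByKey and keysMatching are invented for illustration. The sketch assumes UTF-8 text with LF line endings and a file reachable through a URL that fetch() is allowed to load.

```javascript
// Turn an (async) iterable of Uint8Array chunks into text lines.
async function* linesOf(chunks) {
  const decoder = new TextDecoder('utf-8');
  let pending = '';
  for await (const chunk of chunks) {
    // stream: true keeps multi-byte characters split across chunks intact
    pending += decoder.decode(chunk, { stream: true });
    const parts = pending.split('\n');
    pending = parts.pop(); // the last piece may be an incomplete line
    yield* parts;
  }
  pending += decoder.decode(); // flush anything still buffered
  if (pending !== '') yield pending;
}

// Select <text-data> Where <key> Equals searchKey.
// Returning early ends the iteration, so the rest of the file is not read.
async function findByKey(chunks, searchKey) {
  for await (const line of linesOf(chunks)) {
    const tab = line.indexOf('\t');
    if (tab !== -1 && line.slice(0, tab) === searchKey) {
      return line.slice(tab + 1);
    }
  }
  return undefined;
}

// Select <keys> Where <text-data> matches every pattern.
// Patterns should not use the g or y flag, since test() is stateful with them.
async function keysMatching(chunks, patterns) {
  const keys = [];
  for await (const line of linesOf(chunks)) {
    const tab = line.indexOf('\t');
    if (tab === -1) continue; // skip lines without a key
    const text = line.slice(tab + 1);
    if (patterns.every((re) => re.test(text))) keys.push(line.slice(0, tab));
  }
  return keys;
}
```

In a browser that supports async iteration of a ReadableStream, response.body from fetch() could be passed as the chunks argument. Only one chunk and one partial line are held in memory at a time, but a regular expression search still has to scan the whole file.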

I think the only solution available to you in a browser is to use
IndexedDB. On the plus side though, it's quite a good solution.
Basically, write a function in JavaScript to read and parse the
file and load it into an in-browser database:

https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API

and then you can search this indexed database of objects, which
should be very fast and efficient. You just need to make sure that
your code checks for the existence of the database and re-creates
it from the file if it doesn't exist due to the browser having
decided to expire it.
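
A minimal sketch of that approach might look like the following. The database name 'records-db', the store name 'records' and all function names are placeholders, not anything prescribed by IndexedDB. Note that the object store speeds up lookups by key, while a regular expression match on the text would still need a scan over the stored records.

```javascript
// Parse one "<key>\t<text-data>" line; returns null for lines without a TAB.
function parseRecord(line) {
  const tab = line.indexOf('\t');
  return tab === -1 ? null : { key: line.slice(0, tab), text: line.slice(tab + 1) };
}

// Open the database, creating the object store on first use
// (or after the browser has evicted the database).
function openDb() {
  return new Promise((resolve, reject) => {
    const request = indexedDB.open('records-db', 1);
    request.onupgradeneeded = () => {
      request.result.createObjectStore('records', { keyPath: 'key' });
    };
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Number of stored records; 0 suggests the file has to be loaded again.
function countRecords(db) {
  return new Promise((resolve, reject) => {
    const request = db.transaction('records').objectStore('records').count();
    request.onsuccess = () => resolve(request.result);
    request.onerror = () => reject(request.error);
  });
}

// Store a batch of lines in a single read/write transaction.
function putLines(db, lines) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction('records', 'readwrite');
    const store = tx.objectStore('records');
    for (const line of lines) {
      const record = parseRecord(line);
      if (record) store.put(record);
    }
    tx.oncomplete = () => resolve();
    tx.onerror = () => reject(tx.error);
    tx.onabort = () => reject(tx.error);
  });
}

// Look up the text for one key; resolves to undefined if the key is absent.
function getText(db, key) {
  return new Promise((resolve, reject) => {
    const request = db.transaction('records').objectStore('records').get(key);
    request.onsuccess = () => resolve(request.result?.text);
    request.onerror = () => reject(request.error);
  });
}
```

On page load one would call openDb(), check countRecords(), and feed the file to putLines() in batches only when the count is zero.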

Re: How to read an identified part of a huge text file?

<1l3goe39ie47b$.mmy83votg143$.dlg@40tude.net>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18096&group=comp.lang.javascript#18096

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder.eternal-september.org!.POSTED!not-for-mail
From: jj4public@outlook.com (JJ)
Newsgroups: comp.lang.javascript
Subject: Re: How to read an identified part of a huge text file?
Date: Mon, 3 Apr 2023 06:08:52 +0700
Organization: A noiseless patient Spider
Lines: 13
Message-ID: <1l3goe39ie47b$.mmy83votg143$.dlg@40tude.net>
References: <u0c19l$2hc4f$1@dont-email.me> <slrnu2j5fn.2bs.jon+usenet@raven.unequivocal.eu> <u0cg5b$2jgjp$1@dont-email.me>
MIME-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
Injection-Info: dont-email.me; posting-host="d04ab23911277a3681c0ea7c449e1a5e";
logging-data="2823272"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/YEMS7T458DWTHReuOg50W3aET3yBuaRM="
User-Agent: 40tude_Dialog/2.0.15.84
Cancel-Lock: sha1:aXj4MshtGIKDljX0zanjnLPVixQ=
X-Face: \*\`0(1j~VfYC>ebz[&O.]=,Nm\oRM{of,liRO#7Eqi4|!]!(Gs=Akgh{J)605>C9Air?pa d{sSZ09u+A7f<^paR"/NH_#<mE1S"hde\c6PZLUB[t/s5-+Iu5DSc?P0+4%,Hl
X-Bitcoin: 1LcqwCQBQmhcWfWsVEAeyLchkAY8ZfuMnS

by: JJ - Sun, 2 Apr 2023 23:08 UTC

On Sun, 2 Apr 2023 20:05:31 +0200, Janis Papanagnou wrote:
>
> The suggestion of using await fetch('myfile.txt') sounds like it's
> a raw (byte-oriented) data function (not line/record oriented one),
> but I will be looking into that as well. Thanks again.
>
> Janis

With Fetch/XHR and the `Range` HTTP request header, you'll need to have a
pre-generated index file for the text file lines if you want to get only
specific lines without having to read the whole file. The index file would
contain the byte offsets of each line in the text file, so that you'll know
the byte range at which a specific line is located.
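
To illustrate, here is a small sketch of such an index and of a lookup that uses it. The names buildIndex and fetchLine are hypothetical, and the index is keyed by <key> rather than by line number, which is an assumption made to fit the <key> <TAB> <text-data> layout. It also assumes the file is served over HTTP by a server that honours Range requests, and that lines end in LF.

```javascript
// Build a Map of key -> [firstByte, lastByte] (inclusive, newline excluded)
// from the raw bytes of the file. Intended to run once, ahead of time.
function buildIndex(bytes) {
  const index = new Map();
  const decoder = new TextDecoder('utf-8');
  let start = 0;
  while (start < bytes.length) {
    let end = bytes.indexOf(0x0a, start); // position of the next LF
    if (end === -1) end = bytes.length;
    const tab = bytes.indexOf(0x09, start); // position of the next TAB
    if (tab !== -1 && tab < end) {
      const key = decoder.decode(bytes.subarray(start, tab));
      index.set(key, [start, end - 1]);
    }
    start = end + 1;
  }
  return index;
}

// Fetch only the bytes of one line and return its <text-data> part.
async function fetchLine(url, index, key) {
  const range = index.get(key);
  if (!range) return undefined;
  const [first, last] = range;
  const response = await fetch(url, {
    headers: { Range: `bytes=${first}-${last}` },
  });
  // 206 Partial Content means the range was honoured; a plain 200 would
  // mean the server ignored the header and is sending the whole file.
  if (response.status !== 206) {
    throw new Error(`expected status 206, got ${response.status}`);
  }
  const line = await response.text();
  return line.slice(line.indexOf('\t') + 1);
}
```

The index could be saved as JSON with JSON.stringify([...index]) and loaded back with new Map(JSON.parse(text)). It helps only with lookups by key; matching on the text would still require reading the data.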

Re: How to read an identified part of a huge text file?

<6574c165-12d8-4172-b1ff-7f756b14c12an@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=18097&group=comp.lang.javascript#18097

X-Received: by 2002:a05:622a:18a8:b0:3e3:7c8b:24fa with SMTP id v40-20020a05622a18a800b003e37c8b24famr476654qtc.10.1680656389087;
Tue, 04 Apr 2023 17:59:49 -0700 (PDT)
X-Received: by 2002:a05:6870:460a:b0:17f:1631:4a90 with SMTP id
z10-20020a056870460a00b0017f16314a90mr2132987oao.1.1680656388608; Tue, 04 Apr
2023 17:59:48 -0700 (PDT)
Path: i2pn2.org!i2pn.org!usenet.blueworldhosting.com!diablo1.usenet.blueworldhosting.com!peer01.iad!feed-me.highwinds-media.com!news.highwinds-media.com!news-out.google.com!nntp.google.com!postnews.google.com!google-groups.googlegroups.com!not-for-mail
Newsgroups: comp.lang.javascript
Date: Tue, 4 Apr 2023 17:59:48 -0700 (PDT)
In-Reply-To: <u0c19l$2hc4f$1@dont-email.me>
Injection-Info: google-groups.googlegroups.com; posting-host=2603:6000:8900:6915:c83d:2a54:1b8:80d8;
posting-account=hYRygAoAAABkmvJVmPilz9Q1TOjgPQAq
NNTP-Posting-Host: 2603:6000:8900:6915:c83d:2a54:1b8:80d8
References: <u0c19l$2hc4f$1@dont-email.me>
User-Agent: G2/1.0
MIME-Version: 1.0
Message-ID: <6574c165-12d8-4172-b1ff-7f756b14c12an@googlegroups.com>
Subject: Re: How to read an identified part of a huge text file?
From: tno@thenewobjective.com (Michael Haufe (TNO))
Injection-Date: Wed, 05 Apr 2023 00:59:49 +0000
Content-Type: text/plain; charset="UTF-8"
Content-Transfer-Encoding: quoted-printable
X-Received-Bytes: 3087

by: Michael Haufe (TNO) - Wed, 5 Apr 2023 00:59 UTC

On Sunday, April 2, 2023 at 8:52:00 AM UTC-5, Janis Papanagnou wrote:
> I want to read identified content from a huge text file that resides in
> the file system. (My javascript code is embedded in a HTML page. I am
> running all code client side and have no application servers or data
> base systems running.)
>
> I've found a suggestion using 'require("fs")' but the samples required
> to load the whole file content so doesn't seem to fit for my megabytes
> large data file which I strictly want to avoid loading as a whole.
>
> My data file is actually structured as <key> <TAB> <text-data> lines
> and I just want to extract the <text-data> given the respective <key>.
> Is there some simple standard way to achieve that extraction?
>
> The second question is whether it is possible to find the <key>s given
> a text-match (a string match or ideally a regular expression match) on
> the respective <text-data> on the external file?
>
> For a solution/workaround to both questions it might be also useful to
> call an external extractor (awk, perl, ...) from javascript and read in
> the output of such an external tool invocation. - Is that possible?
>
> Thanks for any hints.

In the latest browsers there is a feature called the Origin private file system (OPFS):

<https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#origin_private_file_system>

This provides a FileSystemSyncAccessHandle:

<https://developer.mozilla.org/en-US/docs/Web/API/FileSystemSyncAccessHandle>

which has a `read()` method:

<https://developer.mozilla.org/en-US/docs/Web/API/FileSystemSyncAccessHandle/read>

That method, with an appropriately sized buffer (size being your record size), will let you read from a specific location in the file.
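
A hedged sketch of such a positioned read follows. readRecord and openHandle are made-up helper names. As far as I know, createSyncAccessHandle() is only available inside a dedicated Web Worker, and the OPFS is private to the page's origin, so the data file would first have to be copied into it. With variable-length lines, the offset and length of each record would have to come from some index.

```javascript
// Read `length` bytes starting at byte `offset` and decode them as UTF-8.
// `handle` is expected to behave like a FileSystemSyncAccessHandle.
function readRecord(handle, offset, length) {
  const buffer = new Uint8Array(length);
  // read() fills the buffer from position `at` and returns the byte count,
  // which can be smaller than the buffer near the end of the file.
  const bytesRead = handle.read(buffer, { at: offset });
  return new TextDecoder('utf-8').decode(buffer.subarray(0, bytesRead));
}

// Obtain a sync access handle for a file already stored in the OPFS.
// Meant to be called from a dedicated worker; close() the handle when done.
async function openHandle(name) {
  const root = await navigator.storage.getDirectory();
  const fileHandle = await root.getFileHandle(name);
  return fileHandle.createSyncAccessHandle();
}
```

In the worker this might be used as: const handle = await openHandle('data.txt'); const line = readRecord(handle, offset, length); handle.close(); where 'data.txt', offset and length are placeholders.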
