computers / alt.comp.software.firefox / Re: Lack of web archive format support

Subject                                       Author
* Lack of web archive format support          VanguardLH
+* Re: Lack of web archive format support     Newyana2
|`* Re: Lack of web archive format support    VanguardLH
| `* Re: Lack of web archive format support   Newyana2
|  +- Re: Lack of web archive format support  Adam H. Kerman
|  `- Re: Lack of web archive format support  VanguardLH
`- Re: Lack of web archive format support     Andy Burns

Lack of web archive format support

<kazecz1endc2.dlg@v.nguard.lh>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2423&group=alt.comp.software.firefox#2423

Path: i2pn2.org!rocksolid2!news.neodome.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: V@nguard.LH (VanguardLH)
Newsgroups: alt.comp.software.firefox
Subject: Lack of web archive format support
Date: Wed, 31 Jan 2024 16:57:41 -0600
Organization: Usenet Elder
Lines: 46
Sender: V@nguard.LH
Message-ID: <kazecz1endc2.dlg@v.nguard.lh>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net 7WY9lw47l8XAji/XneICRQ670lTa+r/OygCSJ5U4DyuxSEkJcf
Keywords: VanguardLH,VLH
Cancel-Lock: sha1:UJjU14fiizsnNGFtMMrqoZ4D4oU= sha256:IVH0hdaoOMRBO76o1/IcQm+I4GsNZplKjkMTPiqcNW4=
User-Agent: 40tude_Dialog/2.0.15.41
 by: VanguardLH - Wed, 31 Jan 2024 22:57 UTC

Here comes an old contentious topic: Firefox and Mozilla's continued
lack of web archive format.

Seems odd that, after decades of users wanting an HTML web doc archive
format (MHTML = MIME HTML archive format), Firefox still doesn't support
one. Internet Explorer did, as do Microsoft Edge-C and Google Chrome.
The "Save to MHT" and UnMHT extensions are either dysfunctional or no
longer supported: UnMHT is no longer available at addons.mozilla.org,
and "Save to MHT" usually results in "data on this page cannot be
saved." Plus, they were extensions to make up for Mozilla's lack of
support for a web archive format. Not even MAF (Mozilla Archive Format)
survived, and it looks suspiciously like nothing more than HTMLZ;
https://en.wikipedia.org/wiki/Mozilla_Archive_Format. I looked at:

https://bugzilla.mozilla.org/show_bug.cgi?id=40873
(Opened 24 YEARS AGO. Still status = New.)

but that fizzled out. Microsoft patented an HTML archive document
format back in 2005; see:

http://web.archive.org/web/20100414035108/http://www.patentstorm.us/patents/6886132.html

However, RFC 2557 (c. 1999) seems to precede Microsoft's claim on a patent.
Microsoft was involved, as can be seen in the list of authors on the
RFC, but I thought RFCs were open for use by anyone to provide guidance
on compatibility and implementation. Can RFCs be patented to bar their
use by others than the authors of the RFC?

I really don't want to resort to MS Edge-C (or Chrome) to create and
view MHTML web archive files. Giving URLs often fails, because page
content changes, or the pages become unavailable (moved, deleted). A
web archive format permits retaining or giving others something solid
over time instead of something in a state of flux.

I don't know how long IBM has had their HTMLC archive format which looks
to be just a Zip archive file containing the main HTML document (e.g.,
index.html) along with all the resources called by the document (images,
CSS, scripts, etc). I'm not sure IBM's HTMLC format is any different
than the HTMLZ format. Neither uses MIME to encapsulate content and
resources, but instead uses the hierarchy of files and folder objects
within a .zip-style archive file.

https://wiki.mobileread.com/wiki/HTMLZ

Sure seems Mozilla does not want to make web documents portable or
archivable. Are they afraid of facilitating copyright infringement?

Re: Lack of web archive format support

<upf5u2$1ttub$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2440&group=alt.comp.software.firefox#2440

Path: i2pn2.org!rocksolid2!news.neodome.net!weretis.net!feeder8.news.weretis.net!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Newyana2@invalid.nospam (Newyana2)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Wed, 31 Jan 2024 23:15:19 -0500
Organization: A noiseless patient Spider
Lines: 26
Message-ID: <upf5u2$1ttub$1@dont-email.me>
References: <kazecz1endc2.dlg@v.nguard.lh>
Injection-Date: Thu, 1 Feb 2024 04:16:02 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5e48ab35690a3a6326a87ad3c45d1d5a";
logging-data="2029515"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18O45y13Th632xqpw0142oNj49ymPPANZI="
Cancel-Lock: sha1:zDegTn/ViKgx7xidS3uC3dqRCw8=
X-Priority: 3
X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
X-MSMail-Priority: Normal
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512
 by: Newyana2 - Thu, 1 Feb 2024 04:15 UTC

"VanguardLH" <V@nguard.LH> wrote

| Here comes an old contentious topic: Firefox and Mozilla's continued
| lack of web archive format.
|

I actually wrote a program years ago to create such a thing.
It involved putting multiple webpages and images into a ZIP and
then wrapping that in an SFX self-executing package that would
load a file named card.html inside the zip and provided numerous
icon choices for the EXE. The idea was to create restaurant menus,
greeting cards, etc that could be sent in email or offered for download.

Unfortunately, right around that time there began to be security
issues with email. People stopped being willing to open an attached
EXE file or even download EXE files. So the software was a flop.

What do you want this for? There are PDFs for multipage
presentations. There's also the data URI protocol, that allows
images to be stored in an HTML file as base64, similar to the
way that emails can do. So you can have a single HTML text
file that includes any number of images. IE never handled them
very well, but FF does. I actually keep scripts on my desktop
for times when I want to do that.
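
A minimal sketch of the data URI approach described above, in Python. The poster's own scripts are not shown, so the helper names (`to_data_uri`, `inline_images`) and the regex are assumptions made for illustration. It only handles plain `<img src="...">` references to local files.

```python
import base64
import mimetypes
import re
from pathlib import Path

# Hypothetical pattern: matches <img ... src="..."> with either quote style.
IMG_SRC = re.compile(r'(<img\b[^>]*?\bsrc=)(["\'])(.*?)\2', re.IGNORECASE | re.DOTALL)


def to_data_uri(path: Path) -> str:
    """Return a data: URI holding the file's bytes as base64."""
    mime = mimetypes.guess_type(path.name)[0] or "application/octet-stream"
    payload = base64.b64encode(path.read_bytes()).decode("ascii")
    return f"data:{mime};base64,{payload}"


def inline_images(html: str, base_dir: Path) -> str:
    """Replace local <img src="..."> references with data: URIs."""

    def swap(match: re.Match) -> str:
        prefix, quote, src = match.groups()
        target = base_dir / src
        # Leave remote URLs, already-inlined images, and missing files untouched.
        if src.startswith(("http:", "https:", "data:", "//")) or not target.is_file():
            return match.group(0)
        return f"{prefix}{quote}{to_data_uri(target)}{quote}"

    return IMG_SRC.sub(swap, html)
```

Usage would be something like `inline_images(Path("page.htm").read_text(), Path("."))`, writing the result back out as one self-contained HTML file. Base64 makes each image roughly a third larger, and CSS backgrounds or scripts are not touched by this sketch.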

Re: Lack of web archive format support

<1hyb4rojn4ff7.dlg@v.nguard.lh>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2442&group=alt.comp.software.firefox#2442

Path: i2pn2.org!i2pn.org!news.neodome.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: V@nguard.LH (VanguardLH)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Wed, 31 Jan 2024 23:02:35 -0600
Organization: Usenet Elder
Lines: 64
Sender: V@nguard.LH
Message-ID: <1hyb4rojn4ff7.dlg@v.nguard.lh>
References: <kazecz1endc2.dlg@v.nguard.lh> <upf5u2$1ttub$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net mAMsP7gQw/bwCYFknX2AzwN8SIYYdXRL+w/kQBgKX25quVNPf2
Keywords: VanguardLH,VLH
Cancel-Lock: sha1:z5HE3g1vBXzOsdNtAUqoh+C4qmo= sha256:MFTCSQw9T0YRFVQyYXVeUWqXmwDXloHaiggTQ2VD5lc=
User-Agent: 40tude_Dialog/2.0.15.41
 by: VanguardLH - Thu, 1 Feb 2024 05:02 UTC

Newyana2 <Newyana2@invalid.nospam> wrote:

> "VanguardLH" <V@nguard.LH> wrote
>
>| Here comes an old contentious topic: Firefox and Mozilla's continued
>| lack of web archive format.
>|
>
> I actually wrote a program years ago to create such a thing.
> It involved putting multiple webpages and images into a ZIP and
> then wrapping that in an SFX self-executing package that would
> load a file named card.html inside the zip and provided numerous
> icon choices for the EXE. The idea was to create restaurant menus,
> greeting cards, etc that could be sent in email or offered for download.
>
> Unfortunately, right around that time there began to be security
> issues with email. People stopped being willing to open an attached
> EXE file or even download EXE files. So the software was a flop.
>
> What do you want this for? There are PDFs for multipage
> presentations. There's also the data URI protocol, that allows
> images to be stored in an HTML file as base64, similar to the
> way that emails can do. So you can have a single HTML text
> file that includes any number of images. IE never handled them
> very well, but FF does. I actually keep scripts on my desktop
> for times when I want to do that.

Came to mind, and made me remember past discussions, when Bradley
mentioned Firefox not supporting HTMLZ. Well, Firefox does not support
ANY web doc archive format which sucks.

Say you visit a page and want to send a copy to someone to show them
what you saw then, not what shows up on a later visit. "Printing" to
PDF sucks, as PDF was not designed to handle HTML docs, like scrollable
elements for text. Saving a web doc to a .pdf file will NOT give you
the same document.

You can elect to Save As using "Web page, complete" to ensure all
resources got saved, but that creates multiple files. You get the main
HTML document (<pagetitle>.htm), and a <pagetitle>-named subfolder with
the resources (images, scripts, css, etc). You don't get one file to
archive or to send to someone showing just what you saw just when you
visited the site. You have to Zip it up into a .zip archive file, and
then you can save or send just the one file. So, it is doable, but
requires the additional steps of you having to create the archive file,
and someone else knowing to unzip the file to save somewhere to hold the
main .html file and the subfolder with resources.
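
The extra zip step can be scripted. A rough sketch in Python, assuming the save produced a main file such as `page.htm` next to a resource folder named `page_files`. The `_files` suffix and the function name `pack_saved_page` are assumptions here, so the suffix is left as a parameter.

```python
import zipfile
from pathlib import Path


def pack_saved_page(html_file: Path, archive: Path, suffix: str = "_files") -> Path:
    """Zip a 'Web page, complete' save: the main HTML file plus its resource folder."""
    # Assumed naming: "<pagetitle>.htm" sits beside a "<pagetitle><suffix>" folder.
    resources = html_file.with_name(html_file.stem + suffix)
    with zipfile.ZipFile(archive, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        zf.write(html_file, arcname=html_file.name)
        if resources.is_dir():
            for item in sorted(resources.rglob("*")):
                if item.is_file():
                    # Keep paths relative to the page so its links still resolve after unzipping.
                    zf.write(item, arcname=item.relative_to(html_file.parent).as_posix())
    return archive
```

Something like `pack_saved_page(Path("page.htm"), Path("page.zip"))` yields the one file to send. The recipient still has to unzip it before opening the page, which is the inconvenience being described.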

Like I said: doable. But not convenient. The whole point of having web
doc archive formats was to make convenient the distribution of the web
docs with all their resources, and do so with one file. As shown above,
there is a workaround, but only because Firefox does not support ANY web
archive format. It won't create web doc archives. It won't read them.
Yet old Internet Explorer, MS Edge-C, and Chrome do. After 24 years of
users asking for web archive support, don't you think it is about time
Mozilla caught up with other web browsers?

I see no Alt/File -> Save Page As option that lets me save a web doc with
images base64 encoded (the data URI method you mention). My choices are:

Web page complete (the main.html and subfolder mentioned above)
Web page HTML only (just the main.html doc, mostly unusable)
Text files (worthless when trying to save an *HTML* doc)
All files (no idea what this means - maybe some raw format)

Re: Lack of web archive format support

<l210mtF9nqtU1@mid.individual.net>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2445&group=alt.comp.software.firefox#2445

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: usenet@andyburns.uk (Andy Burns)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Thu, 1 Feb 2024 07:48:44 +0000
Lines: 9
Message-ID: <l210mtF9nqtU1@mid.individual.net>
References: <kazecz1endc2.dlg@v.nguard.lh>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8; format=flowed
Content-Transfer-Encoding: 7bit
X-Trace: individual.net sHKpwqfR1HOaTyvniXhEcgztxZ73TLx84ZDp9SYT9LyQl1c3oR
Cancel-Lock: sha1:h01VM3FCsmCUhbgfIRhD1NQvp1s= sha256:63e+L792NN26DXyqa74u5W2EP4s/g0KcPB7ffea2Am4=
User-Agent: Mozilla Thunderbird
Content-Language: en-GB
In-Reply-To: <kazecz1endc2.dlg@v.nguard.lh>
 by: Andy Burns - Thu, 1 Feb 2024 07:48 UTC

VanguardLH wrote:

> Here comes an old contentious topic: Firefox and Mozilla's continued
> lack of web archive format.

I used to use UnMHT. I suppose 24 years ago lots of the web was static
pages, and pages were less bloated and didn't rely on large JS libraries
... maybe if people saw how big an archived page was, they'd choose
saving as PDF anyway?

Re: Lack of web archive format support

<upg7ai$23beh$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2446&group=alt.comp.software.firefox#2446

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: Newyana2@invalid.nospam (Newyana2)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Thu, 1 Feb 2024 08:45:12 -0500
Organization: A noiseless patient Spider
Lines: 97
Message-ID: <upg7ai$23beh$1@dont-email.me>
References: <kazecz1endc2.dlg@v.nguard.lh> <upf5u2$1ttub$1@dont-email.me> <1hyb4rojn4ff7.dlg@v.nguard.lh>
Injection-Date: Thu, 1 Feb 2024 13:45:55 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="5e48ab35690a3a6326a87ad3c45d1d5a";
logging-data="2207185"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1893Cu+YbH64NB/z+MJEkZN9ptjct5Kji8="
Cancel-Lock: sha1:e8DvNNSuJW5vkAkXH6ls1B07UX4=
X-MSMail-Priority: Normal
X-Priority: 3
X-Newsreader: Microsoft Outlook Express 6.00.2900.5512
X-MimeOLE: Produced By Microsoft MimeOLE V6.00.2900.5512
 by: Newyana2 - Thu, 1 Feb 2024 13:45 UTC

"VanguardLH" <V@nguard.LH> wrote

| Came to mind, and made me remember past discussions, when Bradley
| mentioned Firefox not supporting HTMLZ. Well, Firefox does not support
| ANY web doc archive format which sucks.
|
| Say you visit a page and want to send a copy to someone to show them
| what you saw then, not what shows up on a later visit. "Printing" to
| PDF sucks, as PDF was not designed to handle HTML docs, like scrollable
| elements for text. Saving a web doc to a .pdf file will NOT give you
| the same document.
|

Everything you wrote is true, but it's actually never occurred to me.
If something's online that I want to show to a friend, I send a link.
If I want a copy of the text I save it to Notepad and add the URL
at the top. I have a large number of articles stored that way.
It's compact, simple, easy to read.

If I want to save the HTML for layout clarity
then I download that and edit out the CSS and script, adding CSS to
just make the page 800px wide and make all fonts 13px verdana.
Of course, most people can't do that. But most don't care to save
whole webpages.

HTMLZ is an ebook format. I don't deal with ebooks. I think of that
as a medium of its own. Awkward, hard to read, requires a special
device and costs a ridiculous amount of money. If I actually want the
book I'll buy it. Why pay 50% cost to rent it for an unknown duration
in a format that's hard to read and is dependent on a special device?

The data URI method is something I find very useful. For example,
I found a webpage showing images of plant leaves with various
mineral deficiencies. I wanted to save that. The easiest way was to
encode all of the images into the HTML. Yet that method is rare online.
I occasionally see tiny icons encoded. I see bloated SVG image code.
But that's it. The other day I came across a
large block of script/JSON that was base64 encoded, in a page that
was mostly script! I don't know why. The code didn't seem to be
anything secret or malicious. So why the bloated encoding?

I didn't realize that other browsers can save or open HTMLZ. I'm not
sure that I've even ever come across HTMLZ where there were not
other options. Archive.org? I'll download the PDF option or the TXT
option. I don't install Chrome. Spyware that won't even give me a
menu bar? It reminds me of Apple: "You get what we give you.
Shut up and be glad it's pretty."

I think, also, that today's typical webpage is poorly suited to packaging.
Not long ago, a website was HTML files, images and a CSS file. Today if
you download a single page it's not unusual to get 20 js files, 15 css
files, for a total of 20-odd MB, and the HTML file is mostly hyper-bloated
CSS combined with a big pile of JSON. But the text of the page is
probably less than 20 KB. So why save all that slop and be uncertain
what it's going to do if you let all that script run offline?

I'm actually very curious about this new method of webpage design.
It seems to be all auto-generated. The CSS in a typical page is vast.
There's very little HTML. Yesterday I was looking at a Bloomberg article
with a paywall. I could only see a couple of paragraphs. So I looked at
the source, to see if they were using the common trick of burying the
content in script. No. The text I could see was repeated over and over
again in script blocks, but there was no other text. It was a real paywall.
But why is so little text requiring such a big pile of script and JSON?
These are not webpages. They're essentially large software programs.

I see big piles of text like this:

"biz_boeing","biz_att6","biz_facebook1","biz_facebook2","biz_mulberry","biz_Fidelity_investopedia","biz_hsbcpb","biz_jpmorgan","biz_morg","biz_morgan1","biz_facebookmobkoi","biz_amex2","biz_amex6","biz_mobkoivca","biz_ubsfrance","biz_vanguard19","biz_fb","biz_generalmonique","biz_kpmg","biz_pimco2019","biz_mobkoidyson","biz_mobkoiaudemars","biz_scb_apac","biz_socgenoctnov19","biz_saudiaramco","biz_porsche","biz_mobkoipublisher","biz_mobkoifortnite"

or this:

"isMetered":true,"cobrand":null,"magazine":false,"newsletterSlug":null,"suppressComments":false,"excludeFromPaywall":false,"theme":null,"background":null,"newsletterToutLabel":null,"terminalBlogId":null,"__typename":"Meta"},"mostRelevantTags":["Bonds","Federal
Reserve","Capital Markets","Central Bankers","Kellie J Wood","Interest
Rates","Bill Gross","PACIFIC INVESTMENT MANAGEMEN","International Monetary
Fund","Radio"],"moved":false,"pillar":"markets","premium":false,"publishedAt":"2024-01-18T23:25:00.000Z","readingUrl":"https://assets.bwbx.io/s3/readings/S7IDF1T0AFB41705686749030.mp3","readingDuration":409560,"record":null,"resourceType":"Story"

The latter seems to be extensive JSON instruction for assembling
the webpage. But why so bloated? Why not use HTML, whether it's
static or loaded from a database? I certainly wouldn't want to
package all that crap, along with 1MB javascript "libraries", into
a compressed storage file. The only value at Bloomberg -- if there
is any value -- is the text.

But I'm curious. I don't have any idea of even what kind of software
generates these files. Some kind of corporate WYSIWYG program? I
know the drag-drop designer for wix.com produces something similar.
But it's hard to understand how it's all working. And why so wildly
bloated? And why are they embedding CSS in JSON to be parsed by
script... with every kind of text encoding? I'm amazed that browsers can
parse all that so fast.

That whole Bloomberg page, with a menu and about 1 KB of text, weighs
in at 3 MB+ in 66 files!

Re: Lack of web archive format support

<upghl1$2501o$1@dont-email.me>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2449&group=alt.comp.software.firefox#2449

Path: i2pn2.org!i2pn.org!eternal-september.org!feeder3.eternal-september.org!news.eternal-september.org!.POSTED!not-for-mail
From: ahk@chinet.com (Adam H. Kerman)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Thu, 1 Feb 2024 16:42:10 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 12
Message-ID: <upghl1$2501o$1@dont-email.me>
References: <kazecz1endc2.dlg@v.nguard.lh> <upf5u2$1ttub$1@dont-email.me> <1hyb4rojn4ff7.dlg@v.nguard.lh> <upg7ai$23beh$1@dont-email.me>
Injection-Date: Thu, 1 Feb 2024 16:42:10 -0000 (UTC)
Injection-Info: dont-email.me; posting-host="4ab24d4970fdff243612344918616980";
logging-data="2261048"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX1/oFicy/BtDMQ1P/QCS5G/ablQFTeP6W2I="
Cancel-Lock: sha1:JMjpWKTQqqCTtqPta9V9bneYXBM=
X-Newsreader: trn 4.0-test77 (Sep 1, 2010)
 by: Adam H. Kerman - Thu, 1 Feb 2024 16:42 UTC

Newyana2 <Newyana2@invalid.nospam> wrote:

>. . .

> HTMLZ is an ebook format.

It's an archive. There is a Web page in there. As long as the Web page
is associated with its support files after the minor conversion, there's no
reason why it cannot be read with a browser.

I have no further comment but I agree with the rest of what you wrote,
and yeah, it's horrifying.

Re: Lack of web archive format support

<k4jeajpcnt73$.dlg@v.nguard.lh>

https://www.rocksolidbbs.com/computers/article-flat.php?id=2451&group=alt.comp.software.firefox#2451

Path: i2pn2.org!i2pn.org!usenet.goja.nl.eu.org!3.eu.feeder.erje.net!feeder.erje.net!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: V@nguard.LH (VanguardLH)
Newsgroups: alt.comp.software.firefox
Subject: Re: Lack of web archive format support
Date: Thu, 1 Feb 2024 13:19:50 -0600
Organization: Usenet Elder
Lines: 144
Sender: V@nguard.LH
Message-ID: <k4jeajpcnt73$.dlg@v.nguard.lh>
References: <kazecz1endc2.dlg@v.nguard.lh> <upf5u2$1ttub$1@dont-email.me> <1hyb4rojn4ff7.dlg@v.nguard.lh> <upg7ai$23beh$1@dont-email.me>
Mime-Version: 1.0
Content-Type: text/plain; charset="us-ascii"
Content-Transfer-Encoding: 7bit
X-Trace: individual.net OllqKPKabEBXn82xbtigqQBJPr26dHn5Br8pbauIgyF0CtAn3n
Keywords: VanguardLH,VLH
Cancel-Lock: sha1:oHMxjVkBSzGOtPY1giZCfTtM1wc= sha256:EXl3+q6sP2Uzp9jY3KMw8XgNdAV+/K9hhwRdyu9+QtY=
User-Agent: 40tude_Dialog/2.0.15.41
 by: VanguardLH - Thu, 1 Feb 2024 19:19 UTC

Newyana2 <Newyana2@invalid.nospam> wrote:

> "VanguardLH" <V@nguard.LH> wrote
>
>| Came to mind, and made me remember past discussions, when Bradley
>| mentioned Firefox not supporting HTMLZ. Well, Firefox does not support
>| ANY web doc archive format which sucks.
>|
>| Say you visit a page and want to send a copy to someone to show them
>| what you saw then, not what shows up on a later visit. "Printing" to
>| PDF sucks, as PDF was not designed to handle HTML docs, like scrollable
>| elements for text. Saving a web doc to a .pdf file will NOT give you
>| the same document.
>|
>
> Everything you wrote is true, but it's actually never occurred to me.
> If something's online that I want to show to a friend, I send a link.

But content, like news, changes so the page changes. The article may
not be there tomorrow, and definitely not in a month, or later. Either
the URL will point to a page with new and different content, or the link
doesn't work anymore (page not found). I don't want to rely on
web.archive.org to find old stuff I saw.

> If I want a copy of the text I save it to Notepad and add the URL
> at the top. I have a large number of articles stored that way.
> It's compact, simple, easy to read.

That won't work when the content is more than text, like graphics,
images, or other HTML elements in the web doc. You can try to copy and
paste into LibreOffice or MS Word trying to keep some of the content,
but those really aren't HTML designers.

Another way to do that is the "Print Edit WE" add-on to Firefox.
Sometimes you just cannot select what you want without getting extra
crap since it is HTML (you cannot see all of it although it may affect
formatting or layout). But you're still stuck printing out the
selection(s) to a file, like a PDF "printer", which fucks up the content
or layout. If there is content in scrollable windows (sometimes the
whole doc window, sometimes an element within the doc), the info is
shown truncated.

> HTMLZ is an ebook format.

Only because Calibre called it that. It's still HTML, but in a Zip
archive file. Hardly equivalent to the other e-book formats. Plus the
base issue is Firefox does not support ANY web doc archive format.
MHTML, MHT, HTMLC, none.

> I think of that
> as a medium of its own. Awkward, hard to read, requires a special
> device and costs a ridiculous amount of money. If I actually want the
> book I'll buy it.

I still buy books, too. My library has books on many topics of
interest, and some are available in e-book format (EPUB). Content looks
the same in the hardcopy as in the epub format since it was designed
that way. It wasn't like some joker simply scanned the book to put into
a file. Epub lets me read the book without having to waste time and gas
going to the library to pick up a physical book.

> Why pay 50% cost to rent it for an unknown duration
> in a format that's hard to read and is dependent on a special device?

I don't pay for any of the e-books I get from the library. However, the
topic was on Firefox, and it not supporting any web doc archive format.

> The data URI method is something I find very useful. For example,
> I found a webpage showing images of plant leaves with various
> mineral deficiencies. I wanted to save that. The easiest way was to
> encode all of the images into the HTML. Yet that method is rare online.
> I occasionally see tiny icons encoded. I see bloated SVG image code.

The "Print Edit WE" add-on sounds like a better solution. However, that
is adding something that Firefox lacks. I do install add-ons to enhance
Firefox's feature set, but Mozilla has been absent in supporting a web
doc archive format that other web browsers have supported for decades.
Imagine how Firefox would suck if it did not handle any image format, so
all images in web pages had to be rendered in an external handler.
Firefox is good at getting web docs rendered (into Firefox), but it
sucks for saving them (getting out of Firefox).

> I didn't realize that other browsers can save or open HTMLZ.

Don't know about HTMLZ (don't think so), but other web browsers support
MHTML (uses MIME to encode images and other content as resources not
directly coded in the main page).
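
For reference, a hedged sketch of what "uses MIME" means here, built with Python's standard email package: one multipart/related message, with the page and each resource as separate parts labelled by Content-Location. The function name `build_mhtml` and the example URLs are made up for illustration. Browsers add headers of their own when they save MHTML, so whether a given browser will open this output is not something the sketch verifies.

```python
from email import encoders
from email.mime.base import MIMEBase
from email.mime.multipart import MIMEMultipart
from email.mime.text import MIMEText


def build_mhtml(page_url: str, html: str, resources: dict[str, tuple[str, bytes]]) -> bytes:
    """Pack a page and its resources into one multipart/related MIME message.

    `resources` maps each resource URL to a (mime_type, raw_bytes) pair.
    """
    root = MIMEMultipart("related", type="text/html")
    root["Subject"] = "Saved page"

    # The main document goes first, labelled with the URL it was fetched from.
    page = MIMEText(html, "html", "utf-8")
    page["Content-Location"] = page_url
    root.attach(page)

    for url, (mime_type, data) in resources.items():
        maintype, subtype = mime_type.split("/", 1)
        part = MIMEBase(maintype, subtype)
        part.set_payload(data)
        encoders.encode_base64(part)  # binary resources travel as base64 text
        # Content-Location is how references in the HTML find this part.
        part["Content-Location"] = url
        root.attach(part)

    return root.as_bytes()
```

The contrast with the Zip-based formats is that nothing here is a folder hierarchy: every resource is a MIME part in one text file, matched to the page's references by URL.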

> I'm not
> sure that I've even ever come across HTMLZ where there were not
> other options. Archive.org?

Nope. A Calibre-ism. Their invention.

> I think, also, that today's typical webpage is poorly suited to packaging.

If it can be rendered in a web client, it can be captured into files.
That's what webcrawlers do.

> Not long ago, a website was HTML files, images and a CSS file. Today if
> you download a single page it's not unusual to get 20 js files, 15 css
> files, for a total of 20-odd MB, and the HTML file is mostly hyper-bloated
> CSS combined with a big pile of JSON.

Perhaps why Calibre came up with HTMLZ to save the web doc, but compress
it to a much smaller size. Javascript and JSON are just text, so highly
compressible. Web docs are still mostly text, so also highly
compressible. Images are probably compressed so much already,
especially for web, that they won't compress much, if at all, and may
even take a bit more space after compression of already compressed
content (for archive overhead).
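
A quick way to see the difference, sketched in Python with zlib. The sample data is made up: a repeated JSON-like line stands in for script/JSON bloat, and random bytes stand in for an already-compressed image, whose payload looks close to random to a compressor.

```python
import os
import zlib


def ratio(data: bytes) -> float:
    """Compressed size divided by original size (lower means better compression)."""
    return len(zlib.compress(data, 9)) / len(data)


# Repetitive JSON-like text, standing in for the script/JSON in a modern page.
text = b'{"isMetered":true,"cobrand":null,"magazine":false},' * 2000
# Random bytes, standing in for JPEG/PNG data that is already compressed.
noise = os.urandom(len(text))

print(f"text:  {ratio(text):.3f}")
print(f"noise: {ratio(noise):.3f}")
```

The text ratio should come out tiny, while the noise ratio should land at roughly 1.0 or a hair above it because of the container overhead. Real page text is less repetitive than this sample, so it will not shrink as dramatically.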

> I'm actually very curious about this new method of webpage design.
> It seems to be all auto-generated.

Alas, many web designers don't know HTML code. They use WordPress, or an
HTML editor, to design layout and never look at the underlying code.
Instead of using the old deprecated formatting tags, like <b>, <u>, and
so on, web docs moved to using CSS, so the formatting is elsewhere.
Yes, it can help in consistency in formatting, but the CSS directives
are bloated compared to the old tags. But that's getting off-topic.

Firefox does not support any web doc archive format, not even its old
MHT format. UnMHT probably died when Firefox moved to WebExtensions in
v57 and dropped XUL/XPCOM. "Save to MHT" doesn't work. Almost every
page I've visited to try it results in a "data cannot be saved" error.
Other web browsers have supported and still support MHTML, but Firefox
never did. I didn't find an add-on at addons.mozilla.org that can create
a web doc archive in a standard archive format. Well, other than
Calibre's attempt to do the equivalent of "Save Page As" to "Web page,
Complete" that creates the main .html doc file with subfolder with
resources, and then packs it into a Zip archive to keep main doc file
and subfolder with its files in one file for both easy transfer and a
compressed archive.

Mozilla is and has been focused on stuff going into Firefox, but little
concern about getting it out. I suppose I could use AutoIt to create a
macro that does the File menu -> "Save Page As", choose "Web page,
Complete", ask for a destination folder and filename, and then run a
compression archive (e.g., Peazip, 7-zip) to produce 1 file (.htmlz) as
a web doc archive, but then I have to install a macro tool to compensate
for Mozilla's deliberate omission of supporting a web doc archive
format. Rather than AutoIt, seems an add-on could automate the
procedure, or is controlling Firefox's chrome a no-no for add-ons?

