feature share : a more sensible approach to browsing binary files

Newsgroups: comp.lang.awk
 by: Kpop 2GM - Thu, 9 Jun 2022 12:54 UTC

although existing tools like " od " and " xxd " are very powerful in their own right, the information overload sometimes becomes detrimental when one only wants to browse a file at a high level, to gain *TEXTUAL* context and perspective.

the caret-notation approach such as in " less " creates even more of a headache; disassemblers are definitely too low level for anything meaningful.

furthermore, binary executables frequently have looooong chains of null bytes or chains of 0xFF (\377), making it easy to get lost in the middle when trying to gain textual context, while straight-up URL encoding clumps the encoded hex together with actual alphanumeric characters.

My goal was to find a different, more sensible approach, without re-inventing the wheel or attempting to replace any of the existing tools.

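——————————

... < binary executable file, already URL-encoded (using whichever existing method you prefer) > ... |

(shown for gawk; with mawk, replace "gawk -be" with "mawk")

gawk -be 'BEGIN {
    # slurp the whole input as one record; print with no added separators
    RS = FS = "^$"
    OFS = ORS = ""
}

END {
    # squeeze runs of NUL bytes, and runs of two or more 0xFF bytes
    gsub("([%]00)+", "\n ~~nulls~~ ")
    gsub("[%]FF([%]FF)+", "\n ..FF\047s.. ")
    # "+" encodes a space
    gsub("[+]", " ")
    # park literal percent signs (%25) as \371, all other escape leaders as \367
    gsub("[%]", "|")
    gsub("[%|]25", "\371")
    gsub("[|]", "\367")
    # bracket every remaining escape as \301 <escape> \300
    gsub("[\367][0-9A-F][0-9A-F]", "\301&\300")
    # ___ is a backslash (byte 92)
    ___ = sprintf("%c", _ = 3 ^ 4 + 5 + 6)
    # decode printable ascii, bytes 33 through 126, skipping 37 (percent)
    for (_ = 3 + 3 ^ 3 + 3; (_ - 2) < (2 + 3) ^ 3; _++) {
        if (_ != (5 + 2 ^ 5 )) {
            # byte 38 is "&": prefix it with a backslash so gsub inserts it literally
            gsub(sprintf("\367%.2X", _),
                 sprintf("%.*s%c", _ == (2 + 6 ^ 2), ___, _))
        }
    }
    # escapes that are still encoded get the visible _[_XX_]_ wrapper
    gsub("\301\367", "_[_")
    gsub("\300", "_]_")
    # restore literal percent signs
    gsub("\371", "%")
    # drop leftover control and 8-bit bytes, keeping newline
    gsub("[\000-\t\v-\037\177-\377]", "")
    print
}'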
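
For anyone without a preferred encoder, here is a minimal sketch of one way to produce input for the pipeline above, using only POSIX od and awk. It is not from the original post: the function name urlencode_bytes is made up, and the choice of which bytes stay unencoded (letters, digits, "-", ".", "_", "~") is an assumption. It writes uppercase %XX escapes and "+" for space, which is the form the gsub patterns above look for.

```shell
# urlencode_bytes: hypothetical helper, not part of the original post.
# Reads the named files (or stdin) and writes a URL-encoded byte stream.
urlencode_bytes() {
    od -An -v -tx1 "$@" |
    awk '{
        for (i = 1; i <= NF; i++) {
            h = toupper($i)
            # convert the two hex digits to a byte value
            n = index("0123456789ABCDEF", substr(h, 1, 1)) - 1
            n = n * 16 + index("0123456789ABCDEF", substr(h, 2, 1)) - 1
            if (n == 32)
                printf "+"
            else if ((n >= 48 && n <= 57) || (n >= 65 && n <= 90) ||
                     (n >= 97 && n <= 122) ||
                     n == 45 || n == 46 || n == 95 || n == 126)
                printf "%c", n          # digits, letters, - . _ ~ pass through
            else
                printf "%%%s", h        # everything else becomes %XX
        }
    }'
}
```

Usage would look like " urlencode_bytes some_binary | gawk -be '...' ", with the awk program from above in place of the dots.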

——————————————

the basic idea is to keep ascii control bytes and 8-bit bytes hex-encoded, squeeze long chains of NULLs (\000) or 0xFF bytes (\377) and treat each squeezed run as the output record separator in place of newline, and pad out the remaining hex visually so that it doesn't interfere with the actual ascii alphanumerics and can't be misconstrued as a regex character class.
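
To see the squeezing step in isolation, here is a small sketch run on a made-up input string. The name squeeze_runs is hypothetical, and unlike the full script it works line by line rather than slurping the whole file; only the two run-squeezing substitutions are shown.

```shell
# squeeze_runs: hypothetical helper showing only the run-squeezing step.
squeeze_runs() {
    awk '{
        gsub(/(%00)+/, "\n ~~nulls~~ ")        # one or more encoded NULs
        gsub(/%FF(%FF)+/, "\n ..FF\047s.. ")   # two or more encoded 0xFF bytes
        print
    }'
}

# printf '%s\n' 'Op_times%00%00%00%2A%00Op_plus%FF%FF%FFend' | squeeze_runs
# prints:
# Op_times
#  ~~nulls~~ %2A
#  ~~nulls~~ Op_plus
#  ..FF's.. end
```

Each run, however long, collapses to a single labelled line break, so the text that follows it starts on its own line. A lone %FF is left alone, since the pattern requires at least two in a row.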

using gawk's executable as an example, now i can clearly see how the error messages, both internal, and external, are laid out, without going to the C-file source :

e.g. Op codes with their related ascii-punctuation character next to them :

~~nulls~~ Op_times
~~nulls~~ *_]_
~~nulls~~ Op_times_i

~~nulls~~ Op_quotient
~~nulls~~ /_]_
~~nulls~~ Op_quotient_i
~~nulls~~ Op_mod

~~nulls~~ %
~~nulls~~ Op_mod_i
~~nulls~~ Op_plus
~~nulls~~ +_]_

~~nulls~~ Op_plus_i
~~nulls~~ Op_minus
~~nulls~~ -

——————

or see a whole slew of error messages, both internal and external, grouped together, which helps with identifying discrepancies of approach :

~~nulls~~ for loop:_]_ array `_]_%s'_]_ changed size from %ld to %ld during loop execution
~~nulls~~ %s:_]_ called with %lu arguments,_]_ expecting at least %lu
~~nulls~~ %s:_]_ called with %lu arguments,_]_ expecting no more than %lu
~~nulls~~ indirect function call requires a simple scalar value

~~nulls~~ `_]_%s'_]_ is not a function,_]_ so it cannot be called indirectly
~~nulls~~ function called indirectly through `_]_%s'_]_ does not exist
~~nulls~~ function `_]_%s'_]_ not defined
~~nulls~~ error reading input file `_]_%s'_]_:_]_ %s

~~nulls~~ `_]_nextfile'_]_ cannot be called from a `_]_%s'_]_ rule
~~nulls~~ `_]_exit'_]_ cannot be called in the current context
~~nulls~~ `_]_next'_]_ cannot be called from a `_]_%s'_]_ rule
~~nulls~~ Sorry,_]_ don'_]_t know how to interpret `_]_%s'_]_

~~nulls~~ GAWK_STACKSIZE
~~nulls~~ Node_illegal
~~nulls~~ Node_val

or sections that appear well-pre-sorted (perhaps related to collation ordering ?), then one can quickly glance and see if anything glaring got accidentally left out :

~~nulls~~ _[_E3_]__[_05_]_
~~nulls~~ _[_E4_]__[_05_]_

~~nulls~~ _[_E5_]__[_05_]_
~~nulls~~ _[_E6_]__[_05_]_
~~nulls~~ _[_E7_]__[_05_]_
~~nulls~~ _[_E8_]__[_05_]_

~~nulls~~ _[_E9_]__[_05_]_
~~nulls~~ _[_EA_]__[_05_]_
~~nulls~~ _[_EB_]__[_05_]_
~~nulls~~ _[_EC_]__[_05_]_

~~nulls~~ _[_ED_]__[_05_]_
~~nulls~~ _[_EE_]__[_05_]_
~~nulls~~ _[_EF_]__[_05_]_

——————————

after some trial-and-error, i find " _[_7F_]_ " to be the least intrusive visually: it provides a clear gap between itself and the surrounding bytes/text, without overlapping with the various equivalence/collation/character-class syntaxes.

while it's still valid regex in its own right, i can't imagine many writing underscore twice in the same class for no reason.
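
A quick way to check the "valid regex" point, as a sketch with a made-up helper name (marker_matches): read as a regex, _[_7F_]_ is an underscore, then a bracket expression containing "_", "7" and "F", then another underscore. So it matches three-character strings such as "_7_", and not its own literal text.

```shell
# marker_matches: hypothetical helper; succeeds when the whole argument
# matches the marker text interpreted as a regex.
marker_matches() {
    awk -v s="$1" 'BEGIN { exit !(s ~ /^_[_7F_]_$/) }'
}

marker_matches '_7_'      && echo 'matches _7_'
marker_matches '_[_7F_]_' || echo 'does not match its own literal text'
```

Searching the output for the markers themselves therefore needs the brackets escaped, e.g. a pattern like _\[_[0-9A-F][0-9A-F]_\]_ .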

All that said, it's still a work in progress, as some of the rougher edges still don't look as ideal as i had hoped.

— The 4Chan Teller
