Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: bencollver@tilde.pink (Ben Collver)
Newsgroups: comp.misc
Subject: Awk: The Power and Promise of a 40-Year-Old Language
Date: Mon, 16 Jan 2023 17:54:25 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 301
Message-ID: <slrntsb3is.8sm.bencollver@svadhyaya.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Mon, 16 Jan 2023 17:54:25 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="6960c5e556d66cbb91d06775d32bf5a4";
logging-data="2971010"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX19bxXi6zg7YaFhmKBcU4WA09K0YTadE3yQ="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:dUSWHOYZDSJB8igKt3WmcgYhYM0=

# Awk: The Power and Promise of a 40-Year-Old Language

By Andy Oram, 19 May, 2021

Languages don't enjoy long lives. Very few people still code with
the legacies of the 1970s: ML, Pascal, Scheme, Smalltalk. (The C
language is still widely used but in significantly updated versions.)
Bucking that trend, the 1977 Unix utility Awk can boast of a loyal
band of users and seems poised to continue far into the future. In
this article, I’ll explain what makes Awk special and keeps it
relevant.

# A Descriptive Language

Awk runs on inputs and a script. The inputs can be files, but the
command is often used as part of a pipeline, taking input from the
previous command's output:

```
ls | awk '/SAMPLES_[1-9][0-9]/ { ++counter }'
```

The long quoted text in the above command is the script, which can be
included on the command line or read from files. Each script
comprises a set of conditions and actions. The condition is often a
regular expression enclosed by slashes. The action appears as one or
more statements between braces. If the condition matches a part of
the input, the action is executed. Here is my trivial, one-line
script:

```
/SAMPLES_[1-9][0-9]/ { ++counter }
```

The script searches for strings like SAMPLES_19 or SAMPLES_20 and
increments a counter each time a string is found. Of course, a real
script would use the counter in further calculations.
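
A minimal sketch of how the counter might be put to use: an END action, which runs after the last line of input, prints the total. The wrapper name `count_samples` and the sample file names are made up for illustration, and `printf` stands in for `ls`.

```shell
# count_samples is a hypothetical wrapper; it reads names on standard input.
count_samples() {
  awk '
    /SAMPLES_[1-9][0-9]/ { ++counter }          # same condition and action as above
    END                  { print counter + 0 }  # "+ 0" prints 0 instead of an empty line when nothing matched
  '
}

# Made-up names standing in for the output of ls.
printf '%s\n' SAMPLES_19 SAMPLES_20 SAMPLES_5 notes.txt | count_samples
```

With these four names the sketch prints 2, because SAMPLES_5 has only one digit and notes.txt does not match at all.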

This is basically how Awk operates: evaluate a condition, then take
action when it matches. The script runs in what David Kerns, in an
email exchange with me, called an implied loop. In his review of
this article, Arnold Robbins, maintainer of the GNU version of Awk
(Gawk), calls the programs data-driven.

I see Awk as more of a declarative language than a procedural one.
You describe what you want to happen and the conditions under which
it happens, instead of specifying a series of sequential statements.
Awk certainly executes statements in sequence and offers control flow
statements (if, while), so it can serve quite well as a procedural
language. Nelson H. F. Beebe, in his review of this article,
mentioned writing a program with 23,981 lines of actions in just 12
patterns.
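
As a rough illustration of the procedural side, the sketch below puts a whole program inside a single BEGIN action, which runs before any input is read. The function name `factorial` and the threshold of 100 are arbitrary choices for this example, not anything taken from the article.

```shell
# factorial is a hypothetical wrapper; -v passes the shell argument into Awk as n.
factorial() {
  awk -v n="$1" '
    BEGIN {
      result = 1
      while (n > 1) {          # ordinary while loop
        result *= n
        n--
      }
      if (result > 100) {      # ordinary if/else
        print result, "is large"
      } else {
        print result
      }
    }
  '
}

factorial 5
```

No condition appears here except BEGIN, so the program behaves like a short script in any procedural language.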

But overall, sequences of statements execute within a framework of
declaring the conditions under which these things should happen. The
concept of a declarative language has been around almost since the
beginning of high-level programming languages and can be found in the
popular notion of a promise, invented by Mark Burgess.

http://markburgess.org/promises.html

Awk documentation usually calls the condition a "pattern" because
regular expressions are so often used as conditions. Janis
Papanagnou, in his review of this article, explained that he has
recommended the word "condition" instead. I realized that this word
choice matches my own view of Awk at a high level as a descriptive
language. Aleksey Cheusov, in email, said that Awk programs can be
viewed as finite state machines, which declare how to move from one
state to another.
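
The state-machine view can be sketched with a two-state program that prints only the lines between a pair of markers. The markers START and STOP, the variable `inside`, and the function name `extract_block` are all invented for this example. Note that none of the three conditions is a regular expression.

```shell
# extract_block is a hypothetical name; START and STOP are made-up markers.
extract_block() {
  awk '
    $0 == "START" { inside = 1; next }   # transition into the "inside" state
    $0 == "STOP"  { inside = 0; next }   # transition back out
    inside        { print }              # the condition is just the state variable
  '
}

printf '%s\n' a START b c STOP d | extract_block
```

Each line of the script declares what should happen in a given situation; the implied loop over the input does the rest.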

Neil Ormos, in an email exchange with me, offered an interesting
perspective on when to use Awk:

> I'd put Awk in a special category of general-purpose programming
> languages that are especially well adapted for: (1) personal
> computing; and (2) programmer-time-efficient prototype development,
> where the prototype artifact can evolve advantageously into a
> production-worthy tool with a little incremental effort.

Awk also maintains a delicate balance between being a line-oriented
utility like grep and a full programming language. Normally, Awk
just applies your script to each line of input, like grep, acting on
what matches your condition.
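
A small sketch of the grep-like end of that balance: when a condition has no action, Awk prints each matching line. The function names and sample lines here are hypothetical.

```shell
# like_grep is a hypothetical name; with no action, Awk prints each matching line,
# much as "grep error" would.
like_grep() { awk '/error/'; }

# A condition does not have to be a regular expression: this one
# selects lines longer than 20 characters.
long_lines() { awk 'length($0) > 20'; }

printf '%s\n' 'ok' 'disk error' 'all good' | like_grep
```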

Furthermore, Awk is focused on lines divided into fields that are
separated by white space or by any character or regular expression
you choose. All behavior is subject to customizations—as Ed Morton
suggested in his review, we should speak more generally of "records"
instead of "lines"—but traditionally Awk is used on files where each
line consists of a regular set of fields. It has proven very useful
for parsing log files, for instance.
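
To make the field handling concrete, here is a sketch that totals one column of a log per host. The three-column log format (host, path, bytes) and the function names are invented for this example; only `$1`, `$3`, associative arrays, and the `-F` option are standard Awk.

```shell
# bytes_per_host is a hypothetical name; the log format is made up.
bytes_per_host() {
  awk '
    { bytes[$1] += $3 }                          # $1 and $3 are the first and third fields
    END { for (h in bytes) print h, bytes[h] }   # for-in order is unspecified, hence the sort
  ' | sort
}

# -F sets the field separator; here a colon, as in /etc/passwd-style lines.
first_colon_field() { awk -F: '{ print $1 }'; }

printf '%s\n' 'alpha /a 100' 'beta /b 50' 'alpha /c 25' | bytes_per_host
printf '%s\n' 'root:x:0:0' | first_colon_field
```

The first command prints one total per host (alpha 125, beta 50), and the second prints root.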

In 1988, Brian Kernighan, one of Awk's creators, put a set of bug
fixes and major new features into a version released under the name
Nawk (although he wanted it to replace the original Awk), and the
standard version has not changed much since then.

# It's Not Just About the Language

Languages are part of a larger environment that often plays more of a
role in the choice of language than the language's actual features. For
instance, many people use Python because so many important libraries
have been written for that language. Other people use a language for
legacy reasons: they have an existing application to maintain or work
in an organization that has historically depended on a language.

Many of the people who responded to my outreach for this article
focused their appreciation of Awk on factors other than language
features. Besides being deeply embedded in many Unix scripts, Awk's
presence is guaranteed on every Unix-style system, including
GNU/Linux, BSD, and macOS. The utility's suitability for widespread
use is bolstered by its ability to accomplish complex tasks without
requiring the installation of outside libraries or packages. The
language's behavior is also guaranteed in a POSIX standard, which
turns out to be surprisingly important to a lot of users. However,
many variants have added non-standard features. Gawk and mawk are in
common use.

https://pubs.opengroup.org/onlinepubs/009696799/utilities/awk.html

Among people who use Awk on large projects, it's a critical part of
their toolkit because it's fast. Michael May and Glaudiston Gomes da
Silva told me that they had ported some Java data processing programs
to Awk with more than ten-fold reductions in CPU and RAM consumption.
One researcher clocked Awk on 25 TB of data with impressive results.
Another advised Awk’s use for some tasks, along with other classic
Unix tools, instead of Hadoop. And one of the most active sites in
data science, Analytics Vidhya, published an article praising Awk.

<https://livefreeordichotomize.com/2019/06/04/
using_awk_and_r_to_parse_25tb/>

<https://adamdrake.com/
command-line-tools-can-be-235x-faster-than-your-hadoop-cluster.html>

<https://medium.com/analytics-vidhya/
use-awk-to-save-time-and-money-in-data-science-eb4ea0b7523f>

Cheusov, in correspondence with me, provided more evidence of Awk's
speed:

> When I worked in computational linguistics, we often parsed
> gigabytes of text. Programs written in GNU Awk and mawk were much
> faster than equivalent programs written in Ruby, Python and Perl.
> Because AWK is so simple, its interpreter can be optimized much
> more easily than for much more complex languages.

Awk is fast because it has stayed simple and avoided features that
are considered necessities in other languages. It concentrates on
what it can do well. Several correspondents told me that they
appreciated being able to do what they wanted without downloading
large modules as they would do for other languages.

Computer science professor Tim Menzies, in his article "Why Gawk?",
cited the simplicity and regularity of Awk syntax, which allows it to
be learned quickly and to ward off overly complex code. Other
correspondents also cited the GNU Awk debugger as a boon for Awk
development.

https://web.archive.org/web/20150929033218/http://awk.info/?whygawk

https://www.gnu.org/software/gawk/manual/html_node/Debugger.html

Last but not least, we shouldn't ignore the importance of good
documentation. Awk documentation is easy to find on the web. The
manual for Gawk, written by the software's maintainer, Arnold D.
Robbins, is particularly helpful. For example, the Gawk manual
carefully distinguishes Gawk extensions from standard features, so
that you can avoid the extensions if you want to conform to the
standard. I have noticed that GNU tools in general have good
manuals, perhaps because Richard M. Stallman and his collaborators
have always assigned a high value to documentation.

https://www.gnu.org/software/gawk/

# Expansion Without Bloat

The classic Awk, as created by Alfred Aho, Peter J. Weinberger, and
Brian Kernighan (who drew on their initials to create the name of the
utility), was informal. It didn't make users declare variables but
simply assumed the variables' values to be zero or null the first
time they were used. Data types were implied. This kind of casual
scripting was common in the 1970s, and anything more formal would
have undermined the tool's appeal.
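
The informality described above can be seen in a few lines. This is only a sketch; the function name `implicit_demo` and the variable names are arbitrary.

```shell
# implicit_demo is a hypothetical name; none of the variables is declared anywhere.
implicit_demo() {
  awk 'BEGIN {
    print "[" s "]"            # unset variable used as a string: empty
    print n + 0                # unset variable used as a number: 0
    print "10" + 5             # a numeric-looking string is converted for arithmetic
    total += 3; print total    # no declaration or initialization needed
  }'
}

implicit_demo
```

The four lines printed are [], 0, 15, and 3.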

Every language evolves, usually by incorporating popular features
from other languages. The trick is to avoid throwing in features of
little value that degrade the language by making it hard to use, slow
to compile or run, etc. In this regard, Awk has done well. It has
resisted modernization in the form of data declarations and objects.
Because Awk is very different from general-purpose languages, it
doesn't have space for callbacks, polymorphism, and other fads that
have become central to application design in many languages. But
some variants of Awk added functionality of real value while
maintaining Awk's sleek performance and small footprint. Gawk, like
many GNU utilities, has upgraded aggressively.


Path: i2pn2.org!i2pn.org!news.swapon.de!fu-berlin.de!uni-berlin.de!individual.net!not-for-mail
From: news0009@eager.cx (Bob Eager)
Newsgroups: comp.misc
Subject: Re: Awk: The Power and Promise of a 40-Year-Old Language
Date: 16 Jan 2023 22:12:29 GMT
Lines: 36
Message-ID: <k2m0edF5kq5U5@mid.individual.net>
References: <slrntsb3is.8sm.bencollver@svadhyaya.localdomain>
Mime-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
X-Trace: individual.net Yh4gRf+7GBFuEEzi2jzv6wkVMDedv59A7NETVi5jiBuBeO45wx
Cancel-Lock: sha1:RnGoLdHxtTmt6Evlf3UGyivU6S8=
User-Agent: Pan/0.145 (Duplicitous mercenary valetism; d7e168a
git.gnome.org/pan2)

On Mon, 16 Jan 2023 17:54:25 +0000, Ben Collver wrote:

> # Awk: The Power and Promise of a 40-Year-Old Language

I might have used awk more if I hadn't previously learned a macro
processor (no, m4 hardly counts).

I have done some complicated things that others had attempted and failed
- one was the extraction of names of WWII pilots from several hundred
disparate web pages. I still use the macro processor for writing more
user-friendly firewall rules.

The specific advantages it has had (not necessarily so true now) are
arbitrary format input, variable delimiters (words or symbols), infinite
nesting (given enough memory) and quite a lot of storage and decision
making.

If interested, go here:

https://www.ml1.org.uk

You probably want to look at the short tutorial on this page (PDF and
HTML):

https://www.ml1.org.uk/doc.html

--
Using UNIX since v6 (1975)...

Use the BIG mirror service in the UK:
http://www.mirrorservice.org
