serial numbers as RS

https://www.rocksolidbbs.com/devel/article-flat.php?id=1351&group=comp.lang.awk#1351

Newsgroups: comp.lang.awk
Date: Tue, 17 Jan 2023 19:30:39 -0800 (PST)
Message-ID: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com>
Subject: serial numbers as RS
From: visitnag@gmail.com (raj)
 by: raj - Wed, 18 Jan 2023 03:30 UTC

Hi
I have a file with 7 fields.
The first field is a serial number.
In some records the 5th field is missing.
A few records got concatenated with the next record on the same line. In the sample file
I have shown only two such cases, but sometimes three or four records ended up on a single line.
Sample file:

1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 KIYA K 351 ====> 5th field is missing
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324 =====> in both the records 5th field is missing
9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
11 216 308659868 625402 9693 FERNAND 365

The required output:

1 651 643786485 107249 5190 M SMITH 1284
2 963 212018826 103480 M746 R WADHWA 156
3 232 215036022 105012 M743 SAMBA 337
4 232 215036023 105012 M743 SAMBA 443
5 054 215036704 103325 4897 KIYA K 351
6 205 308363068 103402 5537 Mc DON 943
7 231 343328800 105880 MANO M 6403
8 231 343329128 105880 MANO M 8324
9 309 361257222 103595 M564 C R SAM 102
10 309 361297561 103595 M564 C R SAM 332

I tried using the serial number as RS, but did not get the desired result:

awk 'BEGIN{RS="[0-9]+"}{
print $0 RT
}' file
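A regex RS such as [0-9]+ matches every run of digits, so it cannot tell a serial number from any other number in the line. A minimal sketch of an alternative: walk through all fields and start a new record whenever the next expected serial number shows up. It assumes (this is not stated in the post) that serials run 1, 2, 3, ... without gaps and that every genuine record has at least 7 fields (the hypothetical MINF threshold), so a small number inside a record is not mistaken for a serial. The function name split_records is illustrative only.

```shell
#!/bin/sh
# Hypothetical helper: re-split lines that hold several glued records.
# Assumptions: serial numbers start at 1 and increase by exactly 1, and
# every genuine record has at least MINF (assumed 7) fields.
split_records() {
    awk -v MINF=7 '
    # Print the record collected so far, then reset the buffer.
    function flush() {
        if (n > 0) print rec
        rec = ""
        n = 0
    }
    {
        for (i = 1; i <= NF; i++) {
            want = (sn + 1) ""    # next expected serial, as a string
            # A token equal to the next serial starts a new record, but
            # only once the current record already has MINF fields.
            if ($i == want && (n == 0 || n >= MINF)) {
                flush()
                sn++
            }
            rec = (n == 0) ? $i : rec " " $i
            n++
        }
    }
    END { flush() }
    '
}

# Usage: split_records < file
```

Because records are only closed when the next serial appears, a record that was wrapped onto a following line is also rejoined; the price is that any gap in the numbering makes everything after it pile up into one record.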

Actually, I need only the first four fields (including the serial number) and the last field.
If the output could use "," as the delimiter, that would be even more helpful.

Thank you

Re: serial numbers as RS

https://www.rocksolidbbs.com/devel/article-flat.php?id=1353&group=comp.lang.awk#1353

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: serial numbers as RS
Date: Wed, 18 Jan 2023 06:56:33 +0100
Organization: A noiseless patient Spider
Message-ID: <tq81mh$3knhq$1@dont-email.me>
References: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com>
In-Reply-To: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com>
 by: Janis Papanagnou - Wed, 18 Jan 2023 05:56 UTC

The contents of your post are inconsistent...

On 18.01.2023 04:30, raj wrote:
> Hi
> I have file with 7 fields.

No. The number of fields varies; a typical value is 8.

> The first field is serial number

No. There are gaps, or subsequent lines joined together.

> In some records 5th field is missing.

Other fields are affected as well, in the joined lines.
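To locate where the serial numbers break, a quick check along these lines may help. It is a sketch assuming serials start at 1 and increase by 1 per physical line; find_gaps is a hypothetical name. A line whose first field is not the previous one plus 1 marks either a real gap or a record that was glued onto the line before it.

```shell
#!/bin/sh
# Hypothetical helper: report lines whose first field does not continue
# the serial sequence. Assumes serial numbers start at 1 and step by 1.
find_gaps() {
    awk '
    $1 != prev + 1 { print NR ": expected " (prev + 1) ", found " $1 }
    { prev = $1 }
    '
}

# Usage: find_gaps < file
```

On the sample in the original post (without the ====> annotations) this should flag physical lines 8 and 9, because records 8 and 10 sit on the line before them.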

> Few records got truncated with the next record. In the sample file
> I have shown only two records truncation but in some cases even three to four records got truncated.
> sample file:
>
> 1 651 643786485 107249 5190 M SMITH 1284
> 2 963 212018826 103480 M746 R WADHWA 156
> 3 232 215036022 105012 M743 SAMBA 337
> 4 232 215036023 105012 M743 SAMBA 443
> 5 054 215036704 103325 KIYA K 351 ====> 5th field is missing
> 6 205 308363068 103402 5537 Mc DON 943
> 7 231 343328800 105880 MANO M 6403 8 231 343329128 105880 MANO M 8324 =====> in both the records 5th field is missing
> 9 309 361257222 103595 M564 C R SAM 102 10 309 361297561 103595 M564 C R SAM 332
> 11 216 308659868 625402 9693 FERNAND 365
>
> The required output:
>
> 1 651 643786485 107249 5190 M SMITH 1284
> 2 963 212018826 103480 M746 R WADHWA 156
> 3 232 215036022 105012 M743 SAMBA 337
> 4 232 215036023 105012 M743 SAMBA 443
> 5 054 215036704 103325 4897 KIYA K 351

And where should that "4897" come from?

> 6 205 308363068 103402 5537 Mc DON 943
> 7 231 343328800 105880 MANO M 6403
> 8 231 343329128 105880 MANO M 8324

You want records with 7 and 8 fields mixed?

> 9 309 361257222 103595 M564 C R SAM 102
> 10 309 361297561 103595 M564 C R SAM 332
>
> I have tried by considering the serial number as RS but did not get the desired result
>
> awk 'BEGIN{RS="[0-9]+"}{
> print $0 RT
> }' file
>
> Actually I need first four fields(including serial number) and the last field.

This does not match the "required output" above.

> If the "," delimiter is given in the output that would be more helpful.
>
> Thank you
>

...so fix your data sample and requirements first.

And take a closer look at how lines with 14, 15, or 16 fields are
defined, and how to tell the records in that data apart.
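One way to see which lines have such unusual field counts is to tally the number of fields per line. In this sketch (field_histogram is a hypothetical name) each output line gives a field count followed by how many input lines have it; counts well above the usual 7 or 8 point at lines that probably hold more than one record.

```shell
#!/bin/sh
# Hypothetical helper: print "<number of fields> <number of lines>"
# for every field count that occurs, sorted by field count.
field_histogram() {
    awk '{ count[NF]++ } END { for (nf in count) print nf, count[nf] }' | sort -n
}

# Usage: field_histogram < file
```

For the sample in the original post (without the ====> annotations) it should print the pairs 7 4, 8 3, 14 1 and 18 1; the last two are the glued lines.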

And speak with the one who created that data trash to fix his process.

Janis

Re: serial numbers as RS

https://www.rocksolidbbs.com/devel/article-flat.php?id=1354&group=comp.lang.awk#1354

From: k.nuyt@nospam.demon.nl (Kees Nuyt)
Newsgroups: comp.lang.awk
Subject: Re: serial numbers as RS
Date: Wed, 18 Jan 2023 15:45:37 +0100
Reply-To: k.nuyt@nospam.demon.nl
Message-ID: <9e1gshlja8opueo075bs2ej616bk1lc62n@dim53.demon.nl>
References: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com>
Organization: KPN B.V.
 by: Kees Nuyt - Wed, 18 Jan 2023 14:45 UTC

On Tue, 17 Jan 2023 19:30:39 -0800 (PST), raj
<visitnag@gmail.com> wrote:

> Actually I need first four fields(including serial number) and the last field.

The "last field" can always be addressed with $NF.

> If the "," delimiter is given in the output that would be more helpful.

Have a look at OFS or printf. Your choice.
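Both options, sketched under the assumption that each input line already holds exactly one record (the helper names are hypothetical). With OFS, awk inserts the separator between the comma-separated arguments of print; with printf, the format string spells the separators out.

```shell
#!/bin/sh
# Hypothetical helpers: print fields 1-4 and the last field ($NF),
# separated by commas. Assumes one record per input line.

# Option 1: OFS is placed between the comma-separated print arguments.
project_ofs() {
    awk 'BEGIN { OFS = "," } { print $1, $2, $3, $4, $NF }'
}

# Option 2: printf with an explicit format string.
project_printf() {
    awk '{ printf "%s,%s,%s,%s,%s\n", $1, $2, $3, $4, $NF }'
}

# Usage: project_ofs < file   or   project_printf < file
```

For the record "5 054 215036704 103325 KIYA K 351" both print 5,054,215036704,103325,351; the fields are output as strings, so the leading zero in 054 is kept.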
--
Kees Nuyt

Re: serial numbers as RS

https://www.rocksolidbbs.com/devel/article-flat.php?id=1355&group=comp.lang.awk#1355

Newsgroups: comp.lang.awk
Date: Wed, 18 Jan 2023 06:57:33 -0800 (PST)
In-Reply-To: <tq81mh$3knhq$1@dont-email.me>
References: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com> <tq81mh$3knhq$1@dont-email.me>
Message-ID: <0f71e7b2-862d-4df6-87ab-b8e66007c657n@googlegroups.com>
Subject: Re: serial numbers as RS
From: visitnag@gmail.com (raj)
 by: raj - Wed, 18 Jan 2023 14:57 UTC

On Wednesday, 18 January 2023 at 11:26:35 UTC+5:30, Janis Papanagnou wrote:
> [...]
>
> ...so fix your data sample and requirements first.
>
> And have a closer look on the definition of lines that have a number
> of fields that may be 14, 15, 16, and how to distinguish that data.
>
> And speak with the one who created that data trash to fix his process.
>
> Janis

The data was copied and pasted into a text editor from a PDF file.
The user does not have any tool or access to convert the PDF to Word or Excel.

The problem arises when the text is copied directly from the PDF file.
That is the reason for the inconsistency.

awk 'BEGIN{RS="[0-9]+"}{
print $0 RT
}' file
With the above, the input is split at every number, not only at the serial numbers, so nearly every field ends up in a separate record:

1
651
643786485
107249
5190
M SMITH 1284

2
963
212018826
103480
M746
R WADHWA 156

3
232
215036022
105012
M743
SAMBA 337

4
232
215036023
105012
M743
SAMBA 443

5
054
215036704
103325
4897
KIYA K 351

...

Re: serial numbers as RS

https://www.rocksolidbbs.com/devel/article-flat.php?id=1356&group=comp.lang.awk#1356

From: janis_papanagnou+ng@hotmail.com (Janis Papanagnou)
Newsgroups: comp.lang.awk
Subject: Re: serial numbers as RS
Date: Wed, 18 Jan 2023 16:26:54 +0100
Organization: A noiseless patient Spider
Message-ID: <tq933v$upus$1@dont-email.me>
References: <047baf8e-19af-40b6-8b84-33f870d55212n@googlegroups.com>
<tq81mh$3knhq$1@dont-email.me>
<0f71e7b2-862d-4df6-87ab-b8e66007c657n@googlegroups.com>
In-Reply-To: <0f71e7b2-862d-4df6-87ab-b8e66007c657n@googlegroups.com>
 by: Janis Papanagnou - Wed, 18 Jan 2023 15:26 UTC

On 18.01.2023 15:57, raj wrote:
>> [...]
>
> The data was copy and pasted in a text editor from a pdf file.

If all you have is a PDF, I suggest using a more sophisticated
PDF tool to extract the text into a more accurate plain-text form,
or otherwise fixing the worst formatting issues by hand before posting.

> The user is not having any tool/access to convert the pdf to doc or excel.
>
> The problem is arising when it is directly copied from the pdf file.
> That is the reason for inconsistency.

And don't forget to answer/clarify the other issues that were
pointed out to you.

Janis

>
> [snip]
