Re: Code improvement question

From: python@mrabarnett.plus.com (MRAB)
Newsgroups: comp.lang.python
Subject: Re: Code improvement question
Date: Wed, 15 Nov 2023 04:08:29 +0000
Message-ID: <mailman.250.1700021317.3828.python-list@python.org>

On 2023-11-15 03:41, Mike Dewhirst via Python-list wrote:
> On 15/11/2023 10:25 am, MRAB via Python-list wrote:
>> On 2023-11-14 23:14, Mike Dewhirst via Python-list wrote:
>>> I'd like to improve the code below, which works. It feels clunky to me.
>>>
>>> I need to clean up user-uploaded files the size of which I don't know in
>>> advance.
>>>
>>> After cleaning they might be as big as 1Mb but that would be super rare.
>>> Perhaps only for testing.
>>>
>>> I'm extracting CAS numbers and here is the pattern xx-xx-x up to
>>> xxxxxxx-xx-x eg., 1012300-77-4
>>>
>>> def remove_alpha(txt):
>>>     """  r'[^0-9\- ]':
>>>     [^...]: Match any character that is not in the specified set.
>>>     0-9: Match any digit.
>>>     \: Escape character.
>>>     -: Match a hyphen.
>>>     Space: Match a space.
>>>     """
>>>     cleaned_txt = re.sub(r'[^0-9\- ]', '', txt)
>>>     bits = cleaned_txt.split()
>>>     pieces = []
>>>     for bit in bits:
>>>         # minimum size of a CAS number is 7 so drop smaller clumps of digits
>>>         pieces.append(bit if len(bit) > 6 else "")
>>>     return " ".join(pieces)
>>>
>>>
>>> Many thanks for any hints
>>>
>> Why don't you use re.findall?
>>
>> re.findall(r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b', txt)
>
> I think I can see what you did there but it won't make sense to me - or
> whoever looks at the code - in future.
>
> That answers your specific question. However, I am in awe of people who
> can just "do" regular expressions and I thank you very much for what
> would have been a monumental effort had I tried it.
>
> That little re.sub() came from ChatGPT and I can understand it without
> too much effort because it came documented
>
> I suppose ChatGPT is the answer to this thread. Or everything. Or will be.
>
\b           Word boundary
[0-9]{2,7}   2..7 digits
-            "-"
[0-9]{2}     2 digits
-            "-"
[0-9]{2}     2 digits
\b           Word boundary
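
If the worry is that the pattern won't make sense to whoever reads the code later, one option is to keep that breakdown inside the pattern itself with re.VERBOSE. This is only a sketch: the names CAS_LIKE and find_candidates are made up for illustration, and the pattern is the one from the findall call above, unchanged.

```python
import re

# Same pattern as in the re.findall call, written with re.VERBOSE so that
# whitespace is ignored and each piece carries its own comment.
# CAS_LIKE and find_candidates are hypothetical names.
CAS_LIKE = re.compile(
    r"""
    \b            # word boundary
    [0-9]{2,7}    # 2..7 digits
    -             # "-"
    [0-9]{2}      # 2 digits
    -             # "-"
    [0-9]{2}      # 2 digits
    \b            # word boundary
    """,
    re.VERBOSE,
)


def find_candidates(txt):
    """Return every substring of txt that matches CAS_LIKE."""
    return CAS_LIKE.findall(txt)
```

Calling find_candidates("abc 1234-56-78 def") returns ['1234-56-78'].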

The "word boundary" thing is to stop it matching where there are letters
or digits right next to the digits.
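
A quick way to see what the \b buys you is to run the pattern with and without it on text where a letter sits right next to the digits. The sample text and the variable names here are invented for the demonstration.

```python
import re

PATTERN = r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b'
# Same pattern with both word boundaries removed (for comparison only).
NO_BOUNDARY = r'[0-9]{2,7}-[0-9]{2}-[0-9]{2}'

# Made-up sample: the first number has a letter on each side of it.
text = "ref A1234-56-78x and 4321-65-87."

with_boundary = re.findall(PATTERN, text)
without_boundary = re.findall(NO_BOUNDARY, text)

print(with_boundary)     # ['4321-65-87']
print(without_boundary)  # ['1234-56-78', '4321-65-87']
```

With \b the number glued to "A" and "x" is skipped; without it, the digits are picked out of the middle of that clump.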

For example, if the text contained, say, "123456789-12-1234", you
wouldn't want it to match because there are more than 7 digits at the
start and more than 2 digits at the end.
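
That example can be checked directly. The sketch below also tries the sample from earlier in the thread, "1012300-77-4", which follows the xx-xx-x shape and so ends in a single digit. The pattern as posted asks for two digits at the end and does not match it, so a variant with one final digit is shown alongside. ONE_DIGIT_END is a hypothetical name for that variant, not something from the original post.

```python
import re

PATTERN = r'\b[0-9]{2,7}-[0-9]{2}-[0-9]{2}\b'
NO_BOUNDARY = r'[0-9]{2,7}-[0-9]{2}-[0-9]{2}'
# Assumed variant: a single final digit, following the xx-xx-x shape
# described at the start of the thread.
ONE_DIGIT_END = r'\b[0-9]{2,7}-[0-9]{2}-[0-9]\b'

too_long = "123456789-12-1234"
sample = "1012300-77-4"

print(re.findall(PATTERN, too_long))        # []
print(re.findall(NO_BOUNDARY, too_long))    # ['3456789-12-12']
print(re.findall(PATTERN, sample))          # []
print(re.findall(ONE_DIGIT_END, sample))    # ['1012300-77-4']
print(re.findall(ONE_DIGIT_END, too_long))  # []
```

Without the word boundaries the over-long string yields a bogus match cut out of its middle, which is exactly what \b prevents.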

