devel / comp.lang.python / Obtain the query interface url of BCS server.

Subject (Author)
* Obtain the query interface url of BCS server. (hongy...@gmail.com)
`* Re: Obtain the query interface url of BCS server. (DFS)
 `* Re: Obtain the query interface url of BCS server. (hongy...@gmail.com)
  `* Re: Obtain the query interface url of BCS server. (DFS)
   `* Re: Obtain the query interface url of BCS server. (hongy...@gmail.com)
    `* Re: Obtain the query interface url of BCS server. (DFS)
     `- Re: Obtain the query interface url of BCS server. (hongy...@gmail.com)

Obtain the query interface url of BCS server.

Newsgroups: comp.lang.python
Date: Mon, 12 Sep 2022 02:00:50 -0700 (PDT)
Message-ID: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
Subject: Obtain the query interface url of BCS server.
From: hongyi.zhao@gmail.com (hongy...@gmail.com)
 by: hongy...@gmail.com - Mon, 12 Sep 2022 09:00 UTC

I want to run queries from within a script, based on the interface here [1]. To do this, I need the underlying URL that each button submits to, e.g. the URL behind the "ITA Settings" button, so that I can build the corresponding query URL and issue the query from the script.

However, I could not find the rules that map these buttons to their corresponding URLs. Any hints on how to achieve this?

[1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10

Regards,
Zhao

Re: Obtain the query interface url of BCS server.

Subject: Re: Obtain the query interface url of BCS server.
Newsgroups: comp.lang.python
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
From: nospam@dfs.com (DFS)
In-Reply-To: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
Message-ID: <GvMTK.25662$1Ly7.17695@fx34.iad>
Date: Mon, 12 Sep 2022 16:19:51 -0400
 by: DFS - Mon, 12 Sep 2022 20:19 UTC

On 9/12/2022 5:00 AM, hongy...@gmail.com wrote:
> I want to run queries from within a script, based on the interface here [1]. To do this, I need the underlying URL that each button submits to, e.g. the URL behind the "ITA Settings" button, so that I can build the corresponding query URL and issue the query from the script.
>
> However, I could not find the rules that map these buttons to their corresponding URLs. Any hints on how to achieve this?
>
> [1] https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen?list=new&what=gen&gnum=10
>
> Regards,
> Zhao

You didn't say what you want to query. Are you trying to download
entire sections of the Bilbao Crystallographic Server? Maybe the admins
will give you access to the data.

* this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
brings up the table of space group symbols.

* choose say #7: Pc

* now click ITA Settings, then choose the last entry "P c 1 1" and it
loads:

https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita

You might be able to fool around with that URL and substitute values and
get back the data you want (in HTML) via Python. Do you really want
HTML results?
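A minimal sketch of the "substitute values" idea, using only Python's standard library: urllib.parse splits the sample link above into its query fields. It only decodes this one URL and makes no request to the server.

```python
from urllib.parse import parse_qs, urlsplit

# The ITA Settings link for space group 7, setting "P c 1 1" (copied from above)
SAMPLE_URL = ("https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"
              "?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita")


def query_fields(url):
    """Return the query-string fields of url as a dict of single strings."""
    query = urlsplit(url).query
    # parse_qs maps each name to a list of values and decodes %20 to a space
    return {name: values[0] for name, values in parse_qs(query).items()}


print(query_fields(SAMPLE_URL))
# {'gnum': '007', 'what': 'gp', 'trmat': 'b,-a-c,c', 'unconv': 'P c 1 1', 'from': 'ita'}
```

Changing one of these fields and re-encoding the URL is the "fool around with that URL" step; whether the server accepts arbitrary combinations is not something this thread establishes.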

Hit Ctrl+U to see the source HTML of a webpage

Right-click or hit Ctrl + Shift + C to inspect the individual elements
of the page

Re: Obtain the query interface url of BCS server.

Newsgroups: comp.lang.python
Date: Tue, 13 Sep 2022 00:46:43 -0700 (PDT)
In-Reply-To: <GvMTK.25662$1Ly7.17695@fx34.iad>
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com> <GvMTK.25662$1Ly7.17695@fx34.iad>
Message-ID: <ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
Subject: Re: Obtain the query interface url of BCS server.
From: hongyi.zhao@gmail.com (hongy...@gmail.com)
 by: hongy...@gmail.com - Tue, 13 Sep 2022 07:46 UTC

On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote:
> You didn't say what you want to query. Are you trying to download
> entire sections of the Bilbao Crystallographic Server?

I am engaged in related research and need specific data used by the BCS server.

> Maybe the admins will give you access to the data.

I don't think they would offer that kind of help to researchers who have no working relationship with them.

> * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
> brings up the table of space group symbols.
>
> * choose say #7: Pc
>
> * now click ITA Settings, then choose the last entry "P c 1 1" and it
> loads:
>
> https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita

Not only that, but I want to obtain all such URLs programmatically!
> You might be able to fool around with that URL and substitute values and
> get back the data you want (in HTML) via Python. Do you really want
> HTML results?
>
> Hit Ctrl+U to see the source HTML of a webpage
>
> Right-click or hit Ctrl + Shift + C to inspect the individual elements
> of the page

For batch operations, all these manual methods are inefficient.

Best Regards,
Zhao

Re: Obtain the query interface url of BCS server.

From: nospam@dfs.com (DFS)
Subject: Re: Obtain the query interface url of BCS server.
Newsgroups: comp.lang.python
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
<GvMTK.25662$1Ly7.17695@fx34.iad>
<ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
In-Reply-To: <ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
Message-ID: <eE%TK.125259$IRd5.122009@fx10.iad>
Date: Tue, 13 Sep 2022 09:32:58 -0400
 by: DFS - Tue, 13 Sep 2022 13:32 UTC

On 9/13/2022 3:46 AM, hongy...@gmail.com wrote:
> On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote:
>> You didn't say what you want to query. Are you trying to download
>> entire sections of the Bilbao Crystallographic Server?
>
> I am engaged in related research and need specific data used by the BCS server.

What specific data?

Is it available elsewhere?

>> Maybe the admins will give you access to the data.
>
> I don't think they would offer that kind of help to researchers who have no working relationship with them.

You can try. Tell the admins what data you want, and ask them for the
easiest way to get it.

>> * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
>> brings up the table of space group symbols.
>>
>> * choose say #7: Pc
>>
>> * now click ITA Settings, then choose the last entry "P c 1 1" and it
>> loads:
>>
>> https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita
>
> Not only that, but I want to obtain all such URLs programmatically!
>
>> You might be able to fool around with that URL and substitute values and
>> get back the data you want (in HTML) via Python. Do you really want
>> HTML results?
>>
>> Hit Ctrl+U to see the source HTML of a webpage
>>
>> Right-click or hit Ctrl + Shift + C to inspect the individual elements
>> of the page
>
> For batch operations, all these manual methods are inefficient.

Yes, but I don't think you'll be able to retrieve the URLs
programmatically. The JavaScript code doesn't put them in the HTML
result, except for that one I showed you, which seems like a mistake on
their part.

So you'll have to figure out the search fields, and your python program
will have to cycle through the search values:

Sample from above
gnum = 007
what = gp
trmat = b,-a-c,c
unconv = P c 1 1
from = ita

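from urllib.parse import quote
findGnum, findWhat, findTrmat = "007", "gp", "b,-a-c,c"  # sample values from above
findUnconv, findFrom = "P c 1 1", "ita"
wBase = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"
wGnum = "?gnum=" + findGnum
wWhat = "&what=" + findWhat
wTrmat = "&trmat=" + findTrmat
wUnconv = "&unconv=" + quote(findUnconv)  # spaces must be sent as %20
wFrom = "&from=" + findFrom
webpage = wBase + wGnum + wWhat + wTrmat + wUnconv + wFrom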

Then if that returns a hit, you'll have to parse the resulting HTML and
extract the exact data you want.
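A sketch of the same URL building with the standard library's urlencode, wrapped in a function so a script can cycle through combinations. The combinations list is a hypothetical placeholder holding only the one sample from this post; the real list would have to come from the settings pages. The safe=",/" choice (commas and slashes left unescaped, spaces sent as %20) is an assumption modelled on the site's own links, not a documented requirement of the server.

```python
from urllib.parse import quote, urlencode

BASE = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen"


def build_trgen_url(gnum, trmat, unconv, what="gp", frm="ita"):
    """Build an nph-trgen query URL from one combination of search values."""
    params = {
        "gnum": f"{int(gnum):03d}",  # zero-padded, as in gnum=007
        "what": what,
        "trmat": trmat,
        "unconv": unconv,
        "from": frm,  # 'from' is a Python keyword, so the argument is 'frm'
    }
    # quote (rather than the default quote_plus) encodes spaces as %20;
    # safe=",/" leaves commas and slashes as they appear in the site's links
    return BASE + "?" + urlencode(params, safe=",/", quote_via=quote)


# Hypothetical placeholder: (gnum, trmat, unconv) combinations to cycle through
combinations = [(7, "b,-a-c,c", "P c 1 1")]

for gnum, trmat, unconv in combinations:
    print(build_trgen_url(gnum, trmat, unconv))
```

For the single sample this reproduces the ITA Settings link for "P c 1 1" exactly, which is a quick check that the encoding matches what the site itself generates.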

I did something similar a while back using the requests and lxml libraries
----------------------------------------------------------------
import sys

import requests
from lxml import html

# keyw, cityzip, state, miles and passNbr are set earlier in the full script

#build url
wBase = "http://www.usdirectory.com"
wForm = "/ypr.aspx?fromform=qsearch"
wKeyw = "&qhqn=" + keyw
wCityZip = "&qc=" + cityzip
wState = "&qs=" + state
wDist = "&rg=" + str(miles)
wSort = "&sb=a2z" #sort alpha (not used in this excerpt)
wPage = "&ap=" #used with the results page number (not used in this excerpt)
webpage = wBase + wForm + wKeyw + wCityZip + wState + wDist

#open URL
page = requests.get(webpage)
tree = html.fromstring(page.content)

#no matches
matches = tree.xpath('//strong/text()')
if passNbr == 1 and ("No results were found" in str(matches)):
    print("No results found for that search")
    sys.exit(0)
----------------------------------------------------------------

2.x code file: https://file.io/VdptORSKh5CN

> Best Regards,
> Zhao

Re: Obtain the query interface url of BCS server.

Newsgroups: comp.lang.python
Date: Tue, 13 Sep 2022 16:29:16 -0700 (PDT)
In-Reply-To: <eE%TK.125259$IRd5.122009@fx10.iad>
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
<GvMTK.25662$1Ly7.17695@fx34.iad> <ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
<eE%TK.125259$IRd5.122009@fx10.iad>
Message-ID: <4ea83cbd-b549-485b-b8b8-58b50133ab4cn@googlegroups.com>
Subject: Re: Obtain the query interface url of BCS server.
From: hongyi.zhao@gmail.com (hongy...@gmail.com)
 by: hongy...@gmail.com - Tue, 13 Sep 2022 23:29 UTC

On Tuesday, September 13, 2022 at 9:33:20 PM UTC+8, DFS wrote:
> On 9/13/2022 3:46 AM, hongy...@gmail.com wrote:
> > On Tuesday, September 13, 2022 at 4:20:12 AM UTC+8, DFS wrote:
> >> You didn't say what you want to query. Are you trying to download
> >> entire sections of the Bilbao Crystallographic Server?
> >
> > I am engaged in related research and need specific data used by the BCS server.
> What specific data?

All the data in the complete catalog here:
https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
> Is it available elsewhere?

This is the internationally recognized authoritative data source in this field. Data available elsewhere, even where ready-made electronic versions exist, are mostly taken from here and are incomplete.

> >> Maybe the admins will give you access to the data.
> >
> > I don't think they would offer that kind of help to researchers who have no working relationship with them.
> You can try. Tell the admins what data you want, and ask them for the
> easiest way to get it.
> >> * this link: https://www.cryst.ehu.es/cgi-bin/cryst/programs/nph-getgen
> >> brings up the table of space group symbols.
> >>
> >> * choose say #7: Pc
> >>
> >> * now click ITA Settings, then choose the last entry "P c 1 1" and it
> >> loads:
> >>
> >> https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-trgen?gnum=007&what=gp&trmat=b,-a-c,c&unconv=P%20c%201%201&from=ita
> >
> > Not only that, but I want to obtain all such URLs programmatically!
> >
> >> You might be able to fool around with that URL and substitute values and
> >> get back the data you want (in HTML) via Python. Do you really want
> >> HTML results?
> >>
> >> Hit Ctrl+U to see the source HTML of a webpage
> >>
> >> Right-click or hit Ctrl + Shift + C to inspect the individual elements
> >> of the page
> >
> > For batch operations, all these manual methods are inefficient.
> Yes, but I don't think you'll be able to retrieve the URLs
> programmatically. The JavaScript code doesn't put them in the HTML
> result, except for that one I showed you, which seems like a mistake on
> their part.
>
> So you'll have to figure out the search fields, and your python program
> will have to cycle through the search values:
>
> Sample from above
> gnum = 007
> what = gp
> trmat = b,-a-c,c
> unconv = P c 1 1
> from = ita

The problem is that I must first get all possible combinations of these variables.

Re: Obtain the query interface url of BCS server.

Subject: Re: Obtain the query interface url of BCS server.
Newsgroups: comp.lang.python
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
<GvMTK.25662$1Ly7.17695@fx34.iad>
<ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
<eE%TK.125259$IRd5.122009@fx10.iad>
<4ea83cbd-b549-485b-b8b8-58b50133ab4cn@googlegroups.com>
From: nospam@dfs.com (DFS)
In-Reply-To: <4ea83cbd-b549-485b-b8b8-58b50133ab4cn@googlegroups.com>
Message-ID: <8bbUK.173095$3AK7.82100@fx35.iad>
Date: Tue, 13 Sep 2022 22:41:09 -0400
 by: DFS - Wed, 14 Sep 2022 02:41 UTC

On 9/13/2022 7:29 PM, hongy...@gmail.com wrote:
> On Tuesday, September 13, 2022 at 9:33:20 PM UTC+8, DFS wrote:
>> So you'll have to figure out the search fields, and your python program
>> will have to cycle through the search values:
>>
>> Sample from above
>> gnum = 007
>> what = gp
>> trmat = b,-a-c,c
>> unconv = P c 1 1
>> from = ita
>
> The problem is that I must first get all possible combinations of these variables.

Shouldn't be too hard, but I've never done some of these things and have
no code for you:

space group number = gnum = 1 to 230

* use python to put each of those values, one at a time, into the group
number field on the webpage

* use python to simulate a button click of the ITA Settings button

* it should load the HTML of the list of ITA settings for that space group

* use python to parse the HTML and extract each of the ITA settings.
The line of HTML has 'ITA number' in it. Find each of the 'href' values
in the line(s).
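The first step of that loop, walking gnum from 1 to 230, might be sketched as below. It assumes the zero-padded gnum format seen in the links in this thread, and it only builds the standard-setting nph-getgen addresses whose pattern (gnum=010&what=gp) appears in the space group 10 HTML in this post. The address behind the ITA Settings button itself is not known from the thread, so it is not constructed here.

```python
BASE = "https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen"


def standard_setting_urls(first=1, last=230):
    """Yield (gnum, url) pairs for space group numbers first..last inclusive."""
    for n in range(first, last + 1):
        gnum = f"{n:03d}"  # zero-padded, as in gnum=007 and gnum=010
        yield gnum, f"{BASE}?gnum={gnum}&what=gp"


urls = list(standard_setting_urls())
print(len(urls), urls[0][1])
# 230 https://www.cryst.ehu.es/cgi-bin/cryst/programs//nph-getgen?gnum=001&what=gp
```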

Real HTML from ITA Settings for space group 10:

<tr><th bgcolor="#bbbbbb">ITA number</th> <th bgcolor="#bbbbbb">Setting</th></tr>
<tr><td align="center" bgcolor="#f0f0f0">10</td> <td align="center"><a href="/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp"><i>P</i> 1 2/<i>m</i> 1</a></td></tr>
<tr><td align="center" bgcolor="#f0f0f0">10</td> <td align="center"><a href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita"><i>P</i> 1 1 2/<i>m</i></a></td></tr>
<tr><td align="center" bgcolor="#f0f0f0">10</td> <td align="center"><a href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita"><i>P</i> 2/<i>m</i> 1 1</a></td></tr></table>
</center>

If you parse it right you'll have these addresses:

"/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp"

"/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita"

"/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita"
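One way to do that parsing, sketched with the standard library's html.parser rather than lxml so that it needs no extra packages. SAMPLE is an abbreviated copy of the space group 10 table above (cell attributes dropped, links kept). With lxml, `lxml.html.fromstring(text).xpath('//a/@href')` should return the same list.

```python
from html.parser import HTMLParser


class HrefCollector(HTMLParser):
    """Collect the href attribute of every <a> tag fed to the parser."""

    def __init__(self):
        super().__init__()
        self.hrefs = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.hrefs.append(value)


# Abbreviated copy of the ITA Settings table for space group 10 (links kept)
SAMPLE = """
<tr><td><a href="/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp"><i>P</i> 1 2/<i>m</i> 1</a></td></tr>
<tr><td><a href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita"><i>P</i> 1 1 2/<i>m</i></a></td></tr>
<tr><td><a href="/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita"><i>P</i> 2/<i>m</i> 1 1</a></td></tr>
"""

parser = HrefCollector()
parser.feed(SAMPLE)
parser.close()
for href in parser.hrefs:
    print(href)
```

On a live page the same collector would also pick up unrelated links, so filtering on "nph-getgen" or "nph-trgen" in the href is a reasonable next step.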

Then you can parse each of these addresses and build a master list of
the valid combinations of:

gnum, what, trmat, unconv, from
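A sketch of that last step: split each address into its query fields and keep each distinct (gnum, what, trmat, unconv, from) tuple once. Returning None for fields absent from a link, such as trmat on the standard-setting link, is an arbitrary choice made for this sketch, not something the server defines.

```python
from urllib.parse import parse_qs, urlsplit

FIELDS = ("gnum", "what", "trmat", "unconv", "from")


def combination(href):
    """Map one settings link to a (gnum, what, trmat, unconv, from) tuple."""
    query = parse_qs(urlsplit(href).query)
    # Missing fields (e.g. no trmat on the standard setting) become None
    return tuple(query[f][0] if f in query else None for f in FIELDS)


# The three addresses extracted from the space group 10 table above
hrefs = [
    "/cgi-bin/cryst/programs//nph-getgen?gnum=010&what=gp",
    "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=c,a,b&unconv=P 1 1 2/m&from=ita",
    "/cgi-bin/cryst/programs//nph-trgen?gnum=010&what=gp&trmat=b,c,a&unconv=P 2/m 1 1&from=ita",
]

master = []
for h in hrefs:
    combo = combination(h)
    if combo not in master:  # skip duplicates, keep first-seen order
        master.append(combo)

for combo in master:
    print(combo)
```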

Look into the lxml library and its 'etree' module: https://lxml.de

You can also search gen.lib.rus.ec for the crystallography volumes, and
maybe cut and paste data from them.


You're welcome.

Re: Obtain the query interface url of BCS server.

Newsgroups: comp.lang.python
Date: Wed, 14 Sep 2022 05:44:19 -0700 (PDT)
In-Reply-To: <8bbUK.173095$3AK7.82100@fx35.iad>
References: <273b9481-192b-48b1-b057-fcfc22c6cf21n@googlegroups.com>
<GvMTK.25662$1Ly7.17695@fx34.iad> <ebc27697-3574-4186-899f-92cef98b42f8n@googlegroups.com>
<eE%TK.125259$IRd5.122009@fx10.iad> <4ea83cbd-b549-485b-b8b8-58b50133ab4cn@googlegroups.com>
<8bbUK.173095$3AK7.82100@fx35.iad>
Message-ID: <be10579f-e50e-42dd-b68f-ee94c9e2bd5cn@googlegroups.com>
Subject: Re: Obtain the query interface url of BCS server.
From: hongyi.zhao@gmail.com (hongy...@gmail.com)
 by: hongy...@gmail.com - Wed, 14 Sep 2022 12:44 UTC

On Wednesday, September 14, 2022 at 10:41:32 AM UTC+8, DFS wrote:
> On 9/13/2022 7:29 PM, hongy...@gmail.com wrote:
> > The problem is that I must first get all possible combinations of these variables.
> Shouldn't be too hard, but I've never done some of these things and have
> no code for you:
>
> space group number = gnum = 1 to 230
>
> * use python to put each of those values, one at a time, into the group
> number field on the webpage
>
> * use python to simulate a button click of the ITA Settings button
>
> * it should load the HTML of the list of ITA settings for that space group

This is the trickiest part of the problem. For this purpose, Vladimir gave the following suggestion here [1]:

This is trivial with Selenium. Locate the element by XPath and extract its URL with .get_attribute("href").

[1] https://discuss.python.org/t/obtain-the-query-interface-url-of-bcs-server/18996/2?u=hongyi-zhao

> You can also search gen.lib.rus.ec for the crystallography volumes, and
> maybe cut and paste data from them.

The relevant search results are as follows:

http://libgen.rs/search.php?&req=International+Tables+for+Crystallography&phrase=1&view=simple&column=def&sort=year&sortmode=DESC

Best,
Zhao

