Rocksolid Light

Welcome to RetroBBS

mail  files  register  newsreader  groups  login

Message-ID:  

A list is only as strong as its weakest link. -- Don Knuth


computers / comp.misc / Why Tcl is 700% faster than Python for db benchmarking

SubjectAuthor
o Why Tcl is 700% faster than Python for db benchmarkingBen Collver

1
Why Tcl is 700% faster than Python for db benchmarking

<slrntm7rqn.qe1.bencollver@svadhyaya.localdomain>

  copy mid

https://www.rocksolidbbs.com/computers/article-flat.php?id=2090&group=comp.misc#2090

  copy link   Newsgroups: comp.misc
Path: i2pn2.org!i2pn.org!eternal-september.org!reader01.eternal-september.org!.POSTED!not-for-mail
From: bencollver@tilde.pink (Ben Collver)
Newsgroups: comp.misc
Subject: Why Tcl is 700% faster than Python for db benchmarking
Date: Thu, 3 Nov 2022 16:45:16 -0000 (UTC)
Organization: A noiseless patient Spider
Lines: 825
Message-ID: <slrntm7rqn.qe1.bencollver@svadhyaya.localdomain>
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit
Injection-Date: Thu, 3 Nov 2022 16:45:16 -0000 (UTC)
Injection-Info: reader01.eternal-september.org; posting-host="73c3e3301ca14fc64729c408f5ae9896";
logging-data="1585507"; mail-complaints-to="abuse@eternal-september.org"; posting-account="U2FsdGVkX18BJNBVMSgAjhAE0r3JmZLk77/gmh40JW4="
User-Agent: slrn/1.0.3 (Linux)
Cancel-Lock: sha1:uXwD6sFE578HocTBcCuy64s05tw=
 by: Ben Collver - Thu, 3 Nov 2022 16:45 UTC

Why Tcl is 700% faster than Python for database benchmarking

Posted on July 5, 2022 by HammerDB

Python is a popular programming language, especially for beginners, and
consequently we see it occurring in places where it just shouldn’t be
used, such as database benchmarking. In contrast, a frequent question
when it comes to HammerDB is why is it written in Tcl? Surely any
language will do?

This post addresses both questions to illustrate exactly why HammerDB
is written in Tcl and why using Python to benchmark a database can
result in poor performance and misleading results. To do this, we are
going to rewrite HammerDB in Python and run a series of tests on a 2
socket Intel(R) Xeon(R) Platinum 8280L server to see how and why Tcl is
700% faster than Python for database benchmarking*

Background and Concepts

To begin with, you should be familiar with the concepts of parallelism
vs concurrency and why it is vital that your benchmark should run in
parallel so that you are testing the concurrency of the database. An
introduction to these concepts is given in the following post.

https://www.hammerdb.com/blog/uncategorized/
what-programming-languages-does-hammerdb-use-and-why-does-it-matter/

You should also be familiar with the concepts of ACID compliance and
locking versus multiversioning as well as database transaction
isolation levels, as it is essential to ensure that the database
benchmark itself is not serialized.

Rewriting HammerDB in Python

So we are not actually going to rewrite HammerDB in Python, however it
is trivial to write a command line driver script in Python as all the
hard work in creating the schema and writing the stored procedures has
already been done. (We use stored procedures because, as the
introductory post shows, using single SQL statements turns our database
benchmark into a network test). So instead we are going to take a cut
down version of the HammerDB TPROC-C driver script and do the same in
Python and use the HammerDB infrastructure to measure performance. For
this example, we are going to use PostgreSQL stored procedures in
PostgreSQL 14.1.

So firstly we have the Tcl based driver script called pgtest_thread.tcl

```
#!/usr/local/bin/tclsh8.6
package require Thread
set maxvuser 64
for { set vuser 0 } {$vuser < $maxvuser } {incr vuser} {
set threadID [thread::create {
#EDITABLE OPTIONS##################################################
set library Pgtcl ;# PostgreSQL Library
set total_iterations 10000000
;# Number of transactions before logging off
set host "localhost" ;# Address of the server hosting PostgreSQL
set port "5432" ;# Port of the PostgreSQL Server
set sslmode "disable" ;# SSLMode of the PostgreSQL Server
set user "tpcc" ;# PostgreSQL user
set password "tpcc" ;# Password for the PostgreSQL user
set db "tpcc" ;# Database containing the TPC Schema
#EDITABLE OPTIONS##################################################
#RANDOM NUMBER
proc RandomNumber {m M} {return [expr {int($m+rand()*($M+1-$m))}]}
proc NURand { iConst x y C } {
return [ expr {
((([RandomNumber 0 $iConst] | [RandomNumber $x $y]) + $C) %
($y - $x + 1)) + $x }]}
#RANDOM NAME
proc randname { num } {
array set namearr { 0 BAR 1 OUGHT 2 ABLE 3 PRI 4 PRES 5 ESE 6 ANTI
7 CALLY 8 ATION 9 EING }
set name [ concat $namearr([ expr {( $num / 100 ) % 10 }]) \
$namearr([ expr {( $num / 10 ) % 10 }]) \
$namearr([ expr {( $num / 1 ) % 10 }]) ]
return $name
}

#LOAD LIBRARIES AND MODULES
if [catch {package require $library} message] {
error "Failed to load $library - $message"
}

#TIMESTAMP
proc gettimestamp { } {
set tstamp [ clock format [ clock seconds ] -format %Y%m%d%H%M%S ]
return $tstamp
} #POSTGRES CONNECTION
proc ConnectToPostgres { host port sslmode user password dbname } {
global tcl_platform
if {[catch { set lda [pg_connect -conninfo [list host = $host \
port = $port sslmode = $sslmode user = $user \
password = $password dbname = $dbname ]]} message]
} {
set lda "Failed" ; puts $message
error $message
} else {
if {$tcl_platform(platform) == "windows"} {
#Workaround for Bug #95 where first connection fails on
#Windows
catch {pg_disconnect $lda}
set lda [pg_connect -conninfo [list host = $host \
port = $port sslmode = $sslmode user = $user \
password = $password dbname = $dbname ]]
}
pg_notice_handler $lda puts
set result [ pg_exec $lda "set CLIENT_MIN_MESSAGES TO 'ERROR'" ]
pg_result $result -clear
}
return $lda
} #NEW ORDER
proc neword { lda no_w_id w_id_input } {
#2.4.1.2 select district id randomly from home warehouse
# where d_w_id = d_id
set no_d_id [ RandomNumber 1 10 ]
#2.4.1.2 Customer id randomly selected where c_d_id = d_id and
# c_w_id = w_id
set no_c_id [ RandomNumber 1 3000 ]
#2.4.1.3 Items in the order randomly selected from 5 to 15
set ol_cnt [ RandomNumber 5 15 ]
#2.4.1.6 order entry date O_ENTRY_D generated by SUT
set date [ gettimestamp ]
set result [pg_exec $lda "call neword($no_w_id,$w_id_input,$no_d_id,\
$no_c_id,$ol_cnt,0.0,'','',0.0,0.0,0,\
TO_TIMESTAMP('$date','YYYYMMDDHH24MISS')::timestamp without time zone)" ]
if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
error "[pg_result $result -error]"
pg_result $result -clear
} else {
#puts "New Order: $no_w_id $w_id_input $no_d_id $no_c_id
# $ol_cnt 0 [ pg_ result $result -list ]"
pg_result $result -clear
}
} #PAYMENT
proc payment { lda p_w_id w_id_input } {
#2.5.1.1 The home warehouse id remains the same for each terminal
#2.5.1.1 select district id randomly from home warehouse
# where d_w_id = d_id
set p_d_id [ RandomNumber 1 10 ]
#2.5.1.2 customer selected 60% of time by name and 40% of time
# by number
set x [ RandomNumber 1 100 ]
set y [ RandomNumber 1 100 ]
if { $x <= 85 } {
set p_c_d_id $p_d_id
set p_c_w_id $p_w_id
} else {
#use a remote warehouse
set p_c_d_id [ RandomNumber 1 10 ]
set p_c_w_id [ RandomNumber 1 $w_id_input ]
while { ($p_c_w_id == $p_w_id) && ($w_id_input != 1) } {
set p_c_w_id [ RandomNumber 1 $w_id_input ]
}
}
set nrnd [ NURand 255 0 999 123 ]
set name [ randname $nrnd ]
set p_c_id [ RandomNumber 1 3000 ]
if { $y <= 60 } {
#use customer name
#C_LAST is generated
set byname 1
} else {
#use customer number
set byname 0
set name {}
}
#2.5.1.3 random amount from 1 to 5000
set p_h_amount [ RandomNumber 1 5000 ]
#2.5.1.4 date selected from SUT
set h_date [ gettimestamp ]
#2.5.2.1 Payment Transaction
#change following to correct values
set result [pg_exec $lda "call payment($p_w_id,$p_d_id,$p_c_w_id,\
$p_c_d_id,$byname,$p_h_amount,'0','$name',$p_c_id,'','','','','','',\
'','','','','','','','','','','','',TO_TIMESTAMP('$h_date',\
'YYYYMMDDHH24MISS')::timestamp without time zone,0.0,0.0,0.0,'',\
TO_TIMESTAMP('$h_date',\
'YYYYMMDDHH24MISS')::timestamp without time zone)" ]
if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
error "[pg_result $result -error]"
pg_result $result -clear
} else {
#puts "Payment: $p_w_id $p_d_id $p_c_w_id $p_c_d_id $p_c_id \
# $byname $p_h_amount $name 0 0 [ pg_result $result -list ]"
pg_result $result -clear
}
} #ORDER_STATUS
proc ostat { lda w_id } {
#2.5.1.1 select district id randomly from home warehouse \
# where d_w_id = d_id
set d_id [ RandomNumber 1 10 ]
set nrnd [ NURand 255 0 999 123 ]
set name [ randname $nrnd ]
set c_id [ RandomNumber 1 3000 ]
set y [ RandomNumber 1 100 ]
if { $y <= 60 } {
set byname 1
} else {
set byname 0
set name {}
}
set date [ gettimestamp ]
set result [pg_exec $lda "call ostat($w_id,$d_id,$c_id,\
$byname,'$name','','',0.0,0,TO_TIMESTAMP('$date',\
'YYYYMMDDHH24MISS')::timestamp without time zone,0,'')" ]
if {[pg_result $result -status] != "PGRES_TUPLES_OK"} {
error "[pg_result $result -error]"
pg_result $result -clear
} else {
#puts "Order Status: $w_id $d_id $c_id $byname $name \
# [ pg_result $result -list ]"
pg_result $result -clear
}
}
Click here to read the complete article

1
server_pubkey.txt

rocksolid light 0.9.8
clearnet tor