RetroBBS - comp.lang.mumps - How to start replication process without needing to backup database first

How to start replication process without needing to backup database first

<3669f2bc-39d1-4b11-9497-ea7da9e47587n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=460&group=comp.lang.mumps#460

by: Hieu Nguyen - Tue, 25 Oct 2022 11:19 UTC

Hello all,

A little context for my use case:

I need to stream YottaDB/GT.M's journal entries to Apache Kafka. I'm currently using Go to implement an external replication filter. However, replication runs much slower with the filter than without it.

Even if the filter does nothing but copy STDIN to STDOUT, throughput doesn't reach 10% of what it is without a filter.

As this would massively impact the business, I would like to know whether it's possible to start a replicating instance without any data, so that it acts only as a way to stream journal entries from the source instance in real time, rather than as another standby server.

Or, more ideally, can replication with an external filter be tuned so that it reaches at least half the speed of replication without a filter?
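
For reference, a do-nothing filter of the kind described above could look roughly like the following Go sketch. It is not the code used in this thread. The names `passThrough` and `passThroughString` are made up for illustration, and the sketch assumes the filter receives newline-terminated text records on STDIN and must echo them to STDOUT. Each line is written as soon as it is read, on the assumption that the replication server may wait for the filter's output before sending more.

```go
package main

import (
	"bufio"
	"io"
	"os"
	"strings"
)

// passThrough copies every line from in to out unchanged. Each line is
// written as soon as it has been read, so nothing sits in an output buffer
// while the process at the other end of the pipe is waiting for it.
func passThrough(in io.Reader, out io.Writer) error {
	r := bufio.NewReaderSize(in, 1<<20) // large read buffer: fewer read calls
	for {
		line, err := r.ReadString('\n')
		if len(line) > 0 {
			if _, werr := io.WriteString(out, line); werr != nil {
				return werr
			}
		}
		if err == io.EOF {
			return nil // input closed: normal shutdown
		}
		if err != nil {
			return err
		}
	}
}

// passThroughString runs passThrough on an in-memory string, which is handy
// for trying the function without a replication stream.
func passThroughString(s string) (string, error) {
	var sb strings.Builder
	err := passThrough(strings.NewReader(s), &sb)
	return sb.String(), err
}

func main() {
	if err := passThrough(os.Stdin, os.Stdout); err != nil {
		os.Exit(1)
	}
}
```

Comparing a filter like this against `cat` is one way to separate the cost of the filter mechanism itself from the cost of the language runtime.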

Thanks and Best Regards,
Hieu Nguyen

Re: How to start replication process without needing to backup database first

<11b56000-3905-441f-8463-3d83b3c8b3a9n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=461&group=comp.lang.mumps#461

by: K.S. Bhaskar - Wed, 26 Oct 2022 16:07 UTC

Hieu –

There is some overhead to using a filter, since the binary data of the replication stream is converted to text, and the text is then converted back. Also, Go may not be the fastest language in which to write a filter.

1. Is the filter running on the (presumably fast) source machine, or the receiving machine, and if the latter, is it as fast as the source machine?

2. If you use cat as a filter, how fast is it?

Regards
– Bhaskar

Re: How to start replication process without needing to backup database first

<4f1805e3-2208-4c23-85b8-668c6d407a56n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=462&group=comp.lang.mumps#462

by: Hieu Nguyen - Wed, 26 Oct 2022 18:51 UTC

Hello Bhaskar -

I run the filter on the receiving end; both servers have the same specs.

If I use cat as the filter ( `-filter=/usr/bin/cat` ), replication runs about 20% slower than without a filter.

I have tried increasing the receiver’s buffer pool, and the replication speed increased significantly; it is now within the acceptable range for our requirements. I will try to improve this further by using replication helper processes.

Can you recommend a number of helper processes based on the server’s specs (e.g., 4 CPU cores = 8 writers …)?

Also, if Go is not the optimal language, what would you recommend for implementing a filter?

As the source server in production runs on AIX, I cannot use C/C++ without writing a custom Kafka library. That is part of the reason I wanted to create a replication process without needing to back up and restore the source database in the first place: so that I could run the receiver on another platform.

Thanks and Best Regards,
- Hieu Nguyen

Re: How to start replication process without needing to backup database first

<3595c94d-22ee-458e-80fc-57ec2bdb30b0n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=463&group=comp.lang.mumps#463

by: K.S. Bhaskar - Wed, 26 Oct 2022 20:24 UTC

Hieu –

There is no algorithm for tuning either the number of helper processes or the balance between read helpers and write helpers. It has to be determined empirically. In general, read helpers are more important than write helpers.

I would guess that the languages for writing the fastest filters are probably C/C++, Rust, Lua and M.

As long as you have the required access & permission, you can certainly run the filter on the Source Server on AIX. The receiving side can be Linux, and running the filter on the source side will reduce the network traffic.

Regards
– Bhaskar

Re: How to start replication process without needing to backup database first

<4db86018-546c-4598-a65e-145359516d4dn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=464&group=comp.lang.mumps#464

by: K.S. Bhaskar - Thu, 27 Oct 2022 15:19 UTC

Hieu –

Please do close the loop when you are done, and tell us what your solution is. Thank you.

Regards
– Bhaskar

Re: How to start replication process without needing to backup database first

<15b58978-9d26-43b0-ab2c-f0cf2620819cn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=465&group=comp.lang.mumps#465

by: Hieu Nguyen - Thu, 27 Oct 2022 15:29 UTC

Hello Bhaskar -

I'm planning to implement the filter in C++ to see whether performance improves.

Instead of publishing directly to Apache Kafka topics, I will use Kafka Proxy to publish through its REST API.
There will be delays in publishing messages, but our main goal is to keep the impact on replication speed to a minimum.

Please close the loop for me. As this is my first time using Google Groups, I'm not too familiar with its functionality. My apologies.

Thanks and Best Regards,
Hieu Nguyen

Re: How to start replication process without needing to backup database first

<bb0c602e-ab1f-4823-9418-e8651bd79599n@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=466&group=comp.lang.mumps#466

by: K.S. Bhaskar - Thu, 27 Oct 2022 20:15 UTC

Thanks for the update, Hieu. There is no formal closing of the loop on a discussion thread. My apologies for using an American colloquialism. I was just asking you to post an update when you complete the project so that people can learn from your experience.

Regards
– Bhaskar
