
Re: GTM replication monitoring

<a1001d59-6aae-4dcc-98d0-07e1e908d18cn@googlegroups.com>

https://www.rocksolidbbs.com/devel/article-flat.php?id=264&group=comp.lang.mumps#264


by: Jan Barinka - Mon, 29 Nov 2021 08:53 UTC

Hello all,

We've run into the same problem: a broken connection between master and slave is not detected by the slave. We are using an older version of GTM, so my question is whether newer versions offer anything for this issue, for example optional keep-alive support or some other solution?

Jan B.
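
For readers unfamiliar with the term, "keep-alive" here refers to the TCP-level option that lets the operating system probe an idle connection and report it as dead when the peer stops answering. The sketch below shows how a program enables it on its own socket. It is generic socket code, not a GT.M setting: the thread does not say whether any GT.M version sets this option on replication sockets, and the function name and the timing values are illustrative assumptions.

```python
import socket


def enable_keepalive(sock, idle=60, interval=10, count=5):
    """Turn on TCP keep-alive probes for an already created socket.

    idle, interval and count are illustrative values (seconds, seconds,
    number of probes), not recommendations taken from the thread.
    """
    sock.setsockopt(socket.SOL_SOCKET, socket.SO_KEEPALIVE, 1)
    # The fine-grained knobs are platform specific (present on Linux),
    # so each one is applied only if this platform exposes it.
    for name, value in (
        ("TCP_KEEPIDLE", idle),       # idle time before the first probe
        ("TCP_KEEPINTVL", interval),  # time between unanswered probes
        ("TCP_KEEPCNT", count),       # unanswered probes before giving up
    ):
        option = getattr(socket, name, None)
        if option is not None:
            sock.setsockopt(socket.IPPROTO_TCP, option, value)
    return sock
```

With the option set on the receiving end, a connection whose peer has silently vanished is eventually reported as an error by the kernel instead of lingering as established, which is the stale state described in the quoted messages below.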

On Thursday, October 13, 2016 at 1:25:24 AM UTC+2, attila....@gmail.com wrote:
> Bhaskar,
>
>
> I believe I have a fairly good handle on why the replication stalls.
> If you live "in the cloud" then you may have very little (actually close to zero) influence on the network gear and its operation between your sites. So I learned that if we reboot our network equipment, or the service provider decides to rearrange the network, we end up in a situation where the source realizes that the connection is broken and tries to reconnect. Unfortunately the receiver does not, and keeps the connection open - one can see it with netstat. Since the receiver thinks it has a valid connection, it refuses the connection attempt from the source. Nice deadlock. It can stay that way for a considerable time - even days. We are in the process of upgrading to GTM version 6 and I hope this behavior is fixed.
>
> Well, if not, then I can at least monitor it with the command you provided. Thank you for that. I checked the documentation for something similar on the receiver side, but no luck.
>
> I have been having second thoughts about InfoHub, and I wonder where the InfoHub database should reside. The documentation says: " ...one InfoHub can monitor multiple data sources, and a single data source can be monitored by multiple InfoHubs. "
> But it is not clear to me, from either the text or the diagram, whether the data source and the InfoHub DB should reside on the same machine.
>
>
> Attila
> On Monday, October 10, 2016 at 4:14:45 PM UTC+2, K.S. Bhaskar wrote:
> > On Friday, October 7, 2016 at 2:58:19 PM UTC-4, Attila Csikai wrote:
> > > On Friday, October 7, 2016 at 4:04:42 PM UTC+2, K.S. Bhaskar wrote:
> > > > On Thursday, October 6, 2016 at 7:49:47 PM UTC-4, Attila Csikai wrote:
> > > > > Hi,
> > > > >
> > > > >
> > > > >
> > > > > I am working on monitoring GTM replication and I have two questions:
> > > > >
> > > > > 1. I wonder how one can detect a situation when the replication (data flow) stops but both the source and the receiver server processes are out there.
> > > > > mupip replicate -source/receiver -checkhealth shows no problem and also
> > > > > mupip replicate -source/receiver -showbacklog looks OK.
> > > > > (Well, after a -short- while the source side starts to accumulate backlog, but that is all.)
> > > > >
> > > > > The only place to indicate an error is the source server log containing hard and soft connection attempts.
> > > > >
> > > > > Is the log the only place to detect a disconnect?
> > > > >
> > > > > 2. My monitoring agent is a C program and I have identified the following options to get information about replication status (short of mining the log):
> > > > > a, running "mupip replicate -source/receiver ... " by opening a pipe -popen- and grabbing the output of the command,
> > > > > b, %PEEKBYNAME() may provide some information.
> > > > >
> > > > > Unfortunately option "a" is somewhat resource intensive and carries significant overhead. Option "b" is only available in version 6.3, and I would either have to spin up a MUMPS process every time to read the information (significant overhead) or keep one alive to periodically emit the required info - in which case I need IPC and have to manage this process as well.
> > > > >
> > > > > The best would be to invoke mupip functionality as a shared library function but it is not possible as far as I know.
> > > > >
> > > > > Are there other possibilities?
> > > > >
> > > > > Thank you,
> > > > > Attila
> > > >
> > > > Attila --
> > > >
> > > > It is not possible to call into mupip as a shared library. However, you don't have to spin up a mumps process every time - you can call a MUMPS routine from your C program (look at Chapter 11 - Integrating External Routines - of the Programmers Guide; it even has a downloadable working example of calling M code from C code).
> > > >
> > > > Did you consider using InfoHub for your monitoring?
> > > >
> > > > Regards
> > > > -- Bhaskar
> > >
> > >
> > > Bhaskar,
> > >
> > >
> > > Thank you for bringing up the CI interface. Actually I know it and used it (previously) but obviously overlooked this possibility.
> > >
> > > I had superficial knowledge about Infohub but now took a more serious look.
> > > However as I only have to do data gathering for an existing monitoring tool I doubt there is room for Infohub as a whole.
> > > Nevertheless the IHRRPCmdXXXXXXXXX.m routine is very interesting.
> > >
> > > But even in that routine I do not seem to find the answer to my first question about how to detect a broken replication dataflow. I wonder if you can give a hint.
> > >
> > >
> > > Thank you,
> > > Attila
> >
> > Attila --
> >
> > You seem to have a situation where a Source Server and a Receiver Server have a connection (or had a connection), which is not a normal situation. You will need to diagnose what went wrong, and for that you will need to look at the logs.
> >
> > For monitoring, perhaps you can try a command like: mupip replic -source -jnlpool -show |& grep "Processing State"
> >
> > An output like: SRC # 0 : Processing State WAITING_FOR_CONNECTION
> > indicates the source server has not yet connected with a receiver.
> >
> > Once connected, you can expect to see: SRC # 0 : Processing State SENDING_JNLRECS
> >
> > Below is the list of possible values.
> >
> > "DUMMY_STATE",
> > "START",
> > "WAITING_FOR_CONNECTION",
> > "WAITING_FOR_RESTART",
> > "SEARCHING_FOR_RESTART",
> > "SENDING_JNLRECS",
> > "WAITING_FOR_XON",
> > "CHANGING_MODE"
> >
> > Regards
> > -- Bhaskar
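
As a rough illustration of how a monitoring agent might consume the "Processing State" line discussed above, here is a minimal Python sketch. The command and the sample line format are taken from the message above; the function names, the choice to merge stderr into stdout, and the rule that only SENDING_JNLRECS counts as healthy are assumptions made for this example, not documented GT.M behavior.

```python
import re
import subprocess

# States listed in the thread above.
KNOWN_STATES = {
    "DUMMY_STATE", "START", "WAITING_FOR_CONNECTION", "WAITING_FOR_RESTART",
    "SEARCHING_FOR_RESTART", "SENDING_JNLRECS", "WAITING_FOR_XON",
    "CHANGING_MODE",
}

_STATE_RE = re.compile(r"Processing State\s+(\w+)")


def parse_processing_state(output):
    """Return the first 'Processing State' value in the text, or None."""
    match = _STATE_RE.search(output)
    return match.group(1) if match else None


def is_replicating(state):
    """Assumption: only SENDING_JNLRECS means journal records are flowing."""
    return state == "SENDING_JNLRECS"


def read_source_state(cmd=("mupip", "replic", "-source", "-jnlpool", "-show")):
    """Run the command from the thread and extract the state.

    stderr is merged into stdout, mirroring the '|&' pipe in the thread.
    """
    proc = subprocess.run(cmd, stdout=subprocess.PIPE,
                          stderr=subprocess.STDOUT, text=True, check=False)
    return parse_processing_state(proc.stdout)
```

An agent could call read_source_state() periodically and raise an alert when is_replicating() stays false for longer than some threshold. Note that this only reports the first source slot it finds, and it observes the source side only, so it does not by itself reveal a receiver that is holding a stale connection.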
