Master End
Download Mk-heartbeat
wget http://www.maatkit.org/get/mk-heartbeat
Create a database and grant for maatkit
create database maatkit;
grant all on maatkit.* to maatkit@localhost identified by 'Whatever';
Start the heartbeat and let it auto-create the heartbeat table
perl mk-heartbeat -u maatkit -pWhatever --database maatkit --table heartbeat --update --create-table
Slave End
Verify that the DDL replicated
mysql> select * from maatkit.heartbeat;
+----+---------------------+
| id | ts |
+----+---------------------+
| 1 | 2010-03-02 16:30:32 |
+----+---------------------+
Let mk-heartbeat run
mk-heartbeat -u maatkit -pWhatever --database maatkit --table heartbeat --monitor --interval 15 --file /tmp/heartbeat --daemonize
Fix file selinux context (I don't think it's the right one but it works )
chcon system_u:object_r:snmpd_var_run_t:s0 /tmp/heartbeat
Why not look at how far we are lagging by querying (Wow! We are lagging)
mysql> select ts,now() from maatkit.heartbeat;
+---------------------+---------------------+
| ts | now() |
+---------------------+---------------------+
| 2010-03-16 12:52:45 | 2010-03-17 17:59:23 |
+---------------------+---------------------+
1 row in set (0.00 sec)
mysql> select ts,now() from maatkit.heartbeat;
+---------------------+---------------------+
| ts | now() |
+---------------------+---------------------+
| 2010-03-16 12:53:15 | 2010-03-17 17:59:29 |
+---------------------+---------------------+
1 row in set (0.00 sec)
mysql> select ts,now() from maatkit.heartbeat;
+---------------------+---------------------+
| ts | now() |
+---------------------+---------------------+
| 2010-03-16 12:53:30 | 2010-03-17 17:59:32 |
+---------------------+---------------------+
Integrate with SNMP using dedicated to this purpose
Extend snmpd.conf
extend .1.3.6.1.4.1.30630.2.2.1 mk-heartbeat-lag /usr/local/bin/repl_lag 1
extend .1.3.6.1.4.1.30630.2.2.2 mk-heartbeat-lag /usr/local/bin/repl_lag 2
extend .1.3.6.1.4.1.30630.2.2.3 mk-heartbeat-lag /usr/local/bin/repl_lag 3
extend .1.3.6.1.4.1.30630.2.2.4 mk-heartbeat-lag /usr/local/bin/repl_lag 4
Create repl_lag script
#!/bin/sh
#Monitor file
LAGFILE=/tmp/heartbeat
case "$1" in
'1')
#Current LAG
cat $LAGFILE|awk '{print $1}'|sed -e "s/s//"
;;
'2')
#1 minute average
cat $LAGFILE|awk -F[ '{print $2}'|awk '{print $1}'|sed -e "s/s,//"
;;
'3')
#5 Minute average
cat $LAGFILE|awk -F[ '{print $2}'|awk '{print $2}'|sed -e "s/s,//"
;;
'4')
cat $LAGFILE|awk -F[ '{print $2}'|awk '{print $3}'|sed -e "s/s//"
;;
esac
Restart snmpd and test
snmpwalk -c Community -v2c my.mu.company.net .1.3.6.1.4.1.30630.2.2.4
SNMPv2-SMI::enterprises.30630.2.2.4.1.0 = INTEGER: 1
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.2.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = STRING: "/usr/local/bin/repl_lag"
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.3.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = STRING: "4"
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.4.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = ""
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.5.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 5
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.6.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 1
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.7.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 1
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.20.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 4
SNMPv2-SMI::enterprises.30630.2.2.4.2.1.21.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 1
SNMPv2-SMI::enterprises.30630.2.2.4.3.1.1.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = STRING: "0.41"
SNMPv2-SMI::enterprises.30630.2.2.4.3.1.2.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = STRING: "0.41"
SNMPv2-SMI::enterprises.30630.2.2.4.3.1.3.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 1
SNMPv2-SMI::enterprises.30630.2.2.4.3.1.4.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103 = INTEGER: 0
SNMPv2-SMI::enterprises.30630.2.2.4.4.1.2.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103.1 = STRING: "0.41"
Monitor End
Log into zenoss and navigate to Devices->Server->[Linux o r whatever you use]
Click on the Templates tab
From the Available Perfomance Templates tab, click on add template
Type in an in. Mine was MySQL Replication
Add a Description and save
Under Data sources, Add a new snmp datasource. Example: ID: MySQL Replicationg Lag
In the Data Souce add the OID derived from the snmpwalk e.g 1.3.6.1.4.1.30630.2.2.1.4.1.2.16.109.107.45.104.101.97.114.116.98.101.97.116.45.108.97.103.1
The type is a gauge (Like a speed-o-meter ) rather than a counter (Which is like the miles covered)
Click Save
You may add the IP/Hostname of a test device and click on test (One of your MySQL snmpd hosts)
Save and iterate for the other counters (1min avg, 5 min avg and 15 min avg)
Go back to the Performance Template page
Under Thresholds, Add a new threshold say MySQL Lag Warn
Click on the datapoints to apply the threshold to. I opted to track the realtime Lag
Put the number of seconds on the max field. 120 is 2 minutes of lag. Any more and I will be paged
Add a critical threshold
Go back to the Performance Template page
Add a Graph Definition.
Add the datapoints created earlier under Graph points. The relevant associated thresholds should also be auto-added.
You may want to resequence the points
Click save.
Naviagate back to Device List using the menu bar on the left.
Click on a MySQL host that you want to add a replication lag graph for
On the dropdown menu, select More->Templates
Under Perfomance Templates, Select Bind Templates
Select all the templates you are interested in (Mine was Device and MySQL Replication)
Click OK
Click on the perf tab should give you a graph.
Here's mine showing a tunnel break that caused a > 24 hour lag and the subsequent catching up