Zabbix à prova de falhas: migrando de replicação assíncrona para replicação de grupo

Introdução

Zabbix suporta vários bancos de dados, mas apenas MySQL e PostgreSQL foram considerados os mais adequados para minha instalação. PostgreSQL com seu repomgr e pgbouncer ou algum stolon de um lado e MySQL Group Replication do outro. Devido ao uso do MySQL na configuração atual e ao desejo de equipamentos padrão, a escolha recaiu na segunda opção.





Então, o que exatamente é a replicação de grupo MySQL. Como o nome sugere, este é um grupo de servidores que armazena o mesmo conjunto de dados. O número máximo de nós em um grupo é limitado a 9. Pode funcionar no modo primário único ou múltiplo primário. Mas o mais interessante é que tudo funciona automaticamente, seja a eleição de um novo servidor master, detecção de um nó quebrado, Split-brain ou recuperação de banco de dados. Esta funcionalidade é fornecida como plugins group_replication e mysql_clone, a comunicação ocorre através do protocolo Group Communication System, que é baseado no algoritmo Paxos. Esse tipo de replicação é compatível desde as versões 5.7.17 e 8.0.1.





Minha instalação atual funciona em Zabbix 5.0 LTS e MySQL 5.7, a migração será realizada com o aumento da versão do MySQL para 8.0, portanto é mais interessante).





Monitoramento de replicação

Sim Sim. É como o TDD, só na administração, primeiro você precisa preparar o monitoramento para que o novo cluster entre imediatamente nos radares do nosso sistema de monitoramento e nenhum problema escape ao seu olhar atento. Como você ainda não tem replicação de grupo (GR), a saída dos comandos abaixo ficará em branco, então estou dando um exemplo de saída de um cluster em execução.





A principal fonte de informações sobre o status dos nós é o comando:





SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| CHANNEL_NAME              | MEMBER_ID	                           | MEMBER_HOST  | MEMBER_PORT | MEMBER_STATE | MEMBER_ROLE | MEMBER_VERSION |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
| group_replication_applier | 500049c2-99b7-11e9-8d36-e4434b5f9d0c | example1.com |      3306   | ONLINE       | SECONDARY   | 8.0.13         |
| group_replication_applier | 50024be2-9889-11eb-83da-e4434ba03de0 | example2.com |      3306   | ONLINE       | PRIMARY     | 8.0.13         |
| group_replication_applier | 500b2035-986e-11eb-a9f8-564d00018ad1 | example3.com |      3306   | ONLINE       | SECONDARY   | 8.0.13         |
+---------------------------+--------------------------------------+--------------+-------------+--------------+-------------+----------------+
      
      



MEMBER_STATE . https://dev.mysql.com/doc/refman/8.0/en/group-replication-server-states.html. , , .





:





SELECT * FROM performance_schema.replication_group_member_stats\G
*************************** 1. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 500049c2-99b7-11e9-8d36-e4434b5f9d0c
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 75715997
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1957048
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125382195
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125471159
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 5664
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 75710337
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 2. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 50024be2-9889-11eb-83da-e4434ba03de0
               COUNT_TRANSACTIONS_IN_QUEUE: 0
                COUNT_TRANSACTIONS_CHECKED: 75720452
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1955202
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125377993
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125470919
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 0
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 75711354
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 9105
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
*************************** 3. row ***************************
                              CHANNEL_NAME: group_replication_applier
                                   VIEW_ID: 16178860996821458:41
                                 MEMBER_ID: 500b2035-986e-11eb-a9f8-564d00018ad1
               COUNT_TRANSACTIONS_IN_QUEUE: 38727
                COUNT_TRANSACTIONS_CHECKED: 49955241
                  COUNT_CONFLICTS_DETECTED: 0
        COUNT_TRANSACTIONS_ROWS_VALIDATING: 1250063
        TRANSACTIONS_COMMITTED_ALL_MEMBERS: 500049c2-99b7-11e9-8d36-e4434b5f9d0c:1-1821470279,
500293cf-594c-11ea-aafd-e4434ba03de0:1-622868371,
5000d25c-059e-11e8-822b-564d00018ad1:1-140221041,
c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:1-125382195
            LAST_CONFLICT_FREE_TRANSACTION: c9aae4fb-97a6-11eb-89d1-e4434b5f9d0c:125430975
COUNT_TRANSACTIONS_REMOTE_IN_APPLIER_QUEUE: 47096
         COUNT_TRANSACTIONS_REMOTE_APPLIED: 49908155
         COUNT_TRANSACTIONS_LOCAL_PROPOSED: 0
         COUNT_TRANSACTIONS_LOCAL_ROLLBACK: 0
3 rows in set (0.00 sec)
      
      



COUNT_TRANSACTIONS_IN_QUEUE, Seconds_Behind_Master . , .





, () . , , . , - , . , , , , .





, - . , . , .





:





  1. ( TCP 33061 ). ;





  2. MySQL 8.0 (FreeBSD, Poudriere - );





  3. , Zabbix ( );





  4. , Secondary ( , - ). ;





  5. MySQL 5.7 ;





  6. ( , );





  7. ;





  8. MySQL 8.0 (mysql_upgrade , 8 );





  9. , ( , , . . ). , ;





  10. , , ( , );





  11. ( RESET SLAVE ALL;);





  12. ;





  13. Zabbix Zabbix ;





  14. ( 4 8, 8 , . . );





  15. ;





  16. Ansible Playbook' ;





  17. ;





  18. HADNS;





  19. ;





:





  1. MySQL ;





  2. ;





  3. , MySQL ;





  4. ;





, .





9, 12 14 .





9:

. .





.





SELECT tables.table_schema , tables.table_name , tables.engine
 FROM information_schema.tables
 LEFT JOIN (
 SELECT table_schema , table_name
 FROM information_schema.statistics
 GROUP BY table_schema, table_name, index_name HAVING
 SUM( case when non_unique = 0 and nullable != 'YES' then 1 else 0 end ) = count(*) ) puks
ON tables.table_schema = puks.table_schema and tables.table_name = puks.table_name
WHERE puks.table_name is null
 AND tables.table_type = 'BASE TABLE' AND Engine="InnoDB";
      
      



Zabbix, . Zabbix, dbversion . .





ALTER TABLE history ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_uint ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_text ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_str ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE history_log ADD COLUMN `id` BIGINT UNSIGNED NOT NULL PRIMARY KEY AUTO_INCREMENT;
ALTER TABLE dbversion ADD PRIMARY KEY (mandatory);
      
      



. , - Zabbix.





12:

, , .





server-id=[     ]
gtid_mode=ON
enforce_gtid_consistency=ON
log_bin=binlog
log_slave_updates=ON
binlog_format=ROW
master_info_repository=TABLE
relay_log_info_repository=TABLE
transaction_write_set_extraction=XXHASH64

disabled_storage_engines="MyISAM,BLACKHOLE,FEDERATED,ARCHIVE,MEMORY"
plugin_load_add='group_replication.so;mysql_clone.so'
ssl-ca=/usr/local/etc/ssl/mysql/ca.crt
ssl-cert=/usr/local/etc/ssl/mysql/server.crt
ssl-key=/usr/local/etc/ssl/mysql/server.key
group_replication_ssl_mode=VERIFY_IDENTITY
group_replication_group_name="[   ,     SELECT UUID();]"
group_replication_start_on_boot=off #       
group_replication_local_address="[   ].com:33061"
group_replication_group_seeds="example1.com:33061,example2.com:33061,example3.com:33061"
group_replication_ip_allowlist="2.2.2.2/32,3.3.3.3/32,4.4.4.4/32"
group_replication_member_weight=50

group_replication_recovery_use_ssl=ON
group_replication_recovery_ssl_verify_server_cert=ON
group_replication_recovery_ssl_ca=/usr/local/etc/ssl/mysql/ca.crt
group_replication_recovery_ssl_cert=/usr/local/etc/ssl/mysql/server.crt
group_replication_recovery_ssl_key=/usr/local/etc/ssl/mysql/server.key
      
      



my.cnf, , , . , . group_replication_start_on_boot, , .





SHOW VARIABLES LIKE 'binlog_format'; SET GLOBAL binlog_format = RAW; , .





group_replication_ssl_mode group_replication_recovery_ssl_verify_server_cert , Subject Alternative Name (SAN) , group_replication_group_seeds.





group_replication_member_weight . , , , .





:





SET SQL_LOG_BIN=0;
CREATE USER 'replication'@'%' IDENTIFIED BY '[ ]' REQUIRE SSL;
GRANT replication slave ON *.* TO 'replication'@'%';
GRANT BACKUP_ADMIN ON *.* TO 'replication'@'%';
FLUSH PRIVILEGES;
SET SQL_LOG_BIN=1;
      
      



:





INSTALL PLUGIN group_replication SONAME 'group_replication.so';
INSTALL PLUGIN clone SONAME 'mysql_clone.so';
SHOW PLUGINS;
      
      



, :





CHANGE REPLICATION SOURCE TO SOURCE_USER='replication', SOURCE_PASSWORD='[ ]' \\
  FOR CHANNEL 'group_replication_recovery';
      
      



. group_replication_bootstrap_group , , :





SET GLOBAL group_replication_bootstrap_group=ON; #     
START GROUP_REPLICATION;
SET GLOBAL group_replication_bootstrap_group=OFF; #     
      
      



, :





mysql> SELECT * FROM performance_schema.replication_group_members;
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| CHANNEL_NAME              | MEMBER_ID                            | MEMBER_HOST | MEMBER_PORT | MEMBER_STATE  |
+---------------------------+--------------------------------------+-------------+-------------+---------------+
| group_replication_applier | ce9be252-2b71-11e6-b8f4-00212844f856 |example1.com |       3306  | ONLINE        |
+---------------------------+--------------------------------------+-------------+-------------+---------------+
      
      



MySQL .





14:

Zabbix , . MySQL , , .





12- , (server-id, group_replication_local_address). , group_replication_bootstrap_group .





Distributed Recovery mysql_clone . , , , .





, , .





, my.cnf group_replication_start_on_boot off on MySQL .





SELECT * FROM performance_schema.replication_group_members; - .





SELECT * FROM performance_schema.replication_group_member_stats\G - .





SELECT group_replication_set_as_primary('[uuid ]'); - .





Zabbix

Zabbix , , . . , Primary , , Zabbix , , . HADNS, Zabbix IP DNS .





Talvez nem tudo seja feito com a elegância que gostaríamos. Você pode querer usar mysql-shell, mysqlrouter e converter Replicação de Grupo em Cluster InnoDB ou adicionar HAProxy, especialmente quando estiver implantando Zabbix do zero. Espero que esta história sirva como um bom ponto de partida e seja útil. Obrigado pela atenção!





literatura adicional

https://dev.mysql.com/doc/refman/8.0/en/group-replication.html





https://blog.zabbix.com/scaling-zabbix-with-mysql-innodb-cluster/8472/





https://en.wikipedia.org/wiki/Paxos_(computer_science)








All Articles