Monday 17 December 2012

Master crash, Replication broken, Master resumed, Replication still broken!!!



In search of MySQL replication topic for my blogs, Today I got struck with an unusual replication behavior. Well that gives me a topic (Never mind the issue has been fixed, as MySQL says.)
So the scenario is we have simple Master-->Slave replication setup for one of our client. Master crashes, that stops the replication, next the master is brought back up and running, now when we resume the replication, the IO thread slaps us with an error (Strange error, when nothing should have gone wrong). 

121216 22:12:46 [ERROR] Slave I/O thread: error reconnecting to master 'abc@xxx.xxx.xxx.xxx:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 145'  errno: 2013  retry-time: 60  retries: 86400
121216 22:38:57 [Note] Slave: connected to master 'bfimapp@xx.xxx.xxx.xxx:3306',replication resumed in log 'master1-bin.001073' at position 1042308987
121216 22:38:57 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)
121216 22:38:57 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log

So thats the error. It suggests that replication is trying to start from an impossible position. So we digged in the concerned binlogs just to find the last position (end_log_pos for the last event) was 1042308684. That doesnt seem correct. Since the master crashed it could not add a rotate event to the file. But thats just another case.

The question is what made the master.info to write such a position, which didnt exist in binlog files on master. 

In search of answers, lets follow logical work flow of MySQL Replication, a transaction (can be a single statement in case of autocommit=1) when executed is written to binlog, which then is signified to the slave for it to copy it to the relay-logs and then coordinates written to the master.info (the point till where IO thread has copied binlogs). Now the above situation suggests that, MySQL didnt get a chance to write the statement to binlog file (i.e. to call "fdatasync") before sending it to the slave. This is where "Binlog Cache" comes in picture.

So a DML when executed inside a transaction with binlogs enabled, will be written to "Binlog cache", and once the transaction completes the "Binlog Cache" is flushed to disk (written to the binlog files). This is the point where the statement got replicated to slave and new position got written to master.info however before flushing it to the binlog files the master crashed. This is not binlog corruption, nor the slave had any clue to jump on to the next binlog file, it just returned the error. 

A fix to this problem would be to execute the change master on slave for new binlog position (the file created after mysql start on master).

There are still some questions unanswered, like :
1) Does the innodb instance recovery rolls back the failed transaction, if it does, then we have an inconsistent slave.
2) Would it happen if the tables in question are MyISAM.
3) How will replication behave if there is no rotate event in binary logs.
4) Could a sync_binlog and sync_relay_log save us from this situation.

I will try to find answers to these questions till then enjoy replicating.

Friday 7 December 2012

MySQL Replication Recipe in a nutshell

               Today, one of our client's developer approached for guidance over setting up replication between two hosts, and some advice over general points. Although this is not an essay write-up for the blog, but I hope this will be useful for other Remote-DBAs out there like me to tackle such requests...

Below is the email thread transcript I replied to client :


The two database instances I have right now are in QE boxes - x.x.x.228 and x.x.x.229
Answer : Assuming that out of the two servers you have mentioned, x.x.x.228 should be the MASTER and x.x.x.229 should be the SLAVE.

The questions that I have are:

1. Where can I find the information on how to configure for master-slave replication.
Answer : Setting up replication is pretty straight-forward. If these are two fresh servers, without any data in it at this time, please follow the steps below :
1) Enable binary logging on the MASTER server. This can be done by adding "log-bin" directive in the /etc/my.cnf mysql configuration file. Add this directive in the [mysqld] section of the server.
2) Change or Add the "server-id" directive in [mysqld] section of mysql configuration file. The directive would be "server-id=228" for the MASTER, and "server-id=229" in the SLAVE.
3) Restart MySQL on both the MASTER and SLAVE. This can be done by using "/etc/init.d/mysql restart". In case of failure to start check the error log file for MySQL.
4) Upon successful restart execute the following command on MASTER "SHOW MASTER STATUS \G" to get the starting point of replication, commonly known as "Binlog Coordinates". These include the "File" and "Position". Make a note of this coordinates.
mysql> show master status \G
*************************** 1. row ***************************
        File: bin.000001
        Position: 107
Binlog_Do_DB:
Binlog_Ignore_DB:
1 row in set (0.00 sec)

5) Next we need to create a user on the MASTER. This user login will be used by the SLAVE to replicate from the MASTER. On the MASTER execute the command "GRANT replication client, replication slave ON *.* TO 'repl'@'x.x.x.229' IDENTIFIED BY PASSWORD 'replipass';".
6) Once user is created execute the following command on the SLAVE to initiate replication "CHANGE MASTER TO MASTER_HOST='x.x.x.228', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replipass', MASTER_LOG_FILE='<"File" obtained in Step 4>', MASTER_LOG_POS=<Position obtained in Step 4>;"
7) Once successfuly executing this command, check the replication status by using the following command "SHOW SLAVE STATUS \G" in the SLAVE. This will output number of rows and column in vertical format. If you notice the value for "Slave_IO_Running" and "Slave_SQL_Running" is "No", hence we will need to start the replication now, earlier step initiated the replication it did not start it. To start replication ono the SLAVE execute "START SLAVE;", likewise to stop replication execute "STOP SLAVE;".
8) Now we have replication setup wherein all the writes on MASTER would be replicated to the SLAVE. MASTER --> SLAVE.

Now, if we have existing data on the MASTER server, please follow the steps below :
1) Follow Steps 1), 2) and 3) as mentioned in the above plan.
2) Next we will need to take backup of the databases present on the MASTER and restore it on the SLAVE, so that both MASTER and SLAVE will have the common point to start. To take the backup execute the following command on the shell prompt "mysqldump -u<username> -p<password> --all-databases --single-transaction --master-data=2 > <backup-file location>;"
3) Once the backup is completed, copy it to the SLAVE host and restore it using the following command on the shell prompt "mysql -u<username> -p<password> < <backup-file>"
4) Once restore is done, we will need the "Binlog Coordinates" which are present in the backup-file (we used an option --master-data for mysqldump which automatically gets us the binlog coordinates). Open teh backup-file using any file read utility for eg: less. Use the following command on the shell prompt : "less <backup-file>", once opened, refer to the part saying "CHANGE MASTER TO MASTER_LOG_FILE=<some binlog file>, MASTER_LOG_POS=<some binlog position>;", note down these binlog coordinates.
5) Follow Steps 5), 6), 7) as mentioned in the above plan for successfully starting replication.

You can use the following link for more information : http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html

2. I have probably 40% writes and 60% reads. Although I will be using a cache on my application server to reduce the load on database, I still would like to know if I should write to master and read from slave?
Answer : This read-write balancing can be done at the application level by simply redirecting all the reads to the SLAVE and all the DMLs to the MASTER. For eg : INSERT INTO t VALUES(1); should be executed on the MASTER, and SELECT * FROM T; can be run on the SLAVE. This can be done by simply using the IP addresses of the MASTER and SLAVE hosts.

3. How frequently should I set up the replication to happen? Should it be a row level replication or statement level?
Answer : Usually RBR(Row Based Replication) is the safest, however it will fill up the binlogs easily and utilize more disk space, SBR(Statement Based Replication) is the simplest however is vulnerable to create some inconsistencies between MASTER and SLAVE. The best method for binlog-format to use would be "Mixed", wherein both the Statement and Row based methods are used interchangeably by MySQL. To use this method add the following directive "binlog-format=mixed" in the [mysqld] section of the mysql configuration file. You should add this directive while enabling binlogs in Step 1) of the above mentioned replication setup Plans.

4. In terms of tuning mysql database, what kind of parameters should I look at. I did copy over the same parameters as that of logout mysql server for now.
Answer : Generally tuning can be done after observing the server for any bottlenecks for sometime. For safety and to avoid any inconsistent slaves in future enable "read-only" in the SLAVE in the [mysqld] section of the MySQL configuration file. This will not allow any writes to the SLAVE except from the replication threads and users with SUPER privileges. If using "InnoDB" storage engines for all tables configure "innodb_buffer_pool_size" to 70% of the available RAM for eg. "innodb_buffer_pool_size=4G" for a box with 6GB RAM

5. There is a log table that¹s going to have most of the writes. Apart from that, there are three other tables which are written in to only 10 -
20 % of the time and read most of the time. Negligible or almost no updates or deletes.
Answer : We strongly recommend you to use MySQL or Percona-MySQL 5.5 version. And to convert all the tables to InnoDB storage engine. Percona has an edge over MySQL server in terms of InnoDB performance. InnoDB nowadays not only does "writes" faster but also "reads".


This was a question-answer pattern reply to the client, and hope many of the RemDBA could use the same. However the main essence in the end is still MySQL Replication Recipe in a nutshell (Question-Answer Cookbook style)... :)