tag:blogger.com,1999:blog-91176004895624322782024-03-20T18:12:34.953-07:00Akshay Suryawanshi's BlogAkshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.comBlogger11125tag:blogger.com,1999:blog-9117600489562432278.post-80820780352579676222013-02-11T14:03:00.000-08:002013-02-11T14:07:10.681-08:00Unique constraint and AUTO_INCREMENT. Need of the day!!!<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<br />
<span style="font-family: Trebuchet MS, sans-serif;">Recently one of our clients reported us disk space issues for volume containing the binary logs(Master) and relay logs on the slave. Our alerting reported the same too, but before we could actually find out, the slaves choked up (could not add more relay logs due to space issues). We were taken by surprise as to what made the Master generate so many binlogs at such a rapid rate, when client didn't change a code line(Suspecting initially the client scripts could have gone wrong).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So now the slaves out of hand and mysql on the master un-responsive due to binary logs full, we needed to purge few binlogs manually to continue investigation. After observing the processlist for sometime, we noticed something unusual with one of the tables (pretty large one 195G). The table is used by internal scripts and was not web facing, that gave us some room to breath. So moving further, the unusual thing was the series of updates on same "ID" value, which should not be the case as per client.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Next we started with the table structure analysis (using "SHOW CREATE TABLE ...") to notice that the AUTO_INCREMENT value was not incremental Also the last_insert_id() seemed suspicious, a very well known number. Something we learnt at college days. BANG!!!</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">We had hit the max value for INT datatype 2^32 and the column by default is type "SIGNED". 2147483647 was the magic number.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">We were depressed as ALTERing such a huge table is going to be a nightmare. After some assistance from client we were able to drop the historical partition data to shrink the table. Still we had to deal with the ALTER and removal of "what now seemed to be garbage data with ID 2147483647". </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">The ALTER was done but was still unsure about the start point for the auto_increment attribute on the table. So after checking the table structure again we noticed the IDs now were getting generated as required.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Further moving on to the investigation of the issue, like "what made such a huge pile-up of binary logs", I noticed that the auto_increment behavior was real unsubtle. The auto_increment column after reaching the max value of INT type, kept on generating the same number. In our clients case for almost 1.7Million records. Next as I mentioned above a series of updates running on the same ID (using a single condition in the where clause "... ID=magic number"), the updates actually changed 1.7Mil rows for every run. The binlog format was set to "mixed" and as per its working all updates were logged in "row" based formats. This started a chain reaction which generated Binary logs eventually filling up the volume, slave got overwhelmed with this and it filled up 100G volume for relay logs too. Next the updates started stalling (since unable to write the statements to binlogs), and finally timing out.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Now everything is back on track, and we decided to rebuild the slaves again. </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">However, I realized the fact that, how a unique constraint on AUTO_INCREMENT column would have saved our day. A duplicate key error would have been sufficient for us to let know that INT has maxed out.</span><br />
<br /></div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-91654579046505998652013-02-05T23:31:00.002-08:002013-02-05T23:32:47.030-08:00Restoring single or some tables from Xtrabackup!!! (Replication Gotchas)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">There is this wonderful <a href="http://www.mysqlperformanceblog.com/2012/01/25/how-to-recover-a-single-innodb-table-from-a-full-backup/" target="_blank">blog post</a> from one of the Perconian Miguel Angel Nieto, regarding restoring single table from an Xtrabackup backup. Since this is a very useful feature with respect to InnoDB where direct filecopy in MyISAM-style is not allowed, I thought of giving it a try, plus one of our client demanded this (Its not a good idea to restore 1TB of data just for a few MB table).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So moving forward, I wont list out the steps already mentioned by Miguel, but would like to mention one addition to the process. While you import the backup .ibd file in the data directory, the changed row data wont be visible unless you do a mysql restart. For eg : Here's what I did :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">1) Deleted some records from a table.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select count(*) from titles_back;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| count(*) |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 443308 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1 row in set (0.99 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> delete from titles_back limit 20;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Query OK, 20 rows affected (0.15 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select count(*) from titles_back;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| count(*) |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 443288 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1 row in set (0.08 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">2) Prepared the backup using --apply-log and --export option (I used Innobackupex script, however --prepare can be used instead if you are using xtrabackup binary).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">3) Copied the tablespace file from the backup to the database schema directory. And made sure the file permissions are correct.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">4) Imported the tablespace back to the table.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> alter table titles_back import tablespace;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Query OK, 0 rows affected (0.00 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select count(*) from titles_back;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| count(*) |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 443288 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1 row in set (0.08 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">As we can see above, the tablespace was imported without any problems, however still the deleted data is not visible, Although it exists in the files. Looks like somehow InnoDB is still referring the page for the deleted records from buffer pool (the page-ids are same), but I am not sure about it.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">5) Next an addition as I mentioned above would be to do a mysql restart for the instance.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">6) After the restart we have the missing rows back :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select count(*) from titles_back;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| count(*) |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 443308 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+----------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1 row in set (1.03 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Now the restart can be actually scheduled for a later time, if not possible. InnoDB will allow any data manipulations to rows execpt the previously changed rows (the one we needed to restore from backup). Please look at the example below :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select * from titles_back limit 5;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| emp_no | title | from_date | to_date |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10002 | Staff | 1996-08-03 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10004 | Engineer | 1986-12-01 | 1995-12-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">5 rows in set (0.02 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> delete from titles_back where emp_no=10001;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Query OK, 1 row affected (0.21 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select * from titles_back limit 5;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| emp_no | title | from_date | to_date |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10002 | Staff | 1996-08-03 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10003 | Senior Engineer | 1995-12-03 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10004 | Engineer | 1986-12-01 | 1995-12-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10004 | Senior Engineer | 1995-12-01 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10005 | Senior Staff | 1996-09-12 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">5 rows in set (0.00 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> alter table titles_back discard tablespace;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Query OK, 0 rows affected (0.06 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">/* Copied the files from the backup to the database directory. */</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> alter table titles_back import tablespace;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Query OK, 0 rows affected (0.00 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select * from titles_back where emp_no=10001;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">Empty set (0.00 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">/* MySQL restart in between */</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">mysql> select * from titles_back where emp_no=10001;</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| emp_no | title | from_date | to_date |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">| 10001 | Senior Engineer | 1986-06-26 | 9999-01-01 |</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">+--------+-----------------+------------+------------+</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1 row in set (4.47 sec)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">I hope this was clear. </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Now lets observe what happens when you carry out the partial restore on a Master-Slave replication setup :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">I followed all the steps mentioned above, to notice that the slave had broken with an error : </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"> Last_Errno: 1030</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"> Last_Error: Error 'Got error -1 from storage engine' on query. Default database: 'employees'. Query: 'alter table titles_back import tablespace'</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span> </span><br />
<span style="font-family: Trebuchet MS, sans-serif;">This is pretty obvious since the Replication slave was able to execute the "discard tablespace" command (which infact deletes the .ibd file for the table), however it was not able to import. Simply because of the fact that .ibd file doesnt exist. Remember we copied the backup file to the master and replication doesnt handle a transfer of .ibd file as it does with "LOAD DATA INFILE ...". So all you need to do is restore the same file to the slave, and restart the mysqld on the slave. Simple.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Now this becomes a necessity, when you are discarding a tablespace and importing it, people might think of turning off sql_log_bin for the session, although this wont cause any errors and the slave to break it, but this will surely lead to inconsistent slaves (The deleted records would still not be present on the slave, but on the master.)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Hope this was helpful. Xtrabackup is a great utility to backup databases, which can save a lot of time, during the restore especially.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Disclaimer : This was strictly done on Percona MySQL server, to carry out the same on Oracle's MySQL server, some server config variables needs to set, you can refer Miguel's blogpost for that.</span><br />
<br /></div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com2tag:blogger.com,1999:blog-9117600489562432278.post-15345779631408580382013-02-03T09:07:00.001-08:002013-02-03T09:53:54.754-08:00How MySQL Replication works, somewhat a detailed pipeline approach<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">From many days I had been willing to write this blog, but due to some un-answered questions in my mind and literally tons of blogs already available on this topic, I went to see how it works at the code level and try to write something different. The MySQL code is obviously over-whelming and not strictly suggested unless you truly know what to look for. BANG!!! I hit this road-block "What to look for and where to ?". With some help from <a href="http://mysqlentomologist.blogspot.com/">Valeriy Kravchuk</a> </span><span style="font-family: 'Trebuchet MS', sans-serif;">one of the Prinicipal support engineer at <a href="http://www.percona.com/">Percona</a>, I got a starting point.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So to start on, below are the files to look for the Replication code : </span><br />
<span style="font-family: Trebuchet MS, sans-serif;">/sql/slave.cc</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">/sql/sql_repl.cc</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Anyways I'll write another post with thorough source code implementation since this post can get long and we will shift from the topic.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So this is what most of the books, manual and other blog posts have to explain about MySQL replication, "In a simple MySQL replication setup, an event (a DML, DDL statement most often) is written to binary logs which are stored on the originating server called Master. Then there is one more server running mysql which gets those events (generally some SQL or exact SQL statements) stored in Master's binary logs and stores them locally in a file called Relay-Log file. Further a specialized thread runs in MySQL on the another server which is usually named as a Slave which will execute the SQL statements stored in form of events in Relay-log, in simple terms as you would execute a SQL script file".</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<br />
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_iaMPQLFJ_KPgx5GlD6TfTKnUYOI41a1eq86od7u44Sa8thI_foSfkBCCCOkiSEyoP1rC57T6gLiVySHy4-3wkK2zrcSR_upFh-IG3RF6Pm9AajU5ozGctQ_rv545hqwRNBl_iOsxkQk/s1600/replication.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="154" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEj_iaMPQLFJ_KPgx5GlD6TfTKnUYOI41a1eq86od7u44Sa8thI_foSfkBCCCOkiSEyoP1rC57T6gLiVySHy4-3wkK2zrcSR_upFh-IG3RF6Pm9AajU5ozGctQ_rv545hqwRNBl_iOsxkQk/s320/replication.png" width="320" /></a></div>
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">This being a very simple definition, doesnt answer some of annoying questions which could come up in a novice's mind, like :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1) How does the Slave gets those events (statements) in its Relay log (trust me in my earlier days I used a think file-copy happens, yes, the manual make it looks something like this. Duh!!)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">2) Why do we create and grant replication privileges to a user on Master ?</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">3) How does the Slave knows when to get the data, MySQL replication being asynchronous, this seemed confusing.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">4) Finally who actually transfers the data ? Is it the Slave or the Master. (This really made me investigate the whole process).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So I'll try to answer these questions as per my understanding of code and some help from the debug process. (Disclaimer : I might not be accurate with all the process as this still needs to worked on).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Answer 1) Lets take an example of a simple INSERT statement for a MyISAM table. An insert when executed on the master, opens up a bifurcated process, first which executes the statement (in short changes the required pages at the storage engine level, blah blah blah) and second adds the statement to a file-cache (a memory area opened up for Binary logs, named as Binlog cache). Now when the statement completes processing on the first part of the process, it waits until the statement is written to Binlog cache, once written the statement completes processing. Next the Binlog_Dump thread sends a broadcast message saying somewhat like "Hey, All the slaves I have a new statement in my cache, is anyone interested in copying it". This broadcast message is listened by the Slave_IO thread and it says "I am connected and I want that statement to be added in my own Relay-logs", the Slave_IO threads uses the same Binlog cache to copy the statement. The copy is done at packet level (every event would be checked with slave_max_allowed_packet size variable) and then it is written to relay log with the next Binlog file coordinates updated in the "master_info". There are several status messages for replication threads, which in our case after the events are copied to relay logs would be "Waiting for master to send event" on the Slave and "Master has sent all binlog to slave; waiting for binlog to be updated" on the Master.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Answer 2) The replication user account, is actually the whole bread-butter for the Slave_IO threads. Logically, what happens on the Master side is, the replication user logs in to the mysqld on Master (using the username, password and special grant privileges) and retrieves the events. Just imagine you doing a "Show binlog events in ... from ...", what you get in return is the events actually contained in the Binary logs on master. The replication user uses the same concept, although not the same code (since it has to do some extra work of accessing a totally different cache and writing to the relay-logs and updating the master-info). When there are no more events to be copied the Slave_IO thread waits for another broadcast from the Binlog_dump thread.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Answer 3) So what if your replication was paused for a month and then you start it. How does the slave knows where to start, well it simply looks into the master-info. So does it means the Binlog Dump thread will run forever, absolutely not. The Binlog Dump threads disconnects just like every other thread. However, when we start replication, the Slave_IO threads first registers himself with the master, if all goes well, the Master will spawn the Binlog Dump for the that slave. Now the Binlog_dump thread doesnt store the information like at what position the Slave_IO thread stopped a month ago, it just checks the current Binlog position and tells the Slave_IO thread "Hey you have to copy all the events till this position, I will let you know if there are any new events upcoming." So master-info, relay-log.index and bin-log.index plays an important role in this situation.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Answer 4) Logically, the answer is damn simple, the slaves pulls the latest Binary log data. So the code says that, the Binlog Dump thread will keep on sending broadcast messages, thats all it does. The Slave_IO thread will actually copy those events and write them to its relay logs.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So after all this process completes, the Slave_SQL threads get the signal to execute the events newly added in relay logs, and once it has executed all the events, the Slave_SQL will show the status "Slave has read all relay log; waiting for the slave I/O thread to update it".</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">The working of Slave_SQL threads would fill up its own blog post and frankly there are several of them available on the internet.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So MySQL replication is basically composed of three threads :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1) Binlog Dump : The one which runs on the Master and who signals the corresponding slave about new events in the binary logs. I mentioned corresponding since, every Slave_IO thread will have its own Binlog_Dump thread (Running as system_user in the Master).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">2) Slave_IO : The one who copies the binary log events from the master, writes them to the relay log on Slave, and this is strictly not a file-copy :P</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">3) Slave_SQL : The one who executes the events (usually SQL statements) contained in the relay logs and purges the relay logs when all of the events have finished executing. This is one of the bottlenecks in MySQL replication being that all the statements are executed serially.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">Hope this new way of understanding MySQL replication was helpful.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">More to come soon...</span></div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com1tag:blogger.com,1999:blog-9117600489562432278.post-59007618347516255392013-01-08T16:11:00.001-08:002013-01-08T16:11:46.915-08:00How InnoDB writes data (Something's FUZZY out there!) ?<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: Georgia, Times New Roman, serif;">Internals of InnoDB is one of the most challenging topics to learn in MySQL. Hence in my earlier blogpost, I tried to simplify how the InnoDB disk and Memory layout looks like. Although once you have understood the basic understanding, questions will start cropping in your minds, How and When exactly InnoDB writes its data. Lets start with our usual simplified approach to understanding this.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">Lets assume users are executing some updates on the some InnoDB tables. Usually InnoDB will check if the records(contained in data Pages) to be manipulated are existing inside the innodb buffer pool, if found then InnoDB will proceed with updating the concerned record(in reality the pages). If the page is not found in buffer pool, InnoDB Read I/O thread will the page from the concerned table's datafile and put it in the innodb buffer pool. Then the appropriate record in the page would be manipulated, a special LSN(Log Sequence Number) would be written to Transaction logs(ib_logfile) and the block will be marked as "Dirty".</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">This was straight-forward to understand, wasn't it ? Now what happens if someone goes and kills the MySQL process, as usual on restart InnoDB will starts its transaction recovery of committed and un-committed transaction. Wait a minute un-committed transactions are ok can be rolled back from RSegs (Undo or Rollback Statement situated in the ibdata file), but what about committed transactions. Isn't InnoDB suppose to write a dirty block as soon as a COMMIT(Checkpoint in too much technical terms) occurs. Well for your information this is known as Undo phase and Redo phase resp.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">So the instance recovery process raised some interesting questions, like when does InnoDB actually writes data to the persistent files on disk. Looking at the recovery process it doesnt seem to write immediately though. That's something fishy... I mean "FUZZY". So going back to our update statement's Dirty page, the actual page will not be flushed to disk or disk-cache, until the following scenarios occur :</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>1) The LRU(Least Recently Used) page list becomes full. InnoDB reads pages from the disk into buffer to do manipulations on them or as data pages for SELECT statements. Now if the page was modified it will land-up in the Dirty pages list. If it was used for SELECT, it will still be in the LRU list. Now since innoDB buffer is of finite size, the LRU list is bound to get full, so at this moment some Dirty buffers will be flushed to disk to accommodate any new READ buffers. In layman's words "A Select can flush a DML".</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>2) Second scenario is quite compelling, this is when the "Flush List" itself runs out of room to accommodate more dirty buffers, Oh Yes! this happens and is controlled by "innodb_max_dirty_pages_pct" mainly. So to make room Least manipulated pages are flushed (FUZZY!!! Which means an update on a page which happened long while ago, could still be kept in Flush List, and the page which is somewhat recently manipulated could also be flushed to disk, however the older page is continuously updated by other statements too, this is a kinda write-optimization, Percona counts this using "Young-making rate").</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">BTW, if you are wondering which process actually writes these pages in their specific table's datafiles, I have a hint for you, they are the counter-part of READ I/O threads, although this all happens in the background and we dont need to worry about.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">Note : There are three ways in which the data page would be written, the above two (Fuzzy Check-pointing), plus a "Sharp checkpoint" as explained by Baron Schwartz in his blog-posts at Xaprb.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">So are there any performance bottlenecks and improvements ? Well logically dirty pages flushed due to LRU list signifies more memory should be added, a bigger footprint for innodb_buffer_pool_size. Dirty pages flushed due to Flush list signifies more complicated optimization involving innodb_adaptive_flushing, innodb_max_dirty_pages_pct, innodb_io_capacity and innodb_log_file_size (Courtesy mysqlperformanceblog.com).</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="font-family: Georgia, Times New Roman, serif;">Just to leave you tickling, all these factors depends upon an important setting known as "innodb-flush-method", keep reading...</span><br />
</div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-60655000387409954462012-12-17T00:19:00.003-08:002012-12-17T00:21:21.894-08:00Master crash, Replication broken, Master resumed, Replication still broken!!!<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">In search of MySQL replication topic for my blogs, Today I got struck with an unusual replication behavior. Well that gives me a topic (Never mind the issue has been fixed, as MySQL says.)</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">So the scenario is we have simple Master-->Slave replication setup for one of our client. Master crashes, that stops the replication, next the master is brought back up and running, now when we resume the replication, the IO thread slaps us with an error (Strange error, when nothing should have gone wrong). </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="color: red; font-family: Trebuchet MS, sans-serif;">121216 22:12:46 [ERROR] Slave I/O thread: error reconnecting to master 'abc@xxx.xxx.xxx.xxx:3306': Error: 'Lost connection to MySQL server at 'reading initial communication packet', system error: 145' errno: 2013 retry-time: 60 retries: 86400</span><br />
<span style="color: red; font-family: Trebuchet MS, sans-serif;">121216 22:38:57 [Note] Slave: connected to master 'bfimapp@xx.xxx.xxx.xxx:3306',replication resumed in log 'master1-bin.001073' at position <b>1042308987</b></span><br />
<span style="color: red; font-family: Trebuchet MS, sans-serif;">121216 22:38:57 [ERROR] Error reading packet from server: Client requested master to start replication from impossible position ( server_errno=1236)</span><br />
<span style="color: red; font-family: Trebuchet MS, sans-serif;">121216 22:38:57 [ERROR] Got fatal error 1236: 'Client requested master to start replication from impossible position' from master when reading data from binary log</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So thats the error. It suggests that replication is trying to start from an impossible position. So we digged in the concerned binlogs just to find the last position (end_log_pos for the last event) was <b><span style="color: red;">1042308684</span></b>. That doesnt seem correct. Since the master crashed it could not add a rotate event to the file. But thats just another case.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">The question is what made the master.info to write such a position, which didnt exist in binlog files on master. </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">In search of answers, lets follow logical work flow of MySQL Replication, a transaction (can be a single statement in case of autocommit=1) when executed is written to binlog, which then is signified to the slave for it to copy it to the relay-logs and then coordinates written to the master.info (the point till where IO thread has copied binlogs). Now the above situation suggests that, MySQL didnt get a chance to write the statement to binlog file (i.e. to call "fdatasync") before sending it to the slave. This is where "Binlog Cache" comes in picture.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">So a DML when executed inside a transaction with binlogs enabled, will be written to "Binlog cache", and once the transaction completes the "Binlog Cache" is flushed to disk (written to the binlog files). This is the point where the statement got replicated to slave and new position got written to master.info however before flushing it to the binlog files the master crashed. This is not binlog corruption, nor the slave had any clue to jump on to the next binlog file, it just returned the error. </span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">A fix to this problem would be to execute the change master on slave for new binlog position (the file created after mysql start on master).</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">There are still some questions unanswered, like :</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">1) Does the innodb instance recovery rolls back the failed transaction, if it does, then we have an inconsistent slave.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">2) Would it happen if the tables in question are MyISAM.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">3) How will replication behave if there is no rotate event in binary logs.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;">4) Could a sync_binlog and sync_relay_log save us from this situation.</span><br />
<span style="font-family: Trebuchet MS, sans-serif;"><br /></span>
<span style="font-family: Trebuchet MS, sans-serif;">I will try to find answers to these questions till then enjoy replicating.</span></div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-13155125000394639312012-12-07T00:10:00.000-08:002012-12-07T00:14:30.146-08:00MySQL Replication Recipe in a nutshell<div dir="ltr" style="text-align: left;" trbidi="on">
<b><span style="font-family: Georgia, Times New Roman, serif;"> Today, one of our client's developer approached for guidance over setting up replication between two hosts, and some advice over general points. Although this is not an essay write-up for the blog, but I hope this will be useful for other Remote-DBAs out there like me to tackle such requests...</span></b><br />
<b><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></b>
<b><span style="font-family: Georgia, Times New Roman, serif;">Below is the email thread transcript I replied to client :</span></b><br />
<br />
<br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">The two database instances I have right now are in QE boxes - x.x.x.228 and x.x.x.229</span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">Answer : Assuming that out of the two servers you have mentioned, x.x.x.228 should be the MASTER and x.x.x.229 should be the SLAVE.</span></i></b><br />
<br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">The questions that I have are:</span><br />
<br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">1. Where can I find the information on how to configure for master-slave replication.</span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">Answer : Setting up replication is pretty straight-forward. If these are two fresh servers, without any data in it at this time, please follow the steps below :</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>1) Enable binary logging on the MASTER server. This can be done by adding "log-bin" directive in the /etc/my.cnf mysql configuration file. Add this directive in the [mysqld] section of the server.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>2) Change or Add the "server-id" directive in [mysqld] section of mysql configuration file. The directive would be "server-id=228" for the MASTER, and "server-id=229" in the SLAVE.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>3) Restart MySQL on both the MASTER and SLAVE. This can be done by using "/etc/init.d/mysql restart". In case of failure to start check the error log file for MySQL.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>4) Upon successful restart execute the following command on MASTER "SHOW MASTER STATUS \G" to get the starting point of replication, commonly known as "Binlog Coordinates". These include the "File" and "Position". Make a note of this coordinates.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>mysql> show master status \G</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>*************************** 1. row ***************************</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"> File: bin.000001</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"> Position: 107</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Binlog_Do_DB:</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Binlog_Ignore_DB:</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>1 row in set (0.00 sec)</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></i></b>
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>5) Next we need to create a user on the MASTER. This user login will be used by the SLAVE to replicate from the MASTER. On the MASTER execute the command "GRANT replication client, replication slave ON *.* TO 'repl'@'x.x.x.229' IDENTIFIED BY PASSWORD 'replipass';".</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>6) Once user is created execute the following command on the SLAVE to initiate replication "CHANGE MASTER TO MASTER_HOST='x.x.x.228', MASTER_PORT=3306, MASTER_USER='repl', MASTER_PASSWORD='replipass', MASTER_LOG_FILE='<"File" obtained in Step 4>', MASTER_LOG_POS=<Position obtained in Step 4>;"</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>7) Once successfuly executing this command, check the replication status by using the following command "SHOW SLAVE STATUS \G" in the SLAVE. This will output number of rows and column in vertical format. If you notice the value for "Slave_IO_Running" and "Slave_SQL_Running" is "No", hence we will need to start the replication now, earlier step initiated the replication it did not start it. To start replication ono the SLAVE execute "START SLAVE;", likewise to stop replication execute "STOP SLAVE;".</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>8) Now we have replication setup wherein all the writes on MASTER would be replicated to the SLAVE. MASTER --> SLAVE.</span></i></b><br />
<span class="Apple-tab-span" style="white-space: pre;"><b><i><span style="font-family: Georgia, Times New Roman, serif;"> </span></i></b></span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>Now, if we have existing data on the MASTER server, please follow the steps below :</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>1) Follow Steps 1), 2) and 3) as mentioned in the above plan.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>2) Next we will need to take backup of the databases present on the MASTER and restore it on the SLAVE, so that both MASTER and SLAVE will have the common point to start. To take the backup execute the following command on the shell prompt "mysqldump -u<username> -p<password> --all-databases --single-transaction --master-data=2 > <backup-file location>;"</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>3) Once the backup is completed, copy it to the SLAVE host and restore it using the following command on the shell prompt "mysql -u<username> -p<password> < <backup-file>"</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>4) Once restore is done, we will need the "Binlog Coordinates" which are present in the backup-file (we used an option --master-data for mysqldump which automatically gets us the binlog coordinates). Open teh backup-file using any file read utility for eg: less. Use the following command on the shell prompt : "less <backup-file>", once opened, refer to the part saying "CHANGE MASTER TO MASTER_LOG_FILE=<some binlog file>, MASTER_LOG_POS=<some binlog position>;", note down these binlog coordinates.</span></i></b><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><span class="Apple-tab-span" style="white-space: pre;"> </span>5) Follow Steps 5), 6), 7) as mentioned in the above plan for successfully starting replication.</span></i></b><br />
<span class="Apple-tab-span" style="white-space: pre;"><b><i><span style="font-family: Georgia, Times New Roman, serif;"> </span></i></b></span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">You can use the following link for more information : http://dev.mysql.com/doc/refman/5.5/en/replication-howto.html</span></i></b><br />
<span class="Apple-tab-span" style="white-space: pre;"><span style="font-family: Georgia, Times New Roman, serif;"> </span></span><br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">2. I have probably 40% writes and 60% reads. Although I will be using a cache on my application server to reduce the load on database, I still would like to know if I should write to master and read from slave?</span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">Answer : This read-write balancing can be done at the application level by simply redirecting all the reads to the SLAVE and all the DMLs to the MASTER. For eg : INSERT INTO t VALUES(1); should be executed on the MASTER, and SELECT * FROM T; can be run on the SLAVE. This can be done by simply using the IP addresses of the MASTER and SLAVE hosts.</span></i></b><br />
<br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">3. How frequently should I set up the replication to happen? Should it be a row level replication or statement level?</span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">Answer : Usually RBR(Row Based Replication) is the safest, however it will fill up the binlogs easily and utilize more disk space, SBR(Statement Based Replication) is the simplest however is vulnerable to create some inconsistencies between MASTER and SLAVE. The best method for binlog-format to use would be "Mixed", wherein both the Statement and Row based methods are used interchangeably by MySQL. To use this method add the following directive "binlog-format=mixed" in the [mysqld] section of the mysql configuration file. You should add this directive while enabling binlogs in Step 1) of the above mentioned replication setup Plans.</span></i></b><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="color: red; font-family: Georgia, Times New Roman, serif;">4. In terms of tuning mysql database, what kind of parameters should I look at. I did copy over the same parameters as that of logout mysql server for now.</span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><b><i>Answer : Generally tuning can be done after observing the server for any bottlenecks for sometime. For safety and to avoid any inconsistent slaves in future enable "read-only" in the SLAVE in the [mysqld] section of the MySQL configuration file. This will not allow any writes to the SLAVE except from the replication threads and users with SUPER privileges. If using "InnoDB" storage engines for all tables configure "innodb_buffer_pool_size" to 70% of the available RAM for eg. "innodb_buffer_pool_size=4G" for a box with 6GB RAM</i></b>. </span><br />
<span style="font-family: Georgia, Times New Roman, serif;"><br /></span>
<span style="color: red; font-family: Georgia, Times New Roman, serif;">5. There is a log table that¹s going to have most of the writes. Apart from that, there are three other tables which are written in to only 10 -</span><br />
<span style="color: red; font-family: Georgia, Times New Roman, serif;">20 % of the time and read most of the time. Negligible or almost no updates or deletes.</span><br />
<b><i><span style="font-family: Georgia, Times New Roman, serif;">Answer : We strongly recommend you to use MySQL or Percona-MySQL 5.5 version. And to convert all the tables to InnoDB storage engine. Percona has an edge over MySQL server in terms of InnoDB performance. InnoDB nowadays not only does "writes" faster but also "reads".</span></i></b><br />
<b><i><br /></i></b>
<b><i><span style="font-family: Georgia, Times New Roman, serif;"><br /></span></i></b>
<b><span style="font-family: Georgia, Times New Roman, serif;">This was a question-answer pattern reply to the client, and hope many of the RemDBA could use the same. However the main essence in the end is still MySQL Replication Recipe in a nutshell (Question-Answer Cookbook style)... :)</span></b></div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-88732420099546467162012-11-19T02:55:00.001-08:002012-11-19T02:55:47.625-08:00Beware !!! Danger ahead "sql_log_bin"... <div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<span style="font-family: inherit;">Some days ago, one of our clients reported that data for present day then, was missing from the slaves, upon querying the table data, we were taken by surprise to find that none of the data was replicated to the slaves, none of the slaves. The large slave lag cleared in seconds, which was not the case earlier with the client generating more than a Gig of data daily. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">To start the investigation we started checking if the relay log is writing the event data properly, checking the relay we saw that the latest data was absent from that. Strange. It was strange because, none of the problems occurred on the slave or master which would have caused the issue, the "exec_master_log_pos" wasn't changing anymore.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">The only thing happened the prior weekend was a table conversion. We had a adequately large, almost 40G table to convert to InnoDB, the client needed timelines for that, hence we started the process on one of the slaves. We got the timeline, client was OK with it, then we proceeded with the master's table. Just as obvious we didn't want this statement to replicate to the slaves, hence asked the upcoming team to turn off "sql_log_bin" before starting the alter. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Boom !!! We had done this before, using sql_log_bin was a cake walk on 5.0 and 5.1, but we were on a Percona 5.5, also we had executed "SET GLOBAL sql_log_bin=OFF", had it been executed on 5.1 or 5.0, this would have returned an error, since its a "SESSION" variable. However in mysql 5.5 this became a "GLOBAL" and "SESSION" variable, all the threads executing DMLs on the servers were not logged in binary logs. The replication had no issues, however it became a pain when we discovered it, and that was almost after a day. </span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">We had no choice but to refresh the slaves, but the data was huge 1.2T, after 3 days of boredom, we finally had the slaves in sync.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">If there was a "Remove feature request" in MySQL or Percona, I would strongly recommend removing the global scope of this configuration.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">As an advice, make a habit of using "SESSION" while turning ON or OFF such params or even while viewing the server STATUS.</span><br />
<span style="font-family: inherit;"><br /></span>
<span style="font-family: inherit;">Happy Replicating...</span><br />
<br />
</div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-37695076830438639672012-08-25T14:07:00.003-07:002012-11-19T02:57:55.625-08:00Speeding Optimize for InnoDB tables<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
<b><u>Speeding up the OPTIMIZE table for InnoDB tables :<o:p></o:p></u></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Recently we learnt about an incident with one of our client,
where we tried to rebuild a huge InnoDB table (180G) to reclaim unused space
using command "<b>ALTER TABLE ... ENGINE = INNODB ;</b>", which took
forever to rebuild even after turning "<b>fast_index_creation</b>"
server variable ON. Let’s dig into details as to why it took such a long time.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>Purpose of fast_index_creation :-</b> If a table is
altered using <b>ALTER TABLE ... ENGINE=INNODB;</b> then this option will
create a temporary table with only clustered index in it, then reload the data
from original table and then create secondary indexes on top of it. However
there is a serious limitation with this server variable, <b><span style="color: red;">this works only for secondary indexes created with
"ALTER TABLE ADD INDEX ..." and "CREATE INDEX indexname
...".</span></b><o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Percona has overcome this limitation by creating a new
server variable known as "<b>expand_fast_index_creation</b>". This
option is set to OFF by default.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Please see the below test results for performance
improvement on using "<b>expand_fast_index_creation</b>" option over
plain "<b>OPTIMIZE TABLE</b>" command. The innodb_buffer_pool_size is
set to 256M.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set expand_fast_index_creation=<b><span style="color: red;">OFF</span></b>;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set profiling=ON;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> optimize table employees;<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
|
Table
| Op | Msg_type |
Msg_text
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize |
note | Table does not support optimize, doing recreate
+ analyze instead |<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize | status |
OK
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
2 rows in set (<b><span style="color: red;">32.42 sec</span></b>)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> show profile;<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
|
Status
| Duration |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
|
starting
| 0.000418 |<o:p></o:p></div>
<div class="MsoNormal">
| checking
permissions | 0.000023 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
tables
| 0.000034 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000016 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000041 |<o:p></o:p></div>
<div class="MsoNormal">
|
init
| 0.000008 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening tables
| 0.000124 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000007 |<o:p></o:p></div>
<div class="MsoNormal">
|
setup
| 0.000043 |<o:p></o:p></div>
<div class="MsoNormal">
| creating
table
| 0.396054 |<o:p></o:p></div>
<div class="MsoNormal">
| After
create
| 0.000090 |<o:p></o:p></div>
<div class="MsoNormal">
<b><span style="color: red;">| copy to tmp
table |
31.557635 |<o:p></o:p></span></b></div>
<div class="MsoNormal">
| rename result
table | 0.373926 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000009 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000002 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000017 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
table
| 0.000121 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.097834 |<o:p></o:p></div>
<div class="MsoNormal">
| query
end
| 0.000007 |<o:p></o:p></div>
<div class="MsoNormal">
| closing
tables
| 0.000003 |<o:p></o:p></div>
<div class="MsoNormal">
| freeing
items
| 0.000035 |<o:p></o:p></div>
<div class="MsoNormal">
| cleaning
up
| 0.000006 |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
22 rows in set (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set profiling=OFF;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As you can notice from the PROFILE that the major part of
time was spent in copying the data to tmp table. This involves inserting
individual records into the tmp file and the corresponding indexes, after this
the index will be sorted.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set expand_fast_index_creation=<b><span style="color: #00b050;">ON</span></b>; #### Server
variable is turned ON #### <o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> show variables like '%expand%';<o:p></o:p></div>
<div class="MsoNormal">
+----------------------------+-------+<o:p></o:p></div>
<div class="MsoNormal">
|
Variable_name
| Value |<o:p></o:p></div>
<div class="MsoNormal">
+----------------------------+-------+<o:p></o:p></div>
<div class="MsoNormal">
| expand_fast_index_creation | ON |<o:p></o:p></div>
<div class="MsoNormal">
+----------------------------+-------+<o:p></o:p></div>
<div class="MsoNormal">
1 row in set (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set profiling=ON;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> optimize table employees;<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
|
Table
| Op | Msg_type |
Msg_text
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize |
note | Table does not support optimize, doing recreate
+ analyze instead |<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize | status |
OK
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
2 rows in set (<b><span style="color: #00b050;">15.31 sec</span></b>)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> show profile;<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+----------+<o:p></o:p></div>
<div class="MsoNormal">
|
Status
| Duration |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+----------+<o:p></o:p></div>
<div class="MsoNormal">
|
starting
| 0.000055 |<o:p></o:p></div>
<div class="MsoNormal">
| checking
permissions | 0.000021 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening tables
| 0.000031 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000014 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000051 |<o:p></o:p></div>
<div class="MsoNormal">
|
init
| 0.000019 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
tables
| 0.000141 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000010 |<o:p></o:p></div>
<div class="MsoNormal">
| setup
| 0.000042 |<o:p></o:p></div>
<div class="MsoNormal">
| creating
table
| 0.334066 |<o:p></o:p></div>
<div class="MsoNormal">
| After
create
| 0.000093 |<o:p></o:p></div>
<div class="MsoNormal">
<b><span style="color: #00b050;">| copy to tmp
table |
9.129202 |<o:p></o:p></span></b></div>
<div class="MsoNormal">
<b><span style="color: #00b050;">| restoring secondary
keys | 5.676336 |<o:p></o:p></span></b></div>
<div class="MsoNormal">
| rename result
table | 0.164292 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000011 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000003 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000021 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
table
| 0.000146 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.003445 |<o:p></o:p></div>
<div class="MsoNormal">
| query
end
| 0.000006 |<o:p></o:p></div>
<div class="MsoNormal">
| closing
tables
| 0.000003 |<o:p></o:p></div>
<div class="MsoNormal">
| freeing
items
| 0.000030 |<o:p></o:p></div>
<div class="MsoNormal">
| cleaning
up
| 0.000005 |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+----------+<o:p></o:p></div>
<div class="MsoNormal">
23 rows in set (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
In the above results we can see the records are first copied
to a tmp table consisting only of clustered index and then secondary indexes
are applied (<b>Restoring secondary keys</b>).<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
However there are some caveats at using this option, one of
which is the <b><span style="color: red;">FOREIGN KEY constraint</span></b>. If
the table has a foreign key constraint, then percona server will consider
"<b><span style="color: red;">expand_fast_index_creation=OFF</span></b>",
since this constraint cannot be disabled. (This is reason we did not notice any
performance improvement while rebuilding the table for one of our client.)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Please see the results below, after a foreign key constraint
was added to the table.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set expand_fast_index_creation=<b><span style="color: #00b050;">ON</span></b>;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> set profiling=ON;<o:p></o:p></div>
<div class="MsoNormal">
Query OK, 0 rows affected (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> optimize table employees;<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
|
Table
| Op | Msg_type |
Msg_text
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize |
note | Table does not support optimize, doing recreate
+ analyze instead |<o:p></o:p></div>
<div class="MsoNormal">
| employees.employees | optimize | status |
OK
|<o:p></o:p></div>
<div class="MsoNormal">
+---------------------+----------+----------+-------------------------------------------------------------------+<o:p></o:p></div>
<div class="MsoNormal">
2 rows in set (<b><span style="color: red;">33.86 sec</span></b>)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
mysql> show profile;<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
|
Status
| Duration |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
|
starting
| 0.000050 |<o:p></o:p></div>
<div class="MsoNormal">
| checking
permissions | 0.000022 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
tables
| 0.000031 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000014 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000071 |<o:p></o:p></div>
<div class="MsoNormal">
|
init
| 0.000009 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
tables
| 0.000121 |<o:p></o:p></div>
<div class="MsoNormal">
| System
lock
| 0.000006 |<o:p></o:p></div>
<div class="MsoNormal">
|
setup
| 0.000049 |<o:p></o:p></div>
<div class="MsoNormal">
| creating
table
| 0.353180 |<o:p></o:p></div>
<div class="MsoNormal">
| After
create
| 0.000101 |<o:p></o:p></div>
<div class="MsoNormal">
<b><span style="color: red;">| copy to tmp
table |
33.049338 |<o:p></o:p></span></b></div>
<div class="MsoNormal">
| rename result
table | 0.460403 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000010 |<o:p></o:p></div>
<div class="MsoNormal">
| Waiting for query cache lock | 0.000002 |<o:p></o:p></div>
<div class="MsoNormal">
|
end
| 0.000019 |<o:p></o:p></div>
<div class="MsoNormal">
| Opening
table
| 0.000162 |<o:p></o:p></div>
<div class="MsoNormal">
| System lock
|
0.004451 |<o:p></o:p></div>
<div class="MsoNormal">
| query
end
| 0.000004 |<o:p></o:p></div>
<div class="MsoNormal">
| closing
tables
| 0.000002 |<o:p></o:p></div>
<div class="MsoNormal">
| freeing
items
| 0.000021 |<o:p></o:p></div>
<div class="MsoNormal">
| cleaning
up
| 0.000003 |<o:p></o:p></div>
<div class="MsoNormal">
+------------------------------+-----------+<o:p></o:p></div>
<div class="MsoNormal">
22 rows in set (0.00 sec)<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
As seen above there are no performance improvements while
the table had foreign key constraint. Disabling through
"foreign_key_checks" didnt work either. Same case is seen in terms of
"<b><span style="color: red;">UNIQUE</span></b>" keys.<o:p></o:p></div>
<div class="MsoNormal">
<br /></div>
</div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0tag:blogger.com,1999:blog-9117600489562432278.post-11322462352178598862012-06-01T00:06:00.002-07:002012-11-19T02:59:01.241-08:00MySQL InnoDB disk and memory layout<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
<b><u>InnoDB Storage engine Disk and Memory Layout :<o:p></o:p></u></b></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgjWu3dRvtbsGC4Asa5yR7tiIrf7RkeAbLGo6gYIEfdI0MkKPVWB-ZEUjMiR0sR5SeWM8rb2cmwE6zO09_5DyM3jRvJEDd54IQroL1VhQjrtcX0By_sdOMe2jbqp8XF7-V8t9upZy3AyE/s1600/Innodb_architecture.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="452" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEhgjWu3dRvtbsGC4Asa5yR7tiIrf7RkeAbLGo6gYIEfdI0MkKPVWB-ZEUjMiR0sR5SeWM8rb2cmwE6zO09_5DyM3jRvJEDd54IQroL1VhQjrtcX0By_sdOMe2jbqp8XF7-V8t9upZy3AyE/s640/Innodb_architecture.png" width="640" /></a></div>
<div class="MsoNormal">
<br />
<!--[endif]--></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
InnoDB is one of the most important storage engines in
MySQL. Due to its <b>transactional capabilities, locking levels and foreign key
support</b> it has become one of the widely used storage engines for MySQL.
However unlike MyISAM, InnoDB is fairly complex in its architecture. Let's review
how the architecture looks like on disk and memory (RAM) subsystem. The
following components are the most important in InnoDB :</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>1)</b> InnoDB buffer pool</div>
<div class="MsoNormal">
<b>2)</b> Transaction log buffer</div>
<div class="MsoNormal">
<b>3)</b> InnoDB IO threads</div>
<div class="MsoNormal">
<b>4)</b> Transaction Log files</div>
<div class="MsoNormal">
<b>5)</b> Table-space files</div>
<div class="MsoNormal">
<b>6)</b> Datafiles</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To demonstrate the use of all the components, let's take a
simple batch of Insert statement into consideration. The statement would be <b>"INSERT
INTO employee.employees (empno, empname, sal, hiredate, dept) VALUES (1,
'Akshay', 'XXXXXX','01-12-2011', 'MySQL')".</b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
So the above statement makes it through components at the
server level like we referred earlier in our previous document. Likewise coming
from Client, scanning the Query cache, parsing, pre-processing, optimizing and
then finally to the storage engine, let's see what happens further at the
storage engine level :</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>1)</b> Once the statement enters the InnoDB kernel,
innodb checks whether the requested data "<b>page</b>" exist in the
"<b>BUFFER POOL</b>". The buffer pool contains all the data pages
which needs to be changed (INSERT, UPDATE and DELETE) or read ( SELECT). <b>It
will contain both the Index as well as Data Pages</b>. So the INSERT statement
above will check if the page in which record with values "(1, 'Akshay',
'XXXXXX','01-12-2011', 'MySQL')" needs to be inserted already exist in the
pool, if it finds the page it will make the changes to page or if it doesnt
find the page it will read from the Disk (datafile) in to the memory (Buffer
pool) and then change it. InnoDB Buffer pool is most important memory structure
in InnoDB, and is set using "<b>innodb_buffer_pool_size</b>"
variable. Usually this is set to around 50-80% of the total RAM.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>2)</b> Once the data and index pages are changed in the
buffer pool the pages are marked "DIRTY" and the INSERT statement is
logged in the <b>transaction log buffer</b>. The function of transaction log
buffer is very trivial in an RDBMS. Later the contents of transaction log
buffer are written to the transaction log files (specifically on COMMITs). Lets
summarize the document later with note on understanding transaction logs.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>3)</b> InnoDB IO threads are internal to InnoDB kernel
and not related to any connection threads or O.S. threads (<b>InnoDB IO threads
works at the storage engine layer, whereas Connection threads works at MySQL
Server layer</b>). These IO threads (mainly known as Innodb Read threads and
Write threads) does the job of writing <b>DIRTY</b> pages to the disk files
from buffer pool and log buffer and reading pages from the files. Hence the
above INSERT's data will written to the disk by one of the IO threads. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>4)</b> Transaction log files contains the contents from
transaction log buffer on a durable media (Hard disks). <b>It's used for
transaction recovery during Instance crash (We will visit Instance crash in
next sessions) and for POINT IN TIME RECOVERY</b>. These files can be found in
MySQL datadir namely "<b>ib_logfile0</b>" and "<b>ib_logfile1</b>".
The files are used in a circular fashion, like, initially innodb will start
filling up "ib_logfile0" and then "ib_logfile1".</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>5)</b> Table-space files usually named as "<b>ibdata1</b>"
files are used for multiple purposes. It stores the actual table and index data
(if "<b>innodb_file_per_table</b>" is disabled), data-dictionary (<b>meta-data
about Innodb tables</b>) and the undo-logs (<b>used for ROLLBACK</b>). So the
DIRTY pages from BUFFER POOL will be written to the table-space files by the IO
threads. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>6)</b> Datafiles are the files created when "<b>innodb_file_per_table</b>"
is set. These files have filenames like <b><table_name>.ibd</b>. These
files <b>contains index as well as actual data</b> of the tables. These files
allow easy maintenance as compared single tablespace file due to its size
considerations. Every table will have its own <b>.ibd</b> file created in its
respective data directory.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><span style="color: red;">InnoDB uses its log to reduce the
cost of committing transactions. Instead of flushing the buffer pool to disk
when each transaction commits, it logs the transactions. The changes
transactions make to data and indexes often map to random locations in the
tablespace, so flushing these changes to disk would require random I/O. InnoDB
assumes it’s using conventional disks, where random I/O is much more expensive
than sequential I/O because of the time it takes to seek to the correct
location on disk and wait for the desired part of the disk to rotate under the
head.<o:p></o:p></span></b></div>
<div class="MsoNormal">
<b><span style="color: red;">InnoDB uses its log to convert
this random disk I/O into sequential I/O. Once the log is safely on disk, the
transactions are permanent, even though the changes haven’t been written to the
data files yet. If something bad happens (such as a power failure), InnoDB can
replay the log and recover the committed transactions.<o:p></o:p></span></b></div>
<div class="MsoNormal">
<b><span style="color: red;">Of course, InnoDB does ultimately
have to write the changes to the data files, because the log has a fixed size.
It writes to the log in a circular fashion: when it reaches the end of the log,
it wraps around to the beginning. It can’t overwrite a log record if the
changes contained there haven’t been applied to the data files, because this
would erase the only permanent record of the committed transaction.<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
To understand this more deeply, please read about difference
between Random I/O and Sequential I/O. There are some SQL statements which can
force such an I/O, take it as a Homework to find such Statements. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<span style="color: #1f497d;"><br /></span></div>
<div class="MsoNormal">
<span style="color: #1f497d;">Courtesy : High Performance
MySQL <o:p></o:p></span></div>
</div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com6tag:blogger.com,1999:blog-9117600489562432278.post-9255135713617535312012-06-01T00:04:00.002-07:002012-11-19T02:59:50.398-08:00Walk through of a simple SELECT (MySQL way)<div dir="ltr" style="text-align: left;" trbidi="on">
<br />
<div class="MsoNormal">
<b>Journey of a SQL SELECT statement in MySQL RDBMS engine :<o:p></o:p></b></div>
<div class="MsoNormal">
<br /></div>
<div class="separator" style="clear: both; text-align: center;">
<a href="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEije4s-MWstmrNdeY60eBHqlLObsYUJ2P8NKWVOcb6nR4oKXFBEBlC61r2CtnB-dEsQPcr7aS749eAnwzZxjYAzohyphenhyphenLgd0inlVWLepaiEuJQbknUWS60fiTrv2hxy3ktD3iRxNHyBiLsus/s1600/mysql_architecture_sql_way.png" imageanchor="1" style="margin-left: 1em; margin-right: 1em;"><img border="0" height="561" src="https://blogger.googleusercontent.com/img/b/R29vZ2xl/AVvXsEije4s-MWstmrNdeY60eBHqlLObsYUJ2P8NKWVOcb6nR4oKXFBEBlC61r2CtnB-dEsQPcr7aS749eAnwzZxjYAzohyphenhyphenLgd0inlVWLepaiEuJQbknUWS60fiTrv2hxy3ktD3iRxNHyBiLsus/s640/mysql_architecture_sql_way.png" width="640" /></a></div>
<div class="MsoNormal">
<br />
<!--[endif]--></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Understanding the basics of a query execution is key to
success in understanding core concepts of any DBMS engine. Let's examine how a
simple SELECT finds its way through most of the component (at a Higher level)
to a print its output to the screen (stdout). The MySQL architecture consists
of the following major components :</div>
<div class="MsoNormal">
<b>1) MySQL Client<o:p></o:p></b></div>
<div class="MsoNormal">
<b>2) Query Cache<o:p></o:p></b></div>
<div class="MsoNormal">
<b>3) Parser<o:p></o:p></b></div>
<div class="MsoNormal">
<b>4) Preprocessor<o:p></o:p></b></div>
<div class="MsoNormal">
<b>5) Query optimizer<o:p></o:p></b></div>
<div class="MsoNormal">
<b>6) Query execution engine<o:p></o:p></b></div>
<div class="MsoNormal">
<b>7) Storage engine APIs<o:p></o:p></b></div>
<div class="MsoNormal">
<b>8) Data<o:p></o:p></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>a)</b> Suppose we want to find all details of an employee
having ID 999. A simple SQL query might look something like this <b>"SELECT
* FROM employee.employees WHERE id = 999;"</b>. Now to execute the query
we would want to start a MySQL client session and run the query (MySQL Client
is a utility to execute all MySQL commands and return an output).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>b)</b> MySQL Client once started will show a prompt
something like "mysql >". When a client session is established a <b>"MySQL
THREAD"</b> is created in MySQL to handle all the command executions and
sessions. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>c)</b> Upon entering the above SELECT query, the query is
submitted to the MySQL server which is a "mysqld" process running on
the remote or local server (<b>mysql process is a client and mysqld process is
a server in this case and they need not be on the same host</b>).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>d)</b> Once entering these server process the query is
checked against a list of previously executed SELECTs maintained by MySQL DBMS
engine in an area of memory known as "<b>QUERY CACHE</b>". Query
cache holds the output of SELECT statements (exactly the same statements)
executed earlier. So if another user had executed "SELECT * FROM
employee.employees WHERE id = 999;" previously, the result set will be
stored in the Query cache. And if we again issue the same query to MySQL again
it will directly output the result set stored in Query cache. This saves MySQL
from doing any work of parsing, preprocessing and retrieving data from disk or
memory resulting in lightning fast output. However there are some limitations
which can be studied in depth later.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>e)</b> If the same exact query is not present in QUERY
CACHE, MySQL will move to the next component i.e "<b>PARSER</b>" to
parse the Query Syntax. Syntax is very important part of query execution. What
if the query we executed was something like this "<b>SELECT *
employee.employees WHERE id = 999;</b>", MySQL parser will parse the query
thoroughly to find out that the "<b>FROM</b>" is missing in the
statement and immediately return us an error. Parsing at such an early stage in
executions avoids MySQL from consuming resources to scan the tables and object
on disk as well as memory.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>f)</b> Once the syntax checking is done (and hopefully the
syntax is correct), MySQL will move onto next component i.e. "<b>QUERY
PREPROCESSOR</b>". As the name suggest, the preprocessor will check the
GRANTS of user on the objects he/she wants to access through the query (We want
to retrieve data from <b>`employee` database's `employees` table</b>). If the
user doesn't have the appropriate GRANTS, MySQL will report an error. Also the
preprocessor is responsible for object checking (Checking to see if `employee`
database exist and `employees` table exist), in case object specified doesnt
exist MySQL will report an error.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>g)</b> After preprocessing if we have the required GRANTS
and the also object exists, then MySQL will take the query to the most
important component i.e "<b>QUERY OPTIMIZER</b>". This is the place
where MySQL will decide on the execution plan to access and retrieve the data
as fast as possible. Query Optimizer is responsible for checking which indexes
exist on the specified table and if any exist whether it can use the index to
retrieve data faster. An index here on `id` column of `employees` table will
help the storage engine layer to locate and retrieve data faster. Once
confirmed MySQL will create an "<b>EXECUTION PLAN</b>" and pass on
the plan to "<b>Storage engine</b>" layer.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b><span style="color: red;">NOTE : All of the above steps are
independent of the Storage engine used by MySQL. Hence we can refer the above
components as part of "SERVER layer" and below referred components as
the "STORAGE engine layer". This is one of biggest advantage of MySQL
over any RDBMS product available. For more info on Storage engines refer the
MySQL reference manual.<o:p></o:p></span></b></div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>h)</b> Moving ahead, the <b>STORAGE engine APIs</b> now
have the query execution plan, which it can use to access and retrieve the
required indexes into the memory and also retrieve the "DATA" from
exact memory address specified in the indexes. This index lookup in memory is
of magnitudes faster as compared to random disk access. </div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
<b>i)</b> DATA can be stored differently depending upon the
storage engine used, for eg <b>: MyISAM and InnoDB storage engines will have
DATA stored in files (on DISK), whereas Memory storage engine will store data
in Memory (RAM)</b>.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Once the desired data block is located on the disk by the
Storage engine API, MySQL or O.S. will cache the data in memory (RAM) and
display the result set to client's screen (stdout).</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Please refer to the diagram above.</div>
<div class="MsoNormal">
<br /></div>
<div class="MsoNormal">
Processing a SELECT is different from an INSERT, UPDATE and
DELETE, hence do not refer this for the same. Hope you found it useful.</div>
<div class="MsoNormal">
<span style="color: #1f497d;"><br /></span></div>
</div>
Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com2tag:blogger.com,1999:blog-9117600489562432278.post-59888378161966121432011-11-16T08:26:00.000-08:002011-11-16T08:26:31.389-08:00Profiling MySQL<div dir="ltr" style="text-align: left;" trbidi="on"><br />
MySQL lacks in providing a good profiling support for its RDBMS. But we all have found some hacks around it. Here are some of them as follows :<br />
<br />
1 General query log<br />
This is the most basic and the best profiling tool for MySQL as it records every statement executed at mysqld (Be it an erroneous statement). Its very helpful for finding out the type of statements executed on the database. But the problem lies in enabling the general log. The mysqld needs a restart, which is not a good option in production systems. Steps to enable general query log can be found in the MySQL reference manual. Also general query log fails to record the hostname or IP of the clients connecting MySQL server.<br />
<br />
2 Show Processlist<br />
This can be a good profiling tool except that it catches data about queries upon quering it. So in a high concurrency environment it becomes difficult to catch the data through "Show Processlist". Anyways these are some common hacks at using this handy command.<br />
<br />
$ watch -n 1 'mysql -u <user> -p<password> -h <hostname> -e "Show Processlist"'<br />
<br />
$ watch -n .5 'mysqladmin -u <user> -p<password> -h <hostname> --verbose processlist'<br />
<br />
Well the above hacks will surely get the data but at the expense of new connections every time "watch" executes (for eg in an hour we might have threads_connected increased by 7200), that is a problem isnt it.<br />
<br />
Or else<br />
<br />
$ mysqladmin -u <user> -p<password> -ri 1 --verbose -processlist<br />
<br />
This hack will get you the data like the other iterative operations but the connection will be persistent (Hmmm this one is good, isnt it ?). However we miss the sub-millisecond interval like the slow query logs, dont we ? In high concurrency environment scripting this hack will make you miss most of the statements.<br />
<br />
The processlist generated above can be piped through "sed" (to remove the "|" "+" "-") and redirected to a file. This file can be used for future reference.<br />
<br />
3 Using the Slow Query Log<br />
This is the most simple yet powerful hack. Most of the production systems are enabled with Slow Query Log. The default long_query_time is 10 seconds but can be dyanamically set to 0 seconds. Newer versions of MySQL do come with sub-milliseconds long_query_time, but 0 is sufficient for our purpose. This can be very handy and nifty for the purpose of profiling. Also one can use mk-query-digest to dig into this log. The only disadvantage in this approach as compared to general query log being erroneous statements are missed. Thats not a problem, is it.<br />
<br />
To conclude these are some common hacks which can be done at a very low or no setup cost, with little performance degradation though. Many options exist including commercial tools like Jet-Profiler, MySQL Enterprise monitor or open source tools like innotop , mytop, Maatkit for the purpose of profiling MySQL.<br />
<br />
</div>Akshay Suryavanshihttp://www.blogger.com/profile/11486438095809406675noreply@blogger.com0