Monday, 19 November 2012

Beware !!! Danger ahead "sql_log_bin"...


Some days ago, one of our clients reported that data for present day then, was missing from the slaves, upon querying the table data, we were taken by surprise to find that none of the data was replicated to the slaves, none of the slaves. The large slave lag cleared in seconds, which was not the case earlier with the client generating more than a Gig of data daily. 

To start the investigation we started checking if the relay log is writing the event data properly, checking the relay we saw that the latest data was absent from that. Strange. It was strange because, none of the problems occurred on the slave or master which would have caused the issue, the "exec_master_log_pos" wasn't changing anymore.

The only thing happened the prior weekend was a table conversion. We had a adequately large, almost 40G table to convert to InnoDB, the client needed timelines for that, hence we started the process on one of the slaves. We got the timeline, client was OK with it, then we proceeded with the master's table. Just as obvious we didn't want this statement to replicate to the slaves, hence asked the upcoming team to turn off "sql_log_bin" before starting the alter. 

Boom !!! We had done this before, using sql_log_bin was a cake walk on 5.0 and 5.1, but we were on a Percona 5.5, also we had executed "SET GLOBAL sql_log_bin=OFF", had it been executed on 5.1 or 5.0, this would have returned an error, since its a "SESSION" variable. However in mysql 5.5 this became a "GLOBAL" and "SESSION" variable, all the threads executing DMLs on the servers were not logged in binary logs. The replication had no issues, however it became a pain when we discovered it, and that was almost after a day. 

We had no choice but to refresh the slaves, but the data was huge 1.2T, after 3 days of boredom, we finally had the slaves in sync.

If there was a "Remove feature request" in MySQL or Percona, I would strongly recommend removing the global scope of this configuration.

As an advice, make a habit of using "SESSION" while turning ON or OFF such params or even while viewing the server STATUS.

Happy Replicating...

Saturday, 25 August 2012

Speeding Optimize for InnoDB tables


Speeding up the OPTIMIZE table for InnoDB tables :

Recently we learnt about an incident with one of our client, where we tried to rebuild a huge InnoDB table (180G) to reclaim unused space using command "ALTER TABLE ... ENGINE = INNODB ;", which took forever to rebuild even after turning "fast_index_creation" server variable ON. Let’s dig into details as to why it took such a long time.

Purpose of fast_index_creation :- If a table is altered using ALTER TABLE ... ENGINE=INNODB; then this option will create a temporary table with only clustered index in it, then reload the data from original table and then create secondary indexes on top of it. However there is a serious limitation with this server variable, this works only for secondary indexes created with "ALTER TABLE ADD INDEX ..." and "CREATE INDEX indexname ...".

Percona has overcome this limitation by creating a new server variable known as "expand_fast_index_creation". This option is set to OFF by default.

Please see the below test results for performance improvement on using "expand_fast_index_creation" option over plain "OPTIMIZE TABLE" command. The innodb_buffer_pool_size is set to 256M.

mysql> set expand_fast_index_creation=OFF;
Query OK, 0 rows affected (0.00 sec)

mysql> set profiling=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> optimize table employees;
+---------------------+----------+----------+-------------------------------------------------------------------+
| Table               | Op       | Msg_type | Msg_text                                                          |
+---------------------+----------+----------+-------------------------------------------------------------------+
| employees.employees | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| employees.employees | optimize | status   | OK                                                                |
+---------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (32.42 sec)

mysql> show profile;
+------------------------------+-----------+
| Status                       | Duration  |
+------------------------------+-----------+
| starting                     |  0.000418 |
| checking permissions         |  0.000023 |
| Opening tables               |  0.000034 |
| System lock                  |  0.000016 |
| Waiting for query cache lock |  0.000041 |
| init                         |  0.000008 |
| Opening tables               |  0.000124 |
| System lock                  |  0.000007 |
| setup                        |  0.000043 |
| creating table               |  0.396054 |
| After create                 |  0.000090 |
| copy to tmp table            | 31.557635 |
| rename result table          |  0.373926 |
| end                          |  0.000009 |
| Waiting for query cache lock |  0.000002 |
| end                          |  0.000017 |
| Opening table                |  0.000121 |
| System lock                  |  0.097834 |
| query end                    |  0.000007 |
| closing tables               |  0.000003 |
| freeing items                |  0.000035 |
| cleaning up                  |  0.000006 |
+------------------------------+-----------+
22 rows in set (0.00 sec)

mysql> set profiling=OFF;
Query OK, 0 rows affected (0.00 sec)

As you can notice from the PROFILE that the major part of time was spent in copying the data to tmp table. This involves inserting individual records into the tmp file and the corresponding indexes, after this the index will be sorted.

mysql> set expand_fast_index_creation=ON;     #### Server variable is turned ON ####
Query OK, 0 rows affected (0.00 sec)

mysql> show variables like '%expand%';
+----------------------------+-------+
| Variable_name              | Value |
+----------------------------+-------+
| expand_fast_index_creation | ON    |
+----------------------------+-------+
1 row in set (0.00 sec)

mysql> set profiling=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> optimize table employees;
+---------------------+----------+----------+-------------------------------------------------------------------+
| Table               | Op       | Msg_type | Msg_text                                                          |
+---------------------+----------+----------+-------------------------------------------------------------------+
| employees.employees | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| employees.employees | optimize | status   | OK                                                                |
+---------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (15.31 sec)

mysql> show profile;
+------------------------------+----------+
| Status                       | Duration |
+------------------------------+----------+
| starting                     | 0.000055 |
| checking permissions         | 0.000021 |
| Opening tables               | 0.000031 |
| System lock                  | 0.000014 |
| Waiting for query cache lock | 0.000051 |
| init                         | 0.000019 |
| Opening tables               | 0.000141 |
| System lock                  | 0.000010 |
| setup                        | 0.000042 |
| creating table               | 0.334066 |
| After create                 | 0.000093 |
| copy to tmp table            | 9.129202 |
| restoring secondary keys     | 5.676336 |
| rename result table          | 0.164292 |
| end                          | 0.000011 |
| Waiting for query cache lock | 0.000003 |
| end                          | 0.000021 |
| Opening table                | 0.000146 |
| System lock                  | 0.003445 |
| query end                    | 0.000006 |
| closing tables               | 0.000003 |
| freeing items                | 0.000030 |
| cleaning up                  | 0.000005 |
+------------------------------+----------+
23 rows in set (0.00 sec)

In the above results we can see the records are first copied to a tmp table consisting only of clustered index and then secondary indexes are applied (Restoring secondary keys).

However there are some caveats at using this option, one of which is the FOREIGN KEY constraint. If the table has a foreign key constraint, then percona server will consider "expand_fast_index_creation=OFF", since this constraint cannot be disabled. (This is reason we did not notice any performance improvement while rebuilding the table for one of our client.)

Please see the results below, after a foreign key constraint was added to the table.

mysql> set expand_fast_index_creation=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> set profiling=ON;
Query OK, 0 rows affected (0.00 sec)

mysql> optimize table employees;
+---------------------+----------+----------+-------------------------------------------------------------------+
| Table               | Op       | Msg_type | Msg_text                                                          |
+---------------------+----------+----------+-------------------------------------------------------------------+
| employees.employees | optimize | note     | Table does not support optimize, doing recreate + analyze instead |
| employees.employees | optimize | status   | OK                                                                |
+---------------------+----------+----------+-------------------------------------------------------------------+
2 rows in set (33.86 sec)

mysql> show profile;
+------------------------------+-----------+
| Status                       | Duration  |
+------------------------------+-----------+
| starting                     |  0.000050 |
| checking permissions         |  0.000022 |
| Opening tables               |  0.000031 |
| System lock                  |  0.000014 |
| Waiting for query cache lock |  0.000071 |
| init                         |  0.000009 |
| Opening tables               |  0.000121 |
| System lock                  |  0.000006 |
| setup                        |  0.000049 |
| creating table               |  0.353180 |
| After create                 |  0.000101 |
| copy to tmp table            | 33.049338 |
| rename result table          |  0.460403 |
| end                          |  0.000010 |
| Waiting for query cache lock |  0.000002 |
| end                          |  0.000019 |
| Opening table                |  0.000162 |
| System lock                  |  0.004451 |
| query end                    |  0.000004 |
| closing tables               |  0.000002 |
| freeing items                |  0.000021 |
| cleaning up                  |  0.000003 |
+------------------------------+-----------+
22 rows in set (0.00 sec)

As seen above there are no performance improvements while the table had foreign key constraint. Disabling through "foreign_key_checks" didnt work either. Same case is seen in terms of "UNIQUE" keys.

Friday, 1 June 2012

MySQL InnoDB disk and memory layout


InnoDB Storage engine Disk and Memory Layout :



InnoDB is one of the most important storage engines in MySQL. Due to its transactional capabilities, locking levels and foreign key support it has become one of the widely used storage engines for MySQL. However unlike MyISAM, InnoDB is fairly complex in its architecture. Let's review how the architecture looks like on disk and memory (RAM) subsystem. The following components are the most important in InnoDB :

1) InnoDB buffer pool
2) Transaction log buffer
3) InnoDB IO threads
4) Transaction Log files
5) Table-space files
6) Datafiles

To demonstrate the use of all the components, let's take a simple batch of Insert statement into consideration. The statement would be "INSERT INTO employee.employees (empno, empname, sal, hiredate, dept) VALUES (1, 'Akshay', 'XXXXXX','01-12-2011', 'MySQL')".

So the above statement makes it through components at the server level like we referred earlier in our previous document. Likewise coming from Client, scanning the Query cache, parsing, pre-processing, optimizing and then finally to the storage engine, let's see what happens further at the storage engine level :

1) Once the statement enters the InnoDB kernel, innodb checks whether the requested data "page" exist in the "BUFFER POOL". The buffer pool contains all the data pages which needs to be changed (INSERT, UPDATE and DELETE) or read ( SELECT). It will contain both the Index as well as Data Pages. So the INSERT statement above will check if the page in which record with values "(1, 'Akshay', 'XXXXXX','01-12-2011', 'MySQL')" needs to be inserted already exist in the pool, if it finds the page it will make the changes to page or if it doesnt find the page it will read from the Disk (datafile) in to the memory (Buffer pool) and then change it. InnoDB Buffer pool is most important memory structure in InnoDB, and is set using "innodb_buffer_pool_size" variable. Usually this is set to around 50-80% of the total RAM.

2) Once the data and index pages are changed in the buffer pool the pages are marked "DIRTY" and the INSERT statement is logged in the transaction log buffer. The function of transaction log buffer is very trivial in an RDBMS. Later the contents of transaction log buffer are written to the transaction log files (specifically on COMMITs). Lets summarize the document later with note on understanding transaction logs.

3) InnoDB IO threads are internal to InnoDB kernel and not related to any connection threads or O.S. threads (InnoDB IO threads works at the storage engine layer, whereas Connection threads works at MySQL Server layer). These IO threads (mainly known as Innodb Read threads and Write threads) does the job of writing DIRTY pages to the disk files from buffer pool and log buffer and reading pages from the files. Hence the above INSERT's data will written to the disk by one of the IO threads.

4) Transaction log files contains the contents from transaction log buffer on a durable media (Hard disks). It's used for transaction recovery during Instance crash (We will visit Instance crash in next sessions) and for POINT IN TIME RECOVERY. These files can be found in MySQL datadir namely "ib_logfile0" and "ib_logfile1". The files are used in a circular fashion, like, initially innodb will start filling up "ib_logfile0" and then "ib_logfile1".

5) Table-space files usually named as "ibdata1" files are used for multiple purposes. It stores the actual table and index data (if "innodb_file_per_table" is disabled), data-dictionary (meta-data about Innodb tables) and the undo-logs (used for ROLLBACK). So the DIRTY pages from BUFFER POOL will be written to the table-space files by the IO threads.

6) Datafiles are the files created when "innodb_file_per_table" is set. These files have filenames like <table_name>.ibd. These files contains index as well as actual data of the tables. These files allow easy maintenance as compared single tablespace file due to its size considerations. Every table will have its own .ibd file created in its respective data directory.

InnoDB uses its log to reduce the cost of committing transactions. Instead of flushing the buffer pool to disk when each transaction commits, it logs the transactions. The changes transactions make to data and indexes often map to random locations in the tablespace, so flushing these changes to disk would require random I/O. InnoDB assumes it’s using conventional disks, where random I/O is much more expensive than sequential I/O because of the time it takes to seek to the correct location on disk and wait for the desired part of the disk to rotate under the head.
InnoDB uses its log to convert this random disk I/O into sequential I/O. Once the log is safely on disk, the transactions are permanent, even though the changes haven’t been written to the data files yet. If something bad happens (such as a power failure), InnoDB can replay the log and recover the committed transactions.
Of course, InnoDB does ultimately have to write the changes to the data files, because the log has a fixed size. It writes to the log in a circular fashion: when it reaches the end of the log, it wraps around to the beginning. It can’t overwrite a log record if the changes contained there haven’t been applied to the data files, because this would erase the only permanent record of the committed transaction.

To understand this more deeply, please read about difference between Random I/O and Sequential I/O. There are some SQL statements which can force such an I/O, take it as a Homework to find such Statements.


Courtesy : High Performance MySQL 

Walk through of a simple SELECT (MySQL way)


Journey of a SQL SELECT statement in MySQL RDBMS engine :



Understanding the basics of a query execution is key to success in understanding core concepts of any DBMS engine. Let's examine how a simple SELECT finds its way through most of the component (at a Higher level) to a print its output to the screen (stdout). The MySQL architecture consists of the following major components :
1) MySQL Client
2) Query Cache
3) Parser
4) Preprocessor
5) Query optimizer
6) Query execution engine
7) Storage engine APIs
8) Data

a) Suppose we want to find all details of an employee having ID 999. A simple SQL query might look something like this "SELECT * FROM employee.employees WHERE id = 999;". Now to execute the query we would want to start a MySQL client session and run the query (MySQL Client is a utility to execute all MySQL commands and return an output).

b) MySQL Client once started will show a prompt something like "mysql >". When a client session is established a "MySQL THREAD" is created in MySQL to handle all the command executions and sessions.

c) Upon entering the above SELECT query, the query is submitted to the MySQL server which is a "mysqld" process running on the remote or local server (mysql process is a client and mysqld process is a server in this case and they need not be on the same host).

d) Once entering these server process the query is checked against a list of previously executed SELECTs maintained by MySQL DBMS engine in an area of memory known as "QUERY CACHE". Query cache holds the output of SELECT statements (exactly the same statements) executed earlier. So if another user had executed "SELECT * FROM employee.employees WHERE id = 999;" previously, the result set will be stored in the Query cache. And if we again issue the same query to MySQL again it will directly output the result set stored in Query cache. This saves MySQL from doing any work of parsing, preprocessing and retrieving data from disk or memory resulting in lightning fast output. However there are some limitations which can be studied in depth later.

e) If the same exact query is not present in QUERY CACHE, MySQL will move to the next component i.e "PARSER" to parse the Query Syntax. Syntax is very important part of query execution. What if the query we executed was something like this "SELECT * employee.employees WHERE id = 999;", MySQL parser will parse the query thoroughly to find out that the "FROM" is missing in the statement and immediately return us an error. Parsing at such an early stage in executions avoids MySQL from consuming resources to scan the tables and object on disk as well as memory.

f) Once the syntax checking is done (and hopefully the syntax is correct), MySQL will move onto next component i.e. "QUERY PREPROCESSOR". As the name suggest, the preprocessor will check the GRANTS of user on the objects he/she wants to access through the query (We want to retrieve data from `employee` database's `employees` table). If the user doesn't have the appropriate GRANTS, MySQL will report an error. Also the preprocessor is responsible for object checking (Checking to see if `employee` database exist and `employees` table exist), in case object specified doesnt exist MySQL will report an error.

g) After preprocessing if we have the required GRANTS and the also object exists, then MySQL will take the query to the most important component i.e "QUERY OPTIMIZER". This is the place where MySQL will decide on the execution plan to access and retrieve the data as fast as possible. Query Optimizer is responsible for checking which indexes exist on the specified table and if any exist whether it can use the index to retrieve data faster. An index here on `id` column of `employees` table will help the storage engine layer to locate and retrieve data faster. Once confirmed MySQL will create an "EXECUTION PLAN" and pass on the plan to "Storage engine" layer.

NOTE : All of the above steps are independent of the Storage engine used by MySQL. Hence we can refer the above components as part of "SERVER layer" and below referred components as the "STORAGE engine layer". This is one of biggest advantage of MySQL over any RDBMS product available. For more info on Storage engines refer the MySQL reference manual.

h) Moving ahead, the STORAGE engine APIs now have the query execution plan, which it can use to access and retrieve the required indexes into the memory and also retrieve the "DATA" from exact memory address specified in the indexes. This index lookup in memory is of magnitudes faster as compared to random disk access.

i) DATA can be stored differently depending upon the storage engine used, for eg : MyISAM and InnoDB storage engines will have DATA stored in files (on DISK), whereas Memory storage engine will store data in Memory (RAM).

Once the desired data block is located on the disk by the Storage engine API, MySQL or O.S. will cache the data in memory (RAM) and display the result set to client's screen (stdout).

Please refer to the diagram above.

Processing a SELECT is different from an INSERT, UPDATE and DELETE, hence do not refer this for the same. Hope you found it useful.

Wednesday, 16 November 2011

Profiling MySQL


MySQL lacks in providing a good profiling support for its RDBMS. But we all have found some hacks around it. Here are some of them as follows :

1 General query log
This is the most basic and the best profiling tool for MySQL as it records every statement executed at mysqld (Be it an erroneous statement). Its very helpful for finding out the type of statements executed on the database. But the problem lies in enabling the general log. The mysqld needs a restart, which is not a good option in production systems. Steps to enable general query log can be found in the MySQL reference manual. Also general query log fails to record the hostname or IP of the clients connecting MySQL server.

2 Show Processlist
This can be a good profiling tool except that it catches data about queries upon quering it. So in a high concurrency environment it becomes difficult to catch the data through "Show Processlist". Anyways these are some common hacks at using this handy command.

$ watch -n 1 'mysql -u <user> -p<password> -h <hostname> -e "Show Processlist"'

$ watch -n .5 'mysqladmin -u <user> -p<password> -h <hostname> --verbose processlist'

Well the above hacks will surely get the data but at the expense of new connections every time "watch" executes (for eg in an hour we might have threads_connected increased by 7200), that is a problem isnt it.

Or else

$ mysqladmin -u <user> -p<password> -ri 1 --verbose -processlist

This hack will get you the data like the other iterative operations but the connection will be persistent (Hmmm this one is good, isnt it ?). However we miss the sub-millisecond interval like the slow query logs, dont we ? In high concurrency environment scripting this hack will make you miss most of the statements.

The processlist generated above can be piped through "sed" (to remove the "|" "+" "-") and redirected to a file. This file can be used for future reference.

3 Using the Slow Query Log
This is the most simple yet powerful hack. Most of the production systems are enabled with Slow Query Log. The default long_query_time is 10 seconds but can be dyanamically set to 0 seconds. Newer versions of MySQL do come with sub-milliseconds long_query_time, but 0 is sufficient for our purpose. This can be very handy and nifty for the purpose of profiling. Also one can use mk-query-digest to dig into this log. The only disadvantage in this approach as compared to general query log being erroneous statements are missed. Thats not a problem, is it.

To conclude these are some common hacks which can be done at a very low or no setup cost, with little performance degradation though. Many options exist including commercial tools like Jet-Profiler, MySQL Enterprise monitor or open source tools like innotop , mytop, Maatkit for the purpose of profiling MySQL.