Tuesday, 8 January 2013

How InnoDB writes data (Something's FUZZY out there!) ?

Internals of InnoDB is one of the most challenging topics to learn in MySQL. Hence in my earlier blogpost, I tried to simplify how the InnoDB disk and Memory layout looks like. Although once you have understood the basic understanding, questions will start cropping in your minds, How and When exactly InnoDB writes its data. Lets start with our usual simplified approach to understanding this.

Lets assume users are executing some updates on the some InnoDB tables. Usually InnoDB will check if the records(contained in data Pages) to be manipulated are existing inside the innodb buffer pool, if found then InnoDB will proceed with updating the concerned record(in reality the pages). If the page is not found in buffer pool, InnoDB Read I/O thread will the page from the concerned table's datafile and put it in the innodb buffer pool. Then the appropriate record in the page would be manipulated, a special LSN(Log Sequence Number) would be written to Transaction logs(ib_logfile) and the block will be marked as "Dirty".

This was straight-forward to understand, wasn't it ? Now what happens if someone goes and kills the MySQL process, as usual on restart InnoDB will starts its transaction recovery of committed and un-committed transaction. Wait a minute un-committed transactions are ok can be rolled back from RSegs (Undo or Rollback Statement situated in the ibdata file), but what about committed transactions. Isn't InnoDB suppose to write a dirty block as soon as a COMMIT(Checkpoint in too much technical terms) occurs. Well for your information this is known as Undo phase and Redo phase resp.

So the instance recovery process raised some interesting questions, like when does InnoDB actually writes data to the persistent files on disk. Looking at the recovery process it doesnt seem to write immediately though. That's something fishy... I mean "FUZZY". So going back to our update statement's Dirty page, the actual page will not be flushed to disk or disk-cache, until the following scenarios occur :
1) The LRU(Least Recently Used) page list becomes full. InnoDB reads pages from the disk into buffer to do manipulations on them or as data pages for SELECT statements. Now if the page was modified it will land-up in the Dirty pages list. If it was used for SELECT, it will still be in the LRU list. Now since innoDB buffer is of finite size, the LRU list is bound to get full, so at this moment some Dirty buffers will be flushed to disk to accommodate any new READ buffers. In layman's words "A Select can flush a DML".
2) Second scenario is quite compelling, this is when the "Flush List" itself runs out of room to accommodate more dirty buffers, Oh Yes! this happens and is controlled by "innodb_max_dirty_pages_pct" mainly. So to make room Least manipulated pages are flushed (FUZZY!!! Which means an update on a page which happened long while ago, could still be kept in Flush List, and the page which is somewhat recently manipulated could also be flushed to disk, however the older page is continuously updated by other statements too, this is a kinda write-optimization, Percona counts this using "Young-making rate").

BTW, if you are wondering which process actually writes these pages in their specific table's datafiles, I have a hint for you, they are the counter-part of READ I/O threads, although this all happens in the background and we dont need to worry about.

Note : There are three ways in which the data page would be written, the above two (Fuzzy Check-pointing), plus a "Sharp checkpoint" as explained by Baron Schwartz in his blog-posts at Xaprb.

So are there any performance bottlenecks and improvements ? Well logically dirty pages flushed due to LRU list signifies more memory should be added, a bigger footprint for innodb_buffer_pool_size. Dirty pages flushed due to Flush list signifies more complicated optimization involving innodb_adaptive_flushing, innodb_max_dirty_pages_pct, innodb_io_capacity and innodb_log_file_size (Courtesy mysqlperformanceblog.com).

Just to leave you tickling, all these factors depends upon an important setting known as "innodb-flush-method", keep reading...