InnoDB Storage engine Disk and Memory Layout :
InnoDB is one of the most important storage engines in
MySQL. Due to its transactional capabilities, locking levels and foreign key
support it has become one of the widely used storage engines for MySQL.
However unlike MyISAM, InnoDB is fairly complex in its architecture. Let's review
how the architecture looks like on disk and memory (RAM) subsystem. The
following components are the most important in InnoDB :
1) InnoDB buffer pool
2) Transaction log buffer
3) InnoDB IO threads
4) Transaction Log files
5) Table-space files
6) Datafiles
To demonstrate the use of all the components, let's take a
simple batch of Insert statement into consideration. The statement would be "INSERT
INTO employee.employees (empno, empname, sal, hiredate, dept) VALUES (1,
'Akshay', 'XXXXXX','01-12-2011', 'MySQL')".
So the above statement makes it through components at the
server level like we referred earlier in our previous document. Likewise coming
from Client, scanning the Query cache, parsing, pre-processing, optimizing and
then finally to the storage engine, let's see what happens further at the
storage engine level :
1) Once the statement enters the InnoDB kernel,
innodb checks whether the requested data "page" exist in the
"BUFFER POOL". The buffer pool contains all the data pages
which needs to be changed (INSERT, UPDATE and DELETE) or read ( SELECT). It
will contain both the Index as well as Data Pages. So the INSERT statement
above will check if the page in which record with values "(1, 'Akshay',
'XXXXXX','01-12-2011', 'MySQL')" needs to be inserted already exist in the
pool, if it finds the page it will make the changes to page or if it doesnt
find the page it will read from the Disk (datafile) in to the memory (Buffer
pool) and then change it. InnoDB Buffer pool is most important memory structure
in InnoDB, and is set using "innodb_buffer_pool_size"
variable. Usually this is set to around 50-80% of the total RAM.
2) Once the data and index pages are changed in the
buffer pool the pages are marked "DIRTY" and the INSERT statement is
logged in the transaction log buffer. The function of transaction log
buffer is very trivial in an RDBMS. Later the contents of transaction log
buffer are written to the transaction log files (specifically on COMMITs). Lets
summarize the document later with note on understanding transaction logs.
3) InnoDB IO threads are internal to InnoDB kernel
and not related to any connection threads or O.S. threads (InnoDB IO threads
works at the storage engine layer, whereas Connection threads works at MySQL
Server layer). These IO threads (mainly known as Innodb Read threads and
Write threads) does the job of writing DIRTY pages to the disk files
from buffer pool and log buffer and reading pages from the files. Hence the
above INSERT's data will written to the disk by one of the IO threads.
4) Transaction log files contains the contents from
transaction log buffer on a durable media (Hard disks). It's used for
transaction recovery during Instance crash (We will visit Instance crash in
next sessions) and for POINT IN TIME RECOVERY. These files can be found in
MySQL datadir namely "ib_logfile0" and "ib_logfile1".
The files are used in a circular fashion, like, initially innodb will start
filling up "ib_logfile0" and then "ib_logfile1".
5) Table-space files usually named as "ibdata1"
files are used for multiple purposes. It stores the actual table and index data
(if "innodb_file_per_table" is disabled), data-dictionary (meta-data
about Innodb tables) and the undo-logs (used for ROLLBACK). So the
DIRTY pages from BUFFER POOL will be written to the table-space files by the IO
threads.
6) Datafiles are the files created when "innodb_file_per_table"
is set. These files have filenames like <table_name>.ibd. These
files contains index as well as actual data of the tables. These files
allow easy maintenance as compared single tablespace file due to its size
considerations. Every table will have its own .ibd file created in its
respective data directory.
InnoDB uses its log to reduce the
cost of committing transactions. Instead of flushing the buffer pool to disk
when each transaction commits, it logs the transactions. The changes
transactions make to data and indexes often map to random locations in the
tablespace, so flushing these changes to disk would require random I/O. InnoDB
assumes it’s using conventional disks, where random I/O is much more expensive
than sequential I/O because of the time it takes to seek to the correct
location on disk and wait for the desired part of the disk to rotate under the
head.
InnoDB uses its log to convert
this random disk I/O into sequential I/O. Once the log is safely on disk, the
transactions are permanent, even though the changes haven’t been written to the
data files yet. If something bad happens (such as a power failure), InnoDB can
replay the log and recover the committed transactions.
Of course, InnoDB does ultimately
have to write the changes to the data files, because the log has a fixed size.
It writes to the log in a circular fashion: when it reaches the end of the log,
it wraps around to the beginning. It can’t overwrite a log record if the
changes contained there haven’t been applied to the data files, because this
would erase the only permanent record of the committed transaction.
To understand this more deeply, please read about difference
between Random I/O and Sequential I/O. There are some SQL statements which can
force such an I/O, take it as a Homework to find such Statements.
Courtesy : High Performance
MySQL
Hey Akshay,
ReplyDeleteThis is really nice stuff. I was curious to know more about the isolation of threads. Lets say one user is firing insert or some update sql on employee table and another user at the same time is firing select on employee table. In such case how it works, if you can throw some light on this it will be helpful.
Thanks,
Pritam
@Pritam : You might wanna read about MVCC to get a clearer idea about how the locking works. I would try to write on it in future.
DeleteThe INSERT or any DML will hold exclusive locks, while SELECTs will hold a Shared lock. SELECT will have to wait till the INSERT completes. InnoDB does have a good feature of MVCC (Multi Version Concurrency Control), which will acquire row level locks , but INSERT on table with auto_increment column will usually hold a table-level lock at the server level and not storage-engine level. This is case with InnoDB however.
ReplyDeleteThanks Akshay. It clears the picture.
ReplyDeleteThis was really usefull stuff with simple example. Thank you.
ReplyDeletemy long time doubt is cleared now... nice article , thanks Akshay ...*
ReplyDelete