Journey of a SQL SELECT statement in MySQL RDBMS engine :
Understanding the basics of a query execution is key to
success in understanding core concepts of any DBMS engine. Let's examine how a
simple SELECT finds its way through most of the component (at a Higher level)
to a print its output to the screen (stdout). The MySQL architecture consists
of the following major components :
1) MySQL Client
2) Query Cache
3) Parser
4) Preprocessor
5) Query optimizer
6) Query execution engine
7) Storage engine APIs
8) Data
a) Suppose we want to find all details of an employee
having ID 999. A simple SQL query might look something like this "SELECT
* FROM employee.employees WHERE id = 999;". Now to execute the query
we would want to start a MySQL client session and run the query (MySQL Client
is a utility to execute all MySQL commands and return an output).
b) MySQL Client once started will show a prompt
something like "mysql >". When a client session is established a "MySQL
THREAD" is created in MySQL to handle all the command executions and
sessions.
c) Upon entering the above SELECT query, the query is
submitted to the MySQL server which is a "mysqld" process running on
the remote or local server (mysql process is a client and mysqld process is
a server in this case and they need not be on the same host).
d) Once entering these server process the query is
checked against a list of previously executed SELECTs maintained by MySQL DBMS
engine in an area of memory known as "QUERY CACHE". Query
cache holds the output of SELECT statements (exactly the same statements)
executed earlier. So if another user had executed "SELECT * FROM
employee.employees WHERE id = 999;" previously, the result set will be
stored in the Query cache. And if we again issue the same query to MySQL again
it will directly output the result set stored in Query cache. This saves MySQL
from doing any work of parsing, preprocessing and retrieving data from disk or
memory resulting in lightning fast output. However there are some limitations
which can be studied in depth later.
e) If the same exact query is not present in QUERY
CACHE, MySQL will move to the next component i.e "PARSER" to
parse the Query Syntax. Syntax is very important part of query execution. What
if the query we executed was something like this "SELECT *
employee.employees WHERE id = 999;", MySQL parser will parse the query
thoroughly to find out that the "FROM" is missing in the
statement and immediately return us an error. Parsing at such an early stage in
executions avoids MySQL from consuming resources to scan the tables and object
on disk as well as memory.
f) Once the syntax checking is done (and hopefully the
syntax is correct), MySQL will move onto next component i.e. "QUERY
PREPROCESSOR". As the name suggest, the preprocessor will check the
GRANTS of user on the objects he/she wants to access through the query (We want
to retrieve data from `employee` database's `employees` table). If the
user doesn't have the appropriate GRANTS, MySQL will report an error. Also the
preprocessor is responsible for object checking (Checking to see if `employee`
database exist and `employees` table exist), in case object specified doesnt
exist MySQL will report an error.
g) After preprocessing if we have the required GRANTS
and the also object exists, then MySQL will take the query to the most
important component i.e "QUERY OPTIMIZER". This is the place
where MySQL will decide on the execution plan to access and retrieve the data
as fast as possible. Query Optimizer is responsible for checking which indexes
exist on the specified table and if any exist whether it can use the index to
retrieve data faster. An index here on `id` column of `employees` table will
help the storage engine layer to locate and retrieve data faster. Once
confirmed MySQL will create an "EXECUTION PLAN" and pass on
the plan to "Storage engine" layer.
NOTE : All of the above steps are
independent of the Storage engine used by MySQL. Hence we can refer the above
components as part of "SERVER layer" and below referred components as
the "STORAGE engine layer". This is one of biggest advantage of MySQL
over any RDBMS product available. For more info on Storage engines refer the
MySQL reference manual.
h) Moving ahead, the STORAGE engine APIs now
have the query execution plan, which it can use to access and retrieve the
required indexes into the memory and also retrieve the "DATA" from
exact memory address specified in the indexes. This index lookup in memory is
of magnitudes faster as compared to random disk access.
i) DATA can be stored differently depending upon the
storage engine used, for eg : MyISAM and InnoDB storage engines will have
DATA stored in files (on DISK), whereas Memory storage engine will store data
in Memory (RAM).
Once the desired data block is located on the disk by the
Storage engine API, MySQL or O.S. will cache the data in memory (RAM) and
display the result set to client's screen (stdout).
Please refer to the diagram above.
Processing a SELECT is different from an INSERT, UPDATE and
DELETE, hence do not refer this for the same. Hope you found it useful.
Hi Akshay,
ReplyDeleteThank you for such a wonderful article.
This article has really pinched me in believing the fact that strong basics are really a great foundation for building the castle of knowledge.
Looking forward for more such interesting article.
Thanks again,
Sameer
Hi Akshay, Thank you, it is very easy to understand for beginners like me.
ReplyDelete